In an earlier post, I presented some obvious statements about cache properties. In a nutshell, a cache makes a tremendous difference in data access speed, but the faster the cache, the smaller it is.
Now we are going to look more closely at its behavior.
The first important concept to keep in mind is the cache line. A memory cache does not track the last randomly accessed bytes individually. Data is structured into lines, whose size depends on the CPU.
The most common line size is 64 bytes, which you'll find in all recent PCs.
This means that whenever you access a single byte in memory, you are not just loading one byte into the cache, but a full line of 64 bytes.
This is of fundamental importance. If you ask for data at position 1, the CPU will load the whole first line. Then, when position 2 is needed, it will already be in the cache, and so will position 3, and so on, up to position 63.
Given the very large difference in access time between L1 cache (3-7 cycles) and memory (100-200 cycles), it makes access to already loaded data almost free.
This is a property that any performance-oriented algorithm should take into consideration, and it requires an adapted data structure.
Let's have a look at MMC and BST. They both store 2 pointers per position. These 2 pointers are stored next to each other, which means that when I ask for one, I can access the other one "almost" for free.
Now, what about the data they are pointing to?
Well, it is very likely to fall outside of the cache line, since only 8 positions are part of the same line: the next jump is almost certainly well outside of this range.
Taking advantage of the cache line requires a completely different data structure. We should try to get as many hits as possible within the already loaded line, for example by keeping all positions of the same level next to each other.
As said, this is a very different data architecture, and it would therefore require a complete rewrite of the search algorithm. But then, I'm here to learn...