At last !
God this was difficult, but finally i've completed a working version of MMC with multiple level promotions, and can now extract some useful statistics from it.
If you remember this earlier post, multiple levels promotions is about sorting the currently tested position immediately at its best level, while the previous version was limited to promoting by only one level per pass.
Being at "level L" ensure that all positions chained from the current one have at least L bytes in common. This is a pretty strong property, which greatly reduces remaining positions to test.
By promoting a position, the algorithm makes the next search for a similar pattern faster. In the end, it means less comparisons are necessary to make a full search across a file.
As usual, in order to properly test improvements, i've measured the number of comparisons required for different window size (search depth) and file types. And here are the results :
Fusion Multiple Improvement
Calgary 64K Searches 5.54M 4.86M 14%
Firefox 64K Searches 41.9M 35.2M 19%
Enwik8 64K Searches 187M 166M 13%
Calgary 512K Searches 10.9M 9.31M 17%
Firefox 512K Searches 92.1M 78.2M 18%
Enwik8 512K Searches 434M 370M 17%
Calgary 4M Searches 14.2M 12.0M 18%
Firefox 4M Searches 182M 160M 14%
Enwik8 4M Searches 751M 619M 21%
As can be seen, improvement is sensible, but not outstanding. There is a slow ramp-up with window search size, although it seems to decrease for Firefox : this is entirely due to collision hashes, which accounts for approximately 40% of all comparisons, which seems a symptom of a long enough compressed segment within the tar file.
Improvement is also relatively well distributed across all file types, and as a consequence, even enwik8 witness some solid gains for a change.
All in all, it translates into an incrementally faster search algorithm. It does not make it super-fast, but an appreciable improvement nonetheless.
Reaching higher speeds now is likely to require a completely new search methodology.
No comments:
Post a Comment