Tuesday, December 7, 2010

Multiple level promotions : Experimental Results

At last !
God this was difficult, but finally i've completed a working version of MMC with multiple level promotions, and can now extract some useful statistics from it.
If you remember this earlier post, multiple levels promotions is about sorting the currently tested position immediately at its best level, while the previous version was limited to promoting by only one level per pass.
Being at "level L" ensure that all positions chained from the current one have at least L bytes in common. This is a pretty strong property, which greatly reduces remaining positions to test.

By promoting a position, the algorithm makes the next search for a similar pattern faster. In the end, it means less comparisons are necessary to make a full search across a file.

As usual, in order to properly test improvements, i've measured the number of comparisons required for different window size (search depth) and file types. And here are the results :

                       Fusion  Multiple Improvement 
Calgary 64K Searches   5.54M    4.86M      14%
Firefox 64K Searches   41.9M    35.2M      19%
Enwik8  64K Searches    187M     166M      13%

Calgary 512K Searches  10.9M    9.31M      17%
Firefox 512K Searches  92.1M    78.2M      18%
Enwik8  512K Searches   434M     370M      17%

Calgary 4M Searches    14.2M    12.0M      18%
Firefox 4M Searches     182M     160M      14%
Enwik8  4M Searches     751M     619M      21%

As can be seen, improvement is sensible, but not outstanding. There is a slow ramp-up with window search size, although it seems to decrease for Firefox : this is entirely due to collision hashes, which accounts for approximately 40% of all comparisons, which seems a symptom of a long enough compressed segment within the tar file.
Improvement is also relatively well distributed across all file types, and as a consequence, even enwik8 witness some solid gains for a change.

All in all, it translates into an incrementally faster search algorithm. It does not make it super-fast, but an appreciable improvement nonetheless.
Reaching higher speeds now is likely to require a completely new search methodology.

No comments:

Post a Comment