Tuesday, February 22, 2011

Huff0 : Early detection offers better speed (v0.7)

 In my earlier release, i skipped the integration of a simple to understand but more difficult to execute feature : early detection of border-cases segments.

This definition regroups both "not compressible segments" and "single character segments". The previous algorithm was correctly detecting these cases, but during the construction of the Huffman tree.

The new algorithm ensures these situations are detected before any operation is done to build the tree. It does so by counting differently, in a new structure created for this purpose. The net result is a speed improvement for files which feature such situations :

Detailed performance assessment :

Huff0 v0.6

Huff0 v0.7

RatioCompress DecodingRatioCompressDecoding
Not compressible

enwik8.7z1.000810 MB/s1.93 GB/s1.000870 MB/s1.93 GB/s
Hardly compressible

win98-lz4hc-lit1.024465 MB/s600 MB/s1.024485 MB/s600 MB/s
audio11.070285 MB/s280 MB/s1.070285 MB/s280 MB/s

enwik8-lz4hc-lit1.290204 MB/s194 MB/s1.290204 MB/s194 MB/s
Lightly Ordered

enwik8-lz4hc-offh1.131197 MB/s184 MB/s1.131197 MB/s184 MB/s

enwik8-lz4hc-ml2.309214 MB/s195 MB/s2.309214 MB/s195 MB/s

office-lz4hc-run3.152218 MB/s202 MB/s3.152218 MB/s202 MB/s
enwik8-lz4hc-run4.959245 MB/s224 MB/s4.959245 MB/s224 MB/s
Ultra compressible

loong275785 MB/s2.93 GB/s275860 MB/s2.93 GB/s

This (mostly) closes the gap with Range0 regarding the detection speed of not compressible segments, which is the main case to consider for real-life situations.

You can download and test the new version here :

No comments:

Post a Comment