This definition covers both "not compressible segments" and "single character segments". The previous algorithm detected these cases correctly, but only during the construction of the Huffman tree.
The new algorithm detects these situations before any work is done to build the tree. It does so by counting differently, in a new structure created for this purpose. The net result is a speed improvement for files which feature such situations:
Detailed performance assessment:
File | Huff0 v0.6 Ratio | Huff0 v0.6 Compress | Huff0 v0.6 Decoding | Huff0 v0.7 Ratio | Huff0 v0.7 Compress | Huff0 v0.7 Decoding |
Not compressible | ||||||
enwik8.7z | 1.000 | 810 MB/s | 1.93 GB/s | 1.000 | 870 MB/s | 1.93 GB/s |
Hardly compressible | ||||||
win98-lz4hc-lit | 1.024 | 465 MB/s | 600 MB/s | 1.024 | 485 MB/s | 600 MB/s |
audio1 | 1.070 | 285 MB/s | 280 MB/s | 1.070 | 285 MB/s | 280 MB/s |
Distributed | ||||||
enwik8-lz4hc-lit | 1.290 | 204 MB/s | 194 MB/s | 1.290 | 204 MB/s | 194 MB/s |
Lightly Ordered | ||||||
enwik8-lz4hc-offh | 1.131 | 197 MB/s | 184 MB/s | 1.131 | 197 MB/s | 184 MB/s |
Ordered | ||||||
enwik8-lz4hc-ml | 2.309 | 214 MB/s | 195 MB/s | 2.309 | 214 MB/s | 195 MB/s |
Squeezed | ||||||
office-lz4hc-run | 3.152 | 218 MB/s | 202 MB/s | 3.152 | 218 MB/s | 202 MB/s |
enwik8-lz4hc-run | 4.959 | 245 MB/s | 224 MB/s | 4.959 | 245 MB/s | 224 MB/s |
Ultra compressible | ||||||
loong | 275 | 785 MB/s | 2.93 GB/s | 275 | 860 MB/s | 2.93 GB/s |
This mostly closes the gap with Range0 in how quickly not compressible segments are detected, which is the case that matters most in real-life situations.
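To make the idea of detecting these cases before any tree is built more concrete, here is a minimal sketch in C. It is not the Huff0 source: the names `SegmentKind` and `classify_segment` are hypothetical, and the "not compressible" threshold (most frequent byte no more than twice its uniform share) is an assumed heuristic chosen for illustration only. The sketch counts byte occurrences once, then decides from the largest count alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only (not taken from Huff0). */
typedef enum {
    SEG_NORMAL = 0,        /* worth building a Huffman tree */
    SEG_SINGLE_SYMBOL,     /* one byte value repeated over the whole segment */
    SEG_NOT_COMPRESSIBLE   /* distribution too flat to gain anything */
} SegmentKind;

/* Classify a segment from its byte histogram, before any tree is built. */
SegmentKind classify_segment(const unsigned char *src, size_t len)
{
    size_t count[256];
    size_t max = 0;
    size_t i;

    assert(src != NULL || len == 0);
    if (len == 0) return SEG_NOT_COMPRESSIBLE;   /* nothing to compress */

    /* Single pass over the input: count occurrences of each byte value. */
    memset(count, 0, sizeof count);
    for (i = 0; i < len; i++) count[src[i]]++;

    /* Find the count of the most frequent byte value. */
    for (i = 0; i < 256; i++) {
        if (count[i] > max) max = count[i];
    }

    /* One symbol fills the whole segment. */
    if (max == len) return SEG_SINGLE_SYMBOL;

    /* Assumed heuristic: a uniform distribution gives each byte len/256
     * occurrences. If even the most frequent byte stays within twice that
     * share (len/128), treat the segment as not compressible. */
    if (max <= len / 128) return SEG_NOT_COMPRESSIBLE;

    return SEG_NORMAL;
}
```

A caller would run `classify_segment` first and only proceed to tree construction when it returns `SEG_NORMAL`. In the other two cases the segment could be emitted as a single repeated symbol or copied as-is, which is where the speed gain on such inputs would come from.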
You can download and test the new version here:
http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html