In an earlier note, I presented the basics of advanced parsing strategies, a method which grabs a few more percentage points of compression ratio in exchange for a massive number of CPU cycles. Yes, let's face it: from a pure "compression/speed" perspective, the trade-off is debatable.
However, advanced parsing comes with an excellent property: it keeps the data format intact, meaning the output can be decoded normally, by any compression program using the same format. Decoding speed is barely affected, and when it is, the effect is likely to be positive. In situations where compression time does not matter but decompression speed does, this trade-off is acceptable.
In a bid to provide a living example of this strategy, I created ZhuffMax, a simple extension of ZhuffHC using the first form of parsing: lazy matching. For this example, I used a recursive level 1 evaluation, meaning that at each position, I also look for a match starting at the next byte, select this new one if it is better, and then repeat the process.
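To make this concrete, here is a minimal sketch of such a lazy parser. It is an illustration only: `find_best_match()`, `emit_literal()`, `emit_match()` and the `MIN_MATCH` threshold are hypothetical stand-ins for the real match finder and encoder, not Zhuff's actual internals.

```c
/* Sketch of level-1 lazy matching, as described above.
 * find_best_match() is a hypothetical helper standing in for the
 * actual match finder (hash chain, binary tree, ...): it returns
 * the length of the best match found at position `pos`, and fills
 * `*offset` with the match distance. */
#include <stddef.h>

#define MIN_MATCH 4   /* assumed minimum useful match length */

size_t find_best_match(const unsigned char* src, size_t pos, size_t srcSize, size_t* offset);
void emit_literal(unsigned char c);
void emit_match(size_t length, size_t offset);

void lazy_parse(const unsigned char* src, size_t srcSize)
{
    size_t pos = 0;
    while (pos < srcSize) {
        size_t offset;
        size_t length = find_best_match(src, pos, srcSize, &offset);
        if (length < MIN_MATCH) {         /* no usable match: emit a literal */
            emit_literal(src[pos]);
            pos++;
            continue;
        }
        /* lazy evaluation: is the match starting at the *next* byte better? */
        while (pos + 1 < srcSize) {
            size_t nextOffset;
            size_t nextLength = find_best_match(src, pos + 1, srcSize, &nextOffset);
            if (nextLength <= length) break;  /* current match wins: keep it */
            /* next position is better: emit one literal, shift forward,
             * and repeat the evaluation from the new position */
            emit_literal(src[pos]);
            pos++;
            length = nextLength;
            offset = nextOffset;
        }
        emit_match(length, offset);
        pos += length;
    }
}
```

Note that because the parser only ever defers a match by emitting literals, the output remains an ordinary sequence of literals and matches: this is exactly why the format stays intact and any decoder for the same format still works.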
There is definitely a positive impact on compression ratio. At a couple of percentage points, it's typical of what can be achieved with a lazy strategy.
| Program  | Version | Threads | Compression Ratio | Compression Speed | Decoding Speed |
|----------|---------|---------|-------------------|-------------------|----------------|
| Zhuff    | 0.7     | -t2     | 2.584             | 285 MB/s          | 550 MB/s       |
| ZhuffHC  | 0.1     | -t2     | 2.781             | 55.6 MB/s         | 618 MB/s       |
| ZhuffMax | 0.1     | -t2     | 2.822             | 35.0 MB/s         | 640 MB/s       |
The impact on compression speed, on the other hand, is pretty massive: about 60% slower than ZhuffHC.
Note the positive impact on decompression speed: since we are once again promoting larger matches, there are fewer sequences to decode per output byte.
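To see why, consider a generic LZ decode loop. The format below is a toy invented purely for illustration (not Zhuff's actual stream): each sequence carries a fixed per-sequence cost (token reads, branches), while the copy itself costs little per byte, so longer matches amortize that fixed cost over more output.

```c
#include <stddef.h>
#include <string.h>

/* Decode a toy stream of sequences:
 * [litLen:1 byte][literals][matchLen:1 byte][offset:2 bytes LE]
 * No bounds checks: illustration only. */
size_t toy_lz_decode(const unsigned char* in, size_t inSize, unsigned char* out)
{
    const unsigned char* ip = in;
    const unsigned char* const iend = in + inSize;
    unsigned char* op = out;
    while (ip < iend) {
        size_t litLen = *ip++;                /* fixed per-sequence overhead... */
        memcpy(op, ip, litLen);               /* ...amortized over litLen bytes */
        ip += litLen; op += litLen;
        if (ip >= iend) break;                /* stream may end on literals */
        size_t matchLen = *ip++;
        size_t offset = ip[0] | (ip[1] << 8); /* 16-bit little-endian distance */
        ip += 2;
        const unsigned char* match = op - offset;
        while (matchLen--) *op++ = *match++;  /* byte copy handles overlaps */
    }
    return (size_t)(op - out);
}
```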
Such a trade-off is unlikely to be interesting in a symmetric scenario, but there are situations where asymmetry is the rule. Imagine, for example, communication between a large, powerful server and a small embedded sensor: the better compression ratio translates into less data exchanged, which means less battery used by the sensor. Another typical situation is the "distribution" process, where a file is compressed once, then downloaded and decoded many times afterwards. In such situations, the extra % of compression ratio is welcome, even if it costs more compression power.
ZhuffMax does not yet completely deserve its name, since there are stronger (and slower) parsing strategies available (namely optimal parsing). But if I do find time to implement new parsing concepts, I will implement them into ZhuffMax. No miracle should be expected: even optimal parsing can bring only a few more % of compression, so we are already close to the maximum.
You can grab ZhuffMax at Zhuff's homepage.
As a side note, maybe I should consider merging all these versions of Zhuff together. After all, they all decode the same format...