Continuing on the
previous post analysis of the
lz4demo's current framing format, another side-effect created by this simple format is "
latency".
Since the fixed block size is 8MB, the codec must wait for the first block
to be completely filled before starting any decoding or compressing operation. It effectively defers processing by a few microseconds.
This issue may not seem large, especially if underlying I/O is fast. Nonetheless, not all I/O are fast, and even in such cases, an 8MB "starting time" is bound to be measurably worse than a 256KB one for example.
As a consequence, a framing format with a smaller block size would offer better and smoother processing throughput.
Which
leads us to a last and important issue : independent blocks.
While
this strategy is good for simplicity and multi-threading, it's bad for
compression : it translates into a worsened compression ratio on the first
64KB of each block.
With block sizes of 8MB, this effect is
negligible (affecting compression ratio by less than 0.1%). However, the
smaller the block size,
the worse the side-effect. With small block sizes, this effect can no longer be neglected.
Therefore, should the blocks remain independent ?
Indeed. By
making the next block depending on the previous one,
it nullifies the problem of worsened compression ratio. But it
also makes it impossible to decode a compressed block independently,
with negative consequences on multi-threading and partial decoding
capabilities.
Is that really an issue ? Well, it really depends on the use case.
In many circumstances,
such as simple file compression or stream compression, it does not matter,
since data is encoded and decoded sequentially anyway. Throwing away the
capability to multi-thread decompression seems bad, but in fact, most of the time, I/O is unable to
cope with LZ4 decoding speed (around 1GB/s). So a single decoding thread is enough to handle almost any I/O load.
Since there is little need for partial decoding, nor for
multithreaded decoding, the compression ratio gain looks more useful.
There is just a little remaining problem :
While the decoding function will need few adaptation to handle this new use case, most of the complexity being located into the buffer manager, the compression function on the other hand has to be adapted.
While each block were independant, compression could start with a pristine clean reference table.
But
with sequentially dependant blocks, the initialization becomes more
complex : the previous 64K needs to be copied in front of the next
block, and then loaded/hashed into the reference table, before starting compression. It obviously costs CPU and time.
A variant is to just "translate" the references already loaded into
the table as a result of compressing the previous block, but it's limited to "single thread" scenario.
OK, but now that we can reference data from previous blocks, how far should we go ? The natural maximum distance is the "copy window size". This size is, by default, 64KB for LZ4. But it could happen that the receiving end of the compressed stream has not enough memory to store that much data. In such case, the compressor must be careful in not using references beyond the memory capacity of the receiver. In other words, it must deliberately discard long-distance copy operations.
Should such a use case be part of the generic framing format or not ?
My answer would be : it's okay, as long as an easy solution can be found.
How could that work ? Once again, let's focus on the decoder side.
I'll imagine a controller with only 4K memory available as buffer.
A simple way to handle such case is by dividing this space into 2 parts : 2K for the "previous block", and 2K for the "block to decode". So we end up with :
- Block size = 2K = memory limit / 2
- Maximum copy reference = lower limit of previous block
Obviously, there are other possibilities, such as cutting data into even small parts (for example 1K blocks) and having a back reference of up to 3 previous blocks. But as a first approximation, it seems these variant will provide almost equivalent results while being more complex.
This situation can be summarized with a simple rule : never reference data beyond one block distance.
With only this simple rule in place, it seems the default LZ4 framing format could be made compatible even with environments with severely limited RAM, provided the encoder selects a suitable block size.