I feel quite close today from breaking this shell, with an interface definition which matches my definition of "clean enough" API. So let's share some preliminary results.
Current streaming interface
The current streaming API exposes the following functions :
void* LZ4_create (const char* inputBuffer);
int LZ4_compress_continue (void* LZ4_Data, const char* source, char* dest, int inputSize);
char* LZ4_slideInputBuffer (void* LZ4_Data);
int LZ4_free (void* LZ4_Data);
First, there is a need to create a data structure which tracks the behavior of the stream.
This is void* LZ4_data, which will be generated by LZ4_create().
It looks fine, but a first issue is that allocation happens inside LZ4_create(). In some circumstances, this may not be acceptable : sometimes allocation must be fully controlled by the host process. For example, standard functions such as malloc() may be unavailable. For this use case have been created 2 more functions :
int LZ4_sizeofStreamState(void);
int LZ4_resetStreamState(void* state, const char* inputBuffer);
It looks fine too, but is unfortunately still not usable for static allocation purpose (on stack, for example).
The second issue is the const char* inputBuffer argument, specified at creation stage, because it will be fixed during the entire compression process. It's not possible to jump between memory segments. This is a major limitation, preventing "in-place" compression of scattered memory segments.
LZ4_compress_continue() looks fine as a function definition, but what is less so is the documented requirement : up to 64KB of previously processed data must stand *before* the data to be compressed. This is a major constraint, typically requiring to prepare data into an intermediate buffer (which in some circumstances, might prove an unaffordable burden).
Then there is LZ4_slideInputBuffer(), which is arguably not at its right responsibility level. Its mission is to copy the last previous 64KB of source data to the beginning of the input buffer. It exists because the content of void* LZ4_data is not accessible.
No function is defined to simply load a dictionary : to achieve that goal, one has to compress a segment using LZ4_compress_continue(), throw the result, and compress the next segment using data accumulated so far.
To summarize, yes it can work, but it's cumbersome. One has to accept the idea that data to compress must be prepared into an intermediate buffer where above conditions can be met.
It's not convenient, and may sometimes not be possible, for example due to allocation limitations.
It also has a negative impact on performance, due to memory copy operations overhead.
New streaming interface definition
While defining the streaming API, I constantly struggled to find the right balance between a "fully automated" API, and a "user-controlled" one. The "fully automated API" would take in charge all kind of workload, generate data respecting the LZ4 Streaming Format with full headers and markers, and take care of allocation for required buffers. On the other hand, the "user-controlled" API would serve the needs for host programs which require full control over resource allocation.
I therefore settled with providing both.
The "user-controlled" version will be part of LZ4 library. It's the simpler one, and only takes care of generating compressed block format, chained or independent. It depends on the host process to pay attention to all memory operations (allocation, position and availability) and to provide its own custom format linking the successive blocks.
The "fully automated" API will be part of a new library, which will be called LZ4S for "LZ4 Streaming". It is basically a user program of the previous API.
In this blog post, we will therefore concentrate on the first API, since it is also an underlying foundation for the 2nd one.
typedef struct { unsigned int table[LZ4_STREAMSIZE_U32]; } LZ4_stream_t;
OK, let's focus on the most important differences with current API.
An LZ4_stream_t structure is exposed. It is provided to give a better control over memory management to the host program. For example, allocation can be static (stack, global) or dynamic, and use any user-defined allocation function.
Obviously, the host program is both in charge of allocating and freeing this memory space. It may also duplicate it "at will", which is a good idea for "static dictionary compression" scenarios.
Cleaning (or resetting) LZ4_stream_t is only a matter of memset() it to zero. It's a requirement to do it once before using this structure.
LZ4_stream_t is described as a table of unsigned int. This is intentional. The real usage of this memory bank remains hidden inside lz4.c. This way, internal variables within the structure cannot be used as a kind of implicit interface contract, allowing future modifications to happen without breaking compatibility with existing programs.
Once the streaming structure is created, you can start to populate it by loading a static dictionary using :
int LZ4_loadDict (void* LZ4_stream, const char* source, int size);
One strong assumption of LZ4_compress_continue() is that previously compressed data is still available. Unfortunately, this might not be the case, and this function cannot guess that.
Should previously compressed data be no longer accessible, for example because you need the space for something else, or you can't guarantee its availability, it's possible to solve that situation :
- You can "save" the relevant data segment from previous block (typically, the last 64KB, it can also be less than that) into a "safe" place. At which point, you will need to indicate this change of position.
Decompression
The current streaming decompression API is defined as follows :
int LZ4_decompress_safe_withPrefix64k (const char* source, char* dest, int compressedSize, int maxOutputSize);
int LZ4_decompress_fast_withPrefix64k (const char* source, char* dest, int originalSize);
Its documented requirement is that previously decompressed data must stand *right before* the memory address where the next data block is going to be decompressed. It works, but implies the creation of a temporary buffer to match the requirement.
The new streaming API get rid of this limitation, allowing previous data to stand anywhere within accessible memory :
int LZ4_decompress_safe_usingDict (const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize);
int LZ4_decompress_fast_usingDict (const char* source, char* dest, int originalSize, const char* dictStart, int dictSize);
The dictionary definition is part of the argument list (const char* dictStart, int dictSize), therefore there is no need for a tracking structure, in contrast with compression.
A first code implementing this API is currently available in the "dev" branch of LZ4 on github. It is still early stage, and probably the right time to be provide your comments should you have any.
[Edit] Added : LZ4_moveDict() as a potential replacement of LZ4_SetDictPos()
[Edit] Added : Paragraph on decompression
[Edit] Modified : move to LZ4_moveDict()
[Edit] Interface definition converges towards LZ4_compress_continue()
[Edit] "streaming" branch now merged into "dev" branch