RealTime Data Compression

100 billion is really huge, one will need a lot of...

2024-09-17T09:38:40.891+02:00

100 billion is really huge, one will need a lot of RAM to test that.
There is also the important topic of filtering the input, and ensure that it does not feature identical strings, for the test to even be useful.

That being said, the Massive Collision tester, bundled as part of the xxhash project, could get you close to that. It's able to generate 100 billions binary objects which are guaranteed to be different, and will test and report the collision rate.

XXH128 has been tested with this tool by the way, and, as expected, it produced 0 collisions.
https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison

Hi, my use case for strings which are of 100bytes ...

2024-09-17T06:13:45.665+02:00

Hi, my use case for strings which are of 100bytes maximum. but no collision is very imp, I am making a decision to use xx128 so that collision possibility is ruled out. Any tool with which i can test around 100 billion string generation and check if collision occurs ?

This comment only applies to linked blocks. If you...

2022-05-12T18:59:00.013+02:00

This comment only applies to linked blocks.
If you go for independent blocks, this is much simpler, you only need X KB for the largest block size you wish your implementation to handle.

If you go for linked blocks, this is more complex.
I would recommend you get a firm understanding of streaming of independent blocks first. Then you can add more complexity.

Yann, You have written this comment 6 years ago in...

2022-05-12T18:56:03.471+02:00

Yann, You have written this comment 6 years ago in this blog:
"Your choices will be limited though.
I would recommend a 64 KB block size.
If blocks are dependents, each block can use up to 64KB of previous block to better compress, the decoder will need space for 2 blocks at all times, hence 128 KB."

I am not able to understand why there is need of 128KB buffer(for compressor or decompressor). Since history window size is 64 KB. If current position is half way of current block then only i need to look from half way of previous block to current position for match.
Am I correct?

Sorry if you find these questions silly. And i am really thankful to you for your active response.

Use the LZ4 Frame format. The Block format is mere...

2022-05-12T17:45:46.977+02:00

Use the LZ4 Frame format.
The Block format is merely an internal component of the Frame format. It makes sense to use it in private environment, for which the frame specification is overkill.
But if you need any kind of interoperability, or any advanced mechanism such as streaming, the Frame format is more appropriate.
You can decrease the internal block size of the frame to 64 KB, which should be easier to handle for hardware implementations. Even then, 64 KB is a maximum, so any implementation is free to split input into blocks smaller than that.
Blocks can be independent or linked. Independent blocks are simplest. Linked blocks are more efficient though.

Continue... Since in asic, resources are very less...

2022-05-12T17:09:09.652+02:00

Continue...
Since in asic, resources are very less. Say memory buffer in few MBs( even in KBs).
What will you recommend the best way to implement LZ4.
Considering input file size can range from KBs to GBs. And decoder can be third party system. So requirement is only standard algorithm need to implement.

Yann, Thanks for the quick response. Regarding 2nd...

2022-05-12T16:59:39.126+02:00

Yann, Thanks for the quick response.
Regarding 2nd question, in block format specification nowhere written that add compressed block size at the start of compressed block data.
Since I need to implementing Lz4 in ASIC. And lz4 compressed data will be output from asic io pins. If there will be multiple output compressed blocks ( which will be transferred one after other in series through io bus), what specifications say about how to pass the block size information to other system(decoder).
What wil be the output format.
Will it be same as data blocks of frame format specification.
Hope I am able to explain my doubt.

1) Nope. If I remember correctly, only the compres...

2022-05-11T19:02:07.515+02:00

1) Nope. If I remember correctly, only the compressed block itself is used to calculate the hash.
2) If you have a format using multiple blocks, you need a way to know the size of each block. You need either the exact compressed size, or the exact decompressed size. And depending on which one you know, the decompression function needed is different. The simples solution is to know the compressed size.

Hi Cyan, I have 2 questions. 1. Does blocksize fi...

2022-05-11T17:56:52.174+02:00

Hi Cyan,

I have 2 questions.
1. Does blocksize field included in calculation of block checksum in frame format?

2. If I implement only block format, then whole input file will compressed and represented in one output compressed block or can formatted in multiple output compressed blocks.
If output is multiple compressed block then how to find where next block is starting. Is it required to add block size at start of each block as it is in Frame format .

-Wconversion was intentionally left out, as I do n...

2022-03-08T09:17:51.039+01:00

-Wconversion was intentionally left out, as I do not recommend to add this flag for an existing project which has previously been developed without it. It will generate way too many warnings, inviting some kind of mindless "mass type-casting" which is not good for code clarity nor intention.

The situation is different for a brand new code base starting from scratch. There it's possible, and even recommended, to use -Wconversion, since it will be possible to take care of these warnings one by one, and invite good questions : why do I need this conversion ? Is it really needed ? Am I using the right type at this place ?
These are subtle questions, and they are best answered when focus is possible. i.e. not during a mass-casting exercise.

1. you forgot -Wconversion in Summary. 2. -Wconver...

2022-03-08T09:04:53.384+01:00

1. you forgot -Wconversion in Summary.
2. -Wconversion doesn't work latest GCC version

Yes, inlining + size as a compile-time constant wi...

2022-03-04T02:21:04.835+01:00

Yes, inlining + size as a compile-time constant will always improve speed.
Even for XXH3.
The relative speed boost will be a bit less important for XXH3 (than XXH64 or XXH32), because it's better optimized for small data, but it will still be significant.

Hi Cyan Its always a pleasure to read your blogs a...

2022-03-04T00:01:48.730+01:00

Hi Cyan
Its always a pleasure to read your blogs and follow along. I was doing some experiments with your amazing XXH3 implementation.
I had one small question

In your blog you mention that with inlining and passing the length of the key to hash as a compile time constant, you achieve speeds closer to a dedicated hash function(which is around 3-4x faster) and hence there is no need to maintain a dedicated hash function. But the code of xxhash3 has dedicated hash functions for 1-3 bytes 4-9 bytes and so on.
Does that mean that even if we do not have inlining we can get good performance from xxhash3 for small key sizes ? Or by enabling inling is there supposed to be a 3-4x improvement ?

Hi Cyan Its always a pleasure to read your blogs a...

2022-03-04T00:01:22.888+01:00

zstd compression is possible within Linux kernel, ...

2021-11-30T17:11:30.207+01:00

zstd compression is possible within Linux kernel, which translates into very strict memory control requirements. So yes, it's possible, but it's important to pay attention to memory usage, hence the more memory hungry modes will be out of the picture.

is zstd compression is applicable for embedded sys...

2021-11-30T13:20:28.896+01:00

is zstd compression is applicable for embedded systems?

Yes, that's an interesting design worth trying...

2021-03-01T21:51:44.360+01:00

Yes, that's an interesting design worth trying anyway.

The xxHash repository also contains a large collision test, that would help to ensure the final design is resilient to basic collision expectations.

Varying the seeds across the blocks would be suffi...

2021-03-01T20:37:30.884+01:00

Varying the seeds across the blocks would be sufficient to differentiate them. I was also thinking that the final mixer would be XXH3_128(concat(H1, H1, ...)), which would also make the block hashes position-dependent.

In my use case this isn't for parallelism purposes as much as pipelining, where I want to process a large number of B1, then a large number of B2, and the number of pending results is large enough that the size of the XXH3 streaming state is prohibitive.

If I understand your question correctly, you would...

2021-03-01T19:44:32.879+01:00

If I understand your question correctly, you would divide a content into blocks (B1, B2, B3, etc), calculate a checksum on each block, and then just combine them at this end (as a xor or add operation).

The issue here is that this construction cannot detect changes in block order.
For example, if you have on one side (B1, B2, B3, etc.) and on the other side (B2, B3, B1, etc.), then they both get the same resulting hash. This is considered too weak.

This can be solved, by a stronger mixing stage, which ensures that (B1, B2) doesn't result in the same final hash than (B2, B1).

This is, by the way, how XXH3 works internally. One could already cut input into "blocks" (1 KB by default, can be extended by using custom seeds), compute them in parallel, and then combine them. The combine step though must be done serially, which ensures that each block must be provided in the correct order.

While this design is already possible, in practice, the reference library doesn't employ it, because the speed from a single thread is considered fast enough already (especially with vector instructions). But on weaker computing units, it could be a thing.

How much quality loss would there be from computin...

2021-03-01T15:30:32.934+01:00

How much quality loss would there be from computing a few (N) XXH3 128-bit hash values over portions of the original data (each with a separate seed), and then combining the result with XXH3? It seems like this a similar structure to the 4 internal state accumulators within a single XXH3 hash computation, except with a two-stage mixer at the end (N XXH3_128bits_digest() composed with 1 XXH3_128bits() rather than 1 XXH3_128bits_digest()).

What I mean is that if lets_call_division_by_somet...

2021-02-26T20:19:58.234+01:00

What I mean is that if lets_call_division_by_something() never gets passed a 0, there's still a warning. And if some piece of code calls lets_call_division_by_something() with a compile-time expression that evaluates to 0, you don't know where it is.

What would be much more useful would be to have a way of expressing the constraints so that the compiler would show the line of code where the bad value originates, along with the call sequence to the function setting the constraint. Compilers like clang/llvm and gcc have the information internally to be able to do this kind of thing, but there doesn't seem to be a way to take advantage of it in a normal build. Perhaps it is a task better suited to static analysis tools like scan-build.
https://clang-analyzer.llvm.org/installation

> Do you have any solution to the warning in &q...

2021-02-25T17:39:18.239+01:00

> Do you have any solution to the warning in "lets_call_division_by_something()"?

What do you mean ?
To me, this behaves as expected.
The situation is equivalent to this example : https://godbolt.org/z/4fsq6T ,
in which the variable has an undetermined value,
hence it cannot guarantee the condition,
hence it gets flagged with a warning
and the warning is correctly positioned at the place where the undetermined variable is passed as parameter.

Yeah, you do have to follow the error chain to the...

2021-02-25T15:45:46.169+01:00

Yeah, you do have to follow the error chain to the source line. Do you have any solution to the warning in "lets_call_division_by_something()"? I find most projects have abstractions so often have to go up the call stack a couple levels to find the invalid constant expression. I agree there can be downsides to lto, so the ideal would be the ability to enable specific parts of it like dead code elimination and constant propagation. I plan to do some comparisons between gcc and clang/llvm since I've mainly used gcc.

I remember now : The problem of inline functions i...

2021-02-24T23:26:04.709+01:00

I remember now :
The problem of inline functions is that
when the condition is not respected,
it indeed triggers a warning at compile time,
but it locates the warning _inside the inlined function_,
which unfortunately does not tell much about the caller using incorrect parameter
making debugging more difficult.

This is where bundled macro have an advantage :
they locate the problem at the position of the caller.

Yes, that's a very good point Ralph. TBH, I&#...

2021-02-24T21:02:23.652+01:00

Yes, that's a very good point Ralph.

TBH, I'm not a great fan of LTO for binary generation, for reasons too long to detail in this short answer,
but as an extender of compile-time verifications, it can have a role.

I will have to experiment a bit more about it to get a better sense of its capabilities and limitations.