It's a few monthes since the initial release of xxHash. The algorithm has almost fullfilled its initial objective, which is to provide a Hash function for error detection fast enough to use within LZ4 streaming interface.
Although the "fundamentals" were fine, a few details were still rough on the edges. The most important "missing feature" was the ability to provide input data in several consecutive blocks. When hashing a large file for example, the allocated buffer might not be large enough to store the whole input within a single block.
In order to accomodate this need, a new version of xxHash has been created, which is BSD licensed.
The way it works is by dividing the job into 3 parts :
XXH32_init() creates the context structure in which intermediate results will be stored.
This structure must be passed as an argument of function XXH32_feed(), which is used to provide input in several consecutive blocks. Any number of blocks is possible, there is no limit.
When all data has been provided, it's time to retrieve the result, using XXH32_result(). This function also takes care of de-allocating the context structure.
A "single pass" function is also provided, both for simplicity (a simple function call is enough) and for performance. The latter is important if the amount of data to hash is small (<100 bytes, also called "small keys"). In this case, since there is no intermediate structure to allocate & maintain, the savings are significant.
To simplify usage and interoperability, there is now a single xxHash version, which is "strong" (meaning it successfully pass all tests from SMHasher test suite). This is possible because the new version is also faster (5.4GB/s on my Core 2 Duo, to be compared with 4.2GB for the older one). The speed difference does no longer justify a "fast" version with lessened distribution guarantee.
The framework is also more extensible, meaning that versions for 64-bits, 128-bits and 256-bits can appear in the future. But for the time being, the focus is really on the 32-bits version. It's designed to be very fast on all kind of 32-bits CPU, including embedded ones (such as ARM), with still the objective to become a companion error checker for LZ4 streaming.
I was playing with xxHash this Saturday and found out a way to improve performance on x86 using SSE4 SIMD instructions by upto 15%. SMHasher reports a top speed of almost 6 GB/s. Unfortunately this changes the hash value because it processes 2 accumulators in interleaved strides of 16 bytes which are combined at the end. 2 accumulators are used to leverage Hyperthreading.
ReplyDeleteI am still working on the changes. In particular I want to enable an SSE2 version and also see if uniform hash values can be derived with or without the optimizations. SMHasher tests report the same results for the modified version as the original. A short report on my initial experiment is here:
http://moinakg.wordpress.com/2013/01/19/vectorizing-xxhash-for-fun-and-profit/