In previous article, we learned how to determine the number of bits to write, and which bits to write, when encoding a symbol using FSE compression. We are therefore left with the conversion exercise, from newState to oldState (remember that we encode in reverse direction). This is the most important step of the compression process.
Remember that we said we could simplify the determination of nbBits by a simple threshold comparison, simply by looking at sub-ranges map, such as this one :
Well, in fact, thanks to this map, we know quite a bit more : we know in which exact sub-range fit the current newState.
Remember that we have numbered these sub-ranges, starting with the larger ones :
What we need to know, is the ordered position of symbols into the state table. Since we have 5 sub-ranges, it simply means we also have 5 symbols, directly associated.
And that's it, we know the sub-range, we know the destination state of the encoding process, oldState.
Encoding is therefore just a repeat of this process :
- get Symbol to encode
- look at current state value
- determine nbBits, flush them
- determine sub-Range Id
- look for Symbol position of same Id : you get your next state
If you look into FSE source code, you'll find that the following lines of code perform this job :
If you feel that the correlation between the text and the code is not obvious, it's because it's not.
To reach this compact expression, a few non-trivial tricks have been used, reducing the amount of computation, and quite importantly, reducing the amount of memory required to describe the sub-ranges map.
These will be explained in a later post.