Remember that we said we could simplify the determination of nbBits by a simple threshold comparison, simply by looking at sub-ranges map, such as this one :
Well, in fact, thanks to this map, we know quite a bit more : we know in which exact sub-range fit the current newState.
Remember that we have numbered these sub-ranges, starting with the larger ones :
What we need to know, is the ordered position of symbols into the state table. Since we have 5 sub-ranges, it simply means we also have 5 symbols, directly associated.
And that's it, we know the sub-range, we know the destination state of the encoding process, oldState.
Encoding is therefore just a repeat of this process :
- get Symbol to encode
- look at current state value
- determine nbBits, flush them
- determine sub-Range Id
- look for Symbol position of same Id : you get your next state
If you look into FSE source code, you'll find that the following lines of code perform this job :
If you feel that the correlation between the text and the code is not obvious, it's because it's not (especially the last line).
To reach this compact expression, a few non-trivial tricks have been used, reducing the amount of computation, and more importantly, reducing the amount of memory required to describe the sub-ranges map.
These will be explained into a later post.