In this series explaining the behavior of Finite State Entropy, I find it interesting to make a quick comparison with Huffman coding. The equivalence is a direct result of FSE's decoding principles, but I believe it's useful to establish it clearly, since it will be illustrative for future articles.
To better grasp the similarities, let's work through a simple example. We'll use the following probabilities, which fit a Huffman tree exactly:
A : 50% ; B : 25% ; C : 12.5% ; D : 12.5%
This tree needs only 8 values to be fully represented, and gives the following linear representation:
This representation uses the "high bit first" convention, which means we start our walk down the tree from the highest bit.
If it's 0, we have an A. We stop there, shift the register left by one bit, and load a new bit at the bottom.
If the highest bit is 1, we know we are not done, and need to read the second bit. If this second bit is 0, we have a B. If not, we continue reading. And so on.
The decision tree looks as follows:
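To make the walk concrete, here is a minimal sketch in C of the high-bit-first decoding loop just described. The `BitStream` helper and its layout are my own illustration, not actual FSE code:

```c
#include <stdint.h>

/* Illustrative bit reader: the stream is left-aligned in a 32-bit word,
   and `pos` counts how many bits have been consumed from the top. */
typedef struct { uint32_t bits; int pos; } BitStream;

static int read_top_bit(BitStream *s) {
    int bit = (int)((s->bits >> (31 - s->pos)) & 1);
    s->pos++;
    return bit;
}

/* Walk the decision tree from the highest bit:
   0 -> A ; 10 -> B ; 110 -> C ; 111 -> D */
static char decode_high_first(BitStream *s) {
    if (read_top_bit(s) == 0) return 'A';
    if (read_top_bit(s) == 0) return 'B';
    if (read_top_bit(s) == 0) return 'C';
    return 'D';
}
```

Feeding it the bit sequence 0, 10, 110, 111 (left-aligned: 0x5B800000) yields A, B, C, D.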
But we could have done it differently: we could have used "low bit first". In this case, the linear representation of the tree looks like this:
This seems a lot messier, but it isn't: we have simply reversed the bit order of the decision tree:
The two decision trees are therefore completely equivalent.
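The equivalence is easy to check in code: the same walk, with bits taken from the bottom of the register instead of the top. As before, this is a sketch with illustrative helper names of my own:

```c
#include <stdint.h>

/* Illustrative low-bit-first reader: the next bit to read is always the
   lowest bit of the register; consuming it shifts the register right. */
static int read_low_bit(uint32_t *bits) {
    int bit = (int)(*bits & 1);
    *bits >>= 1;
    return bit;
}

/* Same decision tree, bit order reversed (bits read from the bottom):
   0 -> A ; 1,0 -> B ; 1,1,0 -> C ; 1,1,1 -> D */
static char decode_low_first(uint32_t *bits) {
    if (read_low_bit(bits) == 0) return 'A';
    if (read_low_bit(bits) == 0) return 'B';
    if (read_low_bit(bits) == 0) return 'C';
    return 'D';
}
```

With the same symbol sequence packed low-bit-first (0b111011010, i.e. 0x1DA), the decoder again produces A, B, C, D.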
Now, look again at the "low bit first" linear representation. It's important to note that this is exactly what FSE would produce, and it embeds a few additional properties:
- The "newStateBaseline" does not need to be calculated and stored in the decoding table: it can be determined automatically from the state itself: newStateBaseline = currentState & ~mask(nbBits);
- nbBits is stable for a given character: for A it's always 1, for B always 2, and so on. This simplifies the decoding table, which does not need to store this information.
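Both properties can be checked mechanically. Below is a small sketch (the table layout and names are my own, chosen to mirror the description) of the 8-entry low-bit-first table, with newStateBaseline derived on the fly:

```c
/* The 8 states of the low-bit-first table: the symbol and its bit count
   are fully determined by the low bits of the state itself. */
static const char symbol[8] = {'A','B','A','C','A','B','A','D'};
static const int  nbBits[8] = { 1,  2,  1,  3,  1,  2,  1,  3 };

static int mask(int bits) { return (1 << bits) - 1; }

/* newStateBaseline is not stored: it is recomputed from the state. */
static int baseline(int state) {
    return state & ~mask(nbBits[state]);
}
```

For instance, state 5 decodes a B (2 bits) with baseline 4, and state 6 decodes an A (1 bit) with baseline 6; every state carrying the same symbol uses the same nbBits.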
So basically, a Huffman low-bit-first table looks the same as an FSE table: it guarantees a stable number of bits per character, and newStateBaseline is automatically deduced from the remaining bits.
There is, however, a small but important difference.
A Huffman low-bit-first decoder would shift the current state right, and load the next bit(s) at the top. In contrast, the FSE decoder keeps the remaining high bits in place, and only reloads the low bits. This creates a dependency on future state: once you've reached state 2 with FSE, for example, you can already guarantee that you won't see a D before you see a C. Which raises the question: how can the encoder select the right state, since it depends on future characters yet to be decoded?
The answer: the encoder works in the backward direction, encoding characters from last to first, solving dependencies along the way.
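Here is a hedged sketch of that backward pass for our 4-symbol example. All names are illustrative, and for simplicity the per-step bit fields are kept in an array rather than packed into a real bitstream:

```c
static const char symTable[8]    = {'A','B','A','C','A','B','A','D'};
static const int  nbBitsTable[8] = { 1,  2,  1,  3,  1,  2,  1,  3 };

static int mask(int bits) { return (1 << bits) - 1; }

/* Low-bit pattern each symbol must occupy in the state, and its length. */
static int code_of(char c) { return c=='A' ? 0 : c=='B' ? 1 : c=='C' ? 3 : 7; }
static int nb_of(char c)   { return c=='A' ? 1 : c=='B' ? 2 : 3; }

/* Encode from last character to first: at each step, the bits the decoder
   will read are the low bits of the *next* state, and the previous state
   is forced to carry the symbol's pattern in its low bits.
   Returns the initial decoder state. */
static int encode_backward(const char *msg, int n, int *stepBits) {
    int state = 0;                              /* arbitrary final state */
    for (int i = n - 1; i >= 0; i--) {
        int nb = nb_of(msg[i]);
        stepBits[i] = state & mask(nb);         /* bits read at step i */
        state = (state & ~mask(nb)) | code_of(msg[i]);
    }
    return state;
}

/* Decode forward: keep the high bits, reload the low bits. */
static void decode_forward(int state, const int *stepBits, int n, char *out) {
    for (int i = 0; i < n; i++) {
        out[i] = symTable[state];
        int nb = nbBitsTable[state];
        state = (state & ~mask(nb)) | stepBits[i];
    }
    out[n] = '\0';
}
```

Round-tripping a message such as "ABCD" through these two functions returns the original characters, with the encoder having resolved each state constraint before the decoder ever sees it.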
To be continued...