## Thursday, January 5, 2012

### Binary Arithmetic Coder

There is a way to code an element using less than one bit of space. Although the idea might sound weird, it is in fact a proven method since a few decades now, under the name of Arithmetic Coding.

As a simple explanation, let's say that arithmetic coding is no longer about using the optimal number of bits to code an element, as does Huffman, since this method can't do better than one bit per element. We will now define "ranges", within which each element will take a share (such as 2%, 75%, etc.). This method is extremely precise, since each element will cost exactly its entropy, such as 2.53 bits (precisely, not an approximation to 2 or 3), or 0.17 bits.

For more details on how it works, i invite you to read the Wikipedia entry, which is fairly good.

The method, however, has some drawbacks. To begin with, defining the precise share of each element incurs a header cost. This situation is pretty visible with Range0, which is a block-based range coder. In order to compensate the (relatively) large size of the header, i had to accept block sizes of 128KB, which is much more than the equivalent Huff0 (16KB), resulting in a less precise entropy adaptation.

A way to remove the header cost is to not transmit any header. Then, entropy adaptation is done dynamically, by recalculating shares after each update.

Unfortunately, this method also has its costs. This time, speed is the price to pay. A lot of speed actually. Adjusting a single element requires renormalizing all others, which can be too much of a burden for a 256-element alphabet, such as a Byte.

This is where the Binary Arithmetic Coder can help.
In contrast with previous coders, this one is limited to a 2-elements alphabet, 1 or 0. It's basically a yes/no switch. With this restriction in mind, the Binary Arithmetic Coder comes with a crucial advantage : a trivial update mechanism.

A single value is enough to define the switch. It represents the share of 0, while 1 will take the rest. If 0 is very likely, the share will be high. If we believe that, on next update, the probability of 0 will be even higher, then it's enough to increase the value. Otherwise, decrease the value. That's it.

With such a trivial update mechanism now in place, it's time to concentrate on the update rule. The question asked is "by how much should i increase/decrease the value" ? Simple question, but the answer to it will need quite some developments. And we'll keep that for another blog entry.

1. 1. I think that this common description of arithmetic coding
using nested intervals and decimal fractions etc is very misleading.
Imho enumeration-based explanation is easier to comprehend,
especially taking into account that its possible to precisely
compute the number of possible strings with N=N0+N1+...Nm symbols 0..m.

2. Its possible to implement a good model for freqtable in a blockwise coder.
For example, in psrc I managed to even get some compression improvement vs

3. Actually there's no theoretical difference between binary coding and
alphabet coding. With non-binary alphabet we just do a part of coding
without renormalizations, for speed opt. It may look different, but
in the end we'd still have a linear scan or binary search in the decoder,
so its basically the same as unrolled binary rc.

2. Hi Shelwien

I fully agree with your assessments 1 & 2.
Although i don't fully understand point 3.

"With non-binary alphabet we just do a part of coding
without renormalizations, for speed opt. It may look different, but in the end we'd still have a linear scan or binary search in the decoder,"

3. In a decoder for an adaptive arithmetic code, we can't just find a symbol
from the code (like it can be done with LUTs for a static code).
Instead, we'd usually have a linear scan
(is it symbol 0? no = update the interval 0..r to n0/n*r..r)
which is an equivalent of unary coding, or binary search
(is it symbol 0..m/2? no = 0..r -> n[m/2]/n*r..r)
which is clearly the same as binary coding.
Of course, either can be implemented with a bitwise rc.

So, the main difference between bytewise coding (like in ppmd) and
bitwise coding is the presence of AC interval renormalization.
The algorithm of symbol coding is the same as if we'd do bitwise
arithmetic coding starting with low=0,range=N.
So the bytewise AC can be interpreted as a speed optimization
of bitwise AC, where the processing of bits within a symbol
is performed by a simplified renorm-free second AC, and then
intervals are combined with the main AC.
Of course its mostly useful as a speed optimization in freq-based
models, where we have to divide freq by total anyway.

But still its a valid option even for fully bitwise ACs.
For example, if renorm ensures that range>=2^24 is always true,
it means that with 12-bit (or less) probability precision
we can do the renorm once per 2 bits, which would be noticeable
for coders with unrolled byte coding loop.
Using a 64-bit rc and 7-bit probabilities, we can even completely
eliminate the rc renorm from byte coding.
(Although vecrc provides a way to do that with normal coders).

However, I do think that the idea of processing some bits with
2nd rc instance then merging the intervals is worth remembering.

4. > Instead, we'd usually have a linear scan

Using linear scan, could this concept be applied to nibble (4 bits) instead of byte ?
The advantage i see is that 16 probabilities can fit into a single cache line.

5. Most problems would stay with nibble alphabet too
(precision issues, extra arithmetics etc),
also using it with a tree would mean 2 pointers per symbol.
My own choice currently is a binary tree with bitmask compression.