There is a way to code an element using less than one bit of space. Although the idea might sound weird, it is in fact a proven method since a few decades now, under the name of Arithmetic Coding.

As a simple explanation, let's say that arithmetic coding is no longer about using the optimal number of bits to code an element, as does Huffman, since this method can't do better than one bit per element. We will now define "ranges", within which each element will take a share (such as 2%, 75%, etc.). This method is extremely precise, since each element will cost exactly its entropy, such as 2.53 bits (precisely, not an approximation to 2 or 3), or 0.17 bits.

For more details on how it works, i invite you to read the Wikipedia entry, which is fairly good.

The method, however, has some drawbacks. To begin with, defining the precise share of each element incurs a header cost. This situation is pretty visible with Range0, which is a block-based range coder. In order to compensate the (relatively) large size of the header, i had to accept block sizes of 128KB, which is much more than the equivalent Huff0 (16KB), resulting in a less precise entropy adaptation.

A way to remove the header cost is to not transmit any header. Then, entropy adaptation is done dynamically, by recalculating shares after each update.

Unfortunately, this method also has its costs. This time, speed is the price to pay. A lot of speed actually. Adjusting a single element requires renormalizing all others, which can be too much of a burden for a 256-element alphabet, such as a Byte.

This is where the Binary Arithmetic Coder can help.

In contrast with previous coders, this one is limited to a 2-elements alphabet, 1 or 0. It's basically a yes/no switch. With this restriction in mind, the Binary Arithmetic Coder comes with a crucial advantage : a trivial update mechanism.

A single value is enough to define the switch. It represents the share of 0, while 1 will take the rest. If 0 is very likely, the share will be high. If we believe that, on next update, the probability of 0 will be even higher, then it's enough to increase the value. Otherwise, decrease the value. That's it.

With such a trivial update mechanism now in place, it's time to concentrate on the update rule. The question asked is "by how much should i increase/decrease the value" ? Simple question, but the answer to it will need quite some developments. And we'll keep that for another blog entry.

1. I think that this common description of arithmetic coding

ReplyDeleteusing nested intervals and decimal fractions etc is very misleading.

Imho enumeration-based explanation is easier to comprehend,

especially taking into account that its possible to precisely

compute the number of possible strings with N=N0+N1+...Nm symbols 0..m.

2. Its possible to implement a good model for freqtable in a blockwise coder.

For example, in psrc I managed to even get some compression improvement vs

solid adaptive code by adding freqtables.

3. Actually there's no theoretical difference between binary coding and

alphabet coding. With non-binary alphabet we just do a part of coding

without renormalizations, for speed opt. It may look different, but

in the end we'd still have a linear scan or binary search in the decoder,

so its basically the same as unrolled binary rc.

Hi Shelwien

ReplyDeleteI fully agree with your assessments 1 & 2.

Although i don't fully understand point 3.

"With non-binary alphabet we just do a part of coding

without renormalizations, for speed opt. It may look different, but in the end we'd still have a linear scan or binary search in the decoder,"

Could you please develop ?

In a decoder for an adaptive arithmetic code, we can't just find a symbol

ReplyDeletefrom the code (like it can be done with LUTs for a static code).

Instead, we'd usually have a linear scan

(is it symbol 0? no = update the interval 0..r to n0/n*r..r)

which is an equivalent of unary coding, or binary search

(is it symbol 0..m/2? no = 0..r -> n[m/2]/n*r..r)

which is clearly the same as binary coding.

Of course, either can be implemented with a bitwise rc.

So, the main difference between bytewise coding (like in ppmd) and

bitwise coding is the presence of AC interval renormalization.

The algorithm of symbol coding is the same as if we'd do bitwise

arithmetic coding starting with low=0,range=N.

So the bytewise AC can be interpreted as a speed optimization

of bitwise AC, where the processing of bits within a symbol

is performed by a simplified renorm-free second AC, and then

intervals are combined with the main AC.

Of course its mostly useful as a speed optimization in freq-based

models, where we have to divide freq by total anyway.

But still its a valid option even for fully bitwise ACs.

For example, if renorm ensures that range>=2^24 is always true,

it means that with 12-bit (or less) probability precision

we can do the renorm once per 2 bits, which would be noticeable

for coders with unrolled byte coding loop.

Using a 64-bit rc and 7-bit probabilities, we can even completely

eliminate the rc renorm from byte coding.

(Although vecrc provides a way to do that with normal coders).

However, I do think that the idea of processing some bits with

2nd rc instance then merging the intervals is worth remembering.

> Instead, we'd usually have a linear scan

ReplyDeleteUsing linear scan, could this concept be applied to nibble (4 bits) instead of byte ?

The advantage i see is that 16 probabilities can fit into a single cache line.

Most problems would stay with nibble alphabet too

ReplyDelete(precision issues, extra arithmetics etc),

also using it with a tree would mean 2 pointers per symbol.

My own choice currently is a binary tree with bitmask compression.