We are interested in algorithms for lossless compression of sequences of data. The range variant of asymmetric numeral systems (rANS) is such an algorithm, and, like arithmetic coding (AC), it is close to optimal in terms of compression rate (Duda, 2009). The key difference between ANS and AC is that ANS is last-in-first-out (LIFO), or ‘stack-like’, while AC is first-in-first-out (FIFO), or ‘queue-like’.
ANS comprises two functions, which we denote push and pop, for encoding and decoding, respectively (the names refer to the analogous stack operations). The push function accepts some pre-compressed information $m$ (short for ‘message’) and a symbol $x$ to be compressed, and returns a new compressed message, $m'$. Thus it has the signature
$$\text{push}: (m, x) \mapsto m'. \tag{1}$$
The new compressed message, $m'$, contains precisely the same information as the pair $(m, x)$, and therefore push can be inverted, giving a decoder mapping. The decoder, pop, maps from $m'$ back to $(m, x)$:
$$\text{pop}: m' \mapsto (m, x). \tag{2}$$
The functions push and pop are inverse to one another, so $\text{push}(\text{pop}(m')) = m'$ and $\text{pop}(\text{push}(m, x)) = (m, x)$.
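To make the push/pop interface concrete, here is a toy sketch in Python. It is not rANS: it assumes a hypothetical uniform model over an alphabet of size `K` and stores the message as an unbounded Python integer, in which case push and pop reduce to multiplication and `divmod`. It shows the two inverse identities and the stack-like (LIFO) order in which symbols come back out.

```python
K = 4  # hypothetical alphabet size; symbols 0..K-1 are assumed equiprobable


def push(m: int, x: int) -> int:
    """Encode symbol x on top of message m (toy uniform model, not rANS)."""
    assert 0 <= x < K
    return m * K + x


def pop(m: int) -> tuple[int, int]:
    """Decode the most recently pushed symbol; the exact inverse of push."""
    m_prev, x = divmod(m, K)
    return m_prev, x


# Usage: symbols come back out in reverse order (LIFO).
m = 0
for x in [1, 2, 3]:
    m = push(m, x)
decoded = []
for _ in range(3):
    m, x = pop(m)
    decoded.append(x)
print(decoded)  # [3, 2, 1]
```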
Encoding and decoding both require knowledge of some model over symbols. We use $\mathcal{A}$ to denote the alphabet from which symbols $x$ are drawn, and we denote by $p$ the probability mass function of the model. Later we will need to assume that all probability masses are quantized to some precision $r$, i.e. that there exist integers $\bar{p}(x)$ such that $p(x) = \bar{p}(x)/2^r$ for each $x \in \mathcal{A}$.
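One simple way to obtain the integers $\bar{p}(x)$ is sketched below. The function name `quantize` and the rounding strategy (round each mass, force every symbol to have mass at least 1, and absorb the rounding error into the most probable symbol) are our own illustrative assumptions, not a scheme prescribed by the text; the sketch also assumes every input probability is strictly positive.

```python
def quantize(probs: list[float], r: int) -> list[int]:
    """Hypothetical quantizer: integer masses p_bar with sum 2**r and each >= 1.

    Assumes every probability in `probs` is strictly positive.
    """
    total = 1 << r
    # Round each mass to the grid of multiples of 2**-r, but never down to zero,
    # since a symbol with p_bar(x) = 0 could not be encoded at all.
    p_bar = [max(1, round(p * total)) for p in probs]
    # Absorb the total rounding error into the most probable symbol.
    i_max = max(range(len(p_bar)), key=p_bar.__getitem__)
    p_bar[i_max] += total - sum(p_bar)
    if p_bar[i_max] < 1:
        raise ValueError("precision r is too small for this alphabet")
    return p_bar


p_bar = quantize([0.5, 0.3, 0.2], r=4)
print(p_bar)  # [8, 5, 3], so p(x) is approximated by p_bar(x) / 2**4
```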
2 Building an encoder/decoder pair
The problem which the ANS encoder solves is
Problem 1. Given a sequence of random variables $X_1, \ldots, X_N$ with distributions $p_1, \ldots, p_N$, find an invertible algorithm which will map any sample $x_1, \ldots, x_N$ to a binary message $m$, such that the length of $m$ is close to the information content $h(x_1, \ldots, x_N) = \sum_{n=1}^{N} \log \frac{1}{p_n(x_n)}$ (all logarithms are base 2).
Given that the algorithm is invertible, we can reformulate Problem 1 in terms of the inverse. This leads to a different, but equivalent, problem statement:
Problem 2. Given a sequence of random variables $X_1, \ldots, X_N$, find an invertible algorithm which maps a source of bits to a sequence $x_1, \ldots, x_N$, such that the number of bits observed is close to the information content $h(x_1, \ldots, x_N)$.
Our description of the details of ANS focuses on the decoding algorithm, because this leads to a more straightforward presentation. We describe the decoder and show that it solves Problem 2, then we show how to invert it to form an encoder.
The decoding algorithm we describe will be formed from a series of ANS pop operations; its inverse will be formed from ANS push operations.
2.1 The structure of the message
We use a pair $m = (s, t)$ as the data structure for the message $m$. The element $t$ is a stack of unsigned integers of some fixed precision $r_t$. This stack has its own push and pop operations, which we denote stack_push and stack_pop respectively. The element $s$ is an unsigned integer with precision $r_s$, where $r_s > r_t$. We need $s$ to be large to ensure our decoding is accurate, so we also impose the constraint $s \geq 2^{r_s - r_t}$; more detail on how and why we do this is given below. The precisions $r_s$ and $r_t$ are fixed constants in our implementation. The stack $t$, along with its pop operation, can model the ‘source of bits’ from our problem statement above.
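A minimal Python sketch of the message $m = (s, t)$, assuming (for illustration only) $r_s = 64$ and $r_t = 32$ and representing the stack $t$ as a Python list whose last element is the top. The helper `is_valid_message` is a hypothetical name; it checks the constraint $2^{r_s - r_t} \le s < 2^{r_s}$ together with the word size of the stack entries.

```python
R_S = 64  # assumed precision (bits) of the integer s
R_T = 32  # assumed precision (bits) of each word on the stack t

Stack = list[int]          # top of the stack is the end of the list
Message = tuple[int, Stack]


def stack_push(t: Stack, word: int) -> Stack:
    """Return a new stack with `word` on top."""
    assert 0 <= word < (1 << R_T)
    return t + [word]


def stack_pop(t: Stack) -> tuple[Stack, int]:
    """Return (remaining stack, top word); raises IndexError if t is empty."""
    return t[:-1], t[-1]


def is_valid_message(m: Message) -> bool:
    """Check 2**(R_S - R_T) <= s < 2**R_S and that every stack word fits in R_T bits."""
    s, t = m
    return (1 << (R_S - R_T)) <= s < (1 << R_S) and all(
        0 <= w < (1 << R_T) for w in t
    )


t = stack_push(stack_push([], 7), 9)
t, top = stack_pop(t)
print(top, t)                              # 9 [7]
print(is_valid_message((1 << 40, t)))      # True
print(is_valid_message((1 << 20, t)))      # False: s is below 2**(R_S - R_T)
```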
2.2 Constructing the pop operation
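Our strategy for performing a decode with pop will be first to extract a symbol from $s$. We do this using a bijective function $d: \mathbb{N} \to \mathbb{N} \times \mathcal{A}$, which takes an integer $s$ as input and returns a pair $(s', x)$, where $s'$ is an integer and $x$ is a symbol. Thus pop begins by unpacking $m = (s, t)$ and computing $(s', x) = d(s)$.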
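We design the function $d$ so that if $s \geq 2^{r_s - r_t}$, then
$$\log s' \geq \log s - \log\frac{1}{p(x)} - \epsilon, \tag{3}$$
where $\epsilon \triangleq \log \frac{1}{1 - 2^{-(r_s - r_t - r)}}$, which is finite provided $r < r_s - r_t$. We give details of $d$ and prove eq. 3 below. Note that for small $\delta = 2^{-(r_s - r_t - r)}$ we have $\epsilon \approx \delta / \ln 2$, and thus this term is small.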
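The overhead term $\epsilon$ is easy to evaluate numerically. The sketch below, with the hypothetical helper `epsilon`, compares the exact value with the small-$\delta$ approximation $\delta / \ln 2$ for a few precisions $r$, assuming $r_s = 64$ and $r_t = 32$ purely for illustration. For $r = 16$ both come out at about $2.2 \times 10^{-5}$ bits per symbol.

```python
import math


def epsilon(r_s: int, r_t: int, r: int) -> float:
    """eps = log2(1 / (1 - delta)) with delta = 2**-(r_s - r_t - r); needs r < r_s - r_t."""
    if r >= r_s - r_t:
        raise ValueError("need r < r_s - r_t for eps to be finite")
    delta = 2.0 ** -(r_s - r_t - r)
    return -math.log2(1.0 - delta)


for r in (8, 12, 16):
    delta = 2.0 ** -(64 - 32 - r)
    print(f"r = {r:2d}: eps = {epsilon(64, 32, r):.4e}, delta/ln 2 = {delta / math.log(2):.4e}")
```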
After extracting a symbol using $d$, we check whether $s'$ is below $2^{r_s - r_t}$, and if it is we stack_pop an integer from $t$ and move its contents into the lower-order bits of $s'$, increasing the size of $s'$. We refer to this as ‘renormalization’. Having done this, we return the new message $m$ and the symbol $x$. The full definition of pop is thus
$$\text{pop}((s, t)) = \big(\text{renorm}(s', t),\ x\big), \quad \text{where } (s', x) = d(s).$$
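The definition of pop above can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: $r_s = 64$ and $r_t = 32$ are assumed, the stack is a list with its top at the end, and $d$ is passed in as a function argument since it is only defined in Section 2.4. The usage example plugs in a hypothetical uniform $d$ for two symbols with $r = 1$, for which $d(s) = (\lfloor s/2 \rfloor, s \bmod 2)$.

```python
from typing import Callable

R_S, R_T = 64, 32                 # assumed precisions of s and of stack words
S_MIN = 1 << (R_S - R_T)          # renormalized s must satisfy S_MIN <= s < 2**R_S

Stack = list[int]                 # top of the stack is the end of the list
Message = tuple[int, Stack]


def stack_pop(t: Stack) -> tuple[Stack, int]:
    return t[:-1], t[-1]


def renorm(s: int, t: Stack) -> Message:
    """Move R_T-bit words from the stack into the low bits of s until s >= S_MIN."""
    while s < S_MIN:
        t, t_top = stack_pop(t)
        s = (s << R_T) | t_top     # equals 2**R_T * s + t_top, since t_top < 2**R_T
    return s, t


def pop(m: Message, d: Callable[[int], tuple[int, int]]) -> tuple[Message, int]:
    """Decode one symbol: extract it from s with d, then renormalize."""
    s, t = m
    s_prime, x = d(s)
    return renorm(s_prime, t), x


# Hypothetical d for two equiprobable symbols with r = 1: d(s) = (s // 2, s % 2).
def d_uniform_bit(s: int) -> tuple[int, int]:
    return s >> 1, s & 1


m, x = pop((S_MIN + 1, [5]), d_uniform_bit)
print(x, m == ((1 << 63) + 5, []))   # 1 True
```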
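Renormalization is necessary to ensure that the $s$ returned by pop satisfies $s \geq 2^{r_s - r_t}$ and is therefore large enough that eq. 3 holds at the start of any future pop operation. The renorm function has a while loop, which pushes elements from $t$ into the lower-order bits of $s$ until $s \geq 2^{r_s - r_t}$. To be precise, while $s < 2^{r_s - r_t}$, renorm repeats the two steps
$$(t, t_{\text{top}}) \leftarrow \text{stack\_pop}(t), \qquad s \leftarrow 2^{r_t} s + t_{\text{top}},$$
and then returns $(s, t)$.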
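The condition $s < 2^{r_s - r_t}$ guarantees that $2^{r_t} s + t_{\text{top}} < 2^{r_s}$, and thus there can be no loss of information resulting from overflow. We also have
$$\log(2^{r_t} s + t_{\text{top}}) \geq \log s + r_t. \tag{4}$$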
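Applying this inequality repeatedly, once for each iteration of the while loop in renorm, we have
$$\log s'' \geq \log s' + r_t k, \tag{5}$$
where $(s'', t'') = \text{renorm}(s', t)$, $k$ is the number of iterations of the loop, and $s'$ is as in the definition of pop.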
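The two properties of renorm, no overflow and the bound $\log s'' \geq \log s' + r_t k$, can be checked empirically. The sketch below assumes $r_s = 64$ and $r_t = 32$; `renorm_counting` is a hypothetical variant of renorm that also counts the loop iterations $k$. The bound is checked in the exponentiated form $s'' \geq 2^{r_t k} s'$ so that the comparison uses exact integer arithmetic rather than floating-point logarithms.

```python
import random

R_S, R_T = 64, 32                  # assumed precisions of s and of stack words
S_MIN = 1 << (R_S - R_T)


def renorm_counting(s: int, t: list[int]) -> tuple[int, list[int], int]:
    """renorm as in the text, also returning the number k of loop iterations."""
    k = 0
    while s < S_MIN:
        t, t_top = t[:-1], t[-1]
        new_s = (s << R_T) + t_top
        assert new_s < (1 << R_S)  # no overflow, because s < 2**(R_S - R_T)
        s, k = new_s, k + 1
    return s, t, k


rng = random.Random(0)
for _ in range(10_000):
    # Draw s' below a random power of two so that both small and large values occur.
    s_prime = rng.randrange(1, 1 << rng.randrange(1, R_S + 1))
    t = [rng.getrandbits(R_T) for _ in range(4)]
    s_new, _, k = renorm_counting(s_prime, t)
    assert S_MIN <= s_new < (1 << R_S)
    # log s'' >= log s' + r_t * k, checked exactly as s'' >= 2**(r_t * k) * s'
    assert s_new >= s_prime << (R_T * k)
print("all checks passed")
```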
2.3 Popping in sequence
We now directly tackle the setup described in the second, decoding-oriented problem statement above, performing a sequence of pop operations to decode a sequence of data. We suppose that we are given some initial ‘message’ $m_0 = (s_0, t_0)$, where $2^{r_s - r_t} \leq s_0 < 2^{r_s}$ and $t_0$ is a stack of infinite depth, modelling the ‘source of bits’ from the problem statement.
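For $n = 1, \ldots, N$, we let $(m_n, x_n) = \text{pop}(m_{n-1})$, where $m_n = (s_n, t_n)$ and each pop operation uses the corresponding distribution $p_n$. Combining eq. 3 with the renormalization bound above, the $n$th pop satisfies
$$\log s_n \geq \log s_{n-1} - \log\frac{1}{p_n(x_n)} - \epsilon + b_n, \tag{6}$$
where $b_n$ is the number of bits popped/observed from the stack in the $n$th pop step. Applying eq. 6 recursively, for $n = 1, \ldots, N$, yields
$$\log s_N \geq \log s_0 - \sum_{n=1}^{N} \log\frac{1}{p_n(x_n)} - N\epsilon + \sum_{n=1}^{N} b_n,$$
which can be rearranged to give
$$\sum_{n=1}^{N} b_n \leq h(x_1, \ldots, x_N) + N\epsilon + \log s_N - \log s_0 \leq h(x_1, \ldots, x_N) + N\epsilon + r_s,$$
where $h(x_1, \ldots, x_N) = \sum_{n=1}^{N} \log\frac{1}{p_n(x_n)}$, and where the second inequality uses $\log s_N < r_s$ and $\log s_0 \geq 0$. Thus rANS solves the decoding problem stated above: the number of bits observed from $t_0$ is ‘close’ to $h(x_1, \ldots, x_N)$ in the sense that the difference is no more than an additive constant, $r_s$, plus a term which grows linearly with $N$, but with a very small coefficient $\epsilon$. In our implementation $r$ is small compared with $r_s - r_t$, so $\epsilon$ is negligible; for example, $r_s - r_t - r = 16$ gives $\epsilon \approx 2.2 \times 10^{-5}$.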
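The bound on the total number of bits observed can also be checked by simulation. The following self-contained sketch is our own illustration, not the reference implementation: it assumes $r_s = 64$, $r_t = 32$, $r = 16$ and a hypothetical three-symbol quantized distribution, uses the construction of $d$ from Section 2.4, models the infinite stack $t_0$ as a generator of random 32-bit words, decodes $N$ symbols with repeated pops, and asserts that the bits observed never exceed $h(x_1, \ldots, x_N) + N\epsilon + r_s$.

```python
import bisect
import math
import random

R_S, R_T, R = 64, 32, 16           # assumed precisions: s, stack words, model
S_MIN = 1 << (R_S - R_T)
P_BAR = [40000, 20000, 5536]       # hypothetical quantized masses, sum = 2**R
CUM = [sum(P_BAR[:i]) for i in range(len(P_BAR))]   # c(x)
assert sum(P_BAR) == 1 << R


def d(s: int) -> tuple[int, int]:
    """rANS d(s) = (p_bar(x) * floor(s / 2**R) + s_bar - c(x), x)."""
    s_bar = s & ((1 << R) - 1)                 # s mod 2**R
    x = bisect.bisect_right(CUM, s_bar) - 1    # c(x) <= s_bar < c(x) + p_bar(x)
    return P_BAR[x] * (s >> R) + s_bar - CUM[x], x


class RandomSource:
    """Models the infinite stack t_0: every stack_pop returns fresh random bits."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.words_popped = 0

    def stack_pop(self) -> int:
        self.words_popped += 1
        return self.rng.getrandbits(R_T)


def pop(s: int, source: RandomSource) -> tuple[int, int]:
    s, x = d(s)
    while s < S_MIN:                           # renorm
        s = (s << R_T) | source.stack_pop()
    return s, x


N = 10_000
source = RandomSource(seed=0)
s = S_MIN                                      # s_0 with 2**(R_S - R_T) <= s_0 < 2**R_S
h = 0.0                                        # information content, in bits
for _ in range(N):
    s, x = pop(s, source)
    h += R - math.log2(P_BAR[x])               # log2(1 / p(x)), p(x) = P_BAR[x] / 2**R

bits = R_T * source.words_popped
eps = -math.log2(1.0 - 2.0 ** -(R_S - R_T - R))
bound = h + N * eps + R_S
print(f"bits observed = {bits}, h = {h:.1f}, bound = {bound:.1f}")
assert bits <= bound
```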
It now remains for us to describe the function $d$ and show that it satisfies eq. 3, as well as to show how to invert pop to form an encoder.
2.4 The function $d$
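The function $d$ must be a bijection, and we aim for $d$ to satisfy eq. 3, and thus $s' \approx p(x)\, s$. Achieving this is actually fairly straightforward. One way to define a bijection $d: \mathbb{N} \to \mathbb{N} \times \mathcal{A}$ is to start with a mapping $g: \mathbb{N} \to \mathcal{A}$, with the property that none of the preimages $g^{-1}(x) \triangleq \{n \in \mathbb{N} : g(n) = x\}$ are finite for $x \in \mathcal{A}$. Then let $d(s) \triangleq (s', x)$, where $x = g(s)$ and $s'$ is the index of $s$ within the (ordered) set $g^{-1}(x)$, with indices starting at $0$. Equivalently, $s'$ is the number of integers $n$ with $n < s$ and $g(n) = x$.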
With this setup, the ratio
$$\frac{s'}{s} = \frac{|\{n \in \mathbb{N} : n < s,\ g(n) = x\}|}{s}$$
is the density of numbers which decode to $x$ within all the natural numbers less than $s$. For large $s$ we can ensure that this ratio is close to $p(x)$ by choosing $g$ such that numbers which decode to a symbol $x$ are distributed within the natural numbers with density close to $p(x)$.
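The counting definition of $d$ can be implemented directly, if inefficiently, for any $g$. In the sketch below, `make_d_from_g` and the periodic example `g` (which sends three out of every four integers to `"A"` and one to `"B"`) are hypothetical; the brute-force count costs $O(s)$ per call and serves only to show that $d$ is a bijection onto pairs $(s', x)$ and that $s'/s$ matches the density of $g^{-1}(x)$.

```python
from collections.abc import Callable, Hashable


def make_d_from_g(g: Callable[[int], Hashable]) -> Callable[[int], tuple[int, Hashable]]:
    """Hypothetical brute-force d: s -> (s', x) with x = g(s), s' = #{n < s : g(n) = x}."""
    def d(s: int) -> tuple[int, Hashable]:
        x = g(s)
        s_prime = sum(1 for n in range(s) if g(n) == x)
        return s_prime, x
    return d


def g(n: int) -> str:
    """Hypothetical periodic g: three in every four integers map to "A", one to "B"."""
    return "B" if n % 4 == 3 else "A"


d = make_d_from_g(g)
s_prime, x = d(400)
print(x, s_prime / 400)   # A 0.75, the density of g^{-1}("A")
```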
To do this, we partition $\mathbb{N}$ into finite ranges of equal length, and treat each range as a model for the interval $[0, 1]$, with sub-intervals within $[0, 1]$ corresponding to each symbol, and the width of each sub-interval being equal to the corresponding symbol’s probability (see the figure below). To be precise, the mapping $g$ can then be expressed as a composition $g = g_2 \circ g_1$, where $g_1$ does the partitioning described above, and $g_2$ assigns numbers within each partition to symbols (sub-intervals). So
$$g_1(s) \triangleq s \bmod 2^r.$$
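Using the shorthand $\bar{s} \triangleq s \bmod 2^r$, and defining
$$c(x) \triangleq \sum_{y < x} \bar{p}(y)$$
as the (quantized) cumulative probability of symbol $x$ (for some fixed ordering of $\mathcal{A}$), we set
$$g_2(\bar{s}) \triangleq \text{the unique } x \in \mathcal{A} \text{ such that } c(x) \leq \bar{s} < c(x) + \bar{p}(x).$$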
The accompanying figure illustrates this mapping, with a particular probability distribution, for a range of values of $s$.
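Putting $g_1$ and $g_2$ together, the count $s'$ has a closed form: the $\lfloor s/2^r \rfloor$ complete ranges below $s$ each contribute $\bar{p}(x)$ numbers that decode to $x$, and the current range contributes $\bar{s} - c(x)$, so $s' = \bar{p}(x)\lfloor s/2^r \rfloor + \bar{s} - c(x)$. The sketch below checks this closed form against the brute-force count for a hypothetical distribution with $r = 3$, and also includes `d_inverse`, the inverse map that an encoder would need; the names and the distribution are illustrative assumptions.

```python
import bisect

R = 3                                   # hypothetical precision: ranges of length 2**R = 8
P_BAR = [4, 3, 1]                       # hypothetical quantized masses, sum = 2**R
CUM = [sum(P_BAR[:i]) for i in range(len(P_BAR))]   # c(x) = [0, 4, 7]


def g2(s_bar: int) -> int:
    """The symbol x with c(x) <= s_bar < c(x) + p_bar(x)."""
    return bisect.bisect_right(CUM, s_bar) - 1


def d(s: int) -> tuple[int, int]:
    """Closed-form d: s' = p_bar(x) * floor(s / 2**R) + s_bar - c(x)."""
    s_bar = s % (1 << R)                # g_1(s)
    x = g2(s_bar)
    return P_BAR[x] * (s >> R) + s_bar - CUM[x], x


def d_brute(s: int) -> tuple[int, int]:
    """Counting definition: s' = #{n < s : g(n) = x} with g = g2 o g1."""
    x = g2(s % (1 << R))
    return sum(1 for n in range(s) if g2(n % (1 << R)) == x), x


def d_inverse(s_prime: int, x: int) -> int:
    """Recover s from (s', x); an encoder's push would be built on this."""
    q, rem = divmod(s_prime, P_BAR[x])  # q = floor(s / 2**R), rem = s_bar - c(x)
    return (q << R) + CUM[x] + rem


print(d(13), d_inverse(*d(13)))         # (4, 1) 13
```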