    A tutorial on the range variant of asymmetric numeral systems

This paper is intended to be an accessible introduction to the range variant of Asymmetric Numeral Systems (rANS). This version of ANS can be used as a drop-in replacement for traditional arithmetic coding (AC). Implementing rANS is more straightforward than implementing AC, and this paper includes pseudo-code which could be converted without too much effort into a working implementation. An example implementation, based on this tutorial, is available at https://raw.githubusercontent.com/j-towns/ans-notes/master/rans.py. After reading (and understanding) this tutorial, the reader should understand how rANS works, be able to implement it, and be able to prove that it attains a near-optimal compression rate.

Authors

01/05/2022


1 Introduction

We are interested in algorithms for lossless compression of sequences of data. The range variant of asymmetric numeral systems (ANS) is such an algorithm, and, like arithmetic coding (AC), it is close to optimal in terms of compression rate (Duda, 2009). The key difference between ANS and AC is that ANS is last-in-first-out (LIFO), or ‘stack-like’, while AC is first-in-first-out (FIFO), or ‘queue-like’.

ANS comprises two functions, which we denote push and pop, for encoding and decoding, respectively (the names refer to the analogous stack operations). The push function accepts some pre-compressed information $m$ (short for ‘message’), and a symbol $x$ to be compressed, and returns a piece of compressed information, $m'$. Thus it has the signature

 $$\texttt{push} : (m, x) \mapsto m'. \tag{1}$$

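The new compressed state, $m'$, contains precisely the same information as the pair $(m, x)$, and therefore push can be inverted, giving a decoder mapping. The decoder, pop, maps from $m'$ back to $(m, x)$:

 $$\texttt{pop} : m' \mapsto (m, x). \tag{2}$$

The functions push and pop are inverse to one another, so $\texttt{push}(\texttt{pop}(m)) = m$ and $\texttt{pop}(\texttt{push}(m, x)) = (m, x)$.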

Encoding and decoding both require knowledge of some model over symbols. We use $\mathcal{A} = \{a_1, \dots, a_I\}$ to denote the alphabet from which symbols are drawn. We denote the probability mass function of the model $P$. Later we will need to assume that all probability masses are quantized to some precision $r_p$, i.e. that there exist integers $p_1, \dots, p_I$ such that $P(a_i) = p_i / 2^{r_p}$ for each $i = 1, \dots, I$.
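To make the quantization assumption concrete, the following is a minimal sketch of one way to round a probability mass function to integers that sum to $2^{r_p}$. The helper name `quantize` and its rounding strategy (floor each mass, forbid zeros, then absorb the remainder into the largest entry) are assumptions made for illustration; they are not taken from the reference implementation.

```python
def quantize(probs, r_p):
    """Round probabilities to integers p_i >= 1 with sum(p_i) == 2**r_p.

    Hypothetical helper: assumes sum(probs) == 1 and that the alphabet is
    much smaller than 2**r_p, so the final adjustment keeps every p_i >= 1.
    """
    total = 1 << r_p
    # Floor each scaled mass, but never allow zero: a symbol with zero
    # quantized mass could not be encoded at all.
    ps = [max(1, int(p * total)) for p in probs]
    # Absorb the rounding error into the largest entry.
    ps[ps.index(max(ps))] += total - sum(ps)
    return ps

ps = quantize([0.5, 0.3, 0.2], r_p=4)
print(ps, sum(ps))
```

With these integers, the model probability of the $i$th symbol is exactly `ps[i] / 2**r_p`.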

2 Building an encoder/decoder pair

The problem which the ANS encoder solves is

Problem 1

Given a sequence of random variables $X_1, \dots, X_N$, find an invertible algorithm which will map any sample $x_1, \dots, x_N$ to a binary message $m$, such that the length of $m$ is close to the information content $h(x_1, \dots, x_N)$.

Given that the algorithm is invertible, we can reformulate Problem 1 in terms of the inverse. This leads to a different, but equivalent, problem statement:

Problem 2

Given a sequence of random variables $X_1, \dots, X_N$, find an invertible algorithm which maps a source of bits to a sequence $x_1, \dots, x_N$, such that the number of bits observed is close to the information content $h(x_1, \dots, x_N)$.

Our description of the details of ANS focuses on the decoding algorithm, because this leads to a more straightforward presentation. We describe the decoder and show that it solves Problem 2, then we show how to invert it to form an encoder.

The decoding algorithm we describe will be formed from a series of ANS pop operations; its inverse will be formed from ANS push operations.

2.1 The structure of the message

We use a pair $m = (s, t)$ as the data structure for $m$. The element $t$ is a stack of unsigned integers of some fixed precision $r_t$. This stack has its own push and pop operations, which we denote stack_push and stack_pop respectively. The element $s$ is an unsigned integer with precision $r_s$, where $r_s > r_t$. We need $s$ to be large to ensure our decoding is accurate, and so we also impose the constraint $s \ge 2^{r_s - r_t}$; more detail on how and why we do this is given below. In our implementation we use $r_s = 64$ and $r_t = 32$. The stack $t$, along with its pop operation, can model the ‘source of bits’ from our problem statement above.
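The message structure described above can be sketched in a few lines of Python. The representation of the stack as nested tuples, the constant names `R_S`, `R_T` and `S_MIN`, and the example precisions 64 and 32 are assumptions chosen for illustration; any stack with push and pop operations would serve equally well.

```python
R_S, R_T = 64, 32            # example precisions of s and of each stack element
S_MIN = 1 << (R_S - R_T)     # lower bound imposed on s

def stack_push(t, t_top):
    # A stack is either () (empty) or a pair (top element, rest of stack).
    return (t_top, t)

def stack_pop(t):
    # Returns the remaining stack first, then the element that was on top.
    t_top, t = t
    return t, t_top

# A message is a pair (s, t). This one holds the smallest allowed s
# together with an empty stack.
m_init = (S_MIN, ())

t = stack_push(stack_push((), 7), 9)
t, top = stack_pop(t)
print(top, t)
```

The last element pushed is the first one popped, which is the stack-like (LIFO) behaviour that the whole coder inherits.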

2.2 Constructing the pop operation

Our strategy for performing a decode with pop will be first to extract a symbol from $s$. We do this using a bijective function $d : \mathbb{N} \to \mathbb{N} \times \mathcal{A}$, which takes an integer $s$ as input and returns a pair $(s', x)$, where $s'$ is an integer and $x$ is a symbol. Thus pop begins

def pop(s, t):
    s_prime, x = d(s)  # s_prime is s' in the text

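We design the function $d$ so that if $s \ge 2^{r_s - r_t}$, then

 $$\log s' \ge \log s - h(x) + \log(1 - \epsilon) \tag{3}$$

where $h(x) := \log 1/P(x)$ and $\epsilon := 2^{-(r_s - r_t - r_p)}$. We give details of $d$ and prove eq. 3 below. Note that for small $\epsilon$ we have $\log(1 - \epsilon) \approx -\epsilon$, and thus this term is small.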

After extracting a symbol using $d$, we check whether $s'$ is below $2^{r_s - r_t}$, and if it is we stack_pop an integer from $t$ and move its contents into the lower order bits of $s'$, increasing the size of $s'$. We refer to this as ‘renormalization’. Having done this, we return the new message and the symbol $x$. The full definition of pop is thus

def pop(s, t):
    s_prime, x = d(s)  # s_prime is s' in the text
    s, t = renorm(s_prime, t)
    return (s, t), x

Renormalization is necessary to ensure that the $s$ returned by pop satisfies $s \ge 2^{r_s - r_t}$ and is therefore large enough that eq. 3 holds at the start of any future pop operation. The renorm function has a while loop, which pushes elements from $t$ into the lower order bits of $s$ until $s$ is full to capacity. To be precise:

def renorm(s, t):
    # While s has space for another element from t
    while s < 2 ** (r_s - r_t):
        # Pop an element t_top from t
        t, t_top = stack_pop(t)
        # and push t_top into the lower bits of s
        s = 2 ** r_t * s + t_top
    return s, t

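The condition $s < 2^{r_s - r_t}$ guarantees that $2^{r_t} \cdot s + t_{\mathrm{top}} < 2^{r_s}$, and thus there can be no loss of information resulting from overflow. We also have

 $$\log(2^{r_t} \cdot s + t_{\mathrm{top}}) \ge r_t + \log s. \tag{4}$$

Applying this inequality repeatedly, once for each iteration of the while loop in renorm, we have

 $$\log s \ge \log s' + r_t \cdot [\#\text{ elements popped from } t] \tag{5}$$

where $s, t = \texttt{renorm}(s', t)$ as in the definition of pop.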
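The renormalization loop and the inequality in eq. 5 can be checked numerically with a small runnable sketch. The example precisions (64 and 32), the nested-tuple stack, and the extra `popped` counter returned by this version of `renorm` are assumptions added for illustration; the counter is not part of the function as defined in the text.

```python
import math

R_S, R_T = 64, 32            # example precisions, assumed for this sketch
S_MIN = 1 << (R_S - R_T)

def stack_pop(t):
    # Stack represented as nested tuples: (top element, rest of stack).
    t_top, t = t
    return t, t_top

def renorm(s, t):
    popped = 0               # illustration only: counts loop iterations
    # While s has space for another element from t
    while s < S_MIN:
        t, t_top = stack_pop(t)
        # Shift s up by R_T bits and place t_top in the vacated low bits.
        s = (s << R_T) + t_top
        popped += 1
    return s, t, popped

s_prime = 5
s, t, popped = renorm(s_prime, (3, (4, ())))
# Eq. 5: log s >= log s' + r_t * (number of elements popped).
print(math.log2(s), math.log2(s_prime) + R_T * popped)
```

Here a single element is popped, after which $s$ lies in the interval $[2^{r_s - r_t}, 2^{r_s})$ and the loop terminates.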

2.3 Popping in sequence

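We now directly tackle the setup described in Problem 2, performing a sequence of pop operations to decode a sequence of data. We suppose that we are given some initial ‘message’ $m_0 = (s_0, t_0)$, where $s_0 \ge 2^{r_s - r_t}$ and $t_0$ is a stack of infinite depth, modelling the ‘source of bits’ from the problem statement.

For $n = 1, \dots, N$, we let $m_n, x_n = \texttt{pop}(m_{n-1})$, where each pop operation uses the corresponding distribution $P(x_n \mid x_1, \dots, x_{n-1})$.

Now, applying eq. 3 and eq. 5 to the $n$th pop gives

 $$\log s_n \ge \log s_{n-1} - h(x_n \mid x_1, \dots, x_{n-1}) + b_n + \log(1 - \epsilon) \tag{6}$$

where $b_n$ is the number of bits popped/observed from the stack $t$ in the $n$th pop step. Applying eq. 6 recursively, for $n = 1, \dots, N$, yields

 $$\log s_N \ge \log s_0 - h(x_1, \dots, x_N) + \sum_{n=1}^{N} b_n + N \log(1 - \epsilon) \tag{7}$$

which can be rearranged to give

 $$\sum_{n=1}^{N} b_n \le h(x_1, \dots, x_N) - N \log(1 - \epsilon) + r_t \approx h(x_1, \dots, x_N) + N\epsilon + r_t \tag{8}$$

since $\log(1 - \epsilon) \approx -\epsilon$ for small $\epsilon$ and $\log s_N - \log s_0 \le r_t$. The latter holds because $s_N < 2^{r_s}$ and $s_0 \ge 2^{r_s - r_t}$. (With base-2 logarithms the exact first-order term is $N\epsilon / \ln 2 \approx 1.44\,N\epsilon$, which is of the same order.) Thus rANS solves Problem 2: the number of bits observed from $t$ is ‘close’ to $h(x_1, \dots, x_N)$ in the sense that the difference is no more than an additive constant, $r_t$, plus a term which grows linearly with $N$, but with a very small coefficient $\epsilon$. In our implementation the precision $r_p$ is kept well below $r_s - r_t$, and thus $\epsilon$ is negligible.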
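To get a feel for the size of the overhead in eq. 8, the sketch below evaluates the bound $-N \log_2(1 - \epsilon) + r_t$ directly. The function name `overhead_bits` and the example values ($\epsilon = 2^{-16}$, $r_t = 32$, one million symbols) are assumptions picked for illustration.

```python
import math

def overhead_bits(n_symbols, eps, r_t):
    """Upper bound from eq. 8 on bits read beyond the information content."""
    return -n_symbols * math.log2(1.0 - eps) + r_t

eps = 2.0 ** -16             # example value, an assumption for illustration
bound = overhead_bits(n_symbols=10**6, eps=eps, r_t=32)
print(bound)                 # roughly 54 bits in total for a million symbols
```

Under these assumed values the overhead is around 54 bits in total, which is negligible next to the information content of a million symbols.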

It now remains for us to describe the function $d$ and show that it satisfies eq. 3, as well as showing how to invert pop to form an encoder.

2.4 The function d

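The function $d$ must be a bijection, and we aim for $d$ to satisfy eq. 3, and thus $s'/s \approx P(x)$. Achieving this is actually fairly straightforward. One way to define a bijection $d : s \mapsto (s', x)$ is to start with a mapping $\tilde{d} : s \mapsto x$, with the property that none of the preimages $\tilde{d}^{-1}(x) := \{n \in \mathbb{N} : \tilde{d}(n) = x\}$ are finite for $x \in \mathcal{A}$. Then let $s'$ be the index of $s$ within the (ordered) set $\tilde{d}^{-1}(x)$, with indices starting at $0$. Equivalently, $s'$ is the number of integers $n$ with $0 \le n < s$ and $\tilde{d}(n) = x$.

With this setup, the ratio

 $$\frac{s'}{s} = \frac{|\{n \in \mathbb{N} : n < s,\ \tilde{d}(n) = x\}|}{s} \tag{9}$$

is the density of numbers which decode to $x$, within all the natural numbers less than $s$. For large $s$ we can ensure that this ratio is close to $P(x)$ by setting $\tilde{d}$ such that numbers which decode to a symbol $x$ are distributed within the natural numbers with density close to $P(x)$.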
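The counting construction above can be written out literally, which makes it easy to check both that the resulting map is injective and that the ratio in eq. 9 approaches the intended density. This is a brute-force sketch for intuition only: the names `make_d` and `toy_d_tilde`, and the toy mapping itself (symbol 'a' three times out of four), are assumptions, and a practical coder would use a closed-form expression rather than a linear scan.

```python
def make_d(d_tilde):
    """Build d: s -> (s', x) from a symbol mapping d_tilde by counting."""
    def d(s):
        x = d_tilde(s)
        # s' is the number of integers n < s that also decode to x.
        s_prime = sum(1 for n in range(s) if d_tilde(n) == x)
        return s_prime, x
    return d

# Toy mapping (an assumption): 'a' with density 3/4, 'b' with density 1/4.
def toy_d_tilde(n):
    return 'a' if n % 4 < 3 else 'b'

d = make_d(toy_d_tilde)
pairs = [d(s) for s in range(8)]
print(pairs)

s_prime, x = d(4000)
print(x, s_prime / 4000)     # the ratio s'/s for a larger s
```

Every integer maps to a distinct pair, and for $s = 4000$ the ratio $s'/s$ equals the density $3/4$ of the decoded symbol.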

To do this, we partition $\mathbb{N}$ into finite ranges of equal length, and treat each range as a model for the interval $[0, 1]$, with sub-intervals within $[0, 1]$ corresponding to each symbol, and the width of each sub-interval being equal to the corresponding symbol’s probability (see the interval figure). To be precise, the mapping $\tilde{d}$ can then be expressed as a composition $\tilde{d} = \tilde{d}_2 \circ \tilde{d}_1$, where $\tilde{d}_1$ does the partitioning described above, and $\tilde{d}_2$ assigns numbers within each partition to symbols (sub-intervals). So

 $$\tilde{d}_1(s) := s \bmod 2^{r_p}. \tag{10}$$

Using the shorthand $\bar{s} := \tilde{d}_1(s)$, and defining

 $$c_j := \begin{cases} 0 & \text{if } j = 1 \\ \sum_{k=1}^{j-1} p_k & \text{if } j = 2, \dots, I \end{cases} \tag{11}$$

as the (quantized) cumulative probability of symbol $a_{j-1}$,

 $$\tilde{d}_2(\bar{s}) := a_i \text{ where } i := \max\{j : c_j \le \bar{s}\}. \tag{12}$$
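Equations 10 to 12 translate fairly directly into code. The sketch below is one possible rendering, assuming the quantized masses are supplied as a list of integers summing to $2^{r_p}$; the function name `make_d_tilde`, the use of binary search for the maximum in eq. 12, and the example distribution are illustrative assumptions. Note that Python lists are 0-indexed, so `cs[0]` plays the role of $c_1$.

```python
from bisect import bisect_right
from itertools import accumulate

def make_d_tilde(symbols, ps, r_p):
    """symbols = [a_1, ..., a_I], ps = [p_1, ..., p_I], sum(ps) == 2**r_p."""
    assert sum(ps) == 1 << r_p
    # c_1 = 0 and c_j = p_1 + ... + p_{j-1}, as in eq. 11.
    cs = [0] + list(accumulate(ps))[:-1]
    def d_tilde(s):
        s_bar = s % (1 << r_p)               # eq. 10
        # Eq. 12: the largest index whose cumulative value is <= s_bar.
        i = bisect_right(cs, s_bar) - 1
        return symbols[i]
    return d_tilde

# Example distribution (an assumption): masses 9/16, 4/16 and 3/16.
d_tilde = make_d_tilde(['a', 'b', 'c'], [9, 4, 3], r_p=4)
decoded = [d_tilde(s) for s in range(16)]
print(''.join(decoded))
```

Within each range of length $2^{r_p}$, the number of integers decoding to a symbol equals its quantized mass, so the density of each symbol matches its model probability.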

The interval figure illustrates this mapping, with a particular probability distribution, over a finite range of $s$.