The Efficiency of the ANS Entropy Encoding

01/07/2022
by Dmitry Kosolobov, et al.

Asymmetric Numeral Systems (ANS) is a class of entropy encoders introduced by Duda that has had an immense impact on data compression, largely supplanting arithmetic and Huffman coding. The optimality of ANS was studied by Duda et al., but the precise asymptotic behaviour of its redundancy (in comparison to the entropy) was not completely understood. In this paper we establish an optimal bound on the redundancy for the tabled ANS (tANS), the most popular ANS variant. Given a sequence a_1,…,a_n of letters from an alphabet {0,…,σ-1} such that each letter a occurs in it f_a times and n = 2^r, the tANS encoder using Duda's "precise initialization" to fill the tANS tables transforms this sequence into a bit string of length (the frequencies are not included in the encoding size)

∑_{a∈[0..σ)} f_a · log(n/f_a) + O(σ + r),

where the O(σ + r) term can be bounded by σ·log e + r. The r-bit term is an encoder artifact indispensable to ANS; the rest incurs a redundancy of O(σ/n) bits per letter.

We complement this bound by a series of examples showing that an Ω(σ + r) redundancy is necessary when σ > n/3; more precisely, the redundancy in these examples is at least (σ-1)/4 + r - 2. We argue that similar examples exist for any method that distributes letters in the tANS tables using only the knowledge of the frequencies. Thus, we refute Duda's conjecture that the redundancy is O(σ/n^2) bits per letter.

We also propose a new variant of range ANS (rANS), called rANS with fixed accuracy, parameterized by an integer k ≥ 1. In this variant the integer division, which is unavoidable in rANS, is performed only when its result belongs to [2^k..2^{k+1}); hence, the division can be computed by faster methods provided k is small. We bound the redundancy of rANS with fixed accuracy k by (n/2^{k-1})·log e + r.
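To make the stated bound concrete, the following minimal Python sketch (not taken from the paper; the function name and the toy frequency vector are illustrative) evaluates the encoding-length bound ∑_a f_a·log(n/f_a) + σ·log e + r from the abstract for a given vector of letter frequencies, assuming n is a power of two:

import math

def tans_length_bound(freqs):
    """Upper bound from the abstract on the tANS encoding length, in bits.

    freqs[a] is the number of occurrences f_a of letter a; the total
    n = sum(freqs) is assumed to be a power of two, n = 2^r.
    """
    n = sum(freqs)
    r = n.bit_length() - 1                        # n = 2^r
    assert n == 1 << r, "the bound is stated for n = 2^r"
    sigma = len(freqs)
    entropy_bits = sum(f * math.log2(n / f) for f in freqs if f > 0)
    redundancy = sigma * math.log2(math.e) + r    # the O(sigma + r) term
    return entropy_bits + redundancy

# Example: n = 16 letters over a 3-letter alphabet.
print(tans_length_bound([8, 4, 4]))  # empirical entropy 24 bits + redundancy ~8.33 bits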
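For context on the integer division mentioned above, here is a textbook single-step rANS state update (standard rANS without renormalization, not the paper's fixed-accuracy variant); it is shown only to highlight the division x // f_s whose quotient the fixed-accuracy variant restricts to the range [2^k..2^{k+1}). The function and parameter names are illustrative:

def rans_encode_step(x, f_s, c_s, n):
    """Push symbol s onto the rANS state x.

    f_s : frequency of s, c_s : cumulative frequency of the symbols before s,
    n   : total of all frequencies (a power of two in practice).
    """
    q = x // f_s                 # the integer division discussed in the abstract
    return q * n + (x % f_s) + c_s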


