Towards Better Compressed Representations

11/07/2019
by Michał Gańczorz, et al.

We introduce the problem of computing a parsing in which each phrase has length at most m and which minimizes the zeroth-order entropy of the parsing. Based on recent theoretical results, we devise a heuristic for this problem. The solution has a straightforward application in succinct text representations and gives practical improvements. Moreover, the proposed heuristic yields a structure whose size can be bounded both by |S|H_{m-1}(S) and by (|S|/m)(H_0(S) + ... + H_{m-1}(S)), where H_k(S) is the k-th order empirical entropy of S. We also consider a similar problem in which the first-order entropy is minimized.
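
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the heuristic proposed in the paper. It computes the empirical entropy H_k(S) using the standard definition and evaluates the cost |P|H_0(P) of a naive parsing P that cuts S into blocks of length m. The function names (zeroth_order_entropy, kth_order_entropy, fixed_parse, parsing_cost) and the offset parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def zeroth_order_entropy(symbols):
    """Empirical H_0, in bits per symbol, of a sequence of hashable symbols."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return sum(c / n * log2(n / c) for c in counts.values())


def kth_order_entropy(s, k):
    """Empirical H_k(S): for each length-k context w, take the symbols that
    follow w in S, weight their H_0 by how many there are, and divide by |S|."""
    if k == 0:
        return zeroth_order_entropy(s)
    if not s:
        return 0.0
    followers = {}
    for i in range(k, len(s)):
        followers.setdefault(s[i - k:i], []).append(s[i])
    total = sum(len(f) * zeroth_order_entropy(f) for f in followers.values())
    return total / len(s)


def fixed_parse(s, m, offset=0):
    """Naive parsing (an assumption for illustration, not the paper's method):
    an optional first phrase of length `offset`, then blocks of length m."""
    phrases = [s[:offset]] if offset else []
    phrases += [s[i:i + m] for i in range(offset, len(s), m)]
    return phrases


def parsing_cost(phrases):
    """Zeroth-order entropy cost |P| * H_0(P) of a parsing, in bits,
    treating each distinct phrase as one symbol."""
    return len(phrases) * zeroth_order_entropy(phrases)
```

Trying each offset in range(m) with fixed_parse and keeping the cheapest parsing_cost gives a simple baseline against which a smarter parsing could be compared. This sketch counts only the entropy of the phrase sequence and ignores the cost of storing the phrase dictionary.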

research
11/03/2021

HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances

We propose a new representation of the offsets of the Lempel-Ziv (LZ) fa...
research
03/11/2020

Entropy of tropical holonomic sequences

We introduce tropical holonomic sequences of a given order and calculate...
research
03/05/2019

Lempel-Ziv-like Parsing in Small Space

Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widel...
research
02/06/2023

Optimal LZ-End Parsing is Hard

LZ-End is a variant of the well-known Lempel-Ziv parsing family such tha...
research
11/08/2022

Strictly Breadth-First AMR Parsing

AMR parsing is the task that maps a sentence to an AMR semantic graph au...
research
08/03/2022

Court Judgement Labeling Using Topic Modeling and Syntactic Parsing

In regions that practice common law, relevant historical cases are essen...
research
09/27/2020

Entropy versus influence for complex functions of modulus one

We present an example of a function f from {-1,1}^n to the unit sphere i...