A Characterization of Guesswork on Swiftly Tilting Curves

01/27/2018
by Ahmad Beirami, et al.

Given a collection of strings, each with an associated probability of occurrence, the guesswork of a string is its position in a list ordered from most likely to least likely, with ties broken arbitrarily. Guesswork is central to several applications in information theory: average guesswork provides a lower bound on the expected computational cost for a sequential decoder to successfully decode the intended message; the complementary cumulative distribution function of guesswork gives the error probability in list decoding; the logarithm of guesswork is the number of bits needed in optimal lossless one-to-one source coding; and guesswork is the number of trials an adversary requires to breach a password-protected system in a brute-force attack. In this paper, we consider memoryless string-sources, which generate strings of i.i.d. characters drawn from a finite alphabet, and characterize their guesswork. Our main tool is the tilt operation. We show that the tilt operation on a memoryless string-source parametrizes an exponential family of memoryless string-sources, which we refer to as the tilted family. We give the tilted families an operational meaning by proving that two memoryless string-sources result in the same guesswork on all strings of all lengths if and only if their respective categorical distributions belong to the same tilted family. After establishing some general properties of the tilt operation, we generalize the notions of weakly typical set and asymptotic equipartition property to tilted weakly typical sets of different orders. We use this new definition to characterize the large deviations for all atypical strings and the volume of weakly typical sets of different orders. We then build on this characterization to prove large deviation bounds on guesswork and to provide an accurate approximation of its probability mass function (PMF).
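As a rough illustration of the definition of guesswork and of the tilted-family claim, the following Python sketch ranks all strings of a fixed length under a small memoryless source and compares the ranking with that of a tilted version of the same source. It rests on several assumptions not stated in the abstract: the tilt of order alpha is taken to map a categorical distribution p to the distribution proportional to p_i^alpha (a form consistent with an exponential family), only a positive integer alpha is tried, ties are broken lexicographically as one arbitrary choice, and the names `tilt` and `guesswork` are hypothetical. Exact rational arithmetic is used so that strings with equal probability tie exactly.

```python
from fractions import Fraction
from itertools import product


def tilt(p, alpha):
    """Assumed form of the tilt: renormalize p_i ** alpha (alpha a positive integer here)."""
    weights = [q ** alpha for q in p]
    total = sum(weights)
    return [w / total for w in weights]


def guesswork(p, n):
    """Map every length-n string (a tuple of symbol indices) to its guesswork.

    Strings are ordered from most likely to least likely; ties are broken
    lexicographically, which is one arbitrary but deterministic choice.
    """
    def prob(s):
        # Memoryless source: the string probability is the product of symbol probabilities.
        result = Fraction(1)
        for c in s:
            result *= p[c]
        return result

    strings = list(product(range(len(p)), repeat=n))
    ranked = sorted(strings, key=lambda s: (-prob(s), s))
    return {s: rank for rank, s in enumerate(ranked, start=1)}


# Illustrative ternary source (values chosen for this example only).
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = tilt(p, 2)

g_p = guesswork(p, 3)
g_q = guesswork(q, 3)

# Under the assumed tilt with alpha > 0, both sources rank every string identically.
print(g_p == g_q)
```

This checks only one direction of the stated equivalence, for one alphabet, one string length, and one positive order; it is a sanity check of the idea, not a reproduction of the paper's proof.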


