# The Price of Uncertain Priors in Source Coding

We consider the problem of one-way communication when the recipient does not know exactly the distribution that the messages are drawn from, but has a "prior" distribution that is known to be close to the source distribution, a problem first considered by Juba et al. We consider the question of how much longer the messages need to be in order to cope with the uncertainty about the receiver's prior and the source distribution, respectively, as compared to the standard source coding problem. We consider two variants of this uncertain priors problem: the original setting of Juba et al., in which the receiver is required to recover the message correctly with probability 1, and a setting introduced by Haramaty and Sudan, in which the receiver is permitted to fail with some probability ϵ. In both settings, we obtain lower bounds that are tight up to logarithmically smaller terms. In the latter setting, we furthermore present a variant of the coding scheme of Juba et al. with an overhead of log α + log 1/ϵ + 1 bits, thus also establishing the nearly tight upper bound.


## 1 Introduction

In a seminal work, Shannon [1] considered the problem of how to encode a message so that it can be transmitted and decoded reliably across a channel that introduces errors. Shannon’s contribution in that work was two-fold: first, he identified how large any encoding of messages would need to be in the absence of noise – the “source coding” problem – and then identified the optimal length for encodings that can be decoded in spite of errors introduced by the channel. The difference between these two lengths – the number of extra, redundant bits from the standpoint of source coding – may be viewed as the “price” of noise-resilience.

Such work in information theory has all but settled the basic quantitative questions of noise-tolerant transmission in telecommunications. In natural communication, however, errors frequently arise not due to any kind of interference, but instead due to a lack of shared context. In the interest of understanding why natural language is structured so that such errors can occur and how they might be addressed, Juba et al. [2] introduced a model of communication with uncertain priors. This is a variant of the source coding problem in which the sender and receiver do not agree on the source distribution that the messages are drawn from. Thus, errors arise because the sender and receiver do not agree on which messages should be considered more likely, and should therefore receive shorter codewords so as to minimize the expected encoding length. We note that this problem also has applications to adaptive data compression, in which parties use their empirical observations of message frequencies to encode messages. This would be useful since the distribution over messages generally changes over time for a variety of reasons; this clearly occurs in natural language content, for example, as new words are introduced and old ones fall out of use. Since different parties on a network will in general observe different empirical distributions of messages, they must tolerate some (limited) inconsistency about the relative frequency of the different messages.

In the model of Juba et al., "uncertainty" about the priors is captured by the following kind of distance between the priors used by the sender and receiver. Namely, if the sender has a source distribution P and the receiver expects a distribution Q, then we say that P and Q are α-close when (1/α)·Q(m) ≤ P(m) ≤ α·Q(m) for every message m. Juba et al. then presented a scheme (building on the coding technique of Braverman and Rao [3]) in which every source P can be encoded by H(P) + 2 log α + O(1) bits so that every decoder using an α-close prior will recover the message correctly. Thus, the Juba et al. scheme uses roughly 2 log α bits beyond the H(P) bits achieved by standard solutions to the basic source coding problem.

Juba et al. had noted that log α bits of redundancy are necessary for such "uncertain priors" coding (at least for a prefix-coding variant), and asked whether the redundancy could be reduced to this lower bound. In this work, we address this question by showing that it cannot. Indeed, in the original error-free coding setting, we show that the redundancy must be at least 2 log α up to terms of size O(log log α), and hence the "price of uncertainty" is, up to lower-order terms, an additional 2 log α bits. We also consider the variant of the problem introduced by Haramaty and Sudan [4] in which the decoding is allowed to fail with some positive probability ϵ. We also nearly identify the price of uncertainty in this setting: we note that the scheme of Juba et al./Braverman-Rao can be modified to give an uncertain priors coding of length H(P) + log α + log 1/ϵ + 1, and show a nearly matching lower bound of log α + log 1/ϵ on the redundancy, again up to lower-order terms, provided ϵ is not too small. The price of uncertainty when error is allowed is thus essentially reduced to log α + log 1/ϵ bits.

We obtain our results by reducing a one-way communication complexity problem to uncertain priors coding: Consider the problem where Alice receives as input a message from a domain of size roughly k², Bob receives as input a set of size roughly k containing Alice's message, and Alice's task is to send the message to Bob. For our reduction, we note that there is a low-entropy distribution P with most of its mass on Alice's input and an α-close family of distributions Q corresponding to the sets that essentially capture this problem. Thus, a lower bound for the communication complexity problem translates directly to the desired lower bound on the redundancy, since the entropy of P is negligible compared to the overhead. Our lower bounds for the two variants, error-free and positive error, of the uncertain priors coding problem are then obtained from lower bounds for this same problem in the analogous variant of the one-way communication complexity model. Specifically, we obtain a lower bound of roughly 2 log k in the error-free model and a lower bound of roughly log k + log 1/ϵ in the model with error, yielding our main results.

### 1.1 Aside: why not the relative entropy/KL-divergence?

A common misconception upon first learning about the model of Juba et al. [2] is that (a) the problem had already been solved and (b) the correct overhead is given by the relative entropy (or "KL-divergence"), D(P‖Q). Of course, our lower bounds imply that this is incorrect, but it is useful to understand the reason. Indeed, our problem is somewhat unusual in that the relative entropy is essentially the correct answer to a few similar problems, including (i) the problem where the sender does not know the source distribution P, only the approximation Q, and (ii) the problem where the communication is two-way (and the receiver can tell the sender when to stop)—this variant essentially follows from the work of Braverman and Rao [3]. The difference is that in our setting, unlike (i), the sender does not know Q and, unlike (ii), has no way to learn anything about it, apart from the fact that it is α-close to P. The sender's message must simultaneously address decoding by all possible α-close priors using only this knowledge, hence the connection to a worst-case communication complexity set-up.
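To make the aside concrete, the following sketch (with made-up toy distributions) computes the relative entropy of an α-close pair. Note that α-closeness forces D(P‖Q) ≤ log α, so the relative entropy can sit well below the redundancy that our lower bounds show is unavoidable.

```python
import math

def kl_divergence(P, Q):
    """Relative entropy D(P || Q), in bits."""
    return sum(p * math.log2(p / Q[m]) for m, p in P.items() if p > 0)

# Toy alpha-close pair: every ratio P(m)/Q(m) lies in [1/2, 2], so alpha = 2.
P = {"a": 0.6, "b": 0.3, "c": 0.1}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}

# alpha-closeness caps the relative entropy: D(P||Q) <= log2(alpha) = 1 bit,
# even though uncertain-priors coding must pay a larger overhead in alpha.
```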

We also note that the problem we consider is not addressed by the universal compression schemes of Lempel and Ziv [5]. Lempel and Ziv’s compression schemes provide asymptotically good compression of many messages from, e.g., a Markovian source. By contrast, the problem we consider here concerns the compression of a single message. The question of compressing multiple messages is still interesting, and we will return to it in Section 4.

## 2 The Model and Prior State of the Art

We now recall the model we consider in more detail and review the existing work on this model. Our work concerns the uncertain priors coding problem, originally introduced by Juba et al. [2]. In the following, we will let M denote a set of messages and Δ(M) denote the set of probability distributions on M.

###### Definition 1 (α-close [2])

We will say that a pair of distributions P, Q ∈ Δ(M) are α-close for α ≥ 1 if for every m ∈ M,

 (1/α)·Q(m) ≤ P(m) ≤ α·Q(m).
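In code, Definition 1 is a pointwise check on probability ratios. A minimal sketch (the distributions and the choice α = 2 below are made up for illustration):

```python
def alpha_close(P, Q, alpha):
    """Definition 1: P and Q are alpha-close if
    (1/alpha) * Q(m) <= P(m) <= alpha * Q(m) for every message m."""
    messages = set(P) | set(Q)
    return all(Q.get(m, 0.0) / alpha <= P.get(m, 0.0) <= alpha * Q.get(m, 0.0)
               for m in messages)

P = {"a": 0.6, "b": 0.3, "c": 0.1}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}
# The ratios P(m)/Q(m) are 1.5, 0.75, and 0.5, so P and Q are 2-close
# but not 1.4-close.
```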

In uncertain priors coding, one party ("Alice") wishes to send a message m drawn from a source distribution P using a one-way (noiseless) binary channel to another party ("Bob") who does not know P exactly. Bob does know a distribution Q, however, that is guaranteed to be α-close to P, where Alice in particular knows α. We assume that Alice and Bob share access to an infinitely long common random string R. Our objective is to design an encoding scheme that, regardless of the pair of "prior" distributions P and Q, enables Bob to successfully decode the message m using as short a transmission as possible:

###### Definition 2 (Error-free uncertain priors coding [2])

An error-free uncertain priors coding scheme is given by a pair of functions E and D such that for every m ∈ M, α ≥ 1, random string R, P ∈ Δ(M), and α-close Q ∈ Δ(M), if e = E(m, α, R, P), then D(e, α, R, Q) = m. When R is chosen uniformly at random and m is chosen from P, we will refer to E_{m,R}[|E(m, α, R, P)|] as the encoding length of the scheme for P and α.

The key quantity we focus on in this work is a measure of the overhead introduced by uncertain priors coding, as compared to standard source coding where P = Q.

###### Definition 3 (Redundancy)

For P ∈ Δ(M), let ℓ(P) denote the optimal one-to-one encoding length for P, i.e., ℓ(P) = min_C E_m[|C(m)|], where the minimum is over one-to-one maps C from M to the binary strings, the expectation is over m drawn from P, and |C(m)| denotes the length of the encoding C(m). The redundancy of an uncertain priors coding scheme (E, D) for a given α is given by the maximum over P ∈ Δ(M) of the encoding length of (E, D) for P and α minus the optimal one-to-one encoding length for P. That is, the redundancy is

 max_{P ∈ Δ(M)} { E_{m,R}[|E(m, α, R, P)|] − ℓ(P) }

where m is drawn from P and R is chosen uniformly at random in the expectation.

As an uncertain priors coding scheme gives a means to perform standard source coding (again, by taking Q = P, even when α = 1), it follows that the redundancy is always nonnegative.

Huffman codes [6], for example, give a means to encode a message in this setting using on average no more than one more bit than the entropy of P, H(P). It is well known that this coding length is essentially optimal if we require E to be a prefix code (i.e., self-delimiting). But, it is possible to achieve a slightly better coding length when the end of the message is marked for us. Elias [7] observed that the entropy is an upper bound for the cost of this problem as well (which he credits to Wyner [8]), whereas Alon and Orlitsky [9] showed that the entropy of P exceeds the optimal one-to-one encoding length ℓ(P) by at most log(H(P) + 1) + log e. Furthermore, they observed that this bound is approached by a geometric distribution as the parameter approaches zero (and it is thus essentially tight).
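As a concrete illustration of the gap between the entropy and the one-to-one encoding length, the sketch below assigns the i-th most probable message the i-th shortest nonempty binary string, of length ⌊log₂(i+1)⌋ (one standard convention for one-to-one codes; the truncated geometric distribution is made up for illustration):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def one_to_one_length(probs):
    """Expected length of a one-to-one (not necessarily prefix-free) code:
    the i-th most probable message gets the i-th shortest nonempty binary
    string, of length floor(log2(i + 1))."""
    ranked = sorted(probs, reverse=True)
    return sum(p * math.floor(math.log2(i + 1))
               for i, p in enumerate(ranked, start=1))

# A (truncated) geometric distribution: the one-to-one code beats the entropy.
geom = [0.5 ** i for i in range(1, 31)]
```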

Juba et al. [2], using a coding technique introduced by Braverman and Rao [3], exhibited an error-free uncertain priors coding scheme that achieved a coding length of the entropy plus a function of α alone:

###### Theorem 4 (Juba et al. [2])

There is an error-free uncertain priors coding scheme that achieves encoding length at most H(P) + 2 log α + O(1).

Roughly, the scheme achieving this bound proceeds as follows. The shared random string is interpreted as specifying, for each possible message m, an infinite sequence R_m of independent random bits. Each prefix of the string for m is taken as a possible encoding of m: that is, to encode a message m, Alice computes some sufficiently large index ℓ, and transmits the first ℓ bits of the common random string associated with m to Bob. Specifically, if Bob's prior is known to be α-close, Alice chooses ℓ so that no other message that shares an ℓ-bit encoding with m has probability P(m)/α² or greater under P. It then follows immediately that, since Bob's prior Q is α-close to P, m is the unique message of maximum likelihood under Q with the given ℓ-bit encoding. So, it suffices for Bob to simply output this maximum-likelihood message.
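This scheme can be prototyped directly. In the toy sketch below, a seeded PRNG stands in for the shared random strings R_m, and we use the rival threshold P(m)/α² (our reading of the scheme); the distributions and α = 2 are made up for illustration.

```python
import random

def shared_bits(m, length, seed=0):
    """Toy stand-in for the shared random string R_m: a PRNG seeded by m."""
    rng = random.Random(f"{seed}:{m}")
    return tuple(rng.randrange(2) for _ in range(length))

def encode(m, alpha, P, seed=0):
    """Send the shortest prefix of R_m not shared by any 'rival' m', i.e.,
    any other message with P(m') >= P(m) / alpha**2."""
    rivals = [x for x in P if x != m and P[x] >= P[m] / alpha ** 2]
    ell = 1
    while any(shared_bits(x, ell, seed) == shared_bits(m, ell, seed)
              for x in rivals):
        ell += 1
    return shared_bits(m, ell, seed)

def decode(code, alpha, Q, seed=0):
    """Output the maximum-likelihood message under Q among those whose
    shared string agrees with the received prefix."""
    consistent = [m for m in Q if shared_bits(m, len(code), seed) == code]
    return max(consistent, key=lambda m: Q[m])

P = {"a": 0.7, "b": 0.2, "c": 0.1}   # sender's prior
Q = {"a": 0.5, "b": 0.3, "c": 0.2}   # receiver's 2-close prior
```

Every message is recovered exactly under any 2-close Q here: any non-rival message that happens to agree with the transmitted prefix has Q-probability strictly below Q(m), so the maximum-likelihood rule cannot be fooled.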

Juba et al. also noted that a lower bound on the redundancy of log α is easy to obtain (in a prefix-coding model). This follows essentially by taking a distribution over possible source distributions P that are all α-close to a common distribution Q, and noting that the resulting overall distribution over messages has entropy roughly log α greater than that of the individual sources. Our first main theorem, in Section 3.1, will improve this lower bound to 2 log α − O(log log α), so that it nearly matches the upper bound given by Theorem 4 (up to lower-order terms).

We also consider a variant of the original uncertain priors coding problem, introduced by Haramaty and Sudan [4], in which we allow an error in communication with some bounded but positive probability ϵ:

###### Definition 5 (Positive-error uncertain priors coding)

For any ϵ > 0, an ϵ-error uncertain priors coding scheme is given by a pair of functions E and D such that for every α ≥ 1, P ∈ Δ(M), and α-close Q ∈ Δ(M), when m is chosen according to P and R is chosen uniformly at random,

 Pr_{m,R}[D(E(m, α, R, P), α, R, Q) = m] ≥ 1 − ϵ.

Again, we will refer to E_{m,R}[|E(m, α, R, P)|] as the encoding length of the scheme for P and α.

The redundancy for positive-error uncertain priors coding is then defined in exactly the same way as for error-free coding.

We briefly note that the definition of Haramaty and Sudan [4] differs in two basic ways. First, their definition does not include a common random string since they were primarily interested in deterministic coding schemes. Second, they required that the decoder output a special symbol when it makes an error. Our one positive result (Theorem 7) can be easily modified to satisfy this condition, but we prove our lower bound, Theorem 8, for the slightly more lenient model stated here.

We also briefly note that Canonne et al. [10] have considered another variant of the basic model in which Alice and Bob do not share the common random string perfectly, only correlated random strings. In both this imperfect randomness model and the deterministic model of Haramaty and Sudan, the known encoding schemes feature substantially greater redundancy than O(log α)—the redundancy for these schemes is linear in the entropy and, in the case of the deterministic schemes of Haramaty and Sudan, furthermore depends on the size of the message set M. It is an interesting open question for future work whether or not lower bounds of this form can be proved for these other settings.

## 3 The Price of Uncertainty

We now establish lower bounds on the redundancy for uncertain priors coding schemes, in both the error-free and positive error variants. We will see that both of these lower bounds are tight up to lower-order terms. Hence, at least to the first order, we identify the “price” incurred by uncertainty about a recipient’s prior distribution, beyond what is inherently necessary for successful communication in the absence of such uncertainty.

In both cases, our lower bounds are proved by exploiting the worst-case nature of the guarantee over priors P and Q to embed a (worst-case) one-way communication complexity problem into the uncertain priors coding problem. We then essentially analyze both error-free and positive-error variants of the communication complexity problem to obtain our lower bounds for the respective uncertain priors coding problems.

### 3.1 Lower bound for error-free communication

We first consider the original error-free variant of uncertain priors coding, considered by Juba et al. [2]. Earlier, in Theorem 4, we recalled that they gave an error-free uncertain priors coding scheme with coding length H(P) + 2 log α + O(1). We now show that this is nearly tight, up to lower-order terms:

###### Theorem 6

For every error-free compression scheme for uncertain priors and every sufficiently large α, there exists a pair of α-close priors for which the scheme suffers redundancy at least 2 log α − O(log log α).

Proof:   Given α, we will first define a family of priors for the sender and receiver that will contain a pair of priors with high redundancy for every scheme. Let k be the largest integer such that k log k ≤ α (so that, in particular, log k = log α − O(log log α)) and consider a message set M of size k². The family of priors is parameterized by a distinguished message m* ∈ M and a set S ⊆ M of k messages that includes m*.

For a given m* and S, Alice's prior P gives message m* probability at least 1 − 1/log k and gives each of the other messages probability 1/(k² log k). Bob's prior Q, on the other hand, gives each message in S probability 1/(k log k), and gives the other messages uniform probability (1 − 1/log k)/(k² − k).

###### Lemma 1

For sufficiently large k (and α ≥ k log k), every pair P, Q in the family is α-close.

Proof of Lemma:   We chose k so that α ≥ k log k. Since the messages in S have probability 1/(k log k) under Q, m* (which has probability between 1 − 1/log k and 1 under P) is α-close, and we note that the rest of the messages in S have probability 1/(k² log k) under P, so the rest of these messages m in S satisfy (1/α)·Q(m) ≤ P(m) ≤ α·Q(m). The messages outside S also have probability 1/(k² log k) under P and roughly 1/k² under Q, which is less than α·P(m) and certainly greater than P(m)/α for sufficiently large k. Thus, for such m outside S, again (1/α)·Q(m) ≤ P(m) ≤ α·Q(m).

###### Lemma 2

Every prior P in the family has entropy O(1) (and hence ℓ(P) is O(1) as well).

Proof of Lemma:   With probability at least 1 − 1/log k, a draw from P gives the message m*, which has self-information roughly 1/log k; and P gives each of the other k² − 1 messages, which have self-information 2 log k + log log k, probability 1/(k² log k). Thus, overall

 H(P) ≤ (1 − 1/log k)·(1/log k) + k²·(1/(k² log k))·(2 log k + log log k) ≤ (1 + log log k)/log k + 2.
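As a numerical sanity check on this lemma, the sender's prior from the construction (as we read it: k² messages, each non-distinguished message getting probability 1/(k² log k), and the distinguished message the rest) can be instantiated and its entropy computed directly:

```python
import math

def hard_prior(k):
    """Sender's prior from the lower-bound construction (our reading):
    k**2 messages; index 0 plays the role of the distinguished message m*."""
    n = k * k
    logk = math.log2(k)
    small = 1.0 / (n * logk)
    probs = [small] * n
    probs[0] = 1.0 - small * (n - 1)
    return probs

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = hard_prior(256)
# 65536 messages, yet the entropy is a small constant (about 2.5 bits here).
```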

It therefore suffices to give a lower bound on the expected codeword length to obtain a lower bound on the redundancy up to an additive O(1) term, since the optimal encodings of the messages themselves only contribute O(1) bits.

###### Lemma 3

Any error-free coding scheme must have expected codeword length at least 2 log k − 2 log log k for some α-close pair P and Q as described above.

Proof of Lemma:   We note that by the min-max principle, it suffices to consider deterministic coding schemes for the case where m* and S are chosen uniformly at random from the common domain of size k². In slightly more detail, we are considering a zero-sum game between an "input" player and a "coding scheme" player, in which the pure strategies for the input player are pairs of priors P and Q from our family, and the pure strategies for the coding player are error-free deterministic coding schemes. The payoff for the input player is the expected encoding length of the chosen coding scheme on the chosen pair of priors. Randomized coding schemes with shared random strings can be viewed as a random choice of a deterministic scheme; Von Neumann's Min-max Theorem then guarantees that the expected encoding length of the best randomized coding scheme is equal to the expected encoding length of the best deterministic coding scheme for an optimal (hardest) distribution over priors. So, by showing that for some concrete choice of distribution over priors the expected encoding length must be at least 2 log k − 2 log log k, we can infer that for every randomized coding scheme there must exist a fixed choice of P and Q under which the expected encoding length is at least 2 log k − 2 log log k, which will complete the proof of the lemma.

So, suppose for contradiction that the expected codeword length of some deterministic error-free coding scheme is less than 2 log k − 2 log log k when the parameters m* and S defining P and Q are chosen uniformly at random. Markov's inequality then guarantees that there is a "collision"—an ambiguous codeword for two distinct messages: in more detail, since the probability of obtaining a codeword of length at least 2 log k − log log k is at most

 (2 − 2(log log k)/log k)/(2 − (log log k)/log k) = 1 − 1/(2 log k/log log k − 1),

we find that with probability at least 1/(2 log k/log log k − 1), the code length is at most 2 log k − log log k. That is, recalling that we have a uniform distribution over the domain, at least k²/(2 log k/log log k − 1) messages have such short codes. But, we know that any unique code for so many messages has a codeword of length at least

 log(k² + 1) − log(2 log k/log log k − 1) − 1 > 2 log k − log log k

for sufficiently large k. Thus, these messages with short codes cannot have been uniquely encoded.

So, there must be two messages m₁ and m₂ that share a codeword and that both appear in S with positive probability. Conditioned on their both appearing in S, both of these messages have equal probability of being the message m* drawn from P given the common codeword. Therefore, whatever Bob chooses to output upon receiving this codeword is wrong with positive probability, contradicting the assumption that the scheme is error-free.

Now, since our choice of k (with k log k ≤ α) ensures that 2 log k − 2 log log k is at least 2 log α − O(log log α), we find that the redundancy is indeed likewise at least 2 log α − O(log log α) since, as shown in Lemma 2, H(P) is also O(1).

We note that apart from the encoding length, this lower bound is essentially tight in another respect: If there are fewer than k² messages, then these can be indexed by using less than 2 log k bits, and so clearly a better scheme is possible by ignoring the priors entirely and simply indexing the messages in this case. Thus, no hard example for uncertain priors coding can use a substantially smaller set of messages.

Although our hard example uses a source distribution with very low entropy, we also briefly note that it is possible to extend it to examples of hard source distributions with high entropy. Namely, suppose that there is a second, independent, high-entropy distribution P′, and that Alice and Bob wish to solve two independent source coding problems: one with their α-close priors P and Q, and the other a standard source coding problem for P′. The joint distributions P × P′ and Q × P′ are then α-close and have entropy essentially H(P′). The analysis is now similar to the Slepian-Wolf Theorem [11]: The "side problem" of source coding for P′ cannot reduce the coding length for uncertain priors coding of P and Q, since Alice and Bob can simulate drawing an independent message from P′ using the shared randomness; this shared message can then be used by Bob to decode a hash of the message Alice would send in the joint coding problem.

### 3.2 The price of uncertainty when errors are allowed

We now turn to the setting where Alice and Bob are allowed to fail at the communication task with positive probability ϵ. Haramaty and Sudan [4] introduced this setting, but only considered deterministic schemes for this problem.

#### 3.2.1 Efficient uncertain priors coding when errors are allowed

We first note that the original techniques from Braverman and Rao [3] (used in the original work of Juba et al. [2]) give a potentially much better upper bound when errors are allowed:

###### Theorem 7

For every α ≥ 1 and ϵ > 0, there is an uncertain priors compression scheme with expected code length H(P) + log α + log 1/ϵ + 1 that is correct with probability 1 − ϵ when P and Q are α-close. (Actually, we only require P(m) ≤ α·Q(m) for all m.)

Proof:   The scheme is a simple variant of the original uncertain priors coding: we use the shared randomness to choose infinitely long common random strings R_m for each message m. Then, to encode a message m, Alice sends the first ⌈log(α/(P(m)·ϵ))⌉ bits of R_m. Bob decodes this message by computing the set of messages m′ such that the first bits of R_{m′} agree with the codeword, and outputs some m′ that maximizes Q(m′).

To see that Bob outputs m with probability at least 1 − ϵ, note that it suffices to show that of the at most α/P(m) messages m′ with probability at least P(m)/α under Q, no m′ ≠ m is consistent with the first bits of R_m. Now, since P(m) ≤ α·Q(m), Alice has sent at least log(α/P(m)) + log(1/ϵ) bits of R_m. The probability that some fixed m′ ≠ m agrees with so many bits of R_m is at most ϵ·P(m)/α. Therefore, by a union bound over the α/P(m) possible high-likelihood messages m′, the probability that any m′ ≠ m has R_{m′} consistent with the codeword is at most ϵ. Thus, none are consistent and decoding is correct with probability at least 1 − ϵ.

Finally, we note that the expected codeword length is exactly

 E_P[⌈log(α/(P(m)·ϵ))⌉] ≤ E_P[log(1/P(m))] + log α + log(1/ϵ) + 1 = H(P) + log α + log(1/ϵ) + 1

as promised.
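The positive-error scheme is short enough to prototype. As in the error-free case, a seeded PRNG stands in for the shared strings R_m, and the distributions and parameters below are made up for illustration:

```python
import math
import random

def shared_bits(m, length, seed=0):
    """Toy stand-in for the shared random string R_m."""
    rng = random.Random(f"{seed}:{m}")
    return tuple(rng.randrange(2) for _ in range(length))

def encode(m, alpha, eps, P, seed=0):
    """Send the first ceil(log2(alpha / (P(m) * eps))) bits of R_m."""
    ell = max(1, math.ceil(math.log2(alpha / (P[m] * eps))))
    return shared_bits(m, ell, seed)

def decode(code, alpha, eps, Q, seed=0):
    """Among messages whose shared string agrees with the received prefix,
    output one of maximum likelihood under the receiver's prior Q."""
    consistent = [m for m in Q if shared_bits(m, len(code), seed) == code]
    return max(consistent, key=lambda m: Q[m])

P = {"a": 0.7, "b": 0.2, "c": 0.1}   # sender's prior
Q = {"a": 0.5, "b": 0.3, "c": 0.2}   # receiver's 2-close prior
```

Unlike the error-free scheme, decoding here can fail (with probability at most ϵ over the shared randomness), since the codeword for m is only ⌈log₂(α/(P(m)ϵ))⌉ bits rather than long enough to exclude every higher-likelihood message.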

##### Computational aspects

We observe that the scheme presented in the proof of Theorem 7 has another desirable property, namely that the encoding is extremely efficient in the following sense: Given a message m, Alice only needs to compute P(m) (the density function for P at m), and only needs to look up the prefix of R_m of length ⌈log(α/(P(m)·ϵ))⌉. This is in contrast to the error-free encoding scheme of Juba et al. [2], which requires Alice to first identify the messages m′ with P(m′) ≥ P(m)/α², and then read bits of R_{m′}, until some index at which R_{m′} disagrees with R_m, for each such m′. Observe that there may be as many as α²/P(m) such other messages, which may be quite large (if the individual densities are small). So, in comparison, the positive-error coding is rather computationally efficient. The natural decoding procedure, unfortunately, still also requires examining the density of a similarly large number of messages. We will return to these issues when we discuss open problems.

#### 3.2.2 Lower bound for uncertain priors coding with errors

We now give a lower bound showing that the redundancy is essentially log α + log 1/ϵ, up to terms of size O(log log α). Thus, in this ϵ-error setting, we also identify the "price" of uncertainty up to lower-order terms. The proof is a natural extension of the proof of Theorem 6, showing that when the codes are so short, a miscommunication is not only possible but must be likely.

###### Theorem 8

For every compression scheme for uncertain priors that is correct with probability 1 − ϵ and every sufficiently large α, there is some pair of α-close priors for which the scheme suffers redundancy at least log α + log 1/ϵ − O(log log α).

Proof:   For each α, we will consider the same family of priors as used in the proof of Theorem 6: that is, we let k be the largest integer such that k log k ≤ α and consider pairs of priors on a message set of size k², indexed by a distinguished message m* and a set S of k messages that includes m*; Alice's prior P gives m* probability at least 1 − 1/log k and the other messages probability 1/(k² log k), and Bob's prior Q gives the messages in S probability 1/(k log k), and gives the other messages uniform probability. Lemma 1 shows that every pair P and Q in the family are α-close, and Lemma 2 shows that H(P) is O(1) for every P in the family. It thus again remains only to show that for every coding scheme that is correct with probability 1 − ϵ, there exists a pair of priors in the family under which the expected code length must be large.

###### Lemma 4

Any coding scheme that is correct with probability at least 1 − ϵ must have expected codeword length at least log k + log(1/ϵ) − 2 log log k for some α-close pair P and Q as described above.

Proof of Lemma:   We again note that by the min-max principle, it suffices to consider deterministic coding schemes when a pair of priors (given by m* and S) is chosen uniformly at random from our family, and consider any scheme in which the expected codeword length is at most log k + log(1/ϵ) − 2 log log k. As before, Markov's inequality guarantees that with probability at least 1/((log k + log 1/ϵ)/log log k − 1), the code length is at most log k + log(1/ϵ) − log log k. Now, recalling that we have the uniform distribution over the messages, this means that there are ℓ ≥ k²/((log k + log 1/ϵ)/log log k − 1) messages with such short codes. But, now there are at most 2k/(ϵ log k) codewords of this length.

We now consider, for a uniformly chosen message from this set (conditioned on having such a short codeword), the expected number of messages that are coded by the same codeword. We will refer to each pair that share a codeword as a "collision." Letting N_c denote the number of these messages coded by the codeword c, this is

 (1/ℓ)·∑_x #{y with the same code as x} = (1/ℓ)·∑_c N_c².

Noting that ∑_c N_c = ℓ, we know that this expression is minimized when all N_c have equal size. That is, the expected number of collisions in this conditional distribution is at least ℓϵ log k/(2k). Since a uniformly chosen message hits this conditional distribution with probability at least 1/((log k + log 1/ϵ)/log log k − 1), overall we have that the expected number of collisions is at least

 (ℓϵ log k)/(2k) · 1/((log k + log 1/ϵ)/log log k − 1) ≥ (ℓϵ log log k)/(8k),

using here that ϵ is not too small.

So, consider a message m for Alice that is sampled by first choosing a random pair (m*, S) from the family of priors and then sampling m from P. Note that regardless of whether or not m is in S, there are at least k − 2 (additional) members of S that are chosen uniformly at random, and each one collides with m with probability at least

 (N_c/ℓ) · 1/((log k + log 1/ϵ)/log log k − 1),

which is at least ϵ log log k/(8k). Thus, for sufficiently large k, the probability that no member of S shares a code with m is less than

 (1 − ϵ log log k/(8k))^k ≤ 1 − 2ϵ.

Now, by symmetry of the uniform choice of m* and the colliding elements of S, whenever there is at least one collision, Bob's output is wrong with probability at least 1/2. Thus, Bob is wrong in this case with probability greater than ϵ, and so the scheme has error greater than ϵ.

Since again log k = log α − O(log log α) and ℓ(P) = O(1), the theorem now follows immediately.

## 4 Suggestions for future work

We now conclude with a handful of natural questions that are unresolved by our work. First of all, in both the error-free and positive-error settings, the known upper and lower bounds still feature gaps of size O(log log α). It would be nice to tighten these results further, reducing the gap to O(1) if possible (as is known for standard source coding). We stress that it may be that neither the known upper bounds nor the known lower bounds are optimal. In particular, since we have considered one-shot source coding, we do not need to use prefix codes, and so Kraft's inequality does not necessarily apply. It is conceivable that the upper bound can be tightened somewhat as a consequence, along the lines of the tightening of standard source coding achieved by Alon and Orlitsky [9], for example.¹ We note that there are examples, such as uniform distributions, under which the entropy actually matches the optimal encoding length, so the scope for such improvement is limited to examples such as a geometric distribution in which there is a significant gap.

¹We thank Ahmad Beirami for bringing this to our attention.

Of course, as studied by Shannon [1], the prefix-code bounds essentially determine the optimal encoding lengths when we wish to transmit more than one message sampled from the same source. A very natural second set of related questions, independently suggested to us by S. Kamath, S. Micali, and one of our reviewers, concerns the analogous question when multiple messages are sampled from the same source, but the sender and receiver are uncertain about each others' priors. It is clear that if the source is fixed, as the number of messages grows, the receiver can construct an empirical estimate of the source distribution. So in the limit, this obviates the need for schemes such as those considered here: standard compression techniques can be used and will achieve an optimal coding length (per message) that again approaches the entropy of the source. The interesting question concerns what happens when the number of messages sent is relatively small, or scales with the uncertainty α, or, relatedly (cf. our earlier observation that a set of messages of size roughly k² is necessary for our lower bound to hold), scales with the size of the distribution's support. Indeed, after decoding a single message (which, with high probability, is the distinguished message m*), the "hard" example distributions we constructed subsequently cost O(1) bits per message in expectation. One can construct distributions with more "high-probability" messages in the sender's distribution, but this naturally seems to reduce the redundancy since the encodings inherently require more bits. Noting that in this work we find that redundancy approximately 2 log α is necessary when a single message is sent, and that the per-message redundancy vanishes as the number of messages grows, Kamath [12] conjectures:

###### Conjecture 9 (S. Kamath)

For all , when messages are sent, the optimal per-message redundancy for uncertainty that is is .

Finally, we note that since these bounds primarily concern small numbers of messages in a noninteractive setting, questions about the gap between one-to-one and prefix coding still arise. Analogous to the source coding of a single message, it is conceivable that we could achieve similar savings along the lines obtained by Szpankowski and Verdú [13] for source coding with multiple messages. Similar bounds on the savings for universal compression (with unknown source distributions) have also been established in works by Kosut and Sankar [14] and Beirami and Fekri [15].

Third, we do not have better lower bounds for the deterministic or imperfect-randomness settings, respectively from Haramaty and Sudan [4] and Canonne et al. [10]. The known upper bounds in these settings are much weaker, featuring redundancy that grows at least linearly in the entropy of the source distribution and, in the case of the deterministic codes of Haramaty and Sudan, some kind of dependence on the size of the distribution's support. Is this inherently necessary? Haramaty and Sudan give some reasons to think so, noting a connection between graph coloring and uncertain-priors coding schemes: namely, they identify geometric distributions with the nodes of a graph and include edges between all α-close geometric distributions. They identify the "colors" with the messages sent for the high-probability element in each distribution. They then point out that a randomized uncertain-priors scheme is essentially a "fractional coloring" of this graph, whereas a deterministic scheme is a standard coloring. Thus, there is reason to suspect that this problem may be substantially harder, and so a stronger lower bound (e.g., depending on the size of the support) may be possible for the deterministic coding problem.
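The coloring connection can be rendered concretely as follows. This is a toy sketch under our own choices: we take truncated geometric distributions as nodes, call two priors α-close when all log-probability ratios are at most α in magnitude (the standard closeness measure in this literature), and greedily color the resulting graph; the ratios and parameters are illustrative, not those of Haramaty and Sudan.

```python
import math
from itertools import combinations

def geometric(r, n=8):
    """Truncated geometric distribution with ratio r on {0, ..., n-1}."""
    w = [r ** i for i in range(n)]
    s = sum(w)
    return [x / s for x in w]

def alpha_close(p, q, alpha):
    """True iff |log2 p(x) - log2 q(x)| <= alpha for every outcome x."""
    return all(abs(math.log2(a) - math.log2(b)) <= alpha
               for a, b in zip(p, q))

# Nodes: a few geometric priors; edges join alpha-close pairs.
nodes = [geometric(r) for r in (0.30, 0.32, 0.50, 0.52, 0.80)]
alpha = 1.0
edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
         if alpha_close(nodes[i], nodes[j], alpha)]

# A deterministic uncertain-priors scheme corresponds to a proper
# coloring: alpha-close priors must assign distinct codewords to their
# high-probability element.  Greedy proper coloring:
color = {}
for v in range(len(nodes)):
    taken = {color[u] for e in edges for u in e
             if v in e and u != v and u in color}
    color[v] = min(c for c in range(len(nodes)) if c not in taken)
print(edges, color)
```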

As for the imperfectly shared randomness setting, Canonne et al. [10] give some lower bounds for a communication complexity task (sparse gap inner product) showing that imperfectly shared randomness may require substantially more communication than perfectly shared randomness. Although this does not seem to immediately yield a strong lower bound for the uncertain priors problem (as it is for a very specific communication problem), their technique may provide a starting point for a matching lower bound on the redundancy in that setting as well.

Finally, none of these works address the computational complexity of uncertain-priors coding, a question originally raised in the work of Juba et al. [2]. In particular, codes such as arithmetic coding (first considered by Elias in unpublished work) can encode a message by making a number of queries to a CDF for the source distribution that is linear in the code length, and decoding has similar computational complexity. Although the code length is not as tight as that of Huffman coding [6] for one-shot coding, it is nearly optimal. The question is whether a similarly computationally efficient, near-optimal compression scheme exists for uncertain-priors coding. We observed in Section 3.2.1 that our encoding scheme for the positive-error setting is quite efficient, given the ability to look up the desired prefix of the random encoding efficiently, and is furthermore close to optimal in encoding length. But the natural decoding strategy is quite inefficient in terms of the number of queries to the density function required: in particular, it may require up to roughly 1/q(m) queries to decode a message m, where q is the receiver's density function. In the case of error-free coding, the decoding strategy is the same (and hence just as inefficient), and moreover the known encoding strategies are similarly inefficient. Thus, both schemes are at present computationally prohibitive to use.
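The natural (inefficient) decoding strategy just described can be sketched as follows: the receiver enumerates candidate messages in decreasing order of its own prior, querying the density once per candidate, until one matches the transmitted codeword. We use a hash prefix as a stand-in for the scheme's shared random encodings; the universe, prior, and prefix length here are all hypothetical.

```python
import hashlib

def h(msg, length):
    """Toy shared 'random' encoding of a message: a hash prefix
    (stands in for the shared random strings of the actual scheme)."""
    return hashlib.sha256(msg.encode()).hexdigest()[:length]

def encode(msg, length):
    return h(msg, length)

def decode(codeword, q, length):
    """Natural decoding: try candidates in decreasing order of the
    receiver's density q until the encoding matches.  Returns the
    decoded message and the number of density queries made; a message
    m can cost up to roughly 1/q(m) queries."""
    queries = 0
    for cand in sorted(q, key=q.get, reverse=True):
        queries += 1  # one query to the density function
        if h(cand, length) == codeword:
            return cand, queries
    raise ValueError("no candidate matched")

# Hypothetical receiver prior over a small universe.
q = {"alpha": 0.5, "beta": 0.25, "gamma": 0.125, "delta": 0.125}
cw = encode("gamma", length=8)
msg, n_queries = decode(cw, q, length=8)
print(msg, n_queries)
```

Decoding "gamma" costs three queries here because two more likely candidates are tried first; with a large universe and a low-probability message, this enumeration is exactly the bottleneck noted above.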

## References

• [1] C. Shannon, “A mathematical theory of communication,” Bell Sys. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
• [2] B. Juba, A. Kalai, S. Khanna, and M. Sudan, “Compression without a common prior: an information-theoretic justification for ambiguity in language,” in Proc. 2nd Innovations in Computer Science, 2011, pp. 79–86.
• [3] M. Braverman and A. Rao, “Information equals amortized communication,” IEEE Trans. Inform. Theory, vol. 60, no. 10, pp. 6058–6069, 2014.
• [4] E. Haramaty and M. Sudan, “Deterministic compression with uncertain priors,” Algorithmica, vol. 76, no. 3, pp. 630–653, 2016.
• [5] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Trans. Inform. Theory, vol. 23, no. 3, pp. 337–343, 1977.
• [6] D. Huffman, “A method for the construction of minimum-redundancy codes,” in Proc. I.R.E., 1952, pp. 1098–1102.
• [7] P. Elias, “Universal codeword sets and representations of the integers,” IEEE Trans. Inform. Theory, vol. IT-21, no. 2, pp. 194–203, 1975.
• [8] A. D. Wyner, “An upper bound on the entropy series,” Inform. Control, vol. 20, pp. 176–181, 1972.
• [9] N. Alon and A. Orlitsky, “A lower bound on the expected length of one-to-one codes,” IEEE Trans. Inform. Theory, vol. 40, no. 5, pp. 1670–1672, 1994.
• [10] C. L. Canonne, V. Guruswami, R. Meka, and M. Sudan, “Communication with imperfectly shared randomness,” in Proc. 6th Innovations in Theoretical Computer Science (ITCS), 2015, pp. 257–262.
• [11] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471–480, 1973.
• [12] S. Kamath, Personal communication, 2014.
• [13] W. Szpankowski and S. Verdú, “Minimum expected length of fixed-to-variable lossless compression without prefix constraints,” IEEE Trans. Inform. Theory, vol. 57, no. 7, pp. 4017–4025, 2011.
• [14] O. Kosut and L. Sankar, “New results on third-order coding rate for universal fixed-to-variable source coding,” in Proc. Intl. Symp. Inform. Theory (ISIT), 2014, pp. 2689–2693.
• [15] A. Beirami and F. Fekri, “Fundamental limits of universal lossless one-to-one compression of parametric sources,” in Information Theory Workshop (ITW), 2014, pp. 212–216.