On a conditional inequality in Kolmogorov complexity and its applications in communication complexity

by   Andrei Romashchenko, et al.

Romashchenko and Zimand [RZ18] have shown that if we partition the set of pairs (x,y) of n-bit strings into combinatorial rectangles, then I(x:y) ≥ I(x:y | t(x,y)) − O(log n), where I denotes mutual information in the Kolmogorov complexity sense, and t(x,y) is the rectangle containing (x,y). We observe that this inequality can be extended to coverings with rectangles that may overlap. The new inequality essentially states that in the case of a covering with combinatorial rectangles, I(x:y) ≥ I(x:y | t(x,y)) − log ρ − O(log n), where t(x,y) is any rectangle containing (x,y) and ρ is the thickness of the covering, which is the maximum number of rectangles that overlap. We discuss applications to the communication complexity of protocols that are nondeterministic, randomized, or Arthur-Merlin, and also to the information complexity of interactive protocols.





1 Introduction

Let us consider three strings x, y, z and their Kolmogorov complexities C(x), C(y) and, respectively, C(z). It is sometimes useful to use the Venn diagram in Figure 1 to visualize the information relations between the three strings.

Figure 1: Three strings and their joint information relation

For example, the region contained in the left circle which lies outside the right circle can be thought to represent C(x | y), the region at the intersection of the left and right circles can be thought to represent the mutual information I(x : y), and so on. There is however a nuisance: this visual representation is not always correct, the potential trouble maker being the darker region at the intersection of the three circles. This region is denoted I(x : y : z) and can be defined as I(x : y) − I(x : y | z) (there are also some alternative definitions, which are equivalent up to an O(log n) additive term; see Lemma 2.1). The problem is that I(x : y : z) can be negative. Romashchenko and Zimand [RZ18] have shown that if z is a computable function t(x, y) of x and y, and if furthermore this function has the “rectangle property” stating that t(x₁, y₁) = t(x₂, y₂) implies t(x₁, y₂) = t(x₁, y₁), then I(x : y : z) actually is positive up to O(log n) precision, where n = max(|x|, |y|).

Theorem 1.1 ([RZ18]).

For every computable t with the rectangle property, and for every pair of n-bit strings (x, y),

I(x : y) ≥ I(x : y | t) − O(log n),    (1)

where t denotes t(x, y), and O(log n) hides a loss of precision bounded by c · log n, for some constant c.

A related result has been obtained by Kaced, Romashchenko and Vereshchagin [KRV18]. It uses Shannon entropy instead of Kolmogorov complexity, and it was an inspiration for [RZ18].

The inequality in Theorem 1.1 is particularly interesting in communication complexity because in this theory the rectangle property plays a prominent role. There are various models in communication complexity, the most basic one being the two-party model introduced in 1979 by Yao [Yao79]. Alice and Bob want to compute a function f of two arguments, but Alice receives only x, and Bob receives only y. To achieve their goal, they run an interactive protocol (i.e., they exchange messages in several rounds, where each party computes the current message to be sent from his/her input and the previous messages) which allows them at the end to compute f(x, y). The string which encodes in some canonical way the sequence of messages exchanged by Alice and Bob on input strings (x, y) is denoted t(x, y), and is called the transcript of the protocol. The key observation is that if on input pairs (x₁, y₁) and (x₂, y₂) the transcript of the protocol is the same string τ, then the transcript on input (x₁, y₂) will be τ as well, and therefore, the transcript function has the rectangle property.

The above observation (which is standard in communication complexity, see [KN97]) shows that a protocol induces a partition of the domain of inputs into rectangles, where each part (i.e., rectangle) of the partition corresponds to a fixed value τ of the transcript, via the definition R_τ = {(x, y) : t(x, y) = τ}. Suppose now that the protocol allows Alice and Bob to compute f(x, y). Then we can think that the domain of f is formed by the cells (x, y) and each cell is colored with the color f(x, y). The rectangles of the partition induced by the protocol are f-monochromatic, because all the cells that have the same transcript have the same color f(x, y).
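As a toy illustration (hypothetical code, not from the paper): the rectangle property can be checked mechanically by grouping the input pairs by transcript value and verifying that each group is a full combinatorial rectangle. The function name and the toy transcript functions below are our own.

```python
from itertools import product

def has_rectangle_property(t, xs, ys):
    """Check that t(x1, y1) == t(x2, y2) implies t(x1, y2) == t(x1, y1),
    i.e., every transcript class is a combinatorial rectangle."""
    classes = {}
    for x, y in product(xs, ys):
        classes.setdefault(t(x, y), []).append((x, y))
    for tau, cells in classes.items():
        rows = {x for x, _ in cells}
        cols = {y for _, y in cells}
        # the class must be exactly the rectangle rows x cols
        if any(t(x, y) != tau for x, y in product(rows, cols)):
            return False
    return True

xs = ys = range(8)
# toy deterministic protocol: Alice announces the high bit of x,
# then Bob announces the high bit of y; the transcript is the pair of bits
t_ok = lambda x, y: (x >> 2, y >> 2)
# the value x + y itself is not a valid transcript: it lacks the rectangle property
t_bad = lambda x, y: x + y
```

Here has_rectangle_property holds for t_ok and fails for t_bad, matching the observation that the classes of a genuine transcript function are always rectangles.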

Thus, every deterministic interactive protocol induces a partition of the domain into monochromatic rectangles. But not every monochromatic partition into rectangles corresponds to an interactive deterministic protocol and, moreover, in some applications, one analyzes coverings of the domain of f with monochromatic rectangles that can overlap (so the rectangles are not necessarily a partition of the domain). Such coverings have extremely interesting applications (for example, the breakthrough result of Fiorini et al. [FMP15] uses them), and characterize nondeterministic communication complexity. Given a function f of two arguments, it is of interest to determine the minimum number of f-monochromatic rectangles that cover the domain of f. The logarithm of this number is the nondeterministic communication complexity of f. (We note that the standard definition of nondeterministic communication complexity is for boolean functions – see [KN97], Def. 2.3 – and, for this class of functions, the definition is in terms of coverings with 1-monochromatic rectangles. In this work we focus mainly on non-boolean functions, for which the above definition is appropriate.)

This is a “worst-case” type of complexity, but using the framework of Kolmogorov complexity adopted in this work, we can talk about the communication complexity for each individual input. A nondeterministic communication protocol P can be described combinatorially, but also as an interactive computational procedure. In the combinatorial view, P is simply a covering of the domain with f-monochromatic rectangles. In the procedural view, besides Alice and Bob, there is a third party, the Prover, also known as Merlin. Merlin knows both x and y (where x is the input possessed by Alice, and y is the input possessed by Bob). Merlin sends Alice and Bob a description of R, which is one of the rectangles in the covering specified by P that contains the cell (x, y). Alice checks that x is a row of R and Bob checks that y is a column of R, and, if both parties confirm that both checks were valid (which requires only two bits of communication), Alice and Bob derive the coveted f(x, y), which is just the color of R.

Thus, it is natural to define the individual communication complexity of the protocol P on input (x, y) to be C(R | P), i.e., the length of the shortest description of R, given the rectangles of the protocol.

Our contributions. The centerpiece of this paper is an extension of Theorem 1.1, which is applicable to rectangle covers. The setting is as follows: T is a set of rectangles, and ρ, called the thickness of T, is the maximum number of rectangles in T that contain a common cell. A pair (x, y) is covered by T if (x, y) belongs to some rectangle of T.

Theorem 1.2 (Main Inequality, informal and simplified form: the full version is in Theorem 3.1).

If (x, y) is covered by T and R is a rectangle of T containing (x, y), then

I(x : y) ≥ I(x : y | R) − log ρ − O(log n).    (2)
The proof of Theorem 1.2 is essentially the same as the proof of Theorem 1.1, but we believe that the new inequality (2) deserves attention because it is applicable to the communication complexity of nondeterministic protocols, and also of Arthur-Merlin (AM) protocols, which combine nondeterminism with randomness. This is intriguing because currently there is a lack of techniques for proving communication complexity lower bounds for AM protocols, and, consequently, finding AM-complexity lower bounds for any explicit function is a notoriously challenging open question in communication complexity [CCM15, GPW16, GPW18, Gav18].

Because of the log ρ term, the inequality (2) is meaningful only for nondeterministic and AM protocols with rectangle coverings having bounded thickness. Of course this is a limitation, but communication protocols with small thickness have their merits and have been studied for a long time under various monikers for the thickness parameter, such as “few witnesses” or “limited number of advice bits” [KNSW94, HS96, HS00, GT03]. Yannakakis [Yan91] shows that communication-efficient nondeterministic protocols with thickness 1 (these are called unambiguous nondeterministic protocols) can be used to express certain combinatorial optimization problems as linear programs with few variables. Furthermore, recent works of Göös et al. [GPW16] and Gavinsky [Gav18] highlight the importance of the thickness parameter in studying the complexity of AM communication protocols and analyze several models of AM-like protocols with thickness bounded in various ways. We believe that the information-theoretic inequality (2) can contribute to this research line and, more generally, to the understanding of AM protocols. In Section 5, we derive almost directly from a variant of (2) a lower bound for the communication cost of AM protocols, albeit for the easier case of a non-boolean function. We show that any AM protocol that computes x + y (bitwise XOR) must have communication cost 2n − log ρ − O(log n), where ρ is the thickness of the protocol.

In Section 7, we present the analog of inequality (2) for multiparty communication protocols and give an application.

The information complexity of a 2-party interactive protocol ([CSWY01, BBCR10]) measures the amount of information each party learns about the other party’s input during the execution of the protocol. Information complexity has turned out to be a very useful concept in communication complexity (for example, see the survey paper [Wei15]). Intuitively, the information complexity should not be larger than the communication complexity, because each bit of the transcript carries at most one bit of information. This relation has been proven to hold by Braverman and Rao [BR11] in the case of randomized protocols, but it is natural to ponder the relation between information complexity and communication complexity for nondeterministic protocols, where the intuitive view is less clear. In Section 6, we consider the Kolmogorov complexity version of information complexity. Relying again on the information-theoretic inequality (2), we show that in the case of a nondeterministic protocol P, the information complexity of P is at most the communication complexity of P plus the logarithm of the thickness of P, up to logarithmic precision.

2 Prerequisites, notation, and some useful lemmas

We assume familiarity with the basics of Kolmogorov complexity theory. We use standard notation in Kolmogorov complexity. We use x, y, z, etc. to denote finite binary strings, and |x| denotes the length of the string x. We fix a universal Turing machine U, and we say that p is a program (or description) of x conditioned on y if U on input (p, y) prints x. The Kolmogorov complexity of x conditional on y is C(x | y) = min{|p| : p is a program of x conditioned on y}. If y is the empty string, we write C(x) instead of C(x | y).

We use =⁺ (and, similarly, ≤⁺ and ≥⁺) to denote equality (respectively, inequality) with a loss of precision bounded by O(log n), where the constant hidden in the O(·) notation depends only on the universal Turing machine used in the definition of Kolmogorov complexity. The parameter n is defined in the context and, by default, is the maximum length of the strings involved in the relation.

Throughout the paper we use the notation and .

The Kolmogorov–Levin theorem shows the validity of the chain rule

C(x, y) =⁺ C(x) + C(y | x)

(here the loss of precision hidden in the =⁺ notation is O(log n), as explained above).

The mutual information of x and y conditioned on z is denoted I(x : y | z) and is defined by I(x : y | z) = C(x | z) − C(x | y, z). In case z is the empty string, we simply write I(x : y). For all strings x, y, z, it holds that I(x : y, z) =⁺ I(x : y) + I(x : z | y) (the chain rule for mutual information).

Lemma 2.1.

Let n = max(|x|, |y|, |z|). Then

  • I(x : y) − I(x : y | z) =⁺ I(x : z) − I(x : z | y) =⁺ I(y : z) − I(y : z | x).

  • I(x : y) − I(x : y | z) =⁺ C(x) + C(y) + C(z) − C(x, y) − C(x, z) − C(y, z) + C(x, y, z).


Simple manipulations using the chain rule.    
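The manipulation can be sketched as follows (all equalities hold up to O(log n) precision, using the Kolmogorov–Levin chain rule C(u, v) =⁺ C(u) + C(v | u)):

```latex
\begin{align*}
I(x:y) - I(x:y \mid z)
  &= \bigl(C(x) - C(x \mid y)\bigr) - \bigl(C(x \mid z) - C(x \mid y,z)\bigr) \\
  &= C(x) + C(y) + C(z) - C(x,y) - C(x,z) - C(y,z) + C(x,y,z),
\end{align*}
```

where the second line substitutes C(x | y) =⁺ C(x, y) − C(y), C(x | z) =⁺ C(x, z) − C(z), and C(x | y, z) =⁺ C(x, y, z) − C(y, z). The final expression is symmetric in x, y, z, which yields the equivalence of the alternative definitions of I(x : y : z).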

2.1 Nondeterministic protocols

A rectangle is a set (usually a subset of {0,1}^{n₁} × {0,1}^{n₂}) of the form A × B. We say that the set A is the set of rows of the rectangle, and B is the set of columns.

We define a nondeterministic communication protocol that allows two parties to compute a function f, as being a covering with rectangles of the domain of f, together with a function t that selects for each input one of the rectangles containing it. The formal definition is as follows.

Definition 2.2.
  1. A nondeterministic communication protocol for a function f with two arguments is a pair P = (T, t), where T and t are as follows.

  2. The domain of the protocol is a set of the form {0,1}^{n₁} × {0,1}^{n₂}, for some positive integers n₁ and n₂, and we let n denote max(n₁, n₂). We view the domain as a table with 2^{n₁} rows and 2^{n₂} columns.

  3. T is a covering of the domain with rectangles. That is, T = {R₁, …, R_N} for some natural number N, where each Rᵢ is a rectangle, and the union of the rectangles is the whole domain.

  4. t is a function which on input (x, y) returns one of the rectangles in T that contains the cell (x, y). We think of t(x, y) as being the transcript of the protocol on input (x, y).

  5. The communication complexity of the protocol on input (x, y) is C(t(x, y) | P).

Henceforth, by “protocol” we mean a nondeterministic communication protocol, unless specified otherwise. In the next definition we introduce the thickness of a protocol, a parameter which quantifies how far the covering induced by the protocol is from being a partition.

Definition 2.3.

Let P = (T, t) be a protocol. The thickness of a cell (x, y), denoted ρ_P(x, y), is the number of rectangles in T that contain the cell (x, y). When the protocol is clear from the context, we write more simply ρ(x, y). The thickness of the protocol, denoted ρ(P), is the maximum of the thicknesses of all cells in the domain, i.e., ρ(P) = max_{(x,y)} ρ(x, y). The thickness of a rectangle R, denoted ρ(R), is max_{(x,y) ∈ R} ρ(x, y).
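For intuition, the thickness of a covering can be computed by brute force on a toy domain (hypothetical code; here a rectangle is represented as a pair (rows, cols) of sets):

```python
from itertools import product

def thickness(cover, X, Y):
    """Maximum, over the cells of X x Y, of the number of rectangles
    of the cover that contain the cell; a partition has thickness 1."""
    return max(
        sum(1 for rows, cols in cover if x in rows and y in cols)
        for x, y in product(X, Y)
    )

X = Y = range(4)
# a partition of the 4 x 4 domain into four 2 x 2 rectangles: thickness 1
partition = [({0, 1}, {0, 1}), ({0, 1}, {2, 3}), ({2, 3}, {0, 1}), ({2, 3}, {2, 3})]
# adding one overlapping rectangle raises the thickness to 2
covering = partition + [({1, 2}, {1, 2})]
```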

We next define what it means for a protocol to compute a function exactly (i.e., on all inputs).

Definition 2.4.

A protocol P computes the function f of two arguments over domain {0,1}^{n₁} × {0,1}^{n₂}, if every rectangle in T is f-monochromatic, i.e., for every rectangle R ∈ T, there is a value v (the “color” of R) such that f(x, y) = v for all (x, y) ∈ R.

We also define what it means for a protocol to compute a function with some error (i.e., it may err on a small fraction of inputs). This case is useful in the definition of Arthur-Merlin protocols. In case the protocol makes mistakes on some inputs, we can no longer assume that the rectangles are monochromatic, and Alice and Bob use functions g_A (respectively, g_B) to compute the output from their input and the rectangle provided by Merlin.

Definition 2.5.

A protocol that computes the function f with error ε has the form P = (T, t, g_A, g_B), where

  1. T and t are as in Definition 2.2.

  2. g_A and g_B are functions that map an input of Alice (respectively, an input of Bob) and a rectangle in T into an element in the range of f.

The protocol computes f with error ε, if with probability 1 − ε,

g_A(x, t(x, y)) = g_B(y, t(x, y)) = f(x, y),

where the probability is over (x, y) chosen uniformly at random in the domain.

Lemma 2.6.

If P is a protocol that computes the computable function f, then for every (x, y):

  • .

  • .


Both statements follow from Lemma 2.1 (2) (relativized with t(x, y)), taking into account that C(f(x, y) | x, t(x, y)) =⁺ 0 and C(f(x, y) | y, t(x, y)) =⁺ 0.    

The above definitions can be extended to the case of protocols that compute relations. The idea is that Alice, having x, and Bob, having y, want to compute some value v such that (x, y, v) is in some given relation.

Let us consider a relation W ⊆ {0,1}^{n₁} × {0,1}^{n₂} × V. We assume that for every (x, y) there is some v such that (x, y, v) ∈ W. A W-monochromatic rectangle is a rectangle R such that there exists v with the property that (x, y, v) ∈ W for all cells (x, y) ∈ R. The smallest such v (in some predefined linear order of V) is the color of R.

Definition 2.7.
  1. A protocol P = (T, t) computes the relation W, if every rectangle in T is W-monochromatic.

  2. A protocol P = (T, t, g_A, g_B) (see Definition 2.5) computes the relation W with error ε, if with probability 1 − ε, g_A(x, t(x, y)) = g_B(y, t(x, y)) and (x, y, g_A(x, t(x, y))) ∈ W, where the probability is over (x, y) chosen uniformly at random in the domain.

3 The main inequality

The following result is an extension of Theorem 1.1. Its proof follows closely the proof of Theorem 1.1 from [RZ18].

Theorem 3.1.

For every protocol P, and for every (x, y) in the domain of P,

I(x : y | P) ≥ I(x : y | t(x, y), P) − log ρ(P) − O(log n).    (3)

Remark 1.

Similar-looking inequalities have been used implicitly in other papers studying interactive protocols (for example, [AC93, Lemma 2.2], [BR11, Lemma 3.14]). Their proofs have an inductive structure based on the rounds of communication. For instance, consider a 2-round protocol where Alice sends message t₁ in Round 1, and Bob sends message t₂ in Round 2. We can show that I(x : y) ≥⁺ I(x : y | t₁, t₂) as follows:

This approach does not work for nondeterministic protocols, where Merlin’s contribution to the transcript depends on both x and y and is delivered in “one shot,” not round-by-round. This is why we use a different method, based on an idea from [RZ18], where it was employed for a different purpose.


Let us fix (x, y) and let R = t(x, y). We say that x′ is a clone of x conditional on (R, P) if (i) x′ is a row of the rectangle R and (ii) C(x′ | R, P) ≤ C(x | R, P). Similarly, we say that y′ is a clone of y conditional on (R, P) if (i) y′ is a column of the rectangle R and (ii) C(y′ | R, P) ≤ C(y | R, P). Let X denote the set of clones of x (conditional on (R, P)) and Y denote the set of clones of y (conditional on (R, P)). Let n be the maximum between the length of x and the length of y.

Claim 3.2.

C(x | R, P) ≤ log |X| + O(log n) and C(y | R, P) ≤ log |Y| + O(log n).


(of claim) Given R, P, and C(x | R, P) (with the observation that the latter can be written in a self-delimited way on O(log n) bits), we can enumerate the clones of x. Since x ∈ X, the string x can be described by its index in this enumeration. Therefore

C(x | R, P) ≤ log |X| + O(log n).

The other relation follows in the same way.    

Now let us take (x′, y′) in X × Y which maximizes C(x′, y′ | R, P). Then






The reason for the last inequality is that (x′, y′) belongs to at most ρ(P) rectangles of the covering T. On the other hand, (x′, y′) belongs to the rectangle R, and thus R can be specified by its index in the list of the rectangles of T that contain (x′, y′). This implies that C(R | x′, y′, P) ≤ log ρ(P) + O(log n). Combining inequalities (5) and (6),

Subtracting in the left hand side and (the smaller) in the right hand side, we obtain

which concludes the proof.    

4 Lower bounds for the communication complexity of 2-party protocols

The crux in communication complexity is proving lower bounds for concrete problems, because such lower bounds can be transferred to other domains (data structures, streaming algorithms, circuit complexity, and many others). Theorem 3.1 has some consequences which can be used to prove lower bounds, as we show below.

Consider a 2-party deterministic and computable protocol which allows Alice and Bob to compute a function f on input (x, y), when Alice has x and Bob has y. The transcript has two parts: t_A, comprising the messages sent by Alice, and t_B, comprising the messages sent by Bob. Clearly, C(f(x, y) | x, t_B) =⁺ 0, because f(x, y) can be computed from x and t_B. Similarly, C(f(x, y) | y, t_A) =⁺ 0. In this way we can compute lower bounds for the lengths of t_A and t_B from bounds on the conditional complexity of f(x, y). We would like to do the same thing for the entire transcript t(x, y). In a deterministic protocol, we do have that C(f(x, y) | x, t(x, y)) =⁺ 0, but we cannot say directly that C(t(x, y)) ≥⁺ C(f(x, y)), because it is not clear if C(f(x, y) | t(x, y)) =⁺ 0. In a nondeterministic communication protocol, the transcript is provided by Merlin and there are no t_A and t_B. Nevertheless, we show that a lower bound of this kind holds for every protocol P. In fact, Theorem 4.1 shows that a stronger bound holds true.

Theorem 4.1.

For every computable function f, for every protocol P that computes f over domain {0,1}^{n₁} × {0,1}^{n₂}, and for every (x, y) in the domain,

C(t(x, y) | P) ≥ C(f(x, y) | x, P) + C(f(x, y) | y, P) − I(x : y | P) − log ρ(P) − O(log n).    (7)
Theorem 4.1 is an immediate corollary of the following lemma, which relaxes the condition that the rectangles are f-monochromatic by requiring only that f(x, y) can be computed from the rectangle t(x, y) and x, and also from the rectangle t(x, y) and y (and we can even allow a few help bits in the computation).

Lemma 4.2.

Let P be a protocol over domain {0,1}^{n₁} × {0,1}^{n₂}. For (x, y) in the domain, let z be a string such that C(z | x, t(x, y), P) =⁺ 0 and C(z | y, t(x, y), P) =⁺ 0. Then,

C(t(x, y) | P) ≥ C(z | x, P) + C(z | y, P) − I(x : y | P) − log ρ(P) − O(log n).    (8)
To simplify the notation, we drop the conditioning on P in all the C(·) and I(·) terms below. For example, with this notational convention, the conclusion becomes

C(t(x, y)) ≥ C(z | x) + C(z | y) − I(x : y) − log ρ − O(log n).
From Theorem 3.1,

I(x : y) ≥ I(x : y | t(x, y)) − log ρ − O(log n).

This can be written as

which, taking into account that C(x, z | t(x, y)) =⁺ C(x | t(x, y)) (because, given t(x, y), z can be computed from x), implies


In the right hand side,

We have used the fact that C(z | x, t(x, y)) =⁺ 0, which follows from the lemma’s hypothesis regarding the complexity of z. Similarly,

Plugging these inequalities in Equation (9),

which can be rewritten as


Similar bounds hold true for the other mechanisms by which a communication protocol performs a computational task, i.e., for protocols computing functions with small error, and for protocols that compute relations, with or without error.

Theorem 4.3.
  1. In case the protocol P computes f over its domain with error ε, then the inequality (7) holds with probability 1 − ε over (x, y) chosen uniformly at random in the domain.

  2. Suppose the protocol P computes the relation W, and let z be the color of the rectangle t(x, y). Then the inequality (7), with z in the role of f(x, y), holds true for every (x, y) in the domain.

  3. Suppose the protocol P computes the relation W with error ε, and let z = g_A(x, t(x, y)) = g_B(y, t(x, y)) (in case the latter two values are not equal, then z is not defined). Then the inequality (7), with z in the role of f(x, y), holds true with probability 1 − ε over (x, y) chosen uniformly at random in the domain.

5 Application to Arthur-Merlin protocols

Equation (7) can be used to establish lower bounds for protocols that use randomness, or that mix nondeterminism and randomness. In the latter type of protocols, Merlin provides a proof (like in standard nondeterministic protocols), and Alice and Bob (who together play the role of Arthur) probabilistically verify the proof and next compute the common output. There are actually two types of such protocols: Merlin-Arthur (MA) protocols, in which the randomness is shared between Alice and Bob but is not visible to Merlin, and Arthur-Merlin (AM) protocols, in which the randomness is shared between all parties: Alice, Bob, and Merlin. We provide here a lower bound for AM protocols, which is tight for protocols whose thickness is not too large. The lower bound is also valid for all 2-party (i.e., without Merlin) randomized protocols. To our knowledge, this result does not seem to be attainable by other methods. We note that lower bounds for AM protocols are considered to be difficult, while, equipped with the results from the previous section, our proof is short and easy. For a discussion on AM protocols, the reader can consult the survey paper [GPW18], where lower bounds for AM protocols are considered to be beyond reach at the current time; however, this refers to lower bounds for boolean functions, and our example concerns a non-boolean function.

An AM protocol is essentially a distribution over nondeterministic protocols. More precisely, Alice, Bob and Merlin share a source of randomness. For each r drawn from the source, there is a protocol P_r = (T_r, t_r, g_A, g_B) as in Definition 2.5. The protocol computes the function f if for every (x, y) in the domain of f, with probability at least 2/3 over the randomness r,

g_A(x, t_r(x, y)) = g_B(y, t_r(x, y)) = f(x, y).    (10)
The communication cost of an AM protocol is max_r log |T_r|, i.e., the logarithm of the maximum number of rectangles, over all randomness r. The thickness of an AM protocol, denoted ρ, is by definition the maximum thickness of all the protocols P_r.

We now present our first application. The arguments below are valid for any group, but for concreteness, let us consider the group {0,1}ⁿ with bitwise XOR, which we denote +. Alice and Bob want to compute x + y, where Alice has x and Bob has y.

A straightforward protocol consists in Merlin providing x and y. Alice and Bob (without actually using randomness) check and confirm to each other that Merlin has provided their inputs, after which they compute x + y. This protocol has thickness 1, and the communication cost is 2n + 2 (2n for x and y, and 2 for the two confirmation bits). There is an MA protocol with communication cost n + O(log n): Merlin sends a value z claiming that z = x + y, and then Alice and Bob, with O(log n) additional communication and using their shared randomness (which is secret to Merlin), can check that z + x and y are equal using random fingerprinting in the standard way. This strategy does not work in an AM protocol, because if Merlin knows the randomness, he can cheat by using a wrong z that passes the Alice/Bob test. Still, it is in principle conceivable that there may exist an AM protocol which is more communication-efficient than the trivial protocol above. We show that this is not possible for AM protocols whose thickness is not too large.

Claim 5.1.

Any AM protocol that computes x + y must have communication cost at least 2n − log ρ − O(log n). Thus, 2n is essentially a lower bound for AM protocols with, say, poly(n) thickness.

Let us consider an AM protocol that computes x + y. For each randomness r, let Good_r denote the set of pairs (x, y) which satisfy the relations in equation (10). Such a pair is said to be correct with respect to r.

Let us first attempt a direct argument which does not utilize the tools developed in the previous section. By a standard averaging argument, there is some r for which Good_r contains at least a 2/3 fraction of the input pairs (x, y). No rectangle in T_r can have more than 2ⁿ pairs which are correct with respect to r. The reason for this is that the number of rows in a rectangle is bounded by 2ⁿ and each row in the rectangle can have at most one correct pair (because two cells (x, y₁) and (x, y₂) in row x are assigned the same value by g_A and it is not possible that both x + y₁ and x + y₂ are equal to this value). Since the number of correct cells (i.e., the size of Good_r) is at least (2/3) · 2^{2n}, it follows that T_r contains at least (2/3) · 2ⁿ rectangles, and therefore the communication cost is at least n − O(1), which is smaller than the claimed lower bound.
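The counting step (“each row of a rectangle contains at most one correct pair”) can be checked by brute force on a toy instance; since the argument works for any group, we use the cyclic group Z₁₆ instead of bitwise XOR (hypothetical code, our own names):

```python
def correct_pairs(rows, cols, v, n):
    """Pairs (x, y) of the rectangle rows x cols with x + y = v in Z_{2^n}."""
    return [(x, y) for x in rows for y in cols if (x + y) % 2**n == v]

n = 4
rows = cols = range(2**n)              # the full 16 x 16 rectangle
pairs = correct_pairs(rows, cols, 5, n)
# each row x contributes at most one correct pair, namely y = 5 - x (mod 16),
# so a rectangle with a fixed assigned value holds at most 2^n correct pairs
```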

We can do better by using Lemma 4.2. Using this lemma and the same r as in the argument above, we obtain that for every pair (x, y) which is correct with respect to r (and recall that there are at least (2/3) · 2^{2n} such pairs),

C(t_r(x, y) | r) ≥ C(x + y | x, r) + C(x + y | y, r) − I(x : y | r) − log ρ − O(log n).    (11)
We next observe that for every x, every string z, and every α, we have C(x + y | x, z) ≥ n − α for at least a 1 − 2^{−α} fraction of the elements y in {0,1}ⁿ, because x + y takes 2ⁿ possible values when y ranges over {0,1}ⁿ and fewer than 2^{n−α} of them can have complexity less than n − α. A similar relation holds if we swap x and y; also, I(x : y | r) =⁺ 0 for most pairs (x, y). Using these estimations for the terms in the right hand side of Equation (11) (with r in the role of z), we obtain that for at least a fraction 2/3 − o(1) of all pairs (x, y), C(t_r(x, y) | r) ≥ 2n − log ρ − O(log n). This implies that the communication cost of any AM protocol computing x + y is at least 2n − log ρ − O(log n), as claimed.

Regarding the claim about randomized protocols made at the beginning of this section, we note that a 2-party randomized protocol is a distribution over 2-party deterministic protocols, and thus it can be viewed as an AM protocol with thickness equal to 1. The above argument implies that any randomized protocol that computes x + y (in the group {0,1}ⁿ with bitwise XOR) with probability 2/3 has communication cost 2n − O(log n) for the majority of input pairs (x, y).

5.1 Comparison with conventional techniques

It is instructive to compare the technique discussed above, based on the inequality (3) and its variants (7) and (8), with more standard methods. Let us consider nondeterministic communication protocols (with no randomness). In this model it is easy to estimate the communication complexity of the function x + y (for n-bit strings x and y). By definition, a nondeterministic communication protocol can be represented as a collection of “monochromatic” combinatorial rectangles that cover the set of all pairs of inputs (i.e., the set {0,1}ⁿ × {0,1}ⁿ). The property of monochromaticity means that each rectangle should consist of pairs (x, y) with one and the same value of x + y. The communication complexity of the protocol is the logarithm of the number of rectangles in the cover.

No two distinct pairs (x₁, y₁) and (x₂, y₂) can be in the same monochromatic rectangle. Indeed, either the pairs do not have the same sum, i.e., they have different colors; or they have the same sum, but in this case the “crossed” pairs (x₁, y₂) and (x₂, y₁) have a different sum. Therefore, the number of monochromatic rectangles has to be at least 2^{2n}, and thus the communication complexity of the protocol is not less than 2n. This proof is a version of the fooling set argument. The same bound can be obtained with a more explicit usage of the standard techniques of fooling sets or linear rank, which have been used to establish lower bounds for many communication problems. There are 2ⁿ possible values of x + y, so we must have rectangles colored in each of the 2ⁿ possible values. Next, for a fixed value v, the relation that consists of all pairs (x, y) such that x + y = v is isomorphic to the relation of identity (x and v + y must be equal to each other). And for this predicate the fooling set or linear rank methods imply that the cover contains at least 2ⁿ rectangles. Summing up the number of rectangles for all v, we conclude that the cover consists of at least 2ⁿ · 2ⁿ = 2^{2n} monochromatic rectangles.
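The crossed-pairs argument can be verified exhaustively for a small group (hypothetical code; Z₈ is used for concreteness, since the argument works for any group):

```python
from itertools import product

def can_share_rectangle(p, q, m):
    """True if pairs p and q could sit in one monochromatic rectangle for
    x + y mod m: the two pairs and both crossed pairs must have the same sum."""
    (x1, y1), (x2, y2) = p, q
    s = (x1 + y1) % m
    return all(((x2 + y2) % m == s,
                (x1 + y2) % m == s,
                (x2 + y1) % m == s))

m = 8
cells = list(product(range(m), repeat=2))
violations = [(p, q) for p in cells for q in cells
              if p != q and can_share_rectangle(p, q, m)]
# violations is empty: every pair needs its own rectangle, so any cover has
# at least m * m monochromatic rectangles, i.e., communication at least 2*log2(m)
```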

This argument works for protocols with any thickness. So with a very simple argument we have obtained a statement which is even stronger than the bound that follows from inequality (3). However, the conventional techniques are not stable with respect to a random perturbation. When we change the value of x + y for a small fraction of all pairs (x, y), we can corrupt all large enough fooling sets or dramatically reduce the linear rank. On the other hand, the bound based on the information inequality (3) remains valid (though we need to assume that the thickness of the protocol is bounded). Thus, the new technique has an advantage if we deal with “randomly perturbed” versions of well studied functions. Roughly speaking, with the new argument we gain a factor of 2 compared to the simpler standard bounds: for protocols with low thickness we obtain a lower bound of about 2n, while the trivial lower bound (the logarithm of the number of colors) is n. We used this property of “robustness” to prove a lower bound for AM communication protocols in Section 5.

As an additional example, consider the communication complexity of computing an approximation of x + y (for n-bit strings x and y). Thus, Alice and Bob want to compute a string w that is a δ-approximation of x + y, meaning that the Hamming distance between w and x + y is bounded by δn, for some δ < 1/2. By Theorem 4.3, part 2, for any protocol P computing such an approximation and for all pairs (x, y), we have C(t(x, y) | P) ≥ C(w | x, P) + C(w | y, P) − I(x : y | P) − log ρ − O(log n). Since x + y can be computed from w and h(δ) · n + O(log n) bits, where h is the binary entropy function, the right hand side of the inequality is at least C(x + y | x, P) + C(x + y | y, P) − 2h(δ)n − I(x : y | P) − log ρ − O(log n). For most pairs (x, y), C(x + y | x, P) ≥ n − O(log n), C(x + y | y, P) ≥ n − O(log n), and I(x : y | P) =⁺ 0. We conclude that for most pairs (x, y), C(t(x, y) | P) ≥ 2(1 − h(δ))n − log ρ − O(log n).
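The counting step behind this bound (a string within Hamming distance δn of x + y determines it up to a Hamming ball, and such a ball contains at most 2^{h(δ)n} strings for δ ≤ 1/2) can be checked numerically (hypothetical helper names):

```python
from math import comb, log2

def ball_size(n, r):
    """Number of n-bit strings within Hamming distance r of a fixed string."""
    return sum(comb(n, i) for i in range(r + 1))

def bin_entropy(p):
    """The binary entropy function h(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

n, delta = 100, 0.1
# x + y lies in the Hamming ball of radius delta*n around w, so
# C(x + y | w) <= log2(ball_size(n, delta*n)) + O(log n) <= h(delta)*n + O(log n)
assert log2(ball_size(n, int(delta * n))) <= bin_entropy(delta) * n
```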

Remark 2.

There is another nice property of the technique based on the inequality (3): the bound holds true for protocols where the value f(x, y) is not embedded explicitly in the transcript of the protocol. As stated in Lemma 4.2, the bound applies to protocols where Alice and Bob can compute f(x, y) given the transcript together with their inputs x, respectively y (so Alice and Bob can find f(x, y), while an external observer who accesses only the transcript possibly cannot reconstruct the value of f(x, y)).

6 The information complexity of communication protocols

In the standard setting of information theory and Shannon entropy, there are two types of information complexity: internal and external. We focus on the first one, which has more applications, and we define the analogous concept in the framework of Kolmogorov complexity.

Definition 6.1.

The internal information cost of a protocol P for the input pair (x, y) is

I(x : t(x, y) | y, P) + I(y : t(x, y) | x, P).

The internal information cost is the amount of information each party learns about the other party’s input from the transcript of the protocol. Intuitively, it should not be more than the complexity of the transcript.

The next theorem concerns the case of nondeterministic protocols and shows that the internal information complexity is bounded by the sum of the complexity of the transcript and the logarithm of the thickness. This validates the intuition mentioned in the above paragraph, up to a logarithmic loss of precision, for the class of protocols with polynomial thickness, which includes the class of deterministic protocols (because such a protocol has thickness equal to 1). The theorem is the Kolmogorov complexity analog of a result of Braverman and Rao [BR11]. We note that the proof of Braverman and Rao cannot be adapted, because they consider only randomized protocols (so, without Merlin) and their proof works inductively on the number of rounds of the Alice/Bob interaction. In our setting, Merlin also contributes to the communication complexity and this component is not handled by the technique in [BR11], as we have explained in Remark 1.

Theorem 6.2.

For every protocol P and every input pair (x, y),

  • .

  • .


To keep the notation simple, in all the C(·) and I(·) terms below we omit the conditioning on P. Note that


In the third line we have used the fact that, given , can be computed from and . Similarly, .

Consequently, the internal information cost is equal to