The study of constraints on entropies is a central topic of research in information theory. In fact, more than 30 years ago, Pippenger asserted that constraints on entropies are the “laws of information theory” and asked whether the polymatroidal axioms form the complete laws of information theory, i.e., whether every constraint on entropies can be derived from the polymatroidal axioms. These axioms consist of the following three types of constraints: (1) $h(\emptyset) = 0$, (2) $h(X) \leq h(Y)$ for $X \subseteq Y$ (monotonicity), and (3) $h(X \cup Y) + h(X \cap Y) \leq h(X) + h(Y)$ (submodularity). It is known that the polymatroidal axioms are equivalent to Shannon’s basic inequalities, that is, to the non-negativity of the entropy, conditional entropy, mutual information, and conditional mutual information. In a celebrated result published in 1998, Zhang and Yeung answered Pippenger’s question negatively by finding a linear inequality that is satisfied by all entropic functions, but cannot be derived from the polymatroidal axioms.
Zhang and Yeung’s result became the catalyst for the discovery of other information laws that are not captured by the polymatroidal axioms (e.g., [25, 34]). In particular, we now know that there are more elaborate laws, such as conditional inequalities, or inequalities expressed using the max operator, which find equally important applications in a variety of areas. For example, implications between conditional independence statements of discrete random variables can be expressed as conditional information inequalities. As another example, we have recently shown that conjunctive query containment under bag semantics is at least as hard as checking information inequalities expressed using max. Despite the extensive research on various kinds of information inequalities, to the best of our knowledge nothing is known about the algorithmic aspects of the associated decision problem: checking whether a given information law is valid.
In this paper, we initiate a study of algorithmic problems that arise naturally in information theory, and establish several results. To this effect, we introduce a generalized form of information inequalities, which we call Boolean information constraints, consisting of Boolean combinations of linear information inequalities, and we define their associated decision problems. Since it is still an open problem whether linear information inequalities, the simplest kind of information laws, are decidable, we focus on placing these decision problems in the arithmetical hierarchy, also known as the Kleene–Mostowski hierarchy. The arithmetical hierarchy has been studied by mathematical logicians since the late 1940s; moreover, it directly influenced the introduction and study of the polynomial-time hierarchy by Stockmeyer. The first level of the arithmetical hierarchy consists of the collection of all recursively enumerable sets and the collection of the complements of all recursively enumerable sets. The higher levels $\Sigma^0_k$ and $\Pi^0_k$, $k \geq 2$, are defined using existential and universal quantification over the lower levels. We prove a number of results, including the following.
Checking the validity of a Boolean information constraint arising from a monotone Boolean formula (in particular, a max information inequality) is in $\Pi^0_1$, i.e., it is co-recursively enumerable (Theorem 4.1).
Checking the validity of a conditional information inequality whose antecedents have “slack” and are group-balanced is in (Corollary 4.2.2).
Checking the validity of a group-balanced max information inequality is Turing equivalent to checking the validity of an information inequality (Corollary 4.3).
While the decidability of linear information inequalities (the simplest kind considered in this paper) remains open, a separate important question is whether more complex Boolean information constraints are any harder. For example, some conditional inequalities and some max-inequalities can be proven from a simple linear inequality, hence they do not appear to be any harder. However, Kaced and Romashchenko proved that there exist conditional inequalities that are essentially conditional, which means that they do not follow from any single linear inequality. (We give an example in Equation (9).) We prove here that any conditional information inequality with slack is essentially unconditioned (Corollary 4.2.2; see also Equation (19)), and that any max-inequality also follows from a single linear inequality (Theorem 4.3).
A subtle complication involving these results is whether by “validity” it is meant that the given Boolean information constraint holds for the set of all entropic vectors over $n$ variables, denoted by $\Gamma^*_n$, or for its topological closure, denoted by $\overline{\Gamma^*_n}$. It is well known that these two spaces differ for all $n \geq 3$. With the exception of (1) above, which holds for both $\Gamma^*_n$ and $\overline{\Gamma^*_n}$, our results are only for $\overline{\Gamma^*_n}$. A problem of special interest is the implication between conditional independence statements of discrete random variables; it amounts to checking the $\Gamma^*_n$-validity of a tight conditional information inequality. It is known that this problem is not finitely axiomatizable, and its decidability remains open. Our result (2) above does not apply here because it is a statement about $\overline{\Gamma^*_n}$-validity. However, we prove that the implication problem for conditional independence statements is in (Theorem 4.2.1).
2 Background and Notations
Throughout this paper, vectors and tuples are denoted by bold-faced letters, and random variables are capitalized. We write $\mathbf{x} \cdot \mathbf{y}$ for the dot product of $\mathbf{x}$ and $\mathbf{y}$. For a given set $K \subseteq \mathbb{R}^N$: $K$ is convex if $\mathbf{x}, \mathbf{y} \in K$ and $\theta \in [0, 1]$ implies $\theta \mathbf{x} + (1 - \theta)\mathbf{y} \in K$; $K$ is called a cone if $\mathbf{x} \in K$ and $\theta \geq 0$ implies $\theta \mathbf{x} \in K$; the topological closure of $K$ is denoted by $\overline{K}$; and, finally, $K^* := \{\mathbf{y} : \mathbf{x} \cdot \mathbf{y} \geq 0 \text{ for all } \mathbf{x} \in K\}$ denotes the dual cone of $K$. It is known that $K^*$ is always a closed, convex cone. We provide more background in Appendix A.
For a random variable $X$ with a fixed finite domain $D$ and a probability mass function (pmf) $p$, its (binary) entropy is defined by
$$H(X) = - \sum_{x \in D} p(x) \log p(x). \qquad (1)$$
In this paper all logarithms are in base 2.
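For concreteness, the definition can be evaluated directly; a minimal Python sketch (our own illustration, not part of the paper’s formal development):

```python
import math

def entropy(pmf):
    """Binary entropy H(X) = -sum_x p(x) * log2 p(x); zero-probability
    outcomes contribute nothing (by the convention 0 * log 0 = 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A fair coin carries one bit of entropy; a deterministic outcome carries none.
entropy([0.5, 0.5])   # 1 bit
entropy([1.0])        # 0 bits
```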
Fix a joint distribution over $n$ finite random variables $X_1, \ldots, X_n$. For each $\alpha \subseteq [n]$, let $X_\alpha$ denote the random (vector-valued) variable $(X_i)_{i \in \alpha}$. Define the set function $h : 2^{[n]} \to \mathbb{R}$ by setting $h(\alpha) := H(X_\alpha)$, for all $\alpha \subseteq [n]$. With some abuse, we blur the distinction between the set $\alpha$ and the set of variables $\{X_i : i \in \alpha\}$, and write $h(\alpha)$, $h(X_\alpha)$, or $h(\{X_i : i \in \alpha\})$ interchangeably. We call the function $h$ an entropic function, and also identify it with a vector $h \in \mathbb{R}^{2^n}$, which is called an entropic vector. Note that most texts and papers on this topic drop the component $h(\emptyset)$, which is always 0, leading to entropic vectors in $\mathbb{R}^{2^n - 1}$. We prefer to keep the $\emptyset$-coordinate to simplify notations. The implicit assumption $h(\emptyset) = 0$ is used through the rest of the paper.
The set of entropic functions/vectors is denoted by $\Gamma^*_n$. Its topological closure, denoted by $\overline{\Gamma^*_n}$, is the set of almost entropic vectors (or functions). It is known that $\Gamma^*_n \neq \overline{\Gamma^*_n}$ for $n \geq 3$. In general, $\Gamma^*_n$ is neither a cone nor convex, but its topological closure $\overline{\Gamma^*_n}$ is a closed convex cone.
Every entropic function $h$ satisfies the following basic Shannon inequalities:
$$h(Y) \geq h(X) \ \text{ for } X \subseteq Y, \qquad\qquad h(X) + h(Y) \geq h(X \cup Y) + h(X \cap Y),$$
called monotonicity and submodularity respectively. Any inequality obtained by taking a positive linear combination of Shannon inequalities is called a Shannon-type inequality.
Throughout this paper we will abbreviate the union $X \cup Y$ of two sets of variables as $XY$. The quantities $h(Y \mid X) := h(XY) - h(X)$ and $I(X; Y \mid Z) := h(XZ) + h(YZ) - h(XYZ) - h(Z)$ are called the conditional entropy and the conditional mutual information, respectively. It can be easily checked that $h(Y \mid X) \geq 0$ and $I(X; Y \mid Z) \geq 0$ are Shannon-type inequalities.
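Both quantities are straightforward to evaluate numerically, and the two Shannon-type inequalities can be sanity-checked on a random joint distribution; a self-contained sketch (the helper `H` and the random pmf are our own illustration):

```python
import itertools, math, random

def H(pmf, alpha):
    """Entropy of the marginal distribution on the coordinates in alpha."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in alpha)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# A random joint pmf over three binary variables X, Y, Z = coordinates 0, 1, 2.
random.seed(0)
outcomes = list(itertools.product((0, 1), repeat=3))
w = [random.random() for _ in outcomes]
pmf = {o: x / sum(w) for o, x in zip(outcomes, w)}

h_Y_given_X = H(pmf, (0, 1)) - H(pmf, (0,))            # h(Y|X)
I_XY_given_Z = (H(pmf, (0, 2)) + H(pmf, (1, 2))
                - H(pmf, (0, 1, 2)) - H(pmf, (2,)))    # I(X;Y|Z)
assert h_Y_given_X >= -1e-12 and I_XY_given_Z >= -1e-12
```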
The established notation [48, 51, 11] for the set of entropic vectors, $\Gamma^*_n$, is unfortunate, because the star in this context does not represent the dual cone. We will continue to denote by $\Gamma^*_n$ the set of entropic vectors (which is not a cone!), and use explicit parentheses, as in $(\Gamma^*_n)^*$, to represent the dual cone.
3 Boolean Information Constraints
Most of this paper considers the following problem: given a Boolean combination of information inequalities, check whether it is valid. However in Section 5 we briefly discuss the dual problem, namely, recognizing whether a given vector is an entropic vector (or an almost entropic vector).
A Boolean function is a function $f : \{0, 1\}^k \to \{0, 1\}$. We often denote its inputs with variables $Z_1, \ldots, Z_k$, and write $f(Z_1, \ldots, Z_k)$ for the value of the Boolean function.
3.1 Problem Definition
A vector $\mathbf{c} \in \mathbb{R}^{2^n}$ defines the following (linear) information inequality:
$$\mathbf{c} \cdot h \geq 0. \qquad (2)$$
The information inequality is said to be valid if it holds for all vectors $h \in \Gamma^*_n$; equivalently, $\mathbf{c}$ is in the dual cone $(\Gamma^*_n)^*$. By continuity, an information inequality holds for all $h \in \Gamma^*_n$ iff it holds for all $h \in \overline{\Gamma^*_n}$. In 1986, Pippenger defined the “laws of information theory” as the set of all information inequalities, and asked whether all of them are Shannon-type inequalities. This was answered negatively by Zhang and Yeung in 1998. We know today that several applications require more elaborate laws, such as max-inequalities and conditional inequalities. Inspired by these new laws, we define the following generalization.
To each Boolean function $f$ with $k$ inputs, and every $k$ vectors $\mathbf{c}_1, \ldots, \mathbf{c}_k \in \mathbb{R}^{2^n}$, we associate the following Boolean information constraint:
$$f(\mathbf{c}_1 \cdot h \geq 0, \ \ldots, \ \mathbf{c}_k \cdot h \geq 0) = 1. \qquad (3)$$
For a set $K \subseteq \mathbb{R}^{2^n}$, a Boolean information constraint is said to be $K$-valid if it holds for all $h \in K$. Thus, we will distinguish between $\Gamma^*_n$-validity and $\overline{\Gamma^*_n}$-validity. Unlike in the case of linear information inequalities, these two notions of validity no longer coincide for arbitrary Boolean information constraints, as we explain in what follows.
Let $f$ be a Boolean function with $k$ inputs. The entropic Boolean information constraint problem parameterized by $f$, denoted by $\mathrm{EBIC}(f)$, is the following: given $k$ integer vectors $\mathbf{c}_1, \ldots, \mathbf{c}_k \in \mathbb{Z}^{2^n}$, where $n \geq 1$, check whether the constraint (3) holds for all entropic functions $h \in \Gamma^*_n$. In the almost-entropic version, denoted by $\mathrm{AEBIC}(f)$, we replace $\Gamma^*_n$ by $\overline{\Gamma^*_n}$.
The inputs $\mathbf{c}_1, \ldots, \mathbf{c}_k$ to these problems are required to be integer vectors in order for $\mathrm{EBIC}(f)$ and $\mathrm{AEBIC}(f)$ to be meaningful computational problems. Equivalently, one can require the inputs to be rational vectors.
Let $f$ be a Boolean function. Then $f$ can be written as a conjunction of clauses, $f = C_1 \wedge \cdots \wedge C_m$, where each clause is a disjunction of literals. Equivalently, a clause has this form:
$$\neg Z_1 \vee \cdots \vee \neg Z_p \vee Z_{p+1} \vee \cdots \vee Z_{p+q}, \qquad (4)$$
where $Z_1, \ldots, Z_{p+q}$ are distinct Boolean variables. Checking $\mathrm{EBIC}(f)$ is equivalent to checking $\mathrm{EBIC}(C_j)$ for each clause $C_j$ of $f$ (and similarly for $\mathrm{AEBIC}(f)$); therefore, without loss of generality, we will assume in the rest of the paper that $f$ consists of a single clause (4), and study the problem along these dimensions:
Conditional and Unconditional Constraints. When $p = 0$ (i.e., when the antecedent is empty), the formula (4) is monotone, and we call the corresponding Boolean information constraint unconditional. When $p > 0$, the formula is non-monotone, and we call the corresponding constraint conditional.
Simple and Max Constraints. When $p = 0$ and $q = 1$, we say that (4) defines a simple inequality; when $p = 0$ and $q > 1$, we say that (4) defines a max-inequality. The case $q = 0$ is not interesting, because the constraint is not valid: the zero vector $h = \mathbf{0}$ satisfies every antecedent and therefore violates the constraint.
3.2 Examples and Applications
This section presents examples and applications of Boolean information constraints and their associated decision problems. A summary of the notations is in Fig. 1.
3.2.1 Information Inequalities
We start with the simplest form of a Boolean information constraint, namely the linear information inequality in Eq. (2), which arises from the single-variable Boolean formula $f(Z_1) = Z_1$. We will call the corresponding decision problem the information-inequality problem, denoted by IIP: given a vector of integers $\mathbf{c} \in \mathbb{Z}^{2^n}$, check whether Eq. (2) is $\Gamma^*_n$-valid or, equivalently, $\overline{\Gamma^*_n}$-valid. Pippenger’s question from 1986 was essentially a question about decidability. Shannon-type inequalities are decidable in exponential time using linear programming methods, and software packages have been developed for this purpose [47, Chapter 13] (it is not known, however, whether there is a matching lower bound on the complexity of this problem). Thus, if every information inequality were a Shannon-type inequality, then information inequalities would be decidable. However, Zhang and Yeung gave the first example of a non-Shannon-type information inequality. Later, Matúš proved that, already for $n = 4$ variables, there exist infinitely many inequivalent non-Shannon entropic inequalities. More precisely, he proved that the following is a non-Shannon inequality, for every $s \geq 1$:

This ruled out any hope of proving the decidability of information inequalities by listing a finite set of axioms. To date, the study of non-Shannon-type inequalities is an active area of research [50, 31, 49], and the question whether IIP is decidable remains open.
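Although no decision procedure for IIP is known, the refutation direction is easy to automate: sample random distributions and look for a counterexample. A small self-contained sketch (the dictionary encoding of an inequality’s coefficients is our own convention; failing to find a counterexample of course proves nothing):

```python
import itertools, math, random

def entropies(pmf, n):
    """Map each subset of {0,...,n-1} to the entropy of its marginal."""
    h = {}
    for r in range(n + 1):
        for a in itertools.combinations(range(n), r):
            marg = {}
            for o, p in pmf.items():
                k = tuple(o[i] for i in a)
                marg[k] = marg.get(k, 0.0) + p
            h[a] = -sum(p * math.log2(p) for p in marg.values() if p > 0)
    return h

def try_falsify(coeffs, n, domain=2, trials=1000, seed=1):
    """coeffs: dict subset-tuple -> coefficient c_alpha, representing the
    inequality sum_alpha c_alpha * h(alpha) >= 0.  Returns a counterexample
    pmf, or None if none was found among the sampled distributions."""
    rng = random.Random(seed)
    outs = list(itertools.product(range(domain), repeat=n))
    for _ in range(trials):
        w = [rng.random() for _ in outs]
        s = sum(w)
        pmf = {o: x / s for o, x in zip(outs, w)}
        h = entropies(pmf, n)
        if sum(c * h[a] for a, c in coeffs.items()) < -1e-9:
            return pmf
    return None

# The valid inequality I(X;Y) = h(X) + h(Y) - h(XY) >= 0 survives all trials...
assert try_falsify({(0,): 1, (1,): 1, (0, 1): -1}, 2) is None
# ...while the false claim h(XY) - h(X) - h(Y) >= 0 is refuted quickly.
assert try_falsify({(0, 1): 1, (0,): -1, (1,): -1}, 2) is not None
```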
3.2.2 Max Information Inequalities
Next, we consider constraints defined by a disjunction of linear inequalities, in other words $\mathbf{c}_1 \cdot h \geq 0 \vee \cdots \vee \mathbf{c}_q \cdot h \geq 0$, where $q > 1$. This is equivalent to:
$$\max(\mathbf{c}_1 \cdot h, \ \ldots, \ \mathbf{c}_q \cdot h) \geq 0,$$
and, for that reason, we call them max information inequalities and denote the corresponding decision problem by MaxIIP. As before, $\Gamma^*_n$-validity and $\overline{\Gamma^*_n}$-validity coincide.
Application to Constraint Satisfaction and Database Theory. Given two finite structures $A$ and $B$, we write $\hom(A, B)$ for the set of homomorphisms from $A$ to $B$. We say that $A$ dominates the structure $B$, denoted by $A \succeq B$, if for every finite structure $C$ we have $|\hom(A, C)| \geq |\hom(B, C)|$. The homomorphism domination problem asks whether $A \succeq B$, given $A$ and $B$. In database theory this problem is known as the query containment problem under bag semantics. In that setting, we are given two Boolean conjunctive queries $Q_1, Q_2$, which we interpret under bag semantics, i.e., given a database $D$, the answer $Q_i(D)$ is the number of homomorphisms from $Q_i$ to $D$. $Q_1$ is contained in $Q_2$ under bag semantics if $Q_1(D) \leq Q_2(D)$ for every database $D$. It is open whether the homomorphism domination problem is decidable.
Kopparty and Rossman described a MaxIIP instance whose validity yields a sufficient condition for homomorphism domination. In recent work we proved that, for acyclic structures, that condition is also necessary and, moreover, that the homomorphism domination problem restricted to acyclic structures is Turing-equivalent to MaxIIP. Hence, any result on the complexity of MaxIIP immediately carries over to the homomorphism domination problem for acyclic structures, and vice versa.
We illustrate here Kopparty and Rossman’s MaxIIP condition on a simple example. Consider the following two Boolean conjunctive queries: $Q_1 = \exists x \exists y \exists z\, (E(x, y) \wedge E(y, z) \wedge E(x, z))$ and $Q_2 = \exists x \exists y \exists z\, (E(x, y) \wedge E(x, z))$; interpreted under bag semantics, $Q_1$ returns the number of triangles and $Q_2$ the number of V-shaped subgraphs. Kopparty and Rossman proved that the containment of $Q_1$ in $Q_2$ follows from the following max-inequality:
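Both counts in this example can be computed by brute force, and the domination checked on small graphs; a naive sketch (exponential in the number of pattern variables, for illustration only):

```python
import itertools

def hom_count(pattern, edges):
    """Count homomorphisms of a pattern (a list of directed edge pairs over
    named variables) into a graph given as a set of directed edges."""
    nodes = {u for e in edges for u in e}
    vars_ = sorted({v for e in pattern for v in e})
    count = 0
    for assign in itertools.product(nodes, repeat=len(vars_)):
        m = dict(zip(vars_, assign))
        if all((m[a], m[b]) in edges for a, b in pattern):
            count += 1
    return count

# Symmetric closure of a small graph: a triangle {0,1,2} plus a pendant edge.
base = {(0, 1), (1, 2), (0, 2), (2, 3)}
edges = base | {(b, a) for a, b in base}

triangle = [("x", "y"), ("y", "z"), ("x", "z")]
vshape   = [("x", "y"), ("x", "z")]
# Number of triangle homomorphisms never exceeds the number of V-shapes.
assert hom_count(triangle, edges) <= hom_count(vshape, edges)
```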
3.2.3 Conditional Information Inequalities
A conditional information inequality has the form:
$$\bigl(\mathbf{c}_1 \cdot h \geq 0 \ \wedge\ \cdots\ \wedge\ \mathbf{c}_p \cdot h \geq 0\bigr) \ \Rightarrow\ \mathbf{c} \cdot h \geq 0. \qquad (8)$$
Here we need to distinguish between $\Gamma^*_n$-validity and $\overline{\Gamma^*_n}$-validity, and we denote by ECIIP and AECIIP the corresponding decision problems. Notice that, without loss of generality, we can allow equalities in the antecedent, because $\mathbf{c}_i \cdot h = 0$ is equivalent to $\mathbf{c}_i \cdot h \geq 0 \wedge (-\mathbf{c}_i) \cdot h \geq 0$.
Suppose that there exist $\lambda_1, \ldots, \lambda_p \geq 0$ such that the inequality $\mathbf{c} \cdot h - \sum_i \lambda_i\, \mathbf{c}_i \cdot h \geq 0$ is valid; then Eq. (8) is, obviously, also valid. Kaced and Romashchenko called Eq. (8) an essentially conditioned inequality if no such $\lambda_i$’s exist, and discovered several valid conditional inequalities that are essentially conditioned.
Application to Conditional Independence. Fix three sets of random variables $A, B, C$. A conditional independence (CI) statement is a statement of the form $A \perp B \mid C$, and it asserts that $A$ and $B$ are independent conditioned on $C$. A CI implication is a statement $\sigma_1 \wedge \cdots \wedge \sigma_p \Rightarrow \sigma$, where $\sigma_1, \ldots, \sigma_p, \sigma$ are CI statements. The CI implication problem is: given an implication, check whether it is valid for all discrete probability distributions. Since $A \perp B \mid C$ holds iff $I(A; B \mid C) = 0$, the CI implication problem is a special case of ECIIP.
The CI implication problem has been studied extensively in the literature [30, 44, 18, 27]. Pearl and Paz gave a sound, but incomplete, set of graphoid axioms; Studený proved that no finite axiomatization exists; and Geiger and Pearl gave a complete axiomatization for two restricted classes, called saturated and marginal CIs. See [16, 21, 38] for some recent work on the CI implication problem. The decidability of the CI implication problem remains open to date.
While a CI implication problem is an instance of an entropic conditional inequality, one can also consider the question whether a CI implication statement holds for all almost entropic functions; for example the implication (9) holds for all almost entropic functions. Kaced and Romashchenko  proved that these two problems differ, by giving examples of CI implications that hold for all entropic functions but fail for almost entropic functions.
3.2.4 Group-Theoretic Inequalities
There turns out to be a way to “rephrase” IIP as a decision problem in group theory; this is a beautiful result by Chan and Yeung (see also ). A tuple $(G; G_1, \ldots, G_n)$ is called a group system if $G$ is a finite group and $G_1, \ldots, G_n$ are subgroups of $G$. For any $\alpha \subseteq [n]$, define $G_\alpha := \bigcap_{i \in \alpha} G_i$; implicitly, we set $G_\emptyset := G$. A vector $\mathbf{c} \in \mathbb{R}^{2^n}$ defines the following group-theoretic inequality:
$$\sum_{\alpha \subseteq [n]} c_\alpha \log \frac{|G|}{|G_\alpha|} \ \geq\ 0. \qquad (10)$$
In particular, a positive or negative answer to the decidability question for IIP immediately carries over to the validity problem for group-theoretic inequalities of the form (10). We note that the group-theoretic inequalities considered here are different from word problems in groups (see, e.g., the survey ); the undecidability results for word problems in groups do not carry over to these group-theoretic inequalities nor, therefore, to information inequalities.
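The correspondence is easy to explore computationally: a group system induces the vector $h(\alpha) = \log(|G| / |G_\alpha|)$, which is in fact entropic (take a uniformly random element of $G$ and let $X_i$ be its coset of $G_i$). A sketch using the Klein four-group (our choice of example):

```python
import math
from itertools import combinations

def group_entropy(G, subgroups):
    """h(alpha) = log2(|G| / |intersection of G_i for i in alpha|),
    the vector induced by a group system (Chan-Yeung correspondence)."""
    n = len(subgroups)
    h = {}
    for r in range(n + 1):
        for alpha in combinations(range(n), r):
            inter = set(G)
            for i in alpha:
                inter &= set(subgroups[i])
            h[alpha] = math.log2(len(G) / len(inter))
    return h

# The Klein four-group Z2 x Z2 with its two "axis" subgroups.
G = [(a, b) for a in (0, 1) for b in (0, 1)]
G1 = [(0, 0), (1, 0)]
G2 = [(0, 0), (0, 1)]
h = group_entropy(G, [G1, G2])
# Submodularity h(1) + h(2) >= h(12) + h(empty) holds, as for entropic vectors.
assert h[(0,)] + h[(1,)] >= h[(0, 1)] + h[()]
```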
3.2.5 Application to Relational Query Evaluation
The problem of bounding the number of copies of a graph inside another graph has a long and interesting history [17, 4, 14, 36]. The subgraph homomorphism problem is a special case of the relational query evaluation problem, in which we want an upper bound on the output size of a full conjunctive query. Using the entropy argument from , Shearer’s lemma in particular, Atserias, Grohe, and Marx established a tight upper bound on the output size of a full conjunctive query over a database. Note that Shearer’s lemma is a Shannon-type inequality. Their result was extended to functional dependencies and, more generally, to degree constraints in a series of recent works in database theory [19, 2, 3]. All these results can be cast as applications of Shannon-type inequalities. For a simple example, let $R, S, T$ be three binary relations (tables), each with $N$ tuples; then their join can be as large as tuples. However, if we further know that the functional dependencies and hold in the output, then one can prove that the output size is , by using the following Shannon-type information inequality:
While the tight upper bound of any conjunctive query can be proven using only Shannon-type inequalities, this no longer holds when the relations used in the query are constrained to satisfy functional dependencies. In that case, the tight upper bound can always be obtained from an information inequality, but Abo Khamis et al.  gave an example of a conjunctive query for which the tight upper bound requires a non-Shannon inequality.
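As an illustration of such bounds, the classical triangle query $R(X,Y) \bowtie S(Y,Z) \bowtie T(Z,X)$ satisfies the AGM/Shearer bound $|\text{output}| \leq \sqrt{|R| \cdot |S| \cdot |T|}$, which follows from the Shannon-type inequality $h(XY) + h(YZ) + h(XZ) \geq 2\,h(XYZ)$. A quick empirical check (the triangle query is a standard illustration, not necessarily the example elided above):

```python
import random

def triangle_join(R, S, T):
    """All (x, y, z) with R(x,y), S(y,z), T(z,x): a full conjunctive query."""
    S_by_y = {}
    for y, z in S:
        S_by_y.setdefault(y, []).append(z)
    out = []
    for x, y in R:
        for z in S_by_y.get(y, []):
            if (z, x) in T:
                out.append((x, y, z))
    return out

random.seed(7)
dom = range(8)
R = {(random.choice(dom), random.choice(dom)) for _ in range(20)}
S = {(random.choice(dom), random.choice(dom)) for _ in range(20)}
T = {(random.choice(dom), random.choice(dom)) for _ in range(20)}
# AGM / Shearer bound for the triangle query:
assert len(triangle_join(R, S, T)) <= (len(R) * len(S) * len(T)) ** 0.5
```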
3.2.6 Application to Secret Sharing
An interesting application of conditional information inequalities is secret sharing, a classic problem in cryptography, independently introduced by Shamir and Blakley. The setup is as follows. There is a set $P$ of participants, a dealer $d \notin P$, and an access structure $\mathcal{A} \subseteq 2^P$. The access structure is closed under taking supersets: $A \in \mathcal{A}$ and $A \subseteq B$ implies $B \in \mathcal{A}$. The dealer has a secret $s$, from some finite set $S$, which she would like to share in such a way that every set of participants $A \in \mathcal{A}$ can recover the secret $s$, but every set $A \notin \mathcal{A}$ knows nothing about $s$. The dealer shares her secret by using a secret sharing scheme, in which she gives each participant $i$ a share $s_i \in S_i$, where $S_i$ is some finite domain. The scheme is designed in such a way that from the tuple $(s_i)_{i \in A}$ one can recover $s$ if $A \in \mathcal{A}$, and conversely one cannot infer any information about $s$ if $A \notin \mathcal{A}$.
One way to formalize secret sharing uses information theory (for other formalisms, see ). We identify the participants with the set $[n] = \{1, \ldots, n\}$, and the dealer with the number $n + 1$. A secret sharing scheme on $[n]$ with access structure $\mathcal{A}$ is a joint distribution on discrete random variables $X_1, \ldots, X_n, X_{n+1}$ satisfying:
(i) $H(X_{n+1}) > 0$;
(ii) $H(X_{n+1} \mid X_A) = 0$ if $A \in \mathcal{A}$;
(iii) $I(X_{n+1}; X_A) = 0$ if $A \notin \mathcal{A}$; equivalently, $H(X_{n+1} \mid X_A) = H(X_{n+1})$.
Intuitively, $X_i$ denotes the share given to the $i$th participant, and $X_{n+1}$ is the unknown secret. It can be shown that, without loss of generality, condition (i) can be replaced by the assumption that the marginal distribution on $X_{n+1}$ is uniform, which encodes the fact that the scheme does not reveal any information about the secret $s$. Condition (ii) means one can recover the secret from the shares of qualified participants, while condition (iii) guarantees the complete opposite. A key challenge in designing a good secret sharing scheme is to reduce the total size of the shares. The only known [15, 10, 26] way to prove a lower bound on share sizes is to lower bound the information ratio $\max_{i \in [n]} H(X_i) / H(X_{n+1})$. In order to prove that some number $\sigma$ is a lower bound on the information ratio, we need to check that $\max_{i \in [n]} H(X_i) \geq \sigma \cdot H(X_{n+1})$ holds for all entropic functions satisfying the extra conditions (i), (ii), and (iii) above. Equivalently, $\sigma$ is a lower bound on the information ratio if and only if the following Boolean information constraint is $\Gamma^*_{n+1}$-valid:
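The conditions on qualified and unqualified sets can be verified exactly on the simplest scheme, 2-out-of-2 XOR sharing of a one-bit secret (a standard construction, used here only as a sanity check):

```python
import math

# 2-out-of-2 XOR sharing of a uniform one-bit secret: share1 is a uniform
# bit and share2 = secret XOR share1.  Coordinates: (share1, share2, secret).
pmf = {}
for secret in (0, 1):
    for s1 in (0, 1):
        pmf[(s1, secret ^ s1, secret)] = 0.25

def H(alpha):
    """Entropy of the marginal on the coordinate set alpha."""
    marg = {}
    for o, p in pmf.items():
        k = tuple(o[i] for i in alpha)
        marg[k] = marg.get(k, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Qualified set (both shares): H(secret | share1, share2) = 0.
assert abs(H((0, 1, 2)) - H((0, 1))) < 1e-9
# Unqualified set (one share): I(secret; share1) = 0.
assert abs(H((0,)) + H((2,)) - H((0, 2))) < 1e-9
```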
4 Placing EBIC and AEBIC in the Arithmetical Hierarchy
What is the complexity of $\mathrm{EBIC}(f)$ and $\mathrm{AEBIC}(f)$? Are they even decidable? As we have seen, there are numerous applications of the Boolean information constraint problem, hence any positive or negative answer, even for special cases, would shed light on these applications. While their (un)decidability is currently open, in this paper we provide several upper bounds on their complexity by placing them in the arithmetical hierarchy.
We briefly review some concepts from computability theory. In this setting it is standard to assume that objects are encoded as natural numbers. A set $A \subseteq \mathbb{N}^m$, for $m \geq 1$, is Turing computable, or decidable, if there exists a Turing machine that, given $\mathbf{x} \in \mathbb{N}^m$, decides whether $\mathbf{x} \in A$. A set $A$ is Turing reducible to $B$ if there exists a Turing machine with an oracle for $B$ that can decide membership in $A$. The arithmetical hierarchy consists of the classes of sets $\Sigma^0_k$ and $\Pi^0_k$, $k \geq 0$, defined as follows. The class $\Sigma^0_k$ consists of all sets of the form $\{\mathbf{x} : \exists y_1 \forall y_2 \cdots Q y_k\ R(\mathbf{x}, y_1, \ldots, y_k)\}$, where $R$ is an $(m + k)$-ary decidable predicate, $Q$ is $\exists$ if $k$ is odd, and $\forall$ if $k$ is even. In a dual manner, the class $\Pi^0_k$ consists of sets of the form $\{\mathbf{x} : \forall y_1 \exists y_2 \cdots Q y_k\ R(\mathbf{x}, y_1, \ldots, y_k)\}$. Then $\Sigma^0_0 = \Pi^0_0$ are the decidable sets, while $\Sigma^0_1$ consists of the recursively enumerable sets, and $\Pi^0_1$ consists of the co-recursively enumerable sets. It is known that these classes are closed under union and intersection, but not under complement, and that they form a strict hierarchy: $\Sigma^0_k \cup \Pi^0_k \subsetneq \Sigma^0_{k+1} \cap \Pi^0_{k+1}$ for all $k$. For more background, we refer to . Our goal is to place the problems $\mathrm{EBIC}(f)$, $\mathrm{AEBIC}(f)$, and their variants in concrete levels of the arithmetical hierarchy.
4.1 Unconditional Boolean Information Constraints
We start by discussing unconditional Boolean information constraints, or, equivalently, a Boolean information constraint defined by a monotone Boolean formula . The results here are rather simple; we include them only as a warmup for the less obvious results in later sections. Based on our discussion in Sections 3.2.1 and 3.2.2, we have the following result.
If $f$ is monotone, then $\mathrm{EBIC}(f)$ and $\mathrm{AEBIC}(f)$ are equivalent problems.
Next, we prove that these problems are co-recursively enumerable, by using the following folklore fact. A representable set of random variables is a finite relation $R$ with $m$ rows and columns $X_1, \ldots, X_n, P$, where column $P$ contains rational probabilities that sum to 1. Thus, $R$ defines $n$ random variables with finite domains and a probability mass function given by rational numbers. We denote by $h_R$ its entropic vector. By continuity of Eq. (1), we obtain:
For every entropic vector $h \in \Gamma^*_n$ and every $\varepsilon > 0$, there exists a representable set of random variables $R$ such that $\|h - h_R\|_\infty < \varepsilon$.
The group-characterization proven by Chan and Yeung  implies a much stronger version of the proposition; we do not need that stronger version in this paper.
Let $f$ be a monotone Boolean formula. Then $\mathrm{EBIC}(f)$ (and, hence, $\mathrm{AEBIC}(f)$) is in $\Pi^0_1$, i.e., it is co-recursively enumerable.
Fix $f$ and $\mathbf{c}_1, \ldots, \mathbf{c}_k \in \mathbb{Z}^{2^n}$. We need to check:
$$\forall h \in \Gamma^*_n: \quad f(\mathbf{c}_1 \cdot h \geq 0, \ \ldots, \ \mathbf{c}_k \cdot h \geq 0) = 1. \qquad (12)$$
We claim that (12) is equivalent to:
$$\text{for all representable } R: \quad f(\mathbf{c}_1 \cdot h_R \geq 0, \ \ldots, \ \mathbf{c}_k \cdot h_R \geq 0) = 1. \qquad (13)$$
Obviously (12) implies (13), and the opposite direction follows from Prop. 4.1: if (12) fails on some entropic vector $h$, then it also fails on any representable $R$ with $h_R$ sufficiently close to $h$, because conditions with $\mathbf{c}_i \cdot h < 0$ remain false under a small perturbation, while conditions with $\mathbf{c}_i \cdot h = 0$ can only flip from true to false, which, by monotonicity of $f$, keeps the value of $f$ at 0. Finally, (13) is in $\Pi^0_1$ because the property after “for all representable $R$” is decidable: expanding the definition of entropy (1) in each condition $\mathbf{c}_i \cdot h_R \geq 0$ rewrites it as $\sum_j a_j \log b_j \geq 0$, or, equivalently, $\prod_j b_j^{a_j} \geq 1$, where the $a_j, b_j$ are rational numbers, which is decidable. ∎
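The decidability claim used in the proof can be made concrete: a condition of the form $\sum_j a_j \log b_j \geq 0$, with rational $a_j$ and positive rational $b_j$, is equivalent to $\prod_j b_j^{a_j} \geq 1$, which reduces to an exact integer comparison after clearing denominators in the exponents. A Python sketch:

```python
import math
from fractions import Fraction

def log_combo_nonneg(terms):
    """Decide exactly whether sum_j a_j * log(b_j) >= 0, where each a_j is a
    rational number (any sign) and each b_j a positive rational.  The
    condition is equivalent to prod_j b_j**a_j >= 1; clearing the common
    denominator L of the exponents reduces it to comparing two integers."""
    terms = [(Fraction(a), Fraction(b)) for a, b in terms]
    L = 1
    for a, _ in terms:
        L = L * a.denominator // math.gcd(L, a.denominator)
    num, den = 1, 1  # running product, kept as the exact pair num/den
    for a, b in terms:
        e = int(a * L)
        p, q = (b.numerator, b.denominator) if e >= 0 else (b.denominator, b.numerator)
        num *= p ** abs(e)
        den *= q ** abs(e)
    return num >= den

assert log_combo_nonneg([(1, 4), (-2, 2)])      # log 4 - 2 log 2 = 0 >= 0
assert not log_combo_nonneg([(1, 2), (-1, 3)])  # log 2 - log 3 < 0
```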
4.2 Conditional Boolean Information Constraints
We now consider non-monotone Boolean functions, in other words, conditional information constraints (8). Since $\Gamma^*_n$-validity and $\overline{\Gamma^*_n}$-validity no longer coincide, we study $\mathrm{EBIC}$ and $\mathrm{AEBIC}$ separately. The results here are non-trivial, and some proofs are included in the Appendix.
4.2.1 The Entropic Case
Our result for $\mathrm{EBIC}$ is restricted to the CI implication problem. Recall from Sec. 3.2.3 that this problem consists of checking whether an implication between statements of the form $A \perp B \mid C$ holds for all random variables with finite domains, and that this is equivalent to checking whether a certain conditional inequality holds for all entropic functions. We prove that this problem is in $\Pi^0_1$ by using Tarski’s theorem on the decidability of the first-order theory of the reals with $+$ and $\times$.
The CI implication problem (Section 3.2.3) is in $\Pi^0_1$.
Tarski has proven that the first-order theory of the reals with $+$ and $\times$ is decidable. More precisely, given a sentence in first-order logic with the symbols $+, \times, 0, 1, \leq$, it is decidable whether that sentence is true in the model of the real numbers $(\mathbb{R}, +, \times, 0, 1, \leq)$; for example, it is decidable whether $\exists x\, (x^2 = 2 \wedge x \geq 0)$ is true (here $x^2$ is a shorthand for $x \cdot x$ and $2$ is a shorthand for $1 + 1$). We will write $\mathbb{R} \models \varphi$ to denote the fact that $\varphi$ is true in the model of the reals.
Consider a conditional inequality over a set of joint random variables $X_1, \ldots, X_n$, given by a CI implication $\sigma_1 \wedge \cdots \wedge \sigma_p \Rightarrow \sigma$, where $\sigma_1, \ldots, \sigma_p, \sigma$ are CI statements. The following algorithm returns false if the implication fails on some entropic function $h$, and runs forever if the implication holds for all $h \in \Gamma^*_n$, thereby proving that the problem is in $\Pi^0_1$:
Iterate over all $N = 1, 2, 3, \ldots$. For each $N$, do the following steps.
Consider joint random variables $X_1, \ldots, X_n$ where each $X_i$ has outcomes in the domain $\{1, \ldots, N\}$; thus there are $N^n$ possible outcomes. Let $p_1, \ldots, p_{N^n}$ be real variables representing the probabilities of these outcomes.
Construct a formula $\varphi$ stating “there exist probabilities for these outcomes whose entropy fails the conditional inequality”. More precisely, the formula is constructed as follows:
Convert each conditional independence statement $\sigma_i = (A \perp B \mid C)$ in the antecedent into its equivalent statement on probabilities: $p(AC)\, p(BC) = p(ABC)\, p(C)$.
Replace each such statement with a conjunction of statements of the form $p(ac)\, p(bc) = p(abc)\, p(c)$, for all combinations of values $a, b, c$. If $A, B, C$ have in total $m$ random variables, then there are $N^m$ combinations of values $a, b, c$; thus we create a conjunction of $N^m$ equality statements.
Each marginal probability is a sum of atomic probabilities; for example, $p(ac) = p_{i_1} + \cdots + p_{i_s}$, where $p_{i_1}, \ldots, p_{i_s}$ are the probabilities of all outcomes that have $A = a$ and $C = c$. Thus, each equality statement in the previous step becomes a polynomial equality over $p_1, \ldots, p_{N^n}$. There is one such formula for every combination of values $a, b, c$; denote by $\varphi_i$ the conjunction of all these formulas. Thus, $\varphi_i$ asserts $\sigma_i$.
Let $\varphi_{\text{ant}} = \varphi_1 \wedge \cdots \wedge \varphi_p$. Let $\varphi_{\text{con}}$ be the similar formula for the consequent: thus, $\varphi_{\text{con}}$ asserts $\sigma$.
Finally, construct the formula $\varphi = \exists p_1 \cdots \exists p_{N^n} \bigl( \bigwedge_j p_j \geq 0 \ \wedge\ \sum_j p_j = 1 \ \wedge\ \varphi_{\text{ant}} \ \wedge\ \neg \varphi_{\text{con}} \bigr)$.
Check whether $\mathbb{R} \models \varphi$. By Tarski’s theorem, this step is decidable.
If $\varphi$ is true, then return false; otherwise, continue with $N + 1$.
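While implementing Tarski’s decision procedure is beyond a short example, the polynomial encoding of a CI statement used in the steps above is easy to check exactly on a given rational-valued distribution; a sketch (the XOR distribution is our own example):

```python
from fractions import Fraction
from itertools import product

def marginal(pmf, idxs, vals):
    """Exact marginal probability P(X_idxs = vals)."""
    return sum(p for o, p in pmf.items()
               if all(o[i] == v for i, v in zip(idxs, vals)))

def ci_holds(pmf, A, B, C, domain):
    """Check A _||_ B | C via the polynomial identities
    p(a,c) * p(b,c) == p(a,b,c) * p(c), for all value combinations."""
    for a in product(domain, repeat=len(A)):
        for b in product(domain, repeat=len(B)):
            for c in product(domain, repeat=len(C)):
                lhs = marginal(pmf, A + C, a + c) * marginal(pmf, B + C, b + c)
                rhs = marginal(pmf, A + B + C, a + b + c) * marginal(pmf, C, c)
                if lhs != rhs:
                    return False
    return True

# X0, X1 independent fair bits and X2 = X0 XOR X1: any two variables are
# independent, yet X0 and X1 are dependent given X2.
pmf = {o: Fraction(0) for o in product((0, 1), repeat=3)}
for a in (0, 1):
    for b in (0, 1):
        pmf[(a, b, a ^ b)] = Fraction(1, 4)
assert ci_holds(pmf, (0,), (1,), (), (0, 1))        # X0 _||_ X1
assert not ci_holds(pmf, (0,), (1,), (2,), (0, 1))  # but not given X2
```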
Tarski’s exponential function problem
One may attempt to extend the proof above from the CI implication problem to arbitrary conditional inequalities (8). To check whether a conditional inequality is valid for all entropic functions, we can repeat the argument above: iterate over all domain sizes $N$, and check whether there exist probabilities that falsify the implication. The problem is that, in order to express the conditions $\mathbf{c}_i \cdot h \geq 0$, we need to express the vector $h$ in terms of the probabilities $p_1, \ldots, p_{N^n}$. To apply directly the definition of entropy in (1), we need the $\log$ function or, alternatively, the exponential function, and this takes us outside the scope of Tarski’s theorem. A major open problem in model theory, also originally formulated by Tarski, is whether decidability continues to hold if we augment the structure of the real numbers with the exponential function (see, e.g.,  for a discussion). Decidability of the first-order theory of the reals with exponentiation would easily imply that the entropic conditional information inequality problem ECIIP (not just the entropic conditional independence (CI) implication problem) is in $\Pi^0_1$, because every condition $\mathbf{c}_i \cdot h \geq 0$ can be expressed using $+$, $\times$, and the exponential function, by simply expanding the definition of entropy in Equation (1).
4.2.2 The Almost-Entropic Case
Suppose the antecedent of (8) includes the condition $\mathbf{c}_1 \cdot h \geq 0$. Call $\mathbf{c}_1$ tight if $\mathbf{c}_1 \cdot h \leq 0$ is $\overline{\Gamma^*_n}$-valid. When $\mathbf{c}_1$ is tight, we can rewrite the antecedent as $\mathbf{c}_1 \cdot h = 0$. If $\mathbf{c}_1$ is not tight, then there exists $h \in \overline{\Gamma^*_n}$ such that $\mathbf{c}_1 \cdot h > 0$; in that case we say that $\mathbf{c}_1$ has slack. For example, all conditions occurring in CI implications are tight, because they are of the form $-I(A; B \mid C) \geq 0$, more conveniently written $I(A; B \mid C) = 0$, while a condition like $h(X) \geq 0$ has slack. We extend the definition of slack to a set. We say that the set $\{\mathbf{c}_1, \ldots, \mathbf{c}_p\}$ has slack if there exists $h \in \overline{\Gamma^*_n}$ such that $\mathbf{c}_i \cdot h > 0$ for all $i$; notice that this is more restrictive than requiring each of $\mathbf{c}_1, \ldots, \mathbf{c}_p$ to have slack. We present below results on the complexity of $\mathrm{AECIIP}$ in two special cases: when all antecedents are tight, and when the set of antecedents has slack. Both results use the following theorem, which allows us to move one condition from the antecedent to the consequent:
The following statements are equivalent:
(14) The conditional inequality $(\mathbf{c}_1 \cdot h \geq 0 \wedge \cdots \wedge \mathbf{c}_p \cdot h \geq 0) \Rightarrow \mathbf{c} \cdot h \geq 0$ is $\overline{\Gamma^*_n}$-valid.
(15) For every $\varepsilon > 0$ there exists $\lambda \geq 0$ such that the conditional inequality $(\mathbf{c}_2 \cdot h \geq 0 \wedge \cdots \wedge \mathbf{c}_p \cdot h \geq 0) \Rightarrow \mathbf{c} \cdot h - \lambda\, \mathbf{c}_1 \cdot h + \varepsilon\, h(X_1 \cdots X_n) \geq 0$ is $\overline{\Gamma^*_n}$-valid.
Moreover, if the set $\{\mathbf{c}_1, \ldots, \mathbf{c}_p\}$ has slack, then one can set $\varepsilon = 0$ in Eq. (15).
We prove here only the implication from (15) to (14); the other direction is non-trivial and is proven in Appendix B using only the properties of closed convex cones. Assume condition (15) holds, and consider any $h \in \overline{\Gamma^*_n}$ such that $\mathbf{c}_i \cdot h \geq 0$ for $i = 1, \ldots, p$. We prove that $\mathbf{c} \cdot h \geq 0$. For any $\varepsilon > 0$, condition (15) states that there exists $\lambda \geq 0$ such that $\mathbf{c} \cdot h - \lambda\, \mathbf{c}_1 \cdot h + \varepsilon\, h(X_1 \cdots X_n) \geq 0$, and therefore $\mathbf{c} \cdot h \geq \lambda\, \mathbf{c}_1 \cdot h - \varepsilon\, h(X_1 \cdots X_n) \geq -\varepsilon\, h(X_1 \cdots X_n)$. Since $\varepsilon > 0$ is arbitrary, we conclude that $\mathbf{c} \cdot h \geq 0$, as required. ∎
By applying the theorem repeatedly, we can move all antecedents to the consequent:
Consider a conditional inequality (8). If all antecedents are tight, then the corresponding decision problem AECIIP is in
Recall that the implication problem for CI is a special case of a conditional inequality with tight antecedents. We have seen in Theorem 4.2.1 that the entropic version of the CI implication problem is in ; Corollary 4.2.2 proves that the almost entropic version is in .
Consider any conditional inequality (8) where the antecedents are tight. If this inequality holds for all almost entropic functions, then it can be proven by proving a family of (unconditional) inequalities (17). In fact, some conditional inequalities in the literature have been proven precisely in this way. For example, consider the CI implication (9) (Sec. 3.2.3), and replace each antecedent with . By Eq. (17), the following condition holds: such that
Antecedents Have Slack. Next, we consider the case when the set of antecedents has slack, which is a recursively enumerable condition. In that case, condition (16) is equivalent to:
$$\exists \lambda_1, \ldots, \lambda_p \geq 0: \quad \mathbf{c} \cdot h - \sum_{i=1}^{p} \lambda_i\, \mathbf{c}_i \cdot h \ \geq\ 0 \ \text{ is } \overline{\Gamma^*_n}\text{-valid}. \qquad (19)$$
In other words, we have proven the following result of independent interest: any conditional inequality with slack is essentially unconditioned. However, we cannot immediately use (19) to prove complexity bounds for $\mathrm{AECIIP}$, because the $\lambda_i$’s in (19) are not necessarily rational numbers. When we derived Eq. (17) we used the fact that the antecedents are tight, hence $\mathbf{c}_i \cdot h \leq 0$ for all almost entropic $h$, hence we could replace the $\lambda_i$’s with some natural number larger than all of them. But now, the sign of $\mathbf{c}_i \cdot h$ is unknown. We prove below that, under a restriction called group balance, the $\lambda_i$’s can be chosen in $\mathbb{Q}$, placing the decision problem in . Group balance generalizes Chan’s notion of a balanced inequality, which we review below. In Appendix C we give evidence that some restriction is necessary to ensure the $\lambda_i$’s are rational (Example C), and we also show that every conditional inequality can be strengthened to a group-balanced one (Prop. C).
A vector $h$ is called modular if $h(X \cup Y) + h(X \cap Y) = h(X) + h(Y)$ for all sets of variables $X, Y$. Every non-negative modular function is entropic, and is a non-negative linear combination of the basic modular functions $h^{(i)}$, $i \in [n]$, where $h^{(i)}(\alpha) = 1$ when $i \in \alpha$ and $h^{(i)}(\alpha) = 0$ otherwise. Chan called an inequality $\mathbf{c} \cdot h \geq 0$ balanced if $\mathbf{c} \cdot h^{(i)} = 0$ for every $i \in [n]$. He proved that any valid inequality can be strengthened to a balanced one. More precisely: $\mathbf{c} \cdot h \geq 0$ is valid iff $\mathbf{c} \cdot h^{(i)} \geq 0$ for all $i$ and $\mathbf{c} \cdot h - \sum_i (\mathbf{c} \cdot h^{(i)})\, h(X_i \mid X_{[n] \setminus \{i\}}) \geq 0$ is valid; notice that the latter inequality is balanced. For example, $I(X; Y) \geq 0$ is balanced, while $h(X) \geq 0$ is not balanced and (over two variables) can be strengthened to $I(X; Y) \geq 0$. We generalize Chan’s definition:
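Balancedness is a finite, mechanical check: evaluate the inequality’s coefficient vector against each basic modular function $h^{(i)}$. A sketch using our dictionary encoding of coefficient vectors:

```python
from itertools import combinations

def basic_modular(i, n):
    """The basic modular function h^(i): h^(i)(alpha) = 1 iff i in alpha."""
    return {alpha: (1 if i in alpha else 0)
            for r in range(n + 1) for alpha in combinations(range(n), r)}

def is_balanced(c, n):
    """c maps subset tuples to coefficients, encoding the inequality
    sum_alpha c_alpha * h(alpha) >= 0.  The inequality is balanced iff
    its value on every basic modular function h^(i) is zero (Chan)."""
    return all(sum(coef * basic_modular(i, n)[alpha]
                   for alpha, coef in c.items()) == 0
               for i in range(n))

# I(X;Y) = h(X) + h(Y) - h(XY) >= 0 is balanced ...
assert is_balanced({(0,): 1, (1,): 1, (0, 1): -1}, 2)
# ... while monotonicity h(XY) - h(X) >= 0 is not.
assert not is_balanced({(0, 1): 1, (0,): -1}, 2)
```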
Call a set group balanced if (a) where is the matrix , and (b) there exists a non-negative modular function such that for all .
If $p = 1$, then the set $\{\mathbf{c}_1\}$ is group balanced iff $\mathbf{c}_1$ is balanced, because the matrix has a single row, and its rank is 0 iff all entries are 0. We prove in Appendix C:
Consider a group-balanced set of vectors with rational coefficients, $\mathbf{c}_1, \ldots, \mathbf{c}_p \in \mathbb{Q}^{2^n}$. Suppose the following condition holds:
Then there exist rational $\lambda_1, \ldots, \lambda_p$ with this property.
This implies that, if the antecedents $\mathbf{c}_1, \ldots, \mathbf{c}_p$ have slack and the set is group balanced, then there exist rational $\lambda_i$’s for inequality (19). In particular:
Consider a conditional inequality (8). If the antecedents have slack and is group balanced, then the corresponding decision problem is in .
We end this section by illustrating with an example:
Consider the following conditional inequality: