How to Use Undiscovered Information Inequalities: Direct Applications of the Copy Lemma

01/22/2019 ∙ by Emirhan Gürpınar, et al.

We discuss linear programming techniques that help to deduce corollaries of non-classic inequalities for Shannon's entropy. We focus on direct applications of the Copy Lemma. These applications implicitly involve some (known or unknown) non-classic universal inequalities for Shannon's entropy, though we do not derive these inequalities explicitly. To reduce the computational complexity of these problems, we make extensive use of symmetry considerations. We present two examples of these techniques: we provide a reduced-size formal inference of the best known bound for the Ingleton score (originally proven by Dougherty et al. with explicitly derived non-Shannon-type inequalities), and we improve the lower bound for the optimal information ratio of the secret sharing scheme for an access structure based on the Vámos matroid.







I Introduction

We can associate with an $n$-tuple of jointly distributed random variables its entropy profile, which consists of the values of Shannon’s entropy of each sub-tuple of the given tuple. We say that a point in $\mathbb{R}^{2^n-1}$ is entropic if it represents the entropy profile of some distribution. The entropic points satisfy various information inequalities, the constraints that characterize the range of admissible entropies for jointly distributed variables. The best known and best understood information inequalities are the so-called Shannon-type inequalities, which are defined as non-negative linear combinations of several instances of the basic inequality $I(A : B \mid C) \ge 0$, where $A$, $B$, $C$ are any subsets (possibly empty) of the random variables.

In 1998, Z. Zhang and R. W. Yeung discovered the first example of an (unconditional) non-Shannon-type information inequality: a linear inequality for the entropies of a quadruple of random variables that cannot be represented as a combination of basic inequalities [4]. Many other non-Shannon-type information inequalities are now known (including several infinite families of non-Shannon-type information inequalities), see, e.g., [11, 6, 17].

Work on non-Shannon-type information inequalities focuses on the following fundamental question: can we prove some particular property of Shannon’s entropy (usually expressed as an equality or an inequality), assuming some specific constraints (which are also given as equalities and inequalities for Shannon’s entropy, including Shannon-type and non-Shannon-type universal information inequalities)? Not surprisingly, these arguments typically use linear programming: in the first stage, to derive new information inequalities (and to prove that they are indeed new), and then, in the second stage, to apply the derived inequalities to a specific problem (e.g., in secret sharing, network coding, and so on). Many applications of this type involve heavy computations: the number of information inequalities in $n$ random variables grows exponentially with $n$. In several problems progress seems to have stalled because the corresponding linear programs are too large for modern computers. So the question arises: can we reduce the computational complexity of the relevant linear programming problems? In this paper we show that in some cases new progress can be made with a reasonable combination of previously known techniques.
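To make the exponential growth concrete, the following minimal Python sketch (our illustration, not code from the paper) enumerates Yeung's elemental inequalities, $H(x_i \mid x_{[n]\setminus\{i\}}) \ge 0$ and $I(x_i : x_j \mid x_K) \ge 0$ for $i < j$ and $K \subseteq [n] \setminus \{i, j\}$, which are known to generate all Shannon-type inequalities. The tuple encoding and the function names are arbitrary choices made for this example.

```python
from itertools import combinations
from math import comb


def elemental_inequalities(n):
    """Enumerate the elemental Shannon inequalities for variables 0..n-1.

    ("H", i, rest) encodes H(x_i | x_rest) >= 0, where rest = all other indices;
    ("I", i, j, K) encodes I(x_i : x_j | x_K) >= 0, where K avoids i and j.
    """
    indices = range(n)
    result = []
    for i in indices:
        rest = frozenset(k for k in indices if k != i)
        result.append(("H", i, rest))
    for i, j in combinations(indices, 2):
        others = [k for k in indices if k not in (i, j)]
        for size in range(len(others) + 1):
            for K in combinations(others, size):
                result.append(("I", i, j, frozenset(K)))
    return result


def elemental_count(n):
    """Closed form n + C(n, 2) * 2^(n - 2), valid for n >= 2."""
    return n + comb(n, 2) * 2 ** (n - 2)


for n in (4, 8, 12):
    # number of entropy coordinates, enumerated count, closed-form count
    print(n, 2 ** n - 1, len(elemental_inequalities(n)), elemental_count(n))
```

For the twelve random variables of Section V this already gives 4,095 entropy coordinates and 67,596 elemental inequalities, before any Copy Lemma or problem-specific constraints are added.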

Recently, Farràs et al. [26] observed that the two stages of the scheme mentioned above (deriving new information inequalities and applying them) can be merged into a single linear programming problem: in a specific application of information inequalities, instead of explicitly enumerating all known non-Shannon information inequalities, we can include in the linear program the constraints from which those inequalities are (or can be) derived. The constraints used in [26] were instances of the Ahlswede–Körner lemma [7]. As a tool for inferring non-Shannon-type inequalities, the Ahlswede–Körner lemma is essentially equivalent to the Copy Lemma from [17] that “clones” a single random variable, see [20]. In this paper we use a technique similar to [26], but we employ the constraints obtained from the Copy Lemma. This allows us to extend the technique: we apply the Copy Lemma to make “clones” of pairs of jointly distributed random variables (instead of the copies of individual random variables used in [26]).

To reduce the dimension of the corresponding linear programming problem, we use the symmetries of the problems under consideration. Our use of symmetries is similar to the ideas proposed in [24], where the authors suggested using symmetries to reduce the complexity of computing outer bounds on network coding capacity. For a study of the general problem of Shannon and non-Shannon-type inequalities for symmetric entropic points, we refer the reader to [25].

To illustrate the power of this technique, we discuss two examples. In the first, we give a formal inference of the best known bound for the Ingleton score (originally proven in [17]) with only four auxiliary random variables and three applications of the Copy Lemma. The resulting computer-assisted proof is very fast, and its “formally checkable” version reduces to a combination of 167 equalities and inequalities with rational coefficients. In our second example, we improve the known lower bound for the optimal information ratio of the secret sharing schemes associated with the Vámos matroid (for the access structure denoted ).

Though the techniques used in this paper are rather simple, their combination proves to be fruitful. We believe that this approach can be used in other problems of information theory.


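We denote $[n] = \{1, \ldots, n\}$. Given a tuple of jointly distributed random variables $(x_1, \ldots, x_n)$ and a set of indices $W = \{i_1, \ldots, i_k\} \subseteq [n]$, we denote by $H(x_W)$ the Shannon entropy $H(x_{i_1}, \ldots, x_{i_k})$. We fix an arbitrary order on the non-empty subsets of the set of indices $[n]$ and assign to each distribution $X = (x_1, \ldots, x_n)$ its entropy profile $\vec{H}(X) = \bigl(H(x_W)\bigr)_{\emptyset \ne W \subseteq [n]}$, i.e., the vector of $2^n - 1$ entropies $H(x_W)$ (for all non-empty $W \subseteq [n]$).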

We say that a point in the space $\mathbb{R}^{2^n-1}$ is entropic if it represents the entropy profile of some distribution. Following [4], we denote by $\Gamma^*_n$ the set of all entropic points. We denote by $\bar{\Gamma}^*_n$ the topological closure of $\Gamma^*_n$; its elements are called almost entropic.

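We use the standard abbreviations for linear combinations of coordinates in the entropy profile:
$H(x_V \mid x_W) := H(x_{V \cup W}) - H(x_W)$,
$I(x_V : x_W) := H(x_V) + H(x_W) - H(x_{V \cup W})$,
$I(x_V : x_W \mid x_U) := H(x_{U \cup V}) + H(x_{U \cup W}) - H(x_{U \cup V \cup W}) - H(x_U)$.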

We extend this notation to almost entropic points.
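As a concrete check of these abbreviations, here is a small Python sketch under our own conventions (a finite joint distribution is a dict from outcome tuples to probabilities, indices start at 0, entropies are in bits). It computes $H(x_W)$, the entropy profile, and $I(x_V : x_W \mid x_U)$ through the identity $H(x_{U\cup V}) + H(x_{U\cup W}) - H(x_{U\cup V\cup W}) - H(x_U)$; the function names are hypothetical.

```python
from itertools import combinations
from math import log2


def entropy(pmf, W):
    """Shannon entropy H(x_W) in bits; pmf maps outcome tuples to probabilities."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in sorted(W))
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def entropy_profile(pmf, n):
    """The 2^n - 1 entropies H(x_W), W over non-empty subsets of {0, ..., n-1}."""
    return {
        frozenset(W): entropy(pmf, W)
        for size in range(1, n + 1)
        for W in combinations(range(n), size)
    }


def cond_mutual_info(pmf, V, W, U=()):
    """I(x_V : x_W | x_U) = H(x_{U|V}) + H(x_{U|W}) - H(x_{U|V|W}) - H(x_U)."""
    U, V, W = set(U), set(V), set(W)
    return (entropy(pmf, U | V) + entropy(pmf, U | W)
            - entropy(pmf, U | V | W) - entropy(pmf, U))


# x0, x1 independent fair bits and x2 = x0 XOR x1
pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(cond_mutual_info(pmf, {0}, {1}))       # pairwise independent
print(cond_mutual_info(pmf, {0}, {1}, {2}))  # dependent once x2 is known
```

In this example the variables are pairwise independent, yet $I(x_0 : x_1 \mid x_2) = 1$ bit, a standard illustration that conditioning can increase mutual information.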

II Copy Lemma

The following lemma was used (somewhat implicitly) in the very first proof of a non-Shannon-type inequality in [4], see also [18, Lemma 14.8]. The general version of this lemma appeared in [9] and [17]; in [17] it was stated explicitly, with a complete and detailed proof.

Lemma 1 (Copy Lemma)

For every tuple of jointly distributed random variables $(X, Y, Z)$ there exists a distribution $(A, B, C, C')$ such that

  • the distribution of $(A, B, C)$ coincides with the distribution of $(X, Y, Z)$; the distribution of $(A, C')$ coincides with the distribution of $(X, Z)$, and

  • $I(C' : B, C \mid A) = 0$.

In what follows we abuse notation and identify $(A, B, C)$ with $(X, Y, Z)$. That is, we say that the initial distribution $(X, Y, Z)$ can be extended to the distribution $(X, Y, Z, Z')$ (formally speaking, such an “extension” may require a subdivision of the initial probability space, so technically $(A, B, C, C')$ and $(X, Y, Z)$ are defined on different probability spaces), where $(X, Z)$ and $(X, Z')$ have isomorphic joint distributions, and $I(Z' : Y, Z \mid X) = 0$. Following [17], we say that $Z'$ is a $Y$-copy of $Z$ over $X$.

Note that Lemma 1 remains valid if $X$, $Y$, $Z$ are not individual variables but tuples of variables.
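For finite distributions the Copy Lemma has a simple constructive proof; one standard construction sets $p'(x, y, z, z') = p(x, y, z)\,p(x, z')/p(x)$, i.e., it draws $Z'$ from the conditional law of $Z$ given $X$, independently of $(Y, Z)$. The following Python sketch (our illustration with hypothetical helper names; single variables $X, Y, Z$ sit at positions 0, 1, 2 and the copy at position 3) builds this extension and checks the two properties of Lemma 1 numerically.

```python
from math import log2


def marginal(pmf, idx):
    """Marginal distribution of the coordinates listed in idx."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf, idx):
    return -sum(p * log2(p) for p in marginal(pmf, idx).values() if p > 0)


def cmi(pmf, a, b, c):
    """I(a : b | c) for disjoint index tuples a, b, c."""
    return (entropy(pmf, c + a) + entropy(pmf, c + b)
            - entropy(pmf, c + a + b) - entropy(pmf, c))


def copy_extension(pmf):
    """Extend a pmf on (x, y, z) to (x, y, z, z') with z' a y-copy of z over x.

    p'(x, y, z, z') = p(x, y, z) * p(x, z') / p(x): the pair (x, z') has the
    law of (x, z), and z' is conditionally independent of (y, z) given x.
    """
    p_x = marginal(pmf, (0,))
    p_xz = marginal(pmf, (0, 2))
    ext = {}
    for (x, y, z), p in pmf.items():
        for (x2, z2), q in p_xz.items():
            if x2 == x and p > 0:
                ext[(x, y, z, z2)] = p * q / p_x[(x,)]
    return ext


pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}  # z = x XOR y
ext = copy_extension(pmf)
print(cmi(ext, (3,), (1, 2), (0,)))  # independence condition of Lemma 1
print(cmi(ext, (1,), (2,), (0,)), cmi(ext, (1,), (3,), (0,)))  # I(y:z|x), I(y:z'|x)
```

Here $Z$ is a function of $(X, Y)$, so $I(Y : Z \mid X) = 1$ bit, while the copy $Z'$ carries no information about $Y$ given $X$; this is exactly the kind of asymmetry that the proofs below exploit.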

F. Matúš and L. Csirmaz proposed several extensions and generalizations of the Copy Lemma (polymatroid convolution, book extension, maximum entropy extension; see, e.g., [12, 22, 23]). However, to the best of our knowledge, the “classic” version of the Copy Lemma is strong enough for all known proofs of non-Shannon-type inequalities.

All known proofs of (unconditional) non-Shannon-type information inequalities can be presented in the following style: we start with a distribution $(x_1, \ldots, x_n)$; using the Copy Lemma, we extend this distribution with several new variables $(y_1, \ldots, y_k)$; then we enumerate the basic Shannon inequalities for the joint distribution $(x_1, \ldots, x_n, y_1, \ldots, y_k)$ and show that a combination of these inequalities (together with the constraints from the Copy Lemma) gives the desired new inequality for $(x_1, \ldots, x_n)$ (all entropy terms involving the new variables must cancel out). We refer the reader to [17] for many instances of this argument.
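The bookkeeping behind such proofs (adding multiples of information quantities and checking that every term with an auxiliary variable cancels) can be sketched in Python as follows. Linear forms map sets of variable names to coefficients; the names `cmi`, `combine` and `free_of` are ours. The example is only the chain-rule identity $I(a : b \mid w) + I(a : w) - I(a : w \mid b) = I(a : b)$, in which every term containing the auxiliary variable $w$ cancels; an actual non-Shannon proof combines many more terms, including the equalities coming from the Copy Lemma.

```python
from collections import Counter


def cmi(a, b, c=""):
    """Linear form of I(a : b | c) = H(ac) + H(bc) - H(abc) - H(c).

    Variables are single characters; the form maps frozensets of variables
    (entropy coordinates) to integer coefficients.
    """
    a, b, c = frozenset(a), frozenset(b), frozenset(c)
    form = Counter()
    for subset, coef in ((a | c, 1), (b | c, 1), (a | b | c, -1), (c, -1)):
        form[subset] += coef
    return form


def combine(*weighted_forms):
    """Sum of weight * form, dropping zero coefficients and H(empty set) = 0."""
    total = Counter()
    for weight, form in weighted_forms:
        for subset, coef in form.items():
            total[subset] += weight * coef
    return {subset: coef for subset, coef in total.items() if coef != 0 and subset}


def free_of(form, new_vars):
    """True if no entropy coordinate in the form involves an auxiliary variable."""
    return all(not (subset & set(new_vars)) for subset in form)


# w plays the role of an auxiliary (copied) variable
result = combine((1, cmi("a", "b", "w")), (1, cmi("a", "w")), (-1, cmi("a", "w", "b")))
print(result, free_of(result, "w"))
```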

In what follows we use a very similar scheme of proof. The difference is that we superimpose constraints (equalities for the coordinates of the entropy profile of $(x_1, \ldots, x_n)$) that are specific to the given problem. That is, in each application we prove a conditional non-Shannon-type inequality (which is valid only under some linear constraints on the entropy quantities). Each of these arguments can, in principle, be translated into a more conventional style: we could first prove some new unconditional non-Shannon-type inequalities, and then combine them with the given constraints to deduce the desired corollary. However, we do not need to reveal and explicitly formulate the unconditional inequalities hidden behind the proof.

III Symmetries

In this section we discuss a simple formalism of symmetry considerations that helps to reduce the dimension of the linear programs associated with problems concerning information inequalities (assuming that the problem has a particular symmetric structure). A similar formalism (the action of permutations on multidimensional distributions and their entropy profiles, and information inequalities for distributions with symmetric entropy profiles) was systematically studied in [25].

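Let $X = (x_1, \ldots, x_n)$ be a joint distribution of $n$ random variables. We denote by $\vec{H}(X)$ its entropy profile. Let $\sigma$ be a permutation of indices (an $n$-permutation). This permutation induces a natural transformation of the distribution $X$,
$(x_1, \ldots, x_n) \mapsto (x_{\sigma(1)}, \ldots, x_{\sigma(n)})$,
and therefore a transformation of the entropy profile
$\vec{H}(x_1, \ldots, x_n) \mapsto \vec{H}(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$.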

Example 1

If $\sigma$ is the transposition of indices 1 and 2, then the mapping defined above exchanges the entropy values $H(x_1)$ and $H(x_2)$, the values $H(x_1, x_3)$ and $H(x_2, x_3)$, etc., and does not change the values $H(x_3)$, $H(x_1, x_2)$, etc.

Thus, every permutation of indices induces a transformation of the set of all entropic points $\Gamma^*_n$. This transformation of the entropic points can be naturally extended to a linear transformation of the entire space $\mathbb{R}^{2^n-1}$ (defined by a suitable permutation of the coordinate axes). In what follows we denote this transformation by $\pi_\sigma$. (Note that $\sigma$ is a permutation of $n$ elements, while $\pi_\sigma$ is a transformation of a space of dimension $2^n - 1$.) In the usual algebraic terminology, we have defined a representation of the symmetric group $S_n$ (the group of all $n$-permutations) by linear transformations of the space $\mathbb{R}^{2^n-1}$.
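The action $\pi_\sigma$ is just a relabelling of coordinates. The minimal Python sketch below assumes the convention $(x_1, \ldots, x_n) \mapsto (x_{\sigma(1)}, \ldots, x_{\sigma(n)})$, so that the new coordinate at $W$ is the old coordinate at $\sigma(W)$; each coordinate is labelled by its subset so that the permutation of axes from Example 1 becomes visible for $n = 3$. The function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(n):
    """Non-empty subsets of the index set {1, ..., n}."""
    return [frozenset(c) for size in range(1, n + 1)
            for c in combinations(range(1, n + 1), size)]


def act(sigma, point):
    """Apply pi_sigma to a point {W: coordinate}.

    Assumed convention: (x_1, ..., x_n) -> (x_sigma(1), ..., x_sigma(n)),
    so the new coordinate at W is the old coordinate at sigma(W).
    """
    return {W: point[frozenset(sigma[i] for i in W)] for W in point}


# Label every coordinate by its subset so the permutation of axes is visible.
labels = {W: "H" + "".join(str(i) for i in sorted(W)) for W in nonempty_subsets(3)}
swap12 = {1: 2, 2: 1, 3: 3}
moved = act(swap12, labels)
for W in nonempty_subsets(3):
    print(labels[W], "->", moved[W])
```

Applying the same transposition twice restores the original labelling, as expected for an involution.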

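We say that a point $x \in \mathbb{R}^{2^n-1}$ is $\sigma$-invariant if $\pi_\sigma(x) = x$. Similarly, we say that a set of points $S$ is $\sigma$-invariant if $\pi_\sigma$ permutes the points of $S$, i.e., $\pi_\sigma(S) = S$. For a subgroup $G$ of $n$-permutations, we say that a point (or a subset) in $\mathbb{R}^{2^n-1}$ is $G$-invariant if it is $\sigma$-invariant for every $\sigma \in G$.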

We denote the set of $G$-invariant points in $\mathbb{R}^{2^n-1}$ by $\mathrm{Inv}(G)$. In what follows we typically study linear and affine spaces that are $G$-invariant for some specific group of permutations $G$. Note that every $G$-invariant linear space $L$ in $\mathbb{R}^{2^n-1}$ contains the set $L \cap \mathrm{Inv}(G)$ as a subspace.
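Since $\pi_\sigma$ only permutes coordinates, a point is $G$-invariant exactly when its coordinates are constant on each orbit of $G$ acting on the non-empty subsets of indices, so the dimension of the subspace of $G$-invariant points equals the number of such orbits. The following Python sketch (our illustration; indices start at 0, so the transpositions (0 1) and (2 3) exchange the first with the second and the third with the fourth index) generates a permutation group from its generators and counts these orbits.

```python
from itertools import combinations


def compose(p, q):
    """Composition (p o q)(i) = p(q(i)); permutations are tuples over 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))


def transposition(n, i, j):
    perm = list(range(n))
    perm[i], perm[j] = perm[j], perm[i]
    return tuple(perm)


def generate_group(generators, n):
    """All products of the generators (closure starting from the identity)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def subset_orbits(group, n):
    """Orbits of the group acting on the non-empty subsets of {0, ..., n-1}."""
    seen, orbits = set(), []
    for size in range(1, n + 1):
        for c in combinations(range(n), size):
            W = frozenset(c)
            if W not in seen:
                orbit = {frozenset(g[i] for i in W) for g in group}
                seen |= orbit
                orbits.append(orbit)
    return orbits


G = generate_group([transposition(4, 0, 1), transposition(4, 2, 3)], 4)
print(len(G), len(subset_orbits(G, 4)))  # group order, dimension of invariant subspace
```

For this four-element group the 15 coordinates of $\mathbb{R}^{15}$ collapse to 8 orbits, so the symmetric problem lives in an 8-dimensional subspace; for the full symmetric group $S_4$ only the 4 possible subset sizes remain.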

The sets of all entropic and all almost entropic points ($\Gamma^*_n$ and $\bar{\Gamma}^*_n$, respectively) are obviously $\sigma$-invariant for every $n$-permutation $\sigma$. For every group $G$ of $n$-permutations, the intersection of $\Gamma^*_n$ or $\bar{\Gamma}^*_n$ with any $G$-invariant set is again a $G$-invariant set. In what follows we discuss several simple examples of this type.

Example 2

Let be the set of points in such that

Since is defined by affine conditions, it can be represented as the intersection of with an affine subspace (of co-dimension ) in the entire space . It is easy to see that this subspace is invariant with respect to transpositions and ( exchanges the indices and , and exchanges the indices and ). Therefore, this subspace is -invariant, where is the subgroup of the symmetric group generated by the transpositions and (this subgroup consists of four permutations).

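We say that a linear function $F\colon \mathbb{R}^{2^n-1} \to \mathbb{R}$ is $\sigma$-invariant if $F(\pi_\sigma(x)) = F(x)$ for all $x \in \mathbb{R}^{2^n-1}$. Similarly, for a subgroup $G$ of $n$-permutations, we say that a linear function $F$ is $G$-invariant if it is $\sigma$-invariant for every $\sigma \in G$.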

Example 3

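Ingleton’s quantity, defined as
$\mathrm{Ing}(x_1, x_2, x_3, x_4) = I(x_1 : x_2 \mid x_3) + I(x_1 : x_2 \mid x_4) + I(x_3 : x_4) - I(x_1 : x_2)$,
can be extended to a linear function on the space $\mathbb{R}^{15}$. This function is invariant with respect to the transposition of indices 1 and 2, to the transposition of indices 3 and 4, and hence with respect to the group (of size four) generated by these two transpositions.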
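The invariance of Ingleton’s quantity can be checked mechanically on its coefficient vector. This Python sketch assumes the form $\mathrm{Ing} = I(x_1 : x_2 \mid x_3) + I(x_1 : x_2 \mid x_4) + I(x_3 : x_4) - I(x_1 : x_2)$ (the indexing is our assumption), expands it into entropy coordinates, and compares the form with its image under an index renaming; for a linear function $F$, the composition $F \circ \pi_\sigma$ is exactly the renamed form, so equality means $\sigma$-invariance.

```python
from collections import Counter


def cmi(a, b, c=()):
    """Coefficients of I(x_a : x_b | x_c) = H(ac) + H(bc) - H(abc) - H(c)."""
    a, b, c = frozenset(a), frozenset(b), frozenset(c)
    form = Counter()
    for subset, coef in ((a | c, 1), (b | c, 1), (a | b | c, -1), (c, -1)):
        form[subset] += coef
    return form


def ingleton():
    """Ing = I(1:2|3) + I(1:2|4) + I(3:4) - I(1:2) as {subset: coefficient}."""
    total = Counter()
    terms = ((1, cmi({1}, {2}, {3})), (1, cmi({1}, {2}, {4})),
             (1, cmi({3}, {4})), (-1, cmi({1}, {2})))
    for sign, form in terms:
        for subset, coef in form.items():
            total[subset] += sign * coef
    # drop cancelled terms and the empty set (H of no variables is 0)
    return {s: v for s, v in total.items() if v != 0 and s}


def rename(sigma, form):
    """The same linear form after renaming every index i to sigma[i]."""
    return {frozenset(sigma[i] for i in W): coef for W, coef in form.items()}


ing = ingleton()
for W in sorted(ing, key=lambda s: (len(s), sorted(s))):
    print(sorted(W), ing[W])
print(rename({1: 2, 2: 1, 3: 3, 4: 4}, ing) == ing)  # swap 1 and 2
print(rename({1: 1, 2: 2, 3: 4, 4: 3}, ing) == ing)  # swap 3 and 4
print(rename({1: 3, 2: 2, 3: 1, 4: 4}, ing) == ing)  # swap 1 and 3
```

Writing $H_W$ for $H(x_W)$, the expanded form is $H_{12} + H_{13} + H_{14} + H_{23} + H_{24} - H_1 - H_2 - H_{34} - H_{123} - H_{124}$; it is unchanged by $1 \leftrightarrow 2$ and by $3 \leftrightarrow 4$, but not by $1 \leftrightarrow 3$.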

Claim 1

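Let $G$ be a subgroup of $n$-permutations, and let $x$ be a point in $\mathbb{R}^{2^n-1}$. Then the center of mass of the points
$\pi_\sigma(x), \quad \sigma \in G, \qquad (1)$
is $G$-invariant.

It is easy to see that the set of points (1) is $\sigma$-invariant for each $\sigma \in G$. Therefore, the center of mass of this set is also $\sigma$-invariant for each $\sigma \in G$.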
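Claim 1 can be checked numerically with exact arithmetic. In this Python sketch (our illustration; indices start at 0 and the group generated by the transpositions (0 1) and (2 3) is written out explicitly), a point of $\mathbb{R}^{15}$ is a dict from non-empty subsets to rational coordinates, and we assume the convention that the new coordinate at $W$ is the old one at $g(W)$. The starting point is arbitrary and need not be entropic, since Claim 1 holds for any point.

```python
from fractions import Fraction
from itertools import combinations


def act(g, point):
    """pi_g on a point {W: coordinate}: the new coordinate at W is the old one at g(W)."""
    return {W: point[frozenset(g[i] for i in W)] for W in point}


def center_of_mass(point, group):
    """Exact average of the points pi_g(point) over all g in the group."""
    images = [act(g, point) for g in group]
    return {W: sum(img[W] for img in images) / len(images) for W in point}


# The group generated by the transpositions (0 1) and (2 3), listed explicitly.
G = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2)]
subsets = [frozenset(c) for k in range(1, 5) for c in combinations(range(4), k)]
# An arbitrary, deliberately asymmetric point of R^15 with rational coordinates.
x = {W: Fraction(sum((i + 1) ** 2 for i in W), 7) for W in subsets}
y = center_of_mass(x, G)
print(all(act(g, x) == x for g in G))  # x is not invariant
print(all(act(g, y) == y for g in G))  # its center of mass is
```

Averaging over the group elements (rather than over the distinct points of the orbit) still yields a $G$-invariant point, because applying a fixed permutation only reorders the terms of the average.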

Lemma 2

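Let $G$ be a subgroup of $n$-permutations, $S$ be a $G$-invariant convex set in $\mathbb{R}^{2^n-1}$, and $F$ be a $G$-invariant linear function. Then
$\inf_{x \in S} F(x) = \inf_{x \in S \cap \mathrm{Inv}(G)} F(x)$.

It is enough to prove that for every $x \in S$ there exists a $y \in S \cap \mathrm{Inv}(G)$ such that
$F(x) = F(y)$.
To this end we take the points
$\pi_\sigma(x), \quad \sigma \in G, \qquad (2)$
and define $y$ as the center of mass of these points. Since $S$ is $G$-invariant, each point $\pi_\sigma(x)$ belongs to $S$. From the convexity of $S$ it follows that $y$ also belongs to $S$, and Claim 1 implies that $y$ belongs to $\mathrm{Inv}(G)$. Thus, the constructed point belongs to $S \cap \mathrm{Inv}(G)$.

On the other hand, from the $G$-invariance of $F$ it follows that
$F(\pi_\sigma(x)) = F(x)$ for all $\sigma \in G$.

Since $F$ is linear, we conclude that $F$ takes the same value at the center of mass $y$ of the points (2) as at each of these points, and the lemma is proven.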
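Written out for the center of mass $y$ of the points (2), the final step of the proof is the following chain, where the first equality uses the linearity of $F$ and the second its $G$-invariance:

```latex
y = \frac{1}{|G|}\sum_{\sigma \in G} \pi_\sigma(x),
\qquad
F(y) = \frac{1}{|G|}\sum_{\sigma \in G} F\bigl(\pi_\sigma(x)\bigr)
     = \frac{1}{|G|}\sum_{\sigma \in G} F(x)
     = F(x).
```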

Example 4

Assume we are looking for the minimum of $\mathrm{Ing}(x_1, x_2, x_3, x_4)$ (see Example 3) on the set of almost entropic points (points in $\bar{\Gamma}^*_4$) that satisfy the linear constraints of Example 2. We define $G$ as the group of permutations generated by the two transpositions $\tau_{12}$ and $\tau_{34}$ that exchange the indices $1 \leftrightarrow 2$ and $3 \leftrightarrow 4$, respectively.

We claim that the required extremal value of $\mathrm{Ing}$ is achieved at some $G$-invariant point. Indeed, the given constraints define in $\mathbb{R}^{15}$ a $G$-invariant affine subspace. The intersection of this subspace with the convex cone $\bar{\Gamma}^*_4$ is a $G$-invariant convex set. On the other hand, Ingleton’s quantity is a $G$-invariant linear function. Therefore, we can apply Lemma 2 and conclude that the desired extremum is achieved at some point in $\mathrm{Inv}(G)$.

IV The Standard Benchmark: Bounds for Ingleton’s quantity

It is widely believed that one of the most interesting questions about the structure of $\bar{\Gamma}^*_4$ is the question about Ingleton’s quantity: what is the worst possible violation of Ingleton’s inequality
$\mathrm{Ing}(x_1, x_2, x_3, x_4) \ge 0$?

This inequality holds for linearly representable entropic points, but it is violated for some other entropic points (see [5]). The question is how far below zero the Ingleton score
$\dfrac{\mathrm{Ing}(x_1, x_2, x_3, x_4)}{H(x_1, x_2, x_3, x_4)} \qquad (3)$
can go (here we use the terminology from [17]). From the Shannon-type inequalities it follows only that this score is greater than . The first non-Shannon-type inequality from [4] implies that this score is greater than . The best bound was discovered by Dougherty et al.:

Theorem 1 (originally proven in [17])

For every quadruple of jointly distributed random variables

We show that this bound can be derived with only three applications of the Copy Lemma (with four new variables), with the help of symmetry considerations from the previous section. This approach gives a pretty simple way to confirm the ratio with the help of a computer, and provides a reasonably short “formally checkable” proof (a linear combination of 167 simple equalities and inequalities with rational coefficients, see Appendix).


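Our goal is to find the minimal value of the Ingleton score (3) for entropic points. This is equivalent to finding the minimal value of $\mathrm{Ing}(x_1, x_2, x_3, x_4)$ over almost entropic points satisfying the normalization condition $H(x_1, x_2, x_3, x_4) = 1$.

The objective function and the normalization constraint are invariant with respect to the transpositions $\tau_{12}$ and $\tau_{34}$ that exchange the variables $x_1 \leftrightarrow x_2$ and $x_3 \leftrightarrow x_4$, respectively. Therefore, we can apply Lemma 2 and conclude that the required minimal value can be found in the space of points that are $\tau_{12}$- and $\tau_{34}$-invariant.

The invariance means that we can restrict ourselves to the points in $\bar{\Gamma}^*_4$ such that
$H(x_1) = H(x_2)$, $H(x_3) = H(x_4)$, $H(x_1, x_3) = H(x_1, x_4) = H(x_2, x_3) = H(x_2, x_4)$, $H(x_1, x_2, x_3) = H(x_1, x_2, x_4)$, $H(x_1, x_3, x_4) = H(x_2, x_3, x_4)$. (4)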


The crucial part of the proof is, of course, the Copy Lemma. (Remarkably, Ineq. 33 from [17], even together with the symmetry constraints, does not imply Theorem 1. There is no contradiction: the instances of the Copy Lemma used in this proof imply, besides Ineq. 33, several other non-Shannon-type inequalities, and only together do these inequalities imply the required bound for the Ingleton score.) We use the trick from [17, proof of Ineq. 33, p. 19] and define four new random variables:


Technically, this means that we take the conditions (equalities) from the Copy Lemma (Lemma 1) that determine the entropies of the introduced variables and the properties of conditional independence for each instance of the Copy Lemma.
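One way to turn an instance of the Copy Lemma into linear constraints is sketched below in Python. This is a hypothetical encoding (`copy_lemma_constraints` is our name, indices are integers, and the new indices are assumed disjoint from the original ones). For a copy $Z'$ of $Z$ over $X$ that must be independent of $Y$ given $X$, it emits the equalities $H(X_S, Z'_T) = H(X_S, Z_T)$ for every $S \subseteq X$ and every non-empty subset $T$ of the copied variables, which express that $(X, Z')$ and $(X, Z)$ have the same distribution, plus $I(Z' : Y, Z \mid X) = 0$. Passing two new indices clones a pair of variables in one shot.

```python
from itertools import combinations


def subsets_of(items, min_size=0):
    """All subsets of items with at least min_size elements, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for k in range(min_size, len(items) + 1)
            for c in combinations(items, k)]


def copy_lemma_constraints(over, others, copy_of):
    """Linear equalities for one Copy Lemma instance (hypothetical encoding).

    copy_of maps each new index to the original index it clones (Z' -> Z);
    over is the index set X, others is Y. Each returned dict maps entropy
    coordinates (frozensets of indices) to coefficients whose sum must be 0.
    """
    new = frozenset(copy_of)
    copied = frozenset(copy_of.values())
    forms = []
    # (X, Z') has the same distribution as (X, Z): equal entropies on matching subsets.
    for S in subsets_of(over):
        for T in subsets_of(new, min_size=1):
            originals = frozenset(copy_of[t] for t in T)
            forms.append({S | T: 1, S | originals: -1})
    # Conditional independence I(Z' : Y, Z | X) = 0, expanded into entropies.
    X, A, B = frozenset(over), new, frozenset(others) | copied
    independence = {}
    for subset, coef in ((A | X, 1), (B | X, 1), (A | B | X, -1), (X, -1)):
        if subset:  # H of the empty set is 0
            independence[subset] = independence.get(subset, 0) + coef
    forms.append(independence)
    return forms


# Z' = index 3 clones Z = index 2 over X = {0}, independently of Y = {1}.
for form in copy_lemma_constraints({0}, {1}, {3: 2}):
    print({tuple(sorted(s)): c for s, c in form.items()})
```

With a single copied variable over one variable this yields three equalities; cloning a pair over one variable yields seven.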

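Finally, we add to the normalization condition and the constraints (4) and (5) the Shannon-type inequalities for the eight random variables involved ($x_1, \ldots, x_4$ and the four new variables). Thus, we obtain a linear program in $2^8 - 1 = 255$ variables (the coordinates of the entropy profile of these eight random variables) with the objective function
$\mathrm{Ing}(x_1, x_2, x_3, x_4)$.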

This problem is easy for standard linear programming solvers (we experimented with the solvers [27, 28, 29]). From the dual solution of this linear program we can extract a rational combination of inequalities that gives the desired bound (without floating-point arithmetic or rounding). We present this “formally checkable” proof in the Appendix.
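A “formally checkable” proof of this kind can be verified without any solver: one checks that the rational multipliers of the inequalities are non-negative and that the weighted sum equals the target form exactly. The Python sketch below (our illustration, with hypothetical function names and a toy certificate rather than the one from the Appendix) does this with `fractions.Fraction`, so no floating-point rounding is involved.

```python
from fractions import Fraction


def cmi(a, b, c=""):
    """I(a : b | c) as {frozenset of names: coefficient}; H(empty set) = 0 is dropped."""
    a, b, c = frozenset(a), frozenset(b), frozenset(c)
    form = {}
    for subset, coef in ((a | c, 1), (b | c, 1), (a | b | c, -1), (c, -1)):
        if subset:
            form[subset] = form.get(subset, 0) + coef
    return {s: v for s, v in form.items() if v != 0}


def add_forms(*weighted):
    """Exact sum of weight * form over (weight, form) pairs."""
    total = {}
    for weight, form in weighted:
        for s, v in form.items():
            total[s] = total.get(s, 0) + Fraction(weight) * v
    return {s: v for s, v in total.items() if v != 0}


def check_certificate(target, inequalities, equalities=()):
    """Check that target >= 0 follows from the given rational combination.

    inequalities: (multiplier, form) pairs with form >= 0; multipliers must be >= 0.
    equalities:   (multiplier, form) pairs with form == 0; multipliers of any sign.
    """
    if any(Fraction(m) < 0 for m, _ in inequalities):
        return False
    return add_forms(*inequalities, *equalities) == add_forms((1, target))


# Toy certificate: I(a:bc) - I(a:c) >= 0 follows from I(a:b|c) >= 0 with multiplier 1.
target = add_forms((1, cmi("a", "bc")), (-1, cmi("a", "c")))
print(check_certificate(target, [(Fraction(1), cmi("a", "b", "c"))]))
```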

V Secret Sharing Schemes on the Vámos Matroid

The notion of a secret sharing scheme was introduced by Shamir [2] and Blakley [1]. Nowadays, secret sharing is an important component of many cryptographic protocols. We refer the reader to [15] for an excellent survey of secret sharing and its applications.

The aim of perfect secret sharing is to distribute a secret value among a set of parties in such a way that only the qualified sets of parties can recover the secret value; the non-qualified sets should have no information about the secret. The family of qualified sets of parties is called the access structure of the scheme. This family of sets must be monotone (a superset of a qualified set is also qualified).

The information ratio of a secret sharing scheme is the maximal ratio between the size (the entropy) of a share and the size (the entropy) of the secret. The optimization of this parameter is the central problem of secret sharing. This question has been extensively studied for several families of access structures. Nowadays, the main technique for proving lower bounds on the complexity of secret sharing schemes is linear programming applied to constraints formulated in terms of classic and non-classic information inequalities, see [19].

In what follows we apply our technique (the Copy Lemma combined with symmetry considerations and linear programming) to one of the classic access structures: the access structure defined on the Vámos matroid (more precisely, one of the two access structures that can be defined on the Vámos matroid). We skip the general motivation (the Vámos matroid is particularly interesting as a simple example of a non-representable matroid, see [15] for details) and directly formulate the problem of the information ratio for this access structure in terms of Shannon’s entropy of the involved random variables.

We define an access structure with parties . The minimal qualified sets are the -sets , and all -sets not containing them, with three exceptions , , . We deal with jointly distributed random variables , where is the secret and are the shares given to the seven parties involved in the scheme. We require that

  • for every minimal qualified set listed above ;

  • if includes no minimal qualified set, then .
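The two families of conditions above are mechanical to generate from the list of minimal qualified sets. The following Python sketch (hypothetical helper names; $s$ denotes the secret and $x_A$ the shares of a set of parties $A$) lists them as the recovery condition $H(s \mid x_A) = 0$ for each minimal qualified set and the secrecy condition $H(s \mid x_A) = H(s)$ for each set containing no minimal qualified set. It is shown on a toy 2-out-of-3 threshold structure, not on the Vámos access structure.

```python
from itertools import combinations


def is_qualified(A, minimal_sets):
    """A set of parties is qualified iff it contains some minimal qualified set."""
    return any(M <= A for M in minimal_sets)


def secret_sharing_constraints(parties, minimal_sets):
    """List the conditions as (kind, A) pairs.

    ("recover", A): H(s | x_A) = 0     for each minimal qualified set A;
    ("hide", A):    H(s | x_A) = H(s)  for each A containing no minimal qualified set.
    """
    minimal_sets = [frozenset(M) for M in minimal_sets]
    constraints = [("recover", M) for M in minimal_sets]
    for size in range(1, len(parties) + 1):
        for c in combinations(parties, size):
            A = frozenset(c)
            if not is_qualified(A, minimal_sets):
                constraints.append(("hide", A))
    return constraints


# Toy example (not the Vámos structure): 2-out-of-3 threshold on parties 1, 2, 3.
cons = secret_sharing_constraints([1, 2, 3], [{1, 2}, {1, 3}, {2, 3}])
for kind, A in cons:
    print(kind, sorted(A))
```

Recovery is only imposed for minimal qualified sets because, by monotonicity of conditional entropy, $H(s \mid x_A) = 0$ then follows for all their supersets.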

Under these constraints, we are looking for the extremal value

Since the set of almost-entropic points is a closed convex cone, we can add the normalization and rewrite the objective function as


It is known that the required minimum is not greater than , [14]. The history of improvements of the lower bound for this problem is presented in the table:

[3], 1992
[8], 2006 for a secret of size
[13], 2008
[16], 2011
[21], 2013
[26], 2018
this paper

Notice that the objective function, the normalization condition , and the linear constraints in (i) and (ii) above are invariant with respect to the permutations that

  • swap the indices and ,

  • swap the indices and ,

  • swap the indices and ,

  • swap the pairs and .

Therefore we can apply Lemma 2 and reduce the problem to those almost entropic points that satisfy the symmetries defined above.

We denote

and define four new variables by applying twice the Copy Lemma:

Remark 1

Each instance of the Copy Lemma includes the property of conditional independence. In the construction described above we have two applications of the Copy Lemma, and therefore two independence conditions:

These two conditions can be merged into one (more symmetric) constraint

Thus, we obtain a linear program with the following constraints:

  • the conditions (i) and (ii) that define the access structure of the secret sharing scheme,

  • the normalization ,

  • the equalities that follow from the symmetry of the access structure,

  • the conditions of the two instances of the Copy Lemma,

  • and the Shannon-type inequalities for twelve random variables (i.e., for the $2^{12} - 1 = 4095$ coordinates of their entropy profile).

The goal is to optimize the objective function (6).

Remark 2

We do not need to include in the linear program the variables and and their entropies — these two variables are used only as a notation in the Copy Lemma.

With the help of a computer (we used the solvers [27, 28, 29]) we find the optimal solution of this linear program: it is equal to . Thus we establish the following statement.

Theorem 2

The optimal information ratio of the secret sharing scheme for the access structure on the Vámos matroid is not less than

Remark 3

Standard linear programming solvers use floating-point arithmetic, so rounding errors must be kept in mind. Our linear program contains only constraints with integer coefficients, so the values of the variables in the optimal solutions of this linear program (primal and dual) can be taken rational. These rational values can be found by exact computation, without rounding. To compute the exact solution of a linear program we use the rational linear programming solver QSopt_ex, see [10, 27].

The exact dual solutions found for this linear program (rational linear combinations of constraints) consist of more than equalities and inequalities. These rational linear combinations of equalities and inequalities provide a formal mathematical proof of Theorem 2, though such a proof can hardly be called “human-readable.”

Computing the exact solution of a linear program is rather time-consuming. If we only need a lower bound for the optimal value, it is enough to find a feasible solution of the dual problem that approximates the optimal solution well enough. Such an approximation can be found much faster than the exact rational solution, with more conventional floating-point LP solvers.
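The reason an approximate dual solution suffices is weak duality: for a primal of the form minimize $c^\top h$ subject to $A h \ge b$, any $y \ge 0$ with $A^\top y = c$ certifies $c^\top h \ge b^\top y$ for every feasible $h$. The Python sketch below (our illustration, not the authors’ pipeline) rounds a floating-point dual vector to nearby rationals with `Fraction.limit_denominator` and then checks the certificate exactly. It handles only inequality rows; in the actual linear program, equality constraints would receive multipliers of arbitrary sign.

```python
from fractions import Fraction


def rational_lower_bound(c, A, b, y_approx, max_den=10**6):
    """Turn an approximate dual solution into an exact lower bound, if possible.

    Assumed primal form: minimize c.h subject to A h >= b, with h free.
    Any y >= 0 with A^T y == c exactly certifies c.h >= b.y for every feasible h.
    Returns the bound as a Fraction, or None if the rounded y is not a valid certificate.
    """
    y = [Fraction(v).limit_denominator(max_den) for v in y_approx]
    if any(v < 0 for v in y):
        return None
    for j, cj in enumerate(c):
        if sum(y[i] * A[i][j] for i in range(len(A))) != cj:
            return None  # rounding broke dual feasibility
    return sum(yi * bi for yi, bi in zip(y, b))


# Toy LP: minimize h1 + h2 subject to h1 >= 1 and h2 - h1 >= 0 (optimum 2).
c = [1, 1]
A = [[1, 0], [-1, 1]]
b = [1, 0]
print(rational_lower_bound(c, A, b, [1.9999999, 1.0000001]))
print(rational_lower_bound(c, A, b, [1.5, 1.0]))
```

If rounding breaks exact dual feasibility, the function returns `None`; the certificate then has to be repaired before it proves anything.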

Remark 4

The linear program constructed in this section would have the same optimal value if we omitted the symmetry conditions (note that, unlike in the previous section, in the proof of Theorem 2 the Copy Lemma is applied in a symmetric setting). However, these symmetry conditions are not useless. In our experiments with exact computations, the symmetry constraints help to find a rational solution of the dual problem with smaller denominators. Also, in several experiments with floating-point solvers, the symmetry constraints make the computation (or approximation) of the optimal solution faster. However, this is not a universal rule: in some experiments the symmetry constraints only make the computation slower; this depends on the linear programming algorithm used and on the chosen scaling of the objective function and the constraints.

VI Conclusion

When we derive balanced information inequalities, the technique of the Copy Lemma with “clones” of individual random variables is equivalent to the technique of the Ahlswede–Körner lemma, see [20]. In this paper we used the Copy Lemma in a stronger form, making “clones” of pairs of correlated variables. In the proofs of Theorem 1 and Theorem 2 it was crucial that we can “duplicate” a pair of correlated random variables in one shot, not only a single random variable.

The Ahlswede–Körner lemma used in [26] has a clear intuitive meaning: it can be interpreted in terms of extraction of the common information, see [7] and [6]. It would be interesting to find a structural property of the probability distribution that would give an intuitive explanation of the efficiency of the Copy Lemma with two (or more) copied variables.


  • [1] G. R. Blakley, Safeguarding cryptographic keys. AFIPS Conference Proceedings 48, (1979) 313-317.
  • [2] A. Shamir, How to share a secret. Commun. of the ACM, (1979) 22, 612-613.
  • [3] P. D. Seymour. On secret-sharing matroids. J. of Combinatorial Theory, Series B, 56:69–73, 1992.
  • [4] Z. Zhang and R. W. Yeung, On characterization of entropy function via information inequalities. IEEE Transactions on Information Theory, 44(4), (1998) 1440-1452.
  • [5] D. Hammer, A. Romashchenko, A. Shen, and N. Vereshchagin, Inequalities for Shannon entropy and Kolmogorov complexity. Journal of Computer and System Sciences, (2000) 60(2), 442-464.
  • [6] K. Makarychev, Yu. Makarychev, A. Romashchenko, and N. Vereshchagin, A new class of non-Shannon-type inequalities for entropies. Communications in Information and Systems, (2002) 2(2), 147-166.
  • [7] R. Ahlswede and J. Körner, On common information and related characteristics of correlated information sources, in General Theory of Information Transfer and Combinatorics, (2006) 664-677. Springer.
  • [8] A. Beimel and N. Livne, On matroids and non-ideal secret sharing. In Proc. Theory of Cryptography Conference, (2006) 482-501.
  • [9] R. Dougherty, C. Freiling, and K. Zeger, Six new non-Shannon information inequalities. In Proc. IEEE International Symposium on Information Theory, (2006) 233-236.
  • [10] D. G. Espinoza, On linear programming, integer programming and cutting planes. PhD thesis, Georgia Institute of Technology. 2006.
  • [11] F. Matúš, Infinitely many information inequalities. In Proc. IEEE International Symposium on Information Theory, 2007, 41-44.
  • [12] F. Matúš, Adhesivity of polymatroids. Discrete Mathematics, (2007) 307(21), 2464-2477.
  • [13] A. Beimel, N. Livne, C. Padró. Matroids Can Be Far From Ideal Secret Sharing. Theory of Cryptography, (2008) 194-212.
  • [14] J. Martí-Farré, and C. Padró, On secret sharing schemes, matroids and polymatroids. J. Math. Cryptol. (2010) 4, 95-120.
  • [15] A. Beimel, Secret-Sharing Schemes: A Survey. In Proc. International Conference on Coding and Cryptology 2011. LNCS, vol. 6639, (2011) 11-46.
  • [16] J. R. Metcalf-Burton, Improved upper bounds for the information rates of the secret sharing schemes induced by the Vámos matroid. Discrete Math. (2011) 311, 651-662.
  • [17] R. Dougherty, C. Freiling, and K. Zeger, Non-Shannon information inequalities in four random variables. (2011) arXiv:1104.3602.
  • [18] R. W. Yeung, A first course in information theory. (2012) Springer Science & Business Media.
  • [19] C. Padró, L. Vázquez, A. Yang, Finding Lower Bounds on the Complexity of Secret Sharing Schemes by Linear Programming. Discrete Applied Mathematics, (2013) 161, 1072-1084.
  • [20] T. Kaced, Equivalence of two proof techniques for non-Shannon-type inequalities. In Proc. IEEE International Symposium on Information Theory, 2013, 236-240.
  • [21] M. Gharahi, On the Complexity of Perfect Secret Sharing Schemes. Ph.D. Thesis (in Persian), Iran Univ. of Science and Technology (2013).
  • [22] L. Csirmaz, Book Inequalities. IEEE Trans. Information Theory, (2014) 60(11), 6811-6818.
  • [23] F. Matúš, and L. Csirmaz, Entropy region and convolution. IEEE Trans. Inf. Theory, (2016) 62(11), 6007-6018.
  • [24] J. Apte and J. M. Walsh, Symmetry in network coding. In Proc. IEEE International Symposium on Information Theory, (2015) 376-380.
  • [25] Q. Chen and R. W. Yeung, Partition-symmetrical entropy functions. IEEE Transactions on Information Theory, (2016) 62(10), 5385-5402.
  • [26] O. Farràs, T. Kaced, S. Martín, and C. Padró, Improving the linear programming technique in the search for lower bounds in secret sharing. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques, 2018, 597-621.
  • [27] QSopt_ex: an exact linear programming solver. The authors of the solver are David Applegate, William Cook, Sanjeeb Dash, and Daniel Espinoza. The code is distributed under GNU GPL v2.1.
    Website of the project: A fork of QSopt_ex on GitHub:
  • [28] GNU Linear Programming Kit, developed by Andrew O. Makhorin. The code is distributed under GNU GPL v3. Website of the project:
  • [29] Gurobi Optimizer: a proprietary optimization solver for mathematical programming. Website: