
# Equality in the Matrix Entropy-Power Inequality and Blind Separation of Real and Complex Sources

The matrix version of the entropy-power inequality for real or complex coefficients and variables is proved using a transportation argument that easily settles the equality case. An application to blind source extraction is given.


## I Introduction

The entropy power inequality (EPI) was stated by Shannon [1] in 1948 and is well known to be equivalent to the following minimum entropy inequality [2, 3, 4]:

$$h(a_1X_1+a_2X_2)\ge h(a_1X_1^*+a_2X_2^*) \tag{1}$$

for any real numbers $a_1,a_2$ and any independent real random variables $X_1,X_2$,¹ where $X_1^*,X_2^*$ are independent normal random variables having the same entropies as $X_1,X_2$:

$$h(X_1^*)=h(X_1),\qquad h(X_2^*)=h(X_2). \tag{2}$$

¹ For convenience, all random variables considered have densities that are continuous and positive inside their support interval, with zero mean and finite differential entropies.

Equality holds in (1) if and only if either $a_1a_2=0$ or $X_1$ and $X_2$ are normal. Recently, a normal transport argument was used in [5] to provide a simple proof of Shannon’s EPI, including the necessary and sufficient condition for equality.
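As a quick numerical sanity check of (1), one can take, as an assumption for illustration only, $X_1,X_2$ uniform on unit-length intervals, so that $h(X_1)=h(X_2)=0$ nats. The density of $a_1X_1+a_2X_2$ is then a trapezoid whose entropy integrates to $\log M+m/(2M)$, with $M=\max(|a_1|,|a_2|)$ and $m=\min(|a_1|,|a_2|)$, while the right-hand side of (1) is the entropy of a normal variable. The Python sketch below (helper names are ours) evaluates both sides.

```python
import math

def normal_entropy(var):
    """Differential entropy (nats) of a scalar normal variable with variance var."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def sum_of_uniforms_entropy(a1, a2):
    """h(a1*X1 + a2*X2) for independent X1, X2 uniform on unit-length intervals.

    The density is a trapezoid (a triangle if |a1| = |a2|); integrating
    -f*log(f) over it gives log(M) + m / (2M), M = max(|a1|,|a2|), m = min(|a1|,|a2|).
    """
    big, small = max(abs(a1), abs(a2)), min(abs(a1), abs(a2))
    return math.log(big) + small / (2 * big)

def normal_counterpart_entropy(a1, a2, h1, h2):
    """h(a1*X1* + a2*X2*) for independent normals with h(Xi*) = hi, as in (2)."""
    var1 = math.exp(2 * h1) / (2 * math.pi * math.e)   # invert normal_entropy
    var2 = math.exp(2 * h2) / (2 * math.pi * math.e)
    return normal_entropy(a1**2 * var1 + a2**2 * var2)

h_unif = 0.0   # entropy of a uniform density on an interval of length 1
for a1, a2 in [(1.0, 1.0), (1.0, 0.5), (2.0, -3.0), (3.0, 0.0)]:
    lhs = sum_of_uniforms_entropy(a1, a2)
    rhs = normal_counterpart_entropy(a1, a2, h_unif, h_unif)
    print(f"a = ({a1}, {a2}):  lhs = {lhs:.4f}  >=  rhs = {rhs:.4f}")
```

With $t=m/M$, the gap between the two sides is $\frac t2-\frac12\log(1+t^2)$, which vanishes only for $t=0$, i.e., when $a_1a_2=0$, in line with the equality condition for non-normal $X_1,X_2$.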

Shannon’s EPI was generalized to a matrix version [6, 7]:

$$h(AX)\ge h(AX^*) \tag{3}$$

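for any $m\times n$ matrix $A$ and any random (column) vector $X=(X_1,\dots,X_n)^t$ of independent components $X_1,\dots,X_n$, where $X^*=(X_1^*,\dots,X_n^*)^t$ is a normal vector with independent components of the same entropies:

$$h(X_i^*)=h(X_i)\qquad(i=1,\dots,n). \tag{4}$$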

Available proofs of (3) proceed either by double induction [6], by integrating the corresponding inequality for Fisher information along a path of Gaussian perturbations using de Bruijn’s identity [7], or via the I-MMSE relation [8]. However, none of these methods has settled a necessary and sufficient condition for equality in (3). Such a condition is important in applications such as blind source separation (BSS) based on minimum entropy [9]. Moreover, BSS may involve real or complex signals [10], and minimum-entropy methods for complex sources would require extending EPIs to complex-valued variables and coefficients.

In this paper, we adapt the proof of [5] to the matrix case and derive (3) with a normal transport argument. This allows us to settle the equality case easily: we define the notion of “recoverability” and show that equality holds in (3) if and only if all unrecoverable components of $X$ present in $AX$ are normal. We then extend the proofs to complex-valued $A$ and $X$. As an application, we derive the appropriate contrast functions for partial BSS (a.k.a. blind source extraction), where $m$ out of $n$ independent sources are to be extracted.

## II A simple proof of the matrix EPI by transport

We extend the proof in [5] to the matrix EPI, based on the same ingredients: (a) a transportation argument from normal variables, that takes the form of a simple change of variables; (b) a rotation performed on i.i.d. normal variables, which preserves the i.i.d. property; (c) concavity of the logarithm, appropriately generalized to the matrix case. The proof breaks into several elementary steps:

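### II-A Reduce to full rank $m<n$

If the rank of $A$ is less than $m$, then some rows of $A$ are linearly dependent, so that there is a deterministic relation between some components of $AX$ (and likewise of $AX^*$); then $h(AX)=h(AX^*)=-\infty$ and equality holds trivially. Thus we can assume that $A$ has full rank $m\le n$. If $m=n$, then $A$ is invertible and, by the change of variable formula in the entropy [1, § 20.9], $h(AX)=h(X)+\log|A|$, where $|A|$ denotes the absolute value of the determinant of $A$. Since $h(X)=\sum_i h(X_i)=\sum_i h(X_i^*)=h(X^*)$ by independence and (4), equality then holds in (3). Therefore, one may always assume that $A$ has full rank $m<n$.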
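The change-of-variable identity $h(AX)=h(X)+\log|A|$ used above can be checked in closed form when $X$ is itself normal, since $h(\mathcal N(0,K))=\frac12\log\bigl((2\pi e)^k|K|\bigr)$. The following NumPy sketch compares the two sides; the random invertible $A$ and the diagonal covariance are arbitrary choices made only for illustration.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of N(0, cov): 0.5 * log((2*pi*e)^k * det(cov))."""
    cov = np.atleast_2d(cov)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (cov.shape[0] * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))            # square and (almost surely) invertible
K = np.diag(rng.uniform(0.5, 2.0, n))      # independent normal components of X

h_X = gaussian_entropy(K)
h_AX = gaussian_entropy(A @ K @ A.T)       # AX is normal with covariance A K A^t
log_abs_det_A = np.linalg.slogdet(A)[1]    # log|A|

print(h_AX - h_X, log_abs_det_A)           # the two numbers agree
```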

### II-B Reduce to equal individual entropies

Without loss of generality, one may assume that the components of $X$ have equal entropies. If this were not the case, then by the scaling property of entropy [1, § 20.9], $h(aX_i)=h(X_i)+\log|a|$, one could find nonzero coefficients $a_i$ (e.g., $a_i=e^{-h(X_i)}$) such that all $a_iX_i$ have equal entropies. Applying (3) to $\Lambda X$ and the matrix $A\Lambda^{-1}$, where $\Lambda$ is the diagonal matrix with diagonal elements $a_1,\dots,a_n$, then gives the desired EPI, since $A\Lambda^{-1}\Lambda X=AX$ and $\Lambda X^*$ is a normal vector with independent components whose entropies match those of $\Lambda X$.

Notice that with the additional constraint that the $X_i$ have equal entropies, the independent zero-mean normal variables $X_i^*$ also have equal entropies by (4), hence equal variances: they are, therefore, independent and identically distributed (i.i.d.).
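The normalization of § II-B is easy to illustrate with densities whose entropies are known in closed form. The sketch below, in which the three densities and their parameters are arbitrary choices, applies $a_i=e^{-h(X_i)}$, checks that the rescaled entropies all vanish, and checks that the matching normal variances coincide, which is what makes the $X_i^*$ i.i.d.

```python
import math

# Closed-form differential entropies (nats) of three zero-mean densities.
def h_uniform(width):      # uniform on [-width/2, width/2]
    return math.log(width)

def h_laplace(b):          # Laplace with scale b
    return 1.0 + math.log(2.0 * b)

def h_normal(var):         # normal with variance var
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

entropies = [h_uniform(3.0), h_laplace(0.7), h_normal(2.5)]

# Scaling property h(a X) = h(X) + log|a|: a_i = exp(-h(X_i)) sends each entropy to 0.
coeffs = [math.exp(-h) for h in entropies]
scaled = [h + math.log(a) for h, a in zip(entropies, coeffs)]

# Normal X_i* with these (equal) entropies all get the variance exp(2h)/(2*pi*e),
# so the independent X_i* are identically distributed.
variances = [math.exp(2.0 * h) / (2.0 * math.pi * math.e) for h in scaled]

print(scaled)
print(variances)
```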

### II-C Reduce to orthonormal rows

Without loss of generality, one may assume that the rows of $A$ are orthonormal. If this were not the case, one could orthonormalize the rows by a Gram-Schmidt process, which amounts to multiplying $A$ on the left by an $m\times m$ lower-triangular invertible matrix $L$. One can then apply (3) to the matrix $LA$. Again by the change of variable formula in the entropy [1, § 20.9], $h(LAX)=h(AX)+\log|L|$ and $h(LAX^*)=h(AX^*)+\log|L|$; the $\log|L|$ terms cancel to give the desired EPI. Thus we are led to prove (3) for an $m\times n$ matrix $A$ with orthonormal rows ($AA^t=I$, the identity matrix).
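Numerically, Gram-Schmidt on the rows of $A$ is a QR factorization of $A^t$: if $A^t=QR$ with $R$ upper-triangular, then $L=(R^t)^{-1}$ is lower-triangular and $LA=Q^t$ has orthonormal rows. A NumPy sketch, assuming a random full-rank $A$; Householder-based QR may flip the signs of some rows relative to classical Gram-Schmidt, which is harmless here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 5
A = rng.standard_normal((m, n))       # full row rank (almost surely)

# Gram-Schmidt on the rows of A <=> QR factorization of A^t:
# A^t = Q R with Q (n x m) orthonormal columns and R (m x m) upper-triangular.
Q, R = np.linalg.qr(A.T)
L = np.linalg.inv(R.T)                # lower-triangular and invertible
LA = L @ A                            # equals Q^t, hence has orthonormal rows

log_abs_det_L = np.linalg.slogdet(L)[1]   # the log|L| term that cancels in (3)

print(np.allclose(LA @ LA.T, np.eye(m)))  # orthonormal rows
print(np.allclose(np.triu(L, 1), 0.0))    # L is lower-triangular
```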

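### II-D Complete the orthogonal matrix

Extend $A$ by adding $n-m$ orthonormal rows forming an $(n-m)\times n$ complementary matrix $A'$ such that $\begin{pmatrix}A\\ A'\end{pmatrix}$ is an $n\times n$ orthogonal matrix, and define the Gaussian vector $\begin{pmatrix}\tilde X\\ \tilde X'\end{pmatrix}$ as

$$\begin{pmatrix}\tilde X\\ \tilde X'\end{pmatrix}=\begin{pmatrix}A\\ A'\end{pmatrix}X^*. \tag{5}$$

Since the components of $X^*$ are i.i.d. normal and $\begin{pmatrix}A\\ A'\end{pmatrix}$ is orthogonal, the components of $\begin{pmatrix}\tilde X\\ \tilde X'\end{pmatrix}$ are also i.i.d. normal. In particular, the subvectors $\tilde X=AX^*$ and $\tilde X'=A'X^*$ are independent. The inverse transformation is the transpose:

$$X^*=A^t\tilde X+A'^t\tilde X'. \tag{6}$$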
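One convenient way to obtain the complementary rows $A'$ numerically, assumed here rather than taken from the text, is to use the last $n-m$ right singular vectors of $A$, which span its null space. The NumPy sketch checks that the stacked matrix is orthogonal and that it maps an i.i.d. normal covariance $s^2I$ to itself.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 5
# An m x n matrix with orthonormal rows, as obtained after Section II-C.
A = np.linalg.qr(rng.standard_normal((n, m)))[0].T

# The last n - m right singular vectors of A span its null space; they are
# orthonormal and orthogonal to the rows of A, so they can serve as A'.
_, _, Vt = np.linalg.svd(A)
A_comp = Vt[m:, :]
O = np.vstack([A, A_comp])            # n x n

print(np.allclose(O @ O.T, np.eye(n)))    # orthogonal, so O^{-1} = O^t as in (6)

# If X* has i.i.d. N(0, s2) components, O X* has covariance s2 * O O^t = s2 * I.
s2 = 0.3
cov_tilde = O @ (s2 * np.eye(n)) @ O.T
print(np.allclose(cov_tilde, s2 * np.eye(n)))
```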

### II-E Apply the normal transportation

###### Lemma 1 (Normal Transportation [5, 11])

Let $X^*$ be a scalar normal random variable. For any continuous density $f$, there exists a differentiable transformation $T$ with positive derivative $T'>0$ such that $X=T(X^*)$ has density $f$.

From Lemma 1, we can assume that the components of $X$ and $X^*$ are such that $X_i=T_i(X_i^*)$ for all $i=1,\dots,n$, where the $T_i$ are transformations with positive derivatives $T_i'>0$. For ease of notation, define

$$T(X^*)=\bigl(T_1(X_1^*),T_2(X_2^*),\dots,T_n(X_n^*)\bigr)^t. \tag{7}$$

Thus $T$ is a transformation whose Jacobian matrix is diagonal with positive diagonal elements:

$$T'(X^*)=\mathrm{diag}\bigl(T_1'(X_1^*),\dots,T_n'(X_n^*)\bigr). \tag{8}$$
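For a concrete instance of Lemma 1, one may take, as an illustrative assumption, a zero-mean Laplace target $f(y)=e^{-|y|/b}/(2b)$. Then $T=F^{-1}\circ\Phi$, with $F$ the Laplace distribution function and $\Phi$ the standard normal one, has the closed form $T(x)=-\mathrm{sign}(x)\,b\log\operatorname{erfc}(|x|/\sqrt2)$ and derivative $T'(x)=\varphi(x)/f(T(x))>0$. Applying it componentwise gives the diagonal Jacobian (8). A Python sketch (function names are ours):

```python
import math
import numpy as np

erfc = np.vectorize(math.erfc)    # elementwise erfc without SciPy

def transport_to_laplace(x, b):
    """T = F^{-1} o Phi from N(0, 1) to the Laplace density exp(-|y|/b) / (2b)."""
    x = np.asarray(x, dtype=float)
    return -np.sign(x) * b * np.log(erfc(np.abs(x) / math.sqrt(2.0)))

def transport_derivative(x, b):
    """T'(x) = phi(x) / f(T(x)); here f(T(x)) simplifies to erfc(|x|/sqrt 2) / (2b)."""
    x = np.asarray(x, dtype=float)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    return phi / (erfc(np.abs(x) / math.sqrt(2.0)) / (2.0 * b))

rng = np.random.default_rng(3)
b = 0.8
samples = transport_to_laplace(rng.standard_normal(200_000), b)
print(samples.mean(), samples.var(), 2 * b**2)   # Laplace variance is 2 b^2

grid = np.linspace(-6.0, 6.0, 101)
print(np.all(transport_derivative(grid, b) > 0))  # T' > 0, i.e. T increasing

# Componentwise map (7), using the same target for every component here:
# for each draw of a 3-dim X*, the Jacobian (8) is diagonal with entries T'(X_i*).
X_star = rng.standard_normal((5, 3))
jacobian_diagonals = transport_derivative(X_star, b)
```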

Now (3) can be written in terms of the normal variables only:

$$h\bigl(AT(X^*)\bigr)\ge h(AX^*) \tag{9}$$

and, by (6), it can also be written in terms of the tilde normal variables:

$$h\bigl(AT(A^t\tilde X+A'^t\tilde X')\bigr)\ge h(\tilde X). \tag{10}$$

### II-F Conditioning on the complementary variables

Since conditioning reduces entropy [1, § 20.4],

$$h\bigl(AT(A^t\tilde X+A'^t\tilde X')\bigr)\ge h\bigl(AT(A^t\tilde X+A'^t\tilde X')\mid\tilde X'\bigr). \tag{11}$$

### II-G Make the change of variable

By the change of variable formula in the entropy [1, § 20.8], $h(X_j)=h\bigl(T_j(X_j^*)\bigr)=h(X_j^*)+E\log T_j'(X_j^*)$ and, therefore, by (4),

$$E\log T_j'(X_j^*)=0\qquad(j=1,2,\dots,n). \tag{12}$$
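Identity (12) can be checked numerically with a hypothetical Laplace target of scale $b$, for which $T=F^{-1}\circ\Phi$ satisfies $\log T'(x)=\log\varphi(x)-\log f(T(x))$ with $f(T(x))=\operatorname{erfc}(|x|/\sqrt2)/(2b)$. Then $E\log T'(X^*)$ should equal $h(X)-h(X^*)$, and vanish once $b$ is chosen so that the two entropies agree as in (4). The sketch below uses a plain trapezoidal rule, written out to avoid version-specific NumPy helpers, on a grid containing $x=0$, where $\log T'$ has a kink.

```python
import math
import numpy as np

erfc = np.vectorize(math.erfc)

def expected_log_derivative(b, half_width=12.0, num=200_001):
    """E[log T'(X*)], X* ~ N(0, 1), for the normal-to-Laplace(b) map T.

    Uses log T'(x) = log phi(x) - log f(T(x)) with f(T(x)) = erfc(|x|/sqrt 2) / (2b),
    integrated by the trapezoidal rule on a symmetric grid that contains x = 0.
    """
    x = np.linspace(-half_width, half_width, num)
    log_phi = -0.5 * x**2 - 0.5 * math.log(2.0 * math.pi)
    log_t_prime = log_phi - np.log(erfc(np.abs(x) / math.sqrt(2.0))) + math.log(2.0 * b)
    y = np.exp(log_phi) * log_t_prime
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

h_normal = 0.5 * math.log(2.0 * math.pi * math.e)   # h(X*) for unit variance

def h_laplace(b):
    return 1.0 + math.log(2.0 * b)

b = 0.8
print(expected_log_derivative(b), h_laplace(b) - h_normal)   # change of variable

# Matching entropies as in (4) forces E log T' = 0, which is (12).
b_equal = math.exp(h_normal - 1.0) / 2.0
print(expected_log_derivative(b_equal))
```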

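By the change of variable formula (vector case) [1, § 20.8] applied to the conditional entropy on the r.h.s. of (11), noting that for fixed $\tilde X'$ the map $\tilde X\mapsto AT(A^t\tilde X+A'^t\tilde X')$ has Jacobian matrix $AT'(A^t\tilde X+A'^t\tilde X')A^t$,

$$\begin{aligned} h\bigl(AT(A^t\tilde X+A'^t\tilde X')\mid\tilde X'\bigr) &= h(\tilde X\mid\tilde X')+E\log\bigl|AT'(A^t\tilde X+A'^t\tilde X')A^t\bigr| &&(13)\\ &= h(\tilde X)+E\log\bigl|AT'(X^*)A^t\bigr| &&(14)\end{aligned}$$

where we have used (6) and the independence of $\tilde X$ and $\tilde X'$.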

### II-H Apply the concavity of the logarithm

The following lemma was stated in [7] as a consequence of (3). A direct proof was given in [8], and is simplified here.

###### Lemma 2

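For any $m\times n$ matrix $A$ with orthonormal rows and any $n\times n$ diagonal matrix $\Lambda$ with positive diagonal elements $\lambda_1,\dots,\lambda_n$,

$$\log|A\Lambda A^t|\ge\mathrm{tr}\bigl(A[\log\Lambda]A^t\bigr) \tag{15}$$

where $\log\Lambda=\mathrm{diag}(\log\lambda_1,\dots,\log\lambda_n)$ and $\mathrm{tr}$ denotes the trace.

Equality holds, e.g., when the $\lambda_j$ are all equal. The precise equality case will appear elsewhere.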

###### Proof:

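It is easily checked that $A\Lambda A^t$ is positive definite and that neither side of (15) changes if we replace $A$ by $UA$, where $U$ is any $m\times m$ orthogonal matrix. Choose $U$ as an orthogonal eigenvector matrix of $A\Lambda A^t$, so that $UA\Lambda A^tU^t$ is diagonal with positive diagonal elements and $UA$ still has orthonormal rows.

Thus, substituting $UA$ for $A$, we may always assume that $A\Lambda A^t$ is diagonal with diagonal entries equal to $\sum_{j=1}^n A_{ij}^2\lambda_j$ for $i=1,\dots,m$, where $A_{ij}$ denotes the entries of $A$. Then

$$\begin{aligned}\log|A\Lambda A^t| &= \sum_{i=1}^m\log\sum_{j=1}^n A_{ij}^2\lambda_j &&(16)\\ &\ge\sum_{i=1}^m\sum_{j=1}^n A_{ij}^2\log\lambda_j &&(17)\\ &=\mathrm{tr}\bigl(A[\log\Lambda]A^t\bigr) &&(18)\end{aligned}$$

where (17) follows from Jensen’s inequality and the concavity of the logarithm: since $A$ has orthonormal rows, $\sum_{j=1}^n A_{ij}^2=1$ for each $i$, so the weights $A_{ij}^2$ form a probability distribution over $j$.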
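Inequality (15) lends itself to a randomized check, which is of course no substitute for the proof. The NumPy sketch below draws matrices with orthonormal rows (via QR) and positive diagonals $\Lambda$, records the gap between the two sides, and evaluates the equal-$\lambda_j$ case where equality holds; the sizes and sampling ranges are arbitrary.

```python
import numpy as np

def lemma2_sides(A, lam):
    """Both sides of (15): log|A diag(lam) A^t| and tr(A diag(log lam) A^t)."""
    lhs = np.linalg.slogdet(A @ np.diag(lam) @ A.T)[1]
    rhs = np.trace(A @ np.diag(np.log(lam)) @ A.T)
    return lhs, rhs

rng = np.random.default_rng(4)
m, n = 3, 6
gaps = []
for _ in range(1000):
    A = np.linalg.qr(rng.standard_normal((n, m)))[0].T   # random orthonormal rows
    lam = rng.uniform(0.05, 20.0, n)                      # positive diagonal of Lambda
    lhs, rhs = lemma2_sides(A, lam)
    gaps.append(lhs - rhs)
print(min(gaps))                                          # nonnegative up to rounding

# Equal lambda_j: A (c I) A^t = c I_m, so both sides equal m log c.
lhs_eq, rhs_eq = lemma2_sides(A, np.full(n, 2.5))
print(lhs_eq, rhs_eq, m * np.log(2.5))
```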

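From Lemma 2 and (12) we obtain

$$\begin{aligned} E\log\bigl|AT'(X^*)A^t\bigr| &\ge E\,\mathrm{tr}\bigl(A[\log T'(X^*)]A^t\bigr) &&(19)\\ &=\mathrm{tr}\bigl(A\,E[\log T'(X^*)]\,A^t\bigr)=0. &&(20)\end{aligned}$$

Combining this with (11)–(14) proves (10) and the desired matrix EPI (3).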

## III The Equality Case

To settle the equality case in (3), by the remarks in § II-A we may already assume that $A$ has full rank $m$.

###### Definition 1

A component $X_j$ of $X$ is

• present in $AX$ if $AX$ depends on $X_j$;

• recoverable from $AX$ if there exists a row vector $b$ of length $m$ such that $bAX=X_j$.

###### Remark 1

Without loss of generality, we always omit the components $X_j$ that are not present in $AX$, together with their associated zero columns of $A$; this does not affect the entropy $h(AX)$.

###### Remark 2

Since the considered variables are not deterministic, Definition 1 depends only on the matrix $A$: $X_j$ is present in $AX$ if and only if the $j$th column of $A$ is nonzero, and $X_j$ is recoverable from $AX$ if and only if there exists $b$ such that $bA=e_j=(0,\dots,0,1,0,\dots,0)$, with $1$ in the $j$th position. A recoverable component is necessarily present.

###### Remark 3

Definition 1 is also invariant under left multiplication of $A$ by any $m\times m$ invertible matrix $M$: the $j$th column of $MA$ is zero if and only if the $j$th column of $A$ is zero, and $bA=e_j$ implies $(bM^{-1})(MA)=e_j$.
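Remark 2 turns Definition 1 into a linear-algebra test: $X_j$ is recoverable if and only if $e_j$ lies in the row space of $A$. A minimal NumPy sketch, with a hypothetical $2\times4$ matrix, a least-squares solve for $b$, and tolerances that are arbitrary choices:

```python
import numpy as np

def present(A, j, tol=1e-12):
    """X_j is present in AX iff column j of A is nonzero (Remark 2)."""
    return bool(np.any(np.abs(A[:, j]) > tol))

def recoverable(A, j, tol=1e-10):
    """X_j is recoverable iff some row vector b solves b A = e_j (Remark 2),
    i.e. iff e_j lies in the row space of A; b is found by least squares."""
    e_j = np.zeros(A.shape[1])
    e_j[j] = 1.0
    b, *_ = np.linalg.lstsq(A.T, e_j, rcond=None)   # solves A^t b^t = e_j^t
    return bool(np.linalg.norm(b @ A - e_j) < tol)

# Hypothetical 2 x 4 example: X_1 recoverable, X_2 and X_3 only mixed, X_4 absent.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 2.0, 0.0]])
print([present(A, j) for j in range(4)])        # [True, True, True, False]
print([recoverable(A, j) for j in range(4)])    # [True, False, False, False]

# Remark 3: left multiplication by an invertible M leaves both properties unchanged.
M = np.array([[2.0, -1.0],
              [0.5, 3.0]])
print([recoverable(M @ A, j) for j in range(4)])  # [True, False, False, False]
```

In this example, row operations reduce $A$ to the rows $(1,0,0,0)$ and $(0,1,1,0)$, which anticipates the canonical form of Lemma 3 below with $r=1$ once the absent $X_4$ is dropped.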

The following property was used in [12, Appendix] for deriving a sufficient condition for equality in a matrix form of the Brunn–Minkowski inequality, which is the analog of the EPI for Rényi entropies of order zero [3].

###### Lemma 3

Reordering the components of $X$ if necessary so that the first $r$ components are recoverable and the last $n-r$ components are unrecoverable, we may always put $A$ in the canonical form

$$A=\begin{pmatrix}I_r & 0\\ 0 & A_u\end{pmatrix} \tag{21}$$

where $A_u$ is an $(m-r)\times(n-r)$ matrix. The number $r$ of recoverable components is the maximum number such that $A$ can be put in the form (21) by left multiplication by an invertible matrix.

###### Proof:

Write $X=\begin{pmatrix}X_r\\ X_u\end{pmatrix}$, where $X_r$ has the $r$ recoverable components and $X_u$ the $n-r$ unrecoverable ones. By Definition 1 (recoverability), there exists an $r\times m$ matrix $B$ such that $BAX=X_r$, that is, $BA=(I_r\ \ 0)$. Since $BA$ must have rank $r$, this shows in particular that $r\le m$: no more than $m$ components can be recovered from the $m$ linear mixtures. Completing $B$ into an $m\times m$ invertible matrix $M$ and using additional row operations, we can bring $MA$ into the desired form (21). Since $M$ is an $m\times m$ invertible matrix, by the change of variable formula in the entropy [1, § 20.9], $h(MAX)=h(AX)+\log|M|$ and $h(MAX^*)=h(AX^*)+\log|M|$. Therefore, the matrix EPI (3) is equivalent to the one obtained by substituting $MA$ for $A$. Clearly, $r$ is maximum in this expression, since otherwise one could recover more than $r$ components, hence transfer some of the components from the $X_u$ block to the $X_r$ block.

We can now settle the equality case in (3).

###### Theorem 1

Equality holds in (3) if and only if all unrecoverable components present in are normal.

###### Proof:

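Write $X=\begin{pmatrix}X_r\\ X_u\end{pmatrix}$ as in the proof of Lemma 3, and accordingly write $X^*=\begin{pmatrix}X_r^*\\ X_u^*\end{pmatrix}$. If $A$ is in canonical form (21), then (3) reads

$$h(X_r)+h(A_uX_u)\ge h(X_r^*)+h(A_uX_u^*) \tag{22}$$

where $h(X_r)=\sum_{i=1}^r h(X_i)=h(X_r^*)$ by (4). The announced condition is, therefore, sufficient: if $X_u$ is normal with (zero-mean) components satisfying (4), then $X_u$ is identically distributed as $X_u^*$ and $h(A_uX_u)=h(A_uX_u^*)$.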

Conversely, suppose that (3) is an equality with $A$ as in (21). From § II-C, we may assume (applying the row operations of a Gram-Schmidt process if necessary) that $A_u$ has orthonormal rows in (21), that is, $AA^t=I$. Then equality holds in (3) if and only if both (11) and (19) are equalities.

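Consider equality in (19), which results from the application of Lemma 2 (inequality (15)) to $\Lambda=T'(X^*)$. We have

$$A\Lambda A^t=\begin{pmatrix}\Lambda_r & 0\\ 0 & A_u\Lambda_uA_u^t\end{pmatrix} \tag{23}$$

where $\Lambda_r=\mathrm{diag}(\lambda_1,\dots,\lambda_r)$ and $\Lambda_u=\mathrm{diag}(\lambda_{r+1},\dots,\lambda_n)$. Thus, we may choose $U$ in the proof of Lemma 2 in the form $U=\begin{pmatrix}I_r&0\\ 0&U_u\end{pmatrix}$, where $U_u$ is an orthogonal matrix such that $U_uA_u\Lambda_uA_u^tU_u^t$ is diagonal. Then $UA$ is still of the form (21), where $U_uA_u$ has orthonormal rows.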

Therefore, equality in (15) is equivalent to equality in (17), where we may again assume that $A$ is of the form (21) with $r$ maximal and $A_u$ having orthonormal rows. By Remark 1, we may assume that all columns of $A$ are nonzero. Notice that any row of $A_u$ in (21) must have at least two nonzero elements. Otherwise, there would be one row of $A_u$ of the form $(0,\dots,0,\pm1,0,\dots,0)$, with the nonzero element in the $j$th position. Since the rows are orthonormal, the other elements in the $j$th column would necessarily equal zero, and the corresponding component $X_j$ of $X$ would be recoverable, which contradicts the maximality of $r$.

Now, since the logarithm is strictly concave, equality holds in (17) if and only if, for all $i$, all the $\lambda_j$ for which $A_{ij}\neq0$ are equal. Because no column of $A_u$ is zero and any row of $A_u$ in (21) has at least two nonzero elements, this implies that for any $j>r$, $\lambda_j$ is equal to another $\lambda_k$, where $k\neq j$ and $k>r$. Since Lemma 2 was applied to $\Lambda=T'(X^*)$, it follows that

$$T_j'(X_j^*)=T_k'(X_k^*)\quad\text{a.e.}\qquad(j\neq k;\ j,k>r). \tag{24}$$

Because $X_j^*$ and $X_k^*$ are independent, this implies that both $T_j'(X_j^*)$ and $T_k'(X_k^*)$ are constant and equal a.e., hence $T_j'=c$ for some constant $c$.² Therefore, $T_j$ is linear and $X_j=T_j(X_j^*)$ is normal for all $j>r$. This completes the proof.³

² This is similar to what appeared in an earlier transportation proof of the EPI [5]. By (12), we necessarily have $c=1$ if we assume that all individual entropies are equal as in § II-B.

³ This implies, in particular, that equality in (19) implies equality in (11). This can also be seen directly: if $T_j'$ is constant for all $j>r$, then for $A$ of the form (21), $AT(A^t\tilde X+A'^t\tilde X')$ in (11) is independent of $\tilde X'$.

## IV Extension to Complex Matrix and Variables

A complex random variable $X$ can always be viewed as a two-dimensional real random vector $\hat X=(\mathrm{Re}\,X,\mathrm{Im}\,X)^t$. Therefore, by the vector form of the EPI [2, 3, 4], (1) holds for real coefficients $a_1,a_2$ when $X_1,X_2$ are independent complex random variables and $X_1^*,X_2^*$ are independent white normal random vectors satisfying (2). Here “white normal” amounts to saying that $X^*$ is proper normal, or circularly symmetric normal [13] (c-normal in short): $E\{X^{*2}\}=0$, that is, $\mathrm{Re}\,X^*$ and $\mathrm{Im}\,X^*$ are uncorrelated with equal variances.

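That (1) also holds for complex coefficients is less known but straightforward. To see this, define⁴ $\hat a=\begin{pmatrix}\mathrm{Re}\,a & -\mathrm{Im}\,a\\ \mathrm{Im}\,a & \mathrm{Re}\,a\end{pmatrix}$ for any $a\in\mathbb C$, so that $\widehat{aX}=\hat a\hat X$. Then $|\hat a|=|a|^2$ and $h(aX)=h(X)+\log|\hat a|=h(X)+2\log|a|$. Hence (2) implies $h(a_1X_1)=h(a_1X_1^*)$ and $h(a_2X_2)=h(a_2X_2^*)$. In addition, since $\hat a\hat a^t=|a|^2I$, if $X^*$ is c-normal then $aX^*$ is also c-normal. Therefore, by the vector EPI applied to $a_1X_1$ and $a_2X_2$, we see that (1) holds for complex coefficients when $X_1^*,X_2^*$ are independent c-normal variables satisfying (2).

⁴ There is an ambiguity of notation easily resolved from the context: $\hat a$ is a $2\times2$ matrix when $a$ is a constant, and $\hat X$ is a vector when $X$ is random.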

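The extension of the matrix EPI (3) to complex $A$ and $X$ is more involved. We need the following notions (see, e.g., [14] and [15, chap. 10]). Define $\hat X$ by stacking the $\hat X_i$ for each component $X_i$ of $X$, and define $\hat A$ as the $2m\times2n$ real matrix with $2\times2$ block entries $\hat A_{ij}$, where $A_{ij}$ are the complex entries of $A$. It is easily checked that $\widehat{AX}=\hat A\hat X$, $\widehat{AB}=\hat A\hat B$, $\widehat{A^H}=\hat A^t$, where $A^H$ is the conjugate transpose, and, for square $A$, $|\hat A|=|A|^2$, where $|A|$ denotes the modulus of the determinant of $A$.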
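These identities are easy to confirm numerically. The following NumPy sketch builds $\hat A$ by interleaving real and imaginary parts (the helper names `hat` and `hat_vec` are ours) and checks each property on random complex matrices; it verifies instances, not the general statements.

```python
import numpy as np

def hat(A):
    """Real 2m x 2n representation: entry a -> block [[Re a, -Im a], [Im a, Re a]]."""
    A = np.atleast_2d(np.asarray(A, dtype=complex))
    m, n = A.shape
    H = np.zeros((2 * m, 2 * n))
    H[0::2, 0::2] = A.real
    H[0::2, 1::2] = -A.imag
    H[1::2, 0::2] = A.imag
    H[1::2, 1::2] = A.real
    return H

def hat_vec(x):
    """Stack (Re x_i, Im x_i) for each component x_i."""
    x = np.asarray(x, dtype=complex)
    return np.column_stack([x.real, x.imag]).ravel()

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.allclose(hat_vec(A @ x), hat(A) @ hat_vec(x)))               # (AX)^ = A^ X^
print(np.allclose(hat(A @ B), hat(A) @ hat(B)))                        # (AB)^ = A^ B^
print(np.allclose(hat(A.conj().T), hat(A).T))                          # (A^H)^ = (A^)^t
print(np.isclose(np.linalg.det(hat(B)), abs(np.linalg.det(B)) ** 2))   # |B^| = |B|^2
```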

We also need the following extension of Lemma 1:

###### Lemma 4 (2D Brenier Map [16, 17])

Let $X^*$ be a (white) normal random vector in $\mathbb R^2$. For any given continuous density $f$ over $\mathbb R^2$, there exists a differentiable transformation $T:\mathbb R^2\to\mathbb R^2$ with symmetric positive definite Jacobian (noted $T'>0$) such that $T(X^*)$ has density $f$.
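Lemma 4 is an existence result, and the Brenier map rarely has a closed form. One case where it does, used here only as an illustration, is a normal target $\mathcal N(0,K)$: the map $T(x)=K^{1/2}x$ is the gradient of the convex function $\frac12x^tK^{1/2}x$, so its Jacobian $K^{1/2}$ is symmetric positive definite, as Lemma 4 requires. A NumPy sketch with an arbitrary $K$:

```python
import numpy as np

def sym_sqrt(K):
    """Symmetric positive definite square root of an SPD matrix (eigendecomposition)."""
    w, V = np.linalg.eigh(K)
    return V @ np.diag(np.sqrt(w)) @ V.T

K = np.array([[2.0, 0.6],
              [0.6, 0.5]])          # hypothetical target covariance (SPD)
S = sym_sqrt(K)                     # constant Jacobian of T(x) = S x

rng = np.random.default_rng(6)
X_star = rng.standard_normal((100_000, 2))   # each row is a draw of white normal X*
Y = X_star @ S.T                              # rows are T(X*) = S X*

print(np.allclose(S, S.T), np.all(np.linalg.eigvalsh(S) > 0))  # symmetric PD Jacobian
print(np.cov(Y, rowvar=False))                                  # approximately K
```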

Courtade et al. [18] noted that the Brenier map can be used in the transportation proof of [5] to prove Shannon’s vector EPI. We find it also convenient to prove the complex matrix EPI:

###### Theorem 2

The matrix EPI (3) holds for any complex $m\times n$ matrix $A$ and any random vector $X$ of independent complex components $X_1,\dots,X_n$, where $X^*$ is a c-normal vector with independent components satisfying (4). If equality holds in (3), then all unrecoverable components present in $AX$ (in the sense of Definition 1) are normal.

The exact necessary and sufficient condition for equality is more involved and will appear elsewhere.

###### Proof:

We sketch the proof by going through the above proofs in Sections II and III and pointing out the differences:

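§II-A: The scaling property of entropy now reads $h(aX)=h(X)+2\log|a|$ and, more generally, $h(AX)=h(X)+\log|\hat A|=h(X)+2\log|A|$.

§II-B: Since $h(X^*)=\log(\pi e\sigma^2)$ for a c-normal $X^*$ with variance $\sigma^2=E|X^*|^2$, independent c-normal $X_i^*$ with equal entropies are i.i.d.

§II-C: The Gram-Schmidt orthonormalization takes place in $\mathbb C^n$, with $AA^H=I$.

§II-D: $\begin{pmatrix}A\\ A'\end{pmatrix}$ is now an $n\times n$ unitary matrix. Recall that a circularly symmetric $X^*$ is such that $e^{i\theta}X^*$ has the same distribution as $X^*$ for any $\theta$. Since $X^*$ is i.i.d. c-normal, $\begin{pmatrix}\tilde X\\ \tilde X'\end{pmatrix}$ is also i.i.d. c-normal, and the inverse transformation is the conjugate transpose $(A^H\ \ A'^H)$.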

§II-E: Lemma 4 replaces Lemma 1, and (8) becomes

$$T'(\hat X^*)=\mathrm{diag}\bigl(T_1'(\hat X_1^*),\dots,T_n'(\hat X_n^*)\bigr) \tag{25}$$

in block-diagonal form, where each $2\times2$ block is symmetric positive definite.

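§II-G: In terms of the hat variables, (12) becomes

$$E\log|T_j'(\hat X_j^*)|=0\qquad(j=1,2,\dots,n) \tag{26}$$

where $|\cdot|$ denotes the absolute value of the determinant, and (13)–(14) become

$$h\bigl(\hat AT(\hat A^t\hat{\tilde X}+\hat A'^t\hat{\tilde X}')\mid\hat{\tilde X}'\bigr)=h(\tilde X)+E\log\bigl|\hat AT'(\hat X^*)\hat A^t\bigr|. \tag{27}$$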

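§II-H: We show that Lemma 2 still holds when $\Lambda$ is block-diagonal with $2\times2$ symmetric positive definite diagonal blocks $\lambda_j$. Write

$$\lambda_j=\hat u_jd_j\hat u_j^t \tag{28}$$

where $d_j$ is diagonal with positive diagonal elements and $\hat u_j$ is a rotation matrix, corresponding to a complex unit $u_j$ ($|u_j|=1$). Then the block-diagonal matrix $\hat U=\mathrm{diag}(\hat u_1,\dots,\hat u_n)$ is orthogonal and $D=\hat U^t\Lambda\hat U$ is diagonal. We can now apply Lemma 2 to $\hat A\hat U$ (which has orthonormal rows) and $D$: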

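$$\begin{aligned}\log|\hat A\Lambda\hat A^t| &\ge\mathrm{tr}\bigl(\hat A\hat U[\log D]\hat U^t\hat A^t\bigr) &&(29)\\ &=\mathrm{tr}\bigl(\hat A[\log\Lambda]\hat A^t\bigr) &&(30)\end{aligned}$$

where $\log\Lambda=\hat U[\log D]\hat U^t$ is the (block-diagonal) logarithm of $\Lambda$. Thus

$$\begin{aligned}\mathrm{tr}\bigl(\hat A[\log\Lambda]\hat A^t\bigr) &=\sum_i\mathrm{tr}\Bigl(\sum_j\hat A_{ij}[\log\lambda_j]\hat A_{ij}^t\Bigr) &&(31)\\ &=\sum_i\sum_j|A_{ij}|^2\,\mathrm{tr}(\log\lambda_j) &&(32)\end{aligned}$$

where we used $\hat A_{ij}^t\hat A_{ij}=|A_{ij}|^2I$, and $\mathrm{tr}(\log\lambda_j)=\log|\lambda_j|$ since $\lambda_j$ is symmetric positive definite. Thus we obtain

$$E\log\bigl|\hat AT'(\hat X^*)\hat A^t\bigr|\ge\sum_i\sum_j|A_{ij}|^2\,E\log|T_j'(\hat X_j^*)|=0 \tag{33}$$

which is the final step to prove the (complex) matrix EPI (3).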
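The two ingredients of § II-H, the rotation factorization (28) and the block version of Lemma 2 behind (33) (written here for a deterministic $\Lambda$), can be checked numerically. In the sketch below, which is ours, `rotation_factorization` flips one eigenvector when `numpy.linalg.eigh` returns a reflection, so that $\hat u_j$ is a rotation, and the loop compares $\log|\hat A\Lambda\hat A^t|$ with $\sum_{i,j}|A_{ij}|^2\log|\lambda_j|$ for random complex $A$ with $AA^H=I$ and random SPD blocks; $\hat A$ is built block by block from the real and imaginary parts.

```python
import numpy as np

def hat(A):
    """Real representation: entry a -> 2x2 block [[Re a, -Im a], [Im a, Re a]]."""
    A = np.atleast_2d(np.asarray(A, dtype=complex))
    m, n = A.shape
    H = np.zeros((2 * m, 2 * n))
    H[0::2, 0::2], H[0::2, 1::2] = A.real, -A.imag
    H[1::2, 0::2], H[1::2, 1::2] = A.imag, A.real
    return H

def block_diag(blocks):
    """Block-diagonal matrix from a list of square blocks."""
    size = sum(blk.shape[0] for blk in blocks)
    out = np.zeros((size, size))
    k = 0
    for blk in blocks:
        s = blk.shape[0]
        out[k:k + s, k:k + s] = blk
        k += s
    return out

def rotation_factorization(lam):
    """lam = u d u^t with d diagonal positive and u a rotation (det u = +1), as in (28).
    Also returns the complex unit u_c with u = [[Re u_c, -Im u_c], [Im u_c, Re u_c]]."""
    w, V = np.linalg.eigh(lam)
    if np.linalg.det(V) < 0:          # eigh may return a reflection: flip one eigenvector
        V[:, 1] = -V[:, 1]
    return V, np.diag(w), V[0, 0] + 1j * V[1, 0]

rng = np.random.default_rng(7)
m, n = 2, 4
gaps = []
for _ in range(500):
    Z = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    A = np.linalg.qr(Z)[0].conj().T                   # complex m x n with A A^H = I
    blocks = []
    for _ in range(n):
        G = rng.standard_normal((2, 2))
        blocks.append(G @ G.T + 0.1 * np.eye(2))      # random SPD blocks lambda_j
    Ah = hat(A)
    lhs = np.linalg.slogdet(Ah @ block_diag(blocks) @ Ah.T)[1]
    rhs = sum(abs(A[i, j]) ** 2 * np.linalg.slogdet(blocks[j])[1]
              for i in range(m) for j in range(n))
    gaps.append(lhs - rhs)
print(min(gaps))                                      # nonnegative up to rounding

u, d, u_c = rotation_factorization(blocks[0])
print(np.allclose(u @ d @ u.T, blocks[0]), abs(u_c))  # factorization (28), unit modulus
```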

Assume that equality holds in (3), as in the converse part of the proof of Theorem 1 (Section III). That proof is unchanged up to the point where one considers the equality condition in Lemma 2 applied to $\hat A\hat U$ and the diagonal matrix $D$, that is, in (29). By the strict concavity of the logarithm, equality holds in (29) if and only if, for any two nonzero elements in the same row of $\hat A\hat U$, the corresponding two diagonal elements of $D$ are equal. Since $\hat A_{ij}\hat u_j=\widehat{A_{ij}u_j}$, the nonzero $2\times2$ blocks of $\hat A\hat U$ are at the same places as the nonzero entries of $A$, where $A$ is of the form (21). Therefore, due to the structure of $A$, for any $j>r$, the two diagonal elements of $d_j$ are equal to the two diagonal elements of another $d_k$, where $k\neq j$ and $k>r$. This gives (24), from which one concludes as before that for all $j>r$, $T_j$ is linear and, therefore, $X_j$ is normal.

## V Application to Blind Source Extraction

The theoretical setting of the blind source extraction problem is as follows [9]. We are given $n$ (zero-mean) independent (real or complex) “sources” $X_1,\dots,X_n$, which are mixed using an invertible $n\times n$ (real or complex) matrix $B$, resulting in the observation $Y=BX$. The covariance matrix $K_Y$ of $Y$ can be estimated, but both $B$ and $X$ are unknown. Since one can introduce arbitrary scaling factors in $B$ and $X$ for the same observation $Y$, we can assume an arbitrary normalization of the sources. For convenience, we assume here that they have the same entropies:

$$h(X_1)=h(X_2)=\dots=h(X_n). \tag{34}$$

Blind source extraction (or partial BSS) of $m$ sources ($m\le n$) aims at finding a (full rank) $m\times n$ matrix $W$ such that $WY$ is composed of $m$ (out of $n$) original sources, up to order and scaling. In other words, $A=WB$ should have exactly one nonzero element per row.

###### Definition 2 (Contrast function [9])

A contrast is a function $C(W)$ that is invariant to permutation and scaling of the rows of $W$, and that achieves its minimum if and only if $WB$ has one nonzero element per row.

###### Theorem 3

Assume that at most one source is normal. Then

$$C(W)=\sum_{i=1}^m h(w_iY)-\frac12\log|WK_YW^t| \tag{35}$$

where $w_1,\dots,w_m$ are the rows of $W$, is a contrast function.

Such a contrast function was first proposed by Pham [19] (see also [20]) in the real case, with a different proof that uses the classical EPI and Hadamard’s inequality. It is particularly interesting to rewrite it in terms of the matrix EPI:

###### Proof:

The real and complex cases being similar, we prove the result in the real case. Let $A=WB$ and let $X^*$ be as in (3). For i.i.d. components, we can rewrite [6, Eq. (13)] as $h(AX^*)=m\bar h+\frac12\log|AA^t|$, where $\bar h$ is the common value of (34). Since $WY=AX$ and $WK_YW^t=AK_XA^t$, up to an additive constant we may decompose $C(W)$ as

$$C(W)=C_h(W)+C_i(W)+\text{Cst.} \tag{36}$$

where, with $Z=WY=AX$,

$$\begin{aligned}C_h(W)&=h(AX)-h(AX^*)\ge0 &&(37)\\ C_i(W)&=\sum_i h(Z_i)-h(Z)\ge0. &&(38)\end{aligned}$$

The term $C_i(W)$ is minimum (zero) if and only if the components of $Z$ are independent.

The term $C_h(W)$ is minimum (zero) if and only if equality holds in the matrix EPI (3). Since at most one source is normal, at most one source present in $AX$ can be unrecoverable. But if one (normal) source were not recoverable, the canonical form (21) would imply that at most one column of $A_u$ is nonzero, which contradicts the maximality of $r$ in Lemma 3. Therefore, $r=m$ and the canonical form of $A$ becomes $(I_m\ \ 0)$.

With the additional constraint that the components of $Z$ are independent, it follows from the Darmois–Skitovich theorem [21] (see [14] for the complex case) that $A$ has exactly one nonzero element per row.
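For $n=2$ and $m=1$ the contrast (35) can be evaluated in closed form, which gives a small illustration of Theorem 3. Assume, hypothetically, $B=I$ and two independent sources uniform on unit-length intervals, so that (34) holds with $h(X_i)=0$ and no source is normal. For $w=(\cos\theta,\sin\theta)$, let $t$ be the ratio of the smaller to the larger of $|\cos\theta|,|\sin\theta|$; the trapezoidal density of $wY$ then gives $C(w)=\frac t2-\frac12\log(1+t^2)+\frac12\log12$, which is minimal exactly at $t=0$, i.e., when $w$ extracts one source. The sketch scans $\theta$:

```python
import math
import numpy as np

def contrast_two_uniforms(theta):
    """C(w) of (35) for m = 1, n = 2, under the illustrative assumptions B = I and
    two independent sources uniform on unit-length intervals (variance 1/12).
    The row is w = (cos theta, sin theta), so that A = wB = w."""
    a1, a2 = math.cos(theta), math.sin(theta)
    big, small = max(abs(a1), abs(a2)), min(abs(a1), abs(a2))
    h_wY = math.log(big) + small / (2.0 * big)     # entropy of the trapezoidal density
    var_wY = (a1 ** 2 + a2 ** 2) / 12.0             # w K_Y w^t
    return h_wY - 0.5 * math.log(var_wY)

thetas = np.linspace(0.0, math.pi, 3601)            # step of 0.05 degree
values = np.array([contrast_two_uniforms(t) for t in thetas])
minimizers = thetas[values < values.min() + 1e-9]
print(values.min(), 0.5 * math.log(12.0))           # minimum equals log(12)/2
print(minimizers)                                   # 0, pi/2 and pi: pure extraction
```

Because $C$ depends on $w$ only through $t$, the sketch also reflects the invariance of a contrast to scaling of the rows of $W$.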

Interestingly, the contrast function in the form (36) represents a transition between two well-known extreme cases:

• $m=1$, for which $C_i(W)=0$ and each source is extracted one by one using the classical EPI (minimize $C_h(W)$);

• $m=n$, for which $C_h(W)=0$ and all $n$ sources are separated simultaneously; we are then reduced to an independent component analysis (ICA) problem [21, 14] in which the multivariate “mutual information” $C_i(W)$ is minimized.

## References

• [1] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 623–656, Oct. 1948.
• [2] E. H. Lieb, “Proof of an entropy conjecture of Wehrl,” Commun. Math. Phys., vol. 62, pp. 35–41, 1978.
• [3] A. Dembo, T. M. Cover, and J. A. Thomas, “Information theoretic inequalities,” IEEE Trans. Inf. Theory, vol. 37, no. 6, pp. 1501–1518, Nov. 1991.
• [4] O. Rioul, “Information theoretic proofs of entropy power inequalities,” IEEE Trans. Inf. Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.
• [5] ——, “Yet another proof of the entropy power inequality,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3595–3599, Jun. 2017.
• [6] R. Zamir and M. Feder, “A generalization of the entropy power inequality with applications,” IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 1723–1728, Sep. 1993.
• [7] ——, “A generalization of information theoretic inequalities to linear transformations of independent vector,” in Proc. Sixth Joint Swedish-Russian International Workshop on Information Theory, Mölle, Sweden, Aug. 1993, pp. 254–258.
• [8] D. Guo, S. Shamai (Shitz), and S. Verdú, “Proof of entropy power inequalities via MMSE,” in Proc. IEEE Int. Symp. Information Theory, Seattle, USA, Jul. 2006, pp. 1011–1015.
• [9] F. Vrins, Contrast properties of entropic criteria for blind source separation: A unifying framework based on information-theoretic inequalities. Louvain University Press (UCL), Mar. 2007.
• [10] J.-F. Cardoso, “An efficient technique for the blind separation of complex sources,” in Proc. IEEE Signal Proc. Workshop on Higher-Order Statistics, South Lake Tahoe, CA, Jun. 1993, pp. 275–279.
• [11] O. Rioul, “Optimal transportation to the entropy-power inequality,” in IEEE Information Theory and Applications Workshop (ITA 2017), San Diego, USA, Feb. 2017.
• [12] R. Zamir and M. Feder, “On the volume of the Minkowski sum of line sets and the entropy-power inequality,” IEEE Trans. Inf. Theory, vol. 44, no. 7, pp. 3039–3043, Nov. 1998.
• [13] B. Picinbono, “On circularity,” IEEE Transactions on Signal Processing, vol. 42, no. 12, pp. 3473–3482, Dec. 1994.
• [14] J. Eriksson and V. Koivunen, “Complex random vectors and ICA models: identifiability, uniqueness, and separability,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1017–1029, Mar. 2006.
• [15] O. Rioul, Théorie des probabilités [in French]. London, UK: Hermes Science - Lavoisier, 2008.
• [16] Y. Brenier, “Polar factorization and monotone rearrangement of vector-valued functions,” Commun. Pure Appl. Math., vol. 44, no. 4, pp. 375–417, Jun. 1991.
• [17] R. J. McCann, “Existence and uniqueness of monotone measure-preserving maps,” Duke Math. J., vol. 80, no. 2, pp. 309–324, 1995.
• [18] T. A. Courtade, M. Fathi, and A. Pananjady, “Quantitative stability of the entropy power inequality,” IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5691–5703, Aug. 2018.
• [19] D.-T. Pham, “Blind partial separation of instantaneous mixtures of sources,” in Proc. 6th International Conference on Independent Component Analysis (ICA). Charleston, SC, USA: Springer, March 5–8 2006, pp. 37–42.
• [20] S. Cruces, A. Cichocki, and S. Amari, “The minimum entropy and cumulants based contrast functions for blind source extraction,” in 6th International Work-Conference on Artificial Neural Networks (IWANN). Granada, Spain: Springer, 2001, pp. 786–793.
• [21] P. Comon, “Independent component analysis, a new concept?” Signal Processing, vol. 36, pp. 287–314, 1994.