# Characterization of Conditional Independence and Weak Realizations of Multivariate Gaussian Random Variables: Applications to Networks

The Gray and Wyner lossy source coding for a simple network, for sources that generate a tuple of jointly Gaussian random variables (RVs) $X_1:\Omega\to\mathbb{R}^{p_1}$ and $X_2:\Omega\to\mathbb{R}^{p_2}$, with respect to square-error distortion at the two decoders, is re-examined using (1) Hotelling's geometric approach to Gaussian RVs (the canonical variable form), and (2) van Putten's and van Schuppen's parametrization of joint distributions $P_{X_1,X_2,W}$ by Gaussian RVs $W:\Omega\to\mathbb{R}^n$ which make $(X_1,X_2)$ conditionally independent, and the weak stochastic realization of $(X_1,X_2)$. Item (2) is used to parametrize the lossy rate region of the Gray and Wyner source coding problem for joint decoding with mean-square error distortions $E\{\|X_i-\hat{X}_i\|_{\mathbb{R}^{p_i}}^2\}\le\Delta_i\in[0,\infty]$, $i=1,2$, by the covariance matrix of the RV $W$. It then follows that Wyner's common information $C_W(X_1,X_2)$ (information definition) is achieved by $W$ with identity covariance matrix, while a formula for Wyner's lossy common information (operational definition) is derived, given by $C_{WL}(X_1,X_2)=C_W(X_1,X_2)=\frac{1}{2}\sum_{j=1}^n\ln\big(\frac{1+d_j}{1-d_j}\big)$, for the distortion region $0\le\Delta_1\le\sum_{j=1}^n(1-d_j)$, $0\le\Delta_2\le\sum_{j=1}^n(1-d_j)$, where $1>d_1\ge d_2\ge\ldots\ge d_n>0$ are the canonical correlation coefficients computed from the canonical variable form of the tuple $(X_1,X_2)$. The methods are of fundamental importance to other problems of multi-user communication, where conditional independence is imposed as a constraint.


## I Introduction, Main Concepts, Literature, Main Results

In information theory and communications an important class of theoretical and practical problems is of a multi-user nature, such as lossless and lossy network source coding for data compression over noiseless channels, network channel coding for data transmission over noisy channels [1], and secure communication [2]. A sub-class of network source coding problems deals with two sources that generate, at each time instant, symbols that are stationary memoryless, multivariate, and jointly Gaussian distributed, and similarly for network channel coding problems, i.e., Gaussian multiple access channels (MAC) with two or more multivariate correlated sources and a multivariate output.

In this paper we show the relevance of three fundamental concepts of statistics and probability, found in the report by Charalambous and van Schuppen [3], to the network problems discussed above that involve a tuple of multivariate, jointly Gaussian random variables (RVs) that are independent and identically distributed, $(X_1^N,X_2^N)\triangleq\{(X_{1,i},X_{2,i})\}_{i=1}^N$,

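$$\begin{align}
&X_{1,i}:\Omega\to\mathbb{R}^{p_1}=\mathbb{X}_1,\quad X_{2,i}:\Omega\to\mathbb{R}^{p_2}=\mathbb{X}_2,\quad \forall i, \tag{1}\\
&P_{X_{1,i},X_{2,i}}=P_{X_1,X_2}\ \text{jointly Gaussian, and}\ (X_{1,i},X_{2,i})\ \text{indep. of}\ (X_{1,j},X_{2,j}),\ \forall i\neq j. \tag{2}
\end{align}$$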

We illustrate their application to the calculation of rates that lie in the Gray and Wyner rate region [4] of the simple network shown in Fig. 1, with respect to the average square-error distortions at the two decoders

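$$\begin{align}
&E\big\{D_{X_i}(X_i^N,\hat{X}_i^N)\big\}\le\Delta_i,\quad \Delta_i\in[0,\infty],\quad i=1,2, \tag{3}\\
&D_{X_i}(x_i^N,\hat{x}_i^N)\triangleq\frac{1}{N}\sum_{j=1}^N\|x_{i,j}-\hat{x}_{i,j}\|_{\mathbb{R}^{p_i}}^2,\quad i=1,2, \tag{4}
\end{align}$$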

where $\|\cdot\|_{\mathbb{R}^{p_i}}^2$ are Euclidean distances on $\mathbb{R}^{p_i}$, $i=1,2$.
The rest of this section and the remainder of the paper are organized as follows.
In Section I-A we introduce the three concepts, which are further described in Charalambous and van Schuppen [3]. In Sections I-B and I-C we recall the Gray and Wyner characterization of the rate region [4], and the characterization of the minimum lossy common message rate on the Gray and Wyner rate region due to Viswanatha, Akyol and Rose [5], and Xu, Liu, and Chen [6]. In Section II we present our main results in the form of theorems. In Section III we give the proofs of the main theorems, citing [3] where necessary.

### I-A Three Concepts of Statistics and Probability

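Notation. An $\mathbb{R}^n$-valued Gaussian RV, denoted by $X\in G(m_X,Q_X)$, with as parameters the mean value $m_X\in\mathbb{R}^n$ and the variance $Q_X\in\mathbb{R}^{n\times n}$, $Q_X=Q_X^T\ge 0$, is a function $X:\Omega\to\mathbb{R}^n$ which is a RV and such that the measure of this RV equals a Gaussian measure described by its characteristic function. This definition includes $Q_X=0$.
The effective dimension of the RV is denoted by $\dim(X)=\mathrm{rank}(Q_X)$. An $n\times n$ identity matrix is denoted by $I_n$.
A tuple of Gaussian RVs will be denoted by $(X_1,X_2)$ to save space, rather than by the column vector

$$\begin{pmatrix}X_1\\ X_2\end{pmatrix}.$$

Then the variance matrix of this tuple is denoted by

$$(X_1,X_2)\in G(0,Q_{(X_1,X_2)}),\qquad Q_{(X_1,X_2)}=\begin{pmatrix}Q_{X_1} & Q_{X_1,X_2}\\ Q_{X_1,X_2}^T & Q_{X_2}\end{pmatrix}\in\mathbb{R}^{(p_1+p_2)\times(p_1+p_2)}.$$

The variance $Q_{(X_1,X_2)}$ of the tuple is distinguished from the covariance $Q_{X_1,X_2}$.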

The first concept is Hotelling's [7] geometric approach to Gaussian RVs [8, 9], where the underlying geometric object of a Gaussian RV $X:\Omega\to\mathbb{R}^n$ is the $\sigma$-algebra $\mathcal{F}^{X}$ generated by $X$. A basis transformation of such a RV is then the transformation defined by a non-singular matrix $S\in\mathbb{R}^{n\times n}$, and it then directly follows that $\mathcal{F}^{X}=\mathcal{F}^{SX}$. For the tuple of jointly Gaussian multivariate RVs $(X_1,X_2)$, a basis transformation of this tuple consists of a matrix $S$ composed of two square and non-singular matrices $(S_1,S_2)$ (see [3, Algorithm 2.10]),

$$\begin{align}
&S\triangleq\text{Block-diag}(S_1,S_2),\quad X_1^c\triangleq S_1X_1,\quad X_2^c\triangleq S_2X_2, \tag{5}\\
&\mathcal{F}^{X_1}=\mathcal{F}^{S_1X_1},\quad \mathcal{F}^{X_2}=\mathcal{F}^{S_2X_2}. \tag{6}
\end{align}$$

The transformation $S$ maps $(X_1,X_2)$ into the so-called canonical variable form of the tuple of RVs (the full specification is given in [3, Section 2.2, Definition 2.2]), which identifies identical, correlated, and private information, as interpreted in the table below,

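| Component | Interpretation |
|---|---|
| $X_{11}^c=X_{21}^c$ a.s. | identical information of $X_1^c$ and $X_2^c$ |
| $X_{12}^c$ | correlated information of $X_1^c$ w.r.t. $X_2^c$ |
| $X_{13}^c$ | private information of $X_1^c$ w.r.t. $X_2^c$ |
| $X_{21}^c=X_{11}^c$ a.s. | identical information of $X_1^c$ and $X_2^c$ |
| $X_{22}^c$ | correlated information of $X_2^c$ w.r.t. $X_1^c$ |
| $X_{23}^c$ | private information of $X_2^c$ w.r.t. $X_1^c$ |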

where

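$$\begin{align}
&X_{ij}^c:\Omega\to\mathbb{R}^{p_{ij}},\quad i=1,2,\ j=1,2,3, \tag{7}\\
&p_{11}=p_{21},\quad p_{12}=p_{22}=n, \tag{8}\\
&p_1=p_{11}+p_{12}+p_{13},\quad p_2=p_{21}+p_{22}+p_{23}, \tag{9}\\
&S_1X_1=(X_{11}^c,X_{12}^c,X_{13}^c),\quad S_2X_2=(X_{21}^c,X_{22}^c,X_{23}^c), \tag{10}\\
&X_{11}^c=X_{21}^c\ \text{a.s.},\quad X_{11}^c,X_{21}^c\in G(0,I_{p_{11}}), \tag{11}\\
&X_{13}^c\in G(0,I_{p_{13}})\ \text{and}\ X_{23}^c\in G(0,I_{p_{23}})\ \text{are independent}, \tag{12}\\
&X_{12}^c\in G(0,I_{p_{12}})\ \text{and}\ X_{22}^c\in G(0,I_{p_{22}})\ \text{are correlated}, \tag{13}\\
&E\big[X_{12}^c(X_{22}^c)^T\big]=D=\mathrm{Diag}(d_1,\ldots,d_{p_{12}}),\quad d_i\in(0,1)\ \forall i. \tag{14}
\end{align}$$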

The entries of $D$ are called the canonical correlation coefficients. For $X_{11}^c=X_{21}^c$ a.s. the term identical information is used. The linear transformation $(X_1,X_2)\mapsto(S_1X_1,S_2X_2)$ is equivalent to a pre-processing of $(X_1,X_2)$ by a linear pre-encoder (see [3] for applications to network problems).
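As a rough numerical illustration of (5) and (14), the sketch below computes canonical correlation coefficients with the standard whitening-plus-SVD construction. This is a textbook route and is not claimed to be the algorithm of [3]; it also does not separate the identical ($d_i=1$) or private ($d_i=0$) components. The function names and the covariance blocks `Q1`, `Q12`, `Q2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def inv_sqrt(Q):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(Q)
    return V @ np.diag(w ** -0.5) @ V.T

def canonical_form(Q1, Q12, Q2):
    """Return (d, S1, S2) with S1 Q1 S1^T = I, S2 Q2 S2^T = I and
    S1 Q12 S2^T diagonal with the canonical correlations d on its diagonal."""
    R1, R2 = inv_sqrt(Q1), inv_sqrt(Q2)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, d, Vt = np.linalg.svd(R1 @ Q12 @ R2)
    return d, U.T @ R1, Vt @ R2

# Hypothetical covariance blocks of (X1, X2), for illustration only.
Q1 = np.array([[2.0, 0.5], [0.5, 1.0]])
Q2 = np.array([[1.0, 0.2], [0.2, 1.5]])
Q12 = np.array([[0.6, 0.1], [0.2, 0.3]])

d, S1, S2 = canonical_form(Q1, Q12, Q2)
print("canonical correlation coefficients:", d)
```

The matrices `S1` and `S2` play the role of the linear pre-encoders: after the transformation, each component has identity variance and the cross-covariance is diagonal.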

The expression of mutual information between $X_1$ and $X_2$, denoted by $I(X_1;X_2)$, as a function of the canonical correlation coefficients, discussed in [10], is given in Theorem II.1.

The second concept is van Putten's and van Schuppen's [11] parametrization of the family of all jointly Gaussian probability distributions $P_{X_1,X_2,W}$ by an auxiliary Gaussian RV $W:\Omega\to\mathbb{R}^n$ that makes $X_1$ and $X_2$ conditionally independent, defined by

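$$\mathbf{P}^{CIG}\triangleq\Big\{P_{X_1,X_2,W}\ \Big|\ P_{X_1,X_2|W}=P_{X_1|W}P_{X_2|W},\ \text{the}\ \mathbb{X}_1\times\mathbb{X}_2\text{-marginal dist. of}\ P_{X_1,X_2,W}\ \text{is the fixed dist.}\ P_{X_1,X_2},\ \text{and}\ (X_1,X_2,W)\ \text{is jointly Gaussian}\Big\} \tag{15}$$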

and the subset of $\mathbf{P}^{CIG}$ obtained under the additional constraint that the dimension of the RV $W$ is minimal while all other conditions hold. The parametrization is in terms of a set of matrices. Consequences are found in [3, Section 2.3].

The third concept is the weak stochastic realization of RVs $(X_1,X_2,W)$ that induces distributions in the set $\mathbf{P}^{CIG}$ and in its minimal-dimension subset (see [11, Def. 2.17 and Prop. 2.18] and [3, Def. 2.17 and Prop. 2.18]).

Theorem II.2 (our main theorem) gives as a special case (part (d)) an achievable lower bound on Wyner's single-letter information-theoretic characterization of common information,

$$C_W(X_1,X_2)\triangleq\inf_{P_{X_1,X_2,W}:\ P_{X_1,X_2|W}=P_{X_1|W}P_{X_2|W}}I(X_1,X_2;W), \tag{16}$$

and the weak stochastic realization of RVs $(X_1,X_2,W)$ that induce distributions in the set $\mathbf{P}^{CIG}$ and in its minimal-dimension subset.

### I-B The Gray and Wyner Lossy Rate Region

We now describe our results with respect to the fundamental question posed by Gray and Wyner [4] for the simple network shown in Fig. 1, which is: determine which channel capacity triples $(C_0,C_1,C_2)$ are necessary and sufficient for each sequence $(X_1^N,X_2^N)$ to be reliably reproduced at the intended decoders, while satisfying the average distortions with respect to single-letter distortion functions $D_{X_i}(x_i,\hat{x}_i)$, $i=1,2$. Gray and Wyner characterized the operational rate region, denoted by $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, by a coding scheme that uses the auxiliary RV $W:\Omega\to\mathbb{W}$, as described below. Define the family of probability distributions

$$\mathcal{P}\triangleq\Big\{P_{X_1,X_2,W}(x_1,x_2,w),\ x_1\in\mathbb{X}_1,\ x_2\in\mathbb{X}_2,\ w\in\mathbb{W}\ \Big|\ P_{X_1,X_2,W}(x_1,x_2,\infty)=P_{X_1,X_2}(x_1,x_2)\Big\}$$

for some auxiliary random variable $W$.

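Theorem 8 in [4]: Let $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$ denote the Gray and Wyner rate region. Suppose there exists $\hat{x}_i$ such that $E\{D_{X_i}(X_i,\hat{x}_i)\}<\infty$, $i=1,2$. For each $P_{X_1,X_2,W}\in\mathcal{P}$ and $\Delta_1\ge 0$, $\Delta_2\ge 0$, define the subset of Euclidean 3-dimensional space

$$\mathcal{R}_{GW}^{P_{X_1,X_2,W}}(\Delta_1,\Delta_2)=\Big\{(R_0,R_1,R_2):\ R_0\ge I(X_1,X_2;W),\ R_1\ge R_{X_1|W}(\Delta_1),\ R_2\ge R_{X_2|W}(\Delta_2)\Big\} \tag{17}$$

where $R_{X_i|W}(\Delta_i)$ is the rate distortion function (RDF) of $X_i$, conditioned on $W$, at decoder $i$, $i=1,2$, and $R_{X_1,X_2}(\Delta_1,\Delta_2)$ is the joint RDF of joint decoding of $(X_1,X_2)$. Let

$$\mathcal{R}_{GW}^{*}(\Delta_1,\Delta_2)\triangleq\Big(\bigcup_{P_{X_1,X_2,W}\in\mathcal{P}}\mathcal{R}_{GW}^{P_{X_1,X_2,W}}(\Delta_1,\Delta_2)\Big)^{c} \tag{18}$$

where $(\cdot)^c$ denotes the closure of the indicated set. Then the achievable Gray-Wyner lossy rate region is given by

$$\mathcal{R}_{GW}(\Delta_1,\Delta_2)=\mathcal{R}_{GW}^{*}(\Delta_1,\Delta_2). \tag{19}$$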

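By [4, Theorem 6], if $(R_0,R_1,R_2)\in\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, then

$$\begin{align}
&R_0+R_1+R_2\ge R_{X_1,X_2}(\Delta_1,\Delta_2), \tag{20}\\
&R_0+R_1\ge R_{X_1}(\Delta_1),\quad R_0+R_2\ge R_{X_2}(\Delta_2). \tag{21}
\end{align}$$

Inequality (20) is called the Pangloss Bound of $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, and the set of triples $(R_0,R_1,R_2)\in\mathcal{R}_{GW}(\Delta_1,\Delta_2)$ that satisfy $R_0+R_1+R_2=R_{X_1,X_2}(\Delta_1,\Delta_2)$ is called the Pangloss Plane.

Theorem II.2 is our main theorem for the setup (1)-(4). From this theorem follows Proposition II.3, which parametrizes the region $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$ by a Gaussian RV $W$, and the weak stochastic realization of the joint distribution of $(X_1,X_2,W)$.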

### I-C Wyner’s Lossy Common Information

Viswanatha, Akyol, and Rose [5], and Xu, Liu, and Chen [6], characterized the minimum lossy common message rate on the rate region $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, as follows.

Theorem 4 in [6]: Let $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ denote the minimum common message rate $R_0$ on the Gray and Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, with sum rate not exceeding the joint rate distortion function $R_{X_1,X_2}(\Delta_1,\Delta_2)$.
Then $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ is characterized by

$$C_{GW}(X_1,X_2;\Delta_1,\Delta_2)\triangleq\inf I(X_1,X_2;W) \tag{22}$$

such that the following identity holds

$$R_{X_1|W}(\Delta_1)+R_{X_2|W}(\Delta_2)+I(X_1,X_2;W)=R_{X_1,X_2}(\Delta_1,\Delta_2) \tag{23}$$

where the infimum is over all RVs $W$ in $\mathbb{W}$, which parametrize the source distribution via $P_{X_1,X_2,W}$, having a marginal source distribution $P_{X_1,X_2}$, and induce joint distributions which satisfy the constraint.

$C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ is also given the interpretation of Wyner's lossy common information, due to its operational meaning [5, 6]. From Appendix B in [6] it follows that a necessary condition for the equality constraint (23) is $R_{X_1,X_2|W}(\Delta_1,\Delta_2)=R_{X_1|W}(\Delta_1)+R_{X_2|W}(\Delta_2)$, and a sufficient condition for this equality to hold is the conditional independence condition [6]: $P_{X_1,X_2|W}=P_{X_1|W}P_{X_2|W}$. Hence, a sufficient condition for any rate triple to lie on the Pangloss plane, i.e., to satisfy (23), is conditional independence.

It is shown in [5, 6] that there exists a distortion region $\mathcal{D}_W$ such that $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)=C_W(X_1,X_2)$ for $(\Delta_1,\Delta_2)\in\mathcal{D}_W$, i.e., it is independent of the distortions $(\Delta_1,\Delta_2)$ and equals Wyner's information-theoretic characterization of common information defined by (16).

From Theorem II.2 follows Theorem II.4, which gives the closed-form expression of $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ and identifies the region $\mathcal{D}_W$, for the multivariate Gaussian RVs with respect to the average distortions of (1)-(4).

## II Main Results

Given the tuple of multivariate Gaussian RVs and distortion functions of (1)-(4), the main contributions of the paper are:
(1) The theorem and the proof of Wyner's common information $C_W(X_1,X_2)$ (information definition). The existing proof of this result in [12] is incomplete (see the discussion below Theorem II.2).

(2) Parametrization of rate triples $(R_0,R_1,R_2)\in\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, and of Wyner's lossy common information $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$.

Below we state the expression of mutual information $I(X_1;X_2)$ as a function of the canonical correlation coefficients, discussed in Gelfand and Yaglom [10].

###### Theorem II.1

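Consider a tuple of multivariable jointly Gaussian RVs $X_i:\Omega\to\mathbb{R}^{p_i}$, $i=1,2$, $(X_1,X_2)\in G(0,Q_{(X_1,X_2)})$. Compute the canonical variable form of the tuple of Gaussian RVs according to Algorithm 2.2 of [3]. This yields the indices $p_{11}=p_{21}$, $p_{12}=p_{22}$, $p_{13}$, $p_{23}$, and $n=p_{12}$, and the diagonal matrix $D$ with canonical correlation coefficients $d_i\in(0,1)$ for $i=1,\ldots,n$ (as in [3, Definition 2.2]).
Then the mutual information $I(X_1;X_2)$ is given by the formula

$$I(X_1;X_2)=\begin{cases}0, & 0=p_{11}=p_{12},\\[2pt] -\dfrac{1}{2}\displaystyle\sum_{i=1}^{n}\ln(1-d_i^2), & 0=p_{11},\ p_{12}>0,\\[2pt] \infty, & p_{11}>0,\end{cases}$$

where $d_i$, $i=1,\ldots,n$, are the canonical correlation coefficients.

The formula for $I(X_1;X_2)$ is a generalization of the well-known formula for a tuple of scalar RVs, i.e., $p_1=p_2=1$, $I(X_1;X_2)=-\frac{1}{2}\ln(1-\rho^2)$, where $\rho$ is the correlation coefficient.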
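The canonical-correlation formula can be checked numerically against the standard log-determinant expression for the mutual information of jointly Gaussian RVs. The sketch below assumes the tuple is already in canonical variable form with $p_{11}=0$ (identity variances, cross-covariance $D$); the function names and the values of `d` are illustrative only.

```python
import numpy as np

def mi_from_canonical_correlations(d):
    """I(X1;X2) = -1/2 * sum_i ln(1 - d_i^2), in nats."""
    d = np.asarray(d, dtype=float)
    return -0.5 * np.sum(np.log(1.0 - d ** 2))

def mi_from_logdet(Q1, Q12, Q2):
    """I(X1;X2) = 1/2 * (ln det Q1 + ln det Q2 - ln det Q), in nats."""
    Q = np.block([[Q1, Q12], [Q12.T, Q2]])
    return 0.5 * (np.linalg.slogdet(Q1)[1]
                  + np.linalg.slogdet(Q2)[1]
                  - np.linalg.slogdet(Q)[1])

# Illustrative canonical correlation coefficients in (0, 1).
d = np.array([0.8, 0.3])
I_n, D = np.eye(len(d)), np.diag(d)

mi_cc = mi_from_canonical_correlations(d)
mi_ld = mi_from_logdet(I_n, D, I_n)
print(mi_cc, mi_ld)  # the two values should agree up to rounding
```

With a single coefficient the first function reduces to the scalar expression $-\frac{1}{2}\ln(1-\rho^2)$.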

The case $p_{11}=p_{21}>0$ gives $I(X_1;X_2)=\infty$; if such components are present they should be removed. Hence, we state the next theorem under the restriction $p_{11}=p_{21}=0$.

###### Theorem II.2

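Consider a tuple of multivariable jointly Gaussian RVs $X_i:\Omega\to\mathbb{R}^{p_i}$, $i=1,2$, and without loss of generality assume $(X_1,X_2)\in G(0,Q_{(X_1,X_2)})$ produces a canonical variable form such that $p_{11}=p_{21}=0$ (see [3, Definition 2.2]).
For any joint distribution $P_{X_1,X_2,W}$ parametrized by an arbitrary RV $W:\Omega\to\mathbb{R}^n$ with fixed marginal distribution $P_{X_1,X_2}$ the following hold.
(a) The mutual information $I(X_1,X_2;W)$ satisfies

$$\begin{align}
I(X_1,X_2;W)&=I(X_{12}^c,X_{22}^c;W),\quad p_{12}=p_{22}=n \tag{24}\\
&\ge H(X_{12}^c,X_{22}^c)-H(X_{12}^c|W)-H(X_{22}^c|W) \tag{25}\\
&=\frac{1}{2}\sum_{i=1}^{n}\ln(1-d_i^2)-\frac{1}{2}\ln\Big(\det\big([I-D^{1/2}Q_W^{-1}D^{1/2}][I-D^{1/2}Q_WD^{1/2}]\big)\Big) \tag{26}
\end{align}$$

where the lower bound is parametrized by $Q_W\in\mathbf{Q}_W$,

$$\mathbf{Q}_W=\big\{Q_W\in\mathbb{R}^{n\times n}\ \big|\ Q_W=Q_W^T,\ 0<D\le Q_W\le D^{-1}\big\} \tag{27}$$

and such that $(X_{12}^c,X_{22}^c,W)$ is jointly Gaussian.
(b) The lower bound in (25) is achieved if $(X_{12}^c,X_{22}^c,W)$ is jointly Gaussian and $P_{X_{12}^c,X_{22}^c|W}=P_{X_{12}^c|W}P_{X_{22}^c|W}$, and a realization of the RVs $(X_{12}^c,X_{22}^c)$ which achieves the lower bound is

$$\begin{align}
X_{12}^c&=Q_{X_{12}^c,W}Q_W^{-1}W+Z_1, \tag{28}\\
Q_{X_{12}^c,W}&=D^{1/2},\quad Z_1\in G\big(0,(I-D^{1/2}Q_W^{-1}D^{1/2})\big), \tag{29}\\
X_{22}^c&=Q_{X_{22}^c,W}Q_W^{-1}W+Z_2, \tag{30}\\
Q_{X_{22}^c,W}&=D^{1/2}Q_W,\quad Z_2\in G\big(0,(I-D^{1/2}Q_WD^{1/2})\big), \tag{31}\\
&(Z_1,Z_2,W)\ \text{are independent}. \tag{32}
\end{align}$$

(c) A lower bound on (26) occurs if $Q_W$ is diagonal, i.e., $Q_W=\mathrm{Diag}(Q_{W_1},\ldots,Q_{W_n})$ with $d_i\le Q_{W_i}\le d_i^{-1}$ for all $i$, and it is achieved by the realization (28)-(32) with this diagonal $Q_W$.
(d) Wyner's common information (information definition) is given by

$$C_W(X_1,X_2)=\begin{cases}\dfrac{1}{2}\displaystyle\sum_{i=1}^{n}\ln\Big(\dfrac{1+d_i}{1-d_i}\Big)\in(0,\infty), & \text{if } n>0,\\[2pt] 0, & \text{if } n=0,\end{cases} \tag{33}$$

and it is achieved by a Gaussian RV $W:\Omega\to\mathbb{R}^n$ with $Q_W=I_n$, an identity covariance matrix, and the realization of part (b) with $Q_W=I_n$.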
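To make parts (c) and (d) concrete, the sketch below evaluates the right-hand side of (26) for diagonal $Q_W$ on a grid and compares it with its value at $Q_W=I$ and with the closed form (33). It is a numerical sanity check under the assumption that $Q_W$ is diagonal with entries strictly between $d_j$ and $1/d_j$, not a proof; the function name, the grid, and the values of `d` are illustrative.

```python
import numpy as np

def lower_bound_26(d, q):
    """Right-hand side of (26) for diagonal Q_W = Diag(q), in nats."""
    d = np.asarray(d, dtype=float)
    q = np.asarray(q, dtype=float)
    first = 0.5 * np.sum(np.log(1.0 - d ** 2))
    # For diagonal Q_W the determinant in (26) factorizes per coordinate.
    second = 0.5 * np.sum(np.log((1.0 - d / q) * (1.0 - d * q)))
    return first - second

# Illustrative canonical correlation coefficients.
d = np.array([0.8, 0.3])

# Grid of admissible diagonal entries, strictly inside (d_j, 1/d_j).
grids = [np.linspace(dj, 1.0 / dj, 53)[1:-1] for dj in d]
grid_min = min(lower_bound_26(d, (q1, q2))
               for q1 in grids[0] for q2 in grids[1])

at_identity = lower_bound_26(d, np.ones_like(d))
closed_form = 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

print(grid_min, at_identity, closed_form)
```

On this grid no diagonal choice goes below the value at the identity, which agrees with the closed form in (33).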

The characterization of the minimal-dimension subset of the set $\mathbf{P}^{CIG}$ of two RVs in canonical variable form by the set $\mathbf{Q}_W$ is due to van Putten and van Schuppen [11].

In [12] the proof of (33) is incomplete because there is no optimization over the set of measures achieving the conditional independence. That reference assumes that three cross-covariances can be simultaneously diagonalized, which is not true in general. This assumption implies that case (d) of the above theorem holds. The assumption is repeated in [13].

From Theorem II.2 follows directly the proposition below.

###### Proposition II.3

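Consider the statement of Theorem II.2, with $(X_1,X_2)$ in canonical variable form. Then $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$ is determined from

$$T(\alpha_1,\alpha_2)=\inf_{Q_W}\big\{I(X_1,X_2;W)+\alpha_1R_{X_1|W}(\Delta_1)+\alpha_2R_{X_2|W}(\Delta_2)\big\},$$

$0\le\alpha_i\le 1$, $i=1,2$, $\alpha_1+\alpha_2\ge 1$, and the infimum occurs at the diagonal $Q_W$ of Theorem II.2, part (c). Moreover, $R_{X_i|W}(\Delta_i)$ is given by

$$R_{X_i|W}(\Delta_i)=\inf_{\sum_{j=1}^{n}\Delta_{i,j}=\Delta_i}\ \frac{1}{2}\sum_{j=1}^{n}\log\Big(\frac{1-d_j/Q^{*}_{W_j}}{\Delta_{i,j}}\Big)^{+} \tag{34}$$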

where , , and the water-filling equations hold:

$$\Delta_{i,j}=\begin{cases}\lambda, & \lambda<1-d_j,\\ 1-d_j, & \lambda\ge 1-d_j,\end{cases}\qquad \Delta_i\in(0,\infty),\ i=1,2. \tag{35}$$

Proof. The statement follows from Gray and Wyner [4, (4) of page 1703, eqn (42)] and Theorem II.2. Equation (34) follows from the RDF of Gaussian RVs.
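The water-filling rule (35) can be sketched with a simple bisection on the water level $\lambda$. The code below is a generic reverse water-filling routine for independent Gaussian components, shown under the assumption that the component variances are $1-d_j$ (the case $Q^{*}_{W_j}=1$); the function name and the numerical values are illustrative and not taken from the paper.

```python
import numpy as np

def reverse_water_filling(variances, total_distortion, tol=1e-12):
    """Allocate distortions min(lam, v_j) summing to total_distortion (> 0)
    and return (allocation, rate in nats)."""
    v = np.asarray(variances, dtype=float)
    if total_distortion >= v.sum():
        # Every component can be reproduced by its mean: zero rate.
        return v.copy(), 0.0
    lo, hi = 0.0, v.max()
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        # The allocated sum is nondecreasing in lam, so bisection applies.
        if np.minimum(lam, v).sum() < total_distortion:
            lo = lam
        else:
            hi = lam
    alloc = np.minimum(0.5 * (lo + hi), v)
    rate = 0.5 * np.sum(np.log(v / alloc))
    return alloc, rate

# Illustrative coefficients; variances 1 - d_j are an assumption (Q_W = I).
d = np.array([0.8, 0.3])
alloc, rate = reverse_water_filling(1.0 - d, total_distortion=0.5)
print(alloc, rate)
```

Components whose variance $1-d_j$ lies below the water level receive distortion equal to their variance and contribute zero rate, matching the second branch of (35).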

###### Theorem II.4

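Consider the tuple of jointly Gaussian RVs $(X_1,X_2)$ of Theorem II.2. Then

$$\begin{align}
C_{GW}(X_1,X_2;\Delta_1,\Delta_2)&=C_W(X_1,X_2) \tag{36}\\
&=\frac{1}{2}\sum_{j=1}^{n}\ln\Big(\frac{1+d_j}{1-d_j}\Big),\quad (\Delta_1,\Delta_2)\in\mathcal{D}_W, \tag{37}\\
\mathcal{D}_W&\triangleq\Big\{(\Delta_1,\Delta_2)\ \Big|\ 0\le\Delta_1\le\sum_{j=1}^{n}(1-d_j),\ 0\le\Delta_2\le\sum_{j=1}^{n}(1-d_j)\Big\},\quad d_j\in(0,1),\ j=1,\ldots,n.
\end{align}$$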

Formula (37) is a generalization of the analogous formula derived in [4, 5, 6], for a tuple of jointly Gaussian scalar RVs , zero mean, , .
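A minimal sketch of formula (37) and of the membership test for the distortion region stated with it is given below. The helper names are hypothetical, and the numerical values are illustrative only.

```python
import numpy as np

def wyner_common_information(d):
    """C_W = 1/2 * sum_j ln((1 + d_j) / (1 - d_j)) in nats; 0 when n = 0."""
    d = np.asarray(d, dtype=float)
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

def in_distortion_region(d, delta1, delta2):
    """True if 0 <= delta_i <= sum_j (1 - d_j) for i = 1, 2."""
    bound = np.sum(1.0 - np.asarray(d, dtype=float))
    return bool(0.0 <= delta1 <= bound and 0.0 <= delta2 <= bound)

# Illustrative canonical correlation coefficients.
d = [0.8, 0.3]
print(wyner_common_information(d))
print(in_distortion_region(d, 0.5, 0.6))  # both distortions below the bound
print(in_distortion_region(d, 0.5, 1.0))  # second distortion above the bound
```

For a single coefficient $d_1=\rho$ the first function reduces to $\frac{1}{2}\ln\frac{1+\rho}{1-\rho}$.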

## III Proofs of Main Theorems

We present in this section additional exposition on the Concepts of Section I-A, and outlines of the proofs of the main theorems (see [3] for additional exposition).

### III-A Further Discussion on the Three Concepts

First we state a few facts.
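(A1) The parametrization of the family of Gaussian probability distributions $\mathbf{P}^{CIG}$ and of its minimal-dimension subset requires the solution of the weak stochastic realization problem of Gaussian RVs (defined by Problem 2.15 in [3]) given in [14, Theorem 4.2] (see also [3, Theorem 3.8]), and reproduced below.

###### Theorem III.1

[14, Theorem 4.2] Consider a tuple $(X_1,X_2)$ of Gaussian RVs in the canonical variable form. Restrict attention to the correlated parts of these RVs, as follows:

$$\begin{align}
&(X_1,X_2)\in G(0,Q_{(X_1,X_2)})=\mathbf{P}_0,\quad X_1,X_2:\Omega\to\mathbb{R}^n, \tag{38}\\
&Q_{(X_1,X_2)}=\begin{pmatrix}I & D\\ D & I\end{pmatrix},\quad p_{11}=p_{21}=0,\ p_{13}=p_{23}=0, \tag{39}\\
&D=\mathrm{Diag}(d_1,\ldots,d_n)\in\mathbb{R}^{n\times n},\quad 1>d_1\ge\ldots\ge d_n>0. \tag{40}
\end{align}$$

(a) There exists a probability measure $\mathbf{P}_1$, and a triple of Gaussian RVs $X_1,X_2,W:\Omega\to\mathbb{R}^n$ defined on it, such that (i) the marginal distribution of $(X_1,X_2)$ under $\mathbf{P}_1$ is $\mathbf{P}_0$, and (ii) $X_1$ and $X_2$ are conditionally independent given $W$, with $W$ having minimal dimension.

(b) There exists a family of Gaussian measures, denoted by $\mathbf{P}_{ci}$, that satisfy (i) and (ii) of (a), and moreover this family is parametrized by the matrices and sets:

$$\begin{align}
&G(0,Q_s(Q_W)),\quad Q_W\in\mathbf{Q}_W, \tag{41}\\
&Q_s(Q_W)=\begin{pmatrix}I & D & D^{1/2}\\ D & I & D^{1/2}Q_W\\ D^{1/2} & Q_WD^{1/2} & Q_W\end{pmatrix}, \tag{42}\\
&\mathbf{P}_{ci}=\Big\{G(0,Q_s(Q_W))\ \text{on}\ (\mathbb{R}^{3n},\mathcal{B}(\mathbb{R}^{3n}))\ \Big|\ Q_W\in\mathbf{Q}_W\Big\} \tag{43}
\end{align}$$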

and .
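The structure of the matrix in (42) can be checked numerically: its top-left block should reproduce the covariance in (39), and the conditional covariance of $(X_1,X_2)$ given $W$ should be block diagonal, which for jointly Gaussian RVs is equivalent to conditional independence. The sketch below assumes a particular non-diagonal $Q_W$ chosen so that both noise covariances in the realization are positive definite; the function names and numerical values are illustrative.

```python
import numpy as np

def Qs(d, QW):
    """Joint covariance of (X1, X2, W) with the block structure of (42)."""
    n = len(d)
    I, D, Dh = np.eye(n), np.diag(d), np.diag(np.sqrt(d))
    return np.block([[I,  D,       Dh],
                     [D,  I,       Dh @ QW],
                     [Dh, QW @ Dh, QW]])

def conditional_cov_given_W(Q, n):
    """Covariance of (X1, X2) given W via the Schur complement of the W block."""
    A, B, C = Q[:2 * n, :2 * n], Q[:2 * n, 2 * n:], Q[2 * n:, 2 * n:]
    return A - B @ np.linalg.solve(C, B.T)

# Illustrative values; QW is an assumed symmetric matrix with D <= QW <= D^{-1}.
d = np.array([0.8, 0.3])
QW = np.array([[1.0, 0.05], [0.05, 1.1]])
n = len(d)

Q = Qs(d, QW)
cond = conditional_cov_given_W(Q, n)

print(np.linalg.eigvalsh(Q).min())   # nonnegative up to rounding
print(np.abs(cond[:n, n:]).max())    # cross block vanishes up to rounding
```

The diagonal blocks of `cond` are the covariances of the noise terms $Z_1$ and $Z_2$ in the realization of Theorem II.2(b).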

(A2) The weak stochastic realization of a Gaussian measure on the Borel space is then defined and characterized as in Def. 2.17 and Prop. 2.18, Alg. 3.4 of [3].

### III-B Proofs of Main Theorems

(B) For the calculation of $C_W(X_1,X_2)$ via Theorem II.2 and of $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ via Theorem II.4 it is sufficient to impose the conditional independence $P_{X_1,X_2|W}=P_{X_1|W}P_{X_2|W}$, due to
(a) the well-known inequality

$$I(X_1,X_2;W)\ge H(X_1,X_2)-H(X_1|W)-H(X_2|W) \tag{44}$$

which is achieved if $P_{X_1,X_2|W}=P_{X_1|W}P_{X_2|W}$.

(b) A necessary condition for the equality constraint (23) to hold (see Appendix B in [6]) is

$$R_{X_1,X_2|W}(\Delta_1,\Delta_2)=R_{X_1|W}(\Delta_1)+R_{X_2|W}(\Delta_2).$$