Vector Gaussian CEO Problem Under Logarithmic Loss

02/25/2019
by   Yigit Ugur, et al.
HUAWEI Technologies Co., Ltd.

In this paper, we study the vector Gaussian Chief Executive Officer (CEO) problem under the logarithmic loss distortion measure. Specifically, K ≥ 2 agents observe independently corrupted Gaussian noisy versions of a remote vector Gaussian source, and communicate independently with a decoder, or CEO, over rate-constrained noise-free links. The CEO wants to reconstruct the remote source to within some prescribed distortion level, where the incurred distortion is measured under the logarithmic loss penalty criterion. We find an explicit characterization of the rate-distortion region of this model. For the proof of this result, we obtain an outer bound on the region of the vector Gaussian CEO problem by means of a technique that relies on the de Bruijn identity and the properties of Fisher information. The approach is similar to the Ekrem-Ulukus outer bounding technique for the vector Gaussian CEO problem under quadratic distortion measure, where it was found to be generally non-tight; here, however, it is shown to yield a complete characterization of the region for the case of logarithmic loss. Also, we show that Gaussian test channels with time-sharing exhaust the Berger-Tung inner bound, which is optimal. Furthermore, we show that the established result under logarithmic loss provides an outer bound for a quadratic vector Gaussian CEO problem with determinant constraint, for which we characterize the optimal rate-distortion region.


I Introduction

Consider the vector Gaussian Chief Executive Officer (CEO) problem shown in Figure 1. In this model, there is an arbitrary number $K \geq 2$ of agents, each having a noisy observation of a vector Gaussian source $\mathbf{X}$. The goal of the agents is to describe the source to a central unit, which wants to reconstruct this source to within a prescribed distortion level. The incurred distortion is measured according to some loss measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}$, where $\hat{\mathcal{X}}$ designates the reconstruction alphabet. For quadratic distortion measure, i.e.,

$$d(\mathbf{x}, \hat{\mathbf{x}}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2,$$

the rate-distortion region of the vector Gaussian CEO problem is still unknown in general, except in a few special cases. The most important of these is perhaps the case of scalar sources, i.e., the scalar Gaussian CEO problem, for which a complete characterization of the optimal rate-distortion region was found independently by Oohama in [1] and by Prabhakaran et al. in [2]. Key to establishing this result is a judicious application of the entropy power inequality. The extension of this argument to the case of vector Gaussian sources, however, is not straightforward, as the entropy power inequality is known to be non-tight in this setting. The reader may refer also to [3, 4], where non-tight outer bounds on the rate-distortion region of the vector Gaussian CEO problem under quadratic distortion measure are obtained by establishing some extremal inequalities that are similar to Liu-Viswanath [5], and to [6], where a strengthened extremal inequality yields a complete characterization of the region of the vector Gaussian CEO problem in the special case of trace distortion constraint.

Fig. 1: Chief Executive Officer (CEO) source coding problem.

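In this paper, we study the CEO problem of Figure 1 in the case in which $(\mathbf{X}, \mathbf{Y}_1, \ldots, \mathbf{Y}_K)$ is jointly Gaussian and the distortion is measured using the logarithmic loss criterion, i.e.,

$$d^{(n)}(\mathbf{x}^n, \hat{\mathbf{x}}^n) = \frac{1}{n} \sum_{i=1}^{n} d(\mathbf{x}_i, \hat{\mathbf{x}}_i), \tag{1}$$

with the letter-wise distortion given by

$$d(\mathbf{x}, \hat{\mathbf{x}}) = \log \frac{1}{\hat{\mathbf{x}}(\mathbf{x})}, \tag{2}$$

where $\hat{\mathbf{x}}(\cdot)$ designates a probability distribution on $\mathcal{X}$, and $\hat{\mathbf{x}}(\mathbf{x})$ is the value of this distribution evaluated for the outcome $\mathbf{x} \in \mathcal{X}$.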

The logarithmic loss distortion measure, often referred to as self-information loss in the literature about prediction, plays a central role in settings in which reconstructions are allowed to be ‘soft’, rather than ‘hard’ or deterministic. That is, rather than just assigning a deterministic value to each sample of the source, the decoder also gives an assessment of the degree of confidence or reliability on each estimate, in the form of weights or probabilities. This measure, which was introduced in the context of rate-distortion theory by Courtade et al. [7, 8], has appreciable mathematical properties [9, 10], such as a deep connection to lossless coding for which fundamental limits are well developed (e.g., see [11] for recent results on universal lossy compression under logarithmic loss that are built on this connection). Also, it is widely used as a penalty criterion in various contexts, including clustering and classification [12], pattern recognition, learning and prediction [13], image processing [14], secrecy [15] and others.
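To make the notion of a ‘soft’ reconstruction concrete, the following minimal Python sketch evaluates the logarithmic loss $\log(1/\hat{x}(x))$ on a toy three-symbol alphabet. The alphabet, the distributions and the helper names (`log_loss`, `expected_log_loss`, `entropy`) are illustrative assumptions and are not taken from the paper; natural logarithms are used.

```python
import math

def log_loss(x, x_hat):
    """Logarithmic loss d(x, x_hat) = log(1 / x_hat(x)), in nats.

    x_hat is a soft reconstruction: a dict mapping symbols to probabilities.
    """
    p = x_hat.get(x, 0.0)
    return math.inf if p == 0.0 else -math.log(p)

def expected_log_loss(p_true, x_hat):
    """Average loss when the source symbol is drawn from p_true."""
    return sum(p * log_loss(x, x_hat) for x, p in p_true.items() if p > 0)

def entropy(p_true):
    """Shannon entropy of p_true, in nats."""
    return -sum(p * math.log(p) for p in p_true.values() if p > 0)

p_true = {"a": 0.7, "b": 0.2, "c": 0.1}        # hypothetical source distribution
soft_exact = dict(p_true)                       # decoder reports the true distribution
soft_flat = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}  # decoder reports no preference
hard = {"a": 1.0, "b": 0.0, "c": 0.0}           # deterministic ('hard') guess

loss_exact = expected_log_loss(p_true, soft_exact)
loss_flat = expected_log_loss(p_true, soft_flat)
loss_hard = expected_log_loss(p_true, hard)
```

In this toy setting the expected loss of the exact soft reconstruction equals the source entropy, the uninformative reconstruction costs $\log 3$ nats, and the hard guess incurs infinite expected loss because it assigns zero probability to symbols that do occur.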

The main contribution of this paper is a complete characterization of the rate-distortion region of the vector Gaussian CEO problem of Figure 1 under logarithmic loss distortion measure. The result can be seen as the counterpart, for the vector Gaussian case, of that by Courtade and Weissman [8, Theorem 3], who established the rate-distortion region of the CEO problem under logarithmic loss in the discrete memoryless (DM) case. For the proof of this result, we derive an outer bound on the rate-distortion region of the vector Gaussian CEO problem by evaluating the outer bound from the DM model using the de Bruijn identity, a connection between differential entropy and Fisher information, along with the properties of minimum mean square error (MMSE) and Fisher information. In contrast to the case of quadratic distortion measure, for which the application of this technique was shown in [16] to result in an outer bound that is generally non-tight, we show that this approach is successful in the case of logarithmic distortion measure and yields a complete characterization of the region. The proof of the achievability part simply corresponds to the evaluation of the result for the DM model using Gaussian test channels and no time-sharing. While this does not imply that Gaussian test channels also exhaust the Berger-Tung inner bound for this model, we show that they do, but might generally require time-sharing. Furthermore, we show that the established result under logarithmic loss provides an outer bound for a quadratic vector Gaussian CEO problem with determinant constraint, for which we characterize the optimal rate-distortion region (see [17, 18] for examples of usage of this determinant constraint in the context of equalization).

In the case of one agent, i.e., the remote vector Gaussian Wyner-Ziv model under logarithmic loss, the model was resolved in [19]; our result here thus generalizes that of [19] to the case of an arbitrary number of agents. Related to this aspect, it is also worth mentioning that the orthogonal transform technique, which was used in [19] to reduce the vector setting to one of parallel scalar Gaussian settings, seems insufficient to diagonalize all the noise covariance matrices simultaneously in the case of more than one agent.

Notation:

Throughout, we use the following notation. Upper case letters denote random variables, e.g., $X$; lower case letters denote realizations of random variables, e.g., $x$; and calligraphic letters denote sets, e.g., $\mathcal{X}$. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. A length-$n$ sequence $(X_1, \ldots, X_n)$ is denoted as $X^n$. Boldface upper case letters denote vectors or matrices, e.g., $\mathbf{X}$, where context should make the distinction clear. For an integer $K \geq 1$, we denote the set of positive integers smaller than or equal to $K$ as $\mathcal{K} = \{1, \ldots, K\}$. For a set of integers $\mathcal{S} \subseteq \mathcal{K}$, the notation $X_{\mathcal{S}}$ designates the set of random variables with indices in the set $\mathcal{S}$, i.e., $X_{\mathcal{S}} = \{X_k\}_{k \in \mathcal{S}}$.

In this paper, due to space limitations some of the proofs are omitted or only outlined. Detailed proofs as well as the extension of the results of this paper to the case in which the decoder also has its own correlated side information stream can be found in [20].

II Problem Formulation

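Consider the $K$-encoder CEO problem shown in Figure 1. In this paper, the agents’ observations are assumed to be Gaussian noisy versions of a remote vector Gaussian source. Specifically, let $(\mathbf{X}, \mathbf{Y}_1, \ldots, \mathbf{Y}_K)$ be a jointly Gaussian random vector, with zero mean and covariance matrix $\mathbf{\Sigma}_{(\mathbf{x}, \mathbf{y}_1, \ldots, \mathbf{y}_K)}$. The vector $\mathbf{X}$ is complex-valued, and has $n_x$ dimensions; and vector $\mathbf{Y}_k$, $k = 1, \ldots, K$, is complex-valued and has $n_k$ dimensions. Throughout, it is assumed that the following Markov chain holds

$$\mathbf{Y}_k \leftrightarrow \mathbf{X} \leftrightarrow \mathbf{Y}_{\mathcal{K} \setminus \{k\}}, \quad k \in \mathcal{K}. \tag{3}$$

Let now $\{(\mathbf{X}_i, \mathbf{Y}_{1,i}, \ldots, \mathbf{Y}_{K,i})\}_{i=1}^{n}$ be a sequence of independent and identically distributed (i.i.d.) copies of $(\mathbf{X}, \mathbf{Y}_1, \ldots, \mathbf{Y}_K)$. Encoder $k$, $k \in \mathcal{K}$, observes $\mathbf{Y}_k^n$. Using (3), in what follows we assume without loss of generality that

$$\mathbf{Y}_k = \mathbf{H}_k \mathbf{X} + \mathbf{N}_k, \quad k = 1, \ldots, K,$$

where $\mathbf{H}_k \in \mathbb{C}^{n_k \times n_x}$ designates the channel that connects $\mathbf{X}$ to $\mathbf{Y}_k$, and $\mathbf{N}_k$ is an $n_k$-dimensional, complex-valued, vector Gaussian noise with zero mean and covariance matrix $\mathbf{\Sigma}_k$. All noises are independent of each other and of $\mathbf{X}$.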
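As a quick sanity check on this observation model, the sketch below works through its scalar special case, in which each agent sees $y_k = h_k x + n_k$ with mutually independent noises. Because the observations are conditionally independent given the source, their contributions to the posterior precision of the source simply add. The numerical values and function names are hypothetical and serve only as an illustration.

```python
def observation_variance(h, var_x, var_n):
    """Variance of y = h*x + n for independent zero-mean x and n (h may be complex)."""
    return abs(h) ** 2 * var_x + var_n

def posterior_variance(var_x, channels, noise_vars):
    """MMSE of x given all y_k = h_k*x + n_k in the jointly Gaussian scalar case.

    Conditional independence of the y_k given x makes the precisions add:
    1/mmse = 1/var_x + sum_k |h_k|^2 / var_k.
    """
    precision = 1.0 / var_x
    for h, var_n in zip(channels, noise_vars):
        precision += abs(h) ** 2 / var_n
    return 1.0 / precision

var_x = 2.0                          # hypothetical source variance
channels = [1.0 + 0.0j, 0.5 - 0.5j]  # hypothetical channel gains h_1, h_2
noise_vars = [1.0, 0.25]             # hypothetical noise variances

mmse_one = posterior_variance(var_x, channels[:1], noise_vars[:1])
mmse_both = posterior_variance(var_x, channels, noise_vars)
```

With these values a single agent reduces the uncertainty about the source from 2 to 2/3, and adding the second agent reduces it further to 2/7.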

Encoder $k$, $k \in \mathcal{K}$, uses $R_k$ bits per sample to describe its observation $\mathbf{Y}_k^n$ to the decoder. The decoder wants to reproduce a soft-estimate of the remote source $\mathbf{X}$. That is, we consider the reproduction alphabet $\hat{\mathcal{X}}$ to be equal to the set of probability distributions over the source alphabet $\mathcal{X}$, and the distortion measure is the logarithmic loss criterion as defined by (1).

Definition 1.

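A rate-distortion code (of blocklength $n$) for the CEO problem consists of $K$ encoding functions

$$\phi_k^{(n)} : \mathbb{C}^{n_k \times n} \to \{1, \ldots, M_k^{(n)}\}, \quad k = 1, \ldots, K,$$

and a decoding function

$$\psi^{(n)} : \{1, \ldots, M_1^{(n)}\} \times \cdots \times \{1, \ldots, M_K^{(n)}\} \to \hat{\mathcal{X}}^n,$$

where $\hat{\mathcal{X}}^n$ designates the set of probability distributions over the $n$-Cartesian product of $\mathcal{X}$.

Definition 2.

A rate-distortion tuple $(R_1, \ldots, R_K, D)$ is achievable for the vector Gaussian CEO problem if there exist a blocklength $n$, $K$ encoding functions $\{\phi_k^{(n)}\}_{k=1}^{K}$ and a decoding function $\psi^{(n)}$ such that

$$R_k \geq \frac{1}{n} \log M_k^{(n)}, \quad k = 1, \ldots, K,$$
$$D \geq \mathbb{E}\Big[ d^{(n)}\big(\mathbf{X}^n, \psi^{(n)}(\phi_1^{(n)}(\mathbf{Y}_1^n), \ldots, \phi_K^{(n)}(\mathbf{Y}_K^n))\big) \Big].$$

The rate-distortion region $\mathcal{RD}^{\star}_{\mathrm{CEO}}$ of the vector Gaussian CEO problem under logarithmic loss is defined as the union of all non-negative tuples $(R_1, \ldots, R_K, D)$ that are achievable.

One important goal in this paper is to characterize the rate-distortion region $\mathcal{RD}^{\star}_{\mathrm{CEO}}$.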

III Vector Gaussian CEO Problem Under Logarithmic Loss

III-A Rate-Distortion Region

The rate-distortion region of the discrete memoryless $K$-encoder CEO problem under logarithmic loss has been fully characterized by Courtade-Weissman in [8, Theorem 10] in the case in which the Markov chain (3) holds. This result can be extended to the case of Gaussian sources, as stated in the following proposition.

Definition 3.

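For a given tuple of auxiliary random variables $(U_1, \ldots, U_K, Q)$ with distribution $P_{U_{\mathcal{K}}, Q}(u_{\mathcal{K}}, q)$ such that $P_{\mathbf{X}, \mathbf{Y}_{\mathcal{K}}, U_{\mathcal{K}}, Q}(\mathbf{x}, \mathbf{y}_{\mathcal{K}}, u_{\mathcal{K}}, q)$ factorizes as

$$P_Q(q)\, P_{\mathbf{X}}(\mathbf{x}) \prod_{k=1}^{K} P_{\mathbf{Y}_k | \mathbf{X}}(\mathbf{y}_k | \mathbf{x}) \prod_{k=1}^{K} P_{U_k | \mathbf{Y}_k, Q}(u_k | \mathbf{y}_k, q), \tag{4}$$

$\mathcal{RD}^{\mathrm{I}}_{\mathrm{CEO}}(U_1, \ldots, U_K, Q)$ denotes the set of all non-negative tuples $(R_1, \ldots, R_K, D)$ that satisfy, for all subsets $\mathcal{S} \subseteq \mathcal{K}$,

$$\sum_{k \in \mathcal{S}} R_k + D \geq \sum_{k \in \mathcal{S}} I(\mathbf{Y}_k; U_k | \mathbf{X}, Q) + h(\mathbf{X} | U_{\mathcal{S}^c}, Q). \tag{5}$$

Also, let $\mathcal{RD}^{\mathrm{I}}_{\mathrm{CEO}} := \bigcup \mathcal{RD}^{\mathrm{I}}_{\mathrm{CEO}}(U_1, \ldots, U_K, Q)$, where the union is taken over all tuples $(U_1, \ldots, U_K, Q)$ with distributions that satisfy (4).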

Proposition 1.

$\mathcal{RD}^{\star}_{\mathrm{CEO}} = \mathcal{RD}^{\mathrm{I}}_{\mathrm{CEO}}$.

Proof.

The proof of Proposition 1 is given in Section V-A. ∎

One main result in this paper is an explicit characterization of $\mathcal{RD}^{\star}_{\mathrm{CEO}}$. To this end, we show that the region $\mathcal{RD}^{\mathrm{I}}_{\mathrm{CEO}}$ is exhausted by Gaussian test channels. Also, we show that one can optimally set $Q = \emptyset$, i.e., time-sharing is not needed.

Theorem 1.

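The rate-distortion region $\mathcal{RD}^{\star}_{\mathrm{CEO}}$ of the vector Gaussian CEO problem under logarithmic loss is given by the set of all non-negative rate-distortion tuples $(R_1, \ldots, R_K, D)$ that satisfy, for all subsets $\mathcal{S} \subseteq \mathcal{K}$,

$$D + \sum_{k \in \mathcal{S}} R_k \geq \sum_{k \in \mathcal{S}} \log \frac{1}{|\mathbf{I} - \mathbf{\Omega}_k \mathbf{\Sigma}_k|} + \log \Big| (\pi e) \Big( \mathbf{\Sigma}_{\mathbf{x}}^{-1} + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_k \mathbf{H}_k \Big)^{-1} \Big|,$$

for some matrices $\{\mathbf{\Omega}_k\}_{k=1}^{K}$ such that $\mathbf{0} \preceq \mathbf{\Omega}_k \preceq \mathbf{\Sigma}_k^{-1}$, where $\mathbf{\Sigma}_{\mathbf{x}}$ is the covariance matrix of $\mathbf{X}$.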

Proof.

The proof of the direct part of Theorem 1 follows simply by evaluating (5) using Gaussian test channels and no time-sharing. Specifically, we set $Q = \emptyset$ and choose each test channel $p(\mathbf{u}_k | \mathbf{y}_k)$ to be Gaussian. The proof of the converse appears in Section V-B. ∎
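The direct part can be checked numerically in the scalar case. The sketch below is an illustration under stated assumptions, not the authors' derivation. It assumes that (5) has the Courtade-Weissman form $\sum_{k \in \mathcal{S}} R_k + D \geq \sum_{k \in \mathcal{S}} I(Y_k; U_k | X, Q) + h(X | U_{\mathcal{S}^c}, Q)$, uses scalar Gaussian test channels $U_k = Y_k + Z_k$ with hypothetical noise variances `t[k]`, and compares the result with the parametrization $\omega_k = 1/(\sigma_k^2 + t_k)$, which plays the role of the matrices of Theorem 1 in one dimension. All names and numerical values are illustrative.

```python
import math
from itertools import combinations

def rhs_test_channel(S, var_x, h, var_n, t):
    """Right-hand side of the assumed form of (5) for U_k = Y_k + Z_k, Z_k ~ CN(0, t_k)."""
    K = len(h)
    # I(Y_k; U_k | X) for a complex Gaussian test channel
    info = sum(math.log(1.0 + var_n[k] / t[k]) for k in S)
    # posterior precision of X given the descriptions outside S
    precision = 1.0 / var_x + sum(abs(h[k]) ** 2 / (var_n[k] + t[k])
                                  for k in range(K) if k not in S)
    # h(X | U_{S^c}) = log(pi * e * posterior variance)
    return info + math.log(math.pi * math.e / precision)

def rhs_omega(S, var_x, h, var_n, omega):
    """Same bound written with omega_k in [0, 1/var_n[k]]."""
    K = len(h)
    info = sum(-math.log(1.0 - omega[k] * var_n[k]) for k in S)
    precision = 1.0 / var_x + sum(abs(h[k]) ** 2 * omega[k]
                                  for k in range(K) if k not in S)
    return info + math.log(math.pi * math.e / precision)

# Hypothetical two-agent example
var_x, h, var_n, t = 1.0, [1.0, 0.8j], [0.5, 1.0], [0.25, 2.0]
omega = [1.0 / (var_n[k] + t[k]) for k in range(2)]
subsets = [set(c) for r in range(3) for c in combinations(range(2), r)]
gaps = [abs(rhs_test_channel(S, var_x, h, var_n, t)
            - rhs_omega(S, var_x, h, var_n, omega))
        for S in subsets]
```

For every subset the two expressions agree up to rounding, which is consistent with the statement that evaluating (5) with Gaussian test channels and no time-sharing yields a bound parametrized by $0 \leq \omega_k \leq 1/\sigma_k^2$.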

Remark 1.

In [8], it was shown that the union of all rate-distortion tuples that satisfy (5) for all subsets $\mathcal{S} \subseteq \mathcal{K}$ coincides with the Berger-Tung inner bound in which time-sharing is used. The direct part of Theorem 1 is obtained by evaluating (5) using Gaussian test channels and $Q = \emptyset$, not the Berger-Tung inner bound. The reader may wonder: i) whether Gaussian test channels also exhaust the Berger-Tung inner bound for the vector Gaussian CEO problem that we study here, and ii) whether time-sharing is needed with the Berger-Tung scheme. This is addressed in Section III-B, where it will be shown that the answer to both questions is positive.

Remark 2.

For the converse proof of Theorem 1, we derive an outer bound on the region described by (5). In doing so, we use the de Bruijn identity, a connection between differential entropy and Fisher information, along with the properties of MMSE and Fisher information. In contrast to the case of quadratic distortion, for which the application of this technique was shown in [16] to result in an outer bound that is generally non-tight, Theorem 1 shows that the approach is successful in the case of logarithmic loss distortion measure, yielding a complete characterization of the region. Theorem 1 is also connected to recent developments on characterizing the capacity of multiple-input multiple-output (MIMO) relay channels in which the relay nodes are connected to the receiver through error-free finite-capacity links (i.e., the so-called cloud radio access networks). The reader may refer to [21, Theorem 4], where important progress is made, and to [22, 23], where compress-and-forward with joint decompression-decoding is shown to be optimal under the constraint of oblivious relay processing.

III-B Gaussian Test Channels with Time-Sharing Exhaust the Berger-Tung Region

In this section, we show that for the vector Gaussian CEO problem under logarithmic loss, the Berger-Tung coding scheme with Gaussian test channels and time-sharing achieves distortion levels that are not larger than those of any other coding scheme. That is, Gaussian test channels with time-sharing exhaust the Berger-Tung region for this model.

Definition 4.

For given tuple of auxiliary random variables with distribution such that factorizes as

(6)

define as the set of all non-negative tuples that satisfy, for all subsets ,

Also, let where the union is taken over all tuples with distributions that satisfy (6).

Proposition 2.

, where is as given in Definition 4 and the superscript

is used to denote that the union is taken over Gaussian distributed

conditionally on .

Proof.

For the proof of Proposition 2, it is sufficient to show that, for fixed Gaussian conditional distributions , the extreme points of the polytopes defined by (5) are dominated by points that are in and which are achievable using Gaussian conditional distributions . Hereafter, we give a brief outline of the proof for the case . The reasoning for is similar and is provided in the extended version [20]. Consider the inequalities (5) with and chosen to be Gaussian (see Theorem 1). Consider now the extreme points of the polytopes defined by the obtained inequalities:


where the point is a triple . It is easy to see that each of these points is dominated by a point in , i.e., there exists for which , and . To see this, first note that and are both in . Next, observe that the point is in , which is clearly achievable by letting , dominates . Also, by letting , we have that the point is in , and dominates the point . A similar argument shows that is dominated by a point in . The proof is completed by observing that, for all above corner points, is set either equal (which is Gaussian distributed conditionally on ) or a constant. ∎

Remark 3.

In contrast to the region described by the inequalities (5), for which we have shown that the time-sharing variable can be optimally set to $Q = \emptyset$, time-sharing may still be needed to exhaust the entire Berger-Tung region. On this aspect, we note that the proof of Proposition 2 only implies that the corner points of this region are achieved with Gaussian test channels without time-sharing. To get the entire region, one needs to time-share Gaussian test channels.

IV Quadratic Vector Gaussian CEO Problem with Determinant Constraint

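We turn to the case in which the distortion is measured under quadratic loss. In this case, the mean square error matrix is given by

$$\mathbf{D}^{(n)} := \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[ (\mathbf{X}_i - \hat{\mathbf{X}}_i)(\mathbf{X}_i - \hat{\mathbf{X}}_i)^{\dagger} \big]. \tag{7}$$

Under a (general) error constraint of the form

$$\mathbf{D}^{(n)} \preceq \mathbf{D}, \tag{8}$$

where $\mathbf{D}$ designates here a prescribed positive definite error matrix, a complete solution is still to be found in general. In what follows, we replace the constraint (8) with one on the determinant of the error matrix $\mathbf{D}^{(n)}$, i.e.,

$$|\mathbf{D}^{(n)}| \leq D \tag{9}$$

($D$ is a scalar here). We note that since the error matrix is minimized by choosing the decoding as

$$\hat{\mathbf{X}}_i = \mathbb{E}\big[ \mathbf{X}_i \,\big|\, \phi_1^{(n)}(\mathbf{Y}_1^n), \ldots, \phi_K^{(n)}(\mathbf{Y}_K^n) \big],$$

where $\{\phi_k^{(n)}\}_{k=1}^{K}$ denote the encoding functions, without loss of generality we can write (7) as

$$\mathbf{D}^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{mmse}\big( \mathbf{X}_i \,\big|\, \phi_1^{(n)}(\mathbf{Y}_1^n), \ldots, \phi_K^{(n)}(\mathbf{Y}_K^n) \big).$$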

Definition 5.

A rate-distortion tuple is achievable for the quadratic vector Gaussian CEO problem with determinant constraint if there exist a blocklength , encoding functions such that

The rate-distortion region is defined as the union of all non-negative tuples that are achievable.

The following lemma essentially states that Theorem 1 provides an outer bound on this region.

Lemma 1.

If , then .

Proof.

The proof of Lemma 1 is given in Section V-C. ∎

We are now ready to state the main result of this section, which is a complete characterization of this region.

Theorem 2.

The rate-distortion region of the quadratic vector Gaussian CEO problem with determinant constraint is given by the set of all non-negative rate-distortion tuples that satisfy, for all subsets ,


for some , .

Proof.

The proof of Theorem 2 is given in Section V-D. ∎
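The link between the logarithmic-loss result and the determinant-constrained quadratic problem can be sketched as follows. This is a reconstruction under stated assumptions rather than the paper's own statement: it assumes the complex circularly symmetric Gaussian convention, writes $D_{\log}$ for the logarithmic-loss distortion level, assumes that the bound of Theorem 1 has the form shown in the first line, and uses the fact that a random vector with error covariance of determinant at most $D$ has differential entropy at most $\log((\pi e)^{n_x} D)$.

```latex
% Sketch only: the first line is an assumed form of the bound of Theorem 1.
\begin{align*}
D_{\log} + \sum_{k \in \mathcal{S}} R_k
  &\geq \sum_{k \in \mathcal{S}} \log \frac{1}{|\mathbf{I} - \mathbf{\Omega}_k \mathbf{\Sigma}_k|}
   + \log \Big| (\pi e) \Big( \mathbf{\Sigma}_{\mathbf{x}}^{-1}
   + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_k \mathbf{H}_k \Big)^{-1} \Big| , \\
D_{\log} &= \log\big( (\pi e)^{n_x} D \big)
  \quad \text{(distortion level induced by } |\mathbf{D}^{(n)}| \leq D \text{)} , \\
\Longrightarrow \quad
\log \frac{1}{D}
  &\leq \sum_{k \in \mathcal{S}} \Big( R_k + \log |\mathbf{I} - \mathbf{\Omega}_k \mathbf{\Sigma}_k| \Big)
   + \log \Big| \mathbf{\Sigma}_{\mathbf{x}}^{-1}
   + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_k \mathbf{H}_k \Big| .
\end{align*}
```

The $n_x \log(\pi e)$ terms cancel on both sides, so the resulting constraint involves only the determinant level $D$, the rates and the matrices $\mathbf{\Omega}_k$.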

Remark 4.

It is believed that the approach of this section, which connects the quadratic vector Gaussian CEO problem to that under logarithmic loss, can also be exploited to possibly infer other new results on the quadratic vector Gaussian CEO problem. Alternatively, it can also be used to derive new converses on the quadratic vector Gaussian CEO problem. For example, in the case of scalar sources, Theorem 2, and Lemma 1, readily provide an alternate converse proof to those of [1, 2] for this model.

V Proofs

V-A Proof of Proposition 1

First let us define the rate-information region for discrete memoryless sources as the closure of all rate-information tuples for which there exist a blocklength , encoding functions and a decoding function such that

It is easy to see that a characterization of can be obtained by using [8, Theorem 10] and substituting distortion levels therein with . More specifically, the region is given as in the following proposition.

Proposition 3.

The rate-information region of the vector DM CEO problem under logarithmic loss is given by the set of all non-negative tuples that satisfy, for all subsets ,

for some joint measure of the form .

The region involves mutual information terms only (not entropies); and, so, using a standard discretization argument, it can be easily shown that a characterization of this region in the case of continuous alphabets is also given by Proposition 3.

Let us now return to the vector Gaussian CEO problem under logarithmic loss that we study in this paper. First, we state the following lemma, whose proof is easy and is omitted for brevity.

Lemma 2.

if and only if .

For vector Gaussian sources, the region can be characterized using Proposition 3 and Lemma 2. This completes the proof.

V-B Proof of Converse of Theorem 1

The proof of Theorem 1 relies on deriving an outer bound on the region given by Proposition 1. In doing so, we use the technique of [16, Theorem 8] which relies on the de Bruijn identity and the properties of Fisher information and MMSE.

Lemma 3.

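[24, 16] Let $(\mathbf{X}, \mathbf{Y})$ be a pair of random vectors with pmf $p(\mathbf{x}, \mathbf{y})$. We have

$$\log \big| (\pi e)\, \mathbf{J}^{-1}(\mathbf{X} | \mathbf{Y}) \big| \leq h(\mathbf{X} | \mathbf{Y}) \leq \log \big| (\pi e)\, \mathrm{mmse}(\mathbf{X} | \mathbf{Y}) \big|,$$

where the conditional Fisher information matrix is defined as

$$\mathbf{J}(\mathbf{X} | \mathbf{Y}) := \mathbb{E}\big[ \nabla \log p(\mathbf{X} | \mathbf{Y})\, \nabla \log p(\mathbf{X} | \mathbf{Y})^{\dagger} \big],$$

and the minimum mean squared error (MMSE) matrix is

$$\mathrm{mmse}(\mathbf{X} | \mathbf{Y}) := \mathbb{E}\big[ (\mathbf{X} - \mathbb{E}[\mathbf{X} | \mathbf{Y}]) (\mathbf{X} - \mathbb{E}[\mathbf{X} | \mathbf{Y}])^{\dagger} \big].$$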

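First, we derive an outer bound on (5) as follows. For each $q \in \mathcal{Q}$ and fixed pmf $\prod_{k=1}^{K} p(\mathbf{u}_k | \mathbf{y}_k, q)$, choose $\mathbf{\Omega}_{k,q}$, $k = 1, \ldots, K$, satisfying $\mathbf{0} \preceq \mathbf{\Omega}_{k,q} \preceq \mathbf{\Sigma}_k^{-1}$ such that

$$\mathrm{mmse}(\mathbf{Y}_k | \mathbf{X}, U_{k,q}, q) = \mathbf{\Sigma}_k - \mathbf{\Sigma}_k \mathbf{\Omega}_{k,q} \mathbf{\Sigma}_k. \tag{10}$$

Such $\mathbf{\Omega}_{k,q}$ always exists since, for all $q \in \mathcal{Q}$, $k \in \mathcal{K}$, we have

$$\mathbf{0} \preceq \mathrm{mmse}(\mathbf{Y}_k | \mathbf{X}, U_{k,q}, q) \preceq \mathbf{\Sigma}_{\mathbf{y}_k | \mathbf{x}} = \mathbf{\Sigma}_k.$$

Then, for $k \in \mathcal{K}$ and $q \in \mathcal{Q}$, we have

$$\begin{aligned} I(\mathbf{Y}_k; U_k | \mathbf{X}, Q = q) &= \log |(\pi e) \mathbf{\Sigma}_k| - h(\mathbf{Y}_k | \mathbf{X}, U_{k,q}, Q = q) \\ &\overset{(a)}{\geq} \log |\mathbf{\Sigma}_k| - \log |\mathrm{mmse}(\mathbf{Y}_k | \mathbf{X}, U_{k,q}, q)| \\ &\overset{(b)}{=} -\log |\mathbf{I} - \mathbf{\Omega}_{k,q} \mathbf{\Sigma}_k|, \end{aligned} \tag{11}$$

where $(a)$ is due to Lemma 3; and $(b)$ is due to (10).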
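For jointly Gaussian variables, the inverse Fisher information and the MMSE coincide, which is why Gaussian test channels can meet bounds of this type with equality. The closed-form check below illustrates this in the scalar case $Y = X + N$; it assumes that the bounds of Lemma 3 take the sandwich form $\log|(\pi e) J^{-1}(X|Y)| \leq h(X|Y) \leq \log|(\pi e)\,\mathrm{mmse}(X|Y)|$. The function names and numerical values are hypothetical.

```python
import math

def gaussian_posterior_variance(var_x, var_n):
    """mmse(X|Y) for Y = X + N with independent X ~ CN(0, var_x), N ~ CN(0, var_n)."""
    return var_x * var_n / (var_x + var_n)

def gaussian_conditional_fisher(var_x, var_n):
    """J(X|Y) in the same model: the posterior is Gaussian, so J equals its precision."""
    return 1.0 / var_x + 1.0 / var_n

var_x, var_n = 2.0, 0.5   # hypothetical variances
mmse = gaussian_posterior_variance(var_x, var_n)
fisher = gaussian_conditional_fisher(var_x, var_n)

lower = math.log(math.pi * math.e / fisher)   # log|(pi e) J^{-1}(X|Y)|
upper = math.log(math.pi * math.e * mmse)     # log|(pi e) mmse(X|Y)|
```

Here `lower` and `upper` agree up to rounding, so in the Gaussian case both sides of the assumed sandwich equal the conditional differential entropy $h(X|Y)$.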

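On the other hand, for $q \in \mathcal{Q}$ and $\mathcal{S} \subseteq \mathcal{K}$, we have

$$\begin{aligned} h(\mathbf{X} | U_{\mathcal{S}^c, q}, Q = q) &\overset{(a)}{\geq} \log \big| (\pi e)\, \mathbf{J}^{-1}(\mathbf{X} | U_{\mathcal{S}^c, q}, q) \big| \\ &\overset{(b)}{=} \log \Big| (\pi e) \Big( \mathbf{\Sigma}_{\mathbf{x}}^{-1} + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_{k,q} \mathbf{H}_k \Big)^{-1} \Big|, \end{aligned} \tag{12}$$

where $(a)$ follows from Lemma 3; and for $(b)$, we use the connection of the MMSE and the Fisher information to show the following equality, whose proof is provided in the extended version [20].

$$\mathbf{J}(\mathbf{X} | U_{\mathcal{S}^c, q}, q) = \mathbf{\Sigma}_{\mathbf{x}}^{-1} + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_{k,q} \mathbf{H}_k. \tag{13}$$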

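Next, averaging (11) and (12) over the time-sharing variable $Q$ and letting $\mathbf{\Omega}_k := \sum_{q \in \mathcal{Q}} p(q) \mathbf{\Omega}_{k,q}$, we obtain the lower bound

$$\begin{aligned} I(\mathbf{Y}_k; U_k | \mathbf{X}, Q) &= \sum_{q \in \mathcal{Q}} p(q)\, I(\mathbf{Y}_k; U_k | \mathbf{X}, Q = q) \\ &\overset{(a)}{\geq} -\sum_{q \in \mathcal{Q}} p(q) \log |\mathbf{I} - \mathbf{\Omega}_{k,q} \mathbf{\Sigma}_k| \\ &\overset{(b)}{\geq} -\log \Big| \mathbf{I} - \sum_{q \in \mathcal{Q}} p(q) \mathbf{\Omega}_{k,q} \mathbf{\Sigma}_k \Big| = -\log |\mathbf{I} - \mathbf{\Omega}_k \mathbf{\Sigma}_k|, \end{aligned} \tag{14}$$

where $(a)$ follows from (11); and $(b)$ follows from the concavity of the log-det function and Jensen’s inequality.

Besides, we can derive the following lower bound

$$\begin{aligned} h(\mathbf{X} | U_{\mathcal{S}^c}, Q) &= \sum_{q \in \mathcal{Q}} p(q)\, h(\mathbf{X} | U_{\mathcal{S}^c, q}, Q = q) \\ &\overset{(a)}{\geq} \sum_{q \in \mathcal{Q}} p(q) \log \Big| (\pi e) \Big( \mathbf{\Sigma}_{\mathbf{x}}^{-1} + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_{k,q} \mathbf{H}_k \Big)^{-1} \Big| \\ &\overset{(b)}{\geq} \log \Big| (\pi e) \Big( \mathbf{\Sigma}_{\mathbf{x}}^{-1} + \sum_{k \in \mathcal{S}^c} \mathbf{H}_k^{\dagger} \mathbf{\Omega}_k \mathbf{H}_k \Big)^{-1} \Big|, \end{aligned} \tag{15}$$

where $(a)$ is due to (12); and $(b)$ is due to the concavity of the log-det function and Jensen’s inequality.

Finally, the outer bound is obtained by applying (14) and (15) in (5), noting that $\mathbf{\Omega}_k = \sum_{q \in \mathcal{Q}} p(q) \mathbf{\Omega}_{k,q} \preceq \mathbf{\Sigma}_k^{-1}$ since $\mathbf{0} \preceq \mathbf{\Omega}_{k,q} \preceq \mathbf{\Sigma}_k^{-1}$, and taking the union over $\mathbf{\Omega}_k$ satisfying $\mathbf{0} \preceq \mathbf{\Omega}_k \preceq \mathbf{\Sigma}_k^{-1}$.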
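The Jensen step used in this averaging argument relies on the concavity of the log-det function over positive definite matrices: the average of the log-determinants is at most the log-determinant of the average. The short sketch below checks this numerically for two real symmetric 2x2 matrices. Real matrices are used for simplicity, whereas the paper works with complex Hermitian ones; the helper names, matrices and weights are illustrative assumptions.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mix(weights, mats):
    """Convex combination sum_q p(q) * A_q of 2x2 matrices."""
    return [[sum(w * a[i][j] for w, a in zip(weights, mats)) for j in range(2)]
            for i in range(2)]

weights = [0.3, 0.7]                      # hypothetical time-sharing distribution p(q)
mats = [[[2.0, 0.5], [0.5, 1.0]],
        [[1.0, -0.2], [-0.2, 3.0]]]       # real symmetric positive definite matrices

avg_of_logdet = sum(w * math.log(det2(a)) for w, a in zip(weights, mats))
logdet_of_avg = math.log(det2(mix(weights, mats)))
```

In this example `avg_of_logdet` is strictly smaller than `logdet_of_avg`, in line with Jensen's inequality for a concave function; negating both sides gives the direction used when the log-determinant of an inverse is averaged.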

V-C Proof of Lemma 1

Let a tuple be given. Then, there exist a blocklength , encoding functions and a decoding function such that

(16)

We need to show that there exist such that, for all subsets ,

(17)

Let us define

It is easy to justify that expected distortion is achievable under logarithmic loss (see Proposition 1). Then, following straightforwardly the lines in the proof of [8, Theorem 10], we have

(18)

Next, we upper bound in terms of . Letting , we have

(19)

where $(a)$ holds since conditioning reduces entropy; $(b)$ is due to the maximal differential entropy lemma; $(c)$ is due to the concavity of the log-det function and Jensen’s inequality; and $(d)$ is due to (16).

Combining (19) with (V-C), and using standard arguments for single-letterization, we get (17); and this completes the proof of the lemma.

V-D Proof of Theorem 2

The proof is as follows. By Lemma 1 and Proposition 2, there must exist Gaussian test channels and a time-sharing random variable

, with joint distribution that factorizes as

, such that the following holds for all subsets ,

(20)
(21)

This is clearly achievable by the Berger-Tung coding scheme with Gaussian test channels and time-sharing , since the achievable error matrix under quadratic distortion has determinant that satisfies

The above shows that the rate-distortion region of the quadratic vector Gaussian CEO problem under determinant constraint is given by (21), i.e., (with distortion parameter ). Recalling that , and substituting in Theorem 1 using distortion level completes the proof.

References

  • [1] Y. Oohama, “Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2577 – 2593, Jul. 2005.
  • [2] V. Prabhakaran, D. Tse, and K. Ramachandran, “Rate region of the quadratic Gaussian CEO problem,” in Proc. of IEEE Int. Symp. Inf. Theory, Jun. - Jul. 2004, p. 117.
  • [3] J. Chen and J. Wang, “On the vector Gaussian CEO problem,” in Proc. of IEEE Int. Symp. Inf. Theory, Jul. - Aug. 2011, pp. 2050 – 2054.
  • [4] J. Wang and J. Chen, “On the vector Gaussian L-terminal CEO problem,” in Proc. of IEEE Int. Symp. Inf. Theory, Jul. 2012, pp. 571 – 575.
  • [5] T. Liu and P. Viswanath, “An extremal inequality motivated by multiterminal information-theoretic problems,” IEEE Trans. Inf. Theory, vol. 53, no. 5, pp. 1839 – 1851, May 2007.
  • [6] Y. Xu and Q. Wang, “Rate region of the vector Gaussian CEO problem with the trace distortion constraint,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1823 – 1835, Apr. 2016.
  • [7] T. A. Courtade and R. D. Wesel, “Multiterminal source coding with an entropy-based distortion measure,” in Proc. of IEEE Int. Symp. Inf. Theory, Jul. - Aug. 2011, pp. 2040 – 2044.
  • [8] T. A. Courtade and T. Weissman, “Multiterminal source coding under logarithmic loss,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 740 – 761, Jan. 2014.
  • [9] J. Jiao, T. A. Courtade, K. Venkat, and T. Weissman, “Justification of logarithmic loss via the benefit of side information,” IEEE Trans. Inf. Theory, vol. 61, no. 10, pp. 5357 – 5365, Oct. 2015.
  • [10] A. No and T. Weissman, “Universality of logarithmic loss in lossy compression,” in Proc. of IEEE Int. Symp. Inf. Theory, Jun. 2015, pp. 2166 – 2170.
  • [11] Y. Shkel, M. Raginsky, and S. Verdu, “Universal lossy compression under logarithmic loss,” in Proc. of IEEE Int. Symp. Inf. Theory, Jun. 2017, pp. 1157 – 1161.
  • [12] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” in Proc. of the 37th Annu. Allerton Conf. Commun., Control and Comput., 1999, pp. 368 – 377.
  • [13] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning and Games. New York, USA: Cambridge Univ. Press, 2006.
  • [14] T. Andre, M. Antonini, M. Barlaud, and R. M. Gray, “Entropy-based distortion measure for image coding,” in Proc. of IEEE Int. Conf. Image Process., Oct. 2006, pp. 1157 – 1160.
  • [15] K. Kittichokechai, Y.-K. Chia, T. J. Oechtering, M. Skoglund, and T. Weissman, “Secure source coding with a public helper,” IEEE Trans. Inf. Theory, vol. 62, no. 7, pp. 3930 – 3949, Jul. 2016.
  • [16] E. Ekrem and S. Ulukus, “An outer bound for the vector Gaussian CEO problem,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 6870 – 6887, Nov. 2014.
  • [17] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: A unified framework for convex optimization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381 – 2401, Sep. 2003.
  • [18] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath, “Optimal designs for space-time linear precoders and decoders,” IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1051 – 1064, May 2002.
  • [19] C. Tian and J. Chen, “Remote vector Gaussian source coding with decoder side information under mutual information and distortion constraints,” IEEE Trans. Inf. Theory, vol. 55, no. 10, pp. 4676 – 4680, Oct. 2009.
  • [20] Y. Ugur, I.-E. Aguerri, and A. Zaidi, “Vector Gaussian CEO problem under logarithmic loss and applications,” IEEE Trans. Inf. Theory, submitted for publication, 2018. [Online]. Available: http://arxiv.org/abs/1811.03933
  • [21] Y. Zhou, Y. Xu, W. Yu, and J. Chen, “On the optimal fronthaul compression and decoding strategies for uplink cloud radio access networks,” IEEE Trans. Inf. Theory, vol. 62, no. 12, pp. 7402 – 7418, Dec. 2016.
  • [22] I.-E. Aguerri, A. Zaidi, G. Caire, and S. Shamai, “On the capacity of cloud radio access networks with oblivious relaying,” IEEE Trans. Inf. Theory, 2017. [Online]. Available: http://arxiv.org/abs/1710.09275
  • [23] ——, “On the capacity of cloud radio access networks with oblivious relaying,” in Proc. of IEEE Int. Symp. Inf. Theory, Jun. 2017, pp. 2068 – 2072.
  • [24] A. Dembo, T. M. Cover, and J. A. Thomas, “Information theoretic inequalities,” IEEE Trans. Inf. Theory, vol. 37, no. 6, pp. 1501 – 1518, Nov. 1991.