# Feedback Capacity of MIMO Gaussian Channels

Finding a computable expression for the feedback capacity of additive channels with colored Gaussian noise is a long-standing open problem. In this paper, we solve this problem in the scenario where the channel has multiple inputs and multiple outputs (MIMO) and the noise process is generated as the output of a state-space model (a hidden Markov model). The main result is a computable characterization of the feedback capacity as a finite-dimensional convex optimization problem. Our solution subsumes all previous solutions to the feedback capacity, including the auto-regressive moving-average (ARMA) noise process of first order, even if it is a non-stationary process. The capacity problem can be viewed as the problem of maximizing the entropy rate of the measurements of a controlled (policy-dependent) state-space model subject to a power constraint. We formulate the finite-block version of this problem as a sequential convex optimization problem, which in turn leads to a single-letter and computable upper bound. By optimizing over a family of time-invariant policies that correspond to the channel input distribution, a tight lower bound is realized. We show that one of the optimization constraints in the capacity characterization boils down to a Riccati equation, revealing an interesting relation between explicit capacity formulae and Riccati equations.


## I Introduction

This work was supported in part by the National Science Foundation (NSF) under grants CCF-1751356 and CCF-1956386. O. Sabag is partially supported by the ISEF international postdoctoral fellowship. The authors are with the California Institute of Technology (e-mails: {oron,vkostina,hassibi}@caltech.edu).

We consider the feedback capacity of a multiple-input multiple-output (MIMO) Gaussian channel

$$y_i = \Lambda x_i + z_i, \tag{1}$$

where $\Lambda$ is a known matrix, $y_i$ is the channel output and $x_i$ is the channel input. The noise $z_i$ is a non-white Gaussian random process generated by a MIMO state-space model (a hidden Markov model)

$$\begin{aligned} s_{i+1} &= F s_i + G w_i \\ z_i &= H s_i + v_i, \end{aligned} \tag{2}$$

where the sequence $(w_i, v_i)$ is i.i.d. Gaussian. For particular realizations of the state-space model, well-known random processes are recovered, e.g., the auto-regressive moving-average (ARMA) random processes that were studied in [1, 2, 3]. State-space structures are utilized in the fields of control and estimation to obtain explicit policies. In estimation theory, for instance, the celebrated Kalman filter is a closed-form policy obtained using the underlying state-space structure of the signal and its measurements. In this work, we show that imposing a state-space structure on the noise process leads to a computable solution for the feedback capacity problem too.

Our main result is a computable expression for the feedback capacity. Our assumptions on the state-space model are mild and include noise processes that are not necessarily stationary, i.e., when the spectral radius of $F$ is greater than $1$. We show that the feedback capacity can be formulated as a finite-dimensional convex optimization problem. The optimization is a standard maximal determinant (max-det) optimization problem subject to linear matrix inequality (LMI) constraints [4, 5, 6, 7]. Such formulations appeared in information theory contexts e.g. , but it is a recent development that the fundamental limits of problems with memory are formulated using this important class of convex optimization problems [9, 10].

The literature on the feedback capacity of the (scalar) Gaussian channel is vast, e.g. [11, 12, 13, 14, 15, 16], and a detailed survey can be found in . The feedback capacity of arbitrary Gaussian processes was characterized by a multi-letter expression in . In , a Markov decision process formulation provided a computational tool, and an explicit lower bound for the ARMA noise process of first order was derived. The tightness of the lower bound for the moving average (MA) process was established in , concluding the first explicit capacity formula. Later, the same author considered general stationary noise processes and showed that the capacity can be expressed with a variational formula in the frequency domain . This approach does not provide a consistent methodology to compute the feedback capacity, but for the ARMA noise process (of first order), a closed-form formula was obtained, establishing the tightness of the conjectured lower bound in . Another contribution of  was the formulation of the capacity as a single-letter (more precisely, a finite-dimensional) but non-convex optimization problem. In , a change of variable for this optimization problem showed that it can be reformulated as a convex optimization problem. However, the change of variable relies on the fact that a certain covariance matrix is invertible, an imprecise claim (see Remark 1). We also remark that the results are limited to the case of stable and which means, effectively, that the encoder has access to the hidden state of the state-space. Our setting subsumes these models by allowing an arbitrary state-space model for the noise.

The main idea in our derivation is a novel formulation of the $n$-letter capacity as a sequential convex optimization problem (SCOP). The SCOP is an optimization problem whose decision variable is a sequence of length $n$, and whose constraints (LMIs) also have a sequential structure: each constraint depends on decision variables at consecutive times only. While it is well known that the $n$-letter capacity is a convex optimization problem, it is the sequential property that allows us to obtain a single-letter upper bound for the limiting $n$-letter capacity. For the lower bound, an optimization over a family of time-invariant channel input distributions leads to a non-convex optimization problem that we show to be equivalent to the single-letter upper bound. Thus, our derivation concludes the optimality of time-invariant input distributions, extending this important conclusion from  for the stationary case.

The rest of the paper is organized as follows. In Section II, we present the setting and problem formulation. Section III includes our main result on the feedback capacity of the MIMO Gaussian channel. In Section IV, we present the main ideas and the technical lemmas that prove our main result, while their detailed proofs are given in Section V.

## II The setting and preliminaries

In this section, we define the communication setting and the noise characteristics. We also present some preliminaries on Kalman filtering and Riccati equations.

### II-A The setting

We consider a MIMO additive Gaussian channel

$$y_i = \Lambda x_i + z_i, \tag{3}$$

where the channel input is $x_i$, the additive noise is $z_i$, and $\Lambda$ is a fixed known matrix. We assume that the encoder has access to noiseless, instantaneous feedback so that the input $x_i$ is a function of the message and all previous channel outputs $y^{i-1}$. For a fixed blocklength $n$, the channel input has an average power constraint

$$\frac{1}{n}\sum_{i=1}^{n} E[x_i^T x_i] \le P.$$

The definitions of the average probability of error, an achievable rate and the feedback capacity are standard and can be found in , for instance.

We consider a colored Gaussian noise that is generated as the output of a state-space model:

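$$\begin{aligned} s_{i+1} &= F s_i + G w_i \\ z_i &= H s_i + v_i, \end{aligned} \tag{4}$$

where $w_i$ and $v_i$ are i.i.d. zero-mean Gaussian sequences with $\mathrm{cov}(w_i) = W$, $\mathrm{cov}(v_i) = V$ and $E[w_i v_i^T] = L$, and are independent of the initial state. The class of linear dynamical systems described by the state-space model in (4) is rich and captures many known instances like the MIMO ARMA random processes of th order given by , where is an i.i.d. sequence.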
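To make the noise model concrete, the following is a minimal simulation sketch of the state-space recursion in (4). It is an illustration only: the function name `simulate_noise`, the zero initial state, and the reading of $W$, $V$, $L$ as the covariances and cross-covariance of $(w_i, v_i)$ are assumptions made here, not details fixed by the text.

```python
import numpy as np

def simulate_noise(F, G, H, W, V, L, n, seed=0):
    """Draw n samples of z_i = H s_i + v_i with s_{i+1} = F s_i + G w_i.

    Assumptions (hypothetical, for illustration): (w_i, v_i) is jointly
    Gaussian with cov(w)=W, cov(v)=V, E[w v^T]=L, and the initial state is 0.
    """
    rng = np.random.default_rng(seed)
    k, p, q = F.shape[0], H.shape[0], W.shape[0]
    joint_cov = np.block([[W, L], [L.T, V]])  # covariance of the stacked (w_i, v_i)
    s = np.zeros(k)
    z = np.empty((n, p))
    for i in range(n):
        wv = rng.multivariate_normal(np.zeros(q + p), joint_cov)
        w, v = wv[:q], wv[q:]
        z[i] = H @ s + v      # measurement equation
        s = F @ s + G @ w     # state update
    return z
```

As a usage example under the same assumptions, a first-order moving-average process $z_i = u_i + \alpha u_{i-1}$ is obtained with a scalar state $s_i = u_{i-1}$, that is, $F = 0$, $G = 1$, $H = \alpha$ and $W = V = L = 1$, so that $w_i = v_i = u_i$.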

### II-B The Kalman filter and the Riccati equation

The Kalman filter is a simple, recursive method to compute the minimum mean-square error estimate of the hidden state based on the measurements. The predicted-state estimate and its error covariance are defined as

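$$\begin{aligned} \hat{s}_i &= E[s_i \mid z^{i-1}] \\ \Sigma_i &= \mathrm{cov}(s_i - \hat{s}_i). \end{aligned} \tag{5}$$

Then, the standard Kalman filter is given by the recursion

$$\hat{s}_{i+1} = F\hat{s}_i + K_{p,i}(z_i - H\hat{s}_i), \tag{6}$$

with

$$\begin{aligned} K_{p,i} &= (F\Sigma_i H^T + GL)\Psi_i^{-1} \\ \Psi_i &= H\Sigma_i H^T + V, \end{aligned} \tag{7}$$

and the error covariance is described by the Riccati recursion

$$\Sigma_{i+1} = F\Sigma_i F^T + GWG^T - K_{p,i}(H\Sigma_i H^T + V)K_{p,i}^T. \tag{8}$$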

The innovation process, defined by $e_i \triangleq z_i - H\hat{s}_i$, is orthogonal to (and, being jointly Gaussian, statistically independent of) the previous measurements $z^{i-1}$.
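The recursions (6)-(8) translate directly into code. The sketch below performs one predictor step; the function name `kalman_predictor_step` and its argument order are illustrative choices, and the innovation is computed as $z_i - H\hat{s}_i$.

```python
import numpy as np

def kalman_predictor_step(s_hat, Sigma, z, F, G, H, W, V, L):
    """One step of the Kalman predictor: (s_hat_i, Sigma_i, z_i) -> time i+1."""
    Psi = H @ Sigma @ H.T + V                              # innovation covariance, eq. (7)
    Kp = (F @ Sigma @ H.T + G @ L) @ np.linalg.inv(Psi)    # predictor gain, eq. (7)
    e = z - H @ s_hat                                      # innovation
    s_next = F @ s_hat + Kp @ e                            # state prediction, eq. (6)
    Sigma_next = F @ Sigma @ F.T + G @ W @ G.T - Kp @ Psi @ Kp.T  # Riccati recursion, eq. (8)
    return s_next, Sigma_next, e, Psi
```

The matrix inverse requires $\Psi_i$ to be positive definite, which is the standing assumption discussed next in the text.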

Note that in (7), it is assumed that $\Psi_i \succ 0$ for all $i$. This is a natural assumption in our communication setting since otherwise the capacity is infinite. Namely, if $\Psi_i$ is only positive semidefinite, a coordinate in the noise vector $z_i$ is a deterministic function of the past noise instances $z^{i-1}$. Building an infinite-rate scheme is straightforward: the encoder transmits $x_j = 0$ for $j < i$ so that $y^{i-1} = z^{i-1}$. Then, based on $z^{i-1}$, the encoder and the decoder can communicate an infinite number of bits on this vector coordinate (assuming the image of $\Lambda$ is not degenerate in this particular direction).

We move on to present our assumptions on the state-space model. The stability of $F$ is significant since it determines the stationarity of the noise process.

###### Definition 1.

The matrix $F$ is stable if its spectral radius satisfies $\rho(F) < 1$.

Without further assumptions, our results hold for the stationary case, i.e., when $F$ is stable. If , the stability of $F$ should be replaced with the stability of .

For the general case where $F$ is not stable, we need three additional assumptions. Since the assumptions are satisfied for the case where $F$ is stable, a reader whose interest is limited to the stationary case may skip these assumptions.

###### Assumption 1.

The pair $(F, H)$ is detectable. That is, there exists a matrix $K$ such that $\rho(F - KH) < 1$.

###### Assumption 2.

The pair is stabilizable. That is, for any and with such that , .

###### Assumption 3.

The matrix

does not have eigenvalues on the unit circle.

The first two assumptions are made to guarantee that the Riccati recursion in (8) converges to a matrix that solves a Riccati equation. More specifically, consider the function

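$$f(\Sigma) = F\Sigma F^T - \Sigma + GWG^T - K_p(\Sigma)\Psi(\Sigma)K_p(\Sigma)^T, \tag{9}$$

where $K_p(\Sigma) = (F\Sigma H^T + GL)\Psi(\Sigma)^{-1}$ and $\Psi(\Sigma) = H\Sigma H^T + V$. The Riccati equation is defined as $f(\Sigma) = 0$. Under Assumptions , the Riccati recursion converges to a unique stabilizing solution of the Riccati equation [23, Ch. ]. That is, there exists a unique $\Sigma$ such that $f(\Sigma) = 0$ and $F - K_p(\Sigma)H$ is stable. Moreover, for any initial condition $\Sigma_1 \succeq 0$, $\Sigma_i \to \Sigma$ at an exponential rate. From now on, we refer to the constants

$$\begin{aligned} K_p &= (F\Sigma H^T + GL)\Psi^{-1} \\ \Psi &= H\Sigma H^T + V \end{aligned} \tag{10}$$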

as the ones evaluated at the stabilizing solution to the Riccati equation.

The solution to the Riccati equation also characterizes the entropy rate of the Gaussian noise process:

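$$\frac{1}{n}h(z^n) = \frac{1}{n}\sum_{i=1}^{n} h(z_i \mid z^{i-1}) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log\det\big(2\pi e\,(H\Sigma_i H^T + V)\big) \to \frac{1}{2}\log\det(2\pi e\,\Psi), \tag{11}$$

as $n \to \infty$.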
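The constants $K_p$ and $\Psi$ can be approximated numerically by simply running the Riccati recursion (8) until it stops changing. The sketch below does this from a zero initial covariance; the function name `stabilizing_riccati`, the stopping tolerance and the iteration cap are illustrative assumptions, and convergence is only expected under the assumptions stated above.

```python
import numpy as np

def stabilizing_riccati(F, G, H, W, V, L, tol=1e-12, max_iter=100_000):
    """Iterate the Riccati recursion (8) from Sigma = 0 until convergence."""
    Sigma = np.zeros_like(F, dtype=float)
    for _ in range(max_iter):
        Psi = H @ Sigma @ H.T + V
        Kp = (F @ Sigma @ H.T + G @ L) @ np.linalg.inv(Psi)
        Sigma_next = F @ Sigma @ F.T + G @ W @ G.T - Kp @ Psi @ Kp.T
        converged = np.max(np.abs(Sigma_next - Sigma)) < tol
        Sigma = Sigma_next
        if converged:
            break
    # Re-evaluate the constants at the (approximate) fixed point.
    Psi = H @ Sigma @ H.T + V
    Kp = (F @ Sigma @ H.T + G @ L) @ np.linalg.inv(Psi)
    return Sigma, Kp, Psi

# Example with hypothetical scalar parameters.
F = np.array([[0.5]]); G = np.eye(1); H = np.eye(1)
W = np.eye(1); V = np.eye(1); L = np.zeros((1, 1))
Sigma, Kp, Psi = stabilizing_riccati(F, G, H, W, V, L)
log_det_Psi = np.linalg.slogdet(Psi)[1]  # the quantity entering the noise entropy rate
```

A quick sanity check on the output is that $F - K_p H$ has spectral radius below one, which is the stabilizing property referred to in the text.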

## Iii Main result and discussion

In this section we present the feedback capacity of the MIMO channel, its particularization to the scalar case and an explicit computation of the feedback capacity for the MA process. The following is our main result.

###### Theorem 1.

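The feedback capacity of the MIMO Gaussian channel in (3)-(4) is given by the convex optimization problem

$$\begin{aligned} C_{\mathrm{fb}}(P) = \max_{\Pi, \hat{\Sigma}, \Gamma}\ & \frac{1}{2}\log\det(\Psi_Y) - \frac{1}{2}\log\det(\Psi) \\ \text{s.t.}\ \ & \Psi_Y = \Lambda\Pi\Lambda^T + H\hat{\Sigma}H^T + \Lambda\Gamma H^T + H\Gamma^T\Lambda^T + \Psi, \\ & \begin{pmatrix} \Pi & \Gamma \\ \Gamma^T & \hat{\Sigma} \end{pmatrix} \succeq 0, \quad \mathrm{Tr}(\Pi) \le P, \\ & \begin{pmatrix} F\hat{\Sigma}F^T + K_p\Psi K_p^T - \hat{\Sigma} & F(\Gamma^T\Lambda^T + \hat{\Sigma}H^T) + K_p\Psi \\ (\Lambda\Gamma + H\hat{\Sigma})F^T + \Psi K_p^T & \Psi_Y \end{pmatrix} \succeq 0, \end{aligned} \tag{12}$$

where $K_p$ and $\Psi$ are constants given in (10).

The objective is the difference between the entropy rates of the channel output and the channel noise processes. The entropy rate of the noise process is the constant given in (11), while the entropy rate of the channel output process is $\frac{1}{2}\log\det(2\pi e\,\Psi_Y)$ and is part of the optimization; the factor $2\pi e$ cancels in the difference. The decision variable $\Pi$ corresponds to the channel input covariance, while the decision variable $\Gamma$ and the error covariance matrix $\hat{\Sigma}$ will be given a straightforward interpretation in Lemma 1 on the optimal policy structure.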
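To illustrate how the pieces of Theorem 1 fit together, the sketch below evaluates the objective and checks the three constraints for a given candidate $(\Pi, \hat{\Sigma}, \Gamma)$. It is not a solver: finding the maximizer would require a max-det/LMI optimization package, which is not assumed here. The function name `evaluate_candidate` and the numerical tolerance are illustrative.

```python
import numpy as np

def evaluate_candidate(Pi, Sigma_hat, Gamma, Lam, F, H, Kp, Psi, P, tol=1e-9):
    """Return (objective value, feasibility flag) for a candidate point of Theorem 1."""
    Psi_Y = (Lam @ Pi @ Lam.T + H @ Sigma_hat @ H.T
             + Lam @ Gamma @ H.T + H @ Gamma.T @ Lam.T + Psi)
    # First LMI: joint covariance block of the input and the estimation error.
    lmi1 = np.block([[Pi, Gamma], [Gamma.T, Sigma_hat]])
    # Second (Riccati) LMI.
    A = F @ Sigma_hat @ F.T + Kp @ Psi @ Kp.T - Sigma_hat
    B = F @ (Gamma.T @ Lam.T + Sigma_hat @ H.T) + Kp @ Psi
    lmi2 = np.block([[A, B], [B.T, Psi_Y]])

    def is_psd(M):
        # Symmetrize to guard against round-off before the eigenvalue test.
        return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

    feasible = bool(is_psd(lmi1) and is_psd(lmi2) and np.trace(Pi) <= P + tol)
    value = 0.5 * (np.linalg.slogdet(Psi_Y)[1] - np.linalg.slogdet(Psi)[1])
    return float(value), feasible
```

For example, with scalar white noise of unit variance ($F = H = K_p = 0$, $\Psi = 1$, $\Lambda = 1$) and the candidate $\Pi = P$, $\hat{\Sigma} = 0$, $\Gamma = 0$, the function returns $\frac{1}{2}\log(1 + P)$ and reports feasibility.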

The Schur complement of the second LMI constraint in (12) implies the Riccati inequality

$$\hat{\Sigma} \preceq F\hat{\Sigma}F^T + K_p\Psi K_p^T - K_Y\Psi_Y K_Y^T, \tag{13}$$

with $K_Y = \big(F(\Gamma^T\Lambda^T + \hat{\Sigma}H^T) + K_p\Psi\big)\Psi_Y^{-1}$. In Lemma 6 in Section IV below, it is shown that optimal decision variables satisfy the Riccati inequality (13) with equality, i.e., it is a Riccati equation. This fact reveals that the explicit capacity formulae in the literature that are expressed as functions of roots of certain polynomials, e.g., [17, 20, 19], originate from a Riccati equation. We demonstrate this interesting fact in Section III-B for the MA noise process.
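The step from the LMI to the Riccati inequality can be spelled out as follows. The shorthand $A$ and $B$ is introduced here only for illustration and is not notation from the paper.

$$\begin{aligned} A &\triangleq F\hat{\Sigma}F^T + K_p\Psi K_p^T - \hat{\Sigma}, \qquad B \triangleq F(\Gamma^T\Lambda^T + \hat{\Sigma}H^T) + K_p\Psi, \\ \Psi_Y - \Psi &= \begin{pmatrix} \Lambda & H \end{pmatrix} \begin{pmatrix} \Pi & \Gamma \\ \Gamma^T & \hat{\Sigma} \end{pmatrix} \begin{pmatrix} \Lambda^T \\ H^T \end{pmatrix} \succeq 0 \ \Longrightarrow\ \Psi_Y \succeq \Psi \succ 0, \\ \begin{pmatrix} A & B \\ B^T & \Psi_Y \end{pmatrix} \succeq 0 \ &\Longleftrightarrow\ A - B\Psi_Y^{-1}B^T \succeq 0, \\ B\Psi_Y^{-1}B^T &= (B\Psi_Y^{-1})\,\Psi_Y\,(B\Psi_Y^{-1})^T = K_Y\Psi_Y K_Y^T . \end{aligned}$$

The second line uses the first LMI constraint together with $\Psi \succ 0$, so that $\Psi_Y$ is invertible and the Schur complement in the third line is well defined. Substituting the last line into $A - B\Psi_Y^{-1}B^T \succeq 0$ and rearranging gives exactly the Riccati inequality.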

### III-A The scalar case

If the channel outputs, inputs, and the additive noise are scalars, but the hidden state of the noise is possibly a vector, the capacity in Theorem 1 can be simplified.

###### Theorem 2.

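The feedback capacity of the scalar Gaussian channel (3)-(4) with $\Lambda = 1$ is given by the following convex optimization problem

$$\begin{aligned} C_{\mathrm{fb}}(P) = \max_{\hat{\Sigma}, \Gamma}\ & \frac{1}{2}\log\left(1 + \frac{P + H\hat{\Sigma}H^T + 2\Gamma H^T}{\Psi}\right) \\ \text{s.t.}\ \ & \begin{pmatrix} P & \Gamma \\ \Gamma^T & \hat{\Sigma} \end{pmatrix} \succeq 0, \\ & \begin{pmatrix} F\hat{\Sigma}F^T + K_p\Psi K_p^T - \hat{\Sigma} & F\Gamma^T + F\hat{\Sigma}H^T + K_p\Psi \\ \Gamma F^T + H\hat{\Sigma}F^T + \Psi K_p^T & P + H\hat{\Sigma}H^T + 2\Gamma H^T + \Psi \end{pmatrix} \succeq 0, \end{aligned} \tag{14}$$

where $K_p$ and $\Psi$ are constants defined in (10).

Choosing  in (14) recovers the capacity formula of an additive white Gaussian noise channel

$$C_{\mathrm{fb}}(P) = \frac{1}{2}\log\left(1 + \frac{P}{V}\right).$$

###### Remark 1.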

The state-space that was studied in  can be recovered by choosing . In this case, the constants are and the capacity in (14) and that in [21, Th. ] are almost in full agreement. Specifically, there is a difference in the sign of the first LMI in (14), which reads as a strict LMI ($\succ$) in . A strict LMI implies that the Schur complement satisfies . However, it can be shown that the optimum is achieved with equality in the Schur complement at least for particular instances like the MA noise process in Section III-B. The claim that the LMI constraint is positive definite was also utilized in  to show their main argument that is invertible, and thus should be read with care.

### III-B Moving average noise

In , the feedback capacity of the MA noise process of first order was shown to be

$$C_{\mathrm{fb}}(P) = -\log x_0, \tag{15}$$

where $x_0$ is the unique positive root of . As this noise realization corresponds to the special case , we illustrate the simplicity of computing such expressions from Theorem 2.

###### Theorem 3.

The feedback capacity of the scalar Gaussian channel with first-order MA noise process is

$$C_{\mathrm{fb}}(P) = \frac{1}{2}\log(1 + \mathrm{SNR}), \tag{16}$$

where $\mathrm{SNR}$ is the positive root of the polynomial .

The capacity expressions in Theorem 3 and that in  are different, but it can be shown that they are equal. Specifically, a change of variable in (16) leads to the following equivalent expression where solves . Interestingly, the new polynomial and the polynomial in (15) are fundamentally different but it can be shown that they share a unique positive root meaning that the capacity is the same.

###### Proof of Theorem 3.

In Lemma 6 in Section IV, it is shown that the Schur complement of the Riccati LMI (13) is always achieved with equality. We can show the same property for the first LMI in (14) by contradiction. Assume that for some . Then, one can choose to show that the objective is increased. The Riccati LMI can be verified to be satisfied with this substitution.

To obtain the capacity expression, we use and the Riccati equation which simplifies to . Substituting these equations into the objective gives the fixed-point equation , where the sign of is chosen to maximize . By the variable change , we get the polynomial . ∎

Note that the proof is a straightforward computation for all values of , regardless of whether the noise process is stationary.

## IV Proof sketch of the main result

In this section we outline the proof of the main result of this paper in Theorem 1. We structure the proof as three parts.

1. Sequential convex optimization problem (SCOP): Define the $n$-letter capacity as

$$C_n(P) = \max_{P(x^n \| y^{n-1}):\ \frac{1}{n}\sum_{i=1}^{n} E[x_i^T x_i] \le P}\ \frac{1}{n}\big(h(y^n) - h(z^n)\big). \tag{17}$$

The first three lemmas formulate the $n$-letter capacity as a SCOP. Since the objective of $C_n(P)$ is directed information (e.g. [24, 25]), it is easy to show that it is concave in its decision variable, but the challenge is to formulate it as a convex optimization problem that enables one to explicitly compute the limit of $C_n(P)$ thereafter. To this end, we realize a SCOP whose fundamental LMI constraint has a sequential structure.

2. Upper bound via convexity: The second part of the proof utilizes the SCOP structure to show that the capacity expression in Theorem 1 is an upper bound on the capacity. Since the optimization constraints contain decision variables at consecutive times, the standard time-sharing random variable argument does not apply here, and we use a different technique to show that these constraints are satisfied asymptotically, though not necessarily at all times.

3. Lower bound using time-invariant inputs: The last part constructs a time-invariant policy whose parameters' optimization leads to a lower bound that is expressed as the upper bound optimization problem with additional constraints. We show that the additional constraints are redundant, concluding the proof of the main result.

### IV-A Sequential convex optimization problem

Define the estimators

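$$\begin{aligned} \hat{s}_i &\triangleq E[s_i \mid z^{i-1}] \\ \hat{\hat{s}}_i &\triangleq E[\hat{s}_i \mid y^{i-1}]. \end{aligned} \tag{18}$$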

The first lemma identifies an optimal structure for the input distribution using these estimators.

###### Lemma 1 (The optimal policy structure).

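For a fixed $n$, it is sufficient to optimize (17) with inputs of the form

$$x_i = \Gamma_i\hat{\Sigma}_i^{\dagger}(\hat{s}_i - \hat{\hat{s}}_i) + m_i, \qquad i = 1, \dots, n \tag{19}$$

where $m_i \sim N(0, M_i)$ is independent of , $\hat{\Sigma}_i^{\dagger}$ is the Moore-Penrose pseudo-inverse of

$$\hat{\Sigma}_i = \mathrm{cov}(\hat{s}_i - \hat{\hat{s}}_i), \tag{20}$$

$\Gamma_i$ is a matrix that satisfies

$$\Gamma_i(I - \hat{\Sigma}_i^{\dagger}\hat{\Sigma}_i) = 0, \tag{21}$$

and the power constraint is

$$\frac{1}{n}\sum_{i=1}^{n} \mathrm{Tr}(\Gamma_i\hat{\Sigma}_i^{\dagger}\Gamma_i^T + M_i) \le P. \tag{22}$$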

Lemma 1 simplifies the optimization (17) by showing that the optimization domain is over the sequence of matrices $\{\Gamma_i, M_i\}_{i=1}^{n}$. Note that $\hat{\Sigma}_i$ is a deterministic function of the policy up to time $i-1$ and thus is not part of the policy. The main insight is that the input has two signaling components. The first component is a scaled version of the estimation error at the decoder, $\hat{s}_i - \hat{\hat{s}}_i$, and its purpose is to refine the decoder's knowledge of the channel state. The other component is an additive Gaussian corresponding to the new information sent to the decoder. For instance, if the noise is white, the entire power is dedicated to the new information encapsulated in $m_i$.
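The role of the pseudo-inverse and of the orthogonality constraint (21) can be checked numerically. The following sketch computes the per-time power term of (22) and tests whether a given $\Gamma_i$ satisfies (21); the function name `policy_step_power` and the tolerance are illustrative assumptions.

```python
import numpy as np

def policy_step_power(Gamma, Sigma_hat, M, tol=1e-9):
    """Per-time power Tr(Gamma Sigma^+ Gamma^T + M) and a check of constraint (21)."""
    Sig_pinv = np.linalg.pinv(Sigma_hat)          # Moore-Penrose pseudo-inverse
    k = Sigma_hat.shape[0]
    # Gamma must vanish on the kernel of Sigma_hat: Gamma (I - Sigma^+ Sigma) = 0.
    residual = Gamma @ (np.eye(k) - Sig_pinv @ Sigma_hat)
    satisfies_21 = bool(np.max(np.abs(residual)) <= tol)
    power = float(np.trace(Gamma @ Sig_pinv @ Gamma.T + M))
    return power, satisfies_21

# Hypothetical example with a singular error covariance.
Sigma_hat = np.diag([2.0, 0.0])
M = np.array([[1.0]])
print(policy_step_power(np.array([[1.0, 0.0]]), Sigma_hat, M))  # constraint holds
print(policy_step_power(np.array([[1.0, 1.0]]), Sigma_hat, M))  # constraint violated
```

In the second call, $\Gamma_i$ has a component along the direction in which the estimation error has zero variance, which is exactly what (21) rules out.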

We remark that a similar policy has been reported in [17, Section IV] and . Their policy reads $x_i = \Gamma_i(\hat{s}_i - \hat{\hat{s}}_i) + m_i$, and is missing the scaling $\hat{\Sigma}_i^{\dagger}$ and the orthogonality constraint in (21). If $\hat{\Sigma}_i$ is invertible, then both policies are equivalent by a change of variable. However, the invertibility is not always true, and the orthogonality constraint must be introduced prior to the convex optimization formulation (see also Remark 1). In the next lemma, the dynamics of the channel output is formalized as a controlled state space.

###### Lemma 2 (Channel outputs dynamics).

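For a fixed policy $\{\Gamma_i, M_i\}_{i=1}^{n}$, the channel outputs admit the state-space model

$$\begin{aligned} \hat{s}_{i+1} &= F\hat{s}_i + K_{p,i}e_i, \\ y_i &= (\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)\hat{s}_i - \Lambda\Gamma_i\hat{\Sigma}_i^{\dagger}\hat{\hat{s}}_i + \Lambda m_i + e_i, \end{aligned} \tag{23}$$

where the Kalman gain $K_{p,i}$ and the innovation $e_i$ are defined in Section II-B. The estimator $\hat{\hat{s}}_i$ in (18) can be written as

$$\hat{\hat{s}}_{i+1} = F\hat{\hat{s}}_i + K_{Y,i}(y_i - H\hat{\hat{s}}_i), \tag{24}$$

and its corresponding error covariance satisfies the recursion

$$\hat{\Sigma}_{i+1} = F\hat{\Sigma}_i F^T + K_{p,i}\Psi_i K_{p,i}^T - K_{Y,i}\Psi_{Y,i}K_{Y,i}^T \tag{25}$$

with

$$\begin{aligned} \Psi_{Y,i} &= (\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)\hat{\Sigma}_i(\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)^T + \Lambda M_i\Lambda^T + \Psi_i \\ K_{Y,i} &= \big(F\hat{\Sigma}_i(\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)^T + K_{p,i}\Psi_i\big)\Psi_{Y,i}^{-1} \end{aligned} \tag{26}$$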

and .

Lemma 2 is a consequence of the policy that was derived in Lemma 1. As seen from (23), the encoder's policy translates into an additive measurement noise and a modification of the observability matrix $H$. Similar state-space structures appeared in [20, 26]. It is interesting to realize that (23) does not fall into the classical state-space structure since the observability matrix depends on the error covariance due to our policy. Lemma 2 already reveals an objective structure that resembles that in Theorem 1. Namely, by (26), we can write the objective at time $i$ as

$$h(y_i \mid y^{i-1}) - h(z_i \mid z^{i-1}) = \frac{1}{2}\log\det(\Psi_{Y,i}) - \frac{1}{2}\log\det(\Psi_i).$$
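The recursion of Lemma 2 can be traced numerically for a given policy. The sketch below performs one step of (25)-(26) and returns the per-time objective term; the function name `output_filter_step` is an illustrative choice, and the noise-side quantities $K_{p,i}$ and $\Psi_i$ are assumed to be supplied by the caller.

```python
import numpy as np

def output_filter_step(Sigma_hat, Gamma, M, Lam, F, H, Kp_i, Psi_i):
    """One step of (25)-(26): returns (Sigma_hat_next, Psi_Y, K_Y, rate_term)."""
    # Effective (policy-dependent) observation matrix Lam Gamma Sigma^+ + H.
    C = Lam @ Gamma @ np.linalg.pinv(Sigma_hat) + H
    Psi_Y = C @ Sigma_hat @ C.T + Lam @ M @ Lam.T + Psi_i
    K_Y = (F @ Sigma_hat @ C.T + Kp_i @ Psi_i) @ np.linalg.inv(Psi_Y)
    Sigma_next = (F @ Sigma_hat @ F.T + Kp_i @ Psi_i @ Kp_i.T
                  - K_Y @ Psi_Y @ K_Y.T)
    # Per-time objective: 0.5 * (log det Psi_Y - log det Psi_i).
    rate_term = 0.5 * (np.linalg.slogdet(Psi_Y)[1] - np.linalg.slogdet(Psi_i)[1])
    return Sigma_next, Psi_Y, K_Y, float(rate_term)
```

Averaging `rate_term` over $i = 1, \dots, n$ for a policy that meets the power constraint (22) gives the value that the policy achieves in the $n$-letter problem.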

The next lemma summarizes the SCOP formulation.

###### Lemma 3 (Sequential convex-optimization formulation).

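The $n$-letter capacity can be bounded by the convex optimization problem

$$\begin{aligned} C_n(P) \le \max_{\{\Gamma_i, \Pi_i, \hat{\Sigma}_{i+1}\}_{i=1}^{n}}\ & \frac{1}{2n}\sum_{i=1}^{n}\big[\log\det(\Psi_{Y,i}) - \log\det(\Psi_i)\big] \\ \text{s.t.}\ \ & \begin{pmatrix} \Pi_t & \Gamma_t \\ \Gamma_t^T & \hat{\Sigma}_t \end{pmatrix} \succeq 0, \quad \frac{1}{n}\sum_{i=1}^{n}\mathrm{Tr}(\Pi_i) \le P, \\ & \Psi_{Y,t} = \Lambda\Pi_t\Lambda^T + H\hat{\Sigma}_t H^T + \Lambda\Gamma_t H^T + H\Gamma_t^T\Lambda^T + \Psi_t, \\ & K_{Y,t} = (F\Gamma_t^T\Lambda^T + F\hat{\Sigma}_t H^T + K_{p,t}\Psi_t)\Psi_{Y,t}^{-1}, \\ & \begin{pmatrix} F\hat{\Sigma}_t F^T + K_{p,t}\Psi_t K_{p,t}^T - \hat{\Sigma}_{t+1} & K_{Y,t}\Psi_{Y,t} \\ \Psi_{Y,t}K_{Y,t}^T & \Psi_{Y,t} \end{pmatrix} \succeq 0, \end{aligned} \tag{27}$$

where the constraints hold for $t = 1, \dots, n$ and .

To see that (27) is a standard convex optimization problem, note that each of the LMI constraints is a linear function of the decision variables. In the next section, we provide the single-letter upper bound on the capacity. The key to the upper bound is the concavity of the objective function and the linearity of the constraints, together with the crucial property that the Riccati LMI constraint contains decision variables of two consecutive times only.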

### IV-B Single-letter upper bound

The next lemma concludes the upper bound in Theorem 1.

###### Lemma 4 (The upper bound).

The feedback capacity is bounded by the convex optimization problem

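$$\begin{aligned} C_{\mathrm{fb}}(P) \le \max_{\Pi, \hat{\Sigma}, \Gamma}\ & \frac{1}{2}\log\det(\Psi_Y) - \frac{1}{2}\log\det(\Psi) \\ \text{s.t.}\ \ & \begin{pmatrix} \Pi & \Gamma \\ \Gamma^T & \hat{\Sigma} \end{pmatrix} \succeq 0, \quad \mathrm{Tr}(\Pi) \le P, \\ & \Psi_Y = \Lambda\Pi\Lambda^T + H\hat{\Sigma}H^T + \Lambda\Gamma H^T + H\Gamma^T\Lambda^T + \Psi, \\ & K_Y = (F\Gamma^T\Lambda^T + F\hat{\Sigma}H^T + K_p\Psi)\Psi_Y^{-1}, \\ & \begin{pmatrix} F\hat{\Sigma}F^T + K_p\Psi K_p^T - \hat{\Sigma} & K_Y\Psi_Y \\ \Psi_Y K_Y^T & \Psi_Y \end{pmatrix} \succeq 0. \end{aligned} \tag{28}$$

The main idea behind the upper bound is to show that the convex combination of each of the decision variables in Lemma 3 obtains a larger objective. At a high level, this is similar to the time-sharing random variable, but the challenge lies in the constraints. Specifically, one cannot show that the Riccati LMI constraint in (28) is satisfied at all times when evaluated at the convex combination of the decision variables. To settle this point, we show that the constraint is satisfied in the asymptotics.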

### IV-C Lower bound

In this section, we prove that the upper bound in Lemma 4 is achievable. It will be shown using two lemmas: the first formulates a lower bound as an optimization problem that resembles the upper bound but has two additional constraints. The second lemma shows that in the upper bound optimization problem these two constraints are satisfied.

###### Lemma 5 (Lower bound).

For time-invariant policies

 xi =Γ(^si−^^si)+mi,    i≥1 (29)

with , the maximization of (17) over achieves the lower bound

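$$\begin{aligned} C_{\mathrm{fb}}(P) \ge \max_{\Gamma, \Pi, \hat{\Sigma}}\ & \frac{1}{2}\log\det(\Psi_Y) - \frac{1}{2}\log\det(\Psi) \\ \text{s.t.}\ \ & \begin{pmatrix} \Pi & \Gamma \\ \Gamma^T & \hat{\Sigma} \end{pmatrix} \succeq 0, \quad \mathrm{Tr}(\Pi) \le P, \\ & K_Y = (F\hat{\Sigma}H^T + F\Gamma^T\Lambda^T + K_p\Psi)\Psi_Y^{-1}, \\ & \Psi_Y = \Lambda\Pi\Lambda^T + H\hat{\Sigma}H^T + \Lambda\Gamma H^T + H\Gamma^T\Lambda^T + \Psi, \\ & \hat{\Sigma} = F\hat{\Sigma}F^T + K_p\Psi K_p^T - K_Y\Psi_Y K_Y^T, \qquad (30) \\ & \exists K: \rho\big(F - K(\Lambda\Gamma\hat{\Sigma}^{\dagger} + H)\big) < 1. \qquad (31) \end{aligned}$$

The optimization problem in Lemma 5 is the same as the upper bound in (28) except for the additional constraint (31) and the Riccati equation (30), which appears as an inequality in the upper bound (13). Next, we show that these two conditions can be neglected and conclude the proof of Theorem 1.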

###### Lemma 6 (Equality between the lower and upper bounds).

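An optimal tuple $(\Pi, \hat{\Sigma}, \Gamma)$ for the upper bound optimization problem in (28) satisfies the following:

1. The Schur complement of the Riccati LMI (13) is achieved with equality.

2. The pair $(F, \Lambda\Gamma\hat{\Sigma}^{\dagger} + H)$ is detectable, i.e.,

$$\exists K: \rho\big(F - K(\Lambda\Gamma\hat{\Sigma}^{\dagger} + H)\big) < 1.$$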

Consequently, the upper bound in Lemma 4 and the lower bound in Lemma 5 are equal to the feedback capacity.

## V Proof of Technical lemmas

In this section, we provide detailed proofs of Lemmas 1 - 6 consecutively.

###### Proof of Lemma 1.

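The new policy is a subset of the general maximization domain subject to the power constraint. Thus, our proof strategy is to construct a policy of the new form (19), for any input distribution, and show it induces the same $n$-letter objective value. To distinguish variables that are induced by the fixed policy and the newly constructed policy, we use the letters $P$ and $Q$, respectively.

The optimality of Gaussian input distributions can be shown with a standard argument on the entropy rate in (17). For any Gaussian input distribution, denoted by $P$, the objective is

$$h_P(y_i \mid y^{i-1}) = \frac{1}{2}\log\det\big(2\pi e\,\mathrm{cov}_P(y_i - \hat{\hat{y}}_i)\big), \tag{32}$$

where $\hat{\hat{y}}_i = E_P[y_i \mid y^{i-1}]$. The covariance can be written explicitly as

$$\begin{aligned} \mathrm{cov}_P(y_i - \hat{\hat{y}}_i) &= \mathrm{cov}_P(y_i - \hat{\hat{z}}_i) \\ &= \mathrm{cov}_P(\Lambda x_i + H\hat{s}_i - H\hat{\hat{s}}_i + z_i - H\hat{s}_i) \\ &= \mathrm{cov}_P\big(\Lambda x_i + H(\hat{s}_i - \hat{\hat{s}}_i)\big) + \Psi_i \end{aligned} \tag{33}$$

where the first equality uses $\hat{\hat{z}}_i \triangleq E_P[z_i \mid y^{i-1}] = H\hat{\hat{s}}_i$ and the assumption, without loss of generality, that $E_P[x_i \mid y^{i-1}] = 0$.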

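Construct a new policy, denoted by $Q$, of the form (19) as

$$x_i = \Gamma_i\hat{\Sigma}_i^{\dagger}(\hat{s}_i - \hat{\hat{s}}_i) + m_i, \tag{34}$$

where $\Gamma_i = E_P[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]$, and $m_i$ is independent of  and is distributed according to $N(0, M_i)$ with

$$M_i = E_P[x_i x_i^T] - E_P[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]\,\hat{\Sigma}_i^{\dagger}\,E_P[(\hat{s}_i - \hat{\hat{s}}_i)x_i^T],$$

and $\hat{\Sigma}_i^{\dagger}$ is the pseudo-inverse of $\hat{\Sigma}_i$.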

We now analyze the objective induced by the new policy by computing its argument

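$$\begin{aligned} \mathrm{cov}_Q(y_i - \hat{\hat{y}}_i) &= \mathrm{cov}_Q(y_i - \hat{\hat{z}}_i) \\ &= \mathrm{cov}_Q\big(\Lambda x_i + H(\hat{s}_i - \hat{\hat{s}}_i) + H(s_i - \hat{s}_i) + v_i\big) \\ &\overset{(a)}{=} (\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)\,E_Q[(\hat{s}_i - \hat{\hat{s}}_i)(\hat{s}_i - \hat{\hat{s}}_i)^T]\,(\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)^T + \Lambda M_i\Lambda^T + \Psi_i \\ &\overset{(b)}{=} (\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)\,E_P[(\hat{s}_i - \hat{\hat{s}}_i)(\hat{s}_i - \hat{\hat{s}}_i)^T]\,(\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)^T + \Lambda M_i\Lambda^T + \Psi_i \\ &\overset{(c)}{=} \mathrm{cov}_P(y_i - \hat{\hat{y}}_i), \end{aligned} \tag{35}$$

where $(a)$ follows from the independence of $m_i$ and $\hat{s}_i - \hat{\hat{s}}_i$, and the fact that the covariance of the innovation $e_i = H(s_i - \hat{s}_i) + v_i$ is $\Psi_i$ and is independent of the policy choice, $(b)$ follows by the induction hypothesis, and $(c)$ follows from the relation $E_P[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]\hat{\Sigma}_i^{\dagger}\hat{\Sigma}_i = E_P[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]$ that is shown next. Indeed, it is a simple property of covariance matrices since $I - \hat{\Sigma}_i^{\dagger}\hat{\Sigma}_i$ is the orthogonal projection onto the kernel of $\hat{\Sigma}_i$. Nevertheless, for completeness, consider the singular value decomposition for the covariance matrix

$$\hat{\Sigma}_i = E[(\hat{s}_i - \hat{\hat{s}}_i)(\hat{s}_i - \hat{\hat{s}}_i)^T] = \begin{pmatrix} U_0 & U_1 \end{pmatrix} \begin{pmatrix} \Omega & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_0^T \\ U_1^T \end{pmatrix}, \tag{36}$$

where $\begin{pmatrix} U_0 & U_1 \end{pmatrix}$ is an orthogonal matrix and $\Omega$ is a positive definite diagonal matrix. The Moore-Penrose pseudo-inverse is

$$\hat{\Sigma}_i^{\dagger} = \begin{pmatrix} U_0 & U_1 \end{pmatrix} \begin{pmatrix} \Omega^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_0^T \\ U_1^T \end{pmatrix}. \tag{37}$$

Now, we can show that

$$\begin{aligned} E[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]\hat{\Sigma}_i^{\dagger}\hat{\Sigma}_i &= E[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T] \begin{pmatrix} U_0 & U_1 \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_0^T \\ U_1^T \end{pmatrix} \\ &= E[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T], \end{aligned} \tag{38}$$

where we used the fact that $U_1^T(\hat{s}_i - \hat{\hat{s}}_i) = 0$ almost surely, so that $E[x_i(\hat{s}_i - \hat{\hat{s}}_i)^T]U_1 = 0$.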
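The projection identity used in step $(c)$ is easy to confirm numerically. The sketch below builds a singular covariance, forms the pseudo-inverse through the eigendecomposition as in (36)-(37), and measures how far $C\hat{\Sigma}^{\dagger}\hat{\Sigma}$ is from $C$ for a cross-covariance $C$. The random construction ($d = Bu$ and $x = Nu$ plus an independent part, with $u$ standard normal) and the name `projection_identity_gap` are assumptions made for the illustration.

```python
import numpy as np

def projection_identity_gap(seed=0, tol=1e-10):
    """Max entry of |C Sigma^+ Sigma - C| for a rank-deficient covariance Sigma."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((4, 2))   # d = B u with u ~ N(0, I): rank-2 covariance
    N = rng.standard_normal((3, 2))   # x = N u + (a part independent of u)
    Sigma_hat = B @ B.T               # cov(d), singular by construction
    C = N @ B.T                       # cross-covariance E[x d^T]
    # Eigendecomposition: keep only the directions with nonzero variance.
    eigvals, U = np.linalg.eigh(Sigma_hat)
    keep = eigvals > tol * eigvals.max()
    U0, Omega = U[:, keep], eigvals[keep]
    Sig_pinv = U0 @ np.diag(1.0 / Omega) @ U0.T
    return float(np.max(np.abs(C @ Sig_pinv @ Sigma_hat - C)))
```

The returned gap should be at the level of floating-point round-off, reflecting that the cross-covariance has no component along the zero-variance directions $U_1$.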

Finally, it can be verified that the power consumed by the new policy satisfies

$$E_Q[x_i x_i^T] = \Gamma_i\hat{\Sigma}_i^{\dagger}\Gamma_i^T + M_i = E_P[x_i x_i^T].$$

∎

###### Proof of Lemma 2.

The recursion for the predicted state is given in Eq. (6), where $e_i = z_i - H\hat{s}_i$ is the innovation process. For the channel output, we use Lemma 1 to write

$$\begin{aligned} y_i &= \Lambda x_i + z_i \\ &= (\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger} + H)\hat{s}_i - \Lambda\Gamma_i\hat{\Sigma}_i^{\dagger}\hat{\hat{s}}_i + \Lambda m_i + e_i. \end{aligned} \tag{39}$$

Note that the term $\Lambda\Gamma_i\hat{\Sigma}_i^{\dagger}\hat{\hat{s}}_i$ is a function of $y^{i-1}$ and has no effect on the estimation error. To show that it is a valid state-space model, note that the measurement noise has two summands that are independent of . Thus, the measurement noise is independent of previous measurements and the hidden states of the state-space model.

To obtain the optimal estimator and the error covariance recursion in (25), we use the standard Kalman filter recursions of Section II-B with the constants , , and . The recursions also hold for the time-varying modifications. ∎

###### Proof of Lemma 3.

Our starting point is the combination of Lemma 1 and Lemma 2 to the optimization problem of

$$\max\ \frac{1}{2n}\sum_{i=1}^{n}\big[\log\det(\Psi_{Y,i}) - \log\det(\Psi_i)\big]$$