# On the Separability of Ergodic Fading MIMO Channels: A Lattice Coding Approach

This paper addresses point-to-point communication over block-fading channels with independent fading blocks. When both channel state information at the transmitter (CSIT) and receiver (CSIR) are available, most achievable schemes use separable coding, i.e., coding independently and in parallel over different fading states. Unfortunately, separable coding has drawbacks including large memory requirements at both communication ends. In this paper a lattice coding and decoding scheme is proposed that achieves the ergodic capacity without separable coding, with lattice codebooks and decoding decision regions that are universal across channel realizations. We first demonstrate this result for fading distributions with discrete, finite support whose sequences are robustly typical. Results are then extended to continuous fading distributions, as well as multiple-input multiple-output (MIMO) systems. In addition, a variant of the proposed scheme is presented for the MIMO ergodic fading channel with CSIR only, where we prove the existence of a universal codebook that achieves rates within a constant gap to capacity for finite-support fading distributions. The gap is small compared with other schemes in the literature. Extension to continuous-valued fading is also provided.


## I Introduction

For the band-limited Additive White Gaussian Noise (AWGN) channel, approaching capacity with manageable complexity has been extensively studied [1, 2, 3, 4, 5, 6, 7, 8]. McEliece and Stark [9] established the ergodic capacity of the Gaussian fading channel with CSIR only. Goldsmith and Varaiya [10] extended the result for full CSI (both CSIT and CSIR). The capacity of the ergodic fading MIMO channel with isotropic fading and CSIR was established by Telatar [11] and Foschini and Gans [12]. For a survey of related results please see Biglieri et al. [13].

Under fading and in the presence of CSIT, one straightforward capacity-approaching technique is separable coding, i.e., coding independently and in parallel over different fading states of the channel [10, 14]. Unfortunately, in practice separable coding imposes heavy costs that are further magnified in the presence of low-probability fading states. In particular, either a rate loss due to discarding low-probability fading states, or a loss of coding performance due to shorter block lengths, must be tolerated. In addition, separable coding requires operating multiple encoders and decoders with different transmission rates in parallel, which requires large memory at both communication ends. Thereby, achieving the ergodic capacity of block-fading channels without separable coding remains an important and interesting question.¹

¹ It was pointed out in [13] that under maximum likelihood decoding the ergodic capacity of point-to-point channels with CSIT can be attained using Gaussian signaling without separable coding. However, one cannot directly conclude that the same result holds for non-Gaussian (structured) codebooks.

This paper shows that non-separable lattice coding and decoding achieve the ergodic capacity of the block fading SISO channel. At the transmitter, the symbols of the codeword are permuted across time. Time-varying Minimum Mean-Square Error (MMSE) scaling is used at the receiver, followed by a decoder that is universal for all fading realizations drawn from a given fading distribution. Thus, the codebook and decision regions are fixed across transmissions; the only channel-dependent blocks are the permutation and the MMSE scaling. We first highlight the main ideas of the proposed scheme in the context of a heuristic channel model that motivates the proposed approach. We then generalize the solution to all fading distributions whose realizations are robustly typical, and to continuous distributions via a bounding argument. The results are then extended to MIMO block-fading channels.

A lattice coding and decoding scheme is also proposed for the ergodic fading MIMO channel with CSIR only, where the channel coefficients are drawn from a discrete distribution with finite support. In this setting, channel-matching decision regions are proposed, where we use a worst case error bounding technique to show the existence of a universal lattice codebook that achieves rates within a constant gap to capacity for all fading realizations. The gap is infinitesimal in some special cases. We also extend the scheme to continuous-valued fading, and show that the rates achieved are close to capacity under Rayleigh fading.

Lattice coding has an extensive literature. De Buda addressed the optimality of lattice codes for the AWGN channel [15], a result later corrected by Linder et al. [16]. Loeliger [17] proved the achievability of $\frac{1}{2}\log(\mathsf{SNR})$ with lattice coding and decoding. Urbanke and Rimoldi [18] showed the achievability of $\frac{1}{2}\log(1+\mathsf{SNR})$ with maximum likelihood decoding. Erez and Zamir [19] showed that lattice coding and decoding achieve the capacity of the AWGN channel, where the ingredients of the achievable scheme include nested lattice codes in addition to common randomness via a dither variable and MMSE scaling at the receiver. Erez et al. [20] also proved the existence of lattices with good properties that achieve the performance promised in [19]. El Gamal et al. [21] showed that nested lattice codes achieve the white-input capacity, as well as the optimal diversity-multiplexing tradeoff, of the AWGN MIMO channel with fixed channel coefficients. Recently, Zhan et al. [22] proposed a novel technique based on nested lattice codes together with integer-forcing linear receivers, where the receiver decodes integer combinations of the signals at each antenna, similar to the compute-and-forward technique [23]. Ordentlich and Erez showed that, in conjunction with a precoder that is independent of the channel, integer-forcing can operate within a constant gap to the MIMO capacity [24]. In [25, Section 4.5], Vituri analyzed the performance of lattice codes under fading channels without power constraint. Under ergodic fading and CSIR only, Luzzi and Vehkalahti [26] recently showed that a class of lattices belonging to a family of division algebra codes achieves rates within a constant gap to capacity; however, this gap can be large. In [27], a lattice coding scheme was proposed whose decoder does not depend on the fading realizations, achieving rates within a constant gap to capacity.
The results in both [26, 27] are limited to channels with isotropic fading, i.e., where the optimal input covariance matrix is a scaled identity. Lately, Liu and Ling [28] showed that polar lattices achieve the capacity of the SISO i.i.d. fading channel. Campello et al. [29] also proved that lattices constructed from algebraic codes achieve the SISO ergodic capacity. Unfortunately, neither [28] nor [29] is easily extendable to MIMO channels.

The remainder of the paper is organized as follows. Section II establishes the notation and provides an overview of lattices and typicality. Section III presents the lattice coding scheme under full CSI, and Section IV under CSIR only. Section V provides a concluding summary.

## II Preliminaries

### II-A Notation and Definitions

Throughout the paper we use the following notation. Boldface lowercase letters denote column vectors and boldface uppercase letters denote matrices. The sets of real numbers and integers are denoted by $\mathbb{R}$ and $\mathbb{Z}$, respectively. $\mathbf{A}^T$ denotes the transpose of matrix $\mathbf{A}$, and $a_{ij}$ is element $(i,j)$ of $\mathbf{A}$. $\det(\mathbf{A})$ and $\mathrm{tr}(\mathbf{A})$ denote the determinant and trace of the square matrix $\mathbf{A}$, respectively. $\mathbf{I}_n$ is the size-$n$ identity matrix. $\mathbf{0}$ and $\mathbf{1}$ denote the all-zero and all-one matrices, respectively. $P(\cdot)$ and $\mathbb{E}[\cdot]$ denote probability and expectation, respectively, and $P_e$ represents error probability. $\mathcal{B}_n(r)$ is an $n$-dimensional ball of radius $r$, and the volume of a shape $\mathcal{S}$ is $\mathrm{Vol}(\mathcal{S})$. $|\mathcal{S}|$ denotes the number of elements in a set $\mathcal{S}$. Unless otherwise specified, all logarithms are in base 2.

### II-B Lattice Codes

A lattice $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$ which is closed under reflection and real addition. The fundamental Voronoi region $\mathcal{V}$ of the lattice $\Lambda$ is defined by

$$\mathcal{V}=\big\{\mathbf{s}\in\mathbb{R}^n:\ \operatorname*{argmin}_{\lambda\in\Lambda}\|\mathbf{s}-\lambda\|=\mathbf{0}\big\}. \tag{1}$$

The second moment per dimension of $\Lambda$ is defined as

$$\sigma^2_{\Lambda}=\frac{1}{n\,\mathrm{Vol}(\mathcal{V})}\int_{\mathcal{V}}\|\mathbf{s}\|^2\,d\mathbf{s}. \tag{2}$$

Every $\mathbf{s}\in\mathbb{R}^n$ can be uniquely written as $\mathbf{s}=\lambda+\mathbf{e}$, where $\lambda\in\Lambda$ and $\mathbf{e}\in\mathcal{V}$, with ties broken in a systematic manner. The quantizer $Q_{\mathcal{V}}$ is then defined by

$$Q_{\mathcal{V}}(\mathbf{s})=\lambda,\quad\text{if }\ \mathbf{s}\in\lambda+\mathcal{V}. \tag{3}$$

Define the modulo-$\Lambda$ operation corresponding to $\mathcal{V}$ as follows:

$$[\mathbf{s}]\bmod\Lambda\triangleq\mathbf{s}-Q_{\mathcal{V}}(\mathbf{s}). \tag{4}$$

The modulo-$\Lambda$ operation also satisfies

$$[\mathbf{s}+\mathbf{t}]\bmod\Lambda=\big[\mathbf{s}+[\mathbf{t}]\bmod\Lambda\big]\bmod\Lambda\quad\forall\,\mathbf{s},\mathbf{t}\in\mathbb{R}^n. \tag{5}$$
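As a concrete illustration, the quantizer (3), the modulo operation (4), and the distributive property (5) can be sketched for the scaled integer lattice Λ = βZⁿ (a simple stand-in for the general lattices above, not the construction used later in the paper):

```python
import numpy as np

def nearest_point(s, beta):
    """Quantizer Q_V: nearest point of the lattice beta*Z^n (ties broken by rounding)."""
    return beta * np.round(s / beta)

def mod_lattice(s, beta):
    """[s] mod Lambda = s - Q_V(s); the result lies in the fundamental Voronoi region."""
    return s - nearest_point(s, beta)

beta = 2.0
s = np.array([0.3, 2.7, -1.4])
t = np.array([1.9, -0.2, 0.6])

# Distributive property (5): [s + t] mod L == [s + [t] mod L] mod L
lhs = mod_lattice(s + t, beta)
rhs = mod_lattice(s + mod_lattice(t, beta), beta)
assert np.allclose(lhs, rhs)
```

For βZⁿ the Voronoi region is the cube [−β/2, β/2)ⁿ, so the quantizer separates per coordinate; general lattices require a closest-point search instead.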

The lattice $\Lambda$ is nested in $\Lambda_1$ if $\Lambda\subseteq\Lambda_1$. We employ the class of nested lattice codes proposed in [19]. For completeness, the lattice construction is outlined as follows:

1. Draw a generator matrix $\mathbf{G}$ whose elements are i.i.d. and uniformly distributed on the set $\mathbb{Z}_p=\{0,1,\dots,p-1\}$, where $p$ is a large prime number.

2. Define the codebook $\mathcal{C}=\{\mathbf{c}:\ \mathbf{c}=[\mathbf{G}^T\mathbf{w}]\bmod p,\ \mathbf{w}\in\mathbb{Z}_p^k\}$.

3. Apply Construction A to lift $\mathcal{C}$ to $\mathbb{R}^n$ such that $\Lambda_1=p^{-1}\mathcal{C}+\mathbb{Z}^n$.

A self-similar pair of nested lattices is used, where the coarse lattice $\Lambda$ is a scaled version of the fine lattice $\Lambda_1$: the scaling factor assures that $\Lambda$ has second moment $\rho$, and it scales the fundamental volume of $\mathcal{V}$ relative to $\mathcal{V}_1$ to achieve the rate $R$ given by

$$R\triangleq\frac{1}{n}\log\frac{\mathrm{Vol}(\mathcal{V})}{\mathrm{Vol}(\mathcal{V}_1)}, \tag{6}$$

where $\mathcal{V}$ and $\mathcal{V}_1$ are the Voronoi regions of the coarse and fine lattices, respectively. The ensemble of nested lattice pairs employed above has been shown to be simultaneously good for AWGN coding, packing, covering and quantization [20]. The covering goodness of $\Lambda$ is defined by

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{\mathrm{Vol}(\mathcal{B}_n(R_c))}{\mathrm{Vol}(\mathcal{B}_n(R_f))}=0, \tag{7}$$

where the covering radius $R_c$ is the radius of the smallest sphere containing $\mathcal{V}$, and $R_f$ is the radius of the sphere whose volume is equal to $\mathrm{Vol}(\mathcal{V})$.

###### Definition 1.

[21, Theorem 1] Let $f$ be a Riemann integrable function of bounded support (i.e., $f(\mathbf{z})=0$ if $\|\mathbf{z}\|$ exceeds some bound). An ensemble of lattices $\Lambda$ with fundamental volume $\mathrm{Vol}(\mathcal{V})$ satisfies the Minkowski-Hlawka theorem if for any $\epsilon>0$ there exists a lattice of dimension $n$ such that

$$\bigg|\,\mathbb{E}_{\Lambda}\bigg[\sum_{\mathbf{z}\in\Lambda,\,\mathbf{z}\neq\mathbf{0}}f(\mathbf{z})\bigg]-\frac{1}{\mathrm{Vol}(\mathcal{V})}\int_{\mathbb{R}^n}f(\mathbf{z})\,d\mathbf{z}\,\bigg|<\epsilon. \tag{8}$$
###### Lemma 1.

[21, Theorem 2] The ensemble of nested lattice pairs of [19] satisfies the Minkowski-Hlawka theorem at large dimension $n$.

A key ingredient of the lattice coding scheme proposed in [19] is the use of common randomness (a dither) $\mathbf{d}$, drawn uniformly over $\mathcal{V}$, in conjunction with the lattice code. The following lemma from [19] is key to the development of the results in this paper.

###### Lemma 2.

[19, Lemma 1] For any point $\mathbf{t}$ that is independent of a dither $\mathbf{d}$ drawn uniformly over a lattice Voronoi region $\mathcal{V}$, the point $[\mathbf{t}-\mathbf{d}]\bmod\Lambda$ is uniformly distributed over $\mathcal{V}$ and is also independent of $\mathbf{t}$.
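Lemma 2 (often called the crypto lemma) is easy to check by simulation; below is a minimal sketch for the one-dimensional lattice Λ = Z with Voronoi region [−1/2, 1/2) (an illustrative choice, not the construction of Section II-B), verifying that the dithered point has the moments of a uniform distribution regardless of t:

```python
import numpy as np

rng = np.random.default_rng(0)

def mod_unit(s):
    """[s] mod Z, mapped to the Voronoi region [-1/2, 1/2)."""
    return s - np.round(s)

n_samples = 200_000
d = rng.uniform(-0.5, 0.5, n_samples)   # dither, uniform over the Voronoi region
for t in (0.0, 0.17, 3.9):              # arbitrary fixed points
    x = mod_unit(t - d)
    # uniform on [-1/2, 1/2): mean 0 and second moment 1/12, independent of t
    assert abs(x.mean()) < 0.01
    assert abs((x**2).mean() - 1/12) < 0.01
```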

### II-C Typicality

We briefly review robust typicality [30, Appendix] and weak typicality [31, Chapter 3.1]. Consider a probability distribution $P$ on a discrete domain $\mathcal{X}$.

###### Definition 2.

A $\delta$-robustly typical set $\mathcal{T}^{(R)}_{\delta}$ according to $P$ is the set of all sequences $\mathbf{x}$ of length $n$ that satisfy

$$|n_k-nP_k|\le\delta nP_k, \tag{9}$$

for all $k\in\mathcal{X}$, where $P_k$ stands for $P(x=k)$ and $n_k$ is the number of coordinates of $\mathbf{x}$ that are equal to $k$.

Long random sequences drawn i.i.d. are with high probability robustly typical according to the underlying distribution, as indicated by the following result.

###### Lemma 3.

[30, Lemma 17] The probability of a sequence $\mathbf{x}$ of length $n$ not being $\delta$-robustly typical is upper bounded by

$$P\big(\mathbf{x}\notin\mathcal{T}^{(R)}_{\delta}\big)\ \le\ \sum_{k=1}^{\chi}P\big(|n_k-nP_k|>\delta nP_k\big)\ \le\ 2\chi e^{-\delta^2\mu n/3}, \tag{10}$$

where $\chi=|\mathcal{X}|$ and $\mu$ is the smallest non-zero probability in $P$.
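The exponential bound of Lemma 3 can be checked by simulation; the following minimal sketch uses a binary distribution with illustrative parameters (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.7, 0.3])          # distribution on {0, 1}
delta, n, trials = 0.25, 2000, 5000
mu, chi = P.min(), len(P)

counts_1 = rng.binomial(n, P[1], trials)           # n_1 for each i.i.d. trial
counts = np.stack([n - counts_1, counts_1], axis=1)
# a trial is atypical if any symbol count deviates by more than delta*n*P_k
atypical = np.any(np.abs(counts - n * P) > delta * n * P, axis=1)
empirical = atypical.mean()
bound = 2 * chi * np.exp(-delta**2 * mu * n / 3)   # Lemma 3 upper bound
assert empirical <= bound
```

For these parameters the bound is already tiny, so essentially all of the 5000 sampled sequences are robustly typical, as the lemma predicts.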

Weak typicality [31] is defined here via entropy rates.

###### Definition 3.

An $\epsilon$-weakly typical set $\mathcal{T}^{(W)}_{\epsilon}$ with respect to a sequence of probability distributions is defined as the set of all vectors $\mathbf{x}$ of length $n$ that satisfy

$$2^{-n(\hbar(\mathbf{x})+\epsilon)}\le P(\mathbf{x})\le 2^{-n(\hbar(\mathbf{x})-\epsilon)}, \tag{11}$$

where $\hbar(\mathbf{x})$ is the entropy rate of the sequence of probability distributions, assuming it exists.² The probability of a random sequence of length $n$ being weakly typical approaches 1 as $n$ grows. The cardinality of $\mathcal{T}^{(W)}_{\epsilon}$ is bounded by

$$\big|\mathcal{T}^{(W)}_{\epsilon}\big|\le 2^{n(\hbar(\mathbf{x})+\epsilon)}. \tag{12}$$

² A prominent example is when the sequence of probability laws is stationary.

## III A Capacity-Achieving Lattice Coding Scheme

Consider a real-valued single-antenna point-to-point channel with block fading and i.i.d. Gaussian noise, where the received signal at time $i$ is $y_i=h_i x_i+w_i$. The transmission and reception of a codeword over $n$ channel uses is represented by

$$\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{w}, \tag{13}$$

where $\mathbf{H}$ is an $n\times n$ diagonal matrix whose diagonal entries $h_i$ are drawn from a discrete distribution with finite support $\mathcal{H}$. The channel coherence length is $T$ with $n=bT$, where $T$ is fixed and $b$ is proportional to $n$. Therefore each codeword experiences $b$ independent fading realizations. The covariance of the channel is

 (14)

Both the transmitter and receiver have full knowledge of the channel state. The noise $\mathbf{w}$ is zero-mean i.i.d. Gaussian with covariance $\mathbf{I}_n$ and is independent of $\mathbf{H}$. $\mathbf{x}$ is the codeword, subject to an average power constraint $\frac{1}{n}\mathbb{E}\big[\|\mathbf{x}\|^2\big]\le\rho$.

The ergodic capacity of the real-valued point-to-point channel is given by [10]

$$C=\frac{1}{2}\,\mathbb{E}_h\Big[\log\big(1+h^2\rho^*(h)\big)\Big], \tag{15}$$

where $\rho^*(h)$ denotes the channel-dependent waterfilling power allocation [10], which satisfies $\mathbb{E}_h[\rho^*(h)]\le\rho$. This capacity is achieved via separable coding [10], which is defined as follows.

###### Definition 4.

In a separable coding scheme, the ergodic fading channel over time is demultiplexed into virtual parallel channels according to fading states, over which independent codewords are transmitted. Each codeword is therefore transmitted over multiple occurrences of the same fading state.
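The waterfilling allocation $\rho^*(h)$ in (15) can be computed by a simple bisection on the water level; the sketch below uses hypothetical fading values and probabilities (not from the paper) and checks that the allocation meets the power constraint with equality:

```python
import numpy as np

def waterfill(h, p, rho):
    """Waterfilling over fading states: rho*(h) = (level - 1/h^2)^+ with E[rho*(h)] = rho."""
    g = h**2
    lo, hi = 1e-9, 1e9                     # bisection brackets for the water level
    for _ in range(200):
        level = 0.5 * (lo + hi)
        power = np.maximum(level - 1.0 / g, 0.0)
        if np.dot(p, power) > rho:          # too much average power: lower the level
            hi = level
        else:
            lo = level
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

h = np.array([0.5, 1.0, 2.0])       # fading values (illustrative)
p = np.array([0.2, 0.5, 0.3])       # their probabilities
rho = 1.0
rstar = waterfill(h, p, rho)
C = 0.5 * np.dot(p, np.log2(1.0 + h**2 * rstar))   # ergodic capacity (15), bits/channel use
assert abs(np.dot(p, rstar) - rho) < 1e-6          # average power constraint met
```

Note how the weakest state (h = 0.5) receives no power for these parameters, which is exactly the low-probability/weak-state behavior that complicates separable coding.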

To highlight the essential ideas of the proposed scheme we first address the problem in the context of a heuristic channel model.

### III-A The Random Location Channel

We define a channel model, called the random location channel, where in each block the empirical frequency of occurrence of each channel state perfectly matches the underlying probability distribution. Channel coefficients take values from the finite set $\mathcal{H}$. Consider the set of channel sequences whose empirical frequencies exactly match the fading distribution; the random location channel draws from this set of sequences with equal probability. Thus, the channel is by construction perfectly robustly typical. The transmitter knows non-causally the number of occurrences of each coefficient value in the channel sequence; however, their locations are random and only known causally at both the transmitter and receiver (full CSI). The model provides a stepping stone for the achievable scheme proposed for the ergodic channel in Section III-B, and serves to illustrate its underlying intuitions.

###### Theorem 1.

For the random location channel defined above, the rate

$$R<\frac{1}{2}\sum_{s=1}^{|\mathcal{H}|}\mu_s\log\big(1+h_s^2\rho^*(h_s)\big) \tag{16}$$

is achievable using non-separable lattice coding, where $\mu_s$ represents the frequency of occurrence of the coefficient value $h_s$ such that $\sum_s\mu_s=1$, and $\rho^*(h_s)$ is the waterfilling power allocation for the channel coefficient $h_s$ drawn from $\mathcal{H}$.

###### Proof.

Encoding: Nested lattice codes are used, where $\Lambda\subseteq\Lambda_1$. The transmitter emits a lattice point $\mathbf{t}\in\Lambda_1\cap\mathcal{V}$ that is dithered with $\mathbf{d}$, drawn uniformly over $\mathcal{V}$. The dithered codeword is as follows:

$$\mathbf{x}=[\mathbf{t}-\mathbf{d}]\bmod\Lambda=\mathbf{t}-\mathbf{d}+\lambda, \tag{17}$$

where $\lambda\triangleq-Q_{\mathcal{V}}(\mathbf{t}-\mathbf{d})$ from (4). The coarse lattice $\Lambda$ has a second moment $\rho$. The codeword is then multiplied by two cascaded matrices as follows:

$$\mathbf{x}'=\mathbf{D}\mathbf{V}\mathbf{x}, \tag{18}$$

where $\mathbf{V}$ is a permutation matrix and $\mathbf{D}$ is a diagonal matrix with $D_{ii}=\sqrt{\rho^*(h_i)/\rho}$, where $\rho^*(h_i)$ is the optimal waterfilling power allocation for the fading coefficient $h_i$, as given in [10]. Hereafter we use $\rho^*_i$ as a short-hand notation for $\rho^*(h_i)$. We show in Appendix A that the average power constraint of $\mathbf{x}'$ is approximately the same as that of $\mathbf{x}$.

Decoding: The received signal $\mathbf{y}$ is multiplied by a matrix $\mathbf{U}$ cascaded with the inverse permutation matrix $\mathbf{V}^T$, and the dither is removed as follows:

$$\begin{aligned}\mathbf{y}'&=\mathbf{V}^T\mathbf{U}\mathbf{y}+\mathbf{d}\\ &=\mathbf{x}+(\mathbf{V}^T\mathbf{U}\mathbf{H}\mathbf{D}\mathbf{V}-\mathbf{I}_n)\mathbf{x}+\mathbf{V}^T\mathbf{U}\mathbf{w}+\mathbf{d}\\ &=\mathbf{t}+\lambda+(\mathbf{V}^T\mathbf{U}\mathbf{H}\mathbf{D}\mathbf{V}-\mathbf{I}_n)\mathbf{x}+\mathbf{V}^T\mathbf{U}\mathbf{w}\\ &=\mathbf{t}+\lambda+\mathbf{z},\end{aligned} \tag{19}$$

where

$$\mathbf{z}\triangleq(\mathbf{V}^T\mathbf{U}\mathbf{H}\mathbf{D}\mathbf{V}-\mathbf{I}_n)\mathbf{x}+\mathbf{V}^T\mathbf{U}\mathbf{w}, \tag{20}$$

and $\mathbf{z}$ is independent of $\mathbf{t}$ from Lemma 2.

The receiver matrix $\mathbf{U}$ is chosen to be the MMSE matrix given by

$$\mathbf{U}=\rho\mathbf{H}\mathbf{D}\big(\mathbf{I}_n+\rho\mathbf{H}^2\mathbf{D}^2\big)^{-1}. \tag{21}$$

Since $\mathbf{H}$ and $\mathbf{D}$ are diagonal, $\mathbf{U}$ is diagonal as well. Now, the diagonal elements of $\mathbf{U}$ are

$$U_{ii}=\frac{\sqrt{\rho\rho^*_i}\,h_i}{1+\rho^*_i h_i^2}. \tag{22}$$

With a slight abuse of notation, define the permutation function $\pi(\cdot)$ such that $h_{\pi(1)},\dots,h_{\pi(n)}$ represent the channel coefficients arranged in ascending order of magnitude, and consider the permutation matrix $\mathbf{V}$ that reorders the symbols accordingly. See Appendix B for further details on $\mathbf{V}$. From (20) and (21), the elements of $\mathbf{z}$ are given by³

$$z_i=\frac{-1}{\rho^*_{\pi(i)}h^2_{\pi(i)}+1}\,x_i+\frac{\sqrt{\rho\rho^*_{\pi(i)}}\,h_{\pi(i)}}{\rho^*_{\pi(i)}h^2_{\pi(i)}+1}\,w_{\pi(i)}. \tag{23}$$

³ Since waterfilling dedicates more power to channels with larger magnitude, $|h_i|\le|h_j|$ implies $\rho^*(h_i)\le\rho^*(h_j)$ [10].
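The MMSE coefficient (22) and the effective noise variance (26) below can be verified numerically. The sketch assumes the per-symbol scalar model z = (U·h_eff − 1)x + U·w with h_eff = h·√(ρ*/ρ), our reading of the cascaded power scaling, and uses illustrative values:

```python
import numpy as np

rho, rho_star, h = 1.5, 0.8, 1.2          # SNR, waterfilling power, fading (illustrative)
h_eff = h * np.sqrt(rho_star / rho)       # effective channel after the power scaling

def z_var(U):
    """Variance of z = (U*h_eff - 1)*x + U*w with Var[x] = rho, Var[w] = 1."""
    return (U * h_eff - 1.0)**2 * rho + U**2

U_mmse = np.sqrt(rho * rho_star) * h / (1.0 + rho_star * h**2)   # eq. (22)
Sigma = rho / (rho_star * h**2 + 1.0)                            # eq. (26)
assert abs(z_var(U_mmse) - Sigma) < 1e-12
# U_mmse is the minimizer: small perturbations only increase the variance
for eps in (-1e-3, 1e-3):
    assert z_var(U_mmse + eps) > z_var(U_mmse)
```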

The following lemma, whose proof can be found in Appendix C, establishes some geometric properties of $\mathbf{z}$.

###### Lemma 4.

For any $\gamma>0$ and $\epsilon>0$, there exists $n_0$ such that for all $n>n_0$,

$$P(\mathbf{z}\notin\Omega)<\gamma, \tag{24}$$

where $\Omega$ is an $n$-dimensional ellipsoid, given by

$$\Omega\triangleq\big\{\mathbf{s}\in\mathbb{R}^n\ :\ \mathbf{s}^T\Sigma^{-1}\mathbf{s}\le(1+\epsilon)n\big\}, \tag{25}$$

and $\Sigma$ is a diagonal matrix whose diagonal elements are given by

$$\Sigma_{ii}=\frac{\rho}{\rho^*_{\pi(i)}h^2_{\pi(i)}+1}. \tag{26}$$

Now, we apply a version of the ambiguity decoder proposed in [17], defined by the ellipsoidal decision region $\Omega$ in (25).⁴ The decoder chooses $\hat{\mathbf{t}}\in\Lambda_1$ if and only if the received point falls exclusively within the decision region of the lattice point $\hat{\mathbf{t}}$, i.e., $\mathbf{y}'\in\hat{\mathbf{t}}+\Omega$ and $\mathbf{y}'\notin\mathbf{t}'+\Omega$ for all $\mathbf{t}'\in\Lambda_1$, $\mathbf{t}'\neq\hat{\mathbf{t}}$.

⁴ $\Omega$ is a bounded measurable region of $\mathbb{R}^n$ [17].

Probability of error: As shown in [17, Theorem 4], on averaging over the ensemble of fine lattices $\mathcal{L}$ of rate $R$ whose construction follows Section II-B, the probability of error can be bounded by

$$\frac{1}{|\mathcal{L}|}\sum_{\mathcal{L}}P_e<P(\mathbf{z}\notin\Omega)+(1+\delta)\frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathcal{V}_1)}=P(\mathbf{z}\notin\Omega)+(1+\delta)2^{nR}\frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathcal{V})}, \tag{27}$$

for any $\delta>0$, where the equality follows from (6). This is a union bound involving two events: the event that $\mathbf{z}$ is outside the decision region, i.e., $\mathbf{z}\notin\Omega$, and the event that the received point falls in the intersection of the decision regions of two distinct lattice points. From Lemma 4, the first term in (27) is bounded by $\gamma$. Consequently, the error probability can be bounded by

$$\frac{1}{|\mathcal{L}|}\sum_{\mathcal{L}}P_e<\gamma+(1+\delta)2^{nR}\frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathcal{V})}, \tag{28}$$

for any $\delta>0$. The volume of $\Omega$ is given by

$$\mathrm{Vol}(\Omega)=(1+\epsilon)^{\frac{n}{2}}\,\mathrm{Vol}\big(\mathcal{B}(\sqrt{n\rho})\big)\bigg(\prod_{i=1}^{n}\frac{1}{\rho^*_i h_i^2+1}\bigg)^{\frac{1}{2}}. \tag{29}$$

The second term in (28) is then bounded by

$$(1+\delta)2^{nR}(1+\epsilon)^{n/2}\bigg(\prod_{i=1}^{n}\frac{1}{\rho^*_i h_i^2+1}\bigg)^{\frac{1}{2}}\frac{\mathrm{Vol}(\mathcal{B}(\sqrt{n\rho}))}{\mathrm{Vol}(\mathcal{V})}=(1+\delta)2^{-n\Big(-\frac{1}{n}\log\big(\frac{\mathrm{Vol}(\mathcal{B}(\sqrt{n\rho}))}{\mathrm{Vol}(\mathcal{V})}\big)+\xi\Big)}, \tag{30}$$

where

$$\begin{aligned}\xi\triangleq\ &-\frac{1}{2}\log(1+\epsilon)-\frac{1}{2n}\log\bigg(\prod_{i=1}^{n}\frac{1}{\rho^*_i h_i^2+1}\bigg)-R\\ =\ &-\frac{1}{2}\log(1+\epsilon)+\frac{1}{2n}\sum_{i=1}^{n}\log\big(1+\rho^*_i h_i^2\big)-R\\ =\ &-\frac{1}{2}\log(1+\epsilon)+\frac{1}{2}\sum_{s=1}^{|\mathcal{H}|}\mu_s\log\big(1+\rho^*_s h_s^2\big)-R,\end{aligned} \tag{31}$$

and the last equality in (31) follows from the structure of the random location channel. From (7), since the lattice is good for covering, the first term of the exponent in (30) vanishes. From (30), whenever $\xi$ is a positive constant the second term in (28) vanishes as $n\to\infty$. Hence, a positive $\xi$ can be achieved as long as

$$R<\frac{1}{2}\sum_{s=1}^{|\mathcal{H}|}\mu_s\log\big(1+\rho^*_s h_s^2\big)-\frac{1}{2}\log(1+\epsilon)-\epsilon', \tag{32}$$

where $\epsilon$ and $\epsilon'$ diminish with $n$. The existence of a fine lattice that achieves the error probability averaged over the ensemble of lattices $\mathcal{L}$ is straightforward. The outcome of the decoding process is the lattice point $\hat{\mathbf{t}}$, where in the event of successful decoding the noise is eliminated and, from (19), $\hat{\mathbf{t}}=\mathbf{t}+\lambda$. On applying the modulo-$\Lambda$ operation on $\hat{\mathbf{t}}$,

$$[\hat{\mathbf{t}}]\bmod\Lambda=[\mathbf{t}+\lambda]\bmod\Lambda=\mathbf{t}, \tag{33}$$

where the second equality follows from (5) since $\lambda\in\Lambda$. Following in the footsteps of [21], it can be shown that the error probability of the Euclidean lattice decoder is upper bounded by the error probability of the ellipsoidal decision region in (25). The Euclidean lattice decoder is given by

$$\hat{\mathbf{t}}=\operatorname*{argmin}_{\mathbf{t}'\in\Lambda_1}\big\|\Sigma^{-\frac{1}{2}}(\mathbf{y}'-\mathbf{t}')\big\|^2, \tag{34}$$

followed by the modulo-$\Lambda$ operation in (33). This concludes the proof of Theorem 1. ∎
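The decoder (34) followed by the modulo step (33) can be sketched for the toy nested pair Λ = pβZⁿ ⊂ Λ₁ = βZⁿ (a stand-in for the construction of Section II-B, not the paper's actual ensemble). With a diagonal Σ and a Zⁿ-type fine lattice, the whitened argmin separates per coordinate and reduces to rounding:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, p, n = 1.0, 5, 8                    # fine lattice beta*Z^n, coarse lattice p*beta*Z^n

def mod_coarse(s):
    """[s] mod Lambda for the coarse lattice p*beta*Z^n."""
    return s - p * beta * np.round(s / (p * beta))

# message point t: a fine-lattice point inside the coarse Voronoi region
t = mod_coarse(beta * rng.integers(-20, 20, n).astype(float))
Sigma = np.diag(rng.uniform(0.0005, 0.0025, n))      # effective noise covariance (illustrative)
z = rng.multivariate_normal(np.zeros(n), Sigma)
y_prime = t + z                                      # post-equalization signal (19), with lambda = 0 here

# Euclidean (whitened) decoder (34): for a diagonal Sigma and beta*Z^n, this is rounding
t_hat = beta * np.round(y_prime / beta)
assert np.allclose(mod_coarse(t_hat), t)             # modulo step (33) recovers t
```

For a general fine lattice, the rounding step would be replaced by a closest-point search, but the whiten-then-minimize structure of (34) is unchanged.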

### III-B The Ergodic Fading Channel

Now, we are ready to address the ergodic fading channel whose channel coefficients are drawn from a discrete distribution with finite support. Unlike the random location channel discussed earlier, in the following the number of occurrences of each coefficient value within a block is no longer fixed.

###### Theorem 2.

Non-separable lattice coding achieves the ergodic capacity of block-fading channels whose channel coefficients are drawn from an arbitrary discrete distribution with finite support, when channel state information is available at all nodes.

###### Proof.

The proof appears in Appendix D; here we provide a sketch. We follow a best-effort approach in designing the permutation matrix $\mathbf{V}$. In order to account for ordering errors, we use a fixed decision region that is slightly larger than the decision region resulting from perfect channel ordering (which is non-realizable due to the causality of the channel knowledge). However, when the channel is robustly typical, the total number of ordering errors is negligible at large $n$, and hence the rate loss incurred by using larger decision regions vanishes. ∎

The extension of Theorem 2 to complex-valued channels is straightforward, using techniques similar to [32, Theorem 2]. The channel would then be ordered with respect to the magnitude of channel coefficients.

### III-C Extension to Continuous-Valued Fading

In order to extend the arguments to continuous-valued fading channels, we assume the fading distribution possesses a finite second moment. We note that with full CSI, the information density contributed by each transmission is a strictly increasing function of the absolute value of the fading coefficient. First, let $\tilde{g}$ denote the squared channel gain times the normalized waterfilling power allocation for that channel gain. We partition the continuous range of $\tilde{g}$ into $L$ brackets $[g_1,g_2),\dots,[g_{L-1},g_L),[g_L,\infty)$, where $g_1<g_2<\cdots<g_L$. For any sequence of channel gains drawn from a continuous distribution, we quantize $\tilde{g}$ to the lower limit of the bracket to which it belongs, producing a discrete random variable $g$ taking values over the set $\{g_1,\dots,g_L\}$. Note that the independence of the continuous-valued fading realizations guarantees the independence of their discrete-valued counterparts, and hence robust typicality still applies. We show that the rate supported by the discrete-valued channel is within a gap to capacity that can be bounded as follows:

$$\begin{aligned}C-R&=\mathbb{E}\big[\log(1+\rho\tilde{g})\big]-\mathbb{E}\big[\log(1+\rho g)\big]\\ &=\mathbb{E}\Big[\log\Big(\tfrac{1+\rho\tilde{g}}{1+\rho g}\Big)\,\Big|\,\tilde{g}\le g_L\Big]P(\tilde{g}\le g_L)+\mathbb{E}\Big[\log\Big(\tfrac{1+\rho\tilde{g}}{1+\rho g_L}\Big)\,\Big|\,\tilde{g}>g_L\Big]P(\tilde{g}>g_L)\quad(35)\\ &<\gamma_1+\mathbb{E}\Big[\log\Big(1+\tfrac{\tilde{g}-g_L}{g_L}\Big)\,\Big|\,\tilde{g}>g_L\Big]P(\tilde{g}>g_L)\\ &=\gamma_1+\mathbb{E}\Big[\log\Big(\tfrac{\tilde{g}}{g_L}\Big)\,\Big|\,\tilde{g}>g_L\Big]P(\tilde{g}>g_L)\\ &<\gamma_1+c\Big(\tfrac{\mathbb{E}[\tilde{g}\,|\,\tilde{g}>g_L]}{g_L}-1\Big)P(\tilde{g}>g_L)\quad(36)\\ &<\gamma_1+c\Big(\tfrac{\mathbb{E}[\tilde{g}]}{g_L P(\tilde{g}>g_L)}-1\Big)P(\tilde{g}>g_L)\quad(37)\\ &<\gamma_1+c\,\tfrac{\mathbb{E}[\tilde{g}]}{g_L}\triangleq\gamma_1+\gamma_2,\quad(38)\end{aligned}$$

where $c\triangleq\log_2 e$, $\gamma_1$ bounds the rate loss over the bounded brackets, and $\gamma_2\triangleq c\,\mathbb{E}[\tilde{g}]/g_L$. (36) follows since $\ln x\le x-1$ for all $x>0$, and (37) follows from the law of total expectation. $\gamma_1$ vanishes as the brackets become finer, while $\gamma_2$ vanishes when $g_L\to\infty$. Note that a necessary condition for $\gamma_2$ to vanish is that $\mathbb{E}[\tilde{g}]$ is finite.

The gap is bounded more tightly when the distribution of $\tilde{g}$ has a vanishing tail. For instance, when $\tilde{g}$ is exponential,

$$\begin{aligned}C-R&<\gamma_1+c\Big(\tfrac{\mathbb{E}[\tilde{g}\,|\,\tilde{g}>g_L]}{g_L}-1\Big)P(\tilde{g}>g_L)\\ &\le\gamma_1+c\Big(\tfrac{\mathbb{E}[\tilde{g}]+g_L}{g_L}-1\Big)P(\tilde{g}>g_L)\quad(39)\\ &\le\gamma_1+c\,\tfrac{\mathbb{E}[\tilde{g}]}{g_L}\,e^{-\frac{g_L}{\mathbb{E}[\tilde{g}]}},\quad(40)\end{aligned}$$

which vanishes exponentially with $g_L$. (39) follows since $\tilde{g}$ is exponentially distributed and hence memoryless, so that $\mathbb{E}[\tilde{g}\,|\,\tilde{g}>g_L]=g_L+\mathbb{E}[\tilde{g}]$.

To summarize, the gap bounding argument can be described as follows. Given $L$ channel quantization bins, we bound the total rate loss due to quantization by the sum of the rate losses in the individual bins. For the bounded bins, we bound the loss by the difference between the input-output information densities at the highest and the lowest channel gain in each bracket. This strategy will not work for the final bin, because the channel gain in $[g_L,\infty)$ is unbounded; instead, we use the total rate contributed by this bin as a bound. Fortunately, this term also vanishes at large $g_L$, since the probability of occurrence of such fading values is small enough.
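The quantization argument can be illustrated numerically for an exponentially distributed gain (the case treated above); the bin edges below are illustrative choices, and the gap is estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n_samples = 1.0, 1_000_000
g_tilde = rng.exponential(1.0, n_samples)        # continuous gain, E[g~] = 1

def gap(edges):
    """Monte Carlo estimate of C - R when g~ is floored to its bracket's lower edge."""
    g_q = edges[np.searchsorted(edges, g_tilde, side='right') - 1]
    return np.mean(np.log2(1 + rho * g_tilde)) - np.mean(np.log2(1 + rho * g_q))

coarse = np.linspace(0.0, 4.0, 5)     # 4 bins up to g_L = 4
fine = np.linspace(0.0, 8.0, 65)      # finer bins and a larger g_L = 8
assert gap(fine) < gap(coarse)        # refining the bins and raising g_L shrinks the gap
```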

### III-D Extension to MIMO

The result in Theorem 2 can be extended to an $N\times M$ MIMO channel with full CSI. The received signal at time $i$ is given by

$$\mathbf{y}_i=\mathbf{H}_i\mathbf{x}_i+\mathbf{w}_i, \tag{41}$$

where $\mathbf{H}_i\in\mathbb{R}^{N\times M}$ is the channel-coefficient matrix.

###### Theorem 3.

Lattice codes achieve the ergodic capacity of the MIMO block fading channel with channel state information available at both transmitter and receiver. This result holds for both discrete-valued and continuous-valued channels.

###### Proof.

Since the matrices $\mathbf{H}_i$ are known perfectly, the transmitter and receiver can transform the MIMO channel into $S$ parallel SISO channels via the singular-value decomposition, and the individual SISO capacities can be achieved as shown in Section III-B. Let the singular-value decomposition of $\mathbf{H}_i$ be $\mathbf{H}_i=\mathbf{B}_i\mathbf{L}_i\mathbf{F}_i^T$, where $\mathbf{B}_i$ and $\mathbf{F}_i$ are orthonormal matrices representing the left and right singular-vector matrices of $\mathbf{H}_i$, respectively, and $\mathbf{L}_i$ is an $N\times M$ rectangular diagonal matrix with the non-zero singular values on the main diagonal. Hence, at the receiver, the received signal is spatially equalized as follows:

$$\tilde{\mathbf{y}}_i=\mathbf{B}_i^T\mathbf{y}_i, \tag{42}$$

and at the transmitter, the signal is spatially precoded such that

$$\mathbf{x}_i=\mathbf{F}_i\tilde{\mathbf{x}}_i. \tag{43}$$

From (41)-(43), the equivalent channel can be represented by

$$\tilde{\mathbf{y}}_i=\mathbf{L}_i\tilde{\mathbf{x}}_i+\tilde{\mathbf{w}}_i, \tag{44}$$

where $\tilde{\mathbf{w}}_i\triangleq\mathbf{B}_i^T\mathbf{w}_i$ is i.i.d. Gaussian, since $\mathbf{B}_i$ is orthonormal. Each element in $\tilde{\mathbf{y}}_i$ is then

$$\tilde{y}^{(\iota)}_i=\ell^{(\iota)}_i\tilde{x}^{(\iota)}_i+\tilde{w}^{(\iota)}_i,\qquad\iota=1,\dots,S, \tag{45}$$

where $\ell^{(1)}_i,\dots,\ell^{(S)}_i$ represent the singular values of $\mathbf{H}_i$ in descending order. The received signal in (45) is equivalent to a set of $S$ parallel channels, whose individual capacities can be achieved similarly to Section III-B via transmitting $S$ simultaneous lattice codebooks across antennas. The final step is allocating the optimal power policy, which is waterfilling over time and space, as follows [33, Section 8.2.3]. Assuming that the joint probability distribution of the singular values is known, the power of stream $\iota$ at time $i$ is given by

$$P^{(\iota)}_i=\Big\{c-\big(\ell^{(\iota)}_i\big)^{-2}\Big\}^{+}, \tag{46}$$

where the water level $c$ is chosen such that

$$\sum_{\iota=1}^{S}\mathbb{E}\Big[\Big\{c-\big(\ell^{(\iota)}_i\big)^{-2}\Big\}^{+}\Big]=P, \tag{47}$$

and $P$ is the average power constraint. The extension to continuous-valued channels is similar to the SISO case and is omitted. This concludes the proof of Theorem 3. ∎
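The SVD transformation (41)-(45) is easy to verify with NumPy; the sketch below (random illustrative channel, noiseless for clarity) checks that after precoding and equalization each stream sees its own scalar channel:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 3, 2                          # receive and transmit antennas (illustrative)
H = rng.standard_normal((N, M))

B, ell, F_T = np.linalg.svd(H)       # H = B @ L @ F^T, singular values in descending order
S = min(N, M)

x_tilde = rng.standard_normal(M)     # stream symbols
x = F_T.T @ x_tilde                  # transmit precoding (43): x = F x~
y = H @ x                            # noiseless channel (41)
y_tilde = B.T @ y                    # receive equalization (42)

# Each stream sees its own scalar channel (45): y~^(i) = ell^(i) * x~^(i)
assert np.allclose(y_tilde[:S], ell * x_tilde[:S])
```

With noise added, the remaining coordinates of the equalized signal carry only noise, which is why only the S singular-value streams are used.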

## IV The MIMO Channel Without CSIT

In this section we consider the MIMO point-to-point channel with CSIR only. The received signal at time $i$ is given by $\mathbf{y}_i=\mathbf{H}_i\mathbf{x}_i+\mathbf{w}_i$, where $\mathbf{H}_i\in\mathbb{R}^{N\times M}$ is the channel-coefficient matrix at time $i$. For convenience, channel gains are taken to be real-valued; the extension to complex-valued channels is straightforward and similar to [32]. The channel experiences block fading with coherence length $T$; thus the matrices $\mathbf{H}_i$ are identically distributed, and any two of them are independent if and only if they are taken from different fading blocks. For convenience, we also define $\mathbf{H}$ to obey the same distribution, standing in for the prototypical MIMO channel gain matrix without reference to a specific time. Each codeword consists of $n$ channel uses, where $n$ is an integer multiple of the fading block length, i.e., $n=bT$. The transmitter knows the channel distribution, including the coherence length, but not the channel realizations. $\mathbf{x}_i\in\mathbb{R}^M$ is the transmitted vector at time $i$, where the codeword

$$\mathbf{x}\triangleq[\mathbf{x}_1^T\,\mathbf{x}_2^T\cdots\mathbf{x}_n^T]^T \tag{48}$$

is transmitted throughout the $n$ channel uses and satisfies the average power constraint. Unlike the achievable scheme in Section III-B, each codeword is transmitted across both space and time. The noise $\mathbf{w}\triangleq[\mathbf{w}_1^T\cdots\mathbf{w}_n^T]^T$ is zero-mean i.i.d. Gaussian with covariance $\mathbf{I}_{Nn}$. For convenience, we define the SNR per transmit antenna to be $\rho'$. The ergodic capacity of the MIMO channel is given by [33]

$$C=\max_{\mathbf{Q}\succeq 0,\ \mathrm{tr}(\mathbf{Q})\le M\rho'}\ \mathbb{E}\Big[\frac{1}{2}\log\det\big(\mathbf{I}_N+\mathbf{H}\mathbf{Q}\mathbf{H}^T\big)\Big], \tag{49}$$

where $\mathbf{Q}$ is the covariance matrix of each super-symbol $\mathbf{x}_i$. For a sequence of channel coefficients $\mathbf{H}_1,\dots,\mathbf{H}_n$ drawn from an underlying distribution, the weak law of large numbers implies that for each positive $\epsilon$ and $\delta$, a finite $n_0$ exists such that for all $n>n_0$,

$$P\bigg(\Big|\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\log\det\big(\mathbf{I}_M+\rho'\mathbf{H}_i^T\mathbf{H}_i\big)-\frac{1}{2}\mathbb{E}\big[\log\det\big(\mathbf{I}_M+\rho'\mathbf{H}^T\mathbf{H}\big)\big]\Big|>\epsilon\bigg)<\delta. \tag{50}$$

Hence, the expression approaches its statistical mean with high probability as $n$ grows. Hereafter we denote the left-hand side probability in (50) by $P(\mathcal{A})$.

###### Lemma 5.

Consider a MIMO channel $\mathbf{y}=\mathbf{H}_s\mathbf{x}+\mathbf{w}$, where the matrices $\mathbf{H}_i$ are realizations of a stationary and ergodic process and are only known at the receiver. Then there exists at least one lattice codebook that achieves rates satisfying

$$R<\frac{1}{2}\mathbb{E}\big[\log\det\big(\mathbf{I}_M+\rho'\mathbf{H}^T\mathbf{H}\big)\big]-\eta, \tag{51}$$

with an arbitrary error probability $\epsilon''>0$, such that both $\eta$ and $\epsilon''$ diminish at large $n$.

###### Proof.

Encoding: Nested lattice codes are used, where the coarse lattice has second moment $\rho'$. The codeword is composed of $n$ super-symbols, each of length $M$, as shown in (48), which are transmitted throughout the $n$ channel uses.

Decoding: The received signal can be expressed in the form $\mathbf{y}=\mathbf{H}_s\mathbf{x}+\mathbf{w}$, where $\mathbf{H}_s$ is a block-diagonal matrix whose diagonal block $i$ is $\mathbf{H}_i$. The received signal is multiplied by the equalization matrix $\mathbf{U}_s^T$ and the dither is removed as follows:

$$\mathbf{y}'\triangleq\mathbf{U}_s^T\mathbf{y}+\mathbf{d}=\mathbf{t}+\lambda+\mathbf{z}, \tag{52}$$

where

$$\mathbf{z}\triangleq\big(\mathbf{U}_s^T\mathbf{H}_s-\mathbf{I}_{Mn}\big)\mathbf{x}+\mathbf{U}_s^T\mathbf{w}, \tag{53}$$

and $\mathbf{z}$ is independent of $\mathbf{t}$, according to Lemma 2. $\mathbf{U}_s$ is a block-diagonal matrix, where the equalization matrix at time $i$ is the MMSE matrix given by

$$\mathbf{U}_i=\rho'\big(\mathbf{I}_N+\rho'\mathbf{H}_i\mathbf{H}_i^T\big)^{-1}\mathbf{H}_i. \tag{54}$$

From (53) and (54), $\mathbf{z}_i$ is expressed as

$$\mathbf{z}_i=-\big(\mathbf{I}_M+\rho'\mathbf{H}_i^T\mathbf{H}_i\big)^{-1}\mathbf{x}_i+\rho'\mathbf{H}_i^T\big(\mathbf{I}_N+\rho'\mathbf{H}_i\mathbf{H}_i^T\big)^{-1}\mathbf{w}_i, \tag{55}$$

where $\mathbf{x}_i$ and $\mathbf{w}_i$ are the $i$-th super-symbols of $\mathbf{x}$ and $\mathbf{w}$, respectively.

where . We apply a version of the ambiguity decoder proposed in [17], defined by an ellipsoidal decision region , as follows

 Ω≜{v∈RMn : vTΣ−1sv≤(1+γ)Mn}, (56)

where is a block-diagonal matrix, whose diagonal block , , is given by

 Σi≜ρ′(IM+ρ′HTiHi)−1. (57)

Let . The volume of  is then

 Vol(Ω)=(1+γ)Mn2Vol(BMn(√Mnρ′))n∏i=1det(Ψi)−12. (58)

Error Probability: As shown in [17, Theorem 4], on averaging over the ensemble of fine lattices $\mathcal{L}$ of rate $R$ that belong to the class proposed in Section II-B,

$$\frac{1}{|\mathcal{L}|}\sum_{\mathcal{L}}P_e<P(\mathcal{A})+P(\mathbf{z}\notin\Omega)+(1+\delta)2^{nR}\frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathcal{V})}, \tag{59}$$

for any $\delta>0$, where (59) follows from (6). This is a union bound involving three events: the event $\mathcal{A}$ that the average throughput achieved by the channel sequence is bounded away from its statistical mean by more than $\epsilon$, the event that the noise vector is outside the decision region, i.e., $\mathbf{z}\notin\Omega$, and the event that the post-equalized point is in the intersection of the decision regions of two distinct lattice points. From (50), $P(\mathcal{A})<\delta_1$ for any $\delta_1>0$ at large $n$. Following in the footsteps of Appendix C, $P(\mathbf{z}\notin\Omega)<\gamma_1$ for any $\gamma_1>0$ at large $n$. Let $\epsilon'\triangleq\delta_1+\gamma_1$. The error probability can then be bounded by

$$\frac{1}{|\mathcal{L}|}\sum_{\mathcal{L}}P_e<\epsilon'+(1+\delta)2^{nR}\frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathcal{V})}, \tag{60}$$

for any $\delta>0$. The second term in (60) can be written as

$$\epsilon_{avg}\triangleq 2^{-n\big(-R+\frac{1}{2n}\sum_{i=1}^{n}\log\det(\Psi_i)-\epsilon'''\big)}, \tag{61}$$

where

$$\epsilon'''\triangleq\frac{1}{n}\log\bigg(\frac{\mathrm{Vol}\big(\mathcal{B}_{Mn}(\sqrt{Mn\rho'})\big)}{\mathrm{Vol}(\mathcal{V})}\bigg)+\frac{M}{2}\log(1+\gamma)+\frac{1}{n}\log(1+\delta). \tag{62}$$

From (7), the first term in (62) vanishes; the third term vanishes as $n$ increases, and the second can be made arbitrarily small by choosing $\gamma$ small. The probability of error averaged over the codebooks in $\mathcal{L}$ is bounded by

$$\epsilon''\triangleq\epsilon'+\epsilon_{avg}. \tag{63}$$

Then there exists at least one codebook whose error probability does not exceed $\epsilon''$, at rates that converge to (51). The remainder of the proof follows Section III-A. ∎

Note that Lemma 5 does not imply that the rate in (51) is universally achievable, since it does not guarantee the existence of a single codebook that achieves this rate for all fading sequences drawn from an underlying distribution. A similar approach was adopted in [34], where universality is likewise not conclusive. In the sequel we discuss the rates achievable using universal codebooks. Similar to Section III, we first address channels with finite-support fading distributions and then extend the result to continuous-valued, unbounded fading.

We address point-to-point block-fading channels with coherence length $T$, whose channel coefficients are drawn from a discrete distribution with finite support. The following result utilizes the weak typicality arguments provided in Section II-C to show the existence of a nested pair of lattice codes that achieves rates within a constant gap to the ergodic capacity.

###### Theorem 4.

For a stationary and ergodic block-fading MIMO channel with coherence interval $T$ and fading coefficients drawn from a finite-support distribution, there exists a universal nested lattice code that achieves rates within a constant gap (in bits per channel use) to the ergodic capacity, where the gap is determined by the entropy rate of the fading process.

###### Proof.

Lemma 5 ensures the existence of one codebook in $\mathcal{L}$ that achieves the rate in (51) with error probability less than $\epsilon''$. We now show that, if we allow a multiplicative increase in the error probability, numerous codebooks in $\mathcal{L}$ can support the rate $R$ in (51).

###### Lemma 6.

For the channel under study in Lemma 5, at least $(1-\frac{1}{\kappa})|\mathcal{L}|$ codebooks in $\mathcal{L}$ achieve the rate $R$ in (51) with at most $\kappa\epsilon''$ error probability, for any $\kappa>1$, where $\epsilon''$ is given in (63).

###### Proof.

We expurgate codebooks from $\mathcal{L}$ as follows. First, arrange the codebooks in descending order of their error probability on the MIMO channel defined in Lemma 5. Then, discard the first $\frac{1}{\kappa}|\mathcal{L}|$ codebooks. From (63), the error probability of each of the remaining codebooks is then bounded by $\kappa\epsilon''$, as follows:⁵

$$|\mathcal{L}|\,\epsilon_{avg}=\sum_{\ell=1}^{|\mathcal{L}|/\kappa}\epsilon_{\ell}+\epsilon_{1+|\mathcal{L}|/\kappa}+\sum_{\ell=2+|\mathcal{L}|/\kappa}^{|\mathcal{L}|}\epsilon_{\ell}\ \ge\ \sum_{\ell=1}^{|\mathcal{L}|/\kappa}\epsilon_{\ell}+\epsilon_{1+|\mathcal{L}|/\kappa}\ \ge\ \Big(1+\frac{|\mathcal{L}|}{\kappa}\Big)\epsilon_{1+|\mathcal{L}|/\kappa}. \tag{64}$$

⁵ $\epsilon'$ in (60) is independent of the codebook, so its average over the codebooks is also $\epsilon'$.

Hence,

$$\epsilon_{1+|\mathcal{L}|/\kappa}\ \le\ \frac{|\mathcal{L}|}{1+|\mathcal{L}|/\kappa}\,\epsilon_{avg}\ <\ \kappa\,\epsilon_{avg}. \tag{65}$$

Since $\epsilon_{\ell}\le\epsilon_{1+|\mathcal{L}|/\kappa}$ for any $\ell>1+|\mathcal{L}|/\kappa$, each of the last $(1-\frac{1}{\kappa})|\mathcal{L}|$ codebooks in $\mathcal{L}$ has error probability that does not exceed