Massive MIMO BSs, which are equipped with hundreds of antennas, exploit channel reciprocity to estimate the channel based on uplink pilots and spatially multiplex a large number of users on the same time–frequency resource [2, 3]. Massive MIMO is a promising technique to meet the growing demand for wireless data traffic of tomorrow [4, 5]. In a single-cell scenario, there is no need for computationally heavy decoding or precoding methods in Massive MIMO, such as successive interference cancellation or dirty paper coding. Linear processing schemes (e.g., zero-forcing combining) can effectively suppress interference and noise if the BS is equipped with a large number of antennas. In a multi-cell scenario, however, pilot-based channel estimation is contaminated by the non-orthogonal transmission in other cells. This results in coherent intercell interference in the data transmission, so-called pilot contamination, unless some advanced processing schemes are used to suppress it. Pilot contamination causes the gain of having more antennas to decrease and the SE of linear decoding methods, such as maximum-ratio combining (MRC) or zero-forcing, to saturate as the number of antennas grows.
Much work has been done to mitigate the effects of pilot contamination. The first and most intuitive approach is to increase the length of the pilots. In practical networks, however, it is not possible to make all pilots orthogonal due to the limited channel coherence block. Hence, there is a trade-off between having longer pilots and low pilot overhead. Another method is to assign the pilots in a way that reduces the contamination, since only a few users from other cells cause substantial contamination. Pilot assignment is a combinatorial problem, and heuristic algorithms with low computational complexity can be developed to mitigate the pilot contamination. For example, a greedy pilot assignment method has been developed that exploits the statistical channel information and mutual interference between users. Pilot assignment approaches still suffer from asymptotic SE saturation, since they only replace one contaminating user with a less contaminating one. A third method is to utilize the spatial correlation to mitigate the coherent interference using multi-cell minimum-mean-square error (M-MMSE) decoding, but this method has high computational complexity.
Instead of combating pilot contamination, one can utilize it using more advanced decoding schemes [12, 13, 14]. This approach was initially called pilot contamination decoding since the BSs cooperate in suppressing the pilot contamination. The original form of this technique used simplistic MRC, which does not suppress interference very well, so it required a huge number of antennas to be effective. The latest version of this decoding design, called large-scale fading decoding (LSFD), was designed to be useful also with a practical number of antennas. In the two-layer LSFD framework, each BS applies an arbitrary local linear decoding method in the first layer, preferably one that suppresses intra-cell interference. The result is then gathered at a common central station that applies so-called LSFD vectors in a second layer to combine the signals from multiple BSs and thereby suppress pilot contamination and inter-cell interference. This new decoding design overcomes the aforementioned limitations and attains high SE even with a limited number of BS antennas.
To explain why LSFD vectors are necessary to mitigate pilot contamination, we consider a toy example comprising two BSs, each serving one user with the same index as its BS. There are four different channels and denotes the channel between BS and user for . Let denote the desired signal from the user in cell . When using single-layer decoding with MRC, the noise vanishes as the number of BS antennas grows without bound, but pilot contamination remains. The resulting detected signals at the two BSs are then given by
Since each BS observes a linear combination of the two signals, the asymptotic SE achieved with single-layer decoding is limited due to interference. However, in a two-layer decoding system, a central station can jointly process the two detected signals to remove the interference as follows:
The rows of the inverse matrix are called the LSFD vectors and depend only on the statistical parameters, so the central station does not need to know the instantaneous channels. Since the resulting signals in (2) are free from noise and interference, the network can achieve an unbounded SE as the number of BS antennas grows without bound.
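The toy example can be checked numerically. The following Python sketch assumes that, in the asymptotic regime, each first-layer output is a noise-free linear mixture of the two symbols. The 2×2 matrix `B` of statistical weights is a hypothetical stand-in for the coefficients in the expressions above, and its numeric values are illustrative only.

```python
# Toy two-cell illustration of second-layer (LSFD) decoding.
# Assumption: asymptotically, each BS output is a noise-free linear
# mixture of the two symbols. The weights in B are made up.

def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular; symbols cannot be separated")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Hypothetical statistical weights: B[j][l] scales symbol l at BS j.
B = [[1.0, 0.3],
     [0.2, 0.8]]

s = [1 + 1j, -1 + 1j]            # transmitted symbols (QPSK-like points)
s_tilde = mat_vec(B, s)          # single-layer outputs: still mixed
lsfd = invert_2x2(B)             # rows play the role of the LSFD vectors
s_hat = mat_vec(lsfd, s_tilde)   # second layer removes the mixing

residual = max(abs(s_hat[i] - s[i]) for i in range(2))
print("first-layer outputs:", s_tilde)
print("second-layer outputs:", s_hat)
```

Because `B` contains only statistical quantities in this sketch, the inversion would not need to be repeated for every channel realization.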
This motivating example, adapted from the literature, exploits the fact that the channels are spatially uncorrelated and requires an infinite number of antennas. The prior works [12, 14] only consider uncorrelated Rayleigh fading channels and rely on the particular asymptotic properties of that channel model. The generalization to more practical correlated channels is non-trivial and has not been considered until now (see Footnote 1). In this paper, we consider spatially correlated channels with a finite number of antennas. We stress that these generalizations are practically important: if two-layer decoding is ever implemented in practice, the channels will be subject to spatial correlation and the BSs will have a limited number of antennas.
Footnote 1: A concurrent work appeared just as we were submitting this paper. It contains the uplink SE for correlated Rayleigh fading described by the one-ring model and MMSE estimation, while we consider arbitrary spatial correlation and use two types of channel estimators. Moreover, they consider joint power control and LSFD for max-min fairness, while we consider sum SE optimization, making the papers complementary.
I-A Main Contributions
In this paper, we generalize the LSFD method from [12, 14] to a scenario with correlated Rayleigh fading and arbitrary first-layer decoders, and also develop a method for data power control in the system. We evaluate the performance by deriving an SE expression for the system. Our main contributions are summarized as follows:
An uplink per-user SE is derived as a function of the second-layer decoding weights. A closed-form expression is then obtained for correlated Rayleigh fading and a system that uses MRC in the first decoding layer and an arbitrary choice of LSFD in the second layer. The second-layer decoding weights that maximize the SE follow in closed form.
An uplink sum SE optimization problem with power constraints is formulated. Because it is non-convex and NP-hard, we propose an alternating optimization approach that converges to a local optimum with polynomial complexity.
Numerical results demonstrate the effectiveness of two-layer decoding for Massive MIMO communication systems with correlated Rayleigh fading.
The rest of this paper is organized as follows: Multi-cell Massive MIMO with two-layer decoding is presented in Section II. An SE for the uplink together with the optimal LSFD design is presented in Section III. A maximization problem for the sum SE is formulated and a solution is proposed in Section IV. Numerical results in Section V demonstrate the performance of the proposed system. Section VI states the major conclusions of the paper.
Reproducible research: All the simulation results can be reproduced using the Matlab code and data files available at: https://github.com/emilbjornson/large-scale-fading-decoding
Notation: Lower and upper case bold letters are used for vectors and matrices. The expectation of the random variable is denoted by and the Euclidean norm of the vector by . The transpose and Hermitian transpose of a matrix are written as and , respectively. The -dimensional diagonal matrix with the diagonal elements is denoted . and are the real and imaginary parts of a complex number. denotes the gradient of a multivariate function at . Finally, is a vector of circularly symmetric, complex, jointly Gaussian distributed random variables with zero mean and correlation matrix .
II System Model
We consider a network with cells. Each cell consists of a BS equipped with antennas that serves single-antenna users (see Footnote 2). The -dimensional channel vector in the uplink between user in cell and BS is denoted by . We consider the standard block-fading model, where the channels are static within a coherence block of size channel uses and assume one independent realization in each block, according to a stationary ergodic random process. Each channel follows a correlated Rayleigh fading model:
Footnote 2: In the uplink, the considered network consists of multiple interfering single-input multiple-output (SIMO) channels. Such a setup has been referred to as multiuser MIMO in the information-theoretic literature for decades, which is why we adopt this terminology in the paper.
where is the spatial correlation matrix of the channel. The BSs know the channel statistics, but have no prior knowledge of the channel realizations, which need to be estimated in every coherence block.
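As an illustration of the correlated Rayleigh fading model, the following Python sketch draws channel realizations with a prescribed spatial correlation matrix and checks that their sample correlation approaches it. The exponential correlation model, the parameter `r`, and the array size are assumptions made for this example. The paper allows arbitrary spatial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4              # number of BS antennas (illustrative)
n_samples = 20000  # number of independent coherence blocks

# Hypothetical correlation matrix: exponential model with factor r.
# This is only one possible choice of spatial correlation.
r = 0.7
idx = np.arange(M)
R = r ** np.abs(idx[:, None] - idx[None, :])

# Draw h = L w with R = L L^H and w ~ CN(0, I), so that E{h h^H} = R.
L = np.linalg.cholesky(R)
w = (rng.standard_normal((M, n_samples))
     + 1j * rng.standard_normal((M, n_samples))) / np.sqrt(2)
h = L @ w          # each column is one channel realization

# The sample correlation matrix should approach R as n_samples grows.
R_hat = (h @ h.conj().T) / n_samples
max_err = np.max(np.abs(R_hat - R))
print("max deviation from R:", max_err)
```

Each column of `h` corresponds to one coherence block, in line with the block-fading assumption of one independent realization per block.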
II-A Channel Estimation
As in conventional Massive MIMO, the channels are estimated by letting the users transmit -symbol long pilots in a dedicated part of the coherence block, called the pilot phase. All the cells share a common set of mutually orthogonal pilots , where the pilot spans symbols. Such orthogonal pilots are disjointly distributed among the users in each cell:
Without loss of generality, we assume that all the users in different cells that share the same index use the same pilot and thereby cause pilot contamination to each other.
At BS , the signals received during the pilot phase are collectively denoted by the -dimensional matrix , which is given by
where is the power of the pilot of user in cell and is a matrix of independent and identically distributed noise terms, each distributed as .
An intermediate observation of the channel from user to BS is obtained through correlation with the pilot of user in the following way:
where are independent over and . The channel estimate and estimation error of the MMSE estimation of are presented in the following lemma.
Lemma 1. If BS uses MMSE estimation based on the observation in (6), the estimate of the channel between user in cell and BS is
where is given by
The channel estimate is distributed as
and the channel estimation error, , is independently distributed as
Lemma 1 provides statistical information for the BS to construct the decoding and precoding vectors for the up- and downlink data transmission. However, to compute the MMSE estimate, the inverse matrix of has to be computed for every user, which can lead to a computational complexity that might be infeasible when there are many antennas. This motivates us to use the simpler estimation technique called element-wise MMSE (EW-MMSE).
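The pilot despreading and MMSE estimation steps can be sketched in Python as follows. The sketch uses the generic linear MMSE form, cross-covariance times inverse observation covariance, and does not transcribe the expressions of Lemma 1. It assumes two users in different cells sharing one pilot of squared norm equal to the pilot length, exponential correlation matrices, and the made-up values `p_hat` and `sigma2`. It also shows the matrix inverse that makes MMSE estimation costly for large arrays.

```python
import numpy as np

rng = np.random.default_rng(1)
M, tau_p = 8, 4                  # BS antennas and pilot length (illustrative)
p_hat, sigma2 = 1.0, 0.5         # pilot power and noise variance (assumed values)
n_blocks = 4000                  # independent coherence blocks to average over

def exp_corr(r):
    """Hypothetical exponential correlation matrix with parameter r."""
    i = np.arange(M)
    return r ** np.abs(i[:, None] - i[None, :])

def crandn(*shape):
    """Standard circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Two users in different cells share one pilot; R[l] is user l's correlation
# matrix as seen from the estimating BS (user 0 is the desired one).
R = [exp_corr(0.8), 0.3 * exp_corr(0.4)]
psi = np.ones(tau_p, dtype=complex)          # shared pilot, ||psi||^2 = tau_p

# Channels: each row is one realization h = L w with L L^H = R[l].
h = [crandn(n_blocks, M) @ np.linalg.cholesky(Rl).T for Rl in R]

# Received pilot signals Y (n_blocks x M x tau_p) and despread observation y.
Y = np.sqrt(p_hat) * sum(hl[:, :, None] * psi[None, None, :] for hl in h)
Y = Y + np.sqrt(sigma2) * crandn(n_blocks, M, tau_p)
y = Y @ psi.conj()

# Generic LMMSE estimator h_hat = C_hy C_yy^{-1} y for the desired user.
C_yy = p_hat * tau_p**2 * (R[0] + R[1]) + tau_p * sigma2 * np.eye(M)
C_hy = np.sqrt(p_hat) * tau_p * R[0]
W = C_hy @ np.linalg.inv(C_yy)               # requires an M x M inverse
h_hat = y @ W.T

# Empirical MSE versus the LMMSE prediction tr(R0 - C_hy C_yy^{-1} C_hy^H).
mse_emp = np.mean(np.sum(np.abs(h[0] - h_hat) ** 2, axis=1))
mse_theory = np.real(np.trace(R[0] - W @ C_hy.conj().T))
print(mse_emp, mse_theory)
```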
To simplify the presentation, we make the standard assumption that the correlation matrix has equal diagonal elements, denoted by . This assumption is well motivated for elevated macro BSs that only observe far-field scattering effects from every cell. However, EW-MMSE estimation of the channel can also be done when the diagonal elements are different. The generalization to this case is straightforward. EW-MMSE estimation is given in Lemma 2 together with the statistics of the estimates.
Lemma 2. If BS uses EW-MMSE estimation and the diagonal elements of the spatial correlation matrix of the channel are equal, the channel estimate between user in cell and BS is
and the channel estimate and estimation error of are distributed as
and are not independent.
Proof. The statistics of the estimate and estimation error follow from straightforward computation of the correlation matrices and the derivation is therefore omitted. ∎
As compared to MMSE estimation, EW-MMSE estimation simplifies the computations, since no inverse matrix computation is involved. Moreover, each BS only needs to know the diagonal of the spatial correlation matrices, which are easier to acquire in practice than the full matrices. We can also observe the relationship between two users utilizing nonorthogonal pilots by a simple expression as shown in Corollary 1.
Corollary 1. When the diagonal elements of the spatial correlation matrix of the channel are equal, the two EW-MMSE estimates and of the channels of users in cells and that are computed at BS are related as:
Corollary 1 mathematically shows that the channel estimates of two users with the same pilot signal only differ from each other by a scaling factor. Using EW-MMSE estimation leads to severe pilot contamination that cannot be mitigated by linear processing of the data signal only, at least not with the approach in .
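The scaling relation can be illustrated with a short Python sketch. It assumes that each EW-MMSE estimate is a real scalar coefficient times the same despread pilot observation, with the coefficient derived per antenna element for a pilot of squared norm equal to the pilot length. The powers `p_hat`, the equal-diagonal values `beta`, and the observation `y` are hypothetical values chosen for this example.

```python
import math

tau_p, sigma2 = 4, 0.5                 # pilot length, noise variance (assumed)
p_hat = {1: 1.0, 2: 0.6}               # pilot powers of the two co-pilot users
beta = {1: 1.0, 2: 0.25}               # equal diagonal values of their correlation matrices

# Common denominator of the element-wise LMMSE coefficients (assumed form).
denom = tau_p * sum(p_hat[l] * beta[l] for l in p_hat) + sigma2

def ew_mmse_coeff(l):
    """Scalar applied to the despread observation for the user in cell l."""
    return math.sqrt(p_hat[l]) * beta[l] / denom

# Any despread observation at the BS; both estimates reuse it.
y = [0.3 - 1.2j, -0.7 + 0.1j, 1.5 + 0.4j]
h_hat_1 = [ew_mmse_coeff(1) * v for v in y]
h_hat_2 = [ew_mmse_coeff(2) * v for v in y]

# The two estimates are parallel vectors that differ by a positive scalar.
scale = math.sqrt(p_hat[1]) * beta[1] / (math.sqrt(p_hat[2]) * beta[2])
mismatch = max(abs(a - scale * b) for a, b in zip(h_hat_1, h_hat_2))
print("scaling factor:", scale, "mismatch:", mismatch)
```

Since the two estimates are parallel, a first-layer decoder built from one of them cannot separate the two users, which is consistent with the severe pilot contamination described above.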
II-B Uplink Data Transmission
During the data phase, it is assumed that user in cell sends a zero-mean symbol with variance . The received signal at BS is then
where denotes the transmit power of user in cell . Based on the signals in (17), the BSs decode the symbols with the two-layer decoding technique that is illustrated in Fig. 1. The general idea of a two-layer decoding system is that each BS decodes the desired signals from its coverage area in the first layer. A central station then collects the decoded signals of all users that used the same pilot and jointly processes these signals in the second layer to suppress inter-cell interference using LSFD vectors. In detail, an estimate of the symbol from user in cell is obtained by local linear decoding in the first layer as
where is the linear decoding vector. The symbol estimate generally contains interference and, in Massive MIMO, the pilot contamination from all the users with the same pilot sequence is particularly large. To mitigate the pilot contamination, all the symbol estimates of the contaminating users are collected in a vector
After the local decoding, a second layer of centralized decoding is performed on this vector using the LSFD vector , where is the LSFD weight. The final estimate of the data symbol from user in cell is then given by
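The structure of the two layers can be sketched in Python as follows. For brevity, the sketch assumes one pilot-sharing user per cell, uncorrelated unit-variance channels, perfect channel knowledge, and MRC in the first layer. These simplifications are not the paper's setup. The helper `second_layer` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M = 3, 16     # cells and BS antennas (illustrative sizes)

def crandn(*shape):
    """Standard circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# H[j, l] is the channel from the pilot-sharing user in cell l to BS j.
# Assumption: uncorrelated channels and perfect CSI, to keep the sketch short.
H = crandn(L, L, M)
s = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, L) + 1))   # QPSK symbols
p = 1.0                                                         # data power
noise = np.sqrt(0.1) * crandn(L, M)

# Received signal at each BS j: superposition of all users plus noise.
y = np.sqrt(p) * np.einsum("jlm,l->jm", H, s) + noise

# First layer: BS j applies a local MRC vector v_j = H[j, j] to its own signal.
s_tilde = np.array([np.vdot(H[j, j], y[j]) for j in range(L)])

def second_layer(a, local_estimates):
    """Combine the local estimates with LSFD weights a, i.e. a^H s_tilde."""
    return np.vdot(a, local_estimates)

# A unit vector as LSFD vector reduces to single-layer decoding for that cell.
e0 = np.zeros(L)
e0[0] = 1.0
single_layer = second_layer(e0, s_tilde)
print(single_layer, s_tilde[0])
```

Other choices of the weight vector combine the local estimates from all BSs. Selecting the weights is the subject of the next section.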
In the next section, we use the decoded signals together with the asymptotic channel properties [17, Section 2.5] to derive a closed-form expression of an uplink SE.
III Uplink Performance Analysis
This section first derives a general SE expression for each user in each cell and a closed-form expression when using MRC. These expressions are then used to obtain the LSFD vectors that maximize the SE. The use-and-then-forget capacity bounding technique [6, Chapter 2.3.4], [8, Section 4.3] allows us to compute a lower bound on the uplink ergodic capacity (i.e., an achievable SE). We first rewrite (20) as
Lemma 3. A lower bound on the uplink ergodic capacity is
where the effective SINR, denoted by , is
where is given by
Here and stand for the desired signal, the pilot contamination, the beamforming gain uncertainty, the non-coherent interference, and the additive noise, respectively, whose expectations are defined as
Note that the lower bound on the uplink ergodic capacity in Lemma 3 can be applied to any linear decoding method and any LSFD design.
Maximizing the SE of user in cell is equivalent to selecting the LSFD vector that maximizes a Rayleigh quotient, as shown in the proof of the following theorem. This is the first main contribution of this paper.
Theorem 1. For a given set of pilot and data power coefficients, the SE of user in cell is
where the matrices and the vector are defined as
and the vectors are defined as
In order to attain this SE, the LSFD vector is formulated as
Proof. The proof is available in Appendix B. ∎
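The Rayleigh-quotient argument behind Theorem 1 can be checked numerically. The Python sketch below uses generic stand-ins: a vector `b` in the role of the signal term and a Hermitian positive-definite matrix `C` in the role of the interference-plus-noise terms. Both are randomly generated here and are not the paper's quantities. The maximizer of |a^H b|^2 / (a^H C a) is then C^{-1} b, up to scaling, with maximum value b^H C^{-1} b.

```python
import numpy as np

rng = np.random.default_rng(3)
L = 4   # number of cells sharing the pilot (illustrative)

# Generic stand-ins: b for the signal vector and C for the Hermitian
# positive-definite interference-plus-noise matrix (random, hypothetical).
b = rng.standard_normal(L) + 1j * rng.standard_normal(L)
A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
C = A @ A.conj().T + np.eye(L)

def quotient(a):
    """Generalized Rayleigh quotient |a^H b|^2 / (a^H C a)."""
    return abs(np.vdot(a, b)) ** 2 / np.real(np.vdot(a, C @ a))

a_opt = np.linalg.solve(C, b)          # maximizer, unique up to scaling
best = np.real(np.vdot(b, a_opt))      # closed-form maximum b^H C^{-1} b

# Random weight vectors never exceed the closed-form maximum.
trials = [quotient(rng.standard_normal(L) + 1j * rng.standard_normal(L))
          for _ in range(1000)]
print("closed form:", best, "best random trial:", max(trials))
```

The quotient is invariant to complex scaling of the weight vector, so any nonzero multiple of `a_opt` attains the same value.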
We stress that the LSFD vector in (37) is designed to maximize the SE in (30) for every user in the network for a given data and pilot power and a given first-layer decoder. Note that Theorem 1 can be applied to practical correlated Rayleigh fading channels with either MMSE or EW-MMSE estimation and any conceivable choice of first-layer decoder. This stands in contrast to the previous work [18, 14] that only considered uncorrelated Rayleigh fading channels, which are unlikely to occur in practice, and particular linear combining methods that were selected to obtain closed-form expressions. Theorem 1 explicitly reveals the influence that mutual interference and noise have on the SE when utilizing the optimal LSFD vector given in (37): determines the amount of remaining pilot contamination from the users using the same pilot sequence as user in cell . The beamforming gain uncertainty is represented by , while is the noncoherent mutual interference from the remaining users and represents the additive noise.
The following theorem states a closed-form expression of the SE for the case of MRC, i.e., . This is the second main contribution of this paper.
Theorem 2. When MRC is used, the SE in (22) of user in cell is given by
where the SINR value is given in (39).
The values and are different depending on the channel estimation technique. MMSE estimation results in
whereas EW-MMSE results in
Theorem 2 describes the exact impact of the spatial correlation of the channel on the system performance through the coefficients and . It is seen that the numerator of (39) grows as the square of the number of antennas, since the trace in (40) is a sum of one term per antenna. This gain comes from the coherent combination of the signals from the antennas. It can also be seen from Theorem 2 that the pilot contamination in (20) combines coherently, i.e., its variance (the first term in the denominator) also grows as the square of the number of antennas. The other terms in the denominator represent the impact of non-coherent interference and Gaussian noise, respectively. These two terms grow only linearly with the number of antennas. Since the interference terms contain products of correlation matrices of different users, the interference is smaller between users that have very different spatial correlation characteristics.
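The different growth rates can be illustrated with stand-in quantities in Python. For independent zero-mean complex Gaussian channels with correlation matrices R1 and R2, the squared mean of the coherent gain equals tr(R1)^2 and the non-coherent interference power equals tr(R1 R2). The sketch evaluates both for a growing number of antennas under an assumed exponential correlation model. It does not evaluate the exact terms of (39) and (40).

```python
import numpy as np

def exp_corr(M, r):
    """Hypothetical exponential correlation matrix with unit diagonal."""
    i = np.arange(M)
    return r ** np.abs(i[:, None] - i[None, :])

coherent, noncoherent = {}, {}
for M in (50, 100, 200):
    R1, R2 = exp_corr(M, 0.5), exp_corr(M, 0.7)
    coherent[M] = np.trace(R1) ** 2        # |E{h1^H h1}|^2, grows like M^2
    noncoherent[M] = np.trace(R1 @ R2)     # E{|h1^H h2|^2}, grows like M

for M in (50, 100, 200):
    print(M, coherent[M], noncoherent[M])
```

Doubling the number of antennas quadruples the coherent term while roughly doubling the non-coherent one.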
The following corollary gives the optimal LSFD vector that maximizes the SE of every user in the network for a given set of pilot and data powers, which is expected to work well when each BS is equipped with a practical number of antennas.
Even though Corollary 2 is a special case of Theorem 1 when MRC is used, its contributions are two-fold: The LSFD vector is computed in closed form and is independent of the small-scale fading, so it is easy to compute and store. Moreover, this LSFD vector generalizes the LSFD vector from prior work to the larger class of correlated Rayleigh fading channels.
IV Data Power Control and LSFD Design for Sum SE Optimization
This section investigates how to choose the data powers (power control) and the LSFD vectors to maximize the sum SE. The sum SE maximization problem for a multi-cell Massive MIMO system is first formulated based on results from previous sections. Next, an iterative algorithm based on solving a series of convex optimization problems is proposed to efficiently find a stationary point.
IV-A Problem Formulation
We consider sum SE maximization:
This can be shown to be a non-convex and NP-hard problem using the same methodology as in prior work on small-scale multi-user MIMO systems with perfect channel knowledge, even though the fine details differ. Therefore, the global optimum is difficult to find in general. Nevertheless, solving the ergodic sum SE maximization (51) for a Massive MIMO system is more practical than maximizing the instantaneous SEs for a small-scale MIMO network and a given realization of the small-scale fading [20, 21]. In contrast to the latter, the sum SE maximization in (51) only depends on the large-scale fading coefficients, which simplifies matters and allows the solution to be used for a long period of time. Another key difference from prior work is that we jointly optimize the data powers and LSFD vectors.
Instead of seeking the global optimum to (51), which has an exponential computational complexity, we will use the weighted MMSE method [22, 23] to obtain a stationary point to (51) in polynomial time. This is a standard method to break down a sum SE problem into subproblems that can be solved sequentially. We stress that the resulting subproblems and algorithms are different for every problem that the method is applied to, thus our solution is a novel contribution. To this end, we first formulate the weighted MMSE problem from (51) as shown in Theorem 3.
The proof is available in Appendix E. ∎
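The idea behind the weighted MMSE method can be illustrated with a generic scalar identity. This is not the formulation of Theorem 3. For a single link with power `p`, effective channel `g`, and interference-plus-noise variance `q` (all hypothetical values), the minimum MSE equals 1/(1 + SINR). Maximizing ln(w) - w·e + 1 over the weight w at that minimum MSE e then recovers ln(1 + SINR). The Python sketch below verifies both steps.

```python
import math

# Assumed scalar link: power, effective channel, interference-plus-noise variance.
p, g, q = 2.0, 0.8 - 0.3j, 0.5
sinr = p * abs(g) ** 2 / q

def mse(u):
    """MSE E{|s - conj(u) y|^2} for y = sqrt(p) g s + n with unit-variance s."""
    return abs(1 - u.conjugate() * math.sqrt(p) * g) ** 2 + abs(u) ** 2 * q

def objective(w, e):
    """Weighted-MSE surrogate ln(w) - w e + 1, concave in the weight w."""
    return math.log(w) - w * e + 1

u_opt = math.sqrt(p) * g / (p * abs(g) ** 2 + q)   # MMSE receive coefficient
e_min = mse(u_opt)                                  # equals 1 / (1 + SINR)
w_opt = 1.0 / e_min                                 # maximizer of the surrogate

rate_from_wmmse = objective(w_opt, e_min) / math.log(2)   # in bit/s/Hz
rate_direct = math.log2(1 + sinr)
print(rate_from_wmmse, rate_direct)
```

Alternating between such closed-form updates of the receive coefficient, the weight, and the transmit variables is what breaks a sum SE problem into subproblems that can be solved sequentially.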