# Large-Scale-Fading Decoding in Cellular Massive MIMO Systems with Spatially Correlated Channels

Massive multiple-input multiple-output (MIMO) systems can suffer from coherent intercell interference due to the phenomenon of pilot contamination. This paper investigates a two-layer decoding method that mitigates both coherent and non-coherent interference in multi-cell Massive MIMO. To this end, each base station (BS) first estimates the channels to its intra-cell users using either minimum mean-squared error (MMSE) or element-wise MMSE (EW-MMSE) estimation based on uplink pilots. The estimates are used for local decoding at each BS, followed by a second decoding layer where the BSs cooperate to mitigate inter-cell interference. An uplink achievable spectral efficiency (SE) expression is computed for arbitrary two-layer decoding schemes. A closed-form expression is then obtained for correlated Rayleigh fading, maximum-ratio combining, and the proposed large-scale fading decoding (LSFD) in the second layer. We also formulate a sum SE maximization problem with both the data powers and the LSFD vectors as optimization variables. Since this is an NP-hard problem, we develop a low-complexity algorithm based on the weighted MMSE approach to obtain a local optimum. Numerical results show that both data power control and LSFD improve the sum SE compared with single-layer decoding in multi-cell Massive MIMO systems.


## I Introduction

Massive MIMO BSs, which are equipped with hundreds of antennas, exploit channel reciprocity to estimate the channels based on uplink pilots and spatially multiplex a large number of users on the same time–frequency resource [2, 3]. Massive MIMO is a promising technique for meeting the growing demand for wireless data traffic [4, 5]. In a single-cell scenario, Massive MIMO needs no computationally heavy decoding or precoding methods, such as successive interference cancellation or dirty paper coding. Linear processing schemes (e.g., zero-forcing combining) can effectively suppress interference and noise if the BS is equipped with a large number of antennas [6]. In a multi-cell scenario, however, pilot-based channel estimation is contaminated by the non-orthogonal pilot transmissions in other cells. Unless advanced processing schemes are used to suppress it [8], this results in coherent intercell interference during data transmission, the so-called pilot contamination [7]. Pilot contamination reduces the gain of adding more antennas and causes the SE of linear decoding methods, such as maximum-ratio combining (MRC) or zero-forcing, to saturate as the number of antennas grows.

Much work has been done to mitigate the effects of pilot contamination. The first and most intuitive approach is to increase the length of the pilots. In practical networks, however, it is not possible to make all pilots orthogonal due to the limited size of the channel coherence block [9]. Hence, there is a trade-off between having longer pilots and a low pilot overhead. Another method is to assign the pilots in a way that reduces the contamination [10], since only a few users from other cells cause substantial contamination. Pilot assignment is a combinatorial problem, and heuristic algorithms with low computational complexity can be developed to mitigate the pilot contamination. In [11], a greedy pilot assignment method is developed that exploits the statistical channel information and the mutual interference between users. Pilot assignment approaches still suffer from asymptotic SE saturation, since they only replace a strongly contaminating user with a less contaminating one. A third method is to utilize the spatial correlation to mitigate the coherent interference using multi-cell minimum mean-squared error (M-MMSE) decoding [8], but this method has high computational complexity.

Instead of combating pilot contamination, one can exploit it using more advanced decoding schemes [12, 13, 14]. This approach was initially called pilot contamination decoding since the BSs cooperate in suppressing the pilot contamination [12]. The original form of this technique used simplistic MRC, which does not suppress interference very well, and thus required a huge number of antennas to be effective [13]. The latest version of this decoding design, called large-scale fading decoding (LSFD) [14], was designed to be useful also with a practical number of antennas. In the two-layer LSFD framework, each BS applies an arbitrary local linear decoding method in the first layer, preferably one that suppresses intra-cell interference. The results are then gathered at a common central station that applies so-called LSFD vectors in a second layer to combine the signals from multiple BSs and suppress pilot contamination and inter-cell interference. This new decoding design overcomes the aforementioned limitations in [12] and attains a high SE even with a limited number of BS antennas.

To explain why LSFD vectors are necessary to mitigate pilot contamination, we consider a toy example comprising two BSs, each serving one user with the same index as its BS. There are four different channels, and $\mathbf{h}_{l,l'} \sim \mathcal{CN}(\mathbf{0}, \beta_{l,l'}\mathbf{I}_M)$ denotes the channel between BS $l$ and user $l'$ for $l, l' \in \{1, 2\}$. Let $s_l$ denote the desired signal from the user in cell $l$. When using single-layer decoding with MRC, the noise vanishes as $M \to \infty$, but pilot contamination remains [3]. The resulting detected signals at the two BSs are then given by

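$$\begin{bmatrix}\hat{s}_1\\ \hat{s}_2\end{bmatrix} = \begin{bmatrix}\beta_{1,1}s_1 + \beta_{1,2}s_2\\ \beta_{2,1}s_1 + \beta_{2,2}s_2\end{bmatrix} = \underbrace{\begin{bmatrix}\beta_{1,1} & \beta_{1,2}\\ \beta_{2,1} & \beta_{2,2}\end{bmatrix}}_{\triangleq\,\mathbf{B}}\begin{bmatrix}s_1\\ s_2\end{bmatrix}. \tag{1}$$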

Since each BS observes a linear combination of the two signals, the asymptotic SE achieved with single-layer decoding is limited by interference. However, in a two-layer decoding system, a central station can jointly process $\hat{s}_1$ and $\hat{s}_2$ to remove the interference as follows:

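$$\mathbf{B}^{-1}\begin{bmatrix}\hat{s}_1\\ \hat{s}_2\end{bmatrix} = \mathbf{B}^{-1}\mathbf{B}\begin{bmatrix}s_1\\ s_2\end{bmatrix} = \begin{bmatrix}s_1\\ s_2\end{bmatrix}. \tag{2}$$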

The rows of the inverse matrix $\mathbf{B}^{-1}$ are called the LSFD vectors. They only depend on the statistical parameters $\beta_{1,1}, \beta_{1,2}, \beta_{2,1}, \beta_{2,2}$, so the central station does not need to know the instantaneous channels. Provided that $\mathbf{B}$ is invertible, the resulting signals in (2) are free from noise and interference, so the network can achieve an unbounded SE as $M \to \infty$.
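The toy example can be checked numerically. The following NumPy sketch is illustrative only: the values in `beta`, the number of antennas, and the use of a noise-free pilot observation as the MRC vector are assumptions, not settings from the paper. With many antennas, MRC at each BS returns approximately $\mathbf{B}[s_1, s_2]^T$, and applying $\mathbf{B}^{-1}$ at the central station approximately recovers the symbols.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Samples of CN(0, 1): circularly symmetric complex Gaussian, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical large-scale fading coefficients: beta[l, lp] links BS l and user lp.
beta = np.array([[1.0, 0.3],
                 [0.2, 0.8]])
M = 20_000                                  # many antennas to mimic M -> infinity
s = np.array([1.0 + 0.0j, -1.0 + 0.0j])     # data symbols s_1 and s_2

# Uncorrelated Rayleigh fading: h[l][lp] ~ CN(0, beta[l, lp] I_M)
h = [[np.sqrt(beta[l, lp]) * crandn(M) for lp in range(2)] for l in range(2)]

s_hat = np.zeros(2, dtype=complex)
for l in range(2):
    # Both users share one pilot, so a noise-free pilot observation at BS l is
    # h[l][0] + h[l][1]; using it as the MRC vector models pilot contamination.
    v = h[l][0] + h[l][1]
    y = h[l][0] * s[0] + h[l][1] * s[1] + crandn(M)   # data signal, unit noise power
    s_hat[l] = np.vdot(v, y) / M                      # single-layer MRC, scaled by 1/M

# Second layer: the central station applies B^{-1}, whose rows are the LSFD vectors.
s_lsfd = np.linalg.solve(beta, s_hat)
print(np.round(s_hat.real, 2))    # close to B @ s: interference remains
print(np.round(s_lsfd.real, 2))   # close to s: interference removed
```

Increasing `M` shrinks the residual error in both printed vectors, mirroring the $M \to \infty$ argument above.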

This motivating example, adapted from [12], exploits the fact that the channels are spatially uncorrelated and requires an infinite number of antennas. The prior works [12, 14] only consider uncorrelated Rayleigh fading channels and rely on the particular asymptotic properties of that channel model. The generalization to more practical correlated channels is non-trivial and has not been considered until now.¹ In this paper, we consider spatially correlated channels with a finite number of antennas. We stress that these generalizations are practically important: if two-layer decoding is ever implemented in practice, the channels will be subject to spatial correlation and the BSs will have a limited number of antennas.

¹ The concurrent work [15] appeared just as we were submitting this paper. It contains the uplink SE for correlated Rayleigh fading described by the one-ring model and MMSE estimation, while we consider arbitrary spatial correlation and use two types of channel estimators. Moreover, [15] considers joint power control and LSFD for max-min fairness, while we consider sum SE optimization, making the papers complementary.

### I-A Main Contributions

In this paper, we generalize the LSFD method from [12, 14] to a scenario with correlated Rayleigh fading and arbitrary first-layer decoders, and also develop a method for data power control in the system. We evaluate the performance by deriving an SE expression for the system. Our main contributions are summarized as follows:

• An uplink per-user SE is derived as a function of the second-layer decoding weights. A closed-form expression is then obtained for correlated Rayleigh fading and a system that uses MRC in the first decoding layer and an arbitrary choice of LSFD in the second layer. The second-layer decoding weights that maximize the SE follow in closed form.

• An uplink sum SE optimization problem with power constraints is formulated. Because it is non-convex and NP-hard, we propose an alternating optimization approach that converges to a local optimum with polynomial complexity.

• Numerical results demonstrate the effectiveness of two-layer decoding for Massive MIMO communication systems with correlated Rayleigh fading.

The rest of this paper is organized as follows: Multi-cell Massive MIMO with two-layer decoding is presented in Section II. An uplink SE expression, together with the optimal LSFD design, is derived in Section III. A sum SE maximization problem is formulated and a solution is proposed in Section IV. Numerical results in Section V demonstrate the performance of the proposed system. Section VI states the major conclusions of the paper.

Reproducible research: All the simulation results can be reproduced using the Matlab code and data files available at: https://github.com/emilbjornson/large-scale-fading-decoding

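Notation: Lower and upper case bold letters are used for vectors and matrices, respectively. The expectation of the random variable $X$ is denoted by $\mathbb{E}\{X\}$ and the Euclidean norm of the vector $\mathbf{x}$ by $\|\mathbf{x}\|$. The transpose and Hermitian transpose of a matrix $\mathbf{X}$ are written as $\mathbf{X}^T$ and $\mathbf{X}^H$, respectively. The $L$-dimensional diagonal matrix with the diagonal elements $d_1, \ldots, d_L$ is denoted $\mathrm{diag}(d_1, \ldots, d_L)$. $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ are the real and imaginary parts of a complex number. $\nabla f(\mathbf{x}_0)$ denotes the gradient of a multivariate function $f$ at $\mathbf{x}_0$. Finally, $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ is a vector of circularly symmetric, complex, jointly Gaussian distributed random variables with zero mean and correlation matrix $\mathbf{R}$.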

## II System Model

We consider a network with $L$ cells. Each cell consists of a BS equipped with $M$ antennas that serves $K$ single-antenna users.²

² In the uplink, the considered network consists of multiple interfering single-input multiple-output (SIMO) channels. Such a setup has been referred to as multiuser MIMO in the information-theoretic literature for decades, which is why we adopt this terminology in this paper.

The $M$-dimensional channel vector in the uplink between user $k$ in cell $l$ and BS $l'$ is denoted by $\mathbf{h}^{l'}_{l,k} \in \mathbb{C}^M$. We consider the standard block-fading model, where the channels are static within a coherence block of $\tau_c$ channel uses and take one independent realization in each block, according to a stationary ergodic random process. Each channel follows a correlated Rayleigh fading model:

$$\mathbf{h}^{l'}_{l,k} \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{R}^{l'}_{l,k}\right), \tag{3}$$

where $\mathbf{R}^{l'}_{l,k} \in \mathbb{C}^{M \times M}$ is the spatial correlation matrix of the channel. The BSs know the channel statistics but have no prior knowledge of the channel realizations, which need to be estimated in every coherence block.
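One common way to draw samples from (3), sketched below under stated assumptions, is to multiply a standard complex Gaussian vector by a Cholesky factor of the correlation matrix. The exponential correlation model in `exponential_correlation` is a hypothetical choice for illustration (it has equal diagonal elements, matching the assumption used later for EW-MMSE); the analysis in this paper holds for arbitrary correlation matrices.

```python
import numpy as np

def exponential_correlation(M, r, beta):
    """Hypothetical example model: [R]_{m,n} = beta * r^{|m-n|}, with 0 <= r < 1."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

def sample_channels(R, n_samples, rng):
    """Draw n_samples realizations of h ~ CN(0, R) as the columns of an M x n array."""
    M = R.shape[0]
    L_chol = np.linalg.cholesky(R)          # R = L L^H (R must be positive definite)
    z = (rng.standard_normal((M, n_samples))
         + 1j * rng.standard_normal((M, n_samples))) / np.sqrt(2)   # z ~ CN(0, I)
    return L_chol @ z                       # Cov(L z) = L L^H = R

rng = np.random.default_rng(1)
R = exponential_correlation(M=8, r=0.7, beta=0.5)
H = sample_channels(R, n_samples=200_000, rng=rng)   # one column per coherence block
R_emp = H @ H.conj().T / H.shape[1]                  # sample correlation matrix
print(np.max(np.abs(R_emp - R)))                     # small sampling error
```

Each column corresponds to one independent block-fading realization, as in the block-fading model above.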

### II-A Channel Estimation

As in conventional Massive MIMO [8], the channels are estimated by letting the users transmit $\tau_p$-symbol long pilots in a dedicated part of the coherence block, called the pilot phase. All the cells share a common set of mutually orthogonal pilots $\{\boldsymbol{\phi}_k\}$, where each pilot $\boldsymbol{\phi}_k \in \mathbb{C}^{\tau_p}$ spans $\tau_p$ symbols. These orthogonal pilots are disjointly distributed among the users in each cell:

$$\boldsymbol{\phi}_k^H\boldsymbol{\phi}_{k'} = \begin{cases}\tau_p, & k = k',\\ 0, & k \neq k'.\end{cases} \tag{4}$$

Without loss of generality, we assume that the users in different cells that share the same index $k$ use the same pilot $\boldsymbol{\phi}_k$ and thereby cause pilot contamination to each other [3].

During the pilot phase, the signals received at BS $l$ are collectively denoted by the $M \times \tau_p$ matrix $\mathbf{Y}_l$, which is given by

$$\mathbf{Y}_l = \sum_{l'=1}^{L}\sum_{k=1}^{K}\sqrt{\hat{p}_{l',k}}\,\mathbf{h}^{l}_{l',k}\boldsymbol{\phi}_k^H + \mathbf{N}_l, \tag{5}$$

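where $\hat{p}_{l',k}$ is the power of the pilot of user $k$ in cell $l'$ and $\mathbf{N}_l \in \mathbb{C}^{M \times \tau_p}$ is a matrix of independent and identically distributed noise terms, each distributed as $\mathcal{CN}(0, \sigma^2)$.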

An intermediate observation of the channel from user $k$ in cell $l$ to BS $l$ is obtained by correlating $\mathbf{Y}_l$ with the pilot of user $k$:

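$$\tilde{\mathbf{y}}_{l,k} = \mathbf{Y}_l\boldsymbol{\phi}_k = \tau_p\sqrt{\hat{p}_{l,k}}\,\mathbf{h}^{l}_{l,k} + \sum_{\substack{l'=1\\ l'\neq l}}^{L}\tau_p\sqrt{\hat{p}_{l',k}}\,\mathbf{h}^{l}_{l',k} + \tilde{\mathbf{n}}_{l,k}, \tag{6}$$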

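where $\tilde{\mathbf{n}}_{l,k} \triangleq \mathbf{N}_l\boldsymbol{\phi}_k \sim \mathcal{CN}(\mathbf{0}, \tau_p\sigma^2\mathbf{I}_M)$ is independent over $l$ and $k$. The MMSE estimate of $\mathbf{h}^{l}_{l,k}$ and the corresponding estimation error are presented in the following lemma.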
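The pilot phase in (4)-(6) can be sketched as follows. The DFT pilot construction, the network size, the pilot powers, and the uncorrelated channels are assumptions made for brevity. The check confirms that correlating $\mathbf{Y}_l$ with $\boldsymbol{\phi}_k$ removes all users with other pilots, leaving exactly the terms in (6).

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, M, sigma2 = 3, 2, 16, 1.0    # hypothetical network size and noise power
tau_p = K                          # assumption: one orthogonal pilot per user in a cell

# Columns of the tau_p x tau_p DFT matrix are mutually orthogonal with squared
# norm tau_p, so they satisfy (4).
n = np.arange(tau_p)
Phi = np.exp(-2j * np.pi * np.outer(n, n) / tau_p)     # column k is phi_k
assert np.allclose(Phi.conj().T @ Phi, tau_p * np.eye(tau_p))

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

p_hat = np.full((L, K), 0.1)       # pilot powers p_hat_{l',k} (illustrative)
h = crandn(L, K, M)                # h[lp, k] plays h^l_{lp,k} at one BS l (uncorrelated)
N = np.sqrt(sigma2) * crandn(M, tau_p)

# (5): Y_l = sum_{lp, k} sqrt(p_hat_{lp,k}) h^l_{lp,k} phi_k^H + N_l
Y = sum(np.sqrt(p_hat[lp, k]) * np.outer(h[lp, k], Phi[:, k].conj())
        for lp in range(L) for k in range(K)) + N

# (6): correlating with phi_k keeps only the users that share pilot k
k = 1
y_tilde = Y @ Phi[:, k]
expected = (sum(tau_p * np.sqrt(p_hat[lp, k]) * h[lp, k] for lp in range(L))
            + N @ Phi[:, k])
print(np.allclose(y_tilde, expected))   # True: the other pilot is removed exactly
```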

###### Lemma 1.

If BS $l$ uses MMSE estimation based on the observation in (6), the estimate of the channel between user $k$ in cell $l$ and BS $l$ is

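$$\hat{\mathbf{h}}^{l}_{l,k} = \sqrt{\hat{p}_{l,k}}\,\mathbf{R}^{l}_{l,k}\boldsymbol{\Psi}_{l,k}^{-1}\tilde{\mathbf{y}}_{l,k}, \tag{7}$$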

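where $\boldsymbol{\Psi}_{l,k} \in \mathbb{C}^{M \times M}$ is given by

$$\boldsymbol{\Psi}_{l,k} \triangleq \sum_{l'=1}^{L}\tau_p\hat{p}_{l',k}\mathbf{R}^{l}_{l',k} + \sigma^2\mathbf{I}_M. \tag{8}$$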

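The channel estimate is distributed as

$$\hat{\mathbf{h}}^{l}_{l,k} \sim \mathcal{CN}\left(\mathbf{0}, \tau_p\hat{p}_{l,k}\mathbf{R}^{l}_{l,k}\boldsymbol{\Psi}_{l,k}^{-1}\mathbf{R}^{l}_{l,k}\right), \tag{9}$$

and the channel estimation error, $\mathbf{e}^{l}_{l,k} = \mathbf{h}^{l}_{l,k} - \hat{\mathbf{h}}^{l}_{l,k}$, is independent of the estimate and distributed as

$$\mathbf{e}^{l}_{l,k} \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{R}^{l}_{l,k} - \tau_p\hat{p}_{l,k}\mathbf{R}^{l}_{l,k}\boldsymbol{\Psi}_{l,k}^{-1}\mathbf{R}^{l}_{l,k}\right). \tag{10}$$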
###### Proof.

This lemma follows from adopting standard MMSE estimation results from [16], [17, Section 3] to our system model and notation. ∎

Lemma 1 provides the statistical information that the BS needs to construct the decoding and precoding vectors for the uplink and downlink data transmission. However, computing the MMSE estimate requires the inverse of the $M \times M$ matrix $\boldsymbol{\Psi}_{l,k}$ for every user, and the resulting computational complexity might be infeasible when there are many antennas. This motivates us to also consider the simpler estimation technique called element-wise MMSE (EW-MMSE) [17].
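A minimal NumPy sketch of the MMSE estimator in Lemma 1, with a Monte Carlo check of the covariances in (9) and (10), is given below. The correlation matrices, powers, and sizes are hypothetical. The function `mmse_estimate` (our own illustrative name) solves a linear system with $\boldsymbol{\Psi}_{l,k}$ rather than forming its inverse, which is numerically preferable but still costs on the order of $M^3$ operations per user, in line with the complexity concern above.

```python
import numpy as np

def mmse_estimate(y_tilde, R_own, R_all, p_hat_own, p_hat_all, tau_p, sigma2):
    """MMSE estimate (7), with Psi built as in (8).

    R_own = R^l_{l,k}; R_all lists R^l_{l',k} for all l' (including l) sharing pilot k.
    y_tilde may be an M-vector or an M x N matrix of observations (one per column).
    Returns the estimate(s) and Psi.
    """
    M = R_own.shape[0]
    Psi = sum(tau_p * p * R for p, R in zip(p_hat_all, R_all)) + sigma2 * np.eye(M)
    # Solve Psi x = y_tilde instead of forming Psi^{-1}; still about M^3 operations.
    return np.sqrt(p_hat_own) * R_own @ np.linalg.solve(Psi, y_tilde), Psi

rng = np.random.default_rng(3)
M, tau_p, sigma2, n_mc = 6, 2, 1.0, 100_000
idx = np.arange(M)
# Hypothetical exponential correlation matrices R^l_{l',k}, l' = 0 (own cell), 1, 2
R_all = [beta * r ** np.abs(idx[:, None] - idx[None, :])
         for beta, r in [(1.0, 0.6), (0.3, 0.2), (0.1, 0.9)]]
p_hat_all = [0.5, 0.5, 0.5]

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = [np.linalg.cholesky(R) @ crandn(M, n_mc) for R in R_all]        # h^l_{l',k} samples
Y_tilde = (sum(tau_p * np.sqrt(p) * h for p, h in zip(p_hat_all, H))
           + np.sqrt(tau_p * sigma2) * crandn(M, n_mc))             # observation (6)

H_hat, Psi = mmse_estimate(Y_tilde, R_all[0], R_all, p_hat_all[0], p_hat_all, tau_p, sigma2)
E = H[0] - H_hat                                                    # estimation error
cov_est = tau_p * p_hat_all[0] * R_all[0] @ np.linalg.solve(Psi, R_all[0])   # (9)
cov_err = R_all[0] - cov_est                                                 # (10)
print(np.max(np.abs(H_hat @ H_hat.conj().T / n_mc - cov_est)))   # small
print(np.max(np.abs(E @ E.conj().T / n_mc - cov_err)))           # small
print(np.max(np.abs(H_hat @ E.conj().T / n_mc)))                 # small: uncorrelated
```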

To simplify the presentation, we make the standard assumption that each correlation matrix $\mathbf{R}^{l}_{l',k}$ has equal diagonal elements, denoted by $\beta^{l}_{l',k}$. This assumption is well motivated for elevated macro BSs that only observe far-field scattering effects from every cell. EW-MMSE estimation can also be performed when the diagonal elements are different, and the generalization to this case is straightforward. EW-MMSE estimation is given in Lemma 2 together with the statistics of the estimates.

###### Lemma 2.

If BS $l$ uses EW-MMSE estimation and the diagonal elements of the spatial correlation matrix of the channel are equal, the channel estimate between user $k$ in cell $l$ and BS $l$ is

$$\hat{\mathbf{h}}^{l}_{l,k} = \varrho^{l}_{l,k}\tilde{\mathbf{y}}_{l,k}, \tag{11}$$

where

$$\varrho^{l}_{l,k} = \frac{\sqrt{\hat{p}_{l,k}}\,\beta^{l}_{l,k}}{\sum_{l'=1}^{L}\tau_p\hat{p}_{l',k}\beta^{l}_{l',k} + \sigma^2}, \tag{12}$$

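and the channel estimate and estimation error of $\mathbf{h}^{l}_{l,k}$ are distributed as

$$\hat{\mathbf{h}}^{l}_{l,k} \sim \mathcal{CN}\left(\mathbf{0}, (\varrho^{l}_{l,k})^2\tau_p\boldsymbol{\Psi}_{l,k}\right), \tag{13}$$

$$\mathbf{e}^{l}_{l,k} \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{R}^{l}_{l,k} - 2\varrho^{l}_{l,k}\tau_p\sqrt{\hat{p}_{l,k}}\,\mathbf{R}^{l}_{l,k} + (\varrho^{l}_{l,k})^2\tau_p\boldsymbol{\Psi}_{l,k}\right), \tag{14}$$

and, in general, they are not independent.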

###### Proof.

The statistics of the estimate and estimation error follow from straightforward computation of the correlation matrices and the derivation is therefore omitted. ∎

Compared with MMSE estimation, EW-MMSE estimation simplifies the computations since no matrix inversion is involved. Moreover, each BS only needs to know the diagonals of the spatial correlation matrices, which are easier to acquire in practice than the full matrices. The relationship between the estimates of two users that use the same pilot can also be expressed in a simple form, as shown in Corollary 1.

###### Corollary 1.

When the diagonal elements of the spatial correlation matrices are equal, the EW-MMSE estimates $\hat{\mathbf{h}}^{l}_{l,k}$ and $\hat{\mathbf{h}}^{l}_{l'',k}$ of the channels of user $k$ in cell $l$ and user $k$ in cell $l''$, both computed at BS $l$, are related as

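$$\frac{\hat{\mathbf{h}}^{l}_{l,k}}{\sqrt{\hat{p}_{l,k}}\,\beta^{l}_{l,k}} = \frac{\hat{\mathbf{h}}^{l}_{l'',k}}{\sqrt{\hat{p}_{l'',k}}\,\beta^{l}_{l'',k}}, \tag{15}$$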

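where $\hat{\mathbf{h}}^{l}_{l'',k} = \varrho^{l}_{l'',k}\tilde{\mathbf{y}}_{l,k}$ with

$$\varrho^{l}_{l'',k} = \frac{\sqrt{\hat{p}_{l'',k}}\,\beta^{l}_{l'',k}}{\sum_{l'=1}^{L}\tau_p\hat{p}_{l',k}\beta^{l}_{l',k} + \sigma^2}. \tag{16}$$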

Corollary 1 shows that the channel estimates of two users with the same pilot only differ by a scaling factor. Consequently, EW-MMSE estimation leads to severe pilot contamination that cannot be mitigated by linear processing of the data signal alone, at least not with the approach in [8].
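The sketch below computes the EW-MMSE scaling factors (12) and (16) with scalar arithmetic only and checks Corollary 1 numerically. The diagonal values $\beta^{l}_{l'',k}$, the pilot powers, and the random observation are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
M, tau_p, sigma2 = 8, 2, 1.0
beta = np.array([1.0, 0.4, 0.1])    # hypothetical beta^l_{l'',k} for l'' = 0, 1, 2
p_hat = np.array([0.5, 0.8, 0.3])   # hypothetical pilot powers p_hat_{l'',k}

# EW-MMSE scaling factors (12) and (16): scalar arithmetic, no matrix inversion
denom = np.sum(tau_p * p_hat * beta) + sigma2
rho = np.sqrt(p_hat) * beta / denom     # rho[l''] = varrho^l_{l'',k}

# Any observation y_tilde_{l,k} works; a random one is used for illustration
y_tilde = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
h_hat = rho[:, None] * y_tilde[None, :]   # row l'' is the EW-MMSE estimate of h^l_{l'',k}

# Corollary 1, (15): dividing by sqrt(p_hat) * beta gives the same vector for every row
scaled = h_hat / (np.sqrt(p_hat) * beta)[:, None]
print(np.allclose(scaled, scaled[0]))      # True: all estimates are parallel
```

Because every row of `h_hat` is a multiple of the same observation, the estimates are linearly dependent, which is why linear processing alone cannot separate the pilot-sharing users.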

During the data phase, user $k$ in cell $l$ sends a zero-mean symbol $s_{l,k}$ with variance $\mathbb{E}\{|s_{l,k}|^2\} = 1$. The received signal at BS $l$ is then

$$\mathbf{y}_l = \sum_{l'=1}^{L}\sum_{k=1}^{K}\sqrt{p_{l',k}}\,\mathbf{h}^{l}_{l',k}s_{l',k} + \mathbf{n}_l, \tag{17}$$

where $p_{l',k}$ denotes the transmit power of user $k$ in cell $l'$ and $\mathbf{n}_l \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_M)$ is the additive noise. Based on the signals in (17), the BSs decode the symbols with the two-layer decoding technique illustrated in Fig. 1. The general idea of a two-layer decoding system is that each BS decodes the desired signals from its coverage area in the first layer. A central station then collects the decoded signals of all users that use the same pilot and jointly processes these signals in the second layer to suppress inter-cell interference using LSFD vectors. In detail, an estimate of the symbol from user $k$ in cell $l$ is obtained by local linear decoding in the first layer as

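$$\tilde{s}_{l,k} = \mathbf{v}_{l,k}^H\mathbf{y}_l = \sum_{l'=1}^{L}\sum_{k'=1}^{K}\sqrt{p_{l',k'}}\,\mathbf{v}_{l,k}^H\mathbf{h}^{l}_{l',k'}s_{l',k'} + \mathbf{v}_{l,k}^H\mathbf{n}_l, \tag{18}$$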

where $\mathbf{v}_{l,k} \in \mathbb{C}^M$ is the linear decoding vector. The symbol estimate generally contains interference and, in Massive MIMO, the pilot contamination from all the users with the same pilot sequence is particularly large. To mitigate the pilot contamination, the symbol estimates of all the contaminating users are collected in the vector

$$\tilde{\mathbf{s}}_k \triangleq [\tilde{s}_{1,k}, \tilde{s}_{2,k}, \ldots, \tilde{s}_{L,k}]^T \in \mathbb{C}^L. \tag{19}$$

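After the local decoding, a second layer of centralized decoding is performed on this vector using the LSFD vector $\mathbf{a}_{l,k} = [a^{1}_{l,k}, \ldots, a^{L}_{l,k}]^T \in \mathbb{C}^L$, where $a^{l'}_{l,k}$ is the LSFD weight applied to $\tilde{s}_{l',k}$. The final estimate of the data symbol from user $k$ in cell $l$ is then given by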

$$\hat{s}_{l,k} = \mathbf{a}_{l,k}^H\tilde{\mathbf{s}}_k = \sum_{l'=1}^{L}(a^{l'}_{l,k})^*\tilde{s}_{l',k}. \tag{20}$$
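The two decoding layers in (18)-(20) can be sketched as below. The received signals, decoding vectors, and LSFD weights are random placeholders (assumptions); the point is the data flow: each BS computes one scalar per pilot index locally, and the central station only needs the $L$ scalars in (19). Note that `np.vdot` conjugates its first argument, which matches the $(a^{l'}_{l,k})^*$ in (20).

```python
import numpy as np

rng = np.random.default_rng(5)
L, M = 3, 32                         # hypothetical number of cells and antennas

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Placeholders: y[l] plays the role of y_l in (17) and v[l] of v_{l,k} for a fixed k
# (in practice v_{l,k} would come from MRC, zero-forcing, or another local decoder).
y = crandn(L, M)
v = crandn(L, M)

# First layer (18): BS l computes s_tilde_{l,k} = v_{l,k}^H y_l using only local data
s_tilde = np.array([np.vdot(v[l], y[l]) for l in range(L)])   # the vector in (19)

# Second layer (20): the central station combines the L scalars with a_{l,k}
a = crandn(L)                        # hypothetical LSFD weights a^{l'}_{l,k}
s_hat = np.vdot(a, s_tilde)          # np.vdot conjugates a: sum_l' (a^{l'})^* s_tilde_{l'}
assert np.isclose(s_hat, sum(np.conj(a[lp]) * s_tilde[lp] for lp in range(L)))
```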

In the next section, we use the decoded signals together with the asymptotic channel properties [17, Section 2.5] to derive a closed-form expression for the uplink SE.

This section first derives a general SE expression for each user $k$ in each cell $l$ and then a closed-form expression when using MRC. These expressions are then used to obtain the LSFD vectors that maximize the SE. The use-and-then-forget capacity bounding technique [6, Chapter 2.3.4], [8, Section 4.3] allows us to compute a lower bound on the uplink ergodic capacity (i.e., an achievable SE). We first rewrite (20) as

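$$\begin{aligned}
\hat{s}_{l,k} = {} & \underbrace{\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\mathbb{E}\{\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l,k}\}\sqrt{p_{l,k}}\,s_{l,k}}_{\mathrm{DS}_{l,k}} + \underbrace{\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\sum_{\substack{l''=1\\ l''\neq l}}^{L}\mathbb{E}\{\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l'',k}\}\sqrt{p_{l'',k}}\,s_{l'',k}}_{\mathrm{PC}_{l,k}} \\
& + \underbrace{\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\sum_{l''=1}^{L}\left(\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l'',k} - \mathbb{E}\{\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l'',k}\}\right)\sqrt{p_{l'',k}}\,s_{l'',k}}_{\mathrm{BU}_{l,k}} \\
& + \sum_{l''=1}^{L}\sum_{\substack{k'=1\\ k'\neq k}}^{K}\underbrace{\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\sqrt{p_{l'',k'}}\,\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l'',k'}s_{l'',k'}}_{\mathrm{NI}_{l'',k'}} + \underbrace{\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\mathbf{v}_{l',k}^H\mathbf{n}_{l'}}_{\mathrm{AN}_{l,k}},
\end{aligned} \tag{21}$$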

then, by treating the first term of (21) as the desired signal from user $k$ in cell $l$ and the remaining terms as effective Gaussian noise, we obtain the lower bound on the uplink ergodic capacity in Lemma 3.

###### Lemma 3.

A lower bound on the uplink ergodic capacity is

$$R_{l,k} = \max_{\{a^{l'}_{l,k}\}}\left(1 - \frac{\tau_p}{\tau_c}\right)\log_2\left(1 + \mathrm{SINR}_{l,k}\right), \tag{22}$$

where the effective SINR, denoted by $\mathrm{SINR}_{l,k}$, is

$$\mathrm{SINR}_{l,k} = \frac{\mathbb{E}\{|\mathrm{DS}_{l,k}|^2\}}{D_{l,k}}, \tag{23}$$

where $D_{l,k}$ is given by

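$$D_{l,k} = \mathbb{E}\{|\mathrm{PC}_{l,k}|^2\} + \mathbb{E}\{|\mathrm{BU}_{l,k}|^2\} + \sum_{l'=1}^{L}\sum_{\substack{k'=1\\ k'\neq k}}^{K}\mathbb{E}\{|\mathrm{NI}_{l',k'}|^2\} + \mathbb{E}\{|\mathrm{AN}_{l,k}|^2\}. \tag{24}$$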

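Here, $\mathrm{DS}_{l,k}$, $\mathrm{PC}_{l,k}$, $\mathrm{BU}_{l,k}$, $\mathrm{NI}_{l',k'}$, and $\mathrm{AN}_{l,k}$ stand for the desired signal, the pilot contamination, the beamforming gain uncertainty, the non-coherent interference, and the additive noise, respectively, whose second moments are defined as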

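$$\mathbb{E}\{|\mathrm{DS}_{l,k}|^2\} \triangleq p_{l,k}\left|\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\mathbb{E}\{\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l,k}\}\right|^2, \tag{25}$$

$$\mathbb{E}\{|\mathrm{PC}_{l,k}|^2\} \triangleq \sum_{\substack{l''=1\\ l''\neq l}}^{L}p_{l'',k}\left|\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\mathbb{E}\{\mathbf{v}_{l',k}^H\mathbf{h}^{l'}_{l'',k}\}\right|^2, \tag{26}$$

$$\mathbb{E}\{|\mathrm{BU}_{l,k}|^2\} \triangleq \sum_{l'=1}^{L}p_{l',k}\mathbb{E}\left\{\left|\sum_{l''=1}^{L}(a^{l''}_{l,k})^*\left(\mathbf{v}_{l'',k}^H\mathbf{h}^{l''}_{l',k} - \mathbb{E}\{\mathbf{v}_{l'',k}^H\mathbf{h}^{l''}_{l',k}\}\right)\right|^2\right\}, \tag{27}$$

$$\mathbb{E}\{|\mathrm{NI}_{l',k'}|^2\} \triangleq p_{l',k'}\mathbb{E}\left\{\left|\sum_{l''=1}^{L}(a^{l''}_{l,k})^*\mathbf{v}_{l'',k}^H\mathbf{h}^{l''}_{l',k'}\right|^2\right\}, \tag{28}$$

$$\mathbb{E}\{|\mathrm{AN}_{l,k}|^2\} \triangleq \mathbb{E}\left\{\left|\sum_{l'=1}^{L}(a^{l'}_{l,k})^*\mathbf{v}_{l',k}^H\mathbf{n}_{l'}\right|^2\right\}. \tag{29}$$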

Note that the lower bound on the uplink ergodic capacity in Lemma 3 can be applied to any linear decoding method and any LSFD design.

Maximizing the SE of user $k$ in cell $l$ is equivalent to selecting the LSFD vector that maximizes a Rayleigh quotient, as shown in the proof of the following theorem. This is the first main contribution of this paper.

###### Theorem 1.

For a given set of pilot and data power coefficients, the SE of user $k$ in cell $l$ is

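$$R_{l,k} = \left(1 - \frac{\tau_p}{\tau_c}\right)\log_2\left(1 + p_{l,k}\mathbf{b}_{l,k}^H\left(\sum_{i=1}^{4}\mathbf{C}^{(i)}_{l,k}\right)^{-1}\mathbf{b}_{l,k}\right), \tag{30}$$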

where the matrices $\mathbf{C}^{(i)}_{l,k} \in \mathbb{C}^{L \times L}$, for $i = 1, \ldots, 4$, are defined as

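$$\mathbf{C}^{(1)}_{l,k} \triangleq \sum_{\substack{l'=1\\ l'\neq l}}^{L}p_{l',k}\mathbf{b}_{l',k}\mathbf{b}_{l',k}^H, \tag{31}$$

$$\mathbf{C}^{(2)}_{l,k} \triangleq \sum_{l'=1}^{L}p_{l',k}\mathbb{E}\{\tilde{\mathbf{b}}_{l',k}\tilde{\mathbf{b}}_{l',k}^H\}, \tag{32}$$

$$\mathbf{C}^{(3)}_{l,k} \triangleq \mathrm{diag}\left(\sum_{l'=1}^{L}\sum_{\substack{k'=1\\ k'\neq k}}^{K}p_{l',k'}\mathbb{E}\{|\mathbf{v}_{1,k}^H\mathbf{h}^{1}_{l',k'}|^2\}, \ldots, \sum_{l'=1}^{L}\sum_{\substack{k'=1\\ k'\neq k}}^{K}p_{l',k'}\mathbb{E}\{|\mathbf{v}_{L,k}^H\mathbf{h}^{L}_{l',k'}|^2\}\right), \tag{33}$$

$$\mathbf{C}^{(4)}_{l,k} \triangleq \sigma^2\,\mathrm{diag}\left(\mathbb{E}\{\|\mathbf{v}_{1,k}\|^2\}, \ldots, \mathbb{E}\{\|\mathbf{v}_{L,k}\|^2\}\right), \tag{34}$$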

and the vectors $\mathbf{b}_{l',k}, \tilde{\mathbf{b}}_{l',k} \in \mathbb{C}^L$ are defined as

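$$\mathbf{b}_{l',k} \triangleq \left[\mathbb{E}\{\mathbf{v}_{1,k}^H\mathbf{h}^{1}_{l',k}\}, \ldots, \mathbb{E}\{\mathbf{v}_{L,k}^H\mathbf{h}^{L}_{l',k}\}\right]^T, \tag{35}$$

$$\tilde{\mathbf{b}}_{l',k} \triangleq \left[\mathbf{v}_{1,k}^H\mathbf{h}^{1}_{l',k}, \ldots, \mathbf{v}_{L,k}^H\mathbf{h}^{L}_{l',k}\right]^T - \mathbf{b}_{l',k}. \tag{36}$$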

This SE is attained by the LSFD vector

$$\mathbf{a}_{l,k} = \left(\sum_{i=1}^{4}\mathbf{C}^{(i)}_{l,k}\right)^{-1}\mathbf{b}_{l,k}. \tag{37}$$
###### Proof.

The proof is available in Appendix B. ∎

We stress that the LSFD vector in (37) is designed to maximize the SE in (30) for every user in the network for a given data and pilot power and a given first-layer decoder. Note that Theorem 1 can be applied to practical correlated Rayleigh fading channels with either MMSE or EW-MMSE estimation and any conceivable choice of first-layer decoder. This stands in contrast to the previous works [18, 14] that only considered uncorrelated Rayleigh fading channels (which are unlikely to occur in practice) and particular linear combining methods selected to obtain closed-form expressions. Theorem 1 explicitly reveals the influence that mutual interference and noise have on the SE when utilizing the optimal LSFD vector given in (37): $\mathbf{C}^{(1)}_{l,k}$ determines the amount of remaining pilot contamination from the users using the same pilot sequence as user $k$ in cell $l$. The beamforming gain uncertainty is represented by $\mathbf{C}^{(2)}_{l,k}$, while $\mathbf{C}^{(3)}_{l,k}$ is the non-coherent mutual interference from the remaining users and $\mathbf{C}^{(4)}_{l,k}$ represents the additive noise.
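The Rayleigh-quotient step behind Theorem 1 can be checked numerically. In the sketch below, `C` stands in for $\sum_{i=1}^4 \mathbf{C}^{(i)}_{l,k}$ and `b` for $\mathbf{b}_{l,k}$; both are random placeholders, assuming only that `C` is Hermitian positive definite. The maximizer is taken as $\mathbf{a}_{l,k} = (\sum_i \mathbf{C}^{(i)}_{l,k})^{-1}\mathbf{b}_{l,k}$ (any nonzero scaling gives the same SINR), and its SINR equals the expression inside the logarithm of (30).

```python
import numpy as np

rng = np.random.default_rng(6)
L = 4
p = 0.7                                                      # hypothetical p_{l,k}
b = rng.standard_normal(L) + 1j * rng.standard_normal(L)     # placeholder for b_{l,k}
G = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
C = G @ G.conj().T + np.eye(L)       # placeholder Hermitian PD matrix for sum_i C^(i)

def sinr(a):
    """Generalized Rayleigh quotient p |a^H b|^2 / (a^H C a)."""
    return p * abs(np.vdot(a, b)) ** 2 / np.real(np.vdot(a, C @ a))

a_opt = np.linalg.solve(C, b)                                # C^{-1} b
sinr_max = p * np.real(np.vdot(b, np.linalg.solve(C, b)))    # p b^H C^{-1} b
print(np.isclose(sinr(a_opt), sinr_max))                     # True

# No other LSFD vector does better (Cauchy-Schwarz with the C-weighted inner product)
trials = rng.standard_normal((1000, L)) + 1j * rng.standard_normal((1000, L))
print(max(sinr(x) for x in trials) <= sinr_max * (1 + 1e-9))   # True
```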

The following theorem states a closed-form expression of the SE for the case of MRC, i.e., $\mathbf{v}_{l,k} = \hat{\mathbf{h}}^{l}_{l,k}$. This is the second main contribution of this paper.

###### Theorem 2.

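When MRC is used, the SE in (22) of user $k$ in cell $l$ is given by

$$R_{l,k} = \left(1 - \frac{\tau_p}{\tau_c}\right)\log_2\left(1 + \mathrm{SINR}_{l,k}\right), \tag{38}$$

where the SINR value is given by

$$\mathrm{SINR}_{l,k} = \frac{p_{l,k}\left|\sum_{l'=1}^{L}(a^{l'}_{l,k})^*b^{l'}_{l,k}\right|^2}{\sum_{\substack{l'=1\\ l'\neq l}}^{L}p_{l',k}\left|\sum_{l''=1}^{L}(a^{l''}_{l,k})^*b^{l''}_{l',k}\right|^2 + \sum_{l'=1}^{L}\sum_{k'=1}^{K}\sum_{l''=1}^{L}p_{l',k'}|a^{l''}_{l,k}|^2c^{l',k'}_{l'',k} + \sum_{l'=1}^{L}|a^{l'}_{l,k}|^2d_{l',k}}. \tag{39}$$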

The values $b^{l''}_{l',k}$, $c^{l',k'}_{l'',k}$, and $d_{l',k}$ differ depending on the channel estimation technique. MMSE estimation results in

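$$b^{l''}_{l',k} = \sqrt{\tau_p\hat{p}_{l'',k}\hat{p}_{l',k}}\,\mathrm{tr}\left(\mathbf{R}^{l''}_{l',k}\boldsymbol{\Psi}^{-1}_{l'',k}\mathbf{R}^{l''}_{l'',k}\right), \tag{40}$$

$$c^{l',k'}_{l'',k} = \hat{p}_{l'',k}\,\mathrm{tr}\left(\mathbf{R}^{l''}_{l'',k}\boldsymbol{\Psi}^{-1}_{l'',k}\mathbf{R}^{l''}_{l'',k}\mathbf{R}^{l''}_{l',k'}\right), \tag{41}$$

$$d_{l',k} = \sigma^2\hat{p}_{l',k}\,\mathrm{tr}\left(\boldsymbol{\Psi}^{-1}_{l',k}\mathbf{R}^{l'}_{l',k}\mathbf{R}^{l'}_{l',k}\right), \tag{42}$$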

whereas EW-MMSE results in

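$$b^{l''}_{l',k} = \sqrt{\tau_p}\,\varrho^{l''}_{l'',k}\varrho^{l''}_{l',k}\,\mathrm{tr}(\boldsymbol{\Psi}_{l'',k}), \tag{43}$$

$$c^{l',k'}_{l'',k} = (\varrho^{l''}_{l'',k})^2\,\mathrm{tr}\left(\mathbf{R}^{l''}_{l',k'}\boldsymbol{\Psi}_{l'',k}\right), \tag{44}$$

$$d_{l',k} = (\varrho^{l'}_{l',k})^2\sigma^2\,\mathrm{tr}(\boldsymbol{\Psi}_{l',k}). \tag{45}$$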
###### Proof.

The proofs consist of computing the moments of complex Gaussian distributions. They are available in Appendix C and Appendix D for MMSE and EW-MMSE estimation, respectively. ∎

Theorem 2 describes the exact impact of the spatial correlation of the channel on the system performance through the coefficients $b^{l''}_{l',k}$, $c^{l',k'}_{l'',k}$, and $d_{l',k}$. The numerator of (39) grows as the square of the number of antennas, $M^2$, since the trace in (40) is the sum of $M$ terms. This gain comes from the coherent combination of the signals from the $M$ antennas. Theorem 2 also shows that the pilot contamination combines coherently: its variance, which is the first term in the denominator of (39) and contains $b^{l''}_{l',k}$, also grows as $M^2$. The other terms in the denominator represent the impact of non-coherent interference and Gaussian noise, respectively, and only grow as $M$. Since the interference terms contain products of correlation matrices of different users, the interference is smaller between users that have very different spatial correlation characteristics [17].
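The $M^2$ versus $M$ scaling can be illustrated by evaluating the MMSE coefficients for two antenna numbers. This sketch assumes a hypothetical exponential correlation model and two users sharing a pilot, and it uses the form $b^{l''}_{l',k} = \sqrt{\tau_p\hat{p}_{l'',k}\hat{p}_{l',k}}\,\mathrm{tr}(\mathbf{R}^{l''}_{l',k}\boldsymbol{\Psi}^{-1}_{l'',k}\mathbf{R}^{l''}_{l'',k})$ for (40), which is consistent with the EW-MMSE expression (43). Doubling $M$ should roughly quadruple the coherent pilot-contamination term $(b^{l''}_{l',k})^2$ and roughly double $c^{l',k'}_{l'',k}$ and $d_{l',k}$; the ratios are not exact because of edge effects in the correlation matrices.

```python
import numpy as np

def exp_corr(M, r, beta):
    """Hypothetical exponential correlation model [R]_{m,n} = beta * r^{|m-n|}."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

def mrc_mmse_coeffs(M, tau_p=2, sigma2=1.0, p_hat=(0.5, 0.5)):
    """b, c, d of (40)-(42) at one BS: desired user (R0) and a pilot-sharing user (R1)."""
    R0, R1 = exp_corr(M, 0.5, 1.0), exp_corr(M, 0.3, 0.2)
    Psi = tau_p * (p_hat[0] * R0 + p_hat[1] * R1) + sigma2 * np.eye(M)      # (8)
    A = np.linalg.solve(Psi, R0)                                            # Psi^{-1} R0
    b_pc = np.sqrt(tau_p * p_hat[0] * p_hat[1]) * np.trace(R1 @ A)          # (40)
    c_pc = p_hat[0] * np.trace(R0 @ A @ R1)                                 # (41)
    d = sigma2 * p_hat[0] * np.trace(A @ R0)                                # (42)
    return b_pc, c_pc, d

b1, c1, d1 = mrc_mmse_coeffs(256)
b2, c2, d2 = mrc_mmse_coeffs(512)
print((b2 / b1) ** 2)            # roughly 4: pilot contamination grows as M^2
print(c2 / c1, d2 / d1)          # roughly 2: non-coherent terms grow as M
```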

The following corollary gives the optimal LSFD vector that maximizes the SE of every user in the network for a given set of pilot and data powers, which is expected to work well when each BS is equipped with a practical number of antennas.

###### Corollary 2.

For a given set of data and pilot powers, by using MRC and LSFD, the SE in Theorem 2 is given in closed form as

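$$R_{l,k} = \left(1 - \frac{\tau_p}{\tau_c}\right)\log_2\left(1 + p_{l,k}\mathbf{b}_{l,k}^H\mathbf{C}_{l,k}^{-1}\mathbf{b}_{l,k}\right), \tag{46}$$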

where $\mathbf{C}_{l,k} \in \mathbb{C}^{L \times L}$ and $\mathbf{b}_{l',k} \in \mathbb{C}^L$ are defined as

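$$\mathbf{C}_{l,k} \triangleq \sum_{\substack{l'=1\\ l'\neq l}}^{L}p_{l',k}\mathbf{b}_{l',k}\mathbf{b}_{l',k}^H + \mathrm{diag}\left(\sum_{l'=1}^{L}\sum_{k'=1}^{K}p_{l',k'}c^{l',k'}_{1,k} + d_{1,k}, \ldots, \sum_{l'=1}^{L}\sum_{k'=1}^{K}p_{l',k'}c^{l',k'}_{L,k} + d_{L,k}\right), \tag{47}$$

$$\mathbf{b}_{l',k} \triangleq [b^{1}_{l',k}, \ldots, b^{L}_{l',k}]^T. \tag{48}$$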

The SE in (46) is obtained by using the LSFD vector defined as

$$\mathbf{a}_{l,k} = \mathbf{C}_{l,k}^{-1}\mathbf{b}_{l,k}. \tag{49}$$

Even though Corollary 2 is a special case of Theorem 1 when MRC is used, its contributions are twofold. First, the LSFD vector is computed in closed form and is independent of the small-scale fading, so it is easy to compute and store. Second, this LSFD vector generalizes the vector given in [14] to the larger class of correlated Rayleigh fading channels.
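Corollary 2 can be turned into a short routine. The sketch below assumes real-valued correlation matrices (so all coefficients are real) and uses the MMSE expressions (40)-(42), with $b^{l''}_{l',k} = \sqrt{\tau_p\hat{p}_{l'',k}\hat{p}_{l',k}}\,\mathrm{tr}(\mathbf{R}^{l''}_{l',k}\boldsymbol{\Psi}^{-1}_{l'',k}\mathbf{R}^{l''}_{l'',k})$ assumed for (40). The names `lsfd_mrc_mmse`, `exp_corr`, and `single_layer_se`, and all parameter values, are hypothetical. The routine builds $\mathbf{C}_{l,k}$ from (47), the LSFD vector (49), and the SE (46), and compares the result with single-layer decoding ($\mathbf{a}_{l,k}$ equal to the $l$-th unit vector), which can never exceed the LSFD value because (49) maximizes the Rayleigh quotient.

```python
import numpy as np

def exp_corr(M, r, beta):
    """Hypothetical exponential correlation model [R]_{m,n} = beta * r^{|m-n|}."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

def lsfd_mrc_mmse(R, p, p_hat, tau_p, tau_c, sigma2):
    """SE (46) and LSFD vectors (49) for MRC with MMSE estimation.

    R[j][lp][k] is R^j_{lp,k} (BS j, user k in cell lp), assumed real-valued.
    p and p_hat have shape (L, K). Returns SE[l, k], a[l, k, :], and b, c, d.
    """
    L, K = p.shape
    M = R[0][0][0].shape[0]
    b = np.zeros((L, L, K))        # b[j, lp, k]     = b^j_{lp,k}        (40)
    c = np.zeros((L, K, L, K))     # c[lp, kp, j, k] = c^{lp,kp}_{j,k}   (41)
    d = np.zeros((L, K))           # d[j, k]         = d_{j,k}           (42)
    for j in range(L):
        for k in range(K):
            Psi = sum(tau_p * p_hat[lp, k] * R[j][lp][k] for lp in range(L)) + sigma2 * np.eye(M)
            A = np.linalg.solve(Psi, R[j][j][k])           # Psi^{-1} R^j_{j,k}
            for lp in range(L):
                b[j, lp, k] = np.sqrt(tau_p * p_hat[j, k] * p_hat[lp, k]) * np.trace(R[j][lp][k] @ A)
                for kp in range(K):
                    c[lp, kp, j, k] = p_hat[j, k] * np.trace(R[j][j][k] @ A @ R[j][lp][kp])
            d[j, k] = sigma2 * p_hat[j, k] * np.trace(A @ R[j][j][k])
    SE = np.zeros((L, K))
    a = np.zeros((L, K, L))
    for l in range(L):
        for k in range(K):
            # (47): coherent pilot contamination from other cells plus a diagonal term
            # with non-coherent interference, beamforming gain uncertainty, and noise
            C = sum(p[lp, k] * np.outer(b[:, lp, k], b[:, lp, k]) for lp in range(L) if lp != l)
            C = C + np.diag([np.sum(p * c[:, :, j, k]) + d[j, k] for j in range(L)])
            a[l, k] = np.linalg.solve(C, b[:, l, k])            # (49)
            sinr = p[l, k] * (b[:, l, k] @ a[l, k])             # p b^H C^{-1} b
            SE[l, k] = (1 - tau_p / tau_c) * np.log2(1 + sinr)  # (46)
    return SE, a, b, c, d

# Hypothetical example: 2 cells, 2 users per cell, 64 antennas
L, K, M = 2, 2, 64
tau_p, tau_c, sigma2 = K, 200, 1.0
rng = np.random.default_rng(7)
R = [[[exp_corr(M, rng.uniform(0.2, 0.8), 1.0 if j == lp else 0.2) for k in range(K)]
      for lp in range(L)] for j in range(L)]
p = np.ones((L, K))
p_hat = np.ones((L, K))
SE, a, b, c, d = lsfd_mrc_mmse(R, p, p_hat, tau_p, tau_c, sigma2)

def single_layer_se(l, k):
    """Same bound with a = e_l: only BS l decodes user k in cell l (no second layer)."""
    pc = sum(p[lp, k] * b[l, lp, k] ** 2 for lp in range(L) if lp != l)
    sinr = p[l, k] * b[l, l, k] ** 2 / (pc + np.sum(p * c[:, :, l, k]) + d[l, k])
    return (1 - tau_p / tau_c) * np.log2(1 + sinr)

print(np.round(SE, 3))
print(np.round([[single_layer_se(l, k) for k in range(K)] for l in range(L)], 3))
```

Since `a` depends only on the statistics in `R`, `p`, and `p_hat`, it can be computed once and reused over many coherence blocks, which is the practical advantage noted above.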

## IV Data Power Control and LSFD Design for Sum SE Optimization

This section investigates how to choose the data powers (power control) and the LSFD vectors to maximize the sum SE. The sum SE maximization problem for a multi-cell Massive MIMO system is first formulated based on the results from the previous sections. Next, an iterative algorithm based on solving a series of convex optimization problems is proposed to efficiently find a stationary point.

### IV-A Problem Formulation

We consider sum SE maximization:

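$$\begin{aligned}
\underset{\{p_{l,k}\geq 0\},\{\mathbf{a}_{l,k}\}}{\text{maximize}} \quad & \sum_{l=1}^{L}\sum_{k=1}^{K}R_{l,k} \\
\text{subject to} \quad & p_{l,k} \leq P_{\max,l,k}, \quad \forall l,k.
\end{aligned} \tag{50}$$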

Using the rate (38) in (50), and removing the constant pre-log factor, we obtain the equivalent formulation

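$$\begin{aligned}
\underset{\{p_{l,k}\geq 0\},\{\mathbf{a}_{l,k}\}}{\text{maximize}} \quad & \sum_{l=1}^{L}\sum_{k=1}^{K}\log_2\left(1 + \mathrm{SINR}_{l,k}\right) \\
\text{subject to} \quad & p_{l,k} \leq P_{\max,l,k}, \quad \forall l,k.
\end{aligned} \tag{51}$$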

Using the same methodology as in [19], this problem can be shown to be non-convex and NP-hard, even if the fine details differ since that paper considers small-scale multi-user MIMO systems with perfect channel knowledge. Therefore, the global optimum is difficult to find in general. Nevertheless, solving the ergodic sum SE maximization (51) for a Massive MIMO system is more practical than maximizing the instantaneous SEs of a small-scale MIMO network for a given realization of the small-scale fading [20, 21]: the sum SE maximization in (51) only depends on the large-scale fading coefficients, which simplifies matters and allows the solution to be used for a long period of time. Another key difference from prior work is that we jointly optimize the data powers and LSFD vectors.

Instead of seeking the global optimum of (51), which requires exponential computational complexity, we use the weighted MMSE method [22, 23] to obtain a stationary point of (51) in polynomial time. This is a standard method to break down a sum SE problem into subproblems that can be solved sequentially. We stress that the resulting subproblems and algorithms are different for every problem that the method is applied to; thus, our solution is a novel contribution. To this end, we first formulate the weighted MMSE problem corresponding to (51), as shown in Theorem 3.

###### Theorem 3.

The optimization problem

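$$\begin{aligned}
\underset{\{p_{l,k}\geq 0\},\{\mathbf{a}_{l,k}\},\{w_{l,k}\geq 0\},\{u_{l,k}\}}{\text{minimize}} \quad & \sum_{l=1}^{L}\sum_{k=1}^{K}w_{l,k}e_{l,k} - \ln(w_{l,k}) \\
\text{subject to} \quad & p_{l,k} \leq P_{\max,l,k}, \quad \forall l,k,
\end{aligned} \tag{52}$$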

where $e_{l,k}$ is defined as

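$$e_{l,k} \triangleq |u_{l,k}|^2\left(\sum_{l'=1}^{L}p_{l',k}\left|\sum_{l''=1}^{L}(a^{l''}_{l,k})^*b^{l''}_{l',k}\right|^2 + \sum_{l'=1}^{L}\sum_{k'=1}^{K}\sum_{l''=1}^{L}p_{l',k'}|a^{l''}_{l,k}|^2c^{l',k'}_{l'',k} + \sum_{l'=1}^{L}|a^{l'}_{l,k}|^2d_{l',k}\right) - 2\sqrt{p_{l,k}}\,\mathrm{Re}\left(u_{l,k}\sum_{l'=1}^{L}(a^{l'}_{l,k})^*b^{l'}_{l,k}\right) + 1, \tag{53}$$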

is equivalent to the sum SE optimization problem (51) in the sense that (51) and (52) have the same globally optimal power solution $\{p_{l,k}\}$ and the same LSFD elements $\{a^{l'}_{l,k}\}$.

###### Proof.

The proof is available in Appendix E. ∎
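The key identity behind Theorem 3 can be checked for one user: for fixed powers and LSFD weights, the optimal receiver $u_{l,k}$ makes $e_{l,k}$ equal to $1/(1+\mathrm{SINR}_{l,k})$, and the optimal weight $w_{l,k} = 1/e_{l,k}$ turns $w_{l,k}e_{l,k} - \ln(w_{l,k})$ into $1 - \ln(1+\mathrm{SINR}_{l,k})$. This is the standard weighted MMSE argument; the coefficients below are random placeholders rather than values from (40)-(45), and the closed-form $u_{l,k}$ and $w_{l,k}$ are our own derivation from (53), not necessarily the update rules of the algorithm in this paper.

```python
import numpy as np

rng = np.random.default_rng(8)
L, K = 3, 2
l, k = 0, 1                                               # user of interest
p = rng.uniform(0.1, 1.0, (L, K))                         # data powers p_{l',k'}
a = rng.standard_normal(L) + 1j * rng.standard_normal(L)  # LSFD vector a_{l,k}
b = rng.uniform(0.5, 2.0, (L, L))        # b[lpp, lp] = b^{lpp}_{lp,k} (placeholders)
c = rng.uniform(0.1, 1.0, (L, K, L))     # c[lp, kp, lpp] = c^{lp,kp}_{lpp,k}
d = rng.uniform(0.1, 1.0, L)             # d[lp] = d_{lp,k}

# g = sqrt(p_{l,k}) * sum_l' (a^{l'})^* b^{l'}_{l,k};  T = bracket in (53), incl. l' = l
g = np.sqrt(p[l, k]) * np.vdot(a, b[:, l])
T = (sum(p[lp, k] * abs(np.vdot(a, b[:, lp])) ** 2 for lp in range(L))
     + sum(p[lp, kp] * abs(a[lpp]) ** 2 * c[lp, kp, lpp]
           for lp in range(L) for kp in range(K) for lpp in range(L))
     + np.sum(np.abs(a) ** 2 * d))

def mse(u):
    """e_{l,k} of (53) for the scalar receiver u = u_{l,k}."""
    return abs(u) ** 2 * T - 2 * np.real(u * g) + 1

sinr = abs(g) ** 2 / (T - abs(g) ** 2)   # desired power over interference plus noise
u_opt = np.conj(g) / T                   # minimizer of mse(u) over complex u
w_opt = 1 / mse(u_opt)                   # minimizer of w * e - ln(w) over w > 0
print(np.isclose(w_opt, 1 + sinr))                                            # True
print(np.isclose(w_opt * mse(u_opt) - np.log(w_opt), 1 - np.log(1 + sinr)))   # True
```

Because the minimized objective equals $1 - \ln(1+\mathrm{SINR}_{l,k})$ for every user, minimizing the sum in (52) over all variables is equivalent to maximizing the sum of $\ln(1+\mathrm{SINR}_{l,k})$, which is why block-coordinate updates of $u_{l,k}$, $w_{l,k}$, the powers, and the LSFD vectors are a natural way to reach a stationary point.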

### IV-B Iterative Algorithm

We now find a stationary point of (52) by decomposing it into a sequence of subproblems, each having a closed-form solution. By changing variables as $\rho_{l,k} = \sqrt{p_{l,k}}$, the optimization problem (52) is equivalent to

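$$\begin{aligned}
\underset{\{\rho_{l,k}\geq 0\},\{\mathbf{a}_{l,k}\},\{w_{l,k}\geq 0\},\{u_{l,k}\}}{\text{minimize}} \quad & \sum_{l=1}^{L}\sum_{k=1}^{K}w_{l,k}e_{l,k} - \ln(w_{l,k}) \\
\text{subject to} \quad & \rho_{l,k}^2 \leq P_{\max,l,k}, \quad \forall l,k,
\end{aligned} \tag{54}$$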

where $e_{l,k}$ is

 el,k=|ul,k|2⎛⎜⎝L∑