I Introduction
Massive MIMO is considered a key technology for 5G networks due to its great improvements in both spectral efficiency (SE) and energy efficiency (EE) over legacy networks [1]. Moreover, resource allocation problems in Massive MIMO are reported to have much lower complexity than in small-scale systems, owing to the fact that the ergodic SE expressions depend only on the large-scale fading coefficients, thanks to the so-called channel hardening property [2]. However, so far, most optimization problems in the Massive MIMO literature have been formulated and solved in a centralized fashion [2, 3]. This requires the network to gather full statistical channel state information (CSI) from all base stations (BSs) at one location in order to centrally allocate resources in every cell and then inform each BS of the decisions. This raises practical questions, especially for dense networks with many BSs and users, about backhaul signaling, scalability, and delays [4]. A classic approach to deal with such issues is distributed optimization [5], which transforms the centralized problem into distributed implementations where every BS simultaneously optimizes its local resources based on local information, and only parameters that describe the intercell interference are exchanged between BSs to iteratively find the globally optimal solution.
A few recent works have applied distributed implementation concepts to Massive MIMO, e.g., [6, 7, 8]. For the uplink (UL) transmission, a distributed max-min fairness problem for a Massive MIMO system with two layers of decoding is studied in [6], based on the existence of an effective interference function obeying the conditions in [9, 10]. By formulating a non-cooperative game, [7] proposed a distributed EE problem that has a Nash equilibrium. For the downlink (DL) transmission, by utilizing the UL-DL duality, [8] proposed a distributed framework for a total transmit power minimization problem with quality-of-service (QoS) constraints that also seeks the optimal precoding vectors, which are functions of the small-scale fading realizations. Even though the optimal precoding vectors provide certain gains over heuristic precoding such as zero-forcing (ZF), the algorithm in [8] suffers from the fact that the small-scale fading realizations vary rapidly over both time and frequency. Furthermore, the previous algorithms assumed that the BSs have full access to the channel statistics of the neighboring cells [6] or all the other cells [7, 8], which may require heavy backhaul signaling since users move, new users arrive, and current users disconnect. To the best of our knowledge, no previous work has explicitly investigated what information should be exchanged between BSs in Massive MIMO to solve power minimization problems.

In this paper, we consider the DL transmission of Massive MIMO systems with maximum ratio (MR) or ZF precoding. Each user has a QoS requirement, in terms of an achievable SE, and we study the total transmit power minimization problem. We compare a centralized solution algorithm with two distributed algorithms to answer the following fundamental questions: i) Can a distributed implementation achieve the optimal solution of the centralized problem? ii) Which subset of parameters should be shared between the BSs? iii) How can we limit the complexity and backhaul signaling requirements of a distributed implementation?
Notations: Upper/lower-case bold letters are used for matrices/vectors. (\cdot)^H denotes the Hermitian transpose, \mathbb{E}\{\cdot\} denotes the expectation of a random variable, \mathcal{CN}(\cdot,\cdot) stands for the circularly symmetric Gaussian distribution, and \|\cdot\| is the Euclidean norm.

II System Model
We consider a Massive MIMO system comprising L cells, each having a BS equipped with M antennas and serving K single-antenna users. The system operates according to a time division duplex (TDD) protocol. The time-frequency resources are divided into coherence intervals of \tau_c symbols in which the channels are assumed to be static and frequency flat. The channel between user k in cell l and BS j is assumed to be uncorrelated Rayleigh fading,
\mathbf{h}_{lk}^{j} \sim \mathcal{CN}(\mathbf{0}, \beta_{lk}^{j} \mathbf{I}_M), \quad (1)
where \beta_{lk}^{j} \geq 0 is the large-scale fading coefficient.
During the UL channel estimation phase, each coherence interval dedicates \tau_p symbols to pilot transmission. We assume that the same set of K orthonormal pilot signals is reused in each cell, and user k in cell l allocates the power p_{lk} to its pilot signal. The received training signal at BS j is

(2)
where the additive noise has each element independently distributed as \mathcal{CN}(0, \sigma^2), where \sigma^2 is the noise variance. The channel between user k in cell l and BS j is estimated by correlating the received signal in (2) with the pilot signal of user k, as

(3)
Using minimum mean square error (MMSE) estimation [11, 12], the channel estimate is distributed as

\hat{\mathbf{h}}_{lk}^{j} \sim \mathcal{CN}(\mathbf{0}, \gamma_{lk}^{j} \mathbf{I}_M), \quad (4)

where the variance is

\gamma_{lk}^{j} = \frac{p_{lk} \tau_p (\beta_{lk}^{j})^2}{\tau_p \sum_{l'=1}^{L} p_{l'k} \beta_{l'k}^{j} + \sigma^2}. \quad (5)
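As a sanity check, the variance of an MMSE channel estimate under pilot contamination can be evaluated numerically. The sketch below uses the standard form of this quantity (pilot power p, pilot length \tau_p, large-scale coefficients \beta, noise variance \sigma^2, and equal pilot powers for all pilot-sharing users); the function and argument names are ours, not the paper's.

```python
import numpy as np

def mmse_estimate_variance(beta_desired, beta_pilot_sharing, p_pilot, tau_p, noise_var):
    """Variance of the MMSE channel estimate under pilot contamination.

    beta_desired: large-scale fading coefficient of the desired channel.
    beta_pilot_sharing: coefficients of the pilot-sharing users in other cells.
    All pilot powers are taken equal to p_pilot for simplicity.
    """
    denom = tau_p * p_pilot * (beta_desired + np.sum(beta_pilot_sharing)) + noise_var
    return p_pilot * tau_p * beta_desired**2 / denom
```

As expected, the estimate variance is always smaller than the channel variance, and removing the pilot-sharing interferers increases the estimation quality.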
In this paper, the channel estimates are used to construct linear precoding vectors for the DL data transmission.
II-A Downlink Data Transmission
In the DL data transmission, BS l transmits a Gaussian signal s_{lk} to its user k, with \mathbb{E}\{|s_{lk}|^2\} = 1. The received baseband signal at user k in cell l is
(6) 
where the additive noise is distributed as \mathcal{CN}(0, \sigma^2), \rho_{lk} is the power allocated to user k in cell l for transmission of the data symbol, and \mathbf{w}_{lk} is the corresponding normalized linear precoding vector:

(7)

where \hat{\mathbf{H}}_l is the estimated channel matrix of the K users in cell l and \hat{\mathbf{h}}_{lk} is the kth column of \hat{\mathbf{H}}_l. Using standard techniques, a closed-form lower bound on the DL ergodic capacity with MR or ZF precoding is obtained.
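The construction of the normalized MR and ZF precoders from an estimated channel matrix can be sketched as follows. This is a generic illustration of the two schemes named in the text, under the common convention that ZF uses the right pseudo-inverse of the estimated channels; it is not taken verbatim from (7).

```python
import numpy as np

def precoding_vectors(H_hat, scheme="ZF"):
    """Column-normalized MR or ZF precoders from the estimated channel matrix.

    H_hat: M x K matrix whose k-th column is the channel estimate of user k.
    """
    if scheme == "MR":
        W = H_hat  # maximum ratio: precode along the channel estimate
    elif scheme == "ZF":
        # zero-forcing: right pseudo-inverse of the estimated channels
        W = H_hat @ np.linalg.inv(H_hat.conj().T @ H_hat)
    else:
        raise ValueError("unknown scheme")
    # normalize every precoding vector to unit norm
    return W / np.linalg.norm(W, axis=0, keepdims=True)
```

With ZF, the estimated inter-user interference \hat{\mathbf{h}}_{li}^{H}\mathbf{w}_{lk}, i \neq k, is nulled by construction, which is why the normalization does not destroy the zero-forcing property (it only rescales each column).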
Lemma 1.
[13, Corollary ] In the DL, the closed-form expression for the ergodic SE of user k in cell l is
\mathrm{SE}_{lk} = \left(1 - \frac{\tau_p}{\tau_c}\right) \log_2\left(1 + \mathrm{SINR}_{lk}\right) \ \text{[b/s/Hz]}, \quad (8)
where the effective signal-to-interference-plus-noise ratio (SINR), denoted by \mathrm{SINR}_{lk}, is
(9) 
The two parameters in (9) are specified by the precoding scheme: MR precoding achieves an array gain of M, while ZF precoding achieves an array gain of M - K but cancels part of the interference.
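The structure of such closed-form SE expressions can be illustrated numerically. The sketch below is a generic effective-SINR model in the spirit of (8)-(9), not the paper's exact formula: the array gain G (M for MR, M - K for ZF in this sketch) and the split of the interference into a coherent part (scaled by G, due to pilot-sharing users) and a noncoherent part are assumptions following standard pilot-contamination analyses.

```python
import numpy as np

def effective_sinr(G, rho, gamma, coherent_interf, noncoherent_interf, noise_var):
    """Generic effective SINR: desired signal gets array gain G, and the
    coherent (pilot-contamination) interference is amplified by the same G."""
    return G * rho * gamma / (G * coherent_interf + noncoherent_interf + noise_var)

def ergodic_se(sinr, tau_p, tau_c):
    """Closed-form lower bound on the DL ergodic SE, as in (8):
    pre-log factor for the pilot overhead times log2(1 + SINR)."""
    return (1.0 - tau_p / tau_c) * np.log2(1.0 + sinr)
```

Because these expressions only involve statistical parameters, they can be evaluated once per realization of the user locations rather than per small-scale fading realization.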
As the closed-form expression for the ergodic SE in (8) is independent of the small-scale fading, it can be used to solve resource allocation problems whose solutions remain valid over a long time period [6, 7, 8]. In this paper, we introduce a distributed implementation of the total transmit power minimization problem.
II-B Total Transmit Power Minimization Problem
The total transmit power at BS l is the sum of the powers \rho_{lk} allocated to its K users. Suppose user k in cell l has the QoS requirement \mathrm{SE}_{lk} \geq \xi_{lk}, where \mathrm{SE}_{lk} is given by (8). The total transmit power minimization problem of the L BSs subject to these QoS requirements is formulated as
(10)  
subject to  
where P_{\max,l} is the maximum transmit power of BS l. Converting the SE requirements into the corresponding SINR requirements, i.e., setting \hat{\xi}_{lk} = 2^{\xi_{lk}/(1-\tau_p/\tau_c)} - 1, problem (10) is reformulated as
(11)  
subject to  
Since problem (11) is a linear program [1], its optimal solution can be obtained in polynomial time by an interior-point algorithm or a simplex method, e.g., using CVX [14]. A centralized implementation requires the large-scale fading coefficients of all channels (i.e., \{\beta_{lk}^{j}\}), the variances of all the channel estimates (i.e., \{\gamma_{lk}^{j}\}), and the QoS requirements of the users to be gathered at one location in the network, which can then solve (11) to obtain the optimal power control (i.e., \{\rho_{lk}\}). The optimal power solution then needs to be fed back to the BSs by sending additional parameters over the backhaul.

II-C Basic Form of Distributed Implementation
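The linear-program structure of (11) can be made concrete on a toy instance. All numbers below (gains, SINR targets, noise, power budget) are hypothetical and chosen only for illustration; g[j, l] is an assumed effective average channel gain from BS l to the single user in cell j.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: 2 cells, 1 user each (all numbers hypothetical).
g = np.array([[5.0, 0.4],    # g[j, l]: average gain from BS l to user j
              [0.3, 4.0]])
xi = np.array([3.0, 3.0])    # SINR targets after converting the SE targets
sigma2 = 1.0                 # noise variance
p_max = 10.0                 # per-BS power budget

# QoS of user j:  g[j,j]*rho[j] - xi[j]*sum_{l != j} g[j,l]*rho[l] >= xi[j]*sigma2,
# rewritten in linprog's "A_ub @ rho <= b_ub" form (multiply by -1).
A_ub = np.array([[-g[0, 0], xi[0] * g[0, 1]],
                 [xi[1] * g[1, 0], -g[1, 1]]])
b_ub = -xi * sigma2

# minimize the total transmit power subject to QoS and power budgets
res = linprog(c=np.ones(2), A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, p_max)] * 2)
rho = res.x
```

At the optimum, both QoS constraints are active, so each user receives exactly its target SINR; this is the expected behavior of a power minimization with feasible SINR targets.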
In a distributed implementation of problem (11), each BS performs data power allocation for its own users and only iteratively exchanges signals with the other BSs, as shown in Fig. 1. A basic form is that every BS acquires all the parameters that the centralized implementation requires and then solves (11) locally. This removes the need for sending the solution over the backhaul.
III Distributed Implementation by Dual Decomposition
In this section, a distributed implementation of (11) is studied based on dual decomposition. Its convergence and the computational complexity are also investigated.
III-A Assumptions for Distributed Implementation
The proposed distributed implementation for problem (11) is based on two levels of optimization: a master level and sublevels, as schematically illustrated in Fig. 2. At its sublevel, BS l locally optimizes the transmit powers of its users utilizing only partial information. The following assumptions are made to allocate power for the DL transmission:

BS l possesses statistical information, including the large-scale fading coefficients and the channel estimate variances, of its local users.

BS l has the statistical information of the channels from itself to the interfering users in the other cells. These can also be measured locally.

BS l jointly optimizes the DL powers allocated to its local users based on the above prior information.

The intercell interference is captured by consistency parameters in a dual-decomposition approach. These are the only parameters that need to be exchanged between BSs, and they will be defined hereafter.
III-B Details of the Distributed Implementation
Since the cost function of the optimization problem (11) is not strictly convex, a standard dual decomposition implementation is not guaranteed to converge [15].¹

¹A function f is strictly convex if, for any two points x \neq y in its feasible domain and any scalar \theta \in (0,1) such that \theta x + (1-\theta) y is also in the feasible domain, it holds that f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y).

Thus, in order to ensure the convergence of the distributed implementation, we will convert (11) into a convex problem with a strictly convex cost function by introducing new variables. The effective SINR value of user k in cell l is reformulated as
(12) 
In the last expression of (12), the first term in the denominator represents the mutual noncoherent interference between the local users served by BS l; hence, BS l can evaluate this term locally. The second term includes the coherent interference caused by the users utilizing non-orthogonal pilot signals and the noncoherent interference from all users in the other cells. If BS l wishes to evaluate the SE of each local user, it needs to obtain this information from the other BSs (i.e., the prices in Fig. 2) to compute the second term. In order to reduce the amount of information exchanged among the BSs, we introduce so-called consistency variables \chi and \hat{\chi}, which represent the exact and believed value of the intercell interference, respectively. Thus, (11) is reformulated as
(13a)  
(13b)  
(13c)  
(13d)  
(13e) 
The constraints (13d) are called consistency constraints. The consistency variables \chi and \hat{\chi} increase the total number of optimization variables in (13) compared with (11). We stress that (11) and (13) are equivalent since \chi = \hat{\chi} at the optimum. The dual decomposition approach is used to decompose problem (13) into subproblems that can each be solved locally at a BS. To that end, a partial Lagrangian function, which is related to the differences between the consistency variables, is formed as
(14) 
where \lambda_{lk} is the Lagrange multiplier associated with the corresponding consistency constraint in (13d). The dual function of (14) is computed as the superposition of the local dual functions
(15) 
where the local dual function of BS l is formulated as
(16) 
From (16), problem (13) can be decomposed into L subproblems, where the lth subproblem is
(17)  
Notice that BS l only needs its local optimization variables to solve the lth subproblem, and all the subproblems can be processed in parallel by the BSs for given values of the Lagrange multipliers. The globally optimal solution to (17) can be obtained thanks to its convexity, as stated in Theorem 1.
Theorem 1.
The optimization problem (17) is equivalent to the following second-order cone (SOC) program
(18)  
where the auxiliary variables are defined in the proof.
Proof.
Utilizing the epigraph representation [16], the objective function of (17) is turned into the following constraint by introducing a new optimization variable,

(19)

which is equivalent to

(20)

By applying the hyperbolic constraint identity to the right-hand side of (20), this constraint can be converted to an SOC constraint. The other constraints of (17) can easily be converted to the corresponding SOC constraints in (18). ∎
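For reference, the standard hyperbolic constraint identity used in such SOC reformulations can be stated as:

```latex
xy \ge z^2,\ x \ge 0,\ y \ge 0
\quad \Longleftrightarrow \quad
\left\| \begin{bmatrix} 2z \\ x - y \end{bmatrix} \right\| \le x + y .
```

Squaring both sides of the norm inequality gives 4z^2 + (x-y)^2 \le (x+y)^2, which simplifies to z^2 \le xy, confirming the equivalence.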
At iteration n, after obtaining the optimal solutions to the subproblems in (18), the Lagrange multipliers are updated at the so-called master level by considering the following master dual problem:
(21) 
where \chi^{(n)} and \hat{\chi}^{(n)} are the global optima of the subproblems in (18) at the nth iteration. The subgradient projection method [5] can be adopted to update the Lagrange multipliers as
(22) 
where \alpha_n is a positive step size at the nth iteration and [\cdot]^{+} denotes the projection onto the nonnegative orthant. If the master problem is solved at an arbitrary BS, that BS acquires the consistency parameters from the remaining BSs. The updated Lagrange multipliers should be sent back to the BSs in every iteration. In total, the number of parameters exchanged in each iteration is proportional to the number of consistency variables. The proposed distributed implementation of problem (11) is presented in Algorithm 1.
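The two-level procedure can be illustrated on a toy problem of our own (not the paper's): minimize x² + y² subject to the coupling constraint x + y = 4, where each variable is owned by a different "BS". The coupling constraint plays the role of the consistency constraints (13d), and the multiplier update mirrors the projected subgradient step in (22) with a diminishing step size.

```python
# Toy dual decomposition: minimize x^2 + y^2 subject to x + y = 4.
lam = 0.0
x = y = 0.0
for n in range(1, 200):
    # Sublevel: each strictly convex local problem is solved in closed form.
    x = lam / 2.0   # argmin_x  x^2 - lam * x
    y = lam / 2.0   # argmin_y  y^2 - lam * y
    # Master level: projected subgradient step on the dual, as in (22);
    # the subgradient is the residual of the coupling constraint.
    step = 1.0 / n
    lam = max(0.0, lam + step * (4.0 - x - y))
```

The iterates converge to x = y = 2 with multiplier lam = 4, matching the centralized solution of the toy problem; strict convexity of the local costs is what makes the sublevel solutions unique, which is exactly why the reformulation in this section is needed.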
Furthermore, the convergence property of Algorithm 1 is established in the following theorem.
Theorem 2.
III-C Complexity Analysis of (18)
Since problem (18) contains SOC constraints, a standard interior-point method (IPM) [18, 19] can be used to find its optimal solution. The worst-case runtime of the IPM is characterized as follows.
Definition 1: For a given tolerance \epsilon > 0, a feasible point is called an \epsilon-solution to problem (18) if
(23) 
where the benchmark is the globally optimal solution to the optimization problem (18).
The number of decision variables of problem (18) grows with the number of users in the network, and we thus obtain Lemma 2.
Lemma 2.
Proof.
First, problem (18) has three groups of SOC constraints of different dimensions. Based on these observations, one can follow the same steps as in [20, Section V-A] to arrive at (24). Note that the first term in (24) is the order of the number of iterations required to reach an \epsilon-solution to problem (18), while the remaining terms represent the per-iteration computation cost [20, 21]. ∎
III-D Numerical Results
Next, we study the convergence of the dual decomposition approach. We consider a wrapped-around cellular network in which the maximum transmit power, the pilot power, and the QoS target (in W, mW, and b/s/Hz, respectively) take the same values for all cells and users. The users are randomly distributed over the coverage area with a uniform distribution, and the distance between user k in cell l and BS j is denoted by d_{lk}^{j} [km]. The system bandwidth is given in MHz and the noise figure in dB. The large-scale fading coefficient is modeled as

(25)

where the shadow fading follows a log-normal distribution with a given standard deviation in dB. The Lagrange multipliers are initialized to the same value for all users, while the step size in (22) is a diminishing function of the iteration index. ZF precoding is used in the simulations. Monte-Carlo results are obtained over different random realizations of the user locations.
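A pathloss-plus-shadowing model of the form in (25) can be sketched as follows. The pathloss constants below are typical urban-macro values chosen for illustration only; the paper's exact constants in (25) are not reproduced here.

```python
import numpy as np

def large_scale_fading_db(d_km, sigma_sf_db, rng):
    """Large-scale fading [dB] as distance-based pathloss plus
    log-normal shadowing (Gaussian in the dB domain).

    The constants -148.1 and 37.6 are illustrative urban-macro values,
    not the paper's; sigma_sf_db is the shadowing standard deviation.
    """
    shadowing = sigma_sf_db * rng.standard_normal()
    return -148.1 - 37.6 * np.log10(d_km) + shadowing

rng = np.random.default_rng(0)
beta_db = large_scale_fading_db(0.3, 7.0, rng)   # one channel realization
beta = 10.0 ** (beta_db / 10.0)                  # convert dB to linear scale
```

The dB-domain Gaussian shadowing term is what makes the linear-scale coefficient log-normally distributed, as stated in the text.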
Fig. 3 shows the probability of the number of iterations that Algorithm 1 needs to reach a given fraction of the global optimum, for two different numbers of users per cell. Algorithm 1 converges to the desired objective value after a few iterations; in many realizations of the user locations, a single iteration suffices. If a maximum number of iterations is used to terminate the algorithm for the user realizations that struggle to reach the target accuracy, only a small fraction of the realizations is affected. Meanwhile, by comparing Fig. 3a and Fig. 3b, we observe that Algorithm 1 requires more iterations to converge to the global optimum when the number of users in the network increases.

Fig. 4 gives the cumulative distribution function (CDF) of the actual QoS per user when the algorithm has reached the stated fraction of the global optimum. The results confirm that the proposed distributed implementation satisfies the QoS requirements of most of the users, while some get somewhat lower QoS and some get higher. The performance gap between the proposed algorithm and its centralized counterpart widens as the number of users increases, but mainly in the sense that users get a higher QoS than required.
IV Comparison of Centralized and Distributed Implementation Approaches
Table I: Number of optimization variables and number of exchanged parameters for the centralized implementation and the two distributed implementations.
Table I summarizes the total number of optimization variables and exchanged parameters for the centralized and the two distributed implementations; the counts for the dual decomposition approach also involve the number of iterations needed to reach convergence. Even though the distributed approach utilizing dual decomposition divides (11) into subproblems where every BS locally optimizes its power control coefficients, it also introduces new optimization variables, leading to a large number of optimization variables, both in total and when comparing each subproblem with the centralized problem. Note that the global problem is a linear program, while the subproblems are SOC programs, which are more complex to solve even when the numbers of optimization variables are equal.
The total number of exchanged parameters with the dual decomposition approach may be smaller than in the basic form of distributed implementation, because the number of iterations is usually small, as shown in Section III-D. However, when the number of iterations needed by the dual decomposition approach grows, the number of exchanged parameters may surpass that of the two other implementations.
Unlike in small-scale networks, where the small-scale fading coefficient from every antenna to every user needs to be exchanged to solve the centralized problem, in Massive MIMO only a few statistical parameters need to be exchanged per user. Hence, in Massive MIMO, the distributed implementation by dual decomposition does not bring any substantial reduction in backhaul signaling compared to its centralized counterpart; this is basically a consequence of the channel hardening property [1].
V Conclusion
This paper compared distributed and centralized implementations of the total transmit power minimization problem with QoS requirements in multi-cell Massive MIMO systems. In the distributed implementation based on dual decomposition, every BS can independently and simultaneously minimize its transmit powers while controlling intercell interference based on the Lagrange multipliers provided by a central entity; in return, a subset of nominal parameters representing the strength of the mutual interference is sent to the central entity to update these Lagrange multipliers. This iterative method converges to the global optimum after a few iterations (in most realizations of the user locations) and its convergence is theoretically guaranteed. In small-scale MIMO networks, this approach can substantially reduce the computational complexity and backhaul signaling. However, in Massive MIMO systems, this distributed implementation does not bring any significant reduction in the amount of exchanged information, since each channel is described by only a few parameters. Therefore, for resource allocation problems that only involve large-scale fading coefficients, a centralized implementation is preferable in terms of both backhaul signaling and complexity. In practice, a combination of the two approaches might be preferable, for example, where the resource allocation is optimized neither at every BS nor at a single central entity, but at multiple central entities that are each responsible for a larger coverage area.
References
 [1] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.
 [2] T. V. Chien, E. Björnson, and E. G. Larsson, “Joint power allocation and user association optimization for Massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6384–6399, 2016.
 [3] H. Q. Ngo, A. E. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO: Uniformly great service for everyone,” in Proc. IEEE SPAWC, 2015.
 [4] E. Björnson and E. Jorswieck, “Optimal resource allocation in coordinated multi-cell systems,” Foundations and Trends in Communications and Information Theory, vol. 9, no. 2–3, pp. 113–381, 2013.
 [5] D. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1439–1451, 2006.
 [6] A. Adhikary, A. Ashikhmin, and T. L. Marzetta, “Uplink interference reduction in large-scale antenna systems,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2194–2206, 2017.
 [7] A. Zappone, L. Sanguinetti, G. Bacci, E. Jorswieck, and M. Debbah, “Distributed energy-efficient UL power control in Massive MIMO with hardware impairments and imperfect CSI,” in Proc. IEEE ISWCS, 2015, pp. 311–315.
 [8] H. Asgharimoghaddam, A. Tolli, and N. Rajatheva, “Decentralizing the optimal multicell beamforming via large system analysis,” in Proc. IEEE ICC, 2014, pp. 5125–5130.
 [9] R. Yates, “A framework for uplink power control in cellular radio systems,” IEEE J. Sel. Areas Commun., vol. 13, no. 7, pp. 1341–1347, 1995.
 [10] T. A. Le and M. R. Nakhai, “Downlink optimization with interference pricing and statistical CSI,” IEEE Trans. Commun., vol. 61, no. 6, pp. 2339–2349, 2013.
 [11] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, 1993.
 [12] T. V. Chien, E. Björnson, and E. G. Larsson, “Joint pilot design and uplink power allocation in multicell Massive MIMO systems,” IEEE Trans. Wireless Commun., 2018, accepted for publication.
 [13] T. V. Chien and E. Björnson, Massive MIMO Communications. Springer International Publishing, 2017, pp. 77–116.
 [14] CVX Research Inc., “CVX: Matlab software for disciplined convex programming, academic users,” http://cvxr.com/cvx, 2015.
 [15] S. Boyd, L. Xiao, A. Mutapcic, and J. Mattingley, “Primal and dual decomposition: Notes on decomposition methods.” [Online]. Available: https://stanford.edu/class/ee364b/lectures.html
 [16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
 [17] H. Pennanen, A. Tölli, and M. Latva-aho, “Decentralized coordinated downlink beamforming via primal decomposition,” IEEE Signal Process. Lett., vol. 18, no. 11, pp. 647–650, 2011.
 [18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
 [19] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, 2001.
 [20] K.-Y. Wang, A. M.-C. So, T.-H. Chang, W.-K. Ma, and C.-Y. Chi, “Outage constrained robust transmit optimization for multiuser MISO downlinks: Tractable approximations by conic optimization,” IEEE Trans. Signal Process., vol. 62, no. 21, pp. 5690–5705, Nov. 2014.
 [21] T. A. Le, Q.-T. Vien, H. X. Nguyen, D. W. K. Ng, and R. Schober, “Robust chance-constrained optimization for power-efficient and secure SWIPT systems,” IEEE Trans. Green Commun. Netw., vol. 1, no. 3, pp. 333–346, 2017.