Massive MIMO is considered a key technology for 5G networks due to its great improvements in both spectral efficiency (SE) and energy efficiency (EE) over legacy networks. Moreover, resource allocation problems in Massive MIMO are reported to have much lower complexity than those in small-scale systems, since the ergodic SE expressions depend only on the large-scale fading coefficients, thanks to the so-called channel hardening property. However, so far, most optimization problems in the Massive MIMO literature have been formulated and solved in a centralized fashion [2, 3]. This requires the network to gather full statistical channel state information (CSI) from all base stations (BSs) at one location in order to centrally allocate resources in every cell and then inform each BS of the decisions. This raises practical questions about backhaul signaling, scalability, and delays, especially for dense networks with many BSs and users. A classic approach to deal with such issues is distributed optimization, which transforms the centralized problem into distributed implementations where every BS simultaneously optimizes its local resources based on local information, and only parameters that describe the inter-cell interference are exchanged between BSs to iteratively find the globally optimal solution.
A few recent works have applied distributed implementation concepts to Massive MIMO, e.g., [6, 7, 8]. For the uplink (UL) transmission, one of these works studies a distributed max-min fairness problem for a Massive MIMO system with two decoding layers, based on the existence of an effective interference function that obeys the rigid conditions in [9, 10]. Another formulates a non-cooperative game and proposes a distributed EE problem that has a Nash equilibrium. For the downlink (DL) transmission, a third work utilizes UL-DL duality to propose a distributed framework for a total transmit power minimization problem with quality-of-service (QoS) constraints that also seeks the optimal precoding vectors, which are functions of the small-scale fading realizations. Even though the optimal precoding vectors provide certain gains over heuristic precoding such as zero-forcing (ZF), that algorithm suffers from the fact that the small-scale fading realizations vary rapidly over both time and frequency. Furthermore, the previous algorithms assumed that the BSs have full access to the channel statistics of either the neighboring cells or all the other cells [7, 8], which may require heavy backhaul signaling since users move, new users arrive, and current users disconnect. To the best of our knowledge, no previous work has explicitly investigated what information should be exchanged between BSs in Massive MIMO to solve power minimization problems.
In this paper, we consider the DL transmission of Massive MIMO systems with maximum ratio (MR) or ZF precoding. Each user has a QoS requirement, in terms of an achievable SE, and we study the total transmit power minimization problem. We compare a centralized solution algorithm with two distributed algorithms to answer the following fundamental questions: i) Can a distributed implementation achieve the optimal solution of the centralized problem? ii) Which subset of parameters should be shared between the BSs? iii) How can we limit complexity and backhaul signaling requirements for distributed implementation?
II System Model
We consider a Massive MIMO system comprising cells, each having a BS equipped with antennas and serving single-antenna users. The system operates according to a time division duplex (TDD) protocol. The time-frequency resources are divided into coherence intervals of symbols where the channels are assumed to be static and frequency flat. The channel between user in cell and BS is assumed to be uncorrelated Rayleigh fading,
where is the large-scale fading coefficient.
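A minimal sketch of this channel model, assuming the common form in which the channel vector to an M-antenna BS is circularly symmetric complex Gaussian with covariance β times the identity (M and β are our labels, since the symbols are missing from the text). It draws one realization per antenna count and prints the ratio of the normalized channel gain to β, which should approach 1 as M grows; this concentration is the channel hardening property mentioned in the introduction.

```python
import random

def rayleigh_channel(num_antennas, beta, rng):
    """Draw h ~ CN(0, beta * I): each entry has independent real and
    imaginary parts, each with variance beta / 2."""
    std = (beta / 2) ** 0.5
    return [complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(num_antennas)]

def normalized_gain(h):
    """Return ||h||^2 / M, which concentrates around beta for large M."""
    return sum(abs(x) ** 2 for x in h) / len(h)

rng = random.Random(1)
beta = 1e-3  # hypothetical large-scale fading coefficient
for m in (10, 100, 10000):
    ratio = normalized_gain(rayleigh_channel(m, beta, rng)) / beta
    print(m, ratio)  # fluctuates around 1, with smaller spread as m grows
```

Only the large-scale coefficient survives the averaging, which is why the closed-form SE expressions used later depend on β and not on the small-scale realizations.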
During the UL channel estimation phase, each coherence interval dedicates symbols for pilot transmission. We assume that the same set of orthonormal pilot signals , with , is reused in each cell, wherein user in cell allocates the power to its pilot signal. The received training signal at BS is
where is additive noise with each element independently distributed as , where is the variance. The channel between user in cell and BS is estimated by multiplying the received signal in (2) with as
where the variance is
In this paper, the channel estimates are used to construct linear precoding vectors for the DL data transmission.
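The variance expression is not reproduced in this text, so the sketch below uses the standard MMSE result for pilot reuse under uncorrelated Rayleigh fading, assuming pilots whose squared norm equals the pilot length τ_p: the estimate of the channel from user k in cell i to BS l has variance τ_p p̂_ik β_ik² / (τ_p Σ_j p̂_jk β_jk + σ²_UL), where all β are measured at BS l. This form, the argument names, and the toy numbers are our assumptions rather than a transcription of the paper.

```python
def estimate_variance(beta_to_bs, pilot_powers, i, tau_p, noise_var):
    """Variance of the MMSE estimate of the channel from user k in cell i to BS l
    (assumed standard form under pilot reuse and uncorrelated Rayleigh fading).

    beta_to_bs[j]   : large-scale fading from user k in cell j to BS l
    pilot_powers[j] : pilot power of user k in cell j (all cells reuse pilot k)
    """
    # Every cell reusing pilot k adds to the denominator: pilot contamination.
    denom = tau_p * sum(p * b for p, b in zip(pilot_powers, beta_to_bs)) + noise_var
    return tau_p * pilot_powers[i] * beta_to_bs[i] ** 2 / denom

# Two cells reusing the same pilot: cell 0 is local to BS l, cell 1 interferes.
beta = [1.0, 0.1]
gammas = [estimate_variance(beta, [1.0, 1.0], i, tau_p=2, noise_var=1.0) for i in range(2)]
print(gammas)
```

With equal pilot powers, the two variances differ only by the factor (β_1/β_0)², so the BS cannot separate co-pilot users by estimation alone; this is the pilot contamination that reappears as coherent interference in the SINR.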
II-A Downlink Data Transmission
In the DL data transmission, BS transmits a Gaussian signal to its user with . The received baseband signal at user in cell is
where the additive noise is , is the power allocated to user in cell for transmission of the data symbol and is the corresponding normalized linear precoding vector:
where is the estimated channel matrix of the users in cell , is the th column of . Using standard techniques, a closed-form lower bound on the DL ergodic capacity with MR or ZF precoding is obtained.
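The precoding expression is not reproduced in this text, so the sketch below assumes the standard normalized forms: MR uses the estimated channel divided by its norm, and ZF uses the kth column of Ĥ(Ĥ^H Ĥ)^{-1}, normalized to unit norm. It relies only on the Python standard library (lists of complex numbers and a small Gauss-Jordan inversion); all helper names are hypothetical.

```python
import random

def inner(a, b):
    """Return a^H b for two complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / n for x in v]

def invert(A):
    """Gauss-Jordan inversion (with partial pivoting) of a small complex matrix."""
    n = len(A)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def mr_precoders(H):
    """MR: w_k = hhat_k / ||hhat_k||. H is a list of K estimated channels of length M."""
    return [normalize(h) for h in H]

def zf_precoders(H):
    """ZF: k-th column of Hhat (Hhat^H Hhat)^{-1}, normalized to unit norm."""
    K, M = len(H), len(H[0])
    gram = [[inner(H[a], H[b]) for b in range(K)] for a in range(K)]  # Hhat^H Hhat
    g_inv = invert(gram)
    return [normalize([sum(H[j][m] * g_inv[j][k] for j in range(K)) for m in range(M)])
            for k in range(K)]

rng = random.Random(0)
M, K = 8, 3
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)] for _ in range(K)]
W = zf_precoders(H)
print(abs(inner(H[0], W[1])))  # close to zero up to rounding: ZF nulls other users' estimates
```

ZF spends M − K degrees of freedom on nulling intra-cell interference toward the estimated channels, which is consistent with the array gain M − K that ZF is assigned in the SINR expression.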
[13, Corollary ] In the DL, the closed-form expression for the ergodic SE of user in cell is
where the effective signal-to-interference-plus-noise ratio (SINR), denoted by , is
The parameters and are specified by the precoding scheme. MR precoding gives and , while ZF precoding gives and .
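Expressions (8) and (9) are not reproduced in this text. The sketch below evaluates them under an assumed form that is standard for MR and ZF with pilot reuse: the SINR of user k in cell l is G ρ_lk γ^l_lk divided by G Σ_{i≠l} ρ_ik γ^i_lk + Σ_i Σ_t ρ_it z^i_lk + σ², with G = M and z = β for MR, G = M − K and z = β − γ for ZF, and an SE pre-log factor of (1 − τ_p/τ_c). The indexing convention and function names are hypothetical. The inverse function corresponds to converting SE requirements into SINR targets when going from (10) to (11).

```python
import math

def sinr(l, k, rho, beta, gamma, M, noise_var, scheme="ZF"):
    """Closed-form DL SINR of user k in cell l under an assumed MR/ZF form.

    rho[i][t]      : DL power allocated to user t in cell i
    beta[l][k][i]  : large-scale fading from BS i to user k in cell l
    gamma[l][k][i] : variance of the estimate of that channel
    """
    L, K = len(rho), len(rho[0])
    if scheme == "MR":
        G, z = M, beta[l][k]
    elif scheme == "ZF":
        G, z = M - K, [b - g for b, g in zip(beta[l][k], gamma[l][k])]
    else:
        raise ValueError("scheme must be 'MR' or 'ZF'")
    signal = G * rho[l][k] * gamma[l][k][l]
    # Coherent interference from co-pilot users in other cells (pilot contamination).
    coherent = G * sum(rho[i][k] * gamma[l][k][i] for i in range(L) if i != l)
    # Non-coherent interference from all DL transmissions, including BS l's own.
    noncoherent = sum(rho[i][t] * z[i] for i in range(L) for t in range(K))
    return signal / (coherent + noncoherent + noise_var)

def spectral_efficiency(sinr_value, tau_p, tau_c):
    """SE in b/s/Hz with the pre-log factor accounting for pilot overhead."""
    return (1 - tau_p / tau_c) * math.log2(1 + sinr_value)

def sinr_target(se_target, tau_p, tau_c):
    """Invert the SE expression: the SINR needed to reach se_target."""
    return 2 ** (se_target / (1 - tau_p / tau_c)) - 1

# Single cell, single user, M = 10 antennas (toy numbers).
s = sinr(0, 0, rho=[[1.0]], beta=[[[1.0]]], gamma=[[[0.5]]], M=10, noise_var=1.0)
print(s, spectral_efficiency(s, tau_p=10, tau_c=200))
```

Since every term depends only on β, γ, and the powers, the SINR constraints become linear in the powers once the SINR targets are fixed, which is why (11) is a linear program.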
As the closed-form expression of the ergodic SE in (8) is independent of the small-scale fading, it can be used to solve resource allocation problems whose solutions are applicable over a long time period [6, 7, 8]. In this paper, we introduce a distributed implementation for the total transmit power minimization problem.
II-B Total Transmit Power Minimization Problem
The total transmit power at BS is . Suppose user in cell has the QoS requirement , where is given by (8). The total transmit power minimization problem of the BSs subject to these QoS requirements is formulated as
where is the maximum transmit power at BS . By converting the SE requirements into the corresponding SINR values, i.e., by setting , problem (10) is reformulated as
Since problem (11) is a linear program, its optimal solution can be obtained in polynomial time by an interior-point algorithm or a simplex method, e.g., using CVX. A centralized implementation requires the large-scale fading coefficients of all channels (i.e., ), the variances of all the channel estimates (i.e., ), and the QoS requirements of the users to be gathered at one location in the network, which can then solve (11) to obtain the optimal power control (i.e., ). The optimal power solution then needs to be fed back to the BSs by sending additional parameters over the backhaul.
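The text solves (11) as a linear program with an interior-point or simplex method. As a dependency-free alternative, the sketch below swaps in Yates' fixed-point iteration [9]: with SINR constraints of the assumed standard MR/ZF form (G ρ_lk γ^l_lk at least ν_lk times the coherent, non-coherent, and noise terms, with z = β for MR and z = β − γ for ZF), iterating ρ ← I(ρ) from zero converges to the componentwise-minimal feasible power vector whenever the targets are feasible, and that vector also minimizes the total power. If it violates a per-BS budget, no feasible point exists, because every feasible point dominates it. Names and tolerances are hypothetical.

```python
def min_total_power(nu, beta, gamma, M, noise_var, scheme="ZF", p_max=None,
                    tol=1e-12, max_iter=10000):
    """Minimal DL powers meeting SINR targets nu[l][k] via fixed-point iteration.

    beta[l][k][i] and gamma[l][k][i] refer to the channel from BS i to user k
    in cell l. Returns None if the iteration does not converge (infeasible
    SINR targets) or if some BS l exceeds its budget p_max[l].
    """
    if scheme not in ("MR", "ZF"):
        raise ValueError("scheme must be 'MR' or 'ZF'")
    L, K = len(nu), len(nu[0])
    G = M if scheme == "MR" else M - K
    z = [[[b - g if scheme == "ZF" else b for b, g in zip(beta[l][k], gamma[l][k])]
          for k in range(K)] for l in range(L)]
    rho = [[0.0] * K for _ in range(L)]
    for _ in range(max_iter):
        # Standard interference function: power each user needs given the others.
        new = [[nu[l][k] * (G * sum(rho[i][k] * gamma[l][k][i] for i in range(L) if i != l)
                            + sum(rho[i][t] * z[l][k][i] for i in range(L) for t in range(K))
                            + noise_var) / (G * gamma[l][k][l])
                for k in range(K)] for l in range(L)]
        delta = max(abs(new[l][k] - rho[l][k]) for l in range(L) for k in range(K))
        rho = new
        if delta < tol:
            break
    else:
        return None  # no convergence within max_iter: SINR targets likely infeasible
    if p_max is not None and any(sum(rho[l]) > p_max[l] for l in range(L)):
        return None  # minimal powers exceed a per-BS budget, so the problem is infeasible
    return rho

# Toy check: one cell, one user, M = 10, SINR target 3 with ZF.
print(min_total_power([[3.0]], [[[1.0]]], [[[0.5]]], M=10, noise_var=1.0))
```

A centralized node running this iteration still needs every β and γ, so it does not change the information requirements discussed above; it only replaces the LP solver.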
II-C Basic Form of Distributed Implementation
In a distributed implementation of problem (11), each BS performs data power allocation for its own users and only iteratively exchanges signals with the other BSs, as shown in Fig. 1. In a basic form of distributed implementation, every BS acquires all the parameters that the centralized implementation requires and then solves (11) locally. This removes the need for sending the solution over the backhaul.
III Distributed Implementation by Dual Decomposition
In this section, a distributed implementation of (11) is studied based on dual decomposition. Its convergence and the computational complexity are also investigated.
III-A Assumptions for Distributed Implementation
The proposed distributed implementation for problem (11) is based on two levels of optimization, a master level and sublevels, as schematically illustrated in Fig. 2. At Sublevel , BS locally optimizes the transmit powers to its users utilizing only partial information. The following assumptions are made to allocate power for the DL transmission:
BS possesses statistical information including the large-scale fading coefficients and the channel estimate variance of the local users.
BS has the statistical information of the channels to the interfering users, i.e., . Those can also be measured locally.
BS jointly optimizes the DL powers allocated to its local users based on the above prior information.
The inter-cell interference is represented by consistency parameters in a dual-decomposition approach. These are the only parameters that need to be exchanged between BSs, and they are defined hereafter.
III-B Details of the Distributed Implementation
Since the cost function in the optimization problem (11) is not strictly convex, a standard dual decomposition implementation is not guaranteed to converge. (A function $f$ is strictly convex if, for any two distinct points $x$ and $y$ in the feasible domain of $f$ and any scalar $\theta \in (0,1)$ such that $\theta x + (1-\theta) y$ is also in the feasible domain, $f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y)$.) Thus, to ensure the convergence of the distributed implementation, we convert (11) into a convex problem with a strictly convex cost function by introducing a new variable such as . The effective SINR value of user in cell is reformulated as
In the last equation of (12), the first term in the denominator represents the mutual non-coherent interference of the local users served by BS . Hence, BS can locally evaluate this term. The second term includes the coherent interference caused by the users utilizing nonorthogonal pilot signals and the non-coherent interference from all users in the other cells. If BS wishes to evaluate the SE of each local user, it needs to obtain this information from the other BSs (i.e., the prices in Fig. 2) to compute the second term. To reduce the amount of information exchanged among BSs, we introduce so-called consistency variables and , which represent the exact and believed values of , respectively. Thus, (11) is reformulated as
The constraints (13d) are called consistency constraints. There are in total consistency variables and , thus (13) involves optimization variables. We stress that (11) and (13) are equivalent since at the optimum. The dual decomposition approach is used to decompose problem (13) into subproblems that each can be solved locally at a BS. To that end, a partial Lagrangian function, which is related to the differences between the consistency variables, is formed as
where is the Lagrange multiplier associated with the constraint . The dual function of (14) is computed as the superposition of the local dual functions
where the local dual function of BS , denoted by , is formulated as
Notice that BS uses optimization variables to solve the th subproblem, while all the subproblems can be processed in parallel by the BSs for given values of all the Lagrange multipliers . The globally optimal solution to (17) is obtained due to its convexity as stated in Theorem 1.
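To illustrate which part of the SINR denominator BS l can evaluate on its own, the sketch below splits it, assuming the standard closed-form MR/ZF expression with z = β for MR and z = β − γ for ZF (passed in precomputed). The non-coherent interference from BS l's own transmissions is local, while each other BS i contributes a coherent pilot-contamination term plus a non-coherent term that depends only on BS i's own powers. Treating these per-BS contributions as the exchanged quantities is our interpretation of the consistency variables, not a transcription of (12) and (13); the function names are hypothetical.

```python
def local_interference(l, k, rho, z):
    """Non-coherent interference at user k in cell l caused by BS l itself.
    Needs only BS l's own powers rho[l], so BS l evaluates it locally."""
    return sum(rho[l][t] * z[l][k][l] for t in range(len(rho[l])))

def interference_from_bs(i, l, k, rho, gamma, z, G):
    """Interference that BS i (i != l) causes at user k in cell l: a coherent
    pilot-contamination part plus a non-coherent part. It depends only on
    BS i's own powers rho[i], so BS i can compute it and report it."""
    coherent = G * rho[i][k] * gamma[l][k][i]
    noncoherent = sum(rho[i][t] * z[l][k][i] for t in range(len(rho[i])))
    return coherent + noncoherent

def sinr_denominator(l, k, rho, gamma, z, G, noise_var):
    """Return (local part, inter-cell part, full denominator) for user k in cell l."""
    local = local_interference(l, k, rho, z)
    intercell = sum(interference_from_bs(i, l, k, rho, gamma, z, G)
                    for i in range(len(rho)) if i != l)
    return local, intercell, local + intercell + noise_var

# Two cells with one user each; z already holds beta - gamma (ZF) values.
rho = [[1.0], [2.0]]
gamma = [[[0.5, 0.1]], [[0.1, 0.5]]]
z = [[[0.5, 0.2]], [[0.2, 0.5]]]
print(sinr_denominator(0, 0, rho, gamma, z, G=9, noise_var=1.0))
```

Only the inter-cell part requires information from other BSs, which matches the split into a locally computable first term and an exchanged second term described above.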
The optimization problem (17) is equivalent to the following second-order cone (SOC) program
where and are respectively defined as
and it is equivalent to
By applying the identity , i.e., with and , for the right-hand side of (20), this constraint can be converted to a SOC constraint. The other constraints of (17) can be easily converted to the corresponding SOC constraints as in (18). ∎
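The identity used in the proof is not reproduced here. A common choice for this step is 4xy = (x + y)² − (x − y)², which gives, for x, y ≥ 0, the equivalence xy ≥ w² if and only if the Euclidean norm of (2w, x − y) is at most x + y. Assuming this is the identity meant, the snippet below checks the equivalence numerically on random samples; it is a sanity check, not a proof.

```python
import math
import random

def hyperbolic_holds(x, y, w):
    """Original form: x * y >= w^2 (with x, y >= 0)."""
    return x * y >= w * w

def soc_holds(x, y, w):
    """SOC form: ||(2w, x - y)||_2 <= x + y."""
    return math.hypot(2 * w, x - y) <= x + y

def count_mismatches(num_samples=100000, seed=0):
    """Compare both forms on random points with x, y >= 0."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_samples):
        x, y, w = rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(-10, 10)
        if abs(x * y - w * w) < 1e-9:
            continue  # too close to the boundary for a floating-point comparison
        if hyperbolic_holds(x, y, w) != soc_holds(x, y, w):
            mismatches += 1
    return mismatches

print(count_mismatches())  # prints 0
```

The SOC form is what makes each subproblem solvable by standard conic solvers with the complexity bound given in Section III-C.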
At iteration , after obtaining the optimal solutions to the subproblems in (18), every BS will update the Lagrange multipliers at the so-called master level by considering the following master dual problem:
where is a positive step size at the th iteration and is the projection onto the nonnegative orthant. If the master problem is solved at an arbitrary BS, that BS acquires the consistency parameters from the remaining BSs, and the updated Lagrange multipliers and must be sent back to BS in every iteration. In total, the number of exchanged parameters in each iteration is . The proposed distributed implementation of problem (11) is presented in Algorithm 1.
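To make the mechanics of the partial Lagrangian, the parallel subproblems, and the projected multiplier update concrete, the following toy example applies the same dual decomposition pattern to a hypothetical two-agent problem, not to the power control problem itself: minimize (x − 3)² + (y − 1)² subject to the consistency constraint x ≤ y, where agent A owns x and agent B owns y. The strictly convex local costs give unique closed-form subproblem solutions for every multiplier, and a constant step size is used here because the paper's step-size rule is not reproduced in this text.

```python
def subproblem_a(lam):
    """Agent A: minimize (x - 3)^2 + lam * x, solved in closed form."""
    return 3.0 - lam / 2.0

def subproblem_b(lam):
    """Agent B: minimize (y - 1)^2 - lam * y, solved in closed form."""
    return 1.0 + lam / 2.0

def dual_decomposition(step=0.5, iterations=60):
    """Projected subgradient ascent on the dual of the consistency constraint x <= y."""
    lam = 0.0  # initial Lagrange multiplier
    for _ in range(iterations):
        x = subproblem_a(lam)  # both subproblems can run in parallel
        y = subproblem_b(lam)
        # Master level: step along the constraint violation x - y, then project onto lam >= 0.
        lam = max(0.0, lam + step * (x - y))
    return x, y, lam

print(dual_decomposition())  # x and y approach 2, lam approaches 2
```

At the fixed point λ = 2 both agents agree on x = y = 2, which is the optimum of the coupled problem; only the multiplier and the two local values cross the "backhaul" in each iteration, mirroring the exchange pattern counted above.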
Furthermore, the convergence property of Algorithm 1 is established in the following theorem.
III-C Complexity Analysis of (18)
Definition 1: For a given tolerance , the set of is called an -solution to problem (18) if
where is the globally optimal solution to the optimization problem (18).
The computational complexity to obtain an -solution to problem (18) is
First, problem (18) has SOC constraints of dimension , SOC constraints of dimension , and SOC constraints of dimension . Based on these observations, one can follow the same steps as in [20, Section V-A] to arrive at (24). Note that the term in (24) is the order of the number of iterations required to reach an -solution to problem (18), while the remaining terms represent the per-iteration computation cost [20, 21]. ∎
III-D Numerical Results
Next, we study the convergence of the dual decomposition approach. We consider a wrapped-around cellular network with W, mW, and b/s/Hz, . The users are randomly distributed over the coverage area with a uniform distribution. The distance of user in cell to BS is denoted as km and km. The system bandwidth and noise figure are MHz and dB, respectively. The large-scale fading coefficient is modeled as
where the shadow fading follows a log-normal distribution with standard deviation dB. The Lagrange multipliers are initially selected as , while the step size in (22) is defined as . ZF precoding is used in the simulations. Monte-Carlo results are obtained over different realizations of user locations.
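The simulation parameters above enter the model through the noise power and the large-scale fading. The sketch below assumes the usual thermal-noise model (−174 dBm/Hz plus 10 log10 of the bandwidth plus the noise figure) and log-normal shadowing, i.e., a zero-mean Gaussian term in dB. The distance-dependent path-loss model is not reproduced in this text, so it enters as a hypothetical input in dB, and the example values are illustrative only.

```python
import math
import random

def noise_power_dbm(bandwidth_hz, noise_figure_db):
    """Thermal noise power in dBm: -174 dBm/Hz + 10*log10(B) + noise figure."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

def large_scale_fading(pathloss_db, shadow_std_db, rng):
    """Linear-scale beta with log-normal shadowing: the shadowing term is a
    zero-mean Gaussian in dB with standard deviation shadow_std_db.
    pathloss_db comes from a (hypothetical) distance-dependent model."""
    shadowing_db = rng.gauss(0.0, shadow_std_db)
    return 10.0 ** ((-pathloss_db + shadowing_db) / 10.0)

# Hypothetical values, not necessarily those used in the paper.
sigma2_dbm = noise_power_dbm(20e6, 5.0)  # about -96 dBm
sigma2_w = dbm_to_watt(sigma2_dbm)
beta = large_scale_fading(120.0, 7.0, random.Random(0))
print(sigma2_dbm, beta / sigma2_w)  # beta normalized by the noise power
```

Normalizing β by the noise power, as in the last line, is a common way to keep the SINR expressions numerically well scaled when feeding them to a solver.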
Fig. 3 shows the probability distribution of the number of iterations that Algorithm 1 needs to attain % of the global optimum for (a) and (b) . Algorithm 1 converges to the desired objective value after a few iterations. For example, with , of the realizations of user locations need only one iteration to attain % of the global optimum. If we set a maximum of iterations, after which the algorithm is terminated for the user realizations that have trouble attaining % of the global optimum, of the user realizations face this issue. Meanwhile, by comparing Fig. 3a and Fig. 3b, we observe that Algorithm 1 requires more iterations to converge to the global optimum as the number of users in the network increases.
Fig. 4 gives the cumulative distribution function (CDF) of the actual QoS per user when the algorithm has reached % of the global optimum. The results confirm that the proposed distributed implementation satisfies the QoS requirements of most of the users, while some get somewhat lower QoS and some get higher. However, the performance gap between the proposed algorithm and its centralized counterpart widens as the number of users increases, but mainly in the sense that users get higher QoS than required.
IV Comparison of Centralized and Distributed Implementation Approaches
Table I summarizes the total number of optimization variables and exchanged parameters for the centralized and the two distributed implementations. In this table, represents the number of iterations used in the dual decomposition algorithm to reach convergence. Even though the distributed approach based on dual decomposition divides (11) into subproblems where every BS locally optimizes its power control coefficients, it also introduces new optimization variables, leading to a large number of optimization variables, both in total and when each subproblem is compared with the centralized problem. Note that the global problem is a linear program, while the subproblems are SOC programs, which are more complex to solve even when the number of optimization variables is equal.
The total number of exchanged parameters with the dual decomposition approach may be less than in the basic form of distributed implementation, because is usually small as shown in Section III-D. However, in situations where the number of iterations needed for the dual decomposition approach increases, the number of exchanged parameters may surpass that of the two other implementations.
Unlike in small-scale networks, where the small-scale fading coefficient from every antenna to every user needs to be exchanged to solve the centralized problem, in Massive MIMO only a few statistical parameters need to be exchanged per user. Hence, in Massive MIMO, the distributed implementation by dual decomposition does not bring any substantial reduction in backhaul signaling compared to its centralized counterpart; this is essentially a consequence of the channel hardening property.
This paper compared distributed and centralized implementations of the total transmit power minimization problem with QoS requirements in multi-cell Massive MIMO systems. In the distributed implementation based on dual decomposition, every BS can independently and simultaneously minimize its transmit powers while controlling inter-cell interference based on the Lagrange multipliers provided by a central entity, and a subset of nominal parameters representing the strength of mutual interference must be sent to the central entity to update these Lagrange multipliers. This iterative method converges to the global optimum after a few iterations (in most realizations of the user locations), and its convergence is theoretically guaranteed. In small-scale MIMO networks, this approach can substantially reduce the computational complexity and backhaul signaling. However, in Massive MIMO systems, this distributed implementation does not bring any significant reduction in the amount of exchanged information, since each channel is described by only a few parameters. Therefore, for resource allocation problems that only involve large-scale fading coefficients, a centralized implementation is preferable in terms of both backhaul signaling and complexity. In practice, a combination of the two approaches might be preferable, for example, where the resource allocation is optimized neither at every BS nor at a single central entity, but at multiple central entities that are responsible for larger coverage areas.
[1] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.
[2] T. V. Chien, E. Björnson, and E. G. Larsson, “Joint power allocation and user association optimization for Massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6384–6399, 2016.
[3] H. Q. Ngo, A. E. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO: Uniformly great service for everyone,” in Proc. IEEE SPAWC, 2015.
[4] E. Björnson and E. Jorswieck, “Optimal resource allocation in coordinated multi-cell systems,” Foundations and Trends in Communications and Information Theory, vol. 9, no. 2-3, pp. 113–381, 2013.
[5] D. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1439–1451, 2006.
[6] A. Adhikary, A. Ashikhmin, and T. L. Marzetta, “Uplink interference reduction in large-scale antenna systems,” IEEE Trans. Wireless Commun., vol. 65, no. 5, pp. 2194–2206, 2017.
[7] A. Zappone, L. Sanguinetti, G. Bacci, E. Jorswieck, and M. Debbah, “Distributed energy-efficient UL power control in Massive MIMO with hardware impairments and imperfect CSI,” in Proc. IEEE ISWCS, 2015, pp. 311–315.
[8] H. Asgharimoghaddam, A. Tölli, and N. Rajatheva, “Decentralizing the optimal multi-cell beamforming via large system analysis,” in Proc. IEEE ICC, 2014, pp. 5125–5130.
[9] R. Yates, “A framework for uplink power control in cellular radio systems,” IEEE J. Sel. Areas Commun., vol. 13, no. 7, pp. 1341–1347, 1995.
[10] T. A. Le and M. R. Nakhai, “Downlink optimization with interference pricing and statistical CSI,” IEEE Trans. Commun., vol. 61, no. 6, pp. 2339–2349, 2013.
[11] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, 1993.
[12] T. V. Chien, E. Björnson, and E. G. Larsson, “Joint pilot design and uplink power allocation in multi-cell Massive MIMO systems,” IEEE Trans. Wireless Commun., 2018, accepted for publication.
[13] T. V. Chien and E. Björnson, Massive MIMO Communications. Springer International Publishing, 2017, pp. 77–116.
[14] CVX Research Inc., “CVX: Matlab software for disciplined convex programming, academic users,” http://cvxr.com/cvx, 2015.
[15] S. Boyd, L. Xiao, A. Mutapcic, and J. Mattingley, “Primal and dual decomposition: Notes on decomposition methods.” [Online]. Available: https://stanford.edu/class/ee364b/lectures.html
[16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[17] H. Pennanen, A. Tölli, and M. Latva-aho, “Decentralized coordinated downlink beamforming via primal decomposition,” IEEE Signal Process. Lett., vol. 18, no. 11, pp. 647–650, 2011.
[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[19] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, 2001.
[20] K.-Y. Wang, A. M.-C. So, T.-H. Chang, W.-K. Ma, and C.-Y. Chi, “Outage constrained robust transmit optimization for multiuser MISO downlinks: Tractable approximations by conic optimization,” IEEE Trans. Signal Process., vol. 62, no. 21, pp. 5690–5705, Nov. 2014.
[21] T. A. Le, Q.-T. Vien, H. X. Nguyen, D. W. K. Ng, and R. Schober, “Robust chance-constrained optimization for power-efficient and secure SWIPT systems,” IEEE Trans. Green Comm. and Networking, vol. 1, no. 3, pp. 333–346, 2017.