1 Introduction
Massive multiple-input multiple-output (MIMO) facilitates the concentration of wireless beams towards target directions [1], and is a promising technology for 5G communication systems and beyond [2, 3]. Meanwhile, in many emerging applications such as video streaming [4] and computation offloading [5], a large number of users may request the same data, making massive MIMO multicast beamforming indispensable.
In the context of massive MIMO multicasting, a fundamental criterion for beamforming optimization is to minimize the power consumption subject to quality-of-service (QoS) constraints [6, 7, 8, 9]. However, due to the high dimensionality (e.g., the number of antennas and the number of users can be in the range of hundreds or more [2, 3]), traditional semidefinite relaxation (SDR) becomes extremely time-consuming, since the complexity of SDR grows as a high-order polynomial in the number of antennas [6, 7, 10]. To reduce the computational complexity of beamforming optimization in large-scale settings, first-order methods (FOMs), which only involve the computation of gradients [11], have recently been proposed. In particular, the alternating direction method of multipliers (ADMM) has been derived in [12, 13] by introducing slack variables and solving the augmented Lagrangian problem. Nonetheless, with $M$ users and $N$ antennas, the per-iteration complexity of ADMM is still $\mathcal{O}(MN)$, and a fundamental question is: can we achieve a lower per-iteration complexity for the multicast beamforming optimization problem?
It turns out that this is possible if we only update one coordinate in each iteration [14]. This leads to the coordinate descent method, which involves a per-iteration complexity of $\mathcal{O}(N)$. Unfortunately, the naive strategy of cyclically updating the coordinates may diverge [15]. Even if it converges, the convergence rate can be slow [14], thus offsetting the benefit brought by the low per-iteration complexity. In fact, this is the reason why the coordinate descent method received less attention than its full-gradient counterpart.
However, there has been a revival of interest in the coordinate descent method recently [16, 17, 18]. In particular, it is proved in [16] that if we randomly update one coordinate in each iteration, the resultant random coordinate descent (RCD) is guaranteed to converge with a rate of $\mathcal{O}(1/k)$, where $k$ is the iteration counter. Moreover, by adopting coordinate-wise momentum updates, RCD can be further accelerated to a faster convergence rate of $\mathcal{O}(1/k^2)$ [19]. This results in the parallel accelerated random coordinate descent (ARCD) method, which reduces the computation time by orders of magnitude compared to traditional FOMs in a wide range of applications, e.g., inverse problems [19] and regularized least squares problems [19, 20].
Nonetheless, ARCD can only be used to solve convex and smooth problems with separable constraints [16, 17, 18, 19, 20]. In contrast, the multicast beamforming optimization problem is nonconvex with nonseparable constraints. Therefore, we cannot directly apply ARCD to this application. To this end, leveraging majorization minimization (MM) [21], the nonconvex problem is transformed into a sequence of convex problems, which still have nonseparable constraints. Furthermore, to resolve the coupling among different coordinates, each nonseparable problem is equivalently transformed into its Lagrangian dual counterpart, which is proved to be a coordinate-wise Lipschitz smooth problem with separable constraints, thus allowing ARCD to work on the dual problem. Numerical results are presented to demonstrate the low complexity of the proposed method.
Notation. Italic letter $x$, small bold letter $\mathbf{x}$, and capital bold letter $\mathbf{X}$ represent a scalar, a vector, and a matrix, respectively. The operator $(\cdot)^{H}$ denotes the conjugate transpose, $\mathrm{Re}\{x\}$ takes the real part of $x$, and $|x|$ takes the modulus of $x$. The symbol $[\mathbf{x}]_i$ takes the $i^{\mathrm{th}}$ element of vector $\mathbf{x}$, $\mathbf{X}_{i,:}$ is a row vector taking the $i^{\mathrm{th}}$ row of matrix $\mathbf{X}$, and $[\mathbf{X}]_{i,j}$ takes the element at the $i^{\mathrm{th}}$ row and the $j^{\mathrm{th}}$ column of matrix $\mathbf{X}$. Finally, $\mathbb{E}[\cdot]$ represents the expectation of a random variable, and $\mathcal{O}(\cdot)$ represents the order of arithmetic operations.
2 Problem Formulation and Existing Methods
We consider a massive MIMO system consisting of a base station with $N$ antennas and $M$ single-antenna users. In particular, the base station transmits a signal $s$ with $\mathbb{E}[|s|^2]=1$ to all the users through the beamforming vector $\mathbf{w}\in\mathbb{C}^{N}$ with power $\|\mathbf{w}\|^{2}$. Accordingly, the received signal at the $m^{\mathrm{th}}$ user is $y_m=\mathbf{h}_m^{H}\mathbf{w}\,s+n_m$, where $\mathbf{h}_m\in\mathbb{C}^{N}$ is the downlink channel vector from the base station to user $m$, and $n_m$ is the Gaussian noise at the $m^{\mathrm{th}}$ user with power $\sigma_m^{2}$. Based on the expression of $y_m$, the received SNR at user $m$ is $\mathrm{SNR}_m=|\mathbf{h}_m^{H}\mathbf{w}|^{2}/\sigma_m^{2}$, where $m\in\{1,\ldots,M\}$.
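To make the notation concrete, the following minimal sketch (in Python with NumPy; all sizes, channels, and the beamformer are illustrative assumptions, not values from the paper) computes the transmit power and the per-user SNRs for a given beamformer:

```python
import numpy as np

# Illustrative system-model computation (assumed sizes and random data).
rng = np.random.default_rng(0)
N, M = 64, 8                              # antennas, users
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)                 # per-user noise powers sigma_m^2
w = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * N)

power = np.linalg.norm(w) ** 2            # transmit power ||w||^2
snr = np.abs(H.conj() @ w) ** 2 / sigma2  # SNR_m = |h_m^H w|^2 / sigma_m^2
print(power, snr.min())
```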
In multicast systems, our aim is to provide a guaranteed SNR for all the users, while minimizing the total transmit power at the base station [6]:
(1)  $\mathcal{P}:\ \min_{\mathbf{w}\in\mathbb{C}^{N}}\ \|\mathbf{w}\|^{2}\quad\mathrm{s.t.}\quad\frac{|\mathbf{h}_m^{H}\mathbf{w}|^{2}}{\sigma_m^{2}}\geq\gamma,\ \ \forall m\in\{1,\ldots,M\},$
where $\gamma$ is the common SNR target. Problem $\mathcal{P}$ has been proved to be NP-hard in general [6, Claim 1]. To solve $\mathcal{P}$, a traditional way is to apply SDR for convexification [6]. However, since SDR needs to solve a semidefinite programming (SDP) problem with $\mathcal{O}(N^2)$ variables and one semidefinite constraint of dimension $N$, SDR requires a complexity that grows as a high-order polynomial in $N$ and $M$ [22], which is too demanding when $N$ or $M$ is large.
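For concreteness, the following is a minimal sketch of the SDR baseline, assuming the cvxpy package and illustrative problem sizes; the Gaussian randomization step for extracting a rank-one beamformer follows the standard recipe in [6]. The lifted variable $\mathbf{W}=\mathbf{w}\mathbf{w}^{H}$ has $N^2$ entries, which is exactly what makes SDR expensive at large $N$:

```python
import cvxpy as cp
import numpy as np

# Minimal SDR sketch for (1) (cvxpy is an assumed dependency; data illustrative).
rng = np.random.default_rng(0)
N, M, gamma = 16, 4, 1.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)

W = cp.Variable((N, N), hermitian=True)   # lifted variable W = w w^H
cons = [W >> 0]
for m in range(M):
    Qm = np.outer(H[m], H[m].conj())      # h_m h_m^H, so tr(Qm W) = h_m^H W h_m
    cons.append(cp.real(cp.trace(Qm @ W)) >= gamma * sigma2[m])
cp.Problem(cp.Minimize(cp.real(cp.trace(W))), cons).solve()

# Gaussian randomization to extract a rank-one beamformer from W [6].
evals, evecs = np.linalg.eigh(W.value)
best_w, best_p = None, np.inf
for _ in range(100):
    g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    x = evecs @ (np.sqrt(np.maximum(evals, 0)) * g)  # sample with covariance W
    x *= np.sqrt(np.max(gamma * sigma2 / np.abs(H.conj() @ x) ** 2))  # meet SNRs
    if np.linalg.norm(x) ** 2 < best_p:
        best_w, best_p = x, np.linalg.norm(x) ** 2
```

Even at moderate $N$, the SDP solve dominates the running time, which motivates the low-complexity methods below.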
To reduce the computational complexity of beamforming optimization, the majorization minimization (MM) framework [23, 21, 13, 24] can be adopted to transform $\mathcal{P}$ into a sequence of convex surrogate problems, obtained by linearizing $|\mathbf{h}_m^{H}\mathbf{w}|^{2}$ around the current iterate. Then, an iterative algorithm is obtained with the following update at the $t^{\mathrm{th}}$ iteration:
(2)  $\mathcal{P}^{[t]}:\ \mathbf{w}^{[t+1]}=\arg\min_{\mathbf{w}\in\mathbb{C}^{N}}\ \|\mathbf{w}\|^{2}\quad\mathrm{s.t.}\quad 2\,\mathrm{Re}\big\{(\mathbf{w}^{[t]})^{H}\mathbf{h}_m\mathbf{h}_m^{H}\mathbf{w}\big\}-|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2}\geq\gamma\sigma_m^{2},\ \ \forall m,$
It has been proved in [23] that the sequence $\{\mathbf{w}^{[t]}\}$ converges to a Karush-Kuhn-Tucker (KKT) solution of $\mathcal{P}$.
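The MM loop can be sketched as follows (a minimal illustration assuming cvxpy and a feasible matched-filter initialization; each iteration solves the linearized surrogate (2)):

```python
import cvxpy as cp
import numpy as np

# MM outer loop for (1): each surrogate linearizes |h_m^H w|^2 around w_t.
rng = np.random.default_rng(1)
N, M, gamma = 16, 4, 1.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)

# Feasible initialization: matched-filter direction scaled to meet all SNRs.
w_t = H.conj().sum(axis=0)
w_t *= np.sqrt(np.max(gamma * sigma2 / np.abs(H.conj() @ w_t) ** 2))

for t in range(10):
    alpha = H.conj() @ w_t                 # alpha_m = h_m^H w_t
    w = cp.Variable(N, complex=True)
    cons = [2 * cp.real(np.conj(alpha[m]) * (H[m].conj() @ w))
            >= gamma * sigma2[m] + np.abs(alpha[m]) ** 2 for m in range(M)]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), cons).solve()
    w_t = w.value                          # power is non-increasing over t
```

Since the linearized constraint lower-bounds $|\mathbf{h}_m^{H}\mathbf{w}|^{2}$, every iterate remains feasible for $\mathcal{P}$ and the transmit power is non-increasing over $t$.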
However, even capitalizing on the MM framework, $\mathcal{P}^{[t]}$ is still large-scale, and the interior point method (IPM) adopted in [23] would lead to time-consuming computations, since the complexity of IPM is cubic or higher in the problem dimension [22]. To overcome this challenge, ADMM has been proposed for solving $\mathcal{P}^{[t]}$ [12, 13]. Such a method reformulates $\mathcal{P}^{[t]}$ into an augmented Lagrangian problem and then applies a hybrid gradient method to solve it. Accordingly, its per-iteration complexity is only $\mathcal{O}(MN)$.
3 ARCD: Accelerated Random Coordinate Descent
While the per-iteration complexity of ADMM is lower than that of SDR, a natural question is: can we achieve an even lower complexity than that of ADMM for solving $\mathcal{P}^{[t]}$? This section will show that this is possible under the framework of ARCD. To begin with, the following property is established.
Property 1.
Strong duality holds for $\mathcal{P}^{[t]}$.
Proof.
To prove the strong duality of $\mathcal{P}^{[t]}$, it suffices to show that $\mathcal{P}^{[t]}$ is convex and satisfies Slater's condition [25]. Since the objective of $\mathcal{P}^{[t]}$ is convex quadratic and the constraints of $\mathcal{P}^{[t]}$ are linear in $\mathbf{w}$, $\mathcal{P}^{[t]}$ is convex. On the other hand, showing that $\mathcal{P}^{[t]}$ satisfies Slater's condition is equivalent to finding a feasible point of $\mathcal{P}^{[t]}$ satisfying all the constraints with strict inequality. To this end, consider the solution $\widehat{\mathbf{w}}=2\mathbf{w}^{[t]}$ with $\mathbf{w}^{[t]}$ feasible for $\mathcal{P}$, and it can be shown that $2\,\mathrm{Re}\{(\mathbf{w}^{[t]})^{H}\mathbf{h}_m\mathbf{h}_m^{H}\widehat{\mathbf{w}}\}-|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2}=3\,|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2}\geq 3\gamma\sigma_m^{2}>\gamma\sigma_m^{2}$ for all $m$. This completes the proof. ∎
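The Slater point above can also be checked numerically; the sketch below (NumPy only, illustrative data) verifies that $\widehat{\mathbf{w}}=2\mathbf{w}^{[t]}$ satisfies every linearized constraint of (2) strictly:

```python
import numpy as np

# Numerical sanity check of the Slater point (assumed sizes and random data).
rng = np.random.default_rng(2)
N, M, gamma = 32, 6, 1.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)

w_t = H.conj().sum(axis=0)
w_t *= np.sqrt(np.max(gamma * sigma2 / np.abs(H.conj() @ w_t) ** 2))  # feasible

alpha = H.conj() @ w_t                    # alpha_m = h_m^H w_t
w_hat = 2 * w_t
lhs = 2 * np.real(np.conj(alpha) * (H.conj() @ w_hat)) - np.abs(alpha) ** 2
# lhs equals 3|alpha_m|^2 >= 3*gamma*sigma_m^2 > gamma*sigma_m^2 for all m.
assert np.all(lhs > gamma * sigma2)
```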
Based on the result of Property 1, the dual problem of $\mathcal{P}^{[t]}$ must have the same optimal value as $\mathcal{P}^{[t]}$ [25]. Therefore, we propose to transform $\mathcal{P}^{[t]}$ into its Lagrangian dual domain, which gives the following proposition.
Proposition 1.
The dual problem of $\mathcal{P}^{[t]}$ is
(3)  $\mathcal{D}^{[t]}:\ \min_{\boldsymbol{\lambda}\geq\mathbf{0}}\ f(\boldsymbol{\lambda}),$
where $\boldsymbol{\lambda}=[\lambda_1,\ldots,\lambda_M]^{T}$ collects the dual variables of the $M$ constraints and
(4)  $f(\boldsymbol{\lambda})=\Big\|\sum_{m=1}^{M}\lambda_m\,\mathbf{h}_m\mathbf{h}_m^{H}\mathbf{w}^{[t]}\Big\|^{2}-\sum_{m=1}^{M}\lambda_m\big(\gamma\sigma_m^{2}+|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2}\big).$
Moreover, denoting the optimal solution of $\mathcal{D}^{[t]}$ as $\boldsymbol{\lambda}^{\star}$, the optimal $\mathbf{w}^{[t+1]}$ of $\mathcal{P}^{[t]}$ is
(5)  $\mathbf{w}^{[t+1]}=\sum_{m=1}^{M}\lambda_m^{\star}\,\mathbf{h}_m\mathbf{h}_m^{H}\mathbf{w}^{[t]}.$
Proof. For any fixed $\boldsymbol{\lambda}\geq\mathbf{0}$, the Lagrangian of $\mathcal{P}^{[t]}$ is convex quadratic in $\mathbf{w}$ and is minimized in closed form; setting its gradient with respect to $\mathbf{w}$ to zero yields (5), and substituting the minimizer back into the Lagrangian gives (3) and (4). ∎
Based on Proposition 1 and by defining
(7)  $\mathbf{A}=[\mathbf{a}_1,\ldots,\mathbf{a}_M]\in\mathbb{C}^{N\times M},\quad\mathbf{a}_m=\big(\mathbf{h}_m^{H}\mathbf{w}^{[t]}\big)\mathbf{h}_m,$
(8)  $\mathbf{c}=[c_1,\ldots,c_M]^{T},\quad c_m=\gamma\sigma_m^{2}+|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2},$
the function $f$ in (4) can be rewritten as $f(\boldsymbol{\lambda})=\boldsymbol{\lambda}^{T}\mathrm{Re}\{\mathbf{A}^{H}\mathbf{A}\}\boldsymbol{\lambda}-\mathbf{c}^{T}\boldsymbol{\lambda}$, which is quadratic. Therefore, FOMs (e.g., gradient descent, accelerated gradient projection, etc.) can be adopted by computing $\nabla f(\boldsymbol{\lambda})=2\,\mathrm{Re}\{\mathbf{A}^{H}(\mathbf{A}\boldsymbol{\lambda})\}-\mathbf{c}$ in each iteration, which requires a complexity of $\mathcal{O}(MN)$. However, in the following, a stronger property of $f$ will be established, thus allowing more efficient updates with per-iteration complexity smaller than $\mathcal{O}(MN)$.
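The quadratic rewrite and the $\mathcal{O}(MN)$ gradient can be verified with a few lines of NumPy (illustrative data; $\mathbf{A}$ and $\mathbf{c}$ are built exactly as in (7) and (8)):

```python
import numpy as np

# Check f(lam) = ||A lam||^2 - c^T lam = lam^T Re{A^H A} lam - c^T lam.
rng = np.random.default_rng(3)
N, M, gamma = 32, 6, 1.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)
w_t = H.conj().sum(axis=0)

alpha = H.conj() @ w_t                    # alpha_m = h_m^H w_t
A = (alpha[:, None] * H).T                # column a_m = alpha_m * h_m  (N x M)
c = gamma * sigma2 + np.abs(alpha) ** 2

lam = rng.random(M)
f1 = np.linalg.norm(A @ lam) ** 2 - c @ lam
f2 = lam @ np.real(A.conj().T @ A) @ lam - c @ lam
assert np.isclose(f1, f2)

# Full gradient: 2 Re{A^H (A lam)} - c, computed in O(MN) via two products.
grad = 2 * np.real(A.conj().T @ (A @ lam)) - c
```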
Property 2.
The function $f$ in (3) is coordinate-wise Lipschitz smooth, i.e., $|\nabla_m f(\boldsymbol{\lambda}+t\,\mathbf{e}_m)-\nabla_m f(\boldsymbol{\lambda})|\leq L_m|t|$ for any $\boldsymbol{\lambda}$, $t\in\mathbb{R}$, and $m$, with $\mathbf{e}_m$ being the $m^{\mathrm{th}}$ standard basis vector, where
(9)  $L_m=2\,\|\mathbf{a}_m\|^{2}=2\,|\mathbf{h}_m^{H}\mathbf{w}^{[t]}|^{2}\,\|\mathbf{h}_m\|^{2}.$
Proof. Since $f$ is quadratic, $\nabla_m f(\boldsymbol{\lambda}+t\,\mathbf{e}_m)-\nabla_m f(\boldsymbol{\lambda})=2\,[\mathrm{Re}\{\mathbf{A}^{H}\mathbf{A}\}]_{m,m}\,t=2\,\|\mathbf{a}_m\|^{2}\,t$, and taking the modulus gives the claim. ∎
Based on Property 2 and since the constraint $\boldsymbol{\lambda}\geq\mathbf{0}$ is separable (it can be rewritten as $\lambda_m\geq 0$ for all $m$), problem $\mathcal{D}^{[t]}$ can be solved by the coordinate descent method. In particular, at iteration $k$, we first choose a coordinate, say $m_k$, and then update
(12)  $\lambda_{m_k}^{(k+1)}=\max\Big\{0,\ \lambda_{m_k}^{(k)}-\frac{1}{L_{m_k}}\nabla_{m_k}f\big(\boldsymbol{\lambda}^{(k)}\big)\Big\},\qquad\lambda_{m}^{(k+1)}=\lambda_{m}^{(k)}\ \ \forall m\neq m_k,$
where the coordinate-wise gradient is
(13)  $\nabla_{m_k}f\big(\boldsymbol{\lambda}^{(k)}\big)=2\,\mathrm{Re}\big\{\mathbf{a}_{m_k}^{H}\big(\mathbf{A}\boldsymbol{\lambda}^{(k)}\big)\big\}-c_{m_k}.$
It can be seen from (12) and (13) that, by caching the vector $\mathbf{A}\boldsymbol{\lambda}^{(k)}$ and refreshing it after each coordinate change, the complexity of the update (12) is only $\mathcal{O}(N)$.
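A minimal NumPy sketch of the resulting coordinate-descent loop is given below (illustrative data); the key point is that the cached vector $\mathbf{u}=\mathbf{A}\boldsymbol{\lambda}$ makes both the coordinate gradient (13) and the cache refresh $\mathcal{O}(N)$:

```python
import numpy as np

# RCD on the dual (3): one random coordinate per iteration, O(N) per update.
rng = np.random.default_rng(4)
N, M, gamma = 64, 16, 1.0
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)
w_t = H.conj().sum(axis=0)

alpha = H.conj() @ w_t
A = (alpha[:, None] * H).T                   # columns a_m = alpha_m * h_m
c = gamma * sigma2 + np.abs(alpha) ** 2
L = 2 * np.linalg.norm(A, axis=0) ** 2       # coordinate-wise constants (9)

lam = np.zeros(M)
u = A @ lam                                  # cached u = A @ lam
for k in range(5000):
    m = rng.integers(M)                      # uniform random coordinate
    grad_m = 2 * np.real(np.vdot(A[:, m], u)) - c[m]   # (13), O(N)
    lam_new = max(0.0, lam[m] - grad_m / L[m])         # (12)
    u += (lam_new - lam[m]) * A[:, m]        # O(N) cache refresh
    lam[m] = lam_new
w_next = A @ lam                             # primal recovery via (5)
```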
The next question is how to choose the index $m_k$. A straightforward idea is to choose $m_k$ in a cyclic manner, i.e., $m_k=(k\bmod M)+1$. But this may lead to slow convergence of the algorithm [14]. A better way is to choose $m_k$ as a random integer ranging from $1$ to $M$ with equal probability, i.e., $\mathbb{P}[m_k=m]=1/M$ for all $m$. With such a random schedule and the coordinate update in (12), the sequence $\{\boldsymbol{\lambda}^{(k)}\}$ converges to the optimal $\boldsymbol{\lambda}^{\star}$ of $\mathcal{D}^{[t]}$ with a convergence rate of $\mathcal{O}(1/k)$ [16, 17]. This is the so-called RCD method.
However, the RCD method can still be improved in the following two ways. First, the update in (12) is sequential, which means that the next iteration must wait until the current iteration is completed. If a block of coordinates is updated in parallel, the running time of RCD can be further reduced [17, 18]. Second, there is a gap between the $\mathcal{O}(1/k)$ convergence rate of RCD and the best known $\mathcal{O}(1/k^2)$ convergence rate [26] for solving smooth problems. This indicates that we can add momentum to accelerate the convergence of RCD [19].
Based on the above observations, the following ARCD is adopted for solving $\mathcal{D}^{[t]}$. More specifically, at the $k^{\mathrm{th}}$ iteration, instead of generating a single random index $m_k$, we generate a random set $\mathcal{S}^{(k)}\subseteq\{1,\ldots,M\}$ with
(14)  $|\mathcal{S}^{(k)}|=\tau,\qquad\mathbb{P}\big[m\in\mathcal{S}^{(k)}\big]=\frac{\tau}{M}\ \ \forall m,$
and update all the coordinates in $\mathcal{S}^{(k)}$ as
(15)  $\lambda_{m}^{(k+1)}=\mu_{m}^{(k)}+\frac{M\theta_k}{\tau}\big(z_{m}^{(k+1)}-z_{m}^{(k)}\big),\quad z_{m}^{(k+1)}=\begin{cases}\max\Big\{0,\ z_{m}^{(k)}-\frac{\tau}{M\theta_k v_m}\nabla_{m}f\big(\boldsymbol{\mu}^{(k)}\big)\Big\}, & m\in\mathcal{S}^{(k)},\\ z_{m}^{(k)}, & m\notin\mathcal{S}^{(k)},\end{cases}$
where the momentum $\boldsymbol{\mu}^{(k)}=(1-\theta_k)\boldsymbol{\lambda}^{(k)}+\theta_k\mathbf{z}^{(k)}$ is added, with $\mathbf{z}^{(k)}$ being the momentum direction at the $k^{\mathrm{th}}$ iteration and $\theta_k$ being the step-size that controls the importance of $\mathbf{z}^{(k)}$.
It can be seen from (15) that different coordinates in $\mathcal{S}^{(k)}$ are updated in parallel. On the other hand, by choosing
(16)  $\theta_{k+1}=\frac{\sqrt{\theta_k^{4}+4\theta_k^{2}}-\theta_k^{2}}{2},$
(17)  $v_m=\tau L_m\ \ \forall m,$
where (17) is a conservative choice satisfying the expected separable overapproximation condition required by the analysis in [19], and with the initial $\theta_0=\tau/M$ and $\mathbf{z}^{(0)}=\boldsymbol{\lambda}^{(0)}$, the sequence $\{\boldsymbol{\lambda}^{(k)}\}$ computed using (15) is guaranteed to converge to the optimal solution of $\mathcal{D}^{[t]}$ with an $\mathcal{O}(1/k^{2})$ convergence rate [19, Theorem 3]. Since $\mathcal{D}^{[t]}$ is equivalent to $\mathcal{P}^{[t]}$, and $\mathcal{P}^{[t]}$ represents a surrogate problem for $\mathcal{P}$, problem $\mathcal{P}$ can be solved via MM and ARCD in the Lagrangian dual domain. The entire procedure is summarized in Algorithm 1.
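The following sketch implements the ARCD updates (14)-(17) in NumPy, following the APPROX recursions of [19]; it is an assumed reconstruction for illustration rather than a verbatim transcription of Algorithm 1 (for clarity, the gradient is recomputed from scratch at $\mathcal{O}(MN)$ instead of being cached incrementally):

```python
import numpy as np

# ARCD (APPROX-style [19]) on the dual (3): tau parallel coordinate updates.
rng = np.random.default_rng(5)
N, M, gamma, tau = 64, 16, 1.0, 4
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sigma2 = np.full(M, 1e-2)
w_t = H.conj().sum(axis=0)

alpha = H.conj() @ w_t
A = (alpha[:, None] * H).T
c = gamma * sigma2 + np.abs(alpha) ** 2
L = 2 * np.linalg.norm(A, axis=0) ** 2
v = tau * L                                   # conservative constants (17)

lam = np.zeros(M)                             # main sequence lambda^(k)
z = lam.copy()                                # momentum sequence z^(k)
theta = tau / M                               # theta_0
for k in range(2000):
    mu = (1 - theta) * lam + theta * z        # momentum-averaged point
    S = rng.choice(M, size=tau, replace=False)             # sampling (14)
    grad_S = 2 * np.real(A[:, S].conj().T @ (A @ mu)) - c[S]
    z_new = z.copy()
    z_new[S] = np.maximum(0.0, z[S] - tau * grad_S / (M * theta * v[S]))
    lam = mu + (M * theta / tau) * (z_new - z)             # update (15)
    z = z_new
    theta = (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2   # (16)

w_next = A @ lam                              # primal recovery via (5)
```

In a production implementation, the products $\mathbf{A}\boldsymbol{\lambda}^{(k)}$ and $\mathbf{A}\mathbf{z}^{(k)}$ would be maintained incrementally so that each iteration costs roughly $\mathcal{O}(\tau N)$.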
4 Simulation Results
This section presents simulation results to verify the performance of the proposed ARCD. In particular, each random channel $\mathbf{h}_m$ is generated according to [24], and the noise power accounts for both thermal noise and receiver noise [27]. Each point in the figures is obtained by averaging over independent simulation runs, with independent channels between consecutive runs. All problem instances are solved by Matlab R2015b on a desktop with an Intel Core i5-4570 CPU at 3.2 GHz and 8 GB RAM. For comparison, we also simulate: (i) the MM-IPM [23], implemented using the Matlab package CVX with the Mosek solver [25]; (ii) the ADMM [12, 13], implemented by introducing slack variables [12] and adding a consensus penalty with the corresponding dual variables [13]; and (iii) the asymptotic method [1], which is derived under the assumption $N\to\infty$ and is used as the initialization for all the other simulated methods.
To evaluate the solution quality and the running time of Algorithm 1, we simulate the case of a large number of users. Notice that the case of large $M$ is very important for future crowd sensing applications. For the methods based on MM, the number of MM iterations is set as in [23, 12, 13]. Furthermore, the ADMM for solving $\mathcal{P}^{[t]}$ stops when the change of the objective function between consecutive iterations is smaller than a predefined threshold [12]. If the ADMM fails to reach this condition within the maximum number of iterations, it stops and outputs the current result [12, 13]. It can be observed from Fig. 1a that Algorithm 1 significantly outperforms the asymptotic solution [1] and slightly outperforms the ADMM [12, 13]. In fact, Algorithm 1 achieves the same power consumption as the MM-IPM [23]. However, as illustrated in Fig. 1b, Algorithm 1 requires only a short running time for all the simulated values of $M$: compared to the MM-IPM and the ADMM, it saves at least one order of magnitude of computation time.
5 Conclusions
This paper studied the massive MIMO multicast beamforming optimization problem. With majorization minimization and strong duality, the primal problem was transformed into a sequence of coordinate-wise Lipschitz smooth problems with separable constraints. By further adopting ARCD, a lower per-iteration complexity than that of existing algorithms was achieved. Simulation results showed that the proposed method reduces the execution time by one order of magnitude compared to existing methods while guaranteeing the same performance.
References
[1] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, Apr. 2013.
[2] E. G. Larsson, F. Tufvesson, O. Edfors, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[3] L. Cheng, Y.-C. Wu, J. Zhang, and L. Liu, “Subspace identification for DOA estimation in massive/full-dimension MIMO systems: Bad data mitigation and automatic source enumeration,” IEEE Trans. Signal Process., vol. 63, no. 22, pp. 5897–5909, Nov. 2015.
[4] C. Guo, Y. Cui, D. W. K. Ng, and Z. Liu, “Multi-quality multicast beamforming based on scalable video coding,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5662–5677, Nov. 2018.
[5] S. Yu, R. Langar, X. Fu, L. Wang, and Z. Han, “Computation offloading with data caching enhancement for mobile edge computing,” IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 11098–11112, Nov. 2018.
[6] N. D. Sidiropoulos, T. N. Davidson, and Z.-Q. Luo, “Transmit beamforming for physical-layer multicasting,” IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2239–2251, Jun. 2006.
[7] A. Lozano, “Long-term transmit beamforming for wireless multicasting,” in Proc. IEEE ICASSP, Apr. 2007, pp. III-417–III-420.
[8] E. Matskani, N. D. Sidiropoulos, and Z.-Q. Luo, “Efficient batch and adaptive approximation algorithms for joint multicast beamforming and admission control,” IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4882–4894, Dec. 2009.
[9] M. Sadeghi, E. Björnson, E. G. Larsson, C. Yuen, and T. L. Marzetta, “Max–min fair transmit precoding for multi-group multicasting in massive MIMO,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1358–1373, Feb. 2018.
[10] S. Wang, M. Xia, and Y.-C. Wu, “Multi-pair two-way relay network with harvest-then-transmit users: Resolving pairwise uplink-downlink coupling,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 8, pp. 1506–1521, Dec. 2016.
[11] S. Bubeck, “Convex optimization: Algorithms and complexity,” Found. Trends Mach. Learn., vol. 8, no. 3-4, pp. 231–357, 2015.
[12] K. Huang and N. D. Sidiropoulos, “Consensus-ADMM for general quadratically constrained quadratic programming,” IEEE Trans. Signal Process., vol. 64, no. 20, pp. 5297–5310, Oct. 2016.
[13] E. Chen and M. Tao, “ADMM-based fast algorithm for multi-group multicast beamforming in large-scale wireless systems,” IEEE Trans. Commun., vol. 65, no. 6, pp. 2685–2698, Jun. 2017.
[14] S. J. Wright, “Coordinate descent algorithms,” Math. Program., vol. 151, no. 1, pp. 3–34, 2015.
[15] A. Beck and L. Tetruashvili, “On the convergence of block coordinate descent type methods,” SIAM J. Optim., vol. 23, no. 4, pp. 2037–2060, Oct. 2013.
[16] Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM J. Optim., vol. 22, no. 2, pp. 341–362, Apr. 2012.
[17] P. Richtárik and M. Takáč, “Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function,” Math. Program., vol. 144, no. 1-2, pp. 1–38, 2014.
[18] P. Richtárik and M. Takáč, “Parallel coordinate descent methods for big data optimization,” Math. Program., vol. 156, no. 1, pp. 433–484, Mar. 2016.
[19] O. Fercoq and P. Richtárik, “Accelerated, parallel, and proximal coordinate descent,” SIAM J. Optim., vol. 25, no. 4, pp. 1997–2023, Oct. 2015.
[20] Q. Lin, Z. Lu, and L. Xiao, “An accelerated proximal coordinate gradient method,” in Advances in Neural Information Processing Systems, 2014, pp. 3059–3067.
[21] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
[22] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization (MPS/SIAM Series on Optimization). Philadelphia, PA, USA: SIAM, 2013.
[23] L. N. Tran, M. F. Hanif, and M. Juntti, “A conic quadratic programming approach to physical layer multicasting for large-scale antenna arrays,” IEEE Signal Process. Lett., vol. 21, no. 1, pp. 114–117, Jan. 2014.
[24] S. Wang, M. Xia, K. Huang, and Y.-C. Wu, “Wirelessly powered two-way communication with nonlinear energy harvesting model: Rate regions under fixed and mobile relay,” IEEE Trans. Wireless Commun., vol. 16, no. 12, pp. 8190–8204, Dec. 2017.
[25] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[26] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.
[27] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2005.