I Introduction
The improvements in spectral efficiency, throughput and quality-of-service achieved by utilizing multi-antenna networks have been extensively documented in the literature [bolcskei2006mimo], [hoydis2013massive]. In particular, optimizing the resource allocation in such multi-antenna networks, by designing beamforming weights and scheduling specific users from the larger pool of potential users, is central to fully exploiting the finite wireless resources available and maximizing spectral efficiency [bjornson2013optimal], [dahrouj2010coordinated]. However, designing efficient resource allocation schemes remains challenging, since many utility functions of practical interest, such as sum-rate, sum-log-utility and min-rate, are inherently non-convex functions of transmit powers; in fact, the associated optimization problems for each of these objective functions have been found to be NP-hard [luo2008dynamic]. Thus, solving these optimization problems to global optimality entails impractical computational complexity even for very small network sizes [liu2012achieving].
One solution to these problems is to utilize single-cell schemes, such as zero-forcing or matched filtering, in which base stations ignore inter-cell interference when designing beamforming weights and making scheduling decisions [suh2011downlink]. While far from globally optimal, such uncoordinated schemes offer three significant advantages: first, they are analytically tractable in the sense that they can be analyzed using tools from probability theory and stochastic geometry to yield accurate estimates of the data rates achieved by users (and hence objective functions like the network sum-rate) [hosseini2018optimizing]; second, these schemes are computationally efficient, especially compared to globally optimal techniques or iterative block coordinate descent based algorithms like weighted minimum mean squared error (WMMSE) processing [shi2011iteratively]; third, and likely most important, in these schemes each base station (BS) requires channel state information (CSI) from only its own users, not from users in other cells^{1}. As such, these uncoordinated schemes offer a useful benchmark against which to evaluate the performance of more sophisticated resource allocation strategies.
^{1} We note that, like other works that focus on algorithm development [luo2008dynamic, shi2011iteratively], the acquisition of CSI, its overhead and its quality are beyond the scope of this paper. However, we do acknowledge that this is a vitally important problem in wireless networks.
Coordinated resource allocation schemes, in which base stations jointly design their scheduling and beamforming decisions, improve on uncoordinated schemes. Such joint design leads to improved quality-of-service since it helps to mitigate the effects of both inter-cell and intra-cell interference [an2017achieving]. In doing so, however, such schemes inevitably incur increased computational complexity, as compared to uncoordinated schemes, since these algorithms optimize across multiple BSs. Since the objective functions for most utility maximization problems are non-convex, such schemes typically rely on block coordinate ascent [shi2011iteratively], successive convex approximation [weeraddana2012weighted] or other heuristic methods [park2018iterative], [douik2016coordinated] to reach, at best, a local optimum.
A number of coordinated schemes have been developed in the literature; for example, in [yu2013multicell] the authors develop an interference pricing and greedy proportionally fair (PF) scheduling algorithm to maximize the weighted sum rate (WSR) for the downlink. The proposed scheme demonstrates excellent performance in terms of average sum-log-utility but is not guaranteed to be non-decreasing in the objective function since greedy scheduling is used. In [weeraddana2012weighted], Weeraddana et al. propose an algorithm based on the successive convex approximation approach to optimize the needed beamforming weights and power allocation in order to solve the general WSR maximization problem for the downlink of a multiple-input multiple-output (MIMO) cellular network. The algorithm requires minimal exchange of information between cooperating BSs; however, the algorithm is also shown to underperform the WMMSE scheme of [shi2011iteratively]. Another alternative is to employ worst-case weighted sum-rate maximization [chinnadurai2018worst], although such an approach is generally better suited to settings with uncertainty in channel vectors.
The work in [shi2011iteratively] develops the WMMSE algorithm by demonstrating the equivalence of minimizing the weighted MSE and maximizing the WSR and adopting a block coordinate descent strategy to reach a (guaranteed) local optimum of the original WSR objective function. The algorithm iterates between obtaining beamforming weights and a set of auxiliary variables, optimizing one while the other is kept fixed. This WMMSE approach demonstrates excellent performance and is, thus, widely utilized as a benchmark against which the performance of other coordinated resource allocation schemes is compared.
The work in [shi2011iteratively] does not address the important problem of user scheduling, i.e., choosing a set of users to serve from the larger set of available users. One solution is to use the multi-cell WMMSE scheme, in which all users across all cells are scheduled and the BSs then jointly design beamforming weights for each and every user, as in [shi2011iteratively]. Eventually, after a number of iterations, the power assigned (the norm of the beamforming vectors) to most users will be essentially zero and these users are, then, implicitly not scheduled. However, since this set of “unscheduled” users is unknown a priori, beamforming weights have to be calculated for all users in the network in each iteration of the algorithm. This is extremely computationally expensive since a large matrix needs to be inverted in each step. Furthermore, it is worth emphasizing that the multi-cell WMMSE scheme, as described, is not globally optimal. Additionally, as the authors in [shen2018fractional] have observed, when scheduling all users, the WMMSE algorithm tends to get stuck in a low-quality locally optimal solution. Despite these drawbacks, the WMMSE algorithm remains the benchmark against which other resource allocation algorithms are evaluated [shen2018fractional2, kaleva2016decentralized, li2015new].
A lower-complexity approach is to alternately optimize the scheduling and beamforming variables in a fashion similar to that proposed by [douik2018joint] and [zhang2017sum], by alternately utilizing the WMMSE algorithm for beamforming and updating the scheduling decisions using the greedy PF scheme. However, because of the greedy step, this approach is also not guaranteed to be non-decreasing in the original WSR objective function.
Globally optimal schemes to solve sum-rate and weighted-sum-rate optimization problems have also been formulated in the literature, using the framework of monotonic optimization [liu2012achieving, bjornson2013optimal, brehmer2012utility, utschick2012monotonic], as well as geometric and arithmetic-mean methods [roshandeh2018exact]. For example, the authors in [liu2012achieving] develop an algorithm to find the globally optimal beamformers to maximize the WSR for the downlink of a multi-user multi-antenna network. However, as the authors in [brehmer2012utility] note, these globally optimal schemes require impractical computational complexity for even small systems, and are thus used almost exclusively as benchmarks for very small network sizes.
It is worth emphasizing that extensive CSI exchange between BSs and substantial computational resources are required in order to enable both locally and globally optimal resource management schemes. These requirements are best served by the utilization of cloud radio access networks (C-RANs), which allow for flexible deployment of resource allocation algorithms and on-demand processing while utilizing relatively inexpensive hardware at the BS [quek2017cloud, peng2016recent, peng2014energy, qian2015baseband, gerasimenko2015cooperative]. Deploying dedicated hardware at the BS level to implement individual algorithms is both technically challenging and cost-ineffective [quek2017cloud, gerasimenko2015cooperative]; on the other hand, through the use of C-RAN, low-cost remote radio heads can be utilized at each BS, while virtualized baseband processing units for the entire network can be implemented in the cloud and easily altered to enable different resource management schemes and capabilities [qian2015baseband, peng2014energy]. Thus, the C-RAN architecture is necessary in order to enable coordinated resource allocation and is stated explicitly or assumed implicitly in the various coordinated schemes detailed in the literature [weeraddana2012weighted].
In summary, effective multi-cell resource allocation schemes with relatively low complexity are, as yet, not available. It is this gap in the literature that we address here. Specifically, we develop an iterative scheduling and beamforming strategy to find an effective solution to the problem of maximizing the average sum-log-utility function for the downlink of a multi-user MIMO network. Using the framework of fractional programming, originally developed in [shen2018fractional2] and [shen2018fractional] for uplink problems and extended to the matrix setting in [shen2018coordinated], we derive a scheme similar to a block coordinate ascent scheme. In [shen2018fractional2], fractional programming has shown large performance benefits for utility maximization in the uplink setting. Similarly, in [zhang2018energy], the authors utilize fractional programming to jointly optimize power control and scheduling decisions for energy efficiency maximization; the proposed algorithm can be implemented in both distributed and centralized fashion and provides excellent convergence and performance properties. In both these scenarios, the interference pattern changes with the scheduling decisions; thus, optimization across multiple cells can provide considerable benefit. This paper demonstrates the efficacy of fractional programming for the downlink, where we exploit the fixed interference pattern to improve performance and reduce computational complexity in the user scheduling step.
In contrast to our conference-length work in [khan2018optimizing], this paper considers the most general setting of the problem: we derive the algorithm and present results for proportionally fair WSR and sum-rate maximization through scheduling and beamforming across multiple frequency bands with both joint and decoupled power constraints across the bands. Deriving the algorithm for this setting is considerably more challenging than the single-band case considered in [khan2018optimizing]; this is especially true for the joint power allocation across multiple bands in which, despite the orthogonality of the bands, all beamforming weights and scheduling decisions become coupled due to the power constraint. Nonetheless, we demonstrate that fractional programming allows us to decouple these optimization variables and solve for an effective solution with guaranteed non-decreasing convergence. Specifically, the contributions of this paper are:

We formulate the downlink sum-log-utility maximization problem as a WSR problem in the general case of multiple interfering cells, multiple frequency bands, and a large number of potential users per cell.

We develop a joint beamforming and user scheduling algorithm based on fractional programming and the Hungarian algorithm. The Hungarian algorithm selects the optimal set of users from the much larger pool of potential users, for a given set of beamforming weights, in polynomial time. The development of these two aspects of our overall algorithm is our key contribution, as the scheduling step allows us to reach an effective solution while simultaneously reducing computational complexity.

We compare the performance of joint power allocation across all frequency bands with the simpler case in which power constraints are decoupled across bands. We show that the simpler approach, in fact, suffers little performance loss.

We show that each iteration of the proposed algorithm leads to non-decreasing objective function values; the overall algorithm outperforms several competing approaches, including the state-of-the-art multi-cell WMMSE, as well as standard interior-point and sequential quadratic programming solvers widely utilized in the literature, with significantly lower computational complexity.

Our proposed algorithm outperforms the aforementioned competing state-of-the-art approaches over a wide range of BS maximum transmit power values.
This paper is organized as follows: In Section II, we present our system model and formulate the desired optimization problem. In Section III, we describe the proposed solution approach in detail, while also presenting a proof for its convergence. In Section IV, we present the results and compare the performance and computational complexity of the proposed scheme against the benchmarks described previously. We draw some conclusions in Section V.
Prior to proceeding further, we define some notation used in this paper. , and represent the set of real numbers, nonnegative real numbers and positive numbers respectively. We denote scalars using lowercase (e.g. ), vectors using lowercase boldface (e.g. ), matrices using uppercase boldface (e.g. ) and sets using script typeface (e.g. ). The operator denotes the absolute value when applied to a scalar and cardinality when applied to a set; we use to denote the norm of a vector. The conjugate of a complex scalar is denoted by ; the Hermitian of a complex vector is denoted by . Likewise, represents the set of complex numbers. A complex multivariate normal distribution with mean and covariance matrix is denoted by . Finally, represents the identity matrix.
II System Model and Problem Formulation
We consider the downlink of a wireless cellular network, with base stations located in a regular hexagonal pattern; we denote the set of BSs in the network by . Each user associates with the geographically closest BS, with users associating with the base station; under the hexagonal grid layout of the BSs, this leads to identically sized hexagonal cells. We choose this hexagonal pattern purely for convenience; the derivations and algorithms that follow are applicable to any distribution of BSs in a multi-cell network. We assume that there are a total of orthogonal frequency bands, each of bandwidth , available for transmission to each base station. Each BS is equipped with transmit antennas which are capable of simultaneously transmitting on all available frequency bands; each user is equipped with a single receive antenna capable of simultaneously receiving signals on all available frequency bands. Furthermore, we also assume that the number of users associated with each BS significantly exceeds the number of transmit antennas available at the base station (i.e., for all ). Figure 1 illustrates the system at hand with hexagonal cells.
Prior to stating the channel model, it is important to emphasize that this paper focuses on analyzing the best system-level performance. In this regard, we make two important assumptions that are common to the papers in this area [yu2013multicell], [shen2018coordinated]. First, we assume that all the BSs have access to perfect CSI of all their associated users on all frequency bands. Second, we assume that all the BSs are connected via high-speed backhaul links to a central cloud server that is capable of performing system-level optimization based on the CSI received from each of the BSs, and relaying back the beamforming weight vectors and scheduling information; thus our system model falls within the general realm of C-RAN. This is necessary for a coordinated transmission strategy to achieve the best system-level performance. The downlink channel from the base station to the user associated with the BS on the frequency band is a complex vector denoted by . As stated earlier, each BS serves only the set of users associated with it. Accordingly, the beamforming weight vector for the user associated with the BS on the frequency band is a complex vector denoted as .
In each time slot, each base station schedules a subset of its associated users. Specifically, we impose the constraint that each base station can schedule no more than users per time slot on each available frequency band. The binary variable is used to indicate whether the user associated with the BS is scheduled () or not () on the frequency band. The symbol intended for the user associated with the BS on the frequency band is a complex scalar denoted by . It follows that the received downlink signal at the user associated with the base station on the frequency band is given by
(1)
where denotes the additive zero-mean Gaussian noise with variance . Thus, the signal-to-interference-plus-noise ratio (SINR) for the user under consideration on the frequency band is given by
Consequently, the data rate to the user on the frequency band is given by . Recalling that we have a total of frequency bands available to serve the user, it is clear that the combined data rate to the user in a time slot, denoted by , is given by .
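As an illustrative sketch of the SINR and rate expressions described above, the Python snippet below evaluates them for a single frequency band. The data layout (`h[j][i][k]`, `v[j][l]`, `s[j][l]`), the function names and the base-2 logarithm scaled by the bandwidth are assumptions made for this example and are not taken from the paper.

```python
import math


def inner(h, v):
    """Hermitian inner product h^H v of two complex vectors (lists)."""
    return sum(a.conjugate() * b for a, b in zip(h, v))


def sinr(h, v, s, i, k, noise_var):
    """SINR of user k in cell i on one band (hypothetical data layout).

    h[j][i][k]: channel from BS j to user k of cell i
    v[j][l]:    beamformer used by BS j for its own user l
    s[j][l]:    1 if user l of cell j is scheduled, otherwise 0
    """
    signal = s[i][k] * abs(inner(h[i][i][k], v[i][k])) ** 2
    # Power received from every scheduled beam in the network.
    received = sum(
        s[j][l] * abs(inner(h[j][i][k], v[j][l])) ** 2
        for j in range(len(v))
        for l in range(len(v[j]))
    )
    return signal / (received - signal + noise_var)


def rate(sinr_value, bandwidth=1.0):
    """Data rate on one band; the log base is an assumption of this sketch."""
    return bandwidth * math.log2(1.0 + sinr_value)


# Two cells, one user each, two antennas per BS.
h = [
    [[[1 + 0j, 0j]], [[0.1 + 0j, 0j]]],  # channels from BS 0
    [[[0.1 + 0j, 0j]], [[1 + 0j, 0j]]],  # channels from BS 1
]
v = [[[1 + 0j, 0j]], [[1 + 0j, 0j]]]
s = [[1], [1]]
gamma = sinr(h, v, s, i=0, k=0, noise_var=0.01)
print(gamma, rate(gamma))
```

Summing `rate` over the bands would give the combined per-slot rate of a user. Setting `s[1][0] = 0` removes the interfering beam and raises the SINR of the remaining user.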
Now, we formulate the resource allocation problem. Our choice of the objective function is the network WSR.
The maximization of the weighted sum rate with an appropriate choice of weights in each time slot can lead to approximate maximization of arbitrary network utility functions. A popular network utility function is the sum of the logarithm of the long-term average data rates achieved by each of the users. This leads to a proportionally fair allocation of resources amongst all users in the network.
We are interested in answering the following question: given a network as described above, to maximize the WSR in each time slot which subset of users should each BS serve, in which frequency band, at what power level and with what beamformer design? The optimization problem that encapsulates this question for a single time slot can be expressed as
(2a)  
(2b)  
(2c)  
(2d)  
(2e) 
Here, represents the weight for the user associated with the base station^{2}, while and denote the optimization variables gathered into a matrix: the scheduling variables () and the beamformers (). For simplicity of notation, we drop the index denoting the time slot in the formulation of the optimization problem in (2). The SINR of the user associated with the BS on the frequency band is denoted by .
^{2} The proportionally fair weight for the time slot is usually determined by finding the inverse of the long-term average data rate achieved by the user in question over an exponentially decaying window [yu2010book], i.e., , where represents the exponentially weighted average data rate achieved by the user in the time slots preceding the time slot across all frequency bands. This is calculated using the update equation , where represents the total data rate achieved in the time slot across all bands, and represents the forgetting factor.
Our objective function is the network WSR for a single time slot as expressed in (2a). The constraint in (2b) enforces the scheduling decisions by the BS to be binary; a user is scheduled if its scheduling variable equals one, and vice versa. The second set of constraints in (2c) ensures that a BS is restricted to serving a subset of at most M users from the set of all associated users on each available frequency band. The constraints in (2d) impose a total transmit power constraint at each BS across the different frequency bands (and thus an average power constraint of across each individual frequency band). Finally, the equality constraints in (2e) enforce the SINR values at each user, BS and frequency band.
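The proportionally fair weights mentioned in the footnote on the weights can be sketched as follows. The exact update equation is not reproduced here, so the snippet assumes the common exponentially weighted form, new average = (1 - alpha) * old average + alpha * slot rate; the function name, the default forgetting factor and the small floor that guards against division by zero are hypothetical choices for illustration.

```python
def update_pf_weights(avg_rates, slot_rates, forgetting=0.05, floor=1e-9):
    """One proportionally fair update step (assumed averaging form).

    avg_rates:  exponentially weighted average rate of each user so far
    slot_rates: total rate each user achieved in the slot just finished
    Returns the new averages and the weights for the next slot.
    """
    new_avg = [(1.0 - forgetting) * r_bar + forgetting * r
               for r_bar, r in zip(avg_rates, slot_rates)]
    # Weight = inverse of the average rate; the floor avoids division by zero.
    weights = [1.0 / max(r_bar, floor) for r_bar in new_avg]
    return new_avg, weights


avg = [1.0, 1.0, 1.0]
for _ in range(50):
    avg, weights = update_pf_weights(avg, [4.0, 1.0, 0.0])
print(avg, weights)
```

In this toy run the third user is never served, so its average rate decays and its weight grows, which is the mechanism that pushes the WSR maximization toward a proportionally fair allocation.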
III Proposed Approach
We note that the optimization problem in (2) has a mixed-binary integer form and is non-convex in the beamforming variables. In fact, as stated earlier, the general WSR maximization problem has been shown to be NP-hard by Luo and Zhang in [luo2008dynamic]. We also note that the beamforming and scheduling variables are coupled across the different frequency bands due to the sum-power constraint in (2d), even though the bands themselves are orthogonal and thus non-interfering. To solve this problem and obtain an effective solution, we adopt an iterative optimization strategy, based on fractional programming as developed by Shen and Yu in [shen2018fractional] and [shen2018fractional2].
We begin by introducing Lagrange multipliers for each of the equality constraints in (2e), in a similar fashion to [shen2018fractional2], as follows
(3) 
where the represent the Lagrange multipliers for each of the equality constraints in (2e). For notational clarity, the SINR auxiliary variables and Lagrange multipliers are collected in the matrices and respectively. Consequently, in order to satisfy the first-order condition with respect to the values, we set the partial derivative of the Lagrangian with respect to these values equal to zero, i.e.,
(4) 
Now substituting (4) into the equality constraint (2e), we then obtain the optimal Lagrange multipliers as
(5) 
Substituting the optimal Lagrange multipliers from (5) into the expression for the Lagrangian in (3), we obtain the following reformulated objective function, which we denote by .
(6) 
Thus, it follows that the original optimization problem in (2) can be expressed as the following reformulated optimization problem
(7a)  
(7b)  
(7c)  
(7d) 
We note that the reformulated optimization problem in (7) is equivalent to the original optimization problem in (2) in the sense that the optimal objective function value and the associated primal optimization variables, and , for both problems are identical.
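The equivalence can be checked numerically for a single user. The sketch below assumes that, once the optimal multipliers are substituted back, each user's term has the form w (log(1 + gamma) - gamma + (1 + gamma) S / (S + I)), with S the received signal power and I the interference-plus-noise power, as in the Lagrangian dual transform of [shen2018fractional2]. The natural logarithm and all variable names are assumptions of this example.

```python
import math


def reformulated_term(gamma, signal, interference_plus_noise, weight=1.0):
    """One user's term after substituting the multipliers (assumed form)."""
    ratio = signal / (signal + interference_plus_noise)
    return weight * (math.log(1.0 + gamma) - gamma + (1.0 + gamma) * ratio)


signal, ipn = 3.0, 1.5
sinr = signal / ipn
grid = [g / 1000.0 for g in range(10001)]  # candidate gamma values in [0, 10]
best_gamma = max(grid, key=lambda g: reformulated_term(g, signal, ipn))
best_value = reformulated_term(best_gamma, signal, ipn)
print(best_gamma, best_value, math.log(1.0 + sinr))
```

Under these assumptions the maximizing auxiliary variable coincides with the true SINR, and the maximum of the reformulated term equals the original log(1 + SINR) term.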
A key point to note before we proceed further is that the ratio terms which were present inside the logarithm function in problem (2) have now been moved outside as the sum-of-ratios term in the reformulated objective function. This is the first step in allowing us to develop an iterative optimization strategy. In order to proceed further, we make use of the following theorem, a vector-valued version of that derived by Shen and Yu in [shen2018fractional2]:
Theorem 1.
Let and , where be two functions of the optimization variables and . Furthermore, let be a constraint set. Then the sum-of-ratios optimization problem
(8) 
is equivalent to the following reformulated optimization problem
(9) 
in the sense that the optimal values of the objective function and primal optimization variables are identical. Note that the vector is an auxiliary variable.
Proof.
Applying Theorem 1 to the sum-of-ratios term of the reformulated objective function in (6), we obtain the following new objective function
(10) 
and accordingly, (7) can be expressed as the equivalent optimization problem below
(11a)  
(11b)  
(11c)  
(11d) 
Once again, to avoid excessive notational clutter, we collect the values in the matrix .
From Theorem 1, it follows that the optimization problem in (11) is also equivalent to the original optimization problem in (2) in the sense that the optimal objective function and the associated primal optimization variables, and , for both problems are identical.
We emphasize that both of the reformulated problems in (7) and (11) remain non-convex and NP-hard (like the original problem) since our reformulation steps result in equivalent problems. Crucially, however, the new objective functions are now in a form amenable to an iterative optimization strategy leading to an effective solution to our original optimization problem in (2).
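The identity underlying Theorem 1 can be illustrated in the simplest single-ratio case with a complex scalar numerator a and a positive denominator b, where |a|^2 / b equals the maximum over y of 2 Re(conj(y) a) - |y|^2 b, attained at y = a / b. The snippet below is a numerical sanity check of this special case only, with hypothetical values and names; it is not the general vector-valued statement.

```python
import random


def quadratic_transform(y, a, b):
    """Decoupled objective 2 Re(conj(y) a) - |y|^2 b for one ratio."""
    return 2.0 * (y.conjugate() * a).real - abs(y) ** 2 * b


a, b = 1.2 - 0.7j, 2.5
ratio = abs(a) ** 2 / b          # the original ratio |a|^2 / b
y_star = a / b                   # closed-form optimal auxiliary variable

rng = random.Random(0)
perturbed = [
    quadratic_transform(
        y_star + complex(rng.uniform(-1, 1), rng.uniform(-1, 1)), a, b)
    for _ in range(1000)
]
print(ratio, quadratic_transform(y_star, a, b), max(perturbed))
```

Every perturbed auxiliary value yields an objective no larger than the ratio itself, which is why alternating between updating the auxiliary variable and the primal variables cannot decrease the objective.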
The continuous variables: To develop the iterative approach, we first observe that for fixed and , the optimal can be found by setting
(12) 
as is concave in . Next, we note that holding the , and values fixed, is concave in , so the optimal values can be found by setting
(13) 
In a similar fashion, when , and are fixed, we can find the optimal values (i.e., the beamforming weight vectors). Note that due to the sum-power constraint (11d), taking the derivative of directly with respect to to find the optimal beamforming weight vectors is not valid. To simplify the derivation, we recall that with these variables fixed, we can write the problem of finding the optimal beamforming weight vectors as:
(14a)  
(14b) 
We note that this is a concave maximization problem and thus readily solvable; in fact, we can derive a closed-form expression for the weights by introducing Lagrange multipliers for the sum-power constraint at each BS in (14b). This yields the Lagrangian (not to be confused with the Lagrangian we derived earlier) as:
(15) 
where, to keep our notation uncluttered, we collect the multipliers in the vector . Then the first-order optimality condition of with respect to each yields:
(16) 
The dual variable should be chosen to satisfy complementary slackness in the total power constraint at BS ; observing (16), it is clear that the magnitude of is a decreasing function of . Thus, we can obtain easily through a bisection search, which in turn can be used to obtain the optimal beamforming weight vectors. At this juncture, we note an important point for future reference: the beamforming step involves inversion of a matrix for each user, which is computationally costly, especially when we have to perform it for a large number of users. Furthermore, if the scheduling variable for a user is zero, the beamforming weight for that user is automatically zero; there is no need to perform any computation in this case.
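The bisection search for the dual variable can be sketched as below. Because (16) involves a matrix inverse per user, the example substitutes a toy diagonal model, v(mu) = (diag(a) + mu I)^{-1} b, whose transmit power is likewise decreasing in mu; this model, the function names and the fixed iteration count are assumptions made for illustration.

```python
def solve_dual(power, p_max, iters=200):
    """Smallest mu >= 0 with power(mu) <= p_max, for a decreasing power()."""
    if power(0.0) <= p_max:
        return 0.0                    # constraint inactive, so mu = 0
    lo, hi = 0.0, 1.0
    while power(hi) > p_max:          # grow the bracket until it is feasible
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):            # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if power(mid) > p_max:
            lo = mid
        else:
            hi = mid
    return hi


# Toy stand-in for the beamformer update (an assumption of this sketch):
# ||v(mu)||^2 = sum_n |b_n|^2 / (a_n + mu)^2, which decreases with mu.
a = [1.0, 2.0]
b = [2.0 + 0j, 1.0 + 0j]


def toy_power(mu):
    return sum(abs(bn) ** 2 / (an + mu) ** 2 for an, bn in zip(a, b))


mu = solve_dual(toy_power, p_max=1.0)
print(mu, toy_power(mu))
```

Returning zero when the power budget is already met and otherwise driving the power to the budget mirrors the complementary slackness condition described above.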
The binary scheduling variables: The final step in the iterative approach is to optimize the user scheduling variables when the continuous variables, , and , are held fixed. To do so, we first observe that since the frequency bands are assumed to be orthogonal, the scheduling decisions are decoupled across the different frequency bands. Furthermore, we make use of an intuitive yet powerful insight first suggested in [stolyar2009self] and also observed in [yu2013multicell]: provided that the beamforming weight vectors are held fixed, the interference value experienced by a user in the downlink scheduled on a particular frequency band depends only on the beam used to serve that user and remains fixed regardless of which other users are scheduled on the remaining beams. We note that this is different from the uplink setting, in which the interference pattern in the network changes when a new set of users is scheduled on a given set of beams.
The fact that the interference pattern changes in the uplink creates significant challenges in terms of scheduling: as the authors of [shen2018fractional2] observe, even with beamforming weights fixed, the problem of optimal scheduling remains NP-hard. This substantially affects the quality of solutions obtained since, as emphasized earlier, we do not know a priori which users are suitable to schedule. One solution, as mentioned previously, is to schedule all users in the network and transform the original problem to an unconstrained problem in terms of scheduling; however, since we need a matrix inversion per user, this results in an undesirable increase in computational complexity to levels identical to those of the WMMSE algorithm [shen2018fractional2]. In the downlink, however, we are not bound by this constraint, i.e., changing scheduling decisions does not affect the interference pattern. Thus, we can schedule only a subset of users in the entire network, reducing complexity as only the beamforming weights for a small number of users need to be calculated.
To illustrate this point, let us consider the BS in a multi-cell network. The scheduling decisions for different frequency bands are decoupled; thus, we consider the frequency band without loss of generality. In addition to this, each BS can schedule at most users in a single time slot, whereas it has users associated with it. Suppose the BS is serving users on the indicated frequency band using a fixed set of nonzero beamforming weights which we denote by , i.e.,
It follows that the user associated with this BS can find itself in one of two scenarios with regard to the given frequency band: either it is scheduled for transmission on one of the beams from the set or it is not being served by the BS. For the former setting, suppose the user is scheduled on the nonzero beam; then the power received on this beam is the signal power. If the user is not scheduled, however, all the power received is interference power. For notational convenience, let us denote by the combined received signal, interference and noise power of the user in question, i.e.,
(17) 
Then the total interference power received by this user is given by
Importantly, if this user is scheduled on the nonzero beam, the interference power it experiences does not depend on which users are scheduled on the remaining beams within its own cell. The same holds true when the user in question is not scheduled on any beam; the interference power experienced by the user is the same regardless of which set of users in its own cell is scheduled on the given set of beams. In addition, we also make the following critical observation: for the BS, changing the set of users scheduled on the fixed set of beams does not change the interference pattern experienced by users outside its own cell. This can be seen from the fact that in (17), the inter-cell interference power received by user associated with BS from BS on frequency band depends only on the interference channel , and the beamforming weight vectors can be permuted over any of the users served by BS without affecting . Meanwhile, the intra-cell interference power received by user associated with BS on frequency band depends only upon the information-bearing channel and the beamforming weight the user is scheduled on . Taken together, these observations imply that if the beamforming weights throughout the network are held fixed, we can locally optimize the scheduling at each BS in order to maximize the network-wide weighted sum rate for the given set of beamforming weights.
Accordingly, we can formulate a strategy to help us find the best set of users to be served by the BS on its set of nonzero beamforming weights for the frequency band. In other words, our goal is to find the set of users out of the total users in the cell that will yield the maximum weighted sum rate on the given set of beamforming weight vectors. Our choice of users should satisfy the constraint that a user can only be served on a single beam by a BS in keeping with our original system model. A greedy strategy of assigning the user capable of achieving the highest weighted rate on each beam is not guaranteed to solve the combinatorial problem of selecting the best subset of users of the available.
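A small example shows why the greedy rule can fall short. The rate table below is made up for illustration (rows are beams, columns are users), and the exhaustive search serves only as a reference for tiny instances; neither the numbers nor the function names come from the paper.

```python
from itertools import permutations


def total_rate(rates, choice):
    """Sum of weighted rates when beam b serves user choice[b]."""
    return sum(rates[b][k] for b, k in enumerate(choice))


def greedy_assign(rates):
    """Beam by beam, pick the best user that is still unassigned."""
    taken, choice = set(), []
    for row in rates:
        k = max((k for k in range(len(row)) if k not in taken),
                key=lambda k: row[k])
        taken.add(k)
        choice.append(k)
    return choice


def exhaustive_assign(rates):
    """Try every way of giving each beam a distinct user."""
    n_users = len(rates[0])
    return list(max(permutations(range(n_users), len(rates)),
                    key=lambda p: total_rate(rates, p)))


rates = [[5.0, 4.0, 1.0],   # hypothetical weighted rates on beam 0
         [6.0, 2.0, 1.0]]   # hypothetical weighted rates on beam 1
greedy = greedy_assign(rates)
best = exhaustive_assign(rates)
print(greedy, total_rate(rates, greedy), best, total_rate(rates, best))
```

Here the greedy rule gives the first beam its best user, which leaves the second beam with a poor match, while the exhaustive search finds a pairing with a larger total.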
Our first step in matching the users to the fixed beams to maximize the WSR is to define the combined weighted rates matrix for the given set of beams and users on the frequency band. The entry in this matrix, denoted by , indicates the weighted rate that would be achieved by the user if it is scheduled on the nonzero beam on the frequency band, i.e.,
Note that we compute the total interference received by every user in the network, regardless of whether it is scheduled or not; thus, every user is considered eligible for possible scheduling on a nonzero beamforming weight.
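The construction of the weighted rates matrix for one BS and one band can be sketched as follows. The sketch assumes that the interference-plus-noise power of a user on a given beam is its combined received power minus the power received on that beam, as described above; the row and column orientation, the base-2 logarithm and every name in the code are assumptions of this example.

```python
import math


def inner(h, v):
    """Hermitian inner product h^H v of two complex vectors (lists)."""
    return sum(a.conjugate() * b for a, b in zip(h, v))


def weighted_rate_matrix(weights, channels, beams, outside_power, noise_var):
    """rates[b][k]: weighted rate of user k if it is served on beam b.

    weights[k]:       scheduling weight of user k
    channels[k]:      channel from the serving BS to user k
    beams[b]:         fixed non-zero beamformers of the serving BS
    outside_power[k]: power user k receives from all other BSs
    """
    rates = [[0.0] * len(channels) for _ in beams]
    for k, h in enumerate(channels):
        gains = [abs(inner(h, v)) ** 2 for v in beams]
        # Signal + interference + noise power; fixed once the beams are fixed.
        total = sum(gains) + outside_power[k] + noise_var
        for b, signal in enumerate(gains):
            sinr = signal / (total - signal)
            rates[b][k] = weights[k] * math.log2(1.0 + sinr)
    return rates


beams = [[1 + 0j, 0j], [0j, 1 + 0j]]
channels = [[1 + 0j, 0j], [0j, 1 + 0j], [0.6 + 0j, 0.8 + 0j]]
rates = weighted_rate_matrix([1.0, 1.0, 2.0], channels, beams,
                             outside_power=[0.0, 0.0, 0.0], noise_var=0.1)
for row in rates:
    print([round(r, 3) for r in row])
```

Each user's combined received power is computed once and reused for every beam, reflecting the observation that it does not depend on the scheduling decisions.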
It follows that our goal of scheduling the users on the appropriate beams can be formulated as the following binary integer optimization problem:
(18) 
where the binary variables indicate whether or not the user is scheduled on the nonzero beam by the BS on the frequency band. The objective function maximizes the WSR. The first constraint requires at least one user to be scheduled on every beam while the second ensures that a user is scheduled in one beam only.
Note that these binary variables are not the same as the optimization variables which refer to whether the user is scheduled or not. These variables have the additional index which denotes the beam. Specifically, , i.e., if the user is scheduled on any one of the beams associated with the BS, it is scheduled by that BS.
The problem in (18) is, in fact, a linear sum assignment problem, which can also be viewed as a maximum weighted bipartite matching problem. This problem has been extensively studied in the literature and can be solved in polynomial time using techniques such as the Hungarian algorithm [grotschel2012geometric] (more formally known as the Kuhn-Munkres algorithm) or the auction algorithm [bertsekas1990auction]. Specifically, using the Hungarian algorithm to solve the linear sum assignment problem for an matrix, where , has a complexity of [grotschel2012geometric]. In our case, we have ; hence the complexity of solving problem (18) is . Solving this optimization problem for each BS allows us to optimally schedule the users to maximize the network weighted sum rate on the fixed set of beamforming weight vectors. We remark that this scheduling scheme, which combines fractional programming with the Hungarian algorithm, differs from the uplink setting [shen2018fractional2], where scheduling one user changes the interference pattern; consequently, the only way to solve the uplink problem to global optimality is by exhaustive search, as mentioned earlier.
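The assignment step can be illustrated with a minimal sketch. The function name, the toy rate matrix, and the use of exhaustive enumeration are all assumptions made for illustration; enumeration is only practical for tiny instances and merely stands in for a polynomial-time solver such as the Hungarian algorithm (for example, SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`).

```python
from itertools import permutations


def best_assignment(weighted_rates):
    """Return (best_total, users), where users[b] is the user placed on beam b.

    weighted_rates[b][k] is the (hypothetical) weighted rate of user k on beam b.
    Every beam receives exactly one user and no user appears on two beams.
    """
    num_beams = len(weighted_rates)
    num_users = len(weighted_rates[0])
    best_total, best_users = float("-inf"), None
    # Enumerate every ordered choice of distinct users, one per beam.
    for users in permutations(range(num_users), num_beams):
        total = sum(weighted_rates[b][k] for b, k in enumerate(users))
        if total > best_total:
            best_total, best_users = total, users
    return best_total, best_users


# Toy example: 2 beams (rows) and 3 candidate users (columns).
rates = [
    [10.0, 9.0, 1.0],
    [9.0, 2.0, 1.0],
]
total, users = best_assignment(rates)
print(total, users)  # prints: 18.0 (1, 0)
```

In this toy instance the best joint choice places user 1 on beam 0 and user 0 on beam 1, even though user 0 has the single largest entry on beam 0.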
This scheduling setup also reduces complexity compared to the unconstrained scheduling setting, as we now only have to calculate the beamforming weight vectors for a maximum of rather than users at each iteration of the algorithm. Importantly, this set of users can change from iteration to iteration as the beamforming weights are matched to the best set of users; this is unlike the uplink setting, where a user not scheduled during the initialization remains unscheduled throughout all subsequent iterations of the fractional programming algorithm [shen2018fractional2].
With a fixed set of beamforming weight vectors, therefore, the proposed scheduling scheme finds a per-cell optimal set of scheduling decisions; hence, from iteration to iteration, the user assignment to beamforming weight vectors is set according to which combination yields the greatest network WSR. Here, we only consider scheduling each user on a single beam per frequency band for two reasons. First, this is in keeping with our original system model, in which each user is scheduled on a maximum of one data stream per frequency band; it is also the standard assumption for coordinated resource allocation algorithms, including multicell WMMSE [shi2011iteratively] and uplink fractional programming [shen2018fractional2]. Second, the framework of the Hungarian algorithm does not allow for scheduling a user on more than a single beamforming weight vector.
Combining all these steps, the proposed technique for coordinated resource allocation in the downlink of multiuser MISO networks is summarized in Algorithm 1. The algorithm optimizes one set of optimization variables at a time while keeping the others fixed, iterating until convergence.
Theorem 2.
The objective function value achieved by the proposed algorithm described in Algorithm 1 is nondecreasing after each iteration.
Proof: We refer to the objective function in (2a) as . The nondecreasing convergence can be proven by considering the following chain of reasoning going from iteration to :
(19a)  
(19b)  
(19c)  
(19d)  
(19e)  
(19f)  
(19g)  
(19h) 
where (19a) follows from the fact that the reformulated objective function equals the original when the optimal values are substituted; (19b) follows from the fact that the update of when all other variables are fixed maximizes ; (19c) follows from Theorem 1; (19d) follows from the fact that the update of when all other variables are fixed maximizes ; (19e) follows from the fact that the update of when all other variables are fixed maximizes ; (19f) follows from the fact that the joint update of and using (18) maximizes when all other variables are fixed; (19g) follows from Theorem 1; and (19h) from similar reasoning to (19a). Note that we use to denote the set of permuted beamforming weights obtained from by solving (18).
Since the objective function is nondecreasing across iterations and has a finite maximum, the sequence of objective values converges. However, since the scheduling variables are binary, we cannot call the resulting point a local optimum. Furthermore, we observe that the proposed scheme is not exactly a block coordinate ascent scheme, since we use the partial derivative of in (12); nonetheless, the algorithm converges in a nondecreasing fashion to an effective solution of the original WSR maximization problem as described above.
IV Performance Evaluation of Proposed Scheme
In order to evaluate the performance of the proposed algorithm, we compare it with the following different coordinated and uncoordinated resource allocation schemes:

Matched filtering transmission with equal power allocation and round-robin scheduling: This is the simplest uncoordinated resource allocation strategy that can be implemented, and as such it provides a useful benchmark against which to compare the performance of the multiuser algorithm.

Zero-forcing with equal power allocation and round-robin scheduling: Zero-forcing eliminates intracell interference and thus provides improved performance compared to matched filtering. Zero-forcing does involve increased computational complexity compared to matched filtering, requiring a matrix inversion to determine the beamforming weight for each of the scheduled users.

WMMSE with greedy scheduling: The WMMSE algorithm has been well studied in the literature as a coordinated beamforming scheme; adaptive power allocation is implicitly included in the beamformer design. However, as noted in [shen2018fractional2], WMMSE is intended for use as a beamforming algorithm; the question of which users to schedule remains to be answered. Accordingly, we adopt the greedy proportionally fair scheduling scheme introduced in [yu2013multicell].
In this scheduling scheme, we first initialize the algorithm with a random set of users. The beamforming weights are held fixed and we sequentially determine which user will maximize the weighted rate on each beamforming weight from the BS, i.e., the user associated with the BS is scheduled on the beam on the frequency band if
This approach to scheduling is distinct from solving the linear sum assignment problem in the proposed algorithm, as the users are selected greedily for each beam rather than jointly across the set of all given beams.
Importantly, this algorithm is not necessarily monotonically nondecreasing.

Multicell WMMSE: In multicell WMMSE [shi2011iteratively], each BS initializes the algorithm by simultaneously scheduling all the users in the network. The algorithm iterates on the beamformer design for all these users; eventually the beamforming weights for the majority of users converge to zero, and these users are then implicitly not scheduled by the BS. This multicell WMMSE scheme is the state of the art in the literature and has the same convergence properties as our algorithm. However, this comes at a cost: in order to determine the beamforming weights for each user, WMMSE performs a matrix inversion and a bisection search. With the multicell WMMSE scheme, since all the users in the network are scheduled, the number of matrix inversions becomes extremely high. This is especially inefficient as the number of users ultimately assigned beamforming weights with nonzero power is very small. As we will show in the analysis of the results, our proposed algorithm is capable of outperforming multicell WMMSE, while simultaneously providing significant savings in computational complexity.
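The difference between the greedy per-beam scheduling used with the WMMSE benchmark and a joint assignment across beams can be seen on a toy example. The sketch below is an illustration only: the function name and the rate matrix are hypothetical, and it assumes that the greedy pass visits beams in order and skips users already placed on an earlier beam, so that each user occupies at most one beam.

```python
def greedy_schedule(weighted_rates):
    """Pick, beam by beam, the best user that is not yet scheduled.

    weighted_rates[b][k] is the (hypothetical) weighted rate of user k on beam b.
    Returns (total, users), where users[b] is the user chosen for beam b.
    """
    num_users = len(weighted_rates[0])
    taken = set()
    users = []
    for beam_rates in weighted_rates:
        candidates = [k for k in range(num_users) if k not in taken]
        best = max(candidates, key=lambda k: beam_rates[k])
        taken.add(best)
        users.append(best)
    total = sum(weighted_rates[b][k] for b, k in enumerate(users))
    return total, users


rates = [
    [10.0, 9.0, 1.0],  # beam 0
    [9.0, 2.0, 1.0],   # beam 1
]
greedy_total, greedy_users = greedy_schedule(rates)

# A joint choice that gives up the largest entry on beam 0:
# user 1 on beam 0 and user 0 on beam 1.
joint_total = rates[0][1] + rates[1][0]

print(greedy_total, greedy_users)  # prints: 12.0 [0, 1]
print(joint_total)                 # prints: 18.0
```

Here the greedy pass locks user 0 onto beam 0 and leaves beam 1 with a poor match, whereas selecting users jointly across both beams yields a larger weighted sum.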
We consider a network partitioned into identical hexagonal cells, with BSs located at the center of each cell. The users are distributed randomly with uniform density over the entire network area. Furthermore, we also assume that the number of users associated with each BS significantly exceeds the number of transmit antennas available at the BS (i.e., for all ).
To compare the performance of the aforementioned resource allocation schemes, we simulate a 7-cell network with wrap-around. To ensure a fair comparison, all iterative optimization schemes were run for 15 iterations. The rest of the simulation parameters are as listed in Table I.
Total bandwidth  W = 20 MHz 

BS maximum average transmit power per frequency band  = 43 dBm 
Noise figure  = 9 dB 
Pathloss exponent  
Reference distance  0.3920 m 
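For readers who wish to reproduce the setup, the power-related entries of Table I can be converted to linear units as sketched below. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the thermal noise density of -174 dBm/Hz is a conventional room-temperature value not given in the table, and the noise power is computed over the full 20 MHz bandwidth (a per-band value would use the bandwidth of a single frequency band instead).

```python
import math


def dbm_to_watts(power_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** ((power_dbm - 30.0) / 10.0)


def noise_power_dbm(bandwidth_hz, noise_figure_db, density_dbm_per_hz=-174.0):
    """Receiver noise power in dBm over the given bandwidth.

    density_dbm_per_hz is an assumed thermal noise density, not a Table I entry.
    """
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz) + noise_figure_db


tx_power_w = dbm_to_watts(43.0)          # BS maximum transmit power from Table I
noise_dbm = noise_power_dbm(20e6, 9.0)   # 20 MHz bandwidth, 9 dB noise figure

print(round(tx_power_w, 2))  # about 19.95 W
print(round(noise_dbm, 2))   # about -91.99 dBm
```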
We begin by comparing the performance of the proposed algorithm against the benchmark schemes listed, as well as against standard interior-point and sequential quadratic programming (SQP) algorithms utilized in the literature [nocedal2006numerical, fletcher1987practical]. The latter were implemented using standard available optimization software; all users were scheduled in a similar fashion to the multicell WMMSE algorithm. Figure 2 shows the convergence of network sum-rate for M=2 transmit antennas and =5 users per cell.
As we can observe, the proposed algorithm achieves a higher network sum-rate, converging smoothly in a monotonically nondecreasing fashion; this is as expected from Theorem 2. At the same time, it is also clear that the uncoordinated resource allocation strategies perform substantially worse than the coordinated strategies, with matched filtering being the worst beamforming strategy. In addition, we observe that there is a gap in performance between greedy and multicell WMMSE. This is readily understood, since the greedy scheduling approach is not guaranteed to increase the network WSR after the scheduling reassignment. The multicell WMMSE algorithm is guaranteed to converge in a monotonically nondecreasing fashion, and as such provides good performance. Nevertheless, it is outperformed by the proposed scheme, which converges to the highest network weighted sum rate of all resource allocation schemes. In contrast, the sequential quadratic programming and interior-point algorithms show improving objective values as the number of iterations increases; however, they provide the worst performance. Due to the highly nonconvex nature of the WSR maximization problem, these methods have demonstrated inferior performance in prior works in this area [chitti2013joint, shen2018fractional2]; hence these results are not unexpected.
Convergence: Figure 3 shows the sum-rate convergence of the various resource allocation schemes for a single time slot with identical channel sets but with a much larger network size of and . We observe similar trends to those in Figure 2, with two notable differences. First, the greedy WMMSE algorithm is substantially outperformed by the multicell WMMSE algorithm, and the performance gap between the proposed scheme and the benchmarks grows larger as well. Second, due to the large number of optimization variables involved, the SQP and interior-point algorithms failed to converge for this setting. In particular, for the SQP approach, taking a direct step involves computing a Hessian with respect to the optimization variables, which is extremely computationally demanding for a problem of this size [nocedal2006numerical].
It is important to emphasize that none of the schemes is guaranteed to reach the globally optimal solution and, if the number of potential users to be scheduled is very large, the WMMSE algorithm can get stuck in a poor solution and often takes longer to converge. In contrast, the proposed scheme restricts the BS to serve at most users, thereby narrowing the pool of potential users and ensuring faster convergence to a higher-quality solution.
PF Rates: In Figure 4, we present the cumulative distribution functions (CDFs) of the long-term user average data rates achieved with the different resource allocation schemes (with , and ). In order to compare these, we consider two metrics: the sum of the logarithm of the long-term average data rates (in megabits per second) and the percentile user rates, which are logged in Table II. We choose to maximize the WSR in each time slot when the weights are chosen according to the proportionally fair metric described earlier (with the forgetting factor chosen as 0.05), as this leads to maximization of the sum of the logarithm of the average data rates achieved by the users. Thus, comparing the average sum-log-utility of the different resource allocation schemes allows us to directly compare them in terms of our original objective. We note that comparing the absolute difference (rather than the relative gain) in the sum-log-utility of different schemes illustrates the improvements made to the average rates achieved by the users.
Comparing the percentile user rates allows us to compare the quality of service for the cell-edge users under the different resource allocation schemes. It is important to note, however, that the algorithms do not optimize the cell-edge rate; comparing edge user rates merely allows us to understand the quality of service that these different resource allocation schemes provide to the lower-percentile users in the network.
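The role of the forgetting factor can be illustrated with a short sketch of a proportionally fair weight update. This is an assumption-laden illustration rather than the paper's exact rule: it uses the common exponential-averaging form of the long-term rate, sets each weight to the reciprocal of the averaged rate, and evaluates the sum-log-utility with the natural logarithm. All function names and numbers are hypothetical.

```python
import math


def update_average_rates(avg_rates, inst_rates, forgetting=0.05):
    """Exponentially averaged long-term rates after one time slot."""
    return [(1.0 - forgetting) * avg + forgetting * inst
            for avg, inst in zip(avg_rates, inst_rates)]


def pf_weights(avg_rates):
    """Proportionally fair weights: users with low average rates weigh more."""
    return [1.0 / avg for avg in avg_rates]


def sum_log_utility(avg_rates):
    """Sum of the logarithm of the long-term average rates (natural log assumed)."""
    return sum(math.log(avg) for avg in avg_rates)


# Two users: user 0 is served in this slot, user 1 is not.
avg = [2.0, 0.5]    # long-term averages before the slot (hypothetical, in Mbps)
inst = [4.0, 0.0]   # rates achieved in the slot (hypothetical, in Mbps)

new_avg = update_average_rates(avg, inst)
weights = pf_weights(new_avg)
print(new_avg)                   # approximately [2.1, 0.475]
print(weights)                   # the unserved user now carries the larger weight
print(sum_log_utility(new_avg))
```

Because the weight of an unserved user grows from slot to slot, maximizing the WSR with these weights steers later scheduling decisions toward users with low long-term averages.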
As we can observe from Figure 4 and Table II, the uncoordinated resource allocation schemes have the worst performance in terms of both the average sum-log-utility and edge user rates. This is unsurprising, since the benefits of coordinated resource allocation schemes over uncoordinated schemes are well known. We note that employing zero-forcing results in significantly higher average sum-log-utility than matched filtering; this is also to be expected, since the former scheme eliminates intracell interference. All three coordinated resource allocation schemes achieve significantly higher performance than the uncoordinated schemes. However, of the two WMMSE resource allocation schemes, greedy scheduling has the worse performance; this is because greedy scheduling is suboptimal and the associated algorithm is not guaranteed to be monotonic.
Utilizing multicell WMMSE results in a further significant gain in both the average sum-log-utility and the edge user rates. The proposed scheme performs even better than the multicell WMMSE scheme, with considerably higher sum-log-utility and slightly better edge rates. The majority of the performance gain comes at higher percentiles, where the proposed approach achieves much better data rates than multicell WMMSE. Indeed, compared to the uncoordinated resource allocation schemes, there is a fourfold increase in the percentiles, with an increase of almost 30% compared to the greedy WMMSE scheme. The sum-log-utility of the proposed scheme is also considerably higher than that achieved by the greedy WMMSE scheme.





Resource Allocation Scheme  Sum-log-utility  Percentile user rate
Matched Filtering  203  0.51
Zero-Forcing  446  0.51
Greedy WMMSE  821  1.78
Multicell WMMSE  908  2.15
Proposed Algorithm  952  2.25
A key point related to Table II is that we initialize the proposed algorithm by scheduling the set of users that achieves the highest interference-free weighted sum rate with an equal power allocation. For a worst-case initialization (i.e., if we start with the set of users that achieves the lowest interference-free weighted sum rate), the sum-log-utility is 909 for the proposed algorithm, still higher than that achieved by the multicell WMMSE algorithm. We emphasize that there is no known optimal initialization for the coordinated resource allocation schemes.
Sum Rate: We next change the optimization objective and compare the performance of the aforementioned resource allocation schemes in terms of network sum-rate when the BS maximum transmit power, , is varied from 20 to 70 dBm for and . When maximizing the sum-rate, all users’ weights are set to unity and this assignment does not change across time slots. The corresponding results are shown in Figure 5.
As the figure clearly shows, the proposed algorithm substantially outperforms the competing WMMSE approaches. In particular, there is a gap of approximately 22% in network sum-rate across transmit powers above 40 dBm between the proposed approach and the benchmark multicell WMMSE algorithm. Both algorithms substantially outperform the greedy WMMSE algorithm, delivering sum-rates that are more than twice as high for large BS transmit powers. As expected, the performance of the matched filtering and zero-forcing schemes lags behind that of the coordinated schemes.
This result in particular demonstrates the performance advantage that our optimal scheduling approach based on the Hungarian algorithm delivers over greedy scheduling. One interesting phenomenon to note is that the network sum-rate of the proposed algorithm strictly increases as a function of BS transmit power; however, this is not always the case for the multicell WMMSE algorithm. This result highlights the tendency of the multicell WMMSE algorithm to get stuck in low-quality local optima.
Optimization Across Frequency Bands: Finally, we compare the performance of the joint power allocation approach with that of the per-band power allocation method. In the former setting, we assign a total power of to be distributed over the frequency bands at each BS. This means that the BS is free to use as much or as little power in each of the frequency bands, provided that the total power consumed across all frequency bands is less than . In contrast, with the per-band power allocation strategy, each BS can only utilize a maximum power of per individual frequency band. As we can observe in Fig. 6, there is no significant performance benefit in terms of either edge rates or overall utility to choosing the joint power allocation strategy over the per-band strategy. Furthermore, the number of iterations needed to calculate the beamforming weights is greater than in the decoupled setting, since the bisection search step is now performed across all frequency bands.
IV-A Complexity Analysis
In comparing the performance of these various resource allocation strategies, a critical point is the computational complexity involved. From [shi2011iteratively], the computational complexity of the beamforming step in WMMSE can be derived as , where represents the total number of users scheduled in the network. For our setting, we have cooperating cells. For simplicity of analysis, we assume that each cell has users associated with it; thus we have for the multicell WMMSE scheme and for the greedy WMMSE scheme and the proposed scheme.
Accordingly, the computational complexity of the WMMSE algorithm with greedy scheduling can be found as , whereas the computational complexity of the multicell WMMSE algorithm is . The proposed algorithm has the same computational complexity as the greedy WMMSE scheme (since we schedule at most users in a single time slot). It follows that the computational complexity of the fractional programming strategy is at most given by . A comparison of the per-iteration computational complexity of the various resource allocation schemes discussed is provided in Table III.

Resource Allocation Scheme  Complexity Per Iteration
Matched Filtering^{3}  
Zero-Forcing^{3}  
Greedy WMMSE  
Multicell WMMSE  
Proposed Algorithm  
^{3} Note that these uncoordinated schemes require only a single iteration to determine the network resource allocation strategy.
To understand these results, we revisit some of the assumptions made earlier in the system model. Since we deal with a large-scale MIMO system, the number of users in each cell () is assumed to be significantly larger than the number of antennas at the BS. Critically, as becomes asymptotically large, for multicell WMMSE, the term will dominate the complexity expression. On the other hand, for the proposed algorithm, the only term dependent upon the number of users per cell is ; thus, if is fixed and as per the assumption in our system model earlier, then the term becomes negligible in comparison. It follows that in this case, the complexity of the multicell WMMSE algorithm will be roughly times higher than that of the proposed algorithm. Our algorithm achieves this noteworthy reduction in complexity while exceeding the performance of the multicell WMMSE algorithm.
It is worth noting that even with large computational resources in a C-RAN, a fully centralized globally optimal solution is infeasible since the problem at hand is NP-hard. Even in this case, the reduced computational complexity of our algorithm, compared to, say, the multicell WMMSE approach, is important for implementation with a large number of users and BS antennas. We emphasize that this reduction in computational complexity is accompanied by improved performance. Compared to the generic solvers we have considered, our proposed algorithm is more convenient from an implementation perspective, since the updates for the optimization variables are expressed in closed form. This is also in contrast to globally optimal strategies such as outer polyblock approximation, in which the updates are not expressible in closed form [utschick2012monotonic]. We also note that the SQP and interior-point algorithms do not have closed-form updates; calculating Hessians for the latter approach in particular is computationally taxing for large network sizes [nocedal2006numerical].
Prior to proceeding further, we consider the reason for this behavior in greater detail. Recall that in the multicell WMMSE scheme, we schedule all users simultaneously and let the beamformer design iterate. Thus, this is equivalent to considering the original optimization problem but with no scheduling constraints. As the algorithm converges, most users are assigned beamformers with zero power; the number of users ultimately assigned nonzero power is very close to . Nonetheless, the beamforming weights still have to be calculated for the users who will ultimately be dropped since they are not known a priori. This requires computationally costly matrix inversions, thus leading to a higher overall complexity. With the proposed algorithm, since we schedule the best users in a single time slot, the number of matrix inversions required is identical to the greedy WMMSE strategy.
A second consideration is that the set of users scheduled in each iteration of the algorithm has the potential to change. Unlike the greedy WMMSE strategy, however, the proposed scheme ensures that the network WSR does not decrease after each scheduling step, as we find the best network-wide scheduling pattern for the fixed set of beamforming weight vectors. Hence, we conclude that the proposed strategy of intelligently scheduling the smaller set of users (which is close to the number of users implicitly scheduled by the multicell WMMSE scheme) nets an improvement in terms of computational complexity while providing superior performance. Also, it is worth pointing out that scheduling all users, as the multicell WMMSE algorithm does, requires a much greater overhead in terms of communication between the coordinated BSs in the network.
Resource Allocation Scheme  ,  , 

Matched Filtering  0.01  1.1 
ZeroForcing  0.05  1.2 
Greedy WMMSE  1.3  27.2 
Multicell WMMSE  1.3  48.8 
Proposed Algorithm  1.3  19.5 
Interior-Point  40.6  N/A 
SQP  53.6  N/A 
Finally, we consider the actual execution time of the various resource allocation schemes. Although the complexity analysis provides a formal characterization of how the running time of each resource allocation scheme scales as the network parameters change, it is nonetheless useful to compare the average execution time of each scheme. To ensure a fair comparison, we measure the time taken from initialization until the per-iteration increase in the network weighted sum rate is less than 10% for the given time slot. For the simulation parameters detailed in Table I, the average execution times on the desktop computer used to generate these results are logged in Table IV. As we can see, the uncoordinated schemes require a much lower execution time on average than the coordinated schemes; however, this comes at the expense of compromised performance, as discussed earlier. Both the proposed algorithm and greedy WMMSE approach perform similarly in terms of average execution time. The multicell WMMSE scheme has the highest average execution time among coordinated schemes; as discussed earlier, this is due to the fact that all users in the network are scheduled simultaneously, so the number of matrix inversions needed is very large compared to both greedy WMMSE and the proposed algorithm. Finally, the interior-point and SQP algorithms have extremely long execution times for the and setting and do not converge within a reasonable time for the and setting.
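The stopping rule used for the timing comparison can be sketched as follows. This is an illustration under an explicit assumption: the "less than 10%" criterion is interpreted as a relative increase over the previous iteration's weighted sum rate. The function name and the sample history are hypothetical.

```python
def iterations_until_converged(wsr_history, threshold=0.10):
    """Return the first iteration whose relative WSR increase is below threshold.

    wsr_history[i] is the network weighted sum rate after iteration i, with
    index 0 holding the value at initialization. Returns None if the criterion
    is never met within the recorded history.
    """
    for i in range(1, len(wsr_history)):
        previous, current = wsr_history[i - 1], wsr_history[i]
        if previous > 0 and (current - previous) / previous < threshold:
            return i
    return None


history = [10.0, 15.0, 18.0, 19.0, 19.5]  # hypothetical WSR trajectory
stop_at = iterations_until_converged(history)
print(stop_at)  # prints: 3
```

In a timing experiment, the wall-clock time would be recorded when this index is first reached, so that schemes that improve slowly are not credited for stopping early at an arbitrary iteration count.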
V Conclusions
In this paper, we developed a coordinated resource allocation scheme for the downlink of multiuser MIMO networks with multiple orthogonal frequency bands. The proposed scheme outperforms uncoordinated schemes like zero-forcing and matched filtering, as well as the coordinated greedy and state-of-the-art multicell WMMSE schemes, in terms of the average sum-log-utility function and network sum-rate. Furthermore, the proposed scheme offers significant computational complexity savings over the state-of-the-art multicell WMMSE scheme and also has a much lower average execution time. By intelligently scheduling the best subset of users for a fixed given set of beamforming weights, the proposed approach is able to reduce the computational complexity while providing a higher weighted sum-rate in a single time slot and a higher long-term average sum-log-utility. Thus, we conclude that the proposed approach offers an effective high-performance and low-complexity solution to the nonconvex NP-hard weighted sum-rate maximization problem.