I Introduction
To address the exponentially increasing demand in 5G systems, communications in the millimeter-wave (mmWave) band are among the most promising candidates [1], owing to the large available mmWave spectrum. While most investigations of mmWave communication have focused on systems above GHz, in the current work we study multiuser multicell coordination in sub GHz systems, e.g., X-band () GHz, Ku-band () GHz, and GHz in the Ka band. These systems are characterized by a relatively large antenna spacing (compared to systems beyond GHz), which implies that tens (rather than hundreds) of antennas can be fitted on transmitters/receivers. Thus, the need for hybrid analog-digital precoding is less pressing, and fully digital precoding/combining is preferred. Moreover, propagation channels are still dominated by Rayleigh/Rician components in non-line-of-sight environments [2]. As a result, conventional pilot-based channel estimation techniques are more efficient than beam alignment/sounding [3].
The implication of highly directional wireless links at GHz (and above) is that they are almost interference-free. However, in sub GHz systems, channels are less sparse (in terms of eigenmodes), and beamforming has relatively lower directivity than in systems in the higher bands. This is attributed to the presence of significant multipath components in urban propagation (confirmed by narrowband/wideband channel measurements in the GHz, GHz, and GHz bands [4]).
Consequently, interference may still be a limiting factor in these systems, especially in dense multicell scenarios, where interference management and coordination remain beneficial. Coordination in multiuser multicell networks generally refers to the exchange of information among base stations to increase the network sum-rate. While these aspects have been investigated at the MAC layer [5], they are still essentially unaddressed at the physical layer. Indeed, the benefits/costs of coordination in sub GHz systems are still an open problem, especially in the case of ultra-dense networks, which are believed to be pervasive in future networks [6]. In these scenarios, ignoring interference for cell-edge users may limit the sum-rate.
In the context of multiuser multicell networks, coordination is done using the framework of Forward-Backward (FB) iterations: this over-the-air training leverages local Channel State Information (CSI) at each Base Station (BS) and user to iteratively optimize the filter at each BS/user in a fully distributed manner. This framework has been at the heart of most distributed coordination algorithms, such as interference leakage minimization [7], max-SINR [7], minimum mean-squared error [8], and (weighted) sum-rate maximization [9].
Unfortunately, these conventional schemes suffer from very high overhead, as they require hundreds/thousands of FB iterations before convergence [10], and this number increases with the number of BSs, users, and transmit/receive antennas [10]. Moreover, mmWave systems will have a larger number of BSs/cells per unit area (due to their inherent short range), and require a much larger number of BS/user antennas (to mitigate path loss through array gain), compared to sub GHz systems. Consequently, this shortcoming severely impairs the applicability of conventional coordination to the systems in question. This limitation is reinforced by the lower coherence time of mmWave channels. Despite bearing direct relevance to conventional sub GHz systems, this major limitation has only been addressed in a few recent works [11, 12, 13, 14], and remains largely unexplored.
In this work, we design low-overhead distributed coordination algorithms, constrained to operate in just a few FB iterations for increasing dimensions of the network. In the algorithm design, we further aim at a tenfold reduction in the communication overhead of conventional algorithms. We derive and optimize a lower bound on the sum-rate in MIMO Interfering Multiple-Access Channels (MIMO IMAC), which we dub Difference of Log and Trace (DLT). Unlike the sum-rate, when combined with alternating optimization methods, the DLT expression results in subproblems that are distributed (only requiring local CSI at each BS and user) [15]. Despite their non-convexity, we derive the optimal solution to each of these subproblems, which we dub non-homogeneous waterfilling (a variation on classical waterfilling). This solution turns off data streams with low SINR, and allocates power to streams with high SNR. The built-in “stream-control” is key to achieving the tenfold increase in convergence speed. Moreover, we show that the devised algorithm converges to a locally optimal solution of the DLT bound. The proposed fast-converging algorithm is shown to offer large sum-rate gains compared to many standard and fast-converging benchmarks. Coordination is still a vital aspect of these systems, and offers major performance gains over uncoordinated transmission. While the approach is developed for the MIMO IMAC (i.e., uplink communication), the methods/results are applicable to the downlink, and all its special cases.
II System Model
The proposed algorithms operate within the framework of FB iterations/training by exploiting uplink (UL) and downlink (DL) channel reciprocity. The proposed scheme is designed to operate in the low-overhead regime, by restricting the number of FB iterations, , to (a tenfold reduction in communication overhead over conventional coordination algorithms). We assume that each BS and user possesses only local CSI, which is assumed to be perfect (i.e., no CSI errors).
We consider a multiuser multicell setting, with cells/BSs, serving users each. In the MIMO IMAC case, the transmitters are the users and the receivers are the BSs. Each transmitter and receiver have and antennas, respectively, and communicate data streams. Let be the set of BSs, the set of users served by BS , and the total set of users. Moreover, we denote by user the th user (), in the th cell (). The recovered signal for user , in cell (denoted as user hereafter), is given by
(1) 
where is the linear transmit filter for user , is the linear receive filter for user , and the MIMO channel from user , to BS , assumed to be block-fading. is the transmit signal of user with unit power symbols, and is the AWGN at receiver , with .
Footnote 1 (Notation): We use bold uppercase letters to denote matrices, and bold lowercase letters to denote vectors. For a given matrix , we define as its trace, as its Frobenius norm, as its determinant, as its conjugate transpose, and as . In addition, denotes its th column, columns to , element in , the eigenvalue of a Hermitian matrix (assuming the eigenvalues are sorted in decreasing order), and denotes the dominant eigenvectors of . Furthermore, (resp. ) implies that is positive definite (resp. positive semidefinite). Finally, denotes the identity matrix, , and .
Footnote 2: The model/results assume and for simplicity, and can be easily extended to differ across users and BSs.
We assume simple decoding, i.e., treating interference as noise, without successive interference cancellation. Then, the achievable rate of user is given by,
(2) 
where and are the desired signal and interference-plus-noise (I+N) covariance matrices for user , at BS , respectively, and are given by,
We let denote the desired and I+N covariance matrices of user , in the backward DL network (where is the noise power at receiver ). Moreover, we let be the Cholesky decomposition of , and that of .
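To make the covariance definitions above concrete, the following is a minimal numerical sketch rather than the authors' implementation. It assumes the standard forms for linear filtering with interference treated as noise: a desired-signal covariance H V V^H H^H, an I+N covariance equal to the sum of the interfering terms plus a scaled identity, and a rate log2 det(I + (U^H Q U)^{-1} U^H R U). The function name `achievable_rate` and its argument layout are hypothetical.

```python
import numpy as np

def achievable_rate(U, H_list, V_list, desired, noise_power):
    """Rate (bits per channel use) of one user, treating interference as noise.

    U           : (N, d) receive filter at the serving BS
    H_list[j]   : (N, M) channel from transmitter j to this BS
    V_list[j]   : (M, d) transmit filter of transmitter j
    desired     : index of the desired transmitter in the two lists
    noise_power : noise variance at the receiver
    Assumed (illustrative) model; not copied from the paper's equations.
    """
    N = U.shape[0]
    # Desired-signal covariance R = H V V^H H^H
    HV = H_list[desired] @ V_list[desired]
    R = HV @ HV.conj().T
    # Interference-plus-noise covariance Q = sum of interferers + noise
    Q = noise_power * np.eye(N, dtype=complex)
    for j, (H, V) in enumerate(zip(H_list, V_list)):
        if j != desired:
            G = H @ V
            Q += G @ G.conj().T
    # Filtered covariances seen after the receive filter
    R_f = U.conj().T @ R @ U
    Q_f = U.conj().T @ Q @ U
    d = R_f.shape[0]
    # log2 det(I + Q_f^{-1} R_f), computed stably via slogdet
    _, logdet = np.linalg.slogdet(np.eye(d) + np.linalg.solve(Q_f, R_f))
    return float(logdet / np.log(2)), R, Q
```

With identity channels and filters, two streams, and unit noise power, the sketch returns 2 bits per channel use; adding one interferer with the same channel lowers the rate, as expected when interference is treated as noise.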
We aim at maximizing the sum-rate, i.e.,
(3) 
While distributed multiuser multicell optimization generally entails a sum-power constraint on the users of a cell (e.g., WMMSE [9]), we adopt an equal power allocation among all users within a cell: the BS power is equally split among all its UL (or DL) users, to simplify the presentation. Note that this does not affect the generality of the results.
III Proposed Approach
Sum-rate maximization problems, such as , are known to be NP-hard [16]. Our proposed approach is based on a tractable lower-bound formulation that transforms the sum-rate maximization, which is originally a coupled problem, into separable subproblems.
III-A Problem Formulation
We focus on the interference-limited regime (in a dense deployment, for instance), where we assume
(4) 
Proposition 1.
Proof:
Refer to [15, Appendix B]. ∎
In what follows, we shall dub the quantity Difference of Log-Trace (DLT). The DLT is useful as a surrogate for the sum-rate objective in . Note that the DLT is a lower bound on the sum-rate, , and can be written in the following ways:
(7)  
(8) 
The above expressions reveal that the DLT decouples both the receive filters in (7) and the transmit filters in (8), which facilitates the intended distributed FB implementation. We formulate the maximal DLT (max-DLT) criterion as a surrogate objective for the sum-rate maximization in ,
(9) 
It should be noted that although the DLT allows a distributed FB implementation, the problem in is not directly solvable, since it is non-convex due to the coupling between the transmit and receive filters and the quadratic equality constraints.
III-B Proposed Algorithm
We underline that a coupled optimization problem like can be handled by a Block Coordinate Descent (BCD) approach. Using the superscript to denote the iteration number, problem is decomposed into a sequence of subproblems that are solved via FB iterations, as follows
(10) 
for . As seen from (7), at each iteration , given the fixed , problem (J1) decouples in the receive filters , yielding
(11) 
Likewise, if the receive filters are fixed, problem (J2) decouples, as seen from (8), in the transmit filters, resulting in
(12) 
As mentioned above, the feasible sets of (11) and (12) are non-convex. Nevertheless, we show in the result below that their globally optimal solutions can still be found.
Lemma 1 (Non-homogeneous Waterfilling).
Consider the following problem,
(13) 
where and . Let be the Cholesky factorization of , and , and define the following: , , . Then the optimal solution to (13) is,
(14) 
where (diagonal) is the optimal power allocation. Moreover, the optimal power allocation in is,
(15) 
where is the unique root of , on the interval , and is monotonically decreasing on that interval.
Proof:
Refer to [15, Appendix C]. ∎
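For comparison with Lemma 1, the classical waterfilling that it varies can be sketched as follows. This is the textbook algorithm, not the non-homogeneous solution of Lemma 1: it has no per-stream activation cost, and the function name `classical_waterfilling` and the scalar `gains` are illustrative assumptions. It does show the same qualitative behavior, namely that weak streams receive no power.

```python
import numpy as np

def classical_waterfilling(gains, total_power):
    """Textbook waterfilling (NOT the non-homogeneous variant of Lemma 1).

    Maximizes sum_i log2(1 + g_i * p_i) subject to sum_i p_i = total_power
    and p_i >= 0, for strictly positive stream gains g_i.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]      # strongest stream first
    g = gains[order]
    p_sorted = np.zeros_like(g)
    # Try activating the n strongest streams, for n = len(g), ..., 1
    for n in range(len(g), 0, -1):
        # Water level when exactly the n strongest streams are active
        mu = (total_power + np.sum(1.0 / g[:n])) / n
        # Feasible once the weakest active stream gets positive power
        if mu - 1.0 / g[n - 1] > 0:
            p_sorted[:n] = mu - 1.0 / g[:n]
            break
    # Undo the sorting so powers line up with the input order
    p = np.zeros_like(g)
    p[order] = p_sorted
    return p

powers = classical_waterfilling([2.0, 1.0, 0.05], total_power=1.0)
```

For the hypothetical gains above, the two strongest streams receive 0.75 and 0.25 of the power budget, and the weakest stream is switched off.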
With Lemma 1, the optimal transmit and receive filter updates are given by
(16) 
where and are the optimal power allocations given in Lemma 1. Denoting by the predefined number of FB iterations, the max-DLT algorithm is summarized in Algorithm 1.
Discussions
Based on the generalized eigenvalue analysis, we have that are also the eigenvalues of . This means can be viewed as a (quasi-)SINR measure of each data stream. The proposed method in (15) allocates no power to streams that have low SINR, since tends to zero as . Moreover, as seen from (15), models the price of activating each of the streams, mimicking the original waterfilling principle. The difference, however, is that (14) fills the power level based on both the SINR and the cost of stream activation, hence the name non-homogeneous waterfilling. This readily enables the algorithm to allocate no power to some low-SINR streams. Finally, since the global optimizer is found at each iteration in Algorithm 1, we can conclude that in is monotonically decreasing with , and converges to a stationary point of the DLT bound. While the ‘stream-control’ greatly speeds up convergence, it evidently raises fairness issues, as some users/streams with low SINR may not get served. This can be remedied by introducing user weights in , with minor modifications to the problem/solutions.
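The generalized-eigenvalue remark can be checked numerically. The sketch below assumes the statement refers to a desired-signal covariance R and an I+N covariance Q with Cholesky factor L (Q = L L^H), and verifies that the whitened matrix L^{-1} R L^{-H} has the same eigenvalues as Q^{-1} R. Both function names are hypothetical.

```python
import numpy as np

def whitened_eigenvalues(R, Q):
    """Eigenvalues of L^{-1} R L^{-H}, where Q = L L^H, sorted descending."""
    L = np.linalg.cholesky(Q)                      # lower-triangular factor
    A = np.linalg.solve(L, R)                      # A = L^{-1} R
    W = np.linalg.solve(L, A.conj().T).conj().T    # W = A L^{-H}
    W = 0.5 * (W + W.conj().T)                     # remove numerical asymmetry
    return np.sort(np.linalg.eigvalsh(W))[::-1]

def generalized_eigenvalues(R, Q):
    """Solutions lambda of R x = lambda Q x, via Q^{-1} R, sorted descending."""
    lam = np.linalg.eigvals(np.linalg.solve(Q, R))
    return np.sort(lam.real)[::-1]

# Hypothetical 4-antenna, 2-stream example with random covariances
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = G @ G.conj().T                     # rank-2 desired-signal covariance
Q = B @ B.conj().T + np.eye(4)         # positive definite I+N covariance
same = np.allclose(whitened_eigenvalues(R, Q),
                   generalized_eigenvalues(R, Q), atol=1e-8)
```

Since R has rank two here, only two of the eigenvalues are nonzero; these play the role of the per-stream (quasi-)SINR values discussed above.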
IV Practical Aspects
IV-A Comparisons
Our approach is applicable to other communication scenarios, such as the MIMO Interfering Broadcast Channel (MIMO IBC) and the MIMO Interference Channel (MIMO IFC). We benchmark our algorithms against widely adopted ones,

max-SINR [7] in the MIMO IMAC / MIMO IFC / MIMO IBC

Uncoordinated (Eigen-beamforming): each transmit (resp. receive) filter uses the right (resp. left) singular vectors of the desired channel
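The uncoordinated benchmark in the list above can be sketched with a singular value decomposition. This is a minimal illustration; the function name `eigen_beamforming` and the choice of the d dominant singular vectors are assumptions, since the text only states that singular vectors of the desired channel are used.

```python
import numpy as np

def eigen_beamforming(H, d):
    """Uncoordinated filters from the SVD of the desired channel H (N x M).

    Returns the receive filter U (N x d), the transmit filter V (M x d),
    and the d largest singular values. Interference is ignored entirely.
    """
    U_full, s, Vh = np.linalg.svd(H)
    U = U_full[:, :d]            # d dominant left singular vectors (receive)
    V = Vh.conj().T[:, :d]       # d dominant right singular vectors (transmit)
    return U, V, s[:d]

# Hypothetical 4 x 4 channel carrying 2 streams
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, V, s = eigen_beamforming(H, d=2)
effective = U.conj().T @ H @ V   # diagonal, with the singular values on it
```

The effective channel after filtering is diagonal, so the desired link is decomposed into parallel streams; no inter-cell information is exchanged, which is why this scheme serves as the uncoordinated baseline.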
We also include relevant fast-converging algorithms,

CCP-WMMSE [12]: an accelerated version of the WMMSE algorithm for the MIMO IMAC

IWU [14]: a fast-convergent leakage minimization algorithm for the MIMO IFC

AIMS: our previously proposed generalization of max-SINR [14], for MIMO IMAC / MIMO IFC / MIMO IBC
Algorithms such as IWU and CCP-WMMSE use so-called turbo iterations, where inner-loop iterations are performed within each FB iteration. Unlike IWU, where the turbo iterations are done at the BS/user (i.e., offline), these iterations are carried out over-the-air for CCP-WMMSE.
IV-B Communication Overhead
The operation of the proposed scheme hinges on each transmitter and receiver knowing the effective channels for the desired and interfering links. Investigating different mechanisms for the distributed acquisition of CSI is outside the scope of the current work (we refer the reader to [18]). However, we outline a simple mechanism that goes hand-in-hand with FB iterations in Fig. I. We recall that FB iterations are carried out.
Pilots | Estimate cov. matrices at receiver | Optimize receive filter | Pilots | Estimate cov. matrices at transmitter | Optimize transmit filter | Data
It becomes clear that each FB iteration has an associated communication overhead. While the total overhead comprises the bidirectional transmission of pilots, synchronization, frequency offset calibration, etc., it is dominated by the pilot overhead in the case of cellular coordination [19]. Thus, we can safely approximate the communication overhead by the total number of pilot symbols used for channel estimation after FB iterations. In conventional coordination, it is typical to assume until convergence, even for small systems [10]. Moreover, this number increases with more BSs, cells and transmit/receive antennas, all of which are prevalent in sub GHz systems. This limitation is compounded by the naturally lower coherence time of mmWave channels, which further restricts the possible number of FB iterations (before the channel changes). Indeed, simple calculations reveal that conventional algorithms would fail in these systems, as the overhead would destroy the sum-rate gains from coordination. Thus, we aggressively limit the number of FB iterations to , thereby resulting in a drastic tenfold reduction in the communication overhead.
For simplicity, we additionally assume that the minimal number of orthogonal pilots is used, i.e., pilot symbols for each UL/DL effective channel, resulting in a total of orthogonal pilots for each UL/DL phase. The total overhead for max-DLT, in number of channel uses (c.u.), is given by,
The overhead is the same for schemes such as max-SINR, IWU and MMSE. Similar calculations can be made to estimate the overhead of CCP-WMMSE and WMMSE (in c.u.),
where denotes the number of turbo iterations. These simple calculations reveal that the overhead for WMMSE and CCP-WMMSE is significantly higher than that of max-DLT. Furthermore, the turbo iteration in CCP-WMMSE (outlined in Sec. IV-A) is carried out over-the-air, and thus induces a much higher overhead compared to other schemes. We include the overhead of these algorithms in the simulation results.
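The pilot-counting argument above can be illustrated with a short sketch. It assumes only what the text states, namely that each FB iteration contains one UL and one DL pilot phase and that overhead is approximated by the total number of pilot symbols. The function names, the coherence-block model in `data_fraction`, and every numeric value are hypothetical and are not the paper's overhead expressions.

```python
def pilot_overhead(num_fb_iterations, pilots_per_phase, phases_per_iteration=2):
    """Total pilot symbols (channel uses) spent on FB training.

    pilots_per_phase is supplied by the caller; two phases per iteration
    (one UL, one DL) is the assumed default.
    """
    return num_fb_iterations * phases_per_iteration * pilots_per_phase

def data_fraction(overhead, coherence_block):
    """Fraction of a coherence block left for data (hypothetical model)."""
    return max(0.0, 1.0 - overhead / coherence_block)

# Hypothetical numbers: 20 pilots per phase, a block of 2000 channel uses
few = pilot_overhead(num_fb_iterations=4, pilots_per_phase=20)
many = pilot_overhead(num_fb_iterations=40, pilots_per_phase=20)
frac_few = data_fraction(few, coherence_block=2000)
frac_many = data_fraction(many, coherence_block=2000)
```

Under these made-up values, a tenfold cut in FB iterations reduces the training cost from 1600 to 160 channel uses, leaving most of the block for data instead of a small fraction of it. The same counting does not cover over-the-air turbo iterations, which add further pilot phases per FB iteration.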
IV-C Complexity
We can approximate the computational complexity of max-DLT by noticing that it is dominated by the complexity of the Cholesky decomposition of the I+N covariance matrix, , and that of the eigenvalue decomposition of , ,
One can verify that the above also holds for max-SINR, IWU, MMSE, and WMMSE. Unlike other methods, the acceleration does not require gradient/Hessian computations, and thus comes at a negligible added computational cost compared to conventional algorithms. However, each turbo iteration for CCP-WMMSE involves running a series of semidefinite programs (using interior-point solvers), which renders the algorithm very costly.
V Numerical Results
V-A Performance in Sub-6 GHz Systems
We start by presenting results for conventional multicell multiuser MIMO, to illustrate the desired features of max-DLT. We refer the reader to [15] for a detailed discussion of the simulation setup.
V-A1 Single-user Multicell MIMO Uplink
We start with a widely used coordination test case, a MIMO IFC with where we set for all algorithms. We include WMMSE results for , and FB iterations (as an upper bound). Fig. 3 reveals that while max-DLT and WMMSE (with ) have similar performance in the low- and medium-SNR range, the gap between them widens sharply as the SNR increases. This is despite a twofold increase in communication overhead for WMMSE. Moreover, the proposed scheme yields better sum-rate performance than all benchmarks, with the gap becoming significant in the high-SNR regime: as the following results will show, the gap increases further with more users, antennas, and BSs/cells, under a low number of FB iterations.
V-A2 Multiuser Multicell MIMO Uplink
Moving on to a larger setup with (MIMO IMAC), we benchmark max-DLT against the fast-converging CCP-WMMSE (Sec. IV-A), by varying the number of turbo iterations, , for CCP-WMMSE. Fig. 3 clearly exhibits the fast-converging nature of max-DLT, which achieves of its final performance after just iterations. In the low-overhead regime (for ), max-DLT outperforms CCP-WMMSE (for ), although the overhead of the latter is twice that of the former. While additional turbo iterations slightly improve the CCP-WMMSE performance, the overhead increases linearly with , e.g., the overhead of CCP-WMMSE with is threefold that of max-DLT. Achieving the nominal performance of CCP-WMMSE relies on convergence of the turbo iteration, which implies a (possibly arbitrarily) large number of turbo iterations. This results in (potentially) orders-of-magnitude higher overhead/complexity (e.g., CCP-WMMSE with ). Despite its fast-converging nature, CCP-WMMSE is clearly ill-suited for the systems studied here.
V-B Performance in Dense mmWave Deployments
Following recent measurements in the GHz band [20], we consider a dense urban mmWave setting. The full parametrization is detailed in [15, Sec. VI-C].
V-B1 Dense Multiuser Multicell Uplink
We consider a dense UL system with , where the average SNR (across users) is set to dB. Fig. 5 reveals that max-DLT offers a significantly better sum-rate than all benchmarks. Interestingly, max-DLT (with ) provides a threefold increase in sum-rate with respect to the uncoordinated scheme, with a similar overhead.
V-B2 Dense Multiuser Multicell Downlink
We next consider a DL scenario with , while setting the average SNR to dB, and following the above simulation method. The fast-converging nature of max-DLT is evident in Fig. 5, where most of the performance is delivered in just FB iterations: this is due to the inherent stream-control feature, which allows poor-quality streams to be shut down, thus converging quickly to a good sum-rate. Note that max-DLT assumes equal power allocation for users in each cell. In contrast, WMMSE performs power allocation for users in each cell, as part of the algorithm. Despite this unfavorable setup for max-DLT, we observe a large sum-rate gain compared to WMMSE, together with a decrease in overhead. Evidently, the sum-rate for WMMSE will exceed that of max-DLT, as increases (with a huge overhead).
V-C Discussions
We note that the significant gap between the proposed scheme and the benchmarks may be attributed to the fast-converging nature of max-DLT, which is in turn due to the inherent stream-control mechanism of the non-homogeneous waterfilling solution. Moreover, the drastically limited number of FB iterations limits the performance of conventional algorithms, due to significant levels of residual interference. As seen in Figs. 5 and 5, uncoordinated transmission performs extremely poorly: max-DLT provides a threefold sum-rate improvement over the uncoordinated scheme, with a similar communication overhead. This provides a clear indication that low-overhead coordination is crucial to achieving large sum-rate improvements in a dense multicell GHz mmWave system. This also implies that the same conclusions hold for sub GHz systems, which are naturally more sensitive to interference.
VI Conclusions
We have proposed a low-overhead algorithm for coordination in dense multicell sub GHz systems. The DLT bound, a lower bound on the sum-rate, was derived and its tightness was investigated. Moreover, we have proposed a distributed optimization algorithm (max-DLT), and showed its convergence to a stationary point of the DLT bound. The non-homogeneous waterfilling was derived as a solution to the optimal BS/user filter update, and its ability to turn off low-SINR streams was underlined. We have tied this to the fast convergence of the algorithm, thus enabling a tenfold reduction in communication overhead (over conventional coordination). Our numerical results have shown that low-overhead coordination offers large gains in dense sub GHz systems.
References
[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1065–1082, June 2014.
[2] M. K. Samimi and T. S. Rappaport, “Characterization of the 28 GHz millimeter-wave dense urban channel for future 5G mobile cellular,” March 2014.
[3] S. Hur, T. Kim, D. Love, J. Krogmeier, T. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Transactions on Communications, vol. 61, pp. 4391–4403, October 2013.
[4] E. J. Violette, R. H. Espeland, R. O. DeBolt, and F. K. Schwering, “Millimeter-wave propagation at street level in an urban environment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 26, pp. 368–380, May 1988.
[5] H. Shokri-Ghadikolaei, C. Fischione, G. Fodor, P. Popovski, and M. Zorzi, “Millimeter wave cellular networks: A MAC layer perspective,” IEEE Transactions on Communications, vol. 63, pp. 3437–3458, Oct. 2015.
[6] METIS D6.2, “Initial report on horizontal topics, first results and 5G system concept,” March 2014.
[7] K. Gomadam, V. R. Cadambe, and S. A. Jafar, “A distributed numerical approach to interference alignment and applications to wireless interference networks,” IEEE Transactions on Information Theory, vol. 57, pp. 3309–3322, June 2011.
[8] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Minimum mean squared error interference alignment,” in 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pp. 1106–1110, Nov. 2009.
[9] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4331–4340, 2011.
[10] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Comparison of distributed beamforming algorithms for MIMO interference networks,” IEEE Transactions on Signal Processing, vol. 61, pp. 3476–3489, July 2013.
[11] P. Komulainen, A. Tölli, and M. Juntti, “Effective CSI signaling and decentralized beam coordination in TDD multi-cell MIMO systems,” IEEE Transactions on Signal Processing, vol. 61, pp. 2204–2218, May 2013.
[12] D. H. N. Nguyen and T. Le-Ngoc, “Sum-rate maximization in the multicell MIMO multiple-access channel with interference coordination,” IEEE Transactions on Wireless Communications, vol. 13, pp. 36–48, January 2014.
[13] R. Brandt and M. Bengtsson, “Fast-convergent distributed coordinated precoding for TDD multicell MIMO systems,” in IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 457–460, Dec. 2015.
[14] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Distributed low-overhead schemes for multi-stream MIMO interference channels,” IEEE Transactions on Signal Processing, vol. 63, pp. 1737–1749, April 2015.
[15] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Sum-rate maximization in sub-28 GHz millimeter-wave MIMO interfering networks,” IEEE Journal on Selected Areas in Communications, vol. 35, pp. 1649–1662, July 2017.
[16] M. Razaviyayn, G. Lyubeznik, and Z.-Q. Luo, “On the degrees of freedom achievable through interference alignment in a MIMO interference channel,” in 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 511–515, June 2011.
[17] S. W. Peters and R. W. Heath, “Cooperative algorithms for MIMO interference channels,” IEEE Transactions on Vehicular Technology, vol. 60, pp. 206–218, Jan. 2011.
[18] R. Brandt and M. Bengtsson, “Distributed CSI acquisition and coordinated precoding for TDD multicell MIMO systems,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2015.
[19] O. El Ayach, A. Lozano, and R. Heath, “On the overhead of interference alignment: Training, feedback, and cooperation,” IEEE Transactions on Wireless Communications, vol. 11, no. 11, pp. 4192–4203, 2012.
[20] G. R. MacCartney, M. K. Samimi, and T. S. Rappaport, “Omnidirectional path loss models in New York City at 28 GHz and 73 GHz,” in 2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), pp. 227–231, Sept. 2014.