To address the exponentially increasing demand in 5G systems, communications in the millimeter-wave (mmWave) band are among the most promising candidates, due to the large mmWave spectrum. While most investigations of mmWave communication have focused on systems above GHz, in the current work we study multi-user multi-cell coordination in sub- GHz systems, e.g., X-band (-) GHz, Ku-band (-) GHz, and GHz in the Ka band. These systems are characterized by a relatively large antenna spacing (compared to systems beyond GHz), implying that tens (rather than hundreds) of antennas can be fitted on transmitters/receivers. Thus, the need for hybrid analog-digital precoding is not pressing, and fully digital precoding/combining is preferred. Moreover, propagation channels are still dominated by Rayleigh/Rician components in non-line-of-sight environments. As a result, conventional pilot-based channel estimation techniques are more efficient than beam alignment/sounding.
The implication of highly directional wireless links at GHz (and above) is that they are almost interference-free. However, in sub- GHz systems, channels are less sparse (in terms of eigenmodes), and beamforming has lower directivity than in the higher bands. This is attributed to the presence of significant multi-path components in urban propagation (confirmed by narrowband/wideband channel measurements in the GHz, GHz, and GHz bands).
Consequently, interference may still be a limiting factor in these systems, especially in dense multi-cell scenarios, where interference management and coordination remain beneficial. Coordination in multi-user multi-cell networks generally refers to the exchange of information among base stations to increase the network sum-rate. While these aspects have been investigated at the MAC layer, they are still essentially unaddressed at the physical layer. Indeed, the benefits/costs of coordination in sub- GHz systems are still an open problem, especially in the case of ultra-dense networks, which are believed to be pervasive in future networks: in these scenarios, ignoring interference for cell-edge users may be a limiting factor on the sum-rate.
In the context of multi-user multi-cell networks, coordination is done using the framework of Forward-Backward (F-B) iterations: this over-the-air training leverages local Channel State Information (CSI) at each Base Station (BS) and user to iteratively optimize the filter at each BS/user in a fully distributed manner. This framework has been at the heart of most distributed coordination algorithms, such as interference leakage minimization, max-SINR, minimum mean-squared error, and (weighted) sum-rate maximization.
Unfortunately, these conventional schemes suffer from extremely high overhead, as they require hundreds/thousands of F-B iterations before convergence, and this number increases with the number of BSs, users and transmit/receive antennas. Moreover, mmWave systems will have a larger number of BSs/cells per unit area (due to their inherent short range), and require a much larger number of BS/user antennas (to mitigate pathloss through array gain), compared to sub- GHz systems. Consequently, this shortcoming severely impairs the applicability of conventional coordination to the systems in question. This limitation is reinforced by the lower coherence time of mmWave channels. Despite bearing direct relevance to conventional sub- GHz systems, this major limitation has only been addressed in a few recent works [11, 12, 13, 14], and remains essentially unexplored.
In this work, we design low-overhead distributed coordination algorithms, constrained to operate in just a few F-B iterations for increasing dimensions of the networks. In the algorithm design, we further aim at a tenfold reduction in the communication overhead of conventional algorithms. We derive and optimize a lower bound on the sum-rate maximization problem in MIMO Interfering Multiple-Access Channels (MIMO IMAC), that we dub Difference of Log and Trace (DLT). Unlike the sum-rate, when combined with alternating optimization methods, the DLT expression results in subproblems that are distributed (only requiring local CSI at each BS and user). Despite their non-convexity, we derive the optimal solution to each of these subproblems, that we dub non-homogeneous waterfilling (a variation on the classical waterfilling). This solution turns off data streams with low SINR, and allocates power to streams with high SNR. The built-in “stream-control” is key to achieving the tenfold increase in convergence speed. Moreover, we show that the devised algorithm converges to a locally optimal solution of the DLT bound. The proposed fast-converging algorithm is shown to offer large sum-rate gains compared to many standard and fast-converging benchmarks. Coordination is still a vital aspect of these systems, and offers major performance gains over uncoordinated transmission. While the approach is developed for the MIMO IMAC (i.e., uplink communication), the methods/results are applicable to the downlink, and all its special cases.
II System Model
The proposed algorithms operate within the framework of F-B iterations/training by exploiting uplink (UL) and downlink (DL) channel reciprocity. The proposed scheme is designed to operate in the low-overhead regime, by restricting the number of F-B iterations, , to (a tenfold reduction in communication overhead over conventional coordination algorithms). We assume that each BS and user possesses local CSI only, which is assumed to be perfect (i.e., no CSI errors).
We consider a multi-user multi-cell setting, with cells/BSs, serving users each. In the MIMO IMAC case, the transmitters are the users and the receivers are the BSs. Each transmitter and receiver has and antennas, respectively, and communicates data streams. Let be the set of BSs, the set of users served by BS , and the total set of users. Moreover, we denote by user the th user (), in the th cell (). The recovered signal for user , in cell (denoted as user hereafter),
where is the linear transmit filter for user , is the linear receive filter for user , and the MIMO channel from user to BS , assumed to be block-fading. is the transmit signal of user with unit-power symbols, and is the AWGN noise at receiver , with .
Footnote 1 (Notation): we use bold upper-case letters to denote matrices, and bold lower-case letters to denote vectors. For a given matrix, we define as its trace, as its Frobenius norm, as its determinant, as its conjugate transpose, and as . In addition, denotes its th column, columns to , element in , the eigenvalue of a Hermitian matrix (assuming the eigenvalues are sorted in decreasing order), and denotes the dominant eigenvectors of . Furthermore, (resp. ) implies that is positive definite (resp. positive semi-definite). Finally, denotes the identity matrix, , and .
Footnote 2: The model/results assume and for simplicity, and can be easily extended to differ across users and BSs.
We assume simple decoding, i.e., treating interference as noise, without successive interference cancellation. Then, the achievable rate of user is given by,
where and are the desired signal and interference-plus-noise (I+N) covariance matrices for user , at BS , respectively, and are given by,
denote the desired and I+N covariance matrices of user , in the backward DL network (where is the noise power at receiver ). Moreover, we let be the Cholesky decomposition of , and that of .
We aim at maximizing the sum-rate, i.e.,
While distributed multi-user multi-cell optimization generally entails a sum-power constraint on the users of a cell (e.g., W-MMSE), we adopt an equal power allocation among all users within a cell: the BS power is equally split among all its UL (or DL) users, to simplify the presentation. Note that this does not affect the generality of the results.
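As a minimal numerical sketch of the rate computation described in this section, the snippet below builds the desired-signal and I+N covariance matrices at one receiver and evaluates a log-det rate with interference treated as noise. The function names, the argument layout, and the specific log-det form are assumptions made for illustration (the standard expression for a linear receive filter); they are not a reproduction of the equations of this paper.

```python
import numpy as np

def covariances(H_list, V_list, desired, noise_power):
    """Return (R, Q): desired-signal and I+N covariances at one receiver.

    H_list[j]: channel from transmitter j to this receiver (N x M).
    V_list[j]: linear transmit filter of transmitter j (M x d).
    desired:   index of the transmitter whose signal is wanted.
    All names are hypothetical and chosen for this sketch only.
    """
    n_rx = H_list[0].shape[0]
    R = np.zeros((n_rx, n_rx), dtype=complex)
    Q = noise_power * np.eye(n_rx, dtype=complex)  # noise part of I+N
    for j, (H, V) in enumerate(zip(H_list, V_list)):
        HV = H @ V
        C = HV @ HV.conj().T
        if j == desired:
            R = R + C          # desired-signal covariance
        else:
            Q = Q + C          # every other transmitter is interference
    return R, Q

def rate_bits(U, R, Q):
    """Rate in bits per channel use for receive filter U (N x d),
    treating interference as noise (assumed standard log-det form)."""
    d = U.shape[1]
    S = U.conj().T @ R @ U     # filtered desired-signal covariance
    Z = U.conj().T @ Q @ U     # filtered I+N covariance
    M = np.eye(d) + S @ np.linalg.inv(Z)
    _, logabsdet = np.linalg.slogdet(M)
    return logabsdet / np.log(2.0)
```

Summing `rate_bits` over all users gives the sum-rate objective considered next; with identity channels and filters and unit noise power, two interference-free streams give 2 bits per channel use.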
III Proposed Approach
Sum-rate maximization problems, such as , are known to be NP-hard. Our proposed approach is based on a tractable lower-bound formulation that transforms the sum-rate maximization, which is originally a coupled problem, into separable subproblems.
III-A Problem Formulation
We focus on the interference-limited regime (in a dense deployment for instance), where we assume
Refer to [Appendix B] ∎
In what follows, we shall dub the quantity Difference of Log-Trace (DLT). The DLT becomes significant when it is used as a surrogate for the sum-rate objective in . Note that DLT is a lower bound on the sum-rate, , and can be written in the following ways:
The above expressions reveal that DLT decouples both the receive filters in (7) and the transmit filters in (8), which facilitates the intended distributed F-B implementation. We formulate the maximal DLT (max-DLT) criterion as a surrogate objective for the sum-rate maximization in ,
It should be noted that although DLT allows a distributed F-B implementation, the problem in is not directly solvable, since it is non-convex due to the coupling between the transmit and receive filters and the quadratic equality constraints.
III-B Proposed Algorithm
We underline that a coupled optimization like can ideally be handled by a Block Coordinate Descent (BCD) approach. That is, if the superscript is adopted to denote the iteration number, problem is decomposed into a sequence of subproblems that are solved via F-B iterations, as below
for . As seen from (7), at each iteration , given the fixed , problem (J1) decouples in the receive filters , yielding
Likewise, if the receive filters are fixed, problem (J2) decouples, as seen from (8), in the transmit filters, resulting in
Consider the following problem,
where and . Let be the Cholesky factorization of , and , and define the following, , , . Then the optimal solution for (13) is,
where (diagonal) is the optimal power allocation. Moreover, the optimal power allocation in is,
where is the unique root of , on the interval , and is monotonically decreasing on that interval.
Refer to [Appendix C] ∎
With Lemma 1, the optimal transmit and receive filter updates are formulated as below
Based on the generalized eigenvalue analysis, we have that are also the eigenvalues of . This means can be viewed as a (quasi-)SINR measure of each data stream. The proposed method in (15) allocates no power to streams that have low SINR, since tends to zero as . Moreover, as seen from (15), models the price of activating each of the streams, mimicking the original waterfilling principle. The difference, however, is that (14) fills the power level based on both the SINR and the cost of stream activation, hence the name non-homogeneous waterfilling. This readily enables the algorithm to not allocate power to some low-SINR streams. Finally, since the global optimizer is found at each iteration in Algorithm 1, we can conclude that in is monotonically decreasing with , and converges to a stationary point of the DLT bound. While the “stream-control” greatly speeds up the convergence, it evidently raises fairness issues, as some users/streams with low SINR may not get served. This can be remedied by introducing user weights in , with minor modifications to the problem/solutions.
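For reference, the classical waterfilling principle that the non-homogeneous variant is compared against can be sketched as follows. This is the textbook algorithm (maximize the sum of log(1 + g_i p_i) under a total power budget), not the solution in (14)-(15); the function name and arguments are hypothetical. It illustrates the same stream-control behaviour discussed above: streams whose gain is too low receive exactly zero power.

```python
import numpy as np

def waterfilling(gains, total_power):
    """Classical waterfilling over parallel streams.

    gains:       positive per-stream gains g_i (an SNR-like quality measure).
    total_power: power budget P > 0, shared so that sum(p) == P.
    Returns p with p_i = max(0, mu - 1/g_i), mu being the water level.
    """
    g = np.asarray(gains, dtype=float)
    order = np.argsort(-g)            # strongest stream first
    inv = 1.0 / g[order]              # "floor height" of each stream
    p = np.zeros_like(g)
    # Try activating the k strongest streams, from all of them down to one.
    for k in range(len(g), 0, -1):
        mu = (total_power + inv[:k].sum()) / k   # water level for k streams
        if mu > inv[k - 1]:           # weakest active stream still gets power
            p[order[:k]] = mu - inv[:k]
            break
    return p
```

For example, with gains `[10.0, 1.0, 0.01]` and a unit budget, the weakest stream is switched off and the remaining power is split between the two stronger ones.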
IV Practical Aspects
Our approach is applicable to other communication scenarios, such as the MIMO Interfering Broadcast Channel (MIMO IBC) and the MIMO Interference Channel (MIMO IFC). We benchmark our algorithms against widely adopted ones:
max-SINR in the MIMO IMAC / MIMO IFC / MIMO IBC
Uncoordinated (Eigen-beamforming): each transmit (resp. receive) filter uses the right (resp. left) singular vectors of the desired channel
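The uncoordinated eigen-beamforming benchmark above can be sketched with a singular value decomposition of the desired channel. The function name and the convention that the receive filter is applied as a conjugate transpose are assumptions made for this illustration; power scaling of the transmit filter is omitted.

```python
import numpy as np

def eigen_beamformers(H, d):
    """Uncoordinated filters for a desired channel H (N x M) and d streams.

    Returns (V, U, s): transmit filter V (M x d) from the d dominant right
    singular vectors, receive filter U (N x d) from the d dominant left
    singular vectors, and the corresponding singular values s.
    """
    U_full, s, Vh = np.linalg.svd(H)   # H = U_full @ diag(s) @ Vh
    V = Vh.conj().T[:, :d]             # right singular vectors
    U = U_full[:, :d]                  # left singular vectors
    return V, U, s[:d]
```

With these filters the effective desired channel `U.conj().T @ H @ V` is diagonal, with the dominant singular values on its diagonal; interference from other cells is ignored entirely, which is why this scheme serves as the uncoordinated baseline.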
We also include relevant fast-converging algorithms,
CCP-WMMSE: an accelerated version of the WMMSE algorithm for the MIMO IMAC
IWU: a fast-convergent leakage minimization algorithm for the MIMO IFC
AIMS: our previously proposed generalization of max-SINR, for MIMO IMAC / MIMO IFC / MIMO IBC
Algorithms such as IWU and CCP-WMMSE use so-called turbo iterations, where inner-loop iterations are performed within each F-B iteration. Unlike IWU, where the turbo iterations are done at the BS/user (i.e., offline), these iterations are carried out over the air for CCP-WMMSE.
IV-B Communication Overhead
The operation of the proposed scheme hinges on each transmitter and receiver knowing the effective channels for the desired and interfering links. We note that investigating different mechanisms for the distributed acquisition of CSI is outside the scope of the current work (we refer the reader to ). However, we have outlined a simple mechanism that goes hand-in-hand with F-B iterations, in Fig. I. We recall that F-B iterations are carried out.
| Pilots | Estim. cov. at receiver | Optimize filter | Pilots | Estim. cov. at transmitter | Optimize filter | Data |
It becomes clear that each F-B iteration has an associated communication overhead. While the total overhead comprises bidirectional transmission of pilots, synchronization, frequency offset calibration, etc., it is dominated by the pilot overhead in the case of cellular coordination. Thus, we can safely approximate the communication overhead by the total number of pilot symbols for channel estimation, after F-B iterations. In conventional coordination, it is typical to assume until convergence, even for small systems. Moreover, this number increases with more BSs, cells and transmit/receive antennas, all of which are prevalent in sub- GHz systems. This limitation is compounded by the naturally lower coherence time of mmWave channels, thus further restricting the possible number of F-B iterations (before the channel changes). Indeed, simple calculations reveal that conventional algorithms would fail in these systems, as the overhead would destroy the sum-rate gains from coordination. Thus, we aggressively limit the number of F-B iterations to , thereby resulting in a drastic tenfold reduction in the communication overhead.
For simplicity, we additionally assume that the minimal number of orthogonal pilots is used, i.e. pilot symbols for each UL/DL effective channel, resulting in a total of orthogonal pilots for each UL/DL phase. The total overhead for max-DLT, in the number of channel uses (c.u.), is given by,
The overhead is the same for schemes such as max-SINR, IWU and MMSE. Similar calculations can be made to estimate the overhead of CCP-WMMSE and WMMSE (in c.u.),
where denotes the number of turbo iterations. These simple calculations reveal that the overhead for W-MMSE and CCP-WMMSE is significantly higher than that of max-DLT. Furthermore, the turbo iteration in CCP-WMMSE (outlined in Sec. IV-A) is carried out over the air, and thus induces a massively higher overhead compared to other schemes. We include the overhead of these algorithms in the simulation results.
We can approximate the computational complexity of max-DLT, by noticing that it is dominated by the complexity of the Cholesky Decomposition of the I+N covariance matrix, , and that of Eigenvalue Decomposition of , ,
One can verify that the above also holds for max-SINR, IWU, MMSE, and WMMSE. Unlike other methods, the acceleration does not require gradient/Hessian computations, and thus comes at a negligible added computational cost compared to conventional algorithms. However, each turbo iteration for CCP-WMMSE involves running a series of semidefinite programs (using interior point solvers), which renders the algorithm very costly.
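The two operations that dominate the per-iteration complexity, a Cholesky decomposition of the I+N covariance followed by an eigenvalue decomposition, can be sketched as below. The sketch assumes that the matrix being decomposed is the desired-signal covariance whitened by the Cholesky factor, which is consistent with the generalized-eigenvalue remark in Section III-B but is not taken from the equations of this paper; the function and variable names are hypothetical. Both steps cost on the order of the cube of the matrix dimension.

```python
import numpy as np

def stream_quality(R, Q):
    """Eigenvalues of L^{-1} R L^{-H}, where Q = L L^H (Cholesky).

    R: Hermitian positive semi-definite desired-signal covariance.
    Q: Hermitian positive definite I+N covariance.
    These values equal the generalized eigenvalues of the pair (R, Q),
    i.e. the eigenvalues of Q^{-1} R, returned in decreasing order.
    """
    L = np.linalg.cholesky(Q)              # lower-triangular factor of Q
    A = np.linalg.solve(L, R)              # L^{-1} R
    W = np.linalg.solve(L, A.conj().T)     # L^{-1} R L^{-H} (R is Hermitian)
    W = 0.5 * (W + W.conj().T)             # remove round-off asymmetry
    return np.linalg.eigvalsh(W)[::-1]     # eigvalsh sorts in ascending order
```

Working with the whitened Hermitian matrix, rather than with the non-Hermitian product of the inverse I+N covariance and the signal covariance, keeps the eigenvalues real by construction and allows a symmetric eigensolver to be used.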
V Numerical Results
V-A Performance in Sub-6 GHz systems
We start by presenting results for conventional multi-cell multi-user MIMO, to illustrate desired features of max-DLT. We refer the reader to for a detailed discussion of the simulation setup.
V-A1 Single-user Multi-cell MIMO Uplink
We start with a widely used coordination test case, a MIMO IFC with where the set for all algorithms. We include W-MMSE results for , and F-B iterations (as an upper bound). Fig. 3 reveals that, while max-DLT and W-MMSE (with ) have similar performance in the low- and medium-SNR range, the gap between them increases sharply as the SNR increases. This is in spite of a twofold increase in communication overhead for WMMSE. Moreover, the proposed scheme yields better sum-rate performance than all benchmarks, with this gap becoming significant in the high-SNR regime: as the following results will show, the gap increases further with more users, antennas, and BSs/cells, under a low number of F-B iterations.
V-A2 Multi-user Multi-cell MIMO uplink
Moving on to a larger setup with (MIMO IMAC), we benchmark max-DLT against the fast-converging CCP-WMMSE (Sec. IV-A), by varying the number of turbo iterations , for CCP-WMMSE. Fig. 3 clearly exhibits the fast-converging nature of max-DLT, which achieves of its final performance after just iterations. In the low-overhead regime (for ), max-DLT outperforms CCP-WMMSE (for ), although the overhead of the latter is twice that of the former. While additional turbo iterations slightly improve the CCP-WMMSE performance, the overhead increases linearly with , e.g., the overhead of CCP-WMMSE with is threefold that of max-DLT. Achieving the nominal performance of CCP-WMMSE relies on convergence of the turbo iteration, which implies a (possibly arbitrarily) large number of turbo iterations. This results in (potentially) orders-of-magnitude higher overhead/complexity (e.g., CCP-WMMSE with ). Despite its fast-converging nature, CCP-WMMSE is clearly ill-suited for the systems studied here.
V-B Performance in Dense mmWave Deployments
V-B1 Dense Multi-user Multi-cell uplink
We consider a dense UL system with , where the average SNR (across users) is set to dB. Fig. 5 reveals that max-DLT offers a significantly better sum-rate than all benchmarks. Interestingly, max-DLT (with ) provides a threefold increase in sum-rate with respect to the uncoordinated scheme, with a similar overhead.
V-B2 Dense Multi-user Multi-cell downlink
We next consider a DL scenario with , while setting the average SNR to dB, and following the above simulation method. The fast-converging nature of max-DLT is embodied in Fig. 5, where most of the performance is delivered in just F-B iterations: this is due to the inherent stream-control feature, which allows poor-quality streams to be shut down, thus converging quickly to a good sum-rate. Note that max-DLT assumes equal power allocation for users in each cell. In contrast, WMMSE performs power allocation for users in each cell, as part of the algorithm. Despite this unfavorable setup for max-DLT, we observe a large sum-rate gain compared to WMMSE, while resulting in a decrease in overhead. Evidently, the sum-rate for WMMSE will exceed that of max-DLT, as increases (with a huge overhead).
We note that the significant gap between the proposed scheme and the benchmarks may be attributed to the fast-converging nature of max-DLT, which is in turn due to the inherent stream-control mechanism of the non-homogeneous waterfilling solution. Moreover, the drastically limited number of F-B iterations limits the performance of conventional algorithms, due to significant levels of residual interference. As seen in Figs. 5 and 5, uncoordinated transmission performs extremely poorly: max-DLT provides a threefold sum-rate improvement over the uncoordinated scheme, with a similar communication overhead. This provides a clear answer that low-overhead coordination is crucial to achieving huge sum-rate improvements in a dense multi-cell GHz mmWave system. This also implies that the same conclusions hold for sub- GHz systems, which are naturally more sensitive to interference.
We have proposed a low-overhead algorithm for coordination in dense multi-cell sub- GHz systems. The DLT bound, a lower bound on the sum-rate, was derived and its tightness was investigated. Moreover, we have proposed a distributed optimization algorithm (max-DLT), and showed its convergence to a stationary point of the DLT bound. The non-homogeneous waterfilling was derived as a solution to the optimal BS/user filter update, and its ability to turn off low-SINR streams was underlined. We have tied this to the fast convergence of the algorithm, thus enabling a tenfold reduction in communication overhead (over conventional coordination). Our numerical results have shown that low-overhead coordination offers huge gains in dense sub- GHz systems.
- [1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1065–1082, June 2014.
- [2] M. K. Samimi and T. S. Rappaport, “Characterization of the 28 GHz millimeter-wave dense urban channel for future 5G mobile cellular,” March 2014.
- [3] S. Hur, T. Kim, D. Love, J. Krogmeier, T. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Transactions on Communications, vol. 61, pp. 4391–4403, October 2013.
- [4] E. J. Violette, R. H. Espeland, R. O. DeBolt, and F. K. Schwering, “Millimeter-wave propagation at street level in an urban environment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 26, pp. 368–380, May 1988.
- [5] H. Shokri-Ghadikolaei, C. Fischione, G. Fodor, P. Popovski, and M. Zorzi, “Millimeter wave cellular networks: A MAC layer perspective,” IEEE Transactions on Communications, vol. 63, pp. 3437–3458, Oct. 2015.
- [6] METIS D6.2, “Initial report on horizontal topics, first results and 5G system concept,” March 2014.
- [7] K. Gomadam, V. R. Cadambe, and S. A. Jafar, “A distributed numerical approach to interference alignment and applications to wireless interference networks,” IEEE Transactions on Information Theory, vol. 57, pp. 3309–3322, June 2011.
- [8] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Minimum mean squared error interference alignment,” in 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pp. 1106–1110, Nov. 2009.
- [9] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4331–4340, 2011.
- [10] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Comparison of distributed beamforming algorithms for MIMO interference networks,” IEEE Transactions on Signal Processing, vol. 61, pp. 3476–3489, July 2013.
- [11] P. Komulainen, A. Tölli, and M. Juntti, “Effective CSI signaling and decentralized beam coordination in TDD multi-cell MIMO systems,” IEEE Transactions on Signal Processing, vol. 61, pp. 2204–2218, May 2013.
- [12] D. H. N. Nguyen and T. Le-Ngoc, “Sum-rate maximization in the multicell MIMO multiple-access channel with interference coordination,” IEEE Transactions on Wireless Communications, vol. 13, pp. 36–48, January 2014.
- [13] R. Brandt and M. Bengtsson, “Fast-convergent distributed coordinated precoding for TDD multicell MIMO systems,” in IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 457–460, Dec. 2015.
- [14] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Distributed low-overhead schemes for multi-stream MIMO interference channels,” IEEE Transactions on Signal Processing, vol. 63, pp. 1737–1749, April 2015.
- [15] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Sum-rate maximization in sub-28-GHz millimeter-wave MIMO interfering networks,” IEEE Journal on Selected Areas in Communications, vol. 35, pp. 1649–1662, July 2017.
- [16] M. Razaviyayn, G. Lyubeznik, and Z.-Q. Luo, “On the degrees of freedom achievable through interference alignment in a MIMO interference channel,” in Signal Processing Advances in Wireless Communications (SPAWC), 2011 IEEE 12th International Workshop on, pp. 511–515, June 2011.
- [17] S. W. Peters and R. W. Heath, “Cooperative algorithms for MIMO interference channels,” IEEE Transactions on Vehicular Technology, vol. 60, pp. 206–218, Jan. 2011.
- [18] R. Brandt and M. Bengtsson, “Distributed CSI acquisition and coordinated precoding for TDD multicell MIMO systems,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2015.
- [19] O. El Ayach, A. Lozano, and R. Heath, “On the overhead of interference alignment: Training, feedback, and cooperation,” IEEE Transactions on Wireless Communications, vol. 11, no. 11, pp. 4192–4203, 2012.
- [20] G. R. MacCartney, M. K. Samimi, and T. S. Rappaport, “Omnidirectional path loss models in New York City at 28 GHz and 73 GHz,” in 2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), pp. 227–231, Sept. 2014.