# Ultra-Reliable Communication in 5G mmWave Networks: A Risk-Sensitive Approach

In this letter, we investigate the problem of providing gigabit wireless access with reliable communication in 5G millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) networks. In contrast to the classical network design based on average metrics, we propose a distributed risk-sensitive reinforcement learning-based framework to jointly optimize the beamwidth and transmit power, while taking into account the sensitivity of mmWave links to blockage. Numerical results show that our proposed algorithm achieves more than 9 Gbps of user throughput with a guaranteed probability of 90%. More importantly, there exists a rate-reliability-network density tradeoff: as the user density increases from 16 to 96 per km², the fraction of users that achieve 4 Gbps is reduced by 11.61% […] baseline models, respectively.


## I Introduction

To enable gigabit wireless access with reliable communication, a number of candidate solutions are currently being investigated for 5G: (i) higher frequency spectrum, e.g., millimeter wave (mmWave); (ii) advanced spectral-efficient techniques, e.g., massive multiple-input multiple-output (MIMO); and (iii) ultra-dense small cells [1]. This work explores the above techniques to enhance wireless access [1, 2, 3]. Massive MIMO yields remarkable properties such as a high signal-to-interference-plus-noise ratio, due to large antenna gains, and an extreme spatial multiplexing gain [3, 4]. In particular, mmWave frequency bands offer huge bandwidth [5] and allow a massive number of antennas to be packed for highly directional beamforming [5]. A peculiarity of mmWave is that its links are very sensitive to blockage, which gives rise to unstable connectivity and unreliable communication [5]. To overcome this challenge, we leverage principles of risk-sensitive reinforcement learning (RSL) and exploit the multiple-antenna diversity and higher bandwidth to optimize transmission and achieve gigabit data rates, while accounting for the sensitivity of mmWave links to provide ultra-reliable communication (URC). The prime motivation behind using RSL is that the risk-sensitive utility function to be optimized is a function of not only the average but also the variance [6], and thus it captures the tail of the rate distribution to enable URC. Moreover, our proposed algorithm is fully distributed and does not require full network observation, which reduces the cost of channel estimation and signaling synchronization. Via numerical experiments, we showcase the inherent key trade-offs between (i) reliability/data rates and network density, and (ii) availability and network density.

Related work: In [7, 8], the authors provided the principles of ultra-reliable and low-latency communication (URLLC) and described some techniques to support it. Recently, the problems of low-latency communication [9] and URLLC [10, 11] for 5G mmWave networks were studied to evaluate the performance under the impact of traffic dispersion and network densification. Moreover, a reinforcement learning (RL) approach to power control and rate adaptation was studied in [12]. All these works focus on maximizing the time average of network throughput or minimizing the mean delay, without providing any guarantees for higher-order moments (e.g., variance, skewness, kurtosis). In this work, we depart from the classical average-based system design and instead take higher-order moments into account in the utility function to formulate an RSL framework through which every small cell optimizes its transmission while taking signal fluctuations into account.

## II System Model

Let us consider the mmWave downlink (DL) transmission of a small cell network consisting of a set of small cells (SCs), indexed by $b$, and a set of user equipments (UEs), indexed by $k$, each equipped with $N_k$ antennas. We assume that each SC is equipped with a large number $N_b$ of antennas to exploit the massive MIMO gain and adopts a hybrid beamforming architecture [13], and that $N_b \gg N_k$. Without loss of generality, one UE per SC is considered. (For the multiple-UE case, additional channel estimation and user scheduling need to be considered; one example was studied in [3].) The data traffic is generated from the SC to the UE via mmWave communication. A co-channel time-division duplexing protocol is considered, in which the DL channel can be obtained via the uplink training phase.

Each SC adopts the hybrid beamforming architecture, which combines analog and digital beamforming techniques [13]. Let $g^{(tx)}_{bk}$ and $g^{(rx)}_{bk}$ denote the analog transmitter and receiver beamforming gains at SC $b$ and UE $k$, respectively. In addition, we use $\omega^{(tx)}_{bk}$ and $\omega^{(rx)}_{bk}$ to represent the angles deviating from the strongest path between SC $b$ and UE $k$. Also, let $\theta^{(tx)}_{bk}$ and $\theta^{(rx)}_{bk}$ denote the beamwidth at the SC and UE, respectively. We denote $\boldsymbol{\theta}$ as the vector of the transmitter beamwidths of all SCs. We adopt the widely used antenna radiation pattern model [13] to determine the analog beamforming gain as

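$$
g_{bk}(\omega_{bk},\theta_{bk}) =
\begin{cases}
\dfrac{2\pi-(2\pi-\theta_{bk})\,\eta}{\theta_{bk}}, & \text{if } |\omega_{bk}| \le \dfrac{\theta_{bk}}{2},\\[2mm]
\eta, & \text{otherwise,}
\end{cases}
\tag{1}
$$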

where $\eta$ is the side lobe gain.
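The sectored pattern in (1) can be sketched in a few lines of Python. This is only an illustration: the function name `beam_gain` and the numeric values of the beamwidth and side lobe gain are assumptions, not values from this letter. The sketch also checks that the pattern integrates to $2\pi$ over all directions, which is what fixes the main lobe gain once $\theta_{bk}$ and $\eta$ are chosen.

```python
import math


def beam_gain(omega: float, theta: float, eta: float) -> float:
    """Sectored antenna gain of equation (1).

    omega: angle off the strongest path (rad), theta: beamwidth (rad),
    eta: side lobe gain.
    """
    if not 0.0 < theta <= 2.0 * math.pi:
        raise ValueError("beamwidth must lie in (0, 2*pi]")
    if abs(omega) <= theta / 2.0:
        # Main lobe: the gain grows as the beam gets narrower.
        return (2.0 * math.pi - (2.0 * math.pi - theta) * eta) / theta
    return eta  # side lobe


# Hypothetical values, chosen only for illustration.
theta, eta = 0.4, 0.05
main = beam_gain(0.1, theta, eta)   # inside the main lobe
side = beam_gain(1.0, theta, eta)   # outside the main lobe
# Main lobe and side lobe together integrate to 2*pi.
total = main * theta + side * (2.0 * math.pi - theta)
print(main, side, total)
```

A narrower beam concentrates more gain in the main lobe but is easier to misalign, which is why the beamwidth is treated as a decision variable later on.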

Let $\mathbf{h}_{bk}$ denote the channel state from SC $b$ to UE $k$. We assume a time-varying channel state described by a finite-state Markov chain. Considering imperfect channel state information (CSI), the estimated channel state between SC $b$ and UE $k$ is modeled as [10]

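$$
\hat{\mathbf{h}}_{bk} = \sqrt{N_b \times N_k}\;\boldsymbol{\Theta}_{bk}^{1/2}\left(\sqrt{1-\tau_k^{2}}\,\mathbf{w}_{bk} + \tau_k\,\hat{\mathbf{w}}_{bk}\right),
$$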

where $\boldsymbol{\Theta}_{bk}$ is the spatial channel correlation matrix that accounts for path loss and shadow fading. Here, $\mathbf{w}_{bk}$ is the small-scale fading channel matrix, modeled as a zero-mean random matrix with a given variance. The parameter $\tau_k$ reflects the estimation accuracy for UE $k$; if $\tau_k = 0$, perfect channel state information is assumed. $\hat{\mathbf{w}}_{bk}$ is the estimation noise vector, also modeled as a zero-mean random matrix with a given variance. We denote $\mathbf{h}$ as the network state.
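To make the role of the estimation accuracy concrete, the following minimal sketch draws an imperfect channel estimate according to the model above. It makes several simplifying assumptions that are not in the letter: the correlation matrix is replaced by a scalar large-scale gain, the fading and noise entries are unit-variance complex Gaussians, and the names `complex_gaussian` and `estimate_channel` as well as all numeric values are hypothetical.

```python
import math
import random


def complex_gaussian(rng: random.Random, variance: float) -> complex:
    """Zero-mean circularly symmetric complex Gaussian sample."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def estimate_channel(w, tau, n_b, n_k, large_scale_gain, rng, noise_variance=1.0):
    """Mix the true fading w with estimation noise, weighted by tau.

    Assumption: the correlation matrix is a scalar (large_scale_gain).
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    scale = math.sqrt(n_b * n_k) * math.sqrt(large_scale_gain)
    keep = math.sqrt(1.0 - tau ** 2)  # weight of the true channel
    return [
        scale * (keep * w_i + tau * complex_gaussian(rng, noise_variance))
        for w_i in w
    ]


rng = random.Random(0)
w_true = [complex_gaussian(rng, 1.0) for _ in range(4)]
perfect = estimate_channel(w_true, 0.0, 64, 4, 1e-3, rng)  # tau = 0: perfect CSI
noisy = estimate_channel(w_true, 0.3, 64, 4, 1e-3, rng)    # tau > 0: imperfect CSI
print(perfect[0], noisy[0])
```

With `tau = 0` the estimate coincides with the scaled true channel, and as `tau` approaches 1 the estimate carries less and less information about it.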

We omit the beam search/tracking time, since it is short compared with the transmission time [14], and we assume that each SC sends a single stream to its user via the main beams. By applying a linear precoding scheme $\mathbf{f}_{bk}$ [13], namely conjugate precoding, the achievable rate of UE $k$ from SC $b$ can be calculated as

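$$
r_b(t) = W \log_2\left(1 + \frac{p_b\, g^{(tx)}_{bk}\, g^{(rx)}_{bk}\, \left|\mathbf{h}^{\dagger}_{bk}\mathbf{f}_{bk}\right|^{2}}{\sum_{b' \neq b} p_{b'}\, g^{(tx)}_{b'k}\, g^{(rx)}_{b'k}\, \left|\mathbf{h}^{\dagger}_{b'k}\mathbf{f}_{b'k}\right|^{2} + \eta_{bk}}\right),
$$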

where $p_b$ and $p_{b'}$ are the transmit powers of SC $b$ and SC $b'$, respectively. In addition, $W$ denotes the system bandwidth of the mmWave frequency band. The thermal noise of the user served by SC $b$ is $\eta_{bk}$. Here, we denote $P^{\max}_b$ as the maximum transmit power of SC $b$ and $\mathbf{p}$ as the transmit power vector.
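The rate expression can be evaluated directly once the per-link quantities are known. The sketch below is illustrative only: the helper name `achievable_rate`, the tuple layout of a link, and all numbers are assumptions, and the effective channel gain $|\mathbf{h}^{\dagger}\mathbf{f}|^2$ is passed in as a precomputed scalar.

```python
import math


def achievable_rate(bandwidth_hz, serving, interferers, noise):
    """Rate of one UE in bit/s, following the SINR-based rate expression.

    Each link is a tuple (p, g_tx, g_rx, eff), where eff stands for the
    effective channel gain |h^dagger f|^2 of that link.
    """
    def rx_power(link):
        p, g_tx, g_rx, eff = link
        return p * g_tx * g_rx * eff

    signal = rx_power(serving)
    interference = sum(rx_power(link) for link in interferers)
    sinr = signal / (interference + noise)
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical numbers: received signal power equals the noise power.
serving_link = (2.0, 4.0, 0.5, 0.25)
rate_alone = achievable_rate(1e9, serving_link, [], 1.0)
rate_interfered = achievable_rate(1e9, serving_link, [(1.0, 1.0, 1.0, 1.0)], 1.0)
print(rate_alone, rate_interfered)
```

In the first call the SINR is exactly 1, so the rate equals the bandwidth; adding one interferer of equal received power halves the SINR and lowers the rate.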

## III Problem Formulation

We model a decentralized optimization problem and harness tools from RSL to solve it, whereby SCs autonomously respond to the network states based on historical data. Let us consider a joint optimization of the transmitter beamwidth $\boldsymbol{\theta}$ and the transmit power allocation $\mathbf{p}$. (As studied in [13], the problem of selecting the beamwidth for the transmitter and receiver can be handled by adjusting the transmitter beamwidth with a fixed receiver beamwidth.) We denote $z_b = (\theta_b, p_b)$, which takes values in the set $\mathcal{Z}_b = \{z_b^1, \dots, z_b^{Z_b}\}$. Assume that each SC $b$ selects its beamwidth and transmit power drawn from a given probability distribution $\boldsymbol{\pi}_b = (\pi_b^1, \dots, \pi_b^{Z_b})$, in which $Z_b$ is the cardinality of the set of all combinations $(\theta_b, p_b)$. For each $t$ and $m$, the mixed-strategy probability is defined as

$$
\pi^{m}_{b}(t) = \Pr\left(z_b(t) = z^{m}_{b} \,\middle|\, z_b(0:t-1),\ \boldsymbol{\pi}_b(0:t-1)\right). \tag{2}
$$

We denote $\boldsymbol{\pi}_b \in \Pi$, in which $\Pi$ is the set of all possible probability mass functions (PMFs). Let $r_b$ denote the instantaneous rate of SC $b$, and let $\mathcal{R}$ denote the rate region, which is defined as the convex hull of the rates [15]. Inspired by RSL [6], we consider the following utility function:

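$$
\bar{u}_b = \frac{1}{\mu_b}\log \mathbb{E}_{\mathbf{h},\boldsymbol{\pi}}\left[\exp\left(\mu_b \sum_{t=0}^{T} r_b(t)\right)\right], \tag{3}
$$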

where the parameter $\mu_b$ denotes the desired risk sensitivity, which penalizes the variability [6], and the operator $\mathbb{E}$ denotes the expectation.

###### Remark 1

The Taylor expansion of the utility function given in (3) yields

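$$
\bar{u}_b = \mathbb{E}_{\mathbf{h},\boldsymbol{\pi}}\left[\sum_{t=0}^{T} r_b(t)\right] + \frac{\mu_b}{2}\,\mathrm{Var}_{\mathbf{h},\boldsymbol{\pi}}\left[\sum_{t=0}^{T} r_b(t)\right] + \mathcal{O}(\mu_b^{2}).
$$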

Remark 1 shows that the utility function (3) accounts for both the mean and the variance (Var) of the mmWave links. We formulate the following distributed optimization problem for every SC $b$:

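$$
\begin{aligned}
\max_{\boldsymbol{\pi}_b}\quad & \frac{1}{\mu_b}\log \mathbb{E}_{\mathbf{h},\boldsymbol{\pi}_b}\left[\exp\left(\mu_b \sum_{t=0}^{T} r_b(t)\right)\right] && \text{(4a)}\\
\text{subject to}\quad & r_b \in \mathcal{R},\quad \boldsymbol{\pi}_b \in \Pi,\quad p_b \le P^{\max}_b. && \text{(4b)}
\end{aligned}
$$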

It is challenging to solve (4) if each SC does not have full network observation. Moreover, this work does not assume explicit knowledge of the state transition probabilities. Here, we leverage principles of RL to optimize the transmit beam in a totally decentralized manner [6, 12, 16].
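The mean-variance reading of the utility in Remark 1 can be checked numerically. The sketch below is an illustration under stated assumptions: the accumulated rate is replaced by synthetic Gaussian samples, the risk parameter is taken negative (so that variance is penalized when the utility is maximized), and the function name and all numbers are hypothetical.

```python
import math
import random
import statistics


def risk_sensitive_utility(samples, mu):
    """Sample estimate of (1/mu) * log E[exp(mu * X)], as in equation (3)."""
    scaled = [mu * x for x in samples]
    peak = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean = peak + math.log(sum(math.exp(v - peak) for v in scaled) / len(scaled))
    return log_mean / mu


rng = random.Random(0)
# Two synthetic rate distributions with the same mean but different spread.
steady = [rng.gauss(5.0, 0.5) for _ in range(20000)]
bursty = [rng.gauss(5.0, 2.0) for _ in range(20000)]
mu = -0.1  # assumed negative risk parameter (variance is penalized)

for name, x in (("steady", steady), ("bursty", bursty)):
    exact = risk_sensitive_utility(x, mu)
    approx = statistics.fmean(x) + 0.5 * mu * statistics.pvariance(x)
    print(name, exact, approx)
```

Both sample sets have nearly the same mean, yet the high-variance one receives a visibly lower utility, and the mean-plus-variance approximation tracks the exact value closely for this small risk parameter.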

## IV Proposed Algorithm

As shown in Fig. 1, each SC acts as an agent that selects an action to maximize a long-term reward based on user feedback and the probability distribution over actions. The action is defined as the selection of $z_b$, the long-term utility in (4) is the reward, and the environment contains the network state. To this end, we build the probability distribution for every action and provide an RL procedure to solve (4).

We denote $u_b^{m}$ as the utility function of SC $b$ when selecting $z_b^{m}$. Here, $z_{-b}$ denotes the composite variable of the other agents' actions, excluding SC $b$. From (3), the utility of SC $b$ at time slot $t$, i.e., $u_b(t)$, is rewritten as

$$
u_b(t) = \frac{1}{\mu_b}\log\left(\sum_{m=1}^{Z_b} \pi^{m}_{b} \exp\left(\mu_b\, r^{m}_{b}\big(z^{m}_{b}(t), z_{-b}\big)\right)\right), \tag{5}
$$

where $r_b^{m}$ is the instantaneous rate of SC $b$ when choosing $z_b^{m}$ with probability $\pi_b^{m}$.

###### Remark 2

For a small $\mu_b$, (3) is approximated via the first-order Taylor approximation of the logarithm (for a small $x$, $\log(1+x) \approx x$) as

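$$
\begin{aligned}
\bar{u}_b &= \frac{1}{\mu_b}\,\mathbb{E}\left[\sum_{t=0}^{T}\big(\exp(\mu_b r_b(t)) - 1\big)\right], && \text{(6)}\\
&= \frac{1}{T+1}\sum_{t=0}^{T}\frac{\exp(\mu_b r_b(t)) - 1}{\mu_b}, && \text{(7)}
\end{aligned}
$$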

where (7) is obtained by expanding the time average of (6). Each SC $b$ determines $z_b(t)$ from $\mathcal{Z}_b$ based on the probability distribution from the previous stage $t-1$, i.e.,

$$
\boldsymbol{\pi}_b(t-1) = \left(\pi^{1}_{b}(t-1), \cdots, \pi^{Z_b}_{b}(t-1)\right). \tag{8}
$$

We introduce the Boltzmann-Gibbs distribution $\beta_b^{m}(\mathbf{u}_b(t))$ to capture the trade-off between exploitation and exploration, given by

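$$
\beta^{m}_{b}(\mathbf{u}_b(t)) = \underset{\boldsymbol{\pi}_b \in \Pi}{\arg\max} \sum_{m \in \mathcal{Z}_b}\left[\pi^{m}_{b}\, u^{m}_{b}(t) - \kappa_b\, \pi^{m}_{b} \ln(\pi^{m}_{b})\right], \tag{9}
$$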

where $\mathbf{u}_b(t)$ is the utility vector of SC $b$ over $\mathcal{Z}_b$, and the trade-off factor $\kappa_b$ is used to balance exploration and exploitation. If $\kappa_b$ is small, the SC selects the action with the highest payoff. For $\kappa_b \to \infty$, all decisions have an equal chance.

For a given $\mathbf{u}_b(t)$ and $\kappa_b$, we solve (9) to find the probability distribution. By adopting the notion of logit equilibrium [16], we have

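$$
\beta^{m}_{b}(\mathbf{u}_b(t)) = \frac{\exp\left(\frac{1}{\kappa_b}\left[u^{m}_{b}\right]^{+}\right)}{\sum_{m' \in \mathcal{Z}_b}\exp\left(\frac{1}{\kappa_b}\left[u^{m'}_{b}\right]^{+}\right)}, \tag{10}
$$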

where $[x]^{+} = \max(x, 0)$. Finally, we propose two coupled RL processes that run in parallel and allow SCs to decide their optimal strategies at each time instant as follows [16].

Risk-Sensitive Learning procedure: We denote $\hat{u}_b$ as the estimated utility of SC $b$. The estimated utility and the probability mass function are updated for each action $m$ as follows:

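$$
\begin{cases}
\hat{u}^{m}_{b}(t) = \hat{u}^{m}_{b}(t-1) + \zeta_b(t)\,\mathbb{I}_{\{z_b(t) = z^{m}_{b}\}} \times \left(u_b(t-1) - \hat{u}^{m}_{b}(t-1)\right),\\[1mm]
\pi^{m}_{b}(t) = \pi^{m}_{b}(t-1) + \iota_b(t)\left(\beta^{m}_{b}(\mathbf{u}_b(t)) - \pi^{m}_{b}(t-1)\right),
\end{cases}
$$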

where $\zeta_b(t)$ and $\iota_b(t)$ are the learning rates, which satisfy the following conditions (due to space limits, please see [16] for the convergence proof):

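$$
\begin{cases}
\lim_{T\to\infty}\sum_{t=0}^{T}\zeta_b(t) = +\infty, \quad \lim_{T\to\infty}\sum_{t=0}^{T}\iota_b(t) = +\infty,\\[1mm]
\lim_{T\to\infty}\sum_{t=0}^{T}\zeta_b^{2}(t) < +\infty, \quad \lim_{T\to\infty}\sum_{t=0}^{T}\iota_b^{2}(t) < +\infty,\\[1mm]
\lim_{t\to\infty}\dfrac{\iota_b(t)}{\zeta_b(t)} = 0.
\end{cases}
$$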

Finally, each SC $b$ determines $z_b(t)$ as per (8).
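The two coupled updates can be sketched for a single agent as follows. This is a toy illustration rather than the simulation setup of this letter: the environment is reduced to two hypothetical actions with Gaussian rates clipped at zero, the per-slot feedback is the summand of (7), the utility observed in the current slot is used in the update, the learning rates are polynomially decaying choices with $\iota_b(t)/\zeta_b(t) \to 0$, and the optimistic initialization of the estimates is an added assumption to ensure that every action is tried.

```python
import math
import random


def boltzmann_gibbs(utilities, kappa):
    """Logit rule of equation (10): softmax of [u]^+ / kappa."""
    scaled = [max(u, 0.0) / kappa for u in utilities]
    peak = max(scaled)  # shift for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def slot_utility(rate, mu):
    """Per-slot risk-sensitive utility (exp(mu * r) - 1) / mu, as in (7)."""
    return (math.exp(mu * rate) - 1.0) / mu


def learn(actions, mu, kappa, steps, rng):
    """Run the coupled utility and strategy updates for one agent (mu < 0)."""
    n = len(actions)
    u_hat = [-1.0 / mu] * n  # optimistic start: upper bound of slot_utility
    pi = [1.0 / n] * n       # uniform initial mixed strategy
    for t in range(1, steps + 1):
        zeta = t ** -0.55    # utility learning rate (hypothetical)
        iota = t ** -0.6     # strategy learning rate, decays faster
        m = rng.choices(range(n), weights=pi)[0]
        mean, std = actions[m]
        rate = max(0.0, rng.gauss(mean, std))  # toy environment
        u = slot_utility(rate, mu)
        u_hat[m] += zeta * (u - u_hat[m])      # only the played action moves
        beta = boltzmann_gibbs(u_hat, kappa)
        pi = [p + iota * (b - p) for p, b in zip(pi, beta)]
    return u_hat, pi


# Action 0: slightly lower mean rate but stable. Action 1: higher mean, bursty.
actions = [(5.0, 0.2), (5.2, 3.0)]
u_hat, pi = learn(actions, mu=-0.5, kappa=0.05, steps=5000, rng=random.Random(1))
print(u_hat, pi)
```

In this toy setting the agent ends up favoring the stable action even though its mean rate is lower, which is the behavior the risk-sensitive utility is meant to induce.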

## V Numerical Results

A dense SCs are randomly deployed in a area and we assume one UE per each SC and a fixed user association. We assume that each SC adjusts its beamwidth with a step of radian from the range , where radian and radian denote the minimum and maximum beamwidths of each SC, respectively. The transmit power level set of each SC is dBm and the SC antenna gain is dBi. The number of transmit antennas and receive antennas at the SC and UE are set to and , respectively. The blockage is modeled as a distance-dependent probability state where the channel is either line-of-sight (LOS) or non-LOS for urban environments at GHz and the system bandwidth is GHz [17]. Numerical results are obtained via Monte-Carlo simulations over different random topologies. The risk-sensitive parameter is set to . For the learning algorithm, the trade-off factor is set to , while the learning rates and are set to and , respectively [16]. Furthermore, we compare our proposed RSL scheme with the following baselines:

• Classical Learning (CSL) refers to the RL framework in which the utility function only considers the mean value of mmWave links [16].

• Baseline 1 (BL1) refers to [13] optimizing the beamwidth with maximum transmit power.

In Fig. 3, we plot the complementary cumulative distribution function (CCDF, i.e., the tail distribution) of user throughput (UT) at GHz when the number of SCs is per . The CCDF curves reflect the reliable probability (in both linear and logarithmic scales), defined as the probability that the UT is higher than a target rate Gbps, i.e., Pr. We also study the impact of imperfect CSI with and feedback with noise from UEs. We observe that the performance of our proposed RSL framework is reduced under these impairments. We next compare our proposed RSL method with the baselines under perfect CSI and user feedback. It is observed that the RSL scheme achieves better reliability, Pr, of more than , whereas the baselines CSL and BL1 obtain less than and , respectively. However, at a very low rate (less than Gbps) or a very high rate ( Gbps), as captured by the cross-point, the RSL obtains a lower probability than the baselines. In other words, our proposed solution provides a UT that is more concentrated around its median, in order to provide uniformly good service for all users. For instance, the UT distribution of our proposed algorithm has a small variance of , while the CSL has a higher variance of .
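The reliability metric read off the CCDF curves can be computed from throughput samples as sketched below. The two sample sets are made-up numbers, not results from this letter; they are chosen only to illustrate how a distribution concentrated around its median can have a higher tail probability at a moderate target rate and a lower one at a very high target rate.

```python
def ccdf(samples, target):
    """Empirical probability that the throughput is at least `target`."""
    return sum(1 for s in samples if s >= target) / len(samples)


# Hypothetical user throughputs in Gbps (illustrative only).
concentrated = [8.5, 9.0, 9.2, 9.5, 9.8, 10.0, 10.1, 10.3, 10.5, 11.0]
spread = [2.0, 4.0, 6.0, 8.0, 9.5, 11.0, 12.0, 13.0, 14.0, 15.0]

for target in (9.0, 12.0):
    print(target, ccdf(concentrated, target), ccdf(spread, target))
```

At the 9 Gbps target the concentrated set reaches a probability of 0.9 against 0.6 for the spread set, while at 12 Gbps the ordering flips, which mirrors the cross-point behavior described above.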

### V-A Impact of Network Density

Fig. 3 reports the impact of network density on the reliability, which is defined as the fraction of UEs who achieve a given target rate , i.e., . Here, the number of SCs is varying from to per . For given target rates of , , and Gbps, our proposed algorithm guarantees higher reliability as compared to the baselines. Moreover, the higher the target rate, the bigger the performance gap between our proposed algorithm and the baselines. A linear increase in network density decreases reliability, for example, when the density increases from to , the fraction of users that achieve Gbps of the RSL, CSL, and BL1 are reduced by , and , respectively. This highlights a key tradeoff between reliability and network density.

In Fig. 5 we show the impact of network density on the availability, which defines how much rate is obtained for a target probability. We plot the and probabilities in which the system achieves a rate of at least Gbps. For a given target probability of , our proposed algorithm guarantees more than Gbps of UT, whereas the baselines guarantee less than Gbps of UT for , while if we lower the target probability to , the achievable rate is increased by . This gives rise to a tradeoff between the reliability and the data rate. In addition, for a given probability, the achievable rate is reduced with the increase in network density. For instance, when the network density increases from to , the achievable rate is reduced by . This highlights the tradeoff between availability and network density.
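Availability, as used above, is the inverse reading of the CCDF: given a target probability, find the rate that is met with at least that probability. A minimal sketch of this computation follows; the function name `guaranteed_rate` and the sample values are hypothetical and are not results from this letter.

```python
import math


def guaranteed_rate(samples, target_prob):
    """Largest observed rate r such that the empirical Pr(UT >= r) >= target_prob."""
    if not 0.0 < target_prob <= 1.0:
        raise ValueError("target_prob must lie in (0, 1]")
    ordered = sorted(samples, reverse=True)
    # The k-th largest sample is met or exceeded by at least k samples.
    # The small tolerance guards against floating-point round-up.
    k = max(1, math.ceil(target_prob * len(ordered) - 1e-9))
    return ordered[k - 1]


# Hypothetical user throughputs in Gbps (illustrative only).
throughputs = [8.5, 9.0, 9.2, 9.5, 9.8, 10.0, 10.1, 10.3, 10.5, 11.0]
print(guaranteed_rate(throughputs, 0.9))  # rate met by 90% of the samples
print(guaranteed_rate(throughputs, 0.8))  # rate met by 80% of the samples
```

Lowering the target probability can only raise the guaranteed rate, which is the reliability versus data rate trade-off discussed in the text.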

We numerically observe that is long enough for agents to learn and enjoy the optimal solution. We assume that the channel condition is changed after every . Our proposed algorithm converges faster than the classical learning baseline as shown in Fig. 5. By harnessing the notion of risk-averse, the agents try to find the best strategy subject to the variations of the mmWave rates.

## VI Conclusions

In this letter, we studied the problem of providing multi-gigabit wireless access with reliable communication by optimizing the transmit beam and considering the link sensitivity in 5G mmWave networks. A distributed risk-sensitive RL-based approach was proposed, taking into account both the mean and the variance of the mmWave links. Numerical results show that our proposed approach provides better services for all users. For instance, our proposed approach achieves a Pr is higher than , whereas the baselines obtain less than and with small cells.

## References

• [1] J. G. Andrews et al., “What Will 5G Be?” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
• [2] A. Anpalagan, M. Bennis, and R. Vannithamby, Design and Deployment of Small Cell Networks.   Cambridge University Press, 2015.
• [3] T. K. Vu et al., “Joint load balancing and interference mitigation in 5G heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 6032–6046, Sep. 2017.
• [4] Y. Wu, R. Schober, D. W. K. Ng, C. Xiao, and G. Caire, “Secure massive MIMO transmission with an active eavesdropper,” IEEE Transactions on Information Theory, vol. 62, no. 7, pp. 3880–3900, 2016.
• [5] T. S. Rappaport et al., “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, 2013.
• [6] O. Mihatsch and R. Neuneier, “Risk-sensitive reinforcement learning,” Machine Learning, vol. 49, no. 2-3, pp. 267–290, 2002.
• [7] P. Popovski et al., “Wireless access for ultra-reliable low-latency communication (URLLC): Principles and building blocks,” submitted to IEEE Network, 2017.
• [8] M. Bennis, M. Debbah, and H. V. Poor, “Ultra-Reliable and Low-Latency Wireless Communication: Tail, Risk and Scale,” submitted to Proceedings of the IEEE, 2018.
• [9] G. Yang, M. Xiao, and H. V. Poor, “Low-latency millimeter-wave communications: Traffic dispersion or network densification?” submitted to IEEE Transactions on Communications, 2017.
• [10] T. K. Vu et al., “Ultra-reliable and low latency communication in mmwave-enabled massive MIMO networks,” IEEE Communications Letters, vol. 21, no. 9, pp. 2041–2044, Sep. 2017.
• [11] ——, “Path Selection and Rate Allocation in Self-Backhauled mmWave Networks,” in Proc. IEEE Int. Conf. on Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 2018, pp. 1–6.
• [12] E. Ghadimi, F. D. Calabrese, G. Peters, and P. Soldati, “A reinforcement learning approach to power control and rate adaptation in cellular networks,” in 2017 IEEE International Conference on Communications, Paris, France, 2017, pp. 1–7.
• [13] J. Liu and E. S. Bentley, “Hybrid-Beamforming-Based Millimeter-Wave Cellular Network Optimization,” in Proc. 15th IEEE Int. Sym. on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Paris, France, 2017, pp. 1–8.
• [14] J. Palacios et al., “Tracking mm-Wave Channel Dynamics: Fast Beam Training Strategies under Mobility,” in Proc. 36th Annual IEEE Int. Conf. on Computer Communications (INFOCOM), Atlanta, GA, USA, 2017, pp. 1–9.
• [15] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
• [16] M. Bennis, S. M. Perlaza, P. Blasco, Z. Han, and H. V. Poor, “Self-organization in small cell networks: A reinforcement learning approach,” IEEE Transactions on Wireless Communications, vol. 12, no. 7, pp. 3202–3212, 2013.
• [17] T. Bai, V. Desai, and R. W. Heath, “Millimeter wave cellular channel models for system evaluation,” in 2014 IEEE Int. Conf. on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 2014, pp. 178–182.