Mobile Cellular-Connected UAVs: Reinforcement Learning for Sky Limits

by M. Mahdi Azari, et al.

A cellular-connected unmanned aerial vehicle (UAV) faces several key challenges concerning connectivity and energy efficiency. Through a learning-based strategy, we propose a general novel multi-armed bandit (MAB) algorithm to reduce the disconnectivity time, handover (HO) rate, and energy consumption of a UAV while taking into account its time of task completion. By formulating the problem as a function of the UAV's velocity, we show how each of these performance indicators (PIs) is improved by adopting a proper range of the corresponding learning parameter. However, results reveal that the optimal combination of the learning parameters depends critically on the specific application and on the weights of the PIs in the final objective function.



I Introduction

Unmanned aerial vehicles (UAVs), commonly known as drones, have many beneficial applications such as environmental sensing, monitoring, and telecommunications [1]. Critical for these applications is communication technology that can ensure the UAV’s connectivity, allowing a safe, reliable, and secure use of drones. In particular, reliable beyond visual line-of-sight (BVLoS) command and control (C&C) is instrumental for enabling autonomous UAV operation. Cellular connectivity has been proposed as a suitable candidate to serve UAVs’ needs, and its development is an active endeavour in academia and industry [2, 3].

Although the use of cellular networks for UAV communication often appears to be a win-win situation for both cellular and UAV operators [3], several challenges need to be addressed before it can be deployed at scale. One important challenge is to provide adequate connectivity time to UAVs, which is more demanding than for ground terminals, particularly for UAVs flying at high altitudes, due to interference [3]. In addition, a highly mobile UAV may frequently change its serving cell and thereby trigger successive handovers (HOs). Since the HO procedure requires additional signaling overhead, which causes service interruptions, a high HO rate deteriorates the reliability of the communication link. Accordingly, the UAV’s HO rate needs to be managed properly along its trajectory. Furthermore, UAVs are in general battery-limited, which restricts their operational lifetime [2]. As the energy consumption of a UAV depends strongly on its velocity, the speed of the UAV needs to be optimized. The speed also considerably influences the connectivity time and the HO rate. All of these factors need to be taken into account when designing UAV operation, making this a highly non-trivial task.

The performance of cellular-connected UAVs, and solutions to improve the connectivity time and reliability of the corresponding link, have mostly been addressed using model-based approaches [3, 4, 5]. In [3, 4], the authors thoroughly studied the coverage and rate performance of cellular networks for UAVs with and without HO effects. HO analysis of cellular-connected UAVs has also been investigated in [5, 6]. The energy consumption of UAVs is discussed in [2], and optimal UAV trajectory design under energy consumption considerations is studied in [7]. Besides, there is growing interest in applying machine learning (ML) to UAV communication and networking [8, 9]. A few recent studies adopted different ML approaches to alleviate the detrimental effect of HOs [10, 11, 12]. However, the existing works do not explicitly take into account the stringent rate requirement of UAVs in the learning process, which makes it unclear whether the proposed solutions are capable of delivering the target rate and connectivity time. Moreover, to the best of our knowledge, the limited energy capacity of UAVs for accomplishing their tasks is neglected in the ML literature on UAVs.

In this paper, we introduce a novel learning-based strategy to manage the mobility of UAVs for connectivity and reliability by taking into account the UAV’s energy consumption and time of task completion. We first formulate the problem as a function of the UAV’s velocity, and then propose the use of reinforcement learning (RL) to solve it. Specifically, we propose a novel algorithm based on the multi-armed bandit (MAB) framework to dynamically adjust the speed of the UAV. We show that our proposed strategy significantly decreases the HO rate, disconnectivity time, energy consumption, and time of task completion. Moreover, depending on the weight of each metric, adopting proper values for the learning parameters yields an important overall performance improvement with respect to a benchmark.

The rest of this paper is organized as follows. In Section II, we present the network model and considered metrics. In Section III, the preliminaries are elaborated, and the problem is formulated. We propose our learning-based solution in Section IV, and in Section V, numerical results are presented. Finally, the paper is concluded in Section VI.

II System Modeling

In this section, we introduce the system model, which includes the network topology (Section II-A), the channel propagation features (Section II-B), and the key performance indicators (KPIs) (Section II-C).

II-A Network Topology

We consider the downlink of a wireless cellular network served by a set of base stations (BSs) deployed on a hexagonal layout, as shown in Fig. 1. We assume each site consists of three co-located BSs of the same height, and each BS covers an angular interval of $120^\circ$ in the horizontal 2D plane. We assume that each BS has one active user per time and frequency resource, which generates inter-cell interference to neighboring cells working in the same frequency band. Each BS transmits with power $P_{\mathrm{tx}}$ distributed over the corresponding bandwidth. Therefore, the transmit power over each physical resource block (PRB) is equal to $P_{\mathrm{tx}}/N_{\mathrm{PRB}}$, where $N_{\mathrm{PRB}}$ represents the total number of available PRBs.

Within this setup, we consider a UAV flying at a given altitude, which requires cellular connectivity. The UAV is associated with the BS that provides the strongest signal strength. We further assume that the network allocates a fixed number of PRBs to the UAV-BS link.

The trajectory of the UAV might be regularly updated during its mission by the cellular controller. The UAV flies with velocity $v \in [v_{\min}, v_{\max}]$, where $v_{\min}$ and $v_{\max}$ are, respectively, the minimum and maximum velocity determined by the type of UAV and its mission. The UAV’s acceleration is upper-bounded by $a_{\max}$, and the UAV’s instantaneous location is described by its 3D coordinates. For the UAV’s trajectory, we adopt a random direction (RD) pattern following the 3GPP study case [13]. According to the RD pattern, the UAV starts its mission at a random location in 3D space and selects a random direction uniformly. Then, the UAV flies a given distance in a straight line with a speed bounded by the UAV’s application, capability, and the movement algorithm.
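The RD pattern described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function name, the square deployment area, the fixed flying altitude, and the default values are hypothetical choices rather than part of the 3GPP study case.

```python
import math
import random


def random_direction_path(area_m=6000.0, altitude_m=150.0, distance_m=1000.0,
                          n_segments=10, rng=None):
    """Return the waypoints of one straight RD flight split into equal segments.

    Assumptions (not from the paper): a square area of side `area_m`,
    a constant altitude, and a heading drawn uniformly from [0, 2*pi).
    """
    rng = rng or random.Random()
    x0 = rng.uniform(0.0, area_m)              # random start location
    y0 = rng.uniform(0.0, area_m)
    heading = rng.uniform(0.0, 2.0 * math.pi)  # uniformly random direction
    step = distance_m / n_segments             # equal-length segments
    return [(x0 + i * step * math.cos(heading),
             y0 + i * step * math.sin(heading),
             altitude_m)
            for i in range(n_segments + 1)]
```

Each consecutive pair of waypoints then corresponds to one path segment over which a single speed would be applied.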

Fig. 1: 3D network representation of a mobile cellular-connected UAV.

II-B Propagation Channel

We consider a dual-slope line-of-sight (LoS) / non-line-of-sight (NLoS) propagation channel for each link. Each propagation channel comprises 3GPP-based path-loss, small-scale fading, and 3D BS antenna gain, as described in the sequel.

II-B1 Path-Loss

The LoS and NLoS path-losses between the UAV and the k-th BS follow the 3GPP expressions in [13], which depend on the 3D distance between the UAV and the k-th BS in meters and on the working frequency in GHz. Note that the path-loss expressions are valid over the parameter range of interest for cellular-connected UAV operations.

II-B2 Small-Scale Fading

A wireless link between the UAV and the k-th BS undergoes small-scale fading, with separate fading powers for the LoS and NLoS conditions. Without loss of generality, we adopt a normalized fading power in both conditions. We use the Nakagami-m fading model, which covers a wide range of fading environments [3]. Accordingly, the fading power follows a Gamma distribution, whose cumulative distribution function (CDF) is available in closed form.
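To make the link between Nakagami-m fading and the Gamma distribution concrete, the sketch below draws fading power samples. It assumes a unit-mean normalization (shape m, scale 1/m); the function name and the example values of m are hypothetical and are not taken from the paper.

```python
import random


def sample_fading_power(m, rng=None):
    """Draw one fading power sample under Nakagami-m fading.

    The squared envelope of a Nakagami-m variable is Gamma distributed.
    Assumption: unit mean power, i.e. shape = m and scale = 1/m.
    m = 1 corresponds to Rayleigh fading; larger m means milder fading.
    """
    rng = rng or random.Random()
    return rng.gammavariate(m, 1.0 / m)
```

For example, one might use a larger m for LoS links than for NLoS links, since a dominant path reduces the fading variance (which equals 1/m under this normalization).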


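II-B3 Probability of LoS/NLoS

The aforementioned LoS and NLoS path-loss and small-scale fading components are incorporated into the system along with their probability of occurrence. Following [13], the LoS probability depends on the UAV’s altitude: over a lower altitude range it is given by a distance-dependent expression, whereas above that range the link is in LoS with probability one. Finally, the NLoS probability is the complement of the LoS probability.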
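As a hedged illustration of such a piecewise LoS probability, the sketch below uses the urban-macro aerial (UMa-AV) form as we recall it from 3GPP TR 36.777. Both the choice of scenario and the numerical constants are assumptions here and should be checked against [13] before use; the function name is hypothetical.

```python
import math


def los_probability_uma_av(h_uav_m, d2d_m):
    """LoS probability for an aerial user, UMa-AV style (assumed constants).

    h_uav_m: UAV altitude in meters (model assumed valid for 22.5 < h <= 300).
    d2d_m:   horizontal (2D) distance between UAV and BS in meters.
    """
    if not 22.5 < h_uav_m <= 300.0:
        raise ValueError("altitude outside the assumed validity range")
    if h_uav_m > 100.0:
        return 1.0  # above 100 m the link is assumed to be always LoS
    p1 = 4300.0 * math.log10(h_uav_m) - 3800.0
    d1 = max(460.0 * math.log10(h_uav_m) - 700.0, 18.0)
    if d2d_m <= d1:
        return 1.0
    return d1 / d2d_m + math.exp(-d2d_m / p1) * (1.0 - d1 / d2d_m)


def nlos_probability_uma_av(h_uav_m, d2d_m):
    """NLoS probability as the complement of the LoS probability."""
    return 1.0 - los_probability_uma_av(h_uav_m, d2d_m)
```

Under these assumed constants, a UAV at 150 m is always in LoS, which is consistent with the strong LoS interference from neighboring cells discussed for higher altitudes.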

II-B4 Antenna Gain

We assume that each BS is equipped with a vertical N-element uniform linear array (ULA) with a fixed antenna element spacing. Each element has a directivity that depends on the spherical angles in a local coordinate system whose origin is at the antenna location. Following [14], the element gain combines a vertical and a horizontal radiation pattern, each characterized by its own beamwidth and maximum attenuation. The maximum directional gain of an antenna element also follows [14].

The total BS radiation pattern gain is obtained as the superposition of each element’s gain, i.e. the element gain combined with the array factor. The array factor depends on the electrical vertical steering angle, which is defined between $0^\circ$ and $180^\circ$ ($90^\circ$ represents the direction perpendicular to the array).

We also assume that the UAV is equipped with a single omnidirectional antenna of unitary gain in any direction. Therefore, the received power (in dB) at the UAV from the k-th BS can be expressed as the sum of the per-PRB transmit power, the total BS antenna gain, and the small-scale fading power, minus the path-loss, where the antenna gain is evaluated at the spherical angles corresponding to the link from the k-th BS to the UAV in 3D space.
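The effect of the array factor can be illustrated with a generic ULA sketch. It assumes uniform amplitude weights, a half-wavelength element spacing by default, and angles measured from the array axis (so that $90^\circ$ is perpendicular to the array); the function name and defaults are hypothetical, and the exact pattern used in the paper follows [14].

```python
import cmath
import math


def array_factor_gain(n_elements, theta_deg, steer_deg, spacing_wavelengths=0.5):
    """Power gain |AF|^2 of an N-element ULA with phase steering.

    theta_deg: direction of interest, measured from the array axis.
    steer_deg: electrical steering angle, same convention (90 = broadside).
    The gain is normalized so that it equals N at the steering angle.
    """
    theta = math.radians(theta_deg)
    steer = math.radians(steer_deg)
    # Phase difference between adjacent elements after steering compensation.
    phase_step = (2.0 * math.pi * spacing_wavelengths
                  * (math.cos(theta) - math.cos(steer)))
    total = sum(cmath.exp(1j * n * phase_step) for n in range(n_elements))
    return abs(total) ** 2 / n_elements
```

In dB, the total gain toward a direction would then be the element gain plus `10 * log10` of this array factor gain. Directions well above the horizon fall in the side lobes of a down-tilted array, which is why aerial users typically see lower and more irregular BS antenna gains than ground users.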

II-C Performance Metrics

In the following, we present important metrics that capture key limiting factors of mobile cellular-connected UAVs’ performance.

II-C1 Disconnectivity

During the UAV’s flight along its trajectory, the disconnectivity time corresponds to the amount of time that the serving BS is not able to provide a target rate to the UAV. If we denote the UAV’s achievable rate by $R$ and the target rate by $R_{\mathrm{t}}$, then the link is called disconnected if $R < R_{\mathrm{t}}$. The UAV’s achievable rate is obtained as $R = W \log_2(1 + \mathrm{SINR})$, where W is the bandwidth assigned to the UAV’s link, which directly relates to the number of allocated PRBs, and SINR is the instantaneous signal-to-interference-plus-noise ratio. If we assume that the J-th BS is serving the UAV, the SINR at the UAV can be written as $\mathrm{SINR} = P_J / \left(\sum_{k \neq J} P_k + N_0\right)$, where $P_k$ denotes the power received from the k-th BS in linear scale and $N_0$ is the noise power.
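The disconnectivity test can be sketched numerically as follows. This is a minimal sketch that assumes the received powers are already available in watts and that the noise density in Table I is expressed in dBW/Hz; the function names and default values are hypothetical.

```python
import math


def achievable_rate_bps(rx_powers_w, serving_idx, bandwidth_hz,
                        noise_psd_dbw_hz=-204.0):
    """Shannon rate of the UAV link given per-BS received powers.

    rx_powers_w: received power from every BS (linear scale, watts).
    serving_idx: index of the serving BS; all other BSs act as interferers.
    Assumption: noise power = noise density (dBW/Hz) times bandwidth.
    """
    noise_w = 10.0 ** (noise_psd_dbw_hz / 10.0) * bandwidth_hz
    signal = rx_powers_w[serving_idx]
    interference = sum(rx_powers_w) - signal
    sinr = signal / (interference + noise_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def is_disconnected(rate_bps, target_rate_bps=100e3):
    """The link is disconnected when the rate falls below the target."""
    return rate_bps < target_rate_bps
```

With one PRB of 180 kHz and a 100 kbps target, the link needs an SINR of roughly $2^{100/180} - 1 \approx 0.47$ (about -3.3 dB), so a UAV that sees several equally strong BSs is disconnected even though its received signal power is high.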

II-C2 Handover Rate

A UAV is able to constantly measure the received signal power from various BSs and eventually may decide to perform a new association based on the received power strengths. This HO procedure can significantly enhance the SINR level when switching to the best BS that provides the highest signal power. Since the HO procedure requires additional signaling exchanges in the network, it yields delays in communicating useful signals. Although switching to the best BS increases the level of SINR and hence reduces the disconnectivity time, the UAV may encounter several consecutive HOs which, in turn, can result in a degraded reliability of the communication link.

II-C3 Power Consumption

One of the limiting factors in a UAV’s performance is the limited energy budget available for accomplishing a given mission. A UAV consumes energy for two major purposes: communication and propulsion. The energy consumption of the former, particularly for C&C, is negligible compared to the latter [2, 7]; hence we ignore the communication-related part. Assuming that UAV maneuvering takes a small portion of the total operation time, the power consumption of a rotary-wing UAV flying at speed $v$ can be written as

$$P(v) = P_0 \left(1 + \frac{3 v^2}{U_{\mathrm{tip}}^2}\right) + P_i \left(\sqrt{1 + \frac{v^4}{4 v_0^4}} - \frac{v^2}{2 v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A v^3,$$

where $P_0$ and $P_i$ are constants respectively related to the blade profile power and induced hovering power, $U_{\mathrm{tip}}$ denotes the rotor blade’s tip speed, $v_0$ represents the mean rotor induced hovering velocity, and the coefficient of the last term is determined by the fuselage drag ratio $d_0$, the rotor solidity $s$, the air density $\rho$, and the rotor disc area $A$. A more detailed discussion on the power consumption modeling of UAVs can be found in [2, 7].
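The non-monotonic dependence of propulsion power on speed is easy to see numerically. The sketch below implements the standard rotary-wing model from the literature cited in [2]; the default parameter values are illustrative assumptions that we recall from that literature, not values confirmed by this paper, and the function name is hypothetical.

```python
import math


def rotary_wing_power_w(v, p0=79.86, p_i=88.63, u_tip=120.0, v0=4.03,
                        d0=0.6, rho=1.225, s=0.05, area=0.503):
    """Propulsion power (W) of a rotary-wing UAV at forward speed v (m/s).

    All default parameter values are illustrative assumptions.
    """
    # Blade profile power grows quadratically with speed.
    blade = p0 * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    # Induced power decreases with speed.
    induced = p_i * math.sqrt(
        math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2))
    # Parasite (fuselage drag) power grows cubically with speed.
    parasite = 0.5 * d0 * rho * s * area * v ** 3
    return blade + induced + parasite


# Speed on a 1 m/s grid that minimizes power under the assumed parameters.
best_speed = min(range(1, 31), key=rotary_wing_power_w)
```

Under these assumed parameters, hovering costs more than flying at a moderate speed, while flying at the maximum speed is the most expensive, so the power-optimal speed lies strictly inside the allowed velocity range.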

III Preliminaries and Problem Statement

This section discusses aspects that are critical for the performance of cellular-connected UAVs and presents our problem statement.

III-A Preliminaries

Figure 2(a) illustrates that disconnected areas (i.e. coverage holes) grow as the UAV’s altitude increases, resulting in more disconnectivity time. This is because a UAV at a higher altitude experiences strong LoS interference from the neighboring cells. Importantly, it may be impossible for a trajectory to avoid all the disconnected areas. However, the UAV could choose to pass over disconnected areas at a higher speed in order to reduce the time it remains disconnected from the network. Note that allocating more PRBs to the UAV link reduces the disconnected areas. In this paper, however, our focus is on adopting proper UAV speeds, assuming that other cellular network-dependent parameters are fixed. Accordingly, no additional signaling overhead is imposed on the network.

To gain insight into the behaviour of HO events, we illustrate the cell association pattern at different altitudes in Figure 2(b), where areas with the same color are served by the same cell. The black lines show the borders between any two different cells, and hence crossing any of these lines triggers a HO event. Importantly, the serving pattern greatly depends on the altitude: at higher altitudes the HO lines tend to be denser, which implies that HOs happen more often.

(a) Black areas show disconnected areas where a target rate of 100 kbps over one assigned PRB cannot be satisfied.
(b) Crossing the lines triggers HO events. In general, the lines are denser at higher altitudes, resulting in more HOs.
Fig. 2: Illustration of the limits of cellular connectivity in the sky.

III-B Problem Statement

We consider the perspective of a cellular-connected UAV designer who aims to accomplish a mission such as environmental sensing. For this, we consider a general objective that may include one or more of the following: shortening the time of mission accomplishment, improving the lifetime of the UAV by reducing the energy consumption, reducing the disconnectivity time by considering a required target rate, or enhancing the link reliability by reducing the HO rate.

These PIs are linked, since they all depend on the UAV’s velocity and the network topology. As the latter cannot be controlled by a UAV operator, the former may be intelligently controlled to meet the objective’s conditions. As the speed of the UAV increases, the time of mission accomplishment and the disconnectivity time decrease. However, a higher speed yields an increased HO rate, and it may increase or decrease the energy consumption [2, 7]. Accordingly, optimizing the UAV’s speed is crucial to effectively balance the trade-offs between the time of mission accomplishment, the UAV’s lifetime, the connectivity time, and the reliability conditions. Mathematically, the problem can be formulated as:

$$\min_{v} \; f(T, P, D, H) \quad \text{s.t.} \quad v_{\min} \le v \le v_{\max}, \quad |\dot{v}| \le a_{\max}, \tag{1}$$

where $f(\cdot)$ represents a desired function, which can be a weighted summation of its arguments, $T$ is the time of task completion, $P$ is the UAV’s power consumption, $D$ is the disconnectivity time, and $H$ is the HO rate. Above, the first constraint represents the range of possible choices for the velocity. The parameter $v_{\max}$ is limited not only by the UAV’s maximum possible speed but also by the type of task to be performed. Moreover, the limited acceleration of the UAV is taken into account through the second constraint in (1). Note that, depending on the application, the overall performance may not involve one or more of the considered PIs, and hence one may set some of the weights to zero to focus on specific PIs. For instance, if a network can guarantee the required rate constraint for connectivity, then the disconnectivity weight in the objective function is set to zero.

As the dependency of the objective function on the velocity is complex and determined by the environment which might not be known in advance, we propose a learning approach to solve the problem.

IV Learning-Based Controlled Mobility

Here we propose an RL-based mechanism to address the problem stated in Eq. (1).

IV-A An Overview of RL and MAB

Reinforcement learning deals with the problem of designing policies for agents that need to act in environments whose inner workings are largely unknown to them. For building this policy, RL assumes that the agent has access to a reward signal, which provides feedback on how well the objective of the agent has been satisfied. The reward signal is updated after each action, which allows the agent to improve its policy based on its own past experiences. Importantly, RL does not rely on prior knowledge of the environment, which makes it more flexible than related frameworks such as supervised learning.

RL is a natural choice to address problems such as Eq. (1). Within the RL literature, our approach is to consider the multi-armed bandit (MAB), as it is one of the most well-understood scenarios, with a vast related literature. A MAB abstracts a scenario where a gambler has to choose between a number of slot machines to play with. Each machine has its own likelihood of providing a positive outcome, which is not known by the gambler beforehand. Therefore, the gambler needs to play exploratory rounds to estimate the odds of each machine, and then exploit this knowledge by playing the most favorable ones. Importantly, each exploratory round improves the gambler’s knowledge about the machines at the expense of risking low payoffs; nevertheless, this knowledge is key for ensuring favorable long-term outcomes. This exploration-vs-exploitation trade-off is a hallmark of RL problems, and balancing it is a main concern of RL algorithms.

IV-B Mobility Management as a MAB Problem

To solve (1) as a MAB problem, the UAV acts as the agent, with the set of available velocities $\mathcal{V}$ as the actions. The UAV’s path is divided into m equal segments. At the beginning of each segment $t$, the UAV selects a velocity $v_t$ from $\mathcal{V}$, restricted to the velocities that are consistent with the maximal acceleration, i.e. those reachable from $v_{t-1}$ without exceeding $a_{\max}$. Then, the UAV computes a reward signal defined as

$$r_t = \omega_v \tilde{v}_t - \omega_d d_t - \omega_h h_t - \omega_p \tilde{p}_t, \tag{2}$$

where the coefficients $\omega_v$, $\omega_d$, $\omega_h$, and $\omega_p$ are learning parameters that indicate the impact of mission completion time, disconnectivity time, HOs, and power consumption on the reward function, respectively. Since increasing the UAV’s speed decreases the traveling time, the speed is considered as a benefit in the reward function. In (2), $d_t$ denotes the fraction of the time interval at segment t during which the UAV is in the disconnectivity condition. Note that the time duration of disconnectivity can be decreased by increasing the speed. Moreover, $h_t$ is the average number of HOs that occurred until segment t, which is considered as a cost. Therefore, the cost due to HOs increases with a higher number of HOs. The last term in (2) captures the impact of power consumption on the reward function as a cost. Finally, the speed and power are normalized to the range [0,1], denoted by $\tilde{v}_t$ and $\tilde{p}_t$, in order to fairly combine the benefit and cost elements.
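The reward described above (a weighted benefit for speed minus weighted costs for disconnectivity, HOs, and power) can be sketched as follows. The function and argument names are hypothetical, and min-max scaling is assumed as the normalization to [0,1]; the paper does not specify how the normalization is carried out.

```python
def normalize(x, lo, hi):
    """Min-max scale x from [lo, hi] to [0, 1] (assumed normalization)."""
    return (x - lo) / (hi - lo)


def reward(speed, disc_fraction, avg_handovers, power_w, *,
           p_min, p_max, v_min=1.0, v_max=30.0,
           w_v=1.0, w_d=1.0, w_h=1.0, w_p=1.0):
    """Per-segment reward: speed is a benefit, the other three terms are costs.

    disc_fraction: fraction of the segment spent disconnected (0..1).
    avg_handovers: average number of HOs per segment so far.
    p_min, p_max:  power range used for normalization (assumed known).
    The four weights play the role of the learning parameters; their
    default value of one matches the default used in the simulations.
    """
    speed_n = normalize(speed, v_min, v_max)
    power_n = normalize(power_w, p_min, p_max)
    return (w_v * speed_n - w_d * disc_fraction
            - w_h * avg_handovers - w_p * power_n)
```

Setting a weight to zero removes the corresponding term from the reward, which is how a designer would focus the learning on a subset of the PIs.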

IV-C Solution Based on Upper Confidence Bound (UCB)

The upper confidence bound (UCB) algorithm is an effective approach for solving the MAB problem. Using the UCB algorithm, the UAV first selects each velocity once. Then, once the iteration index $t$ exceeds the number of actions $|\mathcal{V}|$, it selects the velocity according to the decision function

$$v_t = \arg\max_{v \in \mathcal{V}_t} \left[ \bar{r}_v(t) + c \sqrt{\frac{\ln t}{n_v(t)}} \right], \tag{3}$$

where $\bar{r}_v(t)$ denotes the mean reward of velocity $v$ until segment $t$, $n_v(t)$ is the number of times that arm $v$ has been selected, and $\mathcal{V}_t$ is the subset of $\mathcal{V}$ that satisfies the acceleration constraint in (1). In (3), the mean reward term corresponds to exploitation, and the second term, which captures the uncertainty about rarely selected arms, corresponds to exploration. The parameter $c$ balances the trade-off between exploration and exploitation. A pseudocode describing our proposed algorithm is presented in Algorithm 1.

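Initialization:
for t = 1, ..., $|\mathcal{V}|$ do
    Select the t-th arm as the velocity $v_t$
    Measure the disconnectivity fraction $d_t$, the average number of HOs $h_t$, and the power consumption $p_t$
    Calculate the reward $r_t$ using (2)
    Assign $r_t$ to the mean reward of arm $v_t$
    Assign 1 to the selection count of arm $v_t$
end for
Main loop:
for t = $|\mathcal{V}|+1$, ..., m do
    Select arm $v_t$ according to (3)
    Measure the disconnectivity fraction $d_t$, the average number of HOs $h_t$, and the power consumption $p_t$
    Calculate the reward $r_t$ according to (2)
    for each arm $v \in \mathcal{V}$ do
        Update the selection count $n_v(t)$
        Update the mean reward $\bar{r}_v(t)$
    end for
end for
Algorithm 1: UCB-based mobility management algorithm for the cellular-connected UAV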
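Algorithm 1 can be sketched as runnable code as follows. This is a minimal sketch rather than the authors' implementation: the exploration bonus `c * sqrt(ln t / n)` is one common UCB form, the acceleration constraint is approximated by a maximum speed change per segment (`max_delta`), and `reward_fn` is a hypothetical callback standing in for the measurements and the reward computation.

```python
import math


def feasible_arms(speeds, prev_speed, max_delta):
    """Indices of speeds reachable from prev_speed within the speed-change limit."""
    return [i for i, v in enumerate(speeds) if abs(v - prev_speed) <= max_delta]


def ucb_select(means, counts, t, candidates, c=1.0):
    """Candidate arm with the largest mean reward plus exploration bonus."""
    return max(candidates,
               key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]))


def run_ucb(speeds, reward_fn, n_segments, max_delta, c=1.0):
    """Run UCB over n_segments; reward_fn(speed, t) returns the segment reward."""
    k = len(speeds)
    means, counts, chosen = [0.0] * k, [0] * k, []
    for t in range(1, n_segments + 1):
        if t <= k:
            # Initialization: play every arm once, in order. With sorted
            # speeds, consecutive arms differ by one grid step.
            arm = t - 1
        else:
            # The current speed is always feasible, so candidates is non-empty.
            candidates = feasible_arms(speeds, speeds[chosen[-1]], max_delta)
            arm = ucb_select(means, counts, t, candidates, c)
        r = reward_fn(speeds[arm], t)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        chosen.append(arm)
    return means, counts, chosen
```

Only the selected arm changes its count and mean in each round, so the per-arm update loop of Algorithm 1 reduces to a single incremental update here. A larger `c` makes the agent revisit rarely used speeds more often, at the cost of more segments flown at suboptimal speeds.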

V Numerical Results

Fig. 3: Considering the overall performance indicator, i.e. the objective function $F$, the learning strategy outperforms the benchmark method for low values of the velocity learning parameter $\omega_v$.
Fig. 4: The HO cost can be reduced by increasing the HO learning parameter $\omega_h$. However, the overall performance is optimized within a limited range of $\omega_h$.

In this section, we examine our proposed algorithm by focusing on each target performance individually and also on the overall objective function. To study the total effect of the learning parameters, we consider an objective function given by $F = \alpha_1 \bar{T}/\bar{T}^{B} + \alpha_2 \bar{P}/\bar{P}^{B} + \alpha_3 \bar{D}/\bar{D}^{B} + \alpha_4 \bar{H}/\bar{H}^{B}$, where superscript B denotes the benchmark result, the overbar indicates the average, and the $\alpha_i$ (i=1,2,3,4) are non-negative real coefficients with $\sum_i \alpha_i = 1$. This representation of $F$ enables us: 1) to determine the importance of each individual metric relative to the others by adjusting the $\alpha_i$ (for a specific scenario, some of the $\alpha_i$ may be set to zero), and 2) to fairly evaluate the overall impact of the learning method compared to the benchmark. Note that $F$ is equal to 1 for the benchmark, and improvements due to learning are reflected in $F < 1$. The benchmark is obtained by adopting a uniformly random velocity in each iteration. The parameters used for the simulations are summarized in Table I. The default values of the learning parameters are one. Furthermore, for the parameter values of the power consumption model and the BS antenna pattern, see [2] and [14], respectively.
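The benchmark-normalized objective can be sketched as follows. The dictionary keys and the function name are hypothetical; the only properties taken from the text are that each averaged metric is divided by its benchmark value, that the weights are non-negative and sum to one, and that the benchmark itself scores 1.

```python
def overall_objective(avg_metrics, benchmark_metrics, weights):
    """Weighted sum of benchmark-normalized average metrics (lower is better).

    avg_metrics, benchmark_metrics: dicts with the same keys, e.g.
        "time", "power", "disconnectivity", "handover" (hypothetical names).
    weights: dict of non-negative weights that sum to one.
    """
    if any(w < 0.0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    # Metrics with zero weight are skipped and do not affect the result.
    return sum(w * avg_metrics[k] / benchmark_metrics[k]
               for k, w in weights.items() if w > 0.0)
```

Because every term is a ratio to the benchmark, metrics with very different units (seconds, watts, HOs per segment) contribute on a comparable scale, and a value below 1 indicates a net improvement over the random-velocity benchmark.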

Parameter               Value
noise power             -204 dB/Hz
target rate             100 kbps
bandwidth               180 kHz
carrier frequency       2 GHz
environment             6 km × 6 km
inter-site distance     1 km
BSs height              25 m
BSs transmit power      46 dBm
min. velocity           1 m/s
max. velocity           30 m/s
max. accel.             5 m/s^2
flying altitude         150 m
TABLE I: Notations and values for simulation.

Figure 3 illustrates the impact of the velocity learning parameter $\omega_v$ on each PI and on the total objective function $F$. As expected, an increase in $\omega_v$ increases the velocity, and hence proportionally reduces the time of task completion. A higher $\omega_v$ also decreases the disconnectivity time; however, it is disadvantageous for the handover rate and the power consumption. As can be seen, the objective function can be minimized by adopting a proper value of $\omega_v$, and the best choice depends on the weights of the PIs in $F$.

Figure 4 shows that a higher HO cost $\omega_h$ results in a lower HO rate and a lower velocity. With the learning technique, the HO rate is significantly lower than that of the benchmark. Altogether, the total effect of the HO learning parameter can be balanced by choosing $\omega_h$ between 0 and 5 in our examined cases. It is worth pointing out that, for the other values shown in these figures, the learning method still outperforms the benchmark, as the objective function is below 1.

Figure 5 shows that an increase in the disconnectivity learning factor $\omega_d$ reduces the disconnectivity time, which confirms the suitability of the reward function in (2) for this problem. Furthermore, the power consumption increases with $\omega_d$. The main reason is that the UAV speed rises above the optimal speed for minimum power consumption. The overall performance represented by the objective function can be minimized by a suitable choice of $\omega_d$, though for the other examined values the learning approach still benefits the system, as $F$ lies below 1.

Figure 6 reveals that higher values of the power consumption cost $\omega_p$ make this contributor dominant over the others, and hence the velocity converges to the optimum velocity for minimum power consumption. This results in a relatively stable behavior of the HO rate and disconnectivity time for large values of $\omega_p$. In general, the average HO rate and disconnectivity time are not monotonic functions of $\omega_p$, because the dependency of power consumption on velocity is not monotonic. As for the objective function $F$, one can see that, depending on the weight of each term, either larger or lower values of $\omega_p$ are better. Reducing the weight of the time of task completion and increasing the weights of power consumption and the other individual metrics require higher values of $\omega_p$ for an optimal overall performance.

Fig. 5: Increasing $\omega_d$, i.e. the disconnectivity learning parameter, reduces the disconnectivity time and increases the HO rate. Overall, there is an optimal value of $\omega_d$ that balances all the effects in the objective function.
Fig. 6: Depending on the weight of each performance indicator in the objective function, increasing the power consumption learning parameter, i.e. $\omega_p$, can be detrimental (left figure) or beneficial (right figure).

VI Conclusion

In this paper, we have addressed key challenges of mobile cellular-connected UAVs, including connectivity time, handover rate, energy consumption, and traveling time, by using a reinforcement learning approach. Our approach leverages the Upper Confidence Bound algorithm for MAB problems, which we recast in the context of cellular-connected UAV systems by identifying a suitable reward function. Our results show that adequate learning parameters enable significant improvement in the key performance indicators. Interestingly, the optimal combination of learning parameters depends on the weight of each indicator.


References

  • [1] M. Mozaffari, W. Saad, M. Bennis, Y.-H. Nam, and M. Debbah, “A tutorial on UAVs for wireless networks: Applications, challenges, and open problems,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2334–2360, 2019.
  • [2] Y. Zeng, Q. Wu, and R. Zhang, “Accessing from the sky: A tutorial on UAV communications for 5G and beyond,” Proceedings of the IEEE, vol. 107, no. 12, pp. 2327–2375, 2019.
  • [3] M. M. Azari, F. Rosas, and S. Pollin, “Cellular connectivity for UAVs: Network modeling, performance analysis, and design guidelines,” IEEE Transactions on Wireless Communications, vol. 18, no. 7, pp. 3366–3381, 2019.
  • [4] M. M. Azari, G. Geraci, A. Garcia-Rodriguez, and S. Pollin, “UAV-to-UAV communications in cellular networks,” IEEE Transactions on Wireless Communications, 2020.
  • [5] R. Amer, W. Saad, and N. Marchetti, “Mobility in the sky: Performance and mobility analysis for cellular-connected UAVs,” IEEE Transactions on Communications, 2020.
  • [6] A. Fakhreddine, C. Bettstetter, S. Hayat, R. Muzaffar, and D. Emini, “Handover challenges for cellular-connected drones,” in Proceedings of the 5th Workshop on Micro Aerial Vehicle Networks, Systems, and Applications, 2019, pp. 9–14.
  • [7] H. Sallouha, M. M. Azari, and S. Pollin, “Energy-constrained UAV trajectory design for ground node localization,” in 2018 IEEE Global Communications Conference (GLOBECOM). IEEE, 2018, pp. 1–7.
  • [8] J. Hu, H. Zhang, L. Song, Z. Han, and H. V. Poor, “Reinforcement learning for a cellular internet of UAVs: Protocol design, trajectory control, and resource management,” IEEE Wireless Communications, vol. 27, no. 1, pp. 116–123, 2020.
  • [9] A. H. Arani, M. M. Azari, W. Melek, and S. Safavi-Naeini, “Learning in the sky: Towards efficient 3D placement of UAVs,” in IEEE PIMRC. IEEE, 2020, pp. 1–7.
  • [10] A. Azari, F. Ghavimi, M. Ozger, R. Jantti, and C. Cavdar, “Machine learning assisted handover and resource management for cellular connected drones,” arXiv preprint arXiv:2001.07937, 2020.
  • [11] Y. Chen, X. Lin, T. Khan, and M. Mozaffari, “Efficient drone mobility support using reinforcement learning,” arXiv preprint arXiv:1911.09715, 2019.
  • [12] M. M. U. Chowdhury, W. Saad, and I. Guvenc, “Mobility management for cellular-connected UAVs: A learning-based approach,” arXiv preprint arXiv:2002.01546, 2020.
  • [13] 3GPP Technical Report 36.777, “Technical specification group radio access network; Study on enhanced LTE support for aerial vehicles (Release 15),” Dec. 2017.
  • [14] 3GPP Technical Report 36.873, “Technical specification group radio access network; study on 3D channel model for LTE (Release 12),” Jan. 2018.