Wireless communication assisted by unmanned aerial vehicles (UAVs) is a promising technology to enhance the performance enhancement of the cellular networks. Specifically, UAVs are expected to be deployed in the fifth generation (5G) wireless networks [28, 14, 19] by serving as aerial base stations (BSs) or relays to boost the capacity and the coverage of the existing cellular networks [5, 6, 21, 25]
. The key reasons for the potential performance enhancement are their high mobility, efficient deployment, and high probabilities of establishing line-of-sight (LOS) connections towards ground users, which improves the quality of service (QoS). Thus, UAVs can be deployed to provide Internet coverage to rural areas or cell edges with weak signals, provide extra service capacity for temporary events (such as major sports events and outdoor activities), and restoring communications in emergencies.
In recent years, the industry has already started to implement UAV-assisted wireless networks. For example, Google has launched the Loon Project  with the intention to provide Internet access worldwide. To support the deployment of UAVs, the 3rd Generation Partnership Project (3GPP) has studied how the current cellular networks can accommodate UAVs, as well as how to deploy UAV BSs in a convenient and technically feasible manner. For example, its Report in  studied how well the existing Long Term Evolution (LTE) radio network can provide services to low-altitude UAVs and the provision of 5G new radio services from high-altitude platforms. All these efforts aim to combine UAVs and cellular technologies in a mutually beneficial manner in 5G wireless networks .
To efficiently provide communication services through UAVs, it is important to consider the trajectory design problem regarding where and when should the operators deploy the UAVs. Prior researches on UAV-enabled wireless communications have mainly focused on the UAVs’ trajectory design to improve different QoS requirements[33, 20, 38, 31, 22, 37, 34, 29]. The authors in  maximized the minimum throughput over ground users in the downlink communication by optimizing the user scheduling, power control and the UAV s trajectory.  investigated the optimal trajectory of UAVs equipped with multiple antennas for maximizing sum-rate in uplink communications. The work in  studied the throughput maximization problem in mobile relaying systems by optimizing the source/relay transmit power along with the relay trajectory. Moreover, the UAV s transmit power and trajectory were jointly optimized to maximize the minimum average throughput within a given time length in . Furthermore, the authors in  proposed a new cyclical multiple access scheme that the UAV flies cyclically above the ground, and characterized the max-min throughput by optimally allocating the transmission time to ground terminals based on the UAV position. In , a UAV was dispatched to disseminate a common file to a set of ground terminals (GTs) and the authors aimed to design the UAV trajectory to minimize its mission completion time, and  optimized the trajectory design to maximize the amount of energy transferred to all energy receivers during a finite charging period. The above studies mainly focus on communication performance improvement and the technical challenges that exist in trajectory design are energy limitation, interference mitigation, and user mobility.
Since UAVs consume a significant amount of energy to support their mobility, it motivated the design of the energy-efficient UAV communication via trajectory design in [39, 36, 40, 35]. The authors of  focused on the energy efficient maximization of a fixed-wing UAV enabled communication for given flight duration. The work in  minimizes the total rotary-wing UAV energy consumption while satisfying the individual target communication throughput requirement for multiple ground nodes. Moreover, the UAV worked as the mobile data collector has been studied in , which minimized the maximum energy consumption of all sensor nodes while ensuring that the required amount of data is collected reliably from each sensor node. The author in  further studied the trade off between UAV’s energy consumption and that of the ground terminals it communicating with.
Besides energy consumption, another important aspect in the trajectory design is related to the interference management. As UAVs have high probabilities to establish LOS connections with targets, which will increase the intended signal for the target and also the interference for others, like the ground BSs. To address this challenge, current researchers provide a number of techniques, such as the full dimension Multi-input Multi-output (MIMO) multiple antenna BSs, UAVs with directional antennas and beamforming capabilities and power control. The authors in  proposed a novel interference-aware path planning scheme for a multi-UAV network to minimize the interference they cause on the ground network.
into UAVs’ trajectory design. In practice, the users may move from one place to another at different times due to their requirements. For instance, the stadium may experience communication congestion if a concert holds on it, and if not, it may have fewer user demand. If we do not consider the mobility, we will not be able to estimate the demand accurately. As a result, we cannot deploy UAVs to the proper locations at the right times, which reduces the number of users that we can serve.
In this paper, we maximize the throughput of ground users in the downlink communication by optimizing the UAVs’ trajectories, while taking into account the impact of the user mobility, propulsion energy consumption, and UAVs’ mutual interference. We utilize the Mobility Markov Chains (MMC)[18, 17] to model the ground users’ movement during a period, so we can estimate the location-and-time dependent user demand. The throughput depends on the UAVs’ mutual interference. Both the user demand and throughput together define the reward function. Moreover, the UAV propulsion energy consumption, based on the results in the literature (e.g. ), is defined in the cost function. We formulate the problem as a route selection problem in an acyclic directed graph. Each vertex represents a task associated with a reward, and each edge is associated with a cost. First, we study the centralized trajectory design, we propose the shortest path scheme that determines the optimal trajectory for the single UAV case. We also propose the CRS scheme to systematically compute the optimal trajectories for the more general multiple-UAV case. Due to the NP-hardness of the centralized problem, we formulate the UAVs’ interactions as a route selection game. We prove that it is a potential game with the FIP, which guarantees our proposed DRS scheme will converge to a pure strategy Nash equilibrium within a finite number of iterations. To the best of our knowledge, this is the first work that brings the user mobility into the UAVs trajectory design, while taking both the propulsion energy consumption and interference mitigation into account.
We summarize the key results and contributions as follows:
A general model of UAV trajectory design: We present a general model of the UAVs’ trajectory design with the location-and-time dependent user demand, UAV mutual interference, and propulsion energy consumption.
Centralized optimal trajectory design: For the centralized trajectory design, we propose the shortest path scheme for the single UAV case and also the CRS scheme for the multiple-UAV case.
Distributed route selection algorithm: We formulate the UAVs’ trajectory design as the route selection game, which is a potential game with the FIP. This property guarantees that our proposed DRS scheme will converge to a pure strategy Nash equilibrium within a finite number of iterations.
The rest of this paper is organized as follows. Section II describes the system model. Section III presents the problem formulation and the optimal schemes for the centralized trajectory design. Section IV shows the DRS scheme for the distributed trajectory design. Simulation results are provided in Section V and Section VI concludes this paper.
Ii System Model
Ii-a UAVs, Regions, and Time Slots
As shown in Fig. 1, we consider a multiple-UAV assisted wireless communication system, where UAVs work as aerial BSs for providing Internet services to ground users located in regions. The region and UAV sets are denoted as and , respectively. We assume that all the UAVs share the same frequency band for communication over slotted time . Each UAV serves its associated users via multiple access techniques (e.g., TDMA, CDMA, or OFDMA) over a number of orthogonal channels, and it is connected to a nearby macro cell tower with a wireless backhaul link .
We utilize Mobility Markov Chains to characterize the region-and-time dependent user demand, which describes the number of users needed to be served. Based on the demand (will be discussed in Section II-C), reward and cost of each task, the UAVs have to decide when and where to provide wireless communication services in the time slots. Note that the UAVs start from the control station and need to go back the control station at the end of the serving duration of .
Ii-B Tasks Model
For each region-time point, we define a task. Let be the set of tasks, where . We map a region-time point to a task index by the function
We describe the characteristics related to a task as follows.
Definition 1 (Task characteristics)
Each task is associated with:
The region and time slot111Each task is generated at the beginning of the time slot . Note that the UAVs must fly to the region before the beginning of the time slot so that it can execute the task. Otherwise, the UAVs cannot work on the task. .
The reward for UAV completing task . The reward is related to other UAVs’ decisions (will be defined in Sec. II-E).
The UAV’s potential location222Each region is assigned a UAV potential location. When executing task , the UAVs can only stay in this location in region . , where represents the constant altitude of the UAVs. That is, each UAV can only be in the corresponding location while executing a task.
The user demand , which is the density of the users needed to be served in task (will be defined in Sec. II-C). We assume that the distribution of the users follows the two-dimensional Poisson Point Process (PPP) . The users are at the ground level (i.e., zero altitude), so a user’s location is denoted by , which is located within the bounds of region .
To account for the impact of the interference between UAVs on the trajectory design, we will define the reward of each task as the average user throughput subject to the UAVs’ mutual interference. First, we present the user demand and Air-to-Ground (A2G) channel model, on which the reward depends.
Ii-C User Demand
In practice, users change their locations over time, and they may belong to different regions. To properly model the user mobility, we apply the widely adopted Mobility Markov Chains (MMC) [18, 17] to estimate the number of the users needed to be served in the regions, which captures the user demand.
Specifically, we define the probability that a user leaves region for region as for . Without loss of generality, we further define the region outside as . The user mobility is given by a transmission probability matrix . Based on MMC, only the current user location is utilized to predict the next one. We suppose the initial number of users at is for all . Therefore, the expected number of users of region at time slot is
where the second term on the right hand side represents the number of users arriving at region , and the third one is the number of users leaving region . Based on the task model, we define the user demand as (in the unit of number of users per m), where is the area of region .
Ii-D Air-to-Ground (A2G) Channel Model
We assume that UAV is executing task . Based on the task model in Section II-B, the location of UAV is , and the user’s location in region is . We assume that each UAV has the same type of directional antenna with beamwidth , and the main beam covers the region directly beneath the UAV. This coverage cone has a radius333Note that this coverage cone can cover at least one region. . The interference beyond this coverage is negligible [15, 16].
As show in Fig. 2, when a user is in the coverage cone (), the path between the UAV and a user can be a LOS path or a non-LOS (NLOS) path. The LOS probability, which is related to the environment, the user’s and UAV’s location, and the elevation angle, is given by 
where and are constant values determined by the type of environment, and is the elevation angle. More specifically, , where is the Euclidean distance between UAV and the user located in region . Due to the coverage of the directional antenna, it will not generate interference to neighbouring regions that satisfy . The NLOS probability is . The average channel gain  between the UAV and the user is
where , is the carrier frequency, is the speed of light, and is the path loss exponent of the link between the UAV and the user. Besides, and () are the excessive path loss coefficients in LOS and NLOS cases.
As all the UAVs share the same frequency band over each time slot, the user may receive the interference from other UAVs. To avoid severe interference, we assume that each region can only be served by one UAV per time slot. Let be the state of UAV , where means that UAV is at location executing task , while represents that UAV
is moving to other regions and not executing any task at this moment. Moreover, we assume that the UAV’s transmission power is if it is executing a task (i.e., ) and if the UAV is moving (i.e., ), and will not interfere other communication links. As a result, the received transmission rate of a user located in region from UAV at time is
where is the power of the additive white Gaussian noise. In the next subsection, we will define the reward by averaging over the position of a random user in region .
Ii-E Reward Function
Based on (5), we define the reward as the average downlink throughput for all the users in the region. By the PPP assumption, the user locations are independent and identically distributed, so the average throughput can be computed by averaging over the region . Thus, the reward of task served by UAV is
where denotes the coefficient of the reward, and represents the fixed bandwidth that each user is allocated under FDMA. Moreover, is the region corresponding to task . is a function that defines the distance between the boundary and the center of the region. For example, if we consider a hexagonal topology as in Fig.1, it is given by
where is the horizontal angle, and is the side length of each regular hexagon.
Ii-F Cost Function
The cost function takes into account both the energy consumption during flying and hovering444Since the communication energy (in the order of a few watts) is usually much smaller than the propulsion energy (in the order of hundreds of watts), we ignore the former in this paper.. First, we define the propulsion power consumption of the rotary-wing UAVs555For the ease of exposition, we assume that all the UAVs adopt the same power consumption model. However, our proposed algorithms in Section III and IV can be easily extended to the more general case with heterogeneous power consumption models. with speed by 
where , , and are three parameters related to air density and physical properties of the rotor (e.g. rotor solidity and rotor disc area), denotes the tip speed of the rotor blade, and is the mean rotor induced velocity in hovering.
When UAV flies at a constant speed , the power consumption of movement is . On the other hand, the UAV is hovering and executing a task, it is quasi-stationary when executing a task, so the power consumption of hovering is .
Assume that UAV aims to perform task after completing task . The flying distance between these two tasks can be expressed as . Based on the constant speed , the flying duration is . We assume that the length of each time slot is sufficiently small such that the flying duration can be quantized into several time slots, which the length of one time slot is . The interval between these two tasks is . Therefore, task can be served by UAV only when . For ease of illustration, we define the moving time as
By assuming that each task requires one time slot to complete, the hovering time is
Overall, we define the cost of UAV to be the sum of the moving cost and the hovering cost as
where represent the moving and hovering cost per unit watt of the UAV.
Iii Centralized Trajectory Optimization
In this section, we focus on the UAVs’ trajectory design to maximize their total payoff. In Section III-A, we describe the graph representation of the trajectory design problem. In Section III-B, we study the single UAV trajectory design. We convert the graph so that we can apply the shortest path (SP) algorithm to compute the optimal trajectory. Based on the SP scheme, we construct a new graph in Section III-C, and propose the CRS scheme for multiple UAVs’ trajectory design.
Iii-a Graph Representation of Trajectory Design
Based on the task characteristics in Section II-B, we define the graph associated with regions and time slots as follows for each UAV .
Definition 2 (Graph representation)
Let be a graph. We define the set of vertices and the set of edges as follows. The vertex set contains all the region-time points
Each region-time point is associated with a reward defined in (6). The edge set contains the feasible transitions of UAV between any two different region-time points as
Next, we formulate UAV ’s trajectory design as the region-time route selection problem in graph . We define the feasible routes for UAV as follows.
Definition 3 (Feasible route)
Based on graph , feasible route of UAV is
where the vertex , and the edge . For the first region-time point , represents the source and . For the last region-time point , represents the destination and .
We show an example in Fig. 3. We can see that the edge is not feasible, because the UAV cannot arrive in region before the beginning of the second time slot. Besides, we have two feasible routes for UAV and UAV , which are and , respectively. In this example, the source and destination are both in the same666Note that the scheme we propose in Sections III-B, III-C and IV are suitable for an arbitrary source and destination. region (e.g. region ).
We define and as the set of vertices and edges traversed by the route . For example, the set of vertices of route is , and the set of edges is . The strategy profile is the strategies of all the UAVs. Therefore, the payoff (i.e., total rewards minus total costs) that UAV gets for choosing route in a strategy profile is to
The goal of the centralized trajectory optimization is to maximize the UAVs’ total payoff as
which is an NP-hard problem. We first propose the shortest path (SP) scheme for the single UAV trajectory design in Section III-B, which serves as the basis for the more general multiple UAVs’ optimal trajectory design in Section III-C.
Iii-B Single UAV Trajectory Design
For the single UAV trajectory design (i.e., ), there is no interference caused by other UAVs. According to (6) and (11), the reward and the cost can be pre-determined. Based on graph in Fig. 3, we suppose the red and blue solid line represent two feasible routes for the single UAV. Since each vertex is associated with a reward and each edge is associated with a cost, graph is not a standard acyclic directed graph777A standard acyclic directed graph is formed by a collection of vertices and edges, where the vertices are structureless objects that are connected in pairs by edges. Each edge has an orientation, from one vertex to another vertex, and is associated with a weight. However, each vertex is not associated with any weight .. To handle this problem, we propose the shortest path (SP) scheme as shown in Scheme 1.
We first convert it into a new graph , and apply the Bellman-Ford algorithm888We use Bellman-Ford algorithm, because it can compute the shortest path on a graph with both positive and negative edge weight. We can use the Dijkstra Algorithm for our problem with the lower time complexity, but it needs one more step conversion. to find the optimal route in problem (16), which takes the following steps.
The weight of edges: The newly defined edges incorporate both the rewards and the costs, where the weight of an edge in graph as
In this way, we will not associate any rewards with the vertices.
Add a virtual vertex and a virtual edge to graph : The UAV sets off from the control station. We have the reward of the starting point defined in (14). We use to represent the virtual vertex, and add a virtual edge to connect it with the starting point (see the third subfigure in Fig. 4). Therefore, the weight of this virtual edge is . We use to represent this virtual edge. Fig. 4 illustrates how these two steps convert graph into graph .
Conversion from cost minimization to payoff maximization: The minimal cost computed by the Bellman-Ford algorithm is opposite to the maximum payoff. That is,
The UAV can determine its optimal route within time.
The Bellman-Ford Algorithm has a time complexity , where and represent the number of vertices and edges, respectively. For graph , we have and . Therefore, the SP scheme has a time complexity .
Iii-C Multiple UAVs’ Trajectory Design
For the multiple UAVs’ trajectory design (i.e., ), the reward of each task cannot be pre-determined, because it is related to the states of all the UAVs. Therefore, we cannot obtain graph directly in this scenario.
be the set of all possible state vectors, whererepresents the total number of state vectors . Similar to the discussion in Section III-B, we will define the vertex set, vertex reward, edge set, and edge cost of the new graph and compute the optimal trajectories.
Iii-C1 Vertex set and rewards of the new graph
Each vertex is represented by a state-time point , and we define the vertex set as
The vertex reward of vertex is the summation of each UAV’s reward in (6):
Iii-C2 Edge set and costs of the new graph
Each edge indicates changes in the UAVs’ states. Before calculating the associated cost, we first analyze the condition for a feasible edge. For each UAV, we need to record the starting region and the number of time slots it has been moving, so that we can decide whether the edge exists or not. More specifically, we define a mobility vector for each vertex as , which decides whether the UAVs are moving at time slot . The element of mobility vector is
where is the indicator function. We further define the state vector for each vertex , which records the starting region for each UAV. Moreover, we define the updating mobility vector for to record how many time slots that each UAV is moving. We consider that is an edge connected vertex and . We update the state vector and the updating mobility vector for vertex as
which records how many time slots each UAV is moving until the current time. According to Section II-F, the condition for the feasible edges is
It indicates that each UAV has to arrive in the corresponding regions of state at the beginning of time slot .
We define the set of edges as
Based on the aforementioned discussion, some UAVs’ states may be equal to , which means they are moving for one time slot. On the other hand, some UAVs may be hovering for one time slot to serve the corresponding regions. Therefore, the edge cost for is
In the pink solid route in Fig. 5, UAV and UAV are serving the same region at . The state vector is , and the updating mobility vector is because both UAVs are providing services. At , UAV is moving, and UAV is still serving region . Therefore, the state vector, which records the starting region for all the UAVs, is . The updating mobility vector is , which records UAV is moving for one time slot. Based on the movement speed and distance, we have that UAV can arrive at region after flying for one time slot. According to (24), the second edge is feasible. Accordingly, the mobility vector and state vector for each vertex in the route can be computed. Based on these two vectors, we can be computed the pre-determined cost for each feasible edge by (26). In graph , each vertex is associated with a reward in (20), and each edge is associated with a cost in (26). Furthermore, the route of UAV and UAV are and .
Iii-C3 Feasible route in graph
Based on graph , the feasible route of all the UAVs is
where represents the feasible route set for the UAVs in graph . Also, we have the vertex , and the edge . For the first state-time point , represents the sources of all the UAVs. For the last state-time point , represents the destinations of all the UAVs.
Iii-C4 Multiple UAVs’ optimal trajectories
We define and as the set of vertices and edges traversed by the route . Therefore, the payoff that all the UAVs get for choosing route is equal to
so that we can convert the original problem (16) into
In the graph , with vertex set in (19), vertex reward in (20), edge set in (25), and edge cost in (26) as input, we can adopts the ideas in the SP scheme in Algorithm 1 to compute the optimal trajectories for multiple UAVs.
Based on the SP scheme, multiple UAVs can determine their optimal route within time.
The idea is similar to the proof of Proposition 1 with and in graph .
Multiple UAVs’ centralized trajectory design is an NP-hard problem, which is computationally complex to solve for large networks. Besides, it requires a central entity with the full knowledge of the current state of the network and the ability to communicate with all UAVs at all time. However, this might not be feasible in case the UAVs belong to different operators or in scenarios in which the environment changes dynamically. Therefore, we next adopt a distributed trajectory design in which each UAV decides autonomously on its trajectory.
Iv Distributed Route Selection Game
In this section, we first formulate the UAVs’ trajectory design problem as the Route Selection Game (RSG), and then propose the distributed route selection algorithm (DRS). Our objective is to develop a distributed approach that allows each UAV to choose its trajectory in an autonomous manner.
Iv-a Non-cooperative Route Selection Game
To avoid multiple UAVs from serving in the same region, which is inefficient due to their severe mutual interference, we define the reward of task as
where is the number of UAVs serving task . The first line is the reward of task served by UAV in (6) when there is no interference caused by other UAVs. Since each task can only be served by one UAV, we define that the reward of task will be if multiple UAVs serve the same task. According to (15), the payoff of UAV with route is
where is the number of UAVs providing services for task based on the current route .
Based on the graph representation in Section III-A, we formulate the multiple UAV trajectory design as a RSG, where UAVs act as players to choose the available region-time routes.
Definition 4 (Route selection game)
Let be the strategies of all the UAVs except UAV . For the RSG, we are interested in whether the UAVs can reach an equilibrium strategy profile , in which no UAV can further increase its own payoff by changing its strategy, i.e., a pure Nash equilibrium of the game .
Definition 5 (Pure Nash equilibrium)
A Nash equilibrium (NE)  of the game is a strategy profile such that
Definition 6 (Best response update)
Given a strategy profile , we say that strategy is a best response update for UAV if it maximizes UAV ’s payoff. That is, .
Definition 7 (Finite improvement property)
A game possesses the finite improvement property (FIP)  when each UAV’s best response update always converge to a pure NE within a finite number of steps, irrespective of the initial strategy profile or the updating order of the UAVs.
Every route selection game possesses the FIP.
The proof of Theorem 1 is given in Appendix A.
Iv-B Distributed Route Selection Algorithm
Based on the FIP, we propose a distributed route selection algorithm (DRS) as shown in Algorithm 2, and which consists of the following steps.
Initialization: According to Definition 2 in Section III-A, we prepare the reward set and the cost set (line ). Since each UAV has different speed , we calculate the edge cost in (11) for all the UAVs (line ). To obtain the vertex reward in the next step, we need to find the feasible route set for each UAV (line ), and initialize the routes for all the UAVs (line ).
Best response update iteration process: At the iteration process, each UAV intends to maximize its payoff by choosing an optimal route. For UAV , we first calculate the number of UAVs , who are serving task (line ). Based on , we calculate the vertex reward (line ). Then, we compute the payoff under current strategy profile and update (line ). Next, UAV needs to find the optimal route based on the current strategy of the others. We update the vertex reward and run the SP algorithm to find the optimal route (line -), so each UAV can compute its best response update within time from Proposition 1. If the new payoff has improved (line ), the strategy of UAV will be updated (line ).
Output: This process repeats until the strategy profile does not change any more, so it is a pure NE.
V Performance Evaluations
In this section, we present the numerical results to evaluate the performance of our CRS and DRS schemes. We first describe the basic simulation setting in Section V-A. Then, we discuss the greedy path (GP) scheme and circular path (CP) scheme in Section V-B, which serve as the benchmark schemes. We compare our proposed DRS scheme with these benchmark schemes in Section V-C.
V-a Simulation Setting
In our simulations, we consider the network topology as shown in Fig. 1. multiple UAVs provide services for seamlessly connected regular hexagonal regions with m side length. We consider time slots. Moreover, we suppose UAV potential location is the center of each region. The UAV control station is in region (see Fig. 1). According to PPP, ground users are independent and identically distributed in regions, and the user transmission probability matrix in Section II-C has been provided. The ground users change their location over duration . We consider the rotary-wing UAVs, so that the propulsion energy consumption is given by (8). Note that we assume all the UAVs have the same transmission power . For each set of parameters, we run the simulations times with randomized UAVs’ sources and destinations, and user demand in MATLAB. Other simulation parameters are listed in Table I.