## I Introduction

Owing to emerging service types and ongoing advances in unmanned aerial vehicle (UAV) technologies, there is a growing consensus on integrating UAVs into cellular networks. UAVs are expected to play a prominent role in traffic offloading, capacity enhancement, and disaster recovery [19]. From an economic perspective, deploying small cell base stations (BSs) and/or advanced fifth generation (5G) components, such as massive multiple-input multiple-output (MIMO), may not be cost-effective for temporary events. In this regard, UAV deployment can be considered an alternative or complementary solution. UAVs can establish line-of-sight (LoS) communication links to ground users with high probability, resulting in increased coverage, enhanced reliability, and agility [12].

The mobility of UAVs and their flexibility in adjusting their locations significantly affect the LoS probability and the network performance. Most research efforts have addressed this issue from a non-learning perspective [22, 12, 8]. Recently, however, growing attention has been devoted to learning algorithms for the UAV deployment problem [4, 15, 18, 16]. In [4] and [15], learning based approaches were proposed to find the two-dimensional (2D) trajectories of UAVs flying at fixed altitudes. In [18], a learning based approach for the three-dimensional (3D) placement of a single UAV is developed to maximize network throughput. However, these works do not address the 3D deployment of multiple UAVs integrated into existing terrestrial networks. In [16], 3D placement is treated as two separate optimization problems, one in the horizontal 2D plane and one over the altitude of the UAVs, and prior knowledge of the users' locations is required. Providing this information to UAVs in real time can be challenging. Moreover, the algorithm in [16] cannot adapt to changes in the environment. Finally, to the best of our knowledge, the channel models in previous learning based studies do not capture the dependence of the path loss exponents on UAV height, which may significantly influence UAV performance [1, 12].

In this paper, we address the optimal 3D placement of multiple UAVs that assist an existing terrestrial cellular network. We formulate the optimization problem using a novel framework based on a noncooperative game in satisfaction form, implemented at the level of the UAVs. Furthermore, in contrast to related works, we take into account the effect of network load, which represents the BSs' capability to serve users. We then leverage a learning based approach to develop a low-complexity and robust algorithm that allows UAVs to autonomously adjust and optimize their locations in dynamic environments. Finally, to evaluate the proposed algorithm, we employ a third generation partnership project (3GPP)-based height-dependent channel model. Simulation results show that the proposed learning based approach significantly improves network performance and reduces the number of UAVs required to reach a given target gain.

The rest of this paper is organized as follows. In Section II, we describe the system model. Section III presents the proposed algorithm for the 3D deployment of UAVs in the network. In Section IV, we evaluate the performance of the proposed approach. Finally, Section V concludes the paper.

## II System Model

In this section, we describe the system model, including network topology, channel model, and user association method.

Network Topology: we consider the downlink of a cellular network consisting of a set of terrestrial BSs , and a set of UAVs as aerial BSs to support the users in a particular area . The set of total users and the set of users associated to BS are denoted by and , respectively, where indicates the total BSs in the system. We assume that all the BSs transmit over the same channel, i.e. co-channel deployment. Let be the location of BS at time , where and are the location of BS in the horizontal dimension and its altitude at time , respectively.

Radio Propagation and Signal Quality: we assume that the link between each user and each BS comprises LoS and non-LoS propagation conditions. Let denote the probability of having a LoS link between user and BS at time which is determined as follows [1]:

(1) |

where , and represent statistical environment-dependent parameters. Here, and are the altitude of BS and the altitude of user , respectively. The horizontal location of user , and its horizontal distance to BS at time are denoted by and , respectively. Consequently, the non-LoS probability can be determined as .
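As an illustration of a height-dependent LoS probability with three environment parameters, the following minimal sketch implements the building-blockage expression of ITU-R P.1410. Whether equation (1) has exactly this form is an assumption, and the function name, argument names, and default parameter values (`alpha=0.3`, `beta=500.0`, `gamma=15.0`, often quoted for urban areas) are illustrative choices rather than values taken from this paper.

```python
import math


def los_probability(h_bs, h_ue, r_horizontal_m, alpha=0.3, beta=500.0, gamma=15.0):
    """LoS probability between a BS and a user (assumed ITU-R P.1410 form).

    h_bs, h_ue      : altitudes of the BS and the user in meters
    r_horizontal_m  : horizontal BS-user distance in meters
    alpha, beta, gamma : statistical environment-dependent parameters
    """
    # Index of the last building that the ray may cross (distance in km).
    m = math.floor(r_horizontal_m / 1000.0 * math.sqrt(alpha * beta) - 1)
    p = 1.0
    for n in range(m + 1):  # empty loop when m == -1, giving p = 1
        # Height of the ray above ground at the n-th building.
        ray_height = h_bs - (n + 0.5) * (h_bs - h_ue) / (m + 1)
        # Probability that the n-th building is lower than the ray.
        p *= 1.0 - math.exp(-(ray_height ** 2) / (2.0 * gamma ** 2))
    return p


if __name__ == "__main__":
    for altitude in (50.0, 150.0, 300.0):
        print(altitude, round(los_probability(altitude, 1.5, 500.0), 3))
```

Under these assumptions, raising the BS at a fixed horizontal distance increases the LoS probability, and the non-LoS probability is simply one minus the returned value.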

Let denote the 3D distance between BS and user at time . The path loss between BS and user can be expressed as [1]:

(2) |

where superscript denotes LoS and non-LoS components on the link. Here, and are the reference path loss and the path loss exponent, respectively. Therefore, the signal to interference plus noise ratio (SINR) experienced by user can be formulated as:

(3) |

where and are the transmit power of BS and the channel gain between BS and user , respectively. Parameter represents the additive white Gaussian noise (AWGN) power.

The achievable data rate provided by BS to user using Shannon’s capacity formula is given by:

(4) |

where is the total bandwidth.
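The chain from path loss to SINR to rate can be sketched as follows. The log-distance path loss form, the probability-weighted averaging of the LoS and non-LoS losses, and all function and argument names are assumptions made for illustration; only the SINR structure (serving power over co-channel interference plus noise) and the Shannon rate follow directly from the description above.

```python
import math


def path_loss_db(d_3d_m, ref_loss_db, exponent):
    """Log-distance path loss in dB for one propagation condition (assumed form)."""
    return ref_loss_db + 10.0 * exponent * math.log10(d_3d_m)


def mean_path_loss_db(p_los, loss_los_db, loss_nlos_db):
    """Average of LoS and non-LoS losses weighted by the LoS probability (assumption)."""
    return p_los * loss_los_db + (1.0 - p_los) * loss_nlos_db


def dbm_to_watt(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def sinr(serving, rx_power_w, noise_w):
    """SINR of a user served by BS index `serving` under co-channel deployment.

    rx_power_w[b] is the power received from BS b (transmit power times channel gain).
    """
    interference = sum(p for b, p in enumerate(rx_power_w) if b != serving)
    return rx_power_w[serving] / (interference + noise_w)


def shannon_rate_bps(bandwidth_hz, sinr_linear):
    """Achievable rate from Shannon's capacity formula."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


if __name__ == "__main__":
    # Hypothetical numbers: two BSs, one user.
    losses = [path_loss_db(200.0, 30.0, 2.2), path_loss_db(400.0, 35.0, 3.0)]
    rx = [dbm_to_watt(30.0 - loss) for loss in losses]
    gamma = sinr(0, rx, noise_w=1e-13)
    print(shannon_rate_bps(10e6, gamma))
```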

User Association Policy: we utilize a user association policy capturing both received signal power and load (which represents the BSs capabilities for serving new users). Let denote the traffic influx rate of user . The fraction of time BS requires to serve the traffic to the location of user is defined as . Therefore, the load of BS at time is given by [5].

To associate the users to the BSs, we assume that each BS broadcasts its estimated load, , at time which is obtained as follows [5]:

(5) |

where is the learning rate of the load estimation for BS . Then, each user selects its serving BS based on the received signal power and the load of the BSs as follows [5]:

(6) |
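A minimal sketch of a load-aware association step is given below. It assumes that the per-user time fraction is the ratio of traffic influx rate to achievable rate, that the load estimate is a first-order smoothing of the measured load, and that users rank BSs by received power scaled by the estimated spare capacity. The exact forms of (5) and (6) are not reproduced here, so the scoring rule in `select_bs` and all names are hypothetical.

```python
def bs_load(influx_bps, rates_bps):
    """Load of a BS: sum over its users of traffic influx rate / achievable rate."""
    return sum(lam / rate for lam, rate in zip(influx_bps, rates_bps))


def update_load_estimate(prev_estimate, measured_load, learning_rate):
    """First-order smoothing of the broadcast load estimate (assumed form)."""
    return prev_estimate + learning_rate * (measured_load - prev_estimate)


def select_bs(rx_power_w, est_load):
    """Pick the BS with the best received power scaled by spare capacity.

    The score p * (1 - load) is a hypothetical stand-in for the paper's rule.
    """
    scores = [p * max(0.0, 1.0 - rho) for p, rho in zip(rx_power_w, est_load)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    load = bs_load([1e5, 2e5], [1e6, 1e6])
    estimate = update_load_estimate(0.5, load, 0.1)
    print(load, estimate, select_bs([1.0, 0.8], [0.9, 0.1]))
```

With such a rule, a user close to a heavily loaded BS can still be steered to a slightly weaker but lightly loaded one, which is the offloading behavior the association policy is meant to capture.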

## III Learning Based Placement Algorithm

In this section, we aim to optimize the 3D locations of the UAVs to maximize throughput without global network information. Finding the optimum 3D locations of the UAVs is complex, mainly because of the mobility of the UAVs and temporal traffic statistics. Therefore, we leverage machine learning tools to solve the problem. In this regard, for each UAV , a utility function is defined as follows:

(7) |

where and are the weight parameters that indicate, respectively, the impact of throughput and of the activation function on the utility, and is a normalization parameter. Here, is an activation function that ensures the safety of UAVs by avoiding collisions, which is defined as follows [4]:

(8) |

where and represent the 3D distance between UAV and UAV , and a certain minimum distance, respectively. According to (8), if the distance between UAV and UAV is less than a minimum distance , the function returns value one which is considered as a cost in the utility function defined in (7). Let denote the locations of the UAVs at time . Therefore, the optimization problem is formulated as follows:

(9a) | ||||

s.t. | (9b) | |||

(9c) | ||||

(9d) |

where is the total working time of the UAVs. The parameters and denote the minimum and maximum altitude of the UAVs, respectively. For the optimization problem (9), the constraints in (9b)-(9c) define the feasible 3D space for the locations of the UAVs. The constraint in (9d) corresponds to the definition of load, in which it avoids outages and ensures service for the users in the network.
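The utility and the constraints can be illustrated with the sketch below. It assumes that the utility is a weighted, normalized throughput minus a weighted collision cost, that the service area is approximated by a disc, and that the load constraint takes the form of a load not exceeding one; these forms, the weight names `w_rate` and `w_safety`, and the helper names are assumptions rather than the paper's exact definitions.

```python
import math


def collision_indicator(own_pos, other_positions, d_min):
    """Return 1.0 if any other UAV is closer than d_min (3D distance), else 0.0."""
    too_close = any(math.dist(own_pos, p) < d_min for p in other_positions)
    return 1.0 if too_close else 0.0


def utility(throughput_bps, norm_bps, own_pos, other_positions, d_min,
            w_rate=1.0, w_safety=1.0):
    """Weighted normalized throughput minus the collision cost (assumed form)."""
    cost = collision_indicator(own_pos, other_positions, d_min)
    return w_rate * throughput_bps / norm_bps - w_safety * cost


def is_feasible(pos, radius_m, h_min, h_max, load):
    """Check the feasible 3D region and the load constraint.

    The area is approximated by a disc of radius radius_m (simplification),
    and the load constraint is assumed to be load <= 1.
    """
    x, y, h = pos
    inside_area = math.hypot(x, y) <= radius_m
    inside_altitude = h_min <= h <= h_max
    return inside_area and inside_altitude and load <= 1.0


if __name__ == "__main__":
    me, others = (0.0, 0.0, 100.0), [(500.0, 0.0, 100.0)]
    print(utility(5e6, 1e7, me, others, d_min=10.0))
    print(is_feasible(me, radius_m=250.0, h_min=50.0, h_max=300.0, load=0.8))
```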

The problem of 3D deployment of the UAVs can be formulated as a noncooperative game in satisfaction form. In a satisfaction-form game, each UAV is interested in the satisfaction of its constraints. A game in satisfaction form can be described by the where the set of the UAVs, , is considered as the set of players in the game. The set denotes the set of strategies for UAV . We define the set as the movement in different directions: . Let be the strategies of all UAVs except UAV . Here, the correspondence denotes the set of strategies that can satisfy the constraints of UAV given the strategies played by all other UAVs. Therefore, the correspondence can be defined as follows [5]:

(10) |

where is the satisfaction threshold for UAV . According to the observed utility, each UAV updates a satisfaction indicator at time as follows:

(11) |

where is the strategy of UAV at time . For a game in satisfaction form, an important outcome is called satisfaction equilibrium where all players are satisfied, and is not empty for each player . The notion of satisfaction equilibrium can be formulated as a fixed point as follows [5]:

Definition 1 (Satisfaction Equilibrium): A strategy profile is a satisfaction equilibrium if , .

However, for a given satisfaction threshold, some UAVs may face situations in which they cannot be satisfied, so a satisfaction equilibrium may not always exist. In this case, the satisfaction threshold of unsatisfied UAVs can be reduced after a certain time interval, for which we use the adaptive threshold approach described in [5]. To solve the game in a distributed manner, we use a learning algorithm and assume that each UAV can observe its obtained utility. If the UAV is satisfied with its current utility (i.e. ), the UAV has no incentive to change its location. Otherwise, it may change its location according to the probability distribution , where is the probability assigned to strategy with . Therefore, each UAV updates the probability assigned to each strategy as follows [5]:

(12) |

Here,

(13) |

where is the learning rate of UAV . The parameter is computed as , where is the maximum utility that UAV can achieve. The function denotes the indicator function, which equals 1 if event is true and 0 otherwise.
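The update can be sketched as follows, using the form commonly adopted for learning in satisfaction-form games: a satisfied player keeps its distribution, while an unsatisfied player reinforces the strategy it just played in proportion to a scaled utility. The scaling `(u_max + utility - threshold) / (2 * u_max)` and all names are assumptions made for illustration and may differ from the exact expressions in (12) and (13).

```python
import random


def update_strategy_probs(probs, played, utility, threshold, u_max, learning_rate):
    """One probability update for a single UAV (assumed form).

    probs         : current probabilities over strategies (sums to 1)
    played        : index of the strategy played in this step
    utility       : observed utility
    threshold     : satisfaction threshold
    u_max         : maximum achievable utility
    learning_rate : step size of the update
    """
    if utility >= threshold:
        return list(probs)  # satisfied: no incentive to change
    # Scaled utility used as reinforcement weight for the played strategy.
    b = (u_max + utility - threshold) / (2.0 * u_max)
    return [
        p + learning_rate * b * ((1.0 if n == played else 0.0) - p)
        for n, p in enumerate(probs)
    ]


def sample_strategy(probs, rng=random):
    """Draw a strategy index according to the current distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = [0.25, 0.25, 0.25, 0.25]
    probs = update_strategy_probs(probs, played=1, utility=0.2,
                                  threshold=0.6, u_max=1.0, learning_rate=0.5)
    print(probs, sample_strategy(probs))
```

Because the increment added to the played strategy equals the total mass removed from the others, the updated vector remains a probability distribution.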

The pseudocode for the proposed approach is presented in Algorithm 1. The algorithm converges to an equilibrium of the game in finite time, as proved in [5, Theorem 2].

Note that the proposed approach offers several distinct advantages. First, since only unsatisfied UAVs need to update their locations, it reduces the complexity, especially for deploying a multitude of UAVs. Second, due to the fact that our proposed algorithm can be executed in a distributed manner, the signaling overhead and exchanged information are expected to be negligible. Furthermore, it can gain the intrinsic advantage of distributed approaches such as improving network’s robustness against failures and attacks.

## IV Simulation Results

To evaluate the performance of our proposed approach, we consider a hexagonal layout with radius m. The users are uniformly distributed in the area, and one terrestrial BS is located at the center of the area. The simulation parameters are summarized in Table I. Furthermore, we demonstrate the performance gain of the proposed learning based scheme over the following benchmarks: 1) The strategic horizontal method proposed in [11] with fixed altitude m given predefined potential horizontal locations, referred to hereinafter as the “strategic” approach. This is a heuristic approach in which the horizontal location of a new UAV is chosen from the predefined horizontal locations so as to be furthest from the BSs in the system. 2) Random horizontal placement with fixed altitude m, referred to hereinafter as the “random-fixed altitude” placement algorithm. 3) Random horizontal and altitude placement given and predefined potential horizontal and altitude locations, respectively, referred to hereinafter as the “random” approach.

Table I: System Parameters

| Parameter | Value |
| --- | --- |
| Carrier frequency / Channel bandwidth | GHz / MHz |
| Noise power spectral density | dBm/Hz |
| | Kbps |
| Learning rate exponent for | |
| | m |
| | m, m |
| Altitude of users | m |
| Altitude of terrestrial BS | m |

BS Parameters

| Parameter | Terrestrial BS | UAV |
| --- | --- | --- |
| Transmit power | dBm | dBm |
| Reference path loss ( in GHz) | LoS: ; NLoS: | LoS: ; NLoS: |
| Path loss exponent | LoS: ; NLoS: | LoS: ; NLoS: |

Fig. 1 shows the average throughput per BS versus the number of users for a network with UAVs. The figure shows that the learning based UAV placement approach significantly improves the average throughput compared to the benchmark algorithms. For instance, for a network with users, the proposed approach improves the average throughput , , and compared to the strategic, random-fixed altitude, and random approaches, respectively. The main reason is that the UAVs adjust their locations in order to maximize the utility function defined in (7) which comprises their throughput.

Fig. 2 illustrates the average rate per user. As the number of users increases, the average rate per user decreases because of the increasing load and the limited resources available in the network. Since the learning based approach optimizes the locations of the UAVs to maximize throughput, it improves the average user rate compared to the benchmark algorithms.

In Fig. 3, we show the average number of dropped users per BS. The average number of dropped users increases with the number of users, mainly because limited resources are available in the network: as the number of users grows, some of them experience rate reductions due to overloaded BSs. Since the proposed approach improves the average rate compared to the other approaches, it also yields fewer dropped users on average. The performance gain of the learning based approach compared to the strategic, random-fixed altitude, and random approaches is up to , , and , respectively. Accordingly, the learning approach is more resilient to higher traffic demand.

Fig. 4 depicts the average number of dropped users as the number of UAVs varies. As the number of UAVs increases, the average number of dropped users per BS decreases, mainly because users associated with highly loaded BSs can be offloaded to lightly loaded BSs. Moreover, Fig. 4 reveals that with the learning based approach, significantly fewer UAVs are required to bring the number of users in outage below a certain threshold. For instance, with the learning based method only UAVs are required to keep fewer than users in outage, which is notably lower than the UAVs needed by the other methods.

## V Conclusion

In this paper, we have proposed a low-complexity, robust algorithm for the 3D placement of UAVs integrated into terrestrial cellular networks. The proposed approach leverages tools from game theory and machine learning to learn the optimal locations of UAVs in a distributed manner. Our results have shown that the proposed approach significantly outperforms the benchmark algorithms in terms of both throughput and dropped users. We have also shown that the learning based approach requires significantly fewer UAVs to serve ground users than the benchmark algorithms, resulting in more cost-effective UAV-assisted networking.
