Reinforcement Learning-Based Trajectory Design for the Drone Base Stations
In this paper, the trajectory optimization problem for a multi-unmanned aerial vehicle (UAV) communication network is investigated. The objective is to find the UAV trajectories that maximize the sum-rate of the users served by each UAV. Reaching this goal requires not only optimal trajectory design but also optimal power and sub-channel allocation, so that users are supported with the highest possible data rates. To solve this complicated problem, we divide it into two sub-problems: a UAV trajectory optimization sub-problem and a joint power and sub-channel assignment sub-problem. Then, based on the Q-learning method, we develop a distributed algorithm that solves these sub-problems efficiently and does not require a significant amount of information exchange between the UAVs and the core network. Simulation results show that, although Q-learning is a model-free reinforcement learning technique, it can effectively train the UAVs to optimize their trajectories based on the received reward signals, which carry useful information about the topology of the network.
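To make the Q-learning idea concrete, the following is a minimal sketch of tabular Q-learning for a single UAV on a small grid. It is not the distributed multi-UAV algorithm of the paper, and it omits power and sub-channel allocation entirely. The grid size, user positions, altitude, the toy reward `sum_rate` (with its SNR constant), and all hyperparameters are assumptions chosen only for illustration.

```python
import math
import random

GRID = 5                          # hypothetical 5x5 grid of UAV positions
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # four moves plus hover
USERS = [(4, 4), (3, 4), (4, 3)]  # hypothetical ground-user cells
ALTITUDE = 1.0                    # hypothetical UAV altitude, in grid units


def sum_rate(pos):
    """Toy reward (assumption): sum of log2(1 + SNR), SNR ~ 10 / distance^2."""
    total = 0.0
    for ux, uy in USERS:
        d2 = (pos[0] - ux) ** 2 + (pos[1] - uy) ** 2 + ALTITUDE ** 2
        total += math.log2(1.0 + 10.0 / d2)
    return total


def step(pos, action):
    """Move the UAV one cell, clipping at the grid boundary."""
    x = min(max(pos[0] + action[0], 0), GRID - 1)
    y = min(max(pos[1] + action[1], 0), GRID - 1)
    return (x, y)


def q_update(q, state, a, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one transition."""
    best_next = max(q[next_state])
    q[state][a] += alpha * (reward + gamma * best_next - q[state][a])


def train(episodes=500, horizon=20, epsilon=0.2, seed=0):
    """Epsilon-greedy training; the reward is the only feedback the UAV sees."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(GRID) for y in range(GRID)}
    for _ in range(episodes):
        pos = (0, 0)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])  # exploit
            nxt = step(pos, ACTIONS[a])
            q_update(q, pos, a, sum_rate(nxt), nxt)
            pos = nxt
    return q


def greedy_trajectory(q, start=(0, 0), horizon=20):
    """Follow the learned greedy policy from a start cell."""
    pos, path = start, [start]
    for _ in range(horizon):
        a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
        pos = step(pos, ACTIONS[a])
        path.append(pos)
    return path
```

Calling `greedy_trajectory(train())` returns the list of grid cells visited under the learned policy. The agent never sees the user positions directly; it only observes the reward, which mirrors the abstract's point that reward signals carry information about the network topology.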