Nowadays, user equipments (UEs) such as smartphones, tablets, wearable devices and other Internet of Things devices are becoming increasingly popular and bring great convenience to daily life. Moreover, many emerging mobile applications (e.g., augmented reality, smart navigation and interactive services) are attracting growing attention, but most of them are resource intensive, which makes them difficult for UEs to execute given their limited battery and computation resources (e.g., CPU, storage or memory).
Fortunately, mobile edge computing (MEC) has recently been proposed as a means for UEs with computation-intensive tasks to offload them to the edge cloud, which can not only prolong the battery life of UEs but also increase their computational capacity. Offloading decision making and resource allocation have been studied in [1, 2], while MEC with Cloud Radio Access Network (C-RAN) has been investigated in [3, 4, 5]. The above works either consider only one MEC server (e.g., [1, 7]) or assume that the MEC servers have fixed locations (e.g., [8, 3]), which may not be practical in some scenarios. For instance, a single MEC server is normally resource-limited and may not be able to meet the requirements of all the UEs at the same time. Also, MEC with fixed locations lacks flexibility and may not suit cases where the number and the requirements of UEs keep changing.
Unmanned aerial vehicles (UAVs), owing to their low cost, high flexibility and ease of deployment, have recently attracted much attention in wireless communication, e.g., serving as base stations or mobile relays. UAV-enabled MEC (UAVE) has been proposed, in which MEC servers are integrated into UAVs to provide computing resources to ground UEs. Compared with traditional fixed-location MEC, UAVE is of particular interest in scenarios such as 1) temporary events (e.g., a large crowd gathering to celebrate a big event or watch a football match); 2) emergency situations (e.g., an earthquake after which the infrastructure may be destroyed or temporarily unavailable); or other on-demand services. However, the operation of UAVE faces many challenges, two of which are how to achieve 1) the association between multiple UEs and UAVs and 2) the resource allocation from the UAVs to the UEs, while meeting the quality of service (QoS) requirements and minimizing the total energy consumption of all the UEs.
To address these challenges, we formulate the above problem as a mixed integer non-linear program (MINLP), which is very difficult to solve in general, especially in large-scale scenarios (e.g., when a large number of UEs on the ground are waiting to be served). We then propose a Reinforcement Learning-based user Association and resource Allocation (RLAA) algorithm to deal with this problem efficiently and effectively. Numerical results show that the proposed RLAA matches the performance of exhaustive search on small-scale instances and achieves a considerable performance gain over other typical algorithms in large-scale scenarios.
The rest of the paper is organized as follows. We present the system model and the optimization problem in Section II. Then, our proposed RLAA algorithm is introduced in Section III. The simulation results are given in Section IV, followed by concluding remarks in Section V.
II System Model
As shown in Fig. 1, we consider there are UEs, each of which has a computation-intensive task to be executed. Also, we consider there are UAVs deployed as the MEC platform, flying in a circle with radius . Define a new vector to denote the possible places where the tasks from ground UEs can be executed, in which denotes that the UE conducts the task itself without offloading. Similar to , we assume that the -th UAV’s flight period can be discretized into time slots. Define a new vector to denote the possible time slots in which the tasks from ground UEs can be executed, in which denotes that the UE conducts the task itself. Also, we assume that the UAV’s location change within each time slot can be ignored, compared to the distances from the UAV to all UEs.
Denote the coordinate of the -th UAV at -th time slot as and the coordinate of the -th UE as .
Similar to , assume -th UE has a computational intensive task to be executed as
where denotes the total number of CPU cycles required to complete this task and denotes the amount of data needed to be transmitted to UAV if deciding to offload, in which and can be obtained by using the approaches provided in . Assume that each UE can decide either to execute the task locally or choose to offload to one of the UAVs in one time slot and also assume that the task can be completed in this time slot. Similar to , we do not consider the time for returning the results back to UE from UAV. Thus, one can have
where , , denotes that the -th UE chooses the -th UAV in the -th time slot to offload, while , , denotes that the -th UE executes the task itself, and otherwise . Note that , if and only if .
Also, assume that the -th () UAV can serve more than one UE in each time slot and this task has to be completed either via offloading or local execution. Therefore, one can have
II-A Task Offloading
In the offloading scenario, we denote the horizontal distance between the -th UE and the -th UAV in the -th time slot as
Then, the offloading data rate can be given by
where is denoted as the channel bandwidth, as the transmission power of the -th UE, =, 2.2846, as the channel power gain at the reference distance 1 and as the noise power .
Also, one can see that the time to offload the data from -th UE to the -th UAV in -th time interval can be given as
Also, the time to execute the task can be expressed as
where is the computation resource that the th UAV could provide to the -th UE. Then, we can have the total time consumption as
Moreover, the total energy consumption of the -th UE to the -th UAV in -th time slot can be given as
Similar to , we assume each UAV can only accept a limited number of offloaded tasks in every time slot. Then, one has
where is the maximal number of UEs that each UAV can accept in each time slot.
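The offloading cost described above can be sketched in a few lines. The exact expressions are not recoverable from the text here, so this sketch assumes a Shannon-type rate over a line-of-sight link whose gain decays with the squared UE-to-UAV distance, and it assumes the UE's offloading energy is its transmission power multiplied by the transmission time. The function name, the argument names, and the `antenna_gain` factor are all hypothetical.

```python
import math


def offload_cost(ue_xy, uav_xy, height, data_bits, cpu_cycles, f_uav,
                 bandwidth, tx_power, ref_gain, noise_power,
                 antenna_gain=1.0):
    """Return (rate, total_time, energy) for one UE offloading in one slot.

    Assumed model (not taken verbatim from the paper):
      rate   = B * log2(1 + G * p * g0 / (sigma2 * (d_h^2 + H^2)))
      time   = D / rate + F / f_uav
      energy = p * D / rate   (energy spent by the UE on transmission)
    """
    # Squared horizontal distance between the UE and the UAV.
    horiz_sq = (ue_xy[0] - uav_xy[0]) ** 2 + (ue_xy[1] - uav_xy[1]) ** 2
    # Squared 3-D distance, using the UAV flying height.
    dist_sq = horiz_sq + height ** 2
    snr = antenna_gain * tx_power * ref_gain / (noise_power * dist_sq)
    rate = bandwidth * math.log2(1.0 + snr)
    t_tx = data_bits / rate          # time to upload the task data
    t_exec = cpu_cycles / f_uav      # time to execute the task on the UAV
    energy = tx_power * t_tx
    return rate, t_tx + t_exec, energy
```

With consistent units (Hz, W, bits, cycles, cycles per second), the returned total time can be compared directly against the slot length to check whether offloading is feasible.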
II-B Local Execution
If the UE decides to execute the task locally, the power consumption for the -th UE can be given by , where , , and 0 is the effective switched capacitance and can be normally to 3 . Then, the local execution time can be given by , (, ) and then, the total energy consumption can be given as .
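As a rough illustration of the local execution cost, the sketch below assumes the commonly used CPU model in which power is the effective switched capacitance times the CPU frequency raised to a constant exponent, execution time is the cycle count divided by the frequency, and energy is power multiplied by time. The names `kappa`, `nu` and `f_local` are hypothetical labels for the quantities mentioned above, and the default exponent of 3 is an assumption.

```python
def local_cost(cpu_cycles, f_local, kappa, nu=3):
    """Return (time, energy) for executing a task on the UE itself.

    Assumed model: power = kappa * f_local**nu, time = cpu_cycles / f_local.
    """
    power = kappa * f_local ** nu
    time = cpu_cycles / f_local
    energy = power * time
    return time, energy
```

Under this model the energy simplifies to `kappa * f_local**(nu - 1) * cpu_cycles`, so running the local CPU more slowly saves energy as long as the deadline is still met.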
II-C Problem Formulation
Then, one can have the energy consumption of each UE as
Also, the total time spent to complete each task can be expressed as
One can assume that the maximal computation resource which the -th UAV can provide is as . Then, one can have
Also, a task normally has to be completed within a certain amount of time; without loss of generality, we assume the task must be completed in time . In our paper, we assume that all the transmitting and computing processes for each task must be completed within one time interval . Then, we have
Denote = , = . Then, one can have
Note that the above problem is a MINLP problem, which is difficult to solve optimally in general. Some existing algorithms, such as exhaustive search or branch and bound, may solve this problem, but with prohibitive complexity. Therefore, in this paper, we aim to obtain an efficient solution, and to this end we propose the RLAA algorithm.
III Proposed Algorithm
In this section, we show our proposed RLAA algorithm. First, we introduce three important elements in RLAA (i.e., actions, states, and reward functions).
Actions: At each episode , each UE takes an action. If the UE decides to offload the task to the -th UAV in the -th time interval, the action is denoted as , . If the UE decides to execute the task locally, the action is denoted as . Then, one can define the collection of actions as follows:
For above offloading action , , the minimal computation resources of the -th UE can be given by
For local execution action , the minimal computation resources of the -th UE is given as
Note that not all actions can guarantee that the task can be completed within one time interval, as the available computation resources may be less than the minimal computation resources (i.e., in (17) and (18)). Similarly, the communication resource can also not be guaranteed (i.e., in (15e)). Therefore we may remove some actions in , resulting in the collection of feasible actions for the -th UE as .
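The feasibility check can be illustrated as follows. This is a minimal sketch assuming that a task is feasible when its offloading time plus execution time fits within one time interval, so the minimal computation resource is the cycle count divided by the time left after transmission. All function and parameter names are hypothetical.

```python
def min_uav_frequency(data_bits, cpu_cycles, rate, slot_len):
    """Minimal UAV computation resource so that upload + execution fit in one slot.

    Returns None when the upload alone already uses up the whole slot.
    """
    remaining = slot_len - data_bits / rate
    if remaining <= 0:
        return None
    return cpu_cycles / remaining


def min_local_frequency(cpu_cycles, slot_len):
    """Minimal local CPU frequency so that execution fits in one slot."""
    return cpu_cycles / slot_len


def is_feasible(required, available):
    """An action is kept only if the required resource exists and is available."""
    return required is not None and required <= available
```

Actions for which `is_feasible` returns `False` would be dropped from the UE's action set before learning starts.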
States: Then, we define the states as follows:
where represents the decision of the -th UE. Specifically, if the -th UE offloads the task to the -th UAV in -th time interval, we assign action to state . It is worth mentioning that if the -th UE decides to execute the task locally, we assign action to state .
Reward Functions: We define the reward function as
The above proposed reward function can keep reducing the energy consumption of each UE and may finally achieve the minimization of the energy consumption of all UEs.
Then, we present RLAA in Algorithm 1. In the beginning, the state is initialized. The Q-table, which is used to record every state and action, is also initialized (i.e., line 1 in Algorithm 1). At each episode, we obtain the collection of the actions for the -th UE. Then, according to the ε-greedy policy , the -th UE either chooses a random action with probability or follows the greedy policy with probability , which is expressed as
where is an action randomly selected from , and rand(0,1) denotes a random number uniformly distributed over the interval [0,1] (i.e., lines 4 to 8 in Algorithm 1).
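The action selection step can be sketched as below. The probabilities are not legible in the text, so this sketch assumes the usual convention that a random feasible action is taken with probability `epsilon` and the highest-valued action otherwise. The dictionary-based Q-table row and all names are hypothetical.

```python
import random


def epsilon_greedy(q_row, feasible_actions, epsilon, rng=random):
    """Pick an action for one UE from its feasible action set.

    q_row maps an action to its current Q-value for the present state;
    actions that have not been visited yet default to a value of 0.
    """
    if rng.random() < epsilon:
        # Explore: uniformly random feasible action.
        return rng.choice(feasible_actions)
    # Exploit: feasible action with the largest recorded Q-value.
    return max(feasible_actions, key=lambda a: q_row.get(a, 0.0))
```

Setting `epsilon` to 0 gives a purely greedy policy, which corresponds to the final selection made once training has finished.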
Then, the resource allocation is conducted for the -th UE (i.e., line 9 in Algorithm 1). If the -th UE offloads the task to the -th UAV in the -th time slot, the minimal computation resource in (17) is allocated. If the -th UE executes the task locally, the minimal computation resource in (18) is allocated. Based on the proposed reward function in (20), the -th UE can then obtain a reward (i.e., line 10 in Algorithm 1).
Next, we update the Q-table (line 11), where the updating rule of the Q-table is given as
where is the reward decay over the interval [0,1], is the learning rate over the interval [0,1], and is the next state. Also, the state is updated based on action . Specifically, we assign action , to state .
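Since the update rule itself is not legible here, the sketch below assumes the standard tabular Q-learning update, with `alpha` standing for the learning rate and `gamma` for the reward decay. The table layout, keyed by (state, action) pairs, and all names are hypothetical.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha, gamma):
    """Apply one standard Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Best value reachable from the next state (0 if no action is available).
    best_next = max((q_table[(next_state, a)] for a in next_actions),
                    default=0.0)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Unvisited (state, action) pairs start at 0.
q_table = defaultdict(float)
```

A `defaultdict` avoids enumerating the full state-action space up front, which matters when the number of UEs, UAVs and time slots is large.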
The above process is repeated until the maximum episode () is reached. Finally, each UE selects an action according to the Q-table (line 16). Specifically, for the -th UE, the action in corresponding to the largest value in the Q-table is selected.
IV Simulation Results
|Radius for all UAVs||800 m|
|Flying height for all UAVs||350 m|
|Transmission power||1 W|
|Channel power gain||1.42|
|Data Size|| KB|
|Execution task|| cycles|
|Time duration||1 s|
|Location of UEs||m|
|ε-greedy policy probability|
|for all UEs|
|for all UEs||3|
|for all UAVs||12|
In this section, simulations of the proposed multi-UAV enabled MEC system are conducted, with the parameters shown in Table I. The channel bandwidth is set to = 1 MHz, the noise variance is set to = dBm/Hz, the channel power gain at the reference distance 1 is set to = , the transmission power is set to 1 W, the time interval is set to 1 s, and the is set to for all the UEs. Also, we assume each UAV can support = 150 UEs in one time slot. All UEs are assumed to be randomly distributed in a rectangular area of coordinates m. We randomly select the data size of each task from the interval of KB and select from the interval of cycles.
In order to evaluate the performance of our proposed RLAA, the following four algorithms are used as comparison algorithms.
Exhaustive search (ES): We examine all the possibilities, with the objective of minimizing the overall energy consumption for all the UEs.
Local execution (LE): We assume all tasks are executed locally and there is no offloading.
Random offloading (RO): Each UE randomly selects the UAV and the time slot to offload its task.
Greedy offloading (GO): Each UE selects the nearest UAV to offload its task. If that UAV is overloaded (i.e., is violated), the UE selects the second nearest UAV to offload to, and so on.
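To make the GO baseline concrete, here is a minimal sketch. It assumes fixed 2-D UAV positions (the time-slot dimension is ignored for simplicity), a common per-UAV capacity, and that a UE is left unassigned (`None`) when every UAV is full. These simplifications and all names are assumptions, not details from the paper.

```python
import math


def greedy_offloading(ue_positions, uav_positions, capacity):
    """Assign each UE to its nearest UAV that still has spare capacity.

    Returns a list with one UAV index per UE, or None if all UAVs are full.
    """
    load = [0] * len(uav_positions)
    assignment = []
    for ue in ue_positions:
        # UAV indices ordered from nearest to farthest for this UE.
        order = sorted(range(len(uav_positions)),
                       key=lambda j: math.dist(ue, uav_positions[j]))
        # First UAV in that order that is not yet overloaded.
        chosen = next((j for j in order if load[j] < capacity), None)
        if chosen is not None:
            load[chosen] += 1
        assignment.append(chosen)
    return assignment
```

Because UEs are processed in list order, the result depends on that order; earlier UEs claim the nearest UAVs first.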
Firstly, we compare the performance of RLAA with the four comparison algorithms on a set of small-scale instances (i.e., the number of UEs ranges from 3 to 7). We assume that there are two UAVs flying in circles with the same radius, and the center coordinates of the two UAVs are set to and , respectively. From Fig. 2, one can see that RLAA has the same performance as ES, both of which achieve the minimal energy consumption. Also, one can see that GO achieves better performance than RO, whereas LE achieves the worst performance for all examined values. This is because RLAA can choose the most energy-efficient action for all the UEs according to their computation and communication requirements, while the others either make the UEs execute all the tasks locally (i.e., LE), randomly offload the tasks (i.e., RO), or just pick the nearest UAV (i.e., GO), resulting in worse performance.
Next, we compare the performance of RLAA with LE, RO and GO on a set of large-scale instances, where the number of UEs ranges from 100 to 1000. The number of UAVs is set to 3, where the center coordinates are , and , respectively. Note that we do not examine ES here, due to its prohibitive complexity. From Fig. 3, one can see that our proposed RLAA still performs best, followed by GO, RO and LE, as expected.
In Fig. 4, we further increase the number of UAVs to 5, where the center coordinates are set to , , , and , respectively. One can see that our proposed RLAA still outperforms the other compared algorithms, with a significant amount of energy being saved for all the UEs.
V Conclusion
In this paper, we studied a multi-UAV enabled MEC system, in which the UAVs are assumed to fly in circles over the ground UEs to provide computation services. The problem is formulated as a MINLP, which is hard to solve in general. We proposed the RLAA algorithm to address it effectively. Simulation results show that RLAA can achieve the same performance as exhaustive search in small-scale cases, whereas in large-scale scenarios RLAA still achieves a considerable performance gain over other traditional approaches.
This work was supported in part by the Zhongshan City Team Project (Grant No. 180809162197874), National Natural Science Foundation of China (Grant No. 61620106011 and 61572389) and UK EPSRC NIRVANA project (Grant No. EP/L026031/1).
[1] X. Lyu, H. Tian, W. Ni, Y. Zhang, P. Zhang, and R. P. Liu, “Energy-Efficient Admission of Delay-Sensitive Tasks for Mobile Edge Computing,” IEEE Transactions on Communications, vol. 66, no. 6, pp. 2603–2616, June 2018.
[2] K. Yang, S. Ou, and H. Chen, “On effective offloading services for resource-constrained mobile devices running heavier mobile internet applications,” IEEE Communications Magazine, vol. 46, no. 1, pp. 56–63, January 2008.
[3] L. Zhang, K. Wang, D. Xuan, and K. Yang, “Optimal Task Allocation in Near-Far Computing Enhanced C-RAN for Wireless Big Data Processing,” IEEE Wireless Communications, vol. 25, no. 1, pp. 50–55, February 2018.
[4] H. Mei, K. Wang, and K. Yang, “Multi-layer Cloud-RAN with cooperative resource allocations for low-latency computing and communication services,” IEEE Access, vol. 5, pp. 19023–19032, 2017.
[5] X. Wang, K. Wang, S. Wu, S. Di, K. Yang, and H. Jin, “Dynamic resource scheduling in cloud radio access network with mobile cloud computing,” in 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS), June 2016, pp. 1–6.
[6] Q. Wu and R. Zhang, “Common throughput maximization in UAV-enabled OFDMA systems with delay consideration,” IEEE Transactions on Communications, vol. 66, no. 12, pp. 6614–6627, Dec. 2018.
[7] X. Chen, “Decentralized Computation Offloading Game for Mobile Cloud Computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 4, pp. 974–983, April 2015.
[8] X. Wang, K. Wang, S. Wu, S. Di, H. Jin, K. Yang, and S. Ou, “Dynamic Resource Scheduling in Mobile Edge Cloud with Cloud Radio Access Network,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 11, pp. 2429–2445, Nov. 2018.
[9] J. Lyu, Y. Zeng, R. Zhang, and T. J. Lim, “Placement Optimization of UAV-Mounted Mobile Base Stations,” IEEE Communications Letters, vol. 21, no. 3, pp. 604–607, March 2017.
[10] Y. Chen, N. Zhao, Z. Ding, and M. Alouini, “Multiple UAVs as Relays: Multi-Hop Single Link Versus Multiple Dual-Hop Links,” IEEE Transactions on Wireless Communications, vol. 17, no. 9, pp. 6348–6359, Sept. 2018.
[11] L. Yang, J. Cao, S. Tang, T. Li, and A. T. S. Chan, “A Framework for Partitioning and Execution of Data Stream Applications in Mobile Cloud Computing,” in 2012 IEEE Fifth International Conference on Cloud Computing, June 2012, pp. 794–802.
[12] H. He, S. Zhang, Y. Zeng, and R. Zhang, “Joint Altitude and Beamwidth Optimization for UAV-Enabled Multiuser Communications,” IEEE Communications Letters, vol. 22, no. 2, pp. 344–347, Feb. 2018.
[13] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.