Recently, a tremendous number of mobile smart devices, such as autonomous vehicles, wearable devices, and smartphones, have been extensively employed in people's daily life. These devices enable various IoT applications, such as autonomous driving, smart health, smart city, and smart home. Owing to the high volume and fast velocity of data streams generated by mobile IoT devices, the cloud can be utilized to provision flexible computation and storage resources for these IoT devices. However, since the data source is far away from the cloud and the data streams have to traverse the Internet before reaching the cloud, the transmission delay of IoT tasks may be unbearable for delay-sensitive applications such as autonomous driving and augmented reality. To tackle this problem, fog computing is introduced to place computation resources at gateways and thus process IoT tasks at the network edge, which significantly reduces the transmission delay of IoT tasks [2, 3]. Owing to the complexity of the network, an intelligent fog network leveraging machine learning methods (i.e., both deep learning and reinforcement learning) [4, 5, 6] is promising to learn the network features and thus effectively manage the network resources.
In fog-assisted mobile IoT networks, the task delay consists of both the wireless transmission delay and the computing delay, and thus is impacted by the resource allocation in both the wireless channel and the fog node. As tasks are generated dynamically, the optimal decision on radio resource allocation requires complete network information, such as the available bandwidth, the channel conditions of IoT devices, and the traffic sizes of all tasks. The real-time radio resource allocations for different IoT tasks are coupled with each other owing to the limited bandwidth of the system: more bandwidth allocated to the current task leaves less bandwidth for the following tasks. However, it is challenging to obtain future information, such as channel conditions and task information, in advance. In this case, optimizing the radio resource allocation based on complete network information is impossible, and thus an online algorithm based on the current network information, in the absence of future information, is required to obtain a sub-optimal solution. Similarly, at the side of a fog node, the computation resource allocated to the current task will also affect the computation resources available for future tasks. Meanwhile, due to the QoS requirement of IoT tasks (i.e., in terms of the maximum allowed task delay), the radio resource allocation and computation resource allocation are coupled with each other for each task. In other words, a task allocated more bandwidth owing to its desirable channel condition can be provisioned with less computation resource, thus saving computation resources for other devices with poor channel conditions.
To solve this problem, we propose a delay-aware online resource allocation algorithm based on reinforcement learning to allocate radio and computation resources for IoT tasks and thus reduce their task delay. Our contributions can be summarized as follows:
We formulate the resource allocation problem to minimize the delay of IoT tasks. In this paper, both radio and computation resource allocation are taken into account to improve the performance of IoT tasks. Specifically, the radio resource of a task is impacted by the wireless channel conditions, the data size of the task, and the available radio resource of the system. Similarly, the computation resource allocation is affected by the computation intensity and the available computation resource of the system. In addition, the resource allocations for different tasks are coupled with each other due to the limited radio and computation resources of the system. For a single task, the radio resource and computation resource also affect each other due to the QoS constraint of the task.
To solve the resource allocation problem for IoT tasks, we design an online resource allocation algorithm based on reinforcement learning. We model the problem as a Markov Decision Process (MDP) and define a joint action that decides the radio and computation allocation for each task simultaneously. Considering the large action space and state space of the problem, we employ the actor-critic approach of reinforcement learning to efficiently train the model and allocate resources for each task without future task information. The performance of the designed algorithm has been verified by extensive simulations.
The remainder of this paper is organized as follows. In Section II, we briefly review related works. In Section III, we illustrate the fog-assisted mobile IoT network and introduce the system model. In Section IV, we formulate and analyze the resource allocation problem for IoT tasks. In Section V, the resource allocation algorithm based on reinforcement learning is proposed to obtain the suboptimal solution of the above problem. Section VI shows the simulation results, and concluding remarks are presented in Section VII.
II Related Works
Fog computing is promising for providing low-latency service for IoT tasks, owing to its proximity to IoT devices. As the workload distribution in the network is spatially and temporally dynamic, some studies have focused on workload allocation in fog computing, especially for delay-sensitive applications such as autonomous driving and augmented reality [7, 8, 9]. Zeng et al. jointly optimized task scheduling and image placement to improve the task delay in fog networks. Fan and Ansari designed a workload allocation scheme based on the different cloudlet capacities in a hierarchical cloudlet network to minimize the task delay, where the wireless transmission delay is neglected. Jia et al. investigated placing cloudlets in the network and balancing the workload among distributed cloudlets to reduce the task delay, where the radio resource allocation is ignored. Fan et al. investigated migrating virtual machines from green-energy-deprived cloudlets to green-energy-overprovisioned cloudlets to fully utilize the green energy in the network. However, all these works emphasize utilizing workload allocation among edge servers to enhance the user experience or the energy efficiency of the network instead of focusing on resource allocation [13, 14].
Some researchers have also considered computation resource allocation or radio resource allocation in fog-assisted IoT networks to further enhance the network performance. Tong et al. investigated cloudlet selection and computation resource allocation for tasks in a hierarchical cloudlet network, without considering radio resource allocation. To solve this problem, they decomposed the primary problem into two sub-problems, cloudlet selection and computation resource allocation, and solved them sequentially. Fan et al. proposed to offload each application's workloads among different cloudlets and allocate the computation resources of each cloudlet to different types of tasks based on its workload; however, they neglected the radio resource allocation. Tran et al. proposed a task offloading and resource allocation scheme in mobile edge computing to maximize the offloading gains in terms of both delay reduction and energy reduction. In that work, the joint problem is decomposed into two subproblems, and thus the authors make the task offloading decision and allocate the computation resource of each edge server for user tasks. Lyu et al. proposed a heuristic algorithm to allocate computation resources to the offloaded tasks. Since each user accesses one wireless channel, the radio resource allocation is neglected. In addition, other researchers have emphasized the radio resource allocation instead. Dab et al. designed a new joint task assignment and radio resource allocation scheme for WiFi-based mobile edge computing. The objective of the work is to reduce the energy consumption of users while satisfying the QoS requirement. Zhao et al. employed a multi-agent reinforcement learning algorithm to jointly associate users to base stations and allocate channels to users, thus achieving the maximum long-term overall network utility.
Most existing works assume that the workload of the network is given in advance and optimize the network performance (e.g., the average task delay within a long period) based on global network information. However, as future task information and network status are usually hard to predict, it is impractical to allocate optimal resources to arriving tasks in real time based on global information. On the other hand, the IoT task delay is impacted by both the radio and computation resource allocation, as it is composed of both the transmission delay and the computing delay. However, few works have paid attention to radio resource allocation and computation resource allocation simultaneously in real time, and this issue remains an open challenge. Therefore, we propose an online resource allocation algorithm to reduce the task delay, where both the radio resource and the computation resource are taken into account. In our scheme, the resource allocations of the current task and future tasks are coupled with each other, while the radio resource allocation is related to the computation resource allocation for an individual task. Different from other papers that allocate radio or computation resources to tasks continuously, we also consider the granularity of these resources to make our scheme more applicable to realistic networks. As both the wireless channel of a mobile IoT device and the fog node status are time-varying, the resource allocation decision should be determined based on the current wireless channel conditions, fog node status, and task information.
III System Model
A fog-assisted mobile IoT network is illustrated in Fig. 1. In this paper, we employ the cellular network as our IoT network infrastructure, with base stations (BSs) acting as IoT gateways (GWs) to provision communications service for IoT devices. Each GW is equipped with a fog node to provide computation and storage resources at the network edge. The fog node is responsible for making resource allocation decisions in real time based on the network status. The GW can detect the wireless conditions towards IoT devices and send them to the fog node. Based on this fog-assisted mobile IoT network, tasks of IoT devices can be transferred to their GW and processed by the corresponding fog node. Generally, each mobile IoT device may visit several locations along a certain route. At each location, it collects data and transfers IoT tasks to the fog node for processing. Owing to the mobility of IoT devices, their channel conditions are time-varying. Meanwhile, as IoT tasks are generated at different times, the fog node status keeps changing with the time-varying workload. As the task delay consists of both the transmission delay and the computing delay, it is impacted by both the radio and computation resource allocation in the network. If an IoT device has a bad channel condition while the available bandwidth is insufficient, it requires more computation resource to ensure a low task delay; otherwise, it can be allocated more bandwidth while saving computation resource for other IoT devices. Note that in practical engineering, neither the radio resource nor the computation resource can be allocated continuously, so we define basic granularities of radio resource (in Hz) and computation resource (in CPU cycles/s). Accordingly, we define a resource block as the granularity of radio resource and a computation unit as the granularity of computation resource.
In this paper, we denote the set of all IoT tasks and the index of an IoT task within it. We further denote the number of resource blocks allocated to a task and the number of computation units allocated to it; the radio and computation resources allocated to the task are then these numbers multiplied by the corresponding granularities. The key notations used in this paper are listed in Table I.
|Description|Unit|
|Number of resource blocks for a task||
|Number of computation units for a task||
|Computation intensity of a task|CPU cycles/bit|
|Data size of a task|bits|
|Computation size of a task|CPU cycles|
|Computation unit|CPU cycles/s|
|Task delay of a task||
|Transmission delay of a task||
|Computing delay of a task||
|Maximum number of resource blocks of the system||
|Maximum number of computation units of the system||
III-A Transmission Delay
In order to process IoT tasks at a fog node, an IoT device has to transmit its tasks to the GW via uplink communications. The wireless uplink rate mainly depends on the wireless channel condition and the allocated radio resource. After the fog node processes a task, it needs to feed back the processing result to the corresponding IoT device. However, since the processing results are much smaller than the IoT tasks and the wireless downlink channel has a high data rate, the downlink delay of the results is neglected. In this paper, we focus only on the uplink communications of IoT devices.
Denote the transmission power of the IoT device with a given task, the channel gain between the IoT device and the GW, and the noise power. The spectral efficiency of the IoT device can then be derived according to the Shannon-Hartley theorem as follows:
Hence, given the allocated radio resource, the uplink data rate can be expressed as
Given the data size of the task, its transmission delay can be expressed as
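As a concrete illustration of the uplink model above, the following Python sketch computes the transmission delay from the Shannon-Hartley rate; all function and parameter names here are our own assumptions rather than the paper's notation:

```python
import math

def transmission_delay(data_bits, n_blocks, block_hz, tx_power_w, channel_gain, noise_w):
    """Uplink transmission delay (s) of a task under the Shannon-Hartley model."""
    # Spectral efficiency in bit/s/Hz, from the received SNR.
    spectral_eff = math.log2(1.0 + tx_power_w * channel_gain / noise_w)
    # The uplink rate scales linearly with the number of allocated resource blocks.
    rate_bps = n_blocks * block_hz * spectral_eff
    return data_bits / rate_bps
```

For example, with a 1 Mbit task, five 180 kHz resource blocks, and a link whose SNR is 1 (0 dB), the spectral efficiency is 1 bit/s/Hz and the delay is 1e6 / 9e5 ≈ 1.11 s.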
III-B Computing Delay
The computing delay of a task depends on the allocated computation resource and the computation size of the task. As the computation intensity of a task is given in CPU cycles/bit, the computation size of the task is a function of its data size. Therefore, the computing delay of the task can be derived as
Aggregating both the transmission delay and the computing delay, we can derive the task delay of a task as
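The computing and total delay model above can be sketched in the same style (a minimal sketch with assumed names, complementing the transmission-delay formula):

```python
def computing_delay(data_bits, intensity_cpb, n_units, unit_cps):
    """Computing delay (s): required CPU cycles over allocated CPU capacity."""
    cycles = intensity_cpb * data_bits        # computation size in CPU cycles
    return cycles / (n_units * unit_cps)      # allocated capacity in cycles/s

def task_delay(tx_delay_s, comp_delay_s):
    """Total task delay is the sum of the transmission and computing delays."""
    return tx_delay_s + comp_delay_s
```

For instance, a 1 Mbit task with intensity 100 cycles/bit served by two units of 1e8 cycles/s takes 0.5 s to compute.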
IV Problem Formulation
The task delay is affected by different factors, such as the channel condition, the available radio and computation resources of the network, and the computation intensity. First, if a task has a bad channel condition, it is preferable to allocate it less radio resource, so that more radio resource can be allocated to other tasks with desirable channel conditions. Therefore, a high spectrum efficiency of the network significantly improves the task delay of all tasks. Second, the resources (i.e., either radio or computation resources) allocated to different tasks are coupled with each other. For example, if task A obtains a large number of resource blocks, the system may not have sufficient resource blocks for the following task B, even if task B has a better channel condition than task A. Third, if the remaining radio resource is insufficient and incurs a high transmission delay for a task, the fog node is forced to allocate more computation resource to the task to meet its QoS requirement. Fourth, the computation resource allocation is also impacted by the heterogeneous computation intensities of tasks. The main goal of this paper is to minimize the task delay of IoT tasks offloaded by IoT devices while satisfying the QoS requirement of each task. Thus, we can formulate the resource allocation problem as follows:
Here, the QoS requirement of a task is expressed in terms of the maximum allowed task delay. Constraint (7) ensures that each task satisfies the QoS requirement. Constraint (8) requires the total number of utilized resource blocks to be no more than the maximum number of resource blocks of the system. Constraint (9) requires the total utilized computation resources to be no more than the capacity of a fog node.
Optimizing the resource allocation requires complete task information. However, complete future task information is difficult to predict in advance, and thus it is impractical to obtain the optimal solution with only the existing network status. On the other hand, even if the complete task information were provided, the above problem is an integer non-linear program and thus challenging to solve. Obtaining the optimal resource allocation decision by a brute-force search requires a number of iterations that increases exponentially with the total number of tasks. Hence, optimizing the resource allocation in real time becomes impractical, especially for a large-scale network.
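To illustrate the exponential blow-up of such a search, the sketch below counts the joint allocations an exhaustive search would examine (an illustrative helper of our own, not from the paper):

```python
def brute_force_combinations(n_tasks, max_blocks, max_units):
    """Number of joint allocations an exhaustive search would enumerate.

    Each task may receive between 1 and max_blocks resource blocks and
    between 1 and max_units computation units, so the search space grows
    as (max_blocks * max_units) ** n_tasks.
    """
    return (max_blocks * max_units) ** n_tasks
```

Even a toy instance with 10 tasks, 27 resource blocks (5 MHz / 180 kHz), and 10 computation units yields 270**10 ≈ 2e24 combinations.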
V The Resource Allocation Algorithm
Due to the unavailability of future task information and the high complexity of P1, we design an Online Resource Allocation algorithm (ORA) based on reinforcement learning to efficiently solve the above problem in real time. Essentially, ORA learns the environment over many epochs, in each of which it takes actions for many steps (i.e., for task arrivals) to maximize the reward of the system.
In the network, the amount of available radio and computation resources is impacted by different events, such as the arrival and departure of an IoT task. When an IoT task arrives, the system has to make a decision to allocate both radio resource and computation resource to process the task. Meanwhile, when the task departs the system after processing, the system simply updates the available resources accordingly without making any decision. Through the resource allocation decisions, the system can significantly improve a reward that depends on the QoS of tasks.
To solve P1, we further employ a Markov Decision Process (MDP) model to formulate the problem. An MDP can be represented as a four-dimensional tuple consisting of the set of all possible states, the set of all possible actions, the state transition function mapping from a state and an action to the next state, and the reward function measuring the benefit of selecting a specific action under a given state.
In this paper, a state consists of the remaining radio resource, the remaining computation resource, the data size of the arriving task, and the computation size of the arriving task. Once a task arrives, the action of an agent reflects both the radio resource and the computation resource allocated to the task, and is thus defined as a joint action. Since the goal of this paper is to minimize the task delay, the reward of the joint action is defined in terms of the delay of the task. Essentially, with the arrival of a task, we need to select a joint action based on the current state, and thus enhance the reward of the system.
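A minimal sketch of the state, joint action, and reward just described; the field names are hypothetical, and we assume the common choice of negative task delay as the reward:

```python
from dataclasses import dataclass

@dataclass
class State:
    free_blocks: int    # remaining radio resource blocks
    free_units: int     # remaining computation units
    data_bits: float    # data size of the arriving task
    comp_cycles: float  # computation size of the arriving task

@dataclass
class JointAction:
    n_blocks: int       # resource blocks allocated to the task
    n_units: int        # computation units allocated to the task

def reward(task_delay_s: float) -> float:
    # Maximizing this reward minimizes the task delay.
    return -task_delay_s
```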
In ORA, the edge server serves as an agent that iteratively learns to make the right decision in reaction to the current state, i.e., it tries to find an optimal policy in terms of maximizing the discounted future reward over the time horizon, where future rewards are weighted by a discount factor. In this paper, due to the large action space of the joint action, we employ the actor-critic approach of reinforcement learning, which has high computational efficiency, to achieve the policy; the agent is equipped with two neural networks, an actor network and a critic network. Note that the actor-critic approach is a combination of the Q-learning algorithm and the policy gradient algorithm.
V-1 Q-Learning

Q-learning belongs to a family of value-based reinforcement learning algorithms, which estimate the action-value function under the policy. The action-value function can be derived through the well-known Bellman equation. Since the exact form of the action-value function can be extremely difficult to obtain in practice, we generally parameterize it using a deep neural network with trainable parameters. The action-value function corresponding to the optimal policy can be obtained by minimizing the loss
where the targets are computed from transitions sampled from the experience buffer. The optimal policy then selects the action that maximizes the learned action-value function.
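For intuition, a tabular version of the Q-learning update toward the Bellman target (a sketch with assumed names; the paper uses a neural approximation instead of a table):

```python
def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Step the estimate toward the bootstrapped Bellman target.
    q[(state, action)] = old + alpha * (target - old)
```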
V-2 Policy Gradient
Differing from the value-based reinforcement learning paradigm, policy gradient algorithms directly parameterize the policy, which represents the probability of choosing an action under a given state. The policy parameters are updated to maximize an objective weighted by a value function that measures how good the chosen action is. Then, the policy can be optimized by adjusting the parameters along the direction of the policy gradient
Different definitions of the value function lead to different algorithms. For example, the REINFORCE algorithm simply uses a sampled return as the value function. On the other hand, using the action-value function defined for Q-learning as the value function results in actor-critic algorithms, which have the advantage of reducing variance during training. In practice, the action-value function is usually replaced by an advantage function, in which a state-related baseline is subtracted to further mitigate variance and accelerate training. Actor-critic algorithms combine the merits of Q-learning and policy gradient and have become very popular in recent years.
By combining Q-learning with policy gradient, we employ actor-critic to allocate radio and computation resources to tasks in real time. Specifically, in actor-critic, an agent is equipped with two neural networks, namely the actor network and the critic network. When a task is generated at a mobile device, the actor network takes as input the state comprising the number of remaining resource blocks, the number of remaining computation units, the data size, and the computation size. By forwarding the state, the actor network outputs two categorical distributions, one over the number of resource blocks and one over the number of computation units, parameterized by the actor network's weights. The policy then gives the probability of choosing the joint action as the product of the two distributions. According to the two distributions, the actor selects a joint action specifying the number of allocated resource blocks and the number of allocated computation units. The corresponding reward of the joint action is determined by the task delay of the task. The critic network takes the state as input and generates a state-value, parameterized by its own weights, to estimate the expected future reward starting from that state. Then, an advantage can be calculated from the reward and the state-value estimates, which measures how the joint action performs compared to our expectation.
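The two categorical heads of the actor can be sketched as follows (a pure-Python sketch with assumed names); note that the joint-action probability factorizes as the product of the two head probabilities:

```python
import math
import random

def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_joint_action(block_logits, unit_logits, rng=random):
    """Sample (n_blocks, n_units) from two independent categorical heads."""
    p_b = softmax(block_logits)
    p_u = softmax(unit_logits)
    n_blocks = rng.choices(range(1, len(p_b) + 1), weights=p_b)[0]
    n_units = rng.choices(range(1, len(p_u) + 1), weights=p_u)[0]
    # Probability of the joint action under the factorized policy.
    prob = p_b[n_blocks - 1] * p_u[n_units - 1]
    return (n_blocks, n_units), prob
```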
The actor tries to select a joint action with a larger expected advantage, so it updates the network parameters to maximize
which results in the gradient direction
To estimate a more accurate state-value, the critic minimizes the squared error between its state-value estimate and the bootstrapped target, which leads to the gradient direction
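Putting the two updates together, a one-step advantage computation and critic update might look like this (a sketch under assumed names, with a lookup table standing in for the critic network):

```python
def advantage(r, gamma, v_next, v_now):
    """A = r + gamma * V(s') - V(s): how much better than expected the action did."""
    return r + gamma * v_next - v_now

def critic_update(v, state, target, lr=0.05):
    """Move the tabular state-value estimate toward the bootstrapped target."""
    v[state] = v.get(state, 0.0) + lr * (target - v.get(state, 0.0))
```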
The actor network and the critic network are updated alternately to maximize the expected future reward. We update the two networks in each epoch until the predefined number of epochs is reached. The detailed procedure of the ORA algorithm is shown in Algorithm 1.
Computational complexity. We further analyze the complexity of the designed algorithm. The number of iterations of the outer loop (from Line 1 to Line 10) is determined by the number of epochs. The inner loop from Line 3 to Line 7 is executed once per IoT task, where the complexity of each execution is dominated by a forward pass of the two networks. In addition, the complexity of Line 9 is related to the batch size. Therefore, the designed algorithm yields a computational complexity that is polynomial in the number of epochs, the number of tasks, and the batch size, and thus can achieve a solution in polynomial time.
VI Numerical Results
In this section, we set up simulations to verify the performance of the designed algorithm. To further validate the performance of the designed ORA algorithm, we also select two existing algorithms as baselines: Computation-only and Transmission-only. The Computation-only algorithm, inspired by prior work, focuses on computation resource allocation based on reinforcement learning, while the radio resource of the system is evenly allocated among the tasks in each second, i.e., each task receives the same radio resource. Meanwhile, Transmission-only focuses on radio resource allocation by reinforcement learning, while the total computation resource of the system is evenly allocated among the tasks in each second.
In the simulation, we consider an area corresponding to the coverage area of a GW. There are 50 locations uniformly distributed in the network, where mobile IoT devices visit and offload IoT tasks to the fog node for processing. Note that each mobile IoT device may select 5 locations and visit them, where the user mobility pattern does not affect the problem since we consider that an IoT device offloads tasks only when stopping at a location. The total number of tasks over all locations is 500, and they are randomly generated among these 50 locations within a time duration of 50 s. For the channel model, we employ the wireless path loss model 128.1 + 37.6 log10(d) from the 3GPP specification, where d is the distance in km. The data sizes of tasks are chosen according to a Normal distribution with an average of 1 Mbit and a variance of 0.3 Mbit. The computation intensity (in CPU cycles/bit) varies across tasks. The QoS requirement is 1 s. Note that if the system does not have enough available resources for a task to satisfy the QoS requirement, we assume the task is dropped and the corresponding task delay is set to 10 s. The remaining parameters are summarized in Table II.
|Parameter|Value|
|Number of IoT tasks|500|
|Data sizes of tasks|Normally distributed, mean 1 Mbit (bits)|
|Computation intensity of tasks|CPU cycles/bit|
|Computation capacity of a fog node|CPU cycles/s|
|System bandwidth|5 MHz|
|Radio resource block|180 kHz|
|Computation granularity|CPU cycles/s|
|Transmission power of IoT device|200 mW|
|Noise power|-104 dBm|
|Path loss model|128.1 + 37.6 log10(d) (d in km)|
|QoS constraint|1 s|
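The link budget implied by these parameters can be checked with a short sketch (helper names are our own; 200 mW corresponds to roughly 23 dBm):

```python
import math

def channel_gain_db(d_km):
    """Channel gain under the simulated 3GPP path loss 128.1 + 37.6 log10(d)."""
    return -(128.1 + 37.6 * math.log10(d_km))

def rx_snr(d_km, tx_dbm=23.0, noise_dbm=-104.0):
    """Linear receive SNR at distance d_km for the simulated link budget."""
    rx_dbm = tx_dbm + channel_gain_db(d_km)
    return 10.0 ** ((rx_dbm - noise_dbm) / 10.0)
```

At 100 m, for instance, the path loss is about 90.5 dB, giving a receive SNR of roughly 36.5 dB.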
Fig. 2 shows how the task delay changes over the epochs. After learning for a certain number of epochs, the performance becomes relatively stable. Meanwhile, we have investigated the impact of the total number of tasks on the average task delay. As shown in Fig. 3, as the number of tasks increases, the task delays of all three algorithms increase accordingly. Note that when the number of tasks is small, ORA has a task delay similar to the other two algorithms. This is attributed to the fact that all these algorithms have sufficient resources for the arriving tasks and thus incur a low task delay. However, when the number of tasks becomes large, ORA yields a significantly lower task delay than the other two algorithms. Since ORA can learn to dynamically allocate radio and computation resources to each task in real time, it can provision more resources to the current task without significantly degrading the delay of future tasks. In contrast, the other two algorithms cannot provision enough resources for subsequent tasks after allocating too many resources to the current tasks, thus degrading the average delay of all tasks.
We further investigate the impact of the total number of tasks on the average transmission delay. Fig. 4 shows that the designed algorithm has a lower transmission delay than the other two algorithms as the number of tasks increases. Meanwhile, the Transmission-only algorithm has a lower delay than the Computation-only algorithm. As ORA dynamically allocates resources to each task based on the data sizes of tasks and the remaining radio and computation resources, without significantly degrading the performance of future tasks, it can provision low-delay service for tasks. Transmission-only dynamically allocates radio resources to tasks while provisioning fixed computation resource for each task, and thus the computing delay becomes a bottleneck. Hence, it has to allocate much more radio resource to tasks with high computing delay so that their task delays meet the QoS constraint, which directly sacrifices the remaining radio resources for other tasks. Therefore, the transmission delay of Transmission-only is higher than that of ORA. On the other hand, while Transmission-only dynamically allocates radio resource to tasks based on their channel conditions and data sizes, Computation-only offers fixed radio resources to tasks and thus incurs a higher transmission delay.
We also study the impact of the total number of tasks on the average computing delay. Fig. 5 shows that the computing delay of ORA is much lower than those of the other algorithms as the total number of tasks changes. This is attributed to the fact that ORA considers the current state information, such as the channel condition of the IoT device, the data size and computation size of the arriving task, and the available radio and computation resources of the system. Thus, it can dynamically and fully utilize radio and computation resources to reduce both the transmission delay and the computing delay. In contrast, the computation resource allocation of Computation-only is affected by its high transmission delay, because tasks with high transmission delay must be allocated more computation resources to satisfy their QoS requirements. For Transmission-only, since all tasks receive fixed computation resource, it has a higher computing delay than ORA. In addition, we can see that Computation-only has a lower computing delay than Transmission-only when the number of tasks is small, and its computing delay then degrades gradually as the number of tasks increases. With a small workload, the system has sufficient radio and computation resources for all tasks, and thus Computation-only can dynamically allocate more computation resources to different tasks based on their computation sizes, while Transmission-only allocates a fixed computation resource to each task. However, when the workload increases, the performance of Computation-only becomes worse than that of Transmission-only. This is because the tasks in Computation-only are constrained by their fixed radio resources even if they have good channel conditions, and thus incur a high transmission delay. In this case, Computation-only needs to allocate much more computation resources to these tasks to meet their QoS requirements, leaving insufficient computation resources for other tasks. As a result, Computation-only has a higher computing delay than Transmission-only when the workload becomes heavy.
As shown in Fig. 6, we have studied the impact of the average data size of tasks on the task delay. It can be seen that the task delays of all the algorithms increase with the average data size for a given number of tasks (500). Meanwhile, ORA always has a significantly lower task delay than the other two algorithms. This is attributed to the fact that ORA can dynamically adjust the radio and computation resource allocation when the average data size increases, and thus keeps a lower task delay compared to the other algorithms. In contrast, when the average data size increases, the transmission delay becomes a bottleneck for Computation-only, while the computing delay is the bottleneck for Transmission-only.
Fig. 7 illustrates how the task delay changes as the average computation intensity increases. We can see that ORA incurs a significantly lower task delay compared to the other two algorithms. Note that an increase of the average computation intensity affects the computation sizes of tasks while the data sizes of tasks remain the same. In this case, ORA can learn to adjust the radio and computation resources for different tasks based on their computation sizes and data sizes, and thus incurs a lower task delay than the other two algorithms. Furthermore, for a low average computation intensity, the network has a much lower computation load, and thus the transmission delay becomes the dominating factor of the task delay. In this case, Transmission-only can dynamically allocate radio resource to tasks and thus incurs a lower task delay than Computation-only, in which the radio resource of each task is fixed. However, as the average computation intensity increases, the computation load dramatically increases and the computing delay becomes the dominating factor instead. Since Computation-only dynamically allocates computation resources based on tasks' computation sizes, it yields a lower task delay than Transmission-only, which allocates a fixed computation resource to each task.
In this paper, we have designed an online resource allocation algorithm based on reinforcement learning to dynamically allocate resources to IoT tasks in order to reduce the task delay. As tasks are generated dynamically and future task information is hard to predict, the resource allocations for different tasks are coupled with each other. Meanwhile, as the task delay consists of both the transmission delay and the computing delay, we have jointly considered the radio and computation resource allocation to reduce the delay of all tasks. Owing to the QoS constraint of each task, the radio resource allocation and computation resource allocation are also coupled with each other. The designed algorithm employs the actor-critic method to iteratively learn the environment and thus make an appropriate resource allocation decision in real time based on the current state information, without requiring future task information. We have demonstrated the performance gain of the designed algorithm over other baseline algorithms via extensive simulations.
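To make the actor-critic idea concrete, the following is a minimal, self-contained sketch of how a learned policy can couple the radio and computation resource allocation per task. All numbers here (capacities, task types, the discrete set of bandwidth shares, learning rates) are illustrative assumptions for a toy setting, not the paper's actual model: the agent observes the arriving task type, the actor picks a bandwidth share (the remaining fraction of capacity goes to computation), the reward is the negative task delay (transmission plus computing), and a tabular critic supplies the TD error that updates the policy.

```python
import numpy as np

# Toy actor-critic sketch (illustrative assumptions, not the paper's settings).
rng = np.random.default_rng(0)

B, C = 10.0, 10.0                   # total radio / computation capacity (assumed)
tasks = [(2.0, 8.0), (8.0, 2.0)]    # (data size, computation size) task types
shares = np.array([0.25, 0.5, 0.75])  # candidate bandwidth fractions (actions)

theta = np.zeros((len(tasks), len(shares)))  # actor: policy logits per task type
V = np.zeros(len(tasks))                     # critic: value estimate per task type
alpha_actor, alpha_critic = 0.1, 0.1

def delay(task, s):
    """Task delay = transmission delay + computing delay for bandwidth share s."""
    d, c = task
    return d / (B * s) + c / (C * (1.0 - s))

for step in range(5000):
    st = rng.integers(len(tasks))            # a task of a random type arrives
    logits = theta[st] - theta[st].max()     # softmax policy over actions
    pi = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(len(shares), p=pi)
    r = -delay(tasks[st], shares[a])         # reward = negative task delay
    td = r - V[st]                           # TD error against critic's estimate
    V[st] += alpha_critic * td               # critic update
    grad = -pi
    grad[a] += 1.0                           # gradient of log softmax policy
    theta[st] += alpha_actor * td * grad     # actor update

# The learned policy should give the data-heavy task a larger bandwidth
# share and the computation-heavy task a smaller one.
print(theta.argmax(axis=1))
```

In this toy instance, the computation-heavy task type (2.0, 8.0) learns to take the smallest bandwidth share, leaving more computation capacity, while the data-heavy type (8.0, 2.0) learns the opposite, mirroring the coupling between radio and computation resources discussed above.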
-  L. Wang and R. Ranjan, “Processing distributed internet of things data in clouds,” IEEE Cloud Computing, vol. 2, no. 1, pp. 76–80, 2015.
-  Q. Fan and N. Ansari, “Workload allocation in hierarchical cloudlet networks,” IEEE Communications Letters, vol. 22, no. 4, pp. 820–823, April 2018.
-  ——, “On cost aware cloudlet placement for mobile edge computing,” IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 4, pp. 926–937, July 2019.
-  J. Wang, Y. Liu, J. H. Noble, and B. M. Dawant, “Automatic selection of landmarks in T1-weighted head MRI with regression forests for image registration initialization,” Journal of Medical Imaging, vol. 4, no. 4, p. 044005, 2017.
-  T. Q. Dinh, Q. D. La, T. Q. S. Quek, and H. Shin, “Learning for computation offloading in mobile edge computing,” IEEE Transactions on Communications, vol. 66, no. 12, pp. 6353–6367, Dec 2018.
-  J. Wang, F. Chen, L. E. Dellalana, M. H. Jagasia, E. R. Tkaczyk, and B. M. Dawant, “Segmentation of skin lesions in chronic graft versus host disease photographs with fully convolutional networks,” in Medical Imaging 2018: Computer-Aided Diagnosis, vol. 10575, 2018, p. 105750N.
-  R. Deng, R. Lu, C. Lai, T. H. Luan, and H. Liang, “Optimal workload allocation in fog-cloud computing toward balanced delay and power consumption,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 1171–1181, Dec 2016.
-  Q. Fan and N. Ansari, “Towards workload balancing in fog computing empowered IoT,” IEEE Transactions on Network Science and Engineering, DOI:10.1109/TNSE.2018.2852762, early access, 2018.
-  J. Wan, B. Chen, S. Wang, M. Xia, D. Li, and C. Liu, “Fog computing for energy-aware load balancing and scheduling in smart factory,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4548–4556, Oct 2018.
-  D. Zeng, L. Gu, S. Guo, Z. Cheng, and S. Yu, “Joint optimization of task scheduling and image placement in fog computing supported software-defined embedded system,” IEEE Transactions on Computers, vol. 65, no. 12, pp. 3702–3712, Dec 2016.
-  M. Jia, J. Cao, and W. Liang, “Optimal cloudlet placement and user to cloudlet allocation in wireless metropolitan area networks,” IEEE Transactions on Cloud Computing, vol. 5, no. 4, pp. 725–737, Oct 2017.
-  Q. Fan, N. Ansari, and X. Sun, “Energy driven avatar migration in green cloudlet networks,” IEEE Communications Letters, vol. 21, no. 7, pp. 1601–1604, 2017.
-  S. F. Abedin, M. G. R. Alam, S. M. A. Kazmi, N. H. Tran, D. Niyato, and C. S. Hong, “Resource allocation for ultra-reliable and enhanced mobile broadband IoT applications in fog network,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 489–502, Jan 2019.
-  Y. Yu and J. Wang, “Uplink resource allocation for narrowband internet of things (NB-IoT) cellular networks,” in 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov 2018, pp. 466–471.
-  L. Tong, Y. Li, and W. Gao, “A hierarchical edge cloud architecture for mobile computing,” in IEEE INFOCOM 2016, San Francisco, CA, April 2016, pp. 1–9.
-  Q. Fan and N. Ansari, “Application aware workload allocation for edge computing-based IoT,” IEEE Internet of Things Journal, vol. 5, no. 3, pp. 2146–2153, June 2018.
-  T. X. Tran and D. Pompili, “Joint task offloading and resource allocation for multi-server mobile-edge computing networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 856–868, Jan 2019.
-  X. Lyu, H. Tian, C. Sengul, and P. Zhang, “Multiuser joint task offloading and resource optimization in proximate clouds,” IEEE Transactions on Vehicular Technology, vol. 66, no. 4, pp. 3435–3447, April 2017.
-  B. Dab, N. Aitsaadi, and R. Langar, “Joint optimization of offloading and resource allocation scheme for mobile edge computing,” in 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 2019.
-  N. Zhao, Y. Liang, D. Niyato, Y. Pei, M. Wu, and Y. Jiang, “Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5141–5152, Nov 2019.
-  Q. Fan and N. Ansari, “Towards traffic load balancing in drone-assisted communications for IoT,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3633–3640, April 2019.
-  J. Yao and N. Ansari, “Task allocation in fog-aided mobile IoT by Lyapunov online reinforcement learning,” IEEE Transactions on Green Communications and Networking, early access, 2019.
-  V. J. Kotagi, R. Thakur, S. Mishra, and C. S. R. Murthy, “Breathe to save energy: Assigning downlink transmit power and resource blocks to LTE enabled IoT networks,” IEEE Communications Letters, vol. 20, no. 8, pp. 1607–1610, Aug 2016.
-  Q. Fan and N. Ansari, “Green energy aware user association in heterogeneous networks,” in 2016 IEEE Wireless Communications and Networking Conference, April 2016, pp. 1–6.
-  ——, “Towards throughput aware and energy aware traffic load balancing in heterogeneous networks with hybrid power supplies,” IEEE Transactions on Green Communications and Networking, vol. 2, no. 4, pp. 890–898, Dec 2018.
-  Q. Wu, H. Liu, R. Wang, P. Fan, Q. Fan, and Z. Li, “Delay-sensitive task offloading in the 802.11p-based vehicular fog computing systems,” IEEE Internet of Things Journal, vol. 7, no. 1, pp. 773–785, Jan 2020.
-  K. Li, W. Ni, M. Abolhasan, and E. Tovar, “Reinforcement learning for scheduling wireless powered sensor communications,” IEEE Transactions on Green Communications and Networking, vol. 3, no. 2, pp. 264–274, June 2019.
-  P. K. Tathe and M. Sharma, “Dynamic actor-critic: Reinforcement learning based radio resource scheduling for LTE-Advanced,” in 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Aug 2018, pp. 1–4.
-  D. C. Pompermayer, M. A. Có, and C. B. Donadel, “Design and implementation of a low-cost intelligent device to standby mode consumption reduction in already existing electrical equipment,” IEEE Transactions on Consumer Electronics, vol. 63, no. 2, pp. 145–152, May 2017.
-  S. Sesia, I. Toufik, and M. Baker, LTE-the UMTS long term evolution: from theory to practice. John Wiley & Sons, 2011.