With the increasing quantity and quality of rich multimedia services over mobile networks, traffic and computation demands of mobile users and devices have grown dramatically in recent years, imposing a huge workload on today's already-congested backbone networks and mobile networks.
Naturally, the emerging idea of Mobile Edge Computing (MEC) has been proposed as a novel paradigm for easing the burden on backbone networks by pushing computation/storage resources to the proximity of User Equipments (UEs). Moreover, MEC circumvents the long propagation delays introduced by transmitting data from mobile devices to remote cloud computing infrastructures, and is hence able to support latency-critical mobile and Internet of Things (IoT) applications. Specifically, edge nodes, i.e., base stations equipped with computation/storage capability, can handle the computation and content requests of UEs; this scheme improves the Quality-of-Service (QoS) of Mobile Network Operators (MNOs) and the Quality-of-Experience (QoE) of UEs, while relieving the load on backbone networks and the pressure on clouds (data centers).
Fulfilling the QoE requirements of UEs is not trivial, even with MEC. The key difficulty is that computation offloading requires wireless data transmission and may congest wireless channels, raising a decision-making (optimization) problem over the integrated communication and computation system, i.e., how to jointly allocate the communication resources and the computation resources of edge nodes.
Several pioneering works have been proposed and achieve quite good results in their assumed settings, based on convex optimization, game theory, and so on. Nevertheless, considering the particular use cases in MEC, these optimization methods may suffer from the following issues: 1) Uncertain Inputs: they assume that some key information factors are given as inputs, but some of them are actually difficult to obtain due to time-varying wireless channels and privacy policies; 2) Dynamic Conditions: the dynamics of the integrated communication and computation system are not well addressed; 3) Temporal Isolation: most of them, except for Lyapunov optimization, do not consider the long-term effect of the current decision on resource allocation, viz., in a highly time-varying MEC system, most proposed optimization algorithms are optimal or close-to-optimal only for a snapshot of the system. In short, the problem in the resource allocation optimization of the MEC system is a "lack of intelligence".
In view of the increasing complexity of mobile networks, e.g., a typical 5G node is expected to have 2000 or more configurable parameters, a recent trend is to optimize wireless communication with Artificial Intelligence (AI) techniques, including but not limited to applications of AI at the Physical Layer (PHY), the Data Link Layer, and in traffic control. In particular, related studies on edge computing and caching have shown that reinforcement learning (including Deep Q-Learning in Deep Reinforcement Learning) has the potential to be effective in joint resource management. However, one of these studies is based on Q-Learning, which is not feasible for practical MEC systems where the state-action space is tremendous, while another focuses specifically on vehicular networks. Besides, none of these works has considered 1) in what form the training data should be gathered (in a distributed or a centralized way), 2) where the reinforcement learning agent should be placed and trained (in UEs, edge nodes, or remote cloud infrastructures), 3) how the update processes of reinforcement learning agents should proceed and be coordinated, and 4) the privacy protection of training data.
Therefore, in this article, we use Deep Reinforcement Learning (DRL) to jointly manage communication and computation resources. Both computation offloading and edge caching in the MEC system (illustrated in Fig. 1) are discussed. In addition, Federated Learning is introduced as a framework for training DRL agents in a distributed manner while 1) largely reducing the amount of data that must be uploaded via the wireless uplink channel, 2) reacting cognitively to the mobile communication environment and the conditions of cellular networks, 3) adapting well to heterogeneous UEs in a practical cellular network, and 4) preserving personal data privacy.
To the best of our knowledge, we are the first group to study the application of DRL coupled with Federated Learning for intelligent joint resource management of communication and computation in MEC systems. Our contribution can be summarized as follows:
1) we discuss the methodology of utilizing DRL (specifically, Deep Q-Learning) and distributed DRL for optimizing edge caching and computation offloading;
2) we propose the "In-Edge AI" framework, which further utilizes Federated Learning for a better deployment of intelligent resource management in the MEC system;
3) we provide a proof-of-concept evaluation and verify that the proposed scheme strikes a good balance between performance and cost.
In addition, we discuss opportunities and challenges, hoping to unveil the upcoming future of edge architectures supporting various AI-based applications.
II Optimizing the Edge by DRL
We intend to use AI techniques (particularly DRL) as the method of cognitive computing for building an intelligent mobile edge computing, caching, and communication system. The cognitive process across the protocol stack of wireless communication is illustrated in Fig. 2, where we partition the whole process into three main parts.
Information Collecting: Sense and collect the observation data indispensable for cognitive computing in the MEC system, including but not limited to the usage of communication and computation resources, the wireless environment, and the intensity of UEs' requests;
Cognitive Computing: Based on the observation data of the system, cognitive computing fuses the massive observed data and produces scheduling decisions;
Request Handling: The MEC system handles the requests of UEs on the basis of the scheduling decisions given by cognitive computing.
In this section, two representative use cases in the MEC system are investigated.
II-A DRL over the MEC System for Caching
Recently, we have observed the emergence of promising mobile content caching and delivery techniques, by which popular contents are cached in intermediate servers (or middleboxes, gateways, or routers) so that demands from users for the same content can be accommodated easily without duplicate transmissions from remote cloud servers, which significantly reduces redundant traffic.
Hereinafter, we focus on the scenario of caching contents at edge nodes. In the MEC system depicted in Fig. 1, there is a library of popular content files that all mobile users in the system may request. The content popularity, defined as the probability distribution of content requests from all users, indicates the common interests of all users in the network. In related works, content popularity is usually described by a Zipf distribution. Moreover, to simply demonstrate the efficiency of DRL in edge caching, we assume that the content popularity changes slowly and all contents have the same size. For each request, the DRL agent in the edge node can decide whether to cache the requested content, and if yes, which local content shall be replaced. We assume that the content popularity, user preferences, and average arrival rate of requests are static during a relatively long period. We model the cache replacement problem at all edge nodes as a Markov Decision Process (MDP) and use DRL to solve it; the results are shown in Section IV.
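The cache-replacement MDP described above can be sketched with a toy environment. The state/action encoding, cache capacity, and popularity exponent below are illustrative assumptions, and a simple frequency-based eviction policy stands in for the DRL agent:

```python
import random

class EdgeCacheEnv:
    """Toy MDP for cache replacement at an edge node (illustrative encoding).

    Action 0 = do not cache the requested content; action i > 0 = replace
    the content in cache slot i-1 with the requested one. Reward = 1 on a
    cache hit, 0 otherwise.
    """
    def __init__(self, capacity=5, n_contents=50, zipf_a=1.2, seed=0):
        self.capacity = capacity
        self.n_contents = n_contents
        self.zipf_a = zipf_a
        self.rng = random.Random(seed)
        self.cache = list(range(capacity))   # initially cache contents 0..capacity-1
        self.freq = [0] * n_contents         # observed request counts per content
        self.hits = 0
        self.requests = 0

    def sample_request(self):
        # Zipf-like popularity: P(content k) proportional to 1/(k+1)^a
        weights = [1.0 / (k + 1) ** self.zipf_a for k in range(self.n_contents)]
        return self.rng.choices(range(self.n_contents), weights=weights)[0]

    def step(self, action, requested):
        self.requests += 1
        self.freq[requested] += 1
        if requested in self.cache:
            self.hits += 1
            return 1.0                        # reward: cache hit
        if action > 0:
            self.cache[action - 1] = requested
        return 0.0                            # reward: cache miss

env = EdgeCacheEnv()
for _ in range(1000):
    req = env.sample_request()
    # naive stand-in policy for the DRL agent: evict the least-requested slot
    victim = min(range(env.capacity), key=lambda i: env.freq[env.cache[i]])
    env.step(victim + 1, req)
hit_rate = env.hits / env.requests
```

A trained DRL agent would replace the hand-crafted eviction rule with a learned mapping from the observed state to a replacement action.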
II-B DRL over the MEC System for Computation Offloading
II-B1 Communication Model
As illustrated in Fig. 1, consider an environment where a set of UEs is covered by a set of base stations. Each UE can choose to offload its intensive computation tasks to an edge node via one of the wireless channels of a base station, or to execute those tasks locally. To simulate the variation of wireless channels, the channel gain state between a UE and a base station (belonging to an edge node) is independently picked from a finite state space, and the channel state transitions are modeled as a finite-state discrete-time Markov chain. In this wireless scenario, the achievable data rate can be evaluated by the Shannon-Hartley theorem.
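A minimal sketch of this channel model, with hypothetical gain levels and transition probabilities, combined with the Shannon-Hartley rate:

```python
import math
import random

# Finite-state Markov channel: the channel gain is quantized into discrete
# levels, and transitions follow a row-stochastic matrix. All numbers here
# are illustrative, not measured values.
GAIN_LEVELS_DB = [-12.0, -6.0, 0.0, 6.0]
TRANSITIONS = [
    [0.6, 0.4, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.4, 0.6],
]

def next_state(state, rng):
    """Advance the finite-state discrete-time Markov chain by one step."""
    return rng.choices(range(len(GAIN_LEVELS_DB)), weights=TRANSITIONS[state])[0]

def achievable_rate(bandwidth_hz, tx_power_w, gain_db, noise_w):
    """Shannon-Hartley theorem: R = B * log2(1 + S/N), in bits per second."""
    gain = 10 ** (gain_db / 10.0)
    snr = tx_power_w * gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)

rng = random.Random(1)
state = 2
rates = []
for _ in range(100):
    state = next_state(state, rng)
    rates.append(achievable_rate(1e6, 0.5, GAIN_LEVELS_DB[state], 1e-9))
```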
II-B2 Computation Model
To portray the long-term effects within the MEC system, computation tasks are generated according to a Bernoulli distribution across the time horizon. A computation task is characterized by the size of its computation input data (in bits) and the total number of CPU cycles needed to complete it. All generated tasks are stored in a task queue and executed sequentially on the UE or the edge node according to a FIFO (First-In First-Out) principle. When a task is executed locally, its execution time is the required number of CPU cycles divided by the computation capability (i.e., CPU cycles per second) of the UE, which is determined by the amount of energy the UE decides to allocate. When a task is scheduled to be executed on the edge node, its execution time is the required number of CPU cycles divided by the computation capability the edge node allocates to the UE, which is typically much larger since the computing performance of edge nodes is far stronger than that of UEs.
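The two delay expressions above can be made concrete as follows; the task size, CPU cycle count, frequencies, and uplink rate are illustrative assumptions:

```python
def local_exec_time(cycles, f_ue_hz):
    """Local execution delay: required CPU cycles / UE CPU frequency."""
    return cycles / f_ue_hz

def offload_exec_time(data_bits, rate_bps, cycles, f_edge_hz):
    """Offloaded task delay: uplink transmission time plus edge execution
    time (downloading the small result is assumed negligible here)."""
    return data_bits / rate_bps + cycles / f_edge_hz

# Hypothetical task: 1 Mbit of input data, 1e9 CPU cycles.
t_local = local_exec_time(1e9, 1e9)                # 1 GHz UE
t_edge = offload_exec_time(1e6, 5e6, 1e9, 10e9)    # 5 Mbps uplink, 10 GHz edge
```

With these numbers offloading wins (0.3 s vs. 1.0 s), but a slow uplink can easily reverse the comparison, which is exactly what makes the offloading decision non-trivial.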
II-B3 Problem Formulation for Computation Offloading
To efficiently perform computation offloading in the MEC system, the UE shall make a joint communication and computation resource allocation decision in terms of a control action, comprising the computation offloading decision, i.e., whether the UE executes the task locally or via which wireless channel it offloads the task, and the amount of energy allocated to wireless communication or local computation. Our attention in this section is focused on how to improve the task execution experience (viz., QoE) of UEs in the MEC system.
High complexity is introduced into the MEC system especially when dealing with task offloading problems, together with the difficulty of acquiring the necessary global information when massive UEs are involved. Hence, a DRL algorithm such as Deep Q-Learning is utilized here as the agent, which handles the joint control action of the UE. The whole problem can be summarized as follows: the UE decides a joint wireless channel selection and energy allocation action according to a stationary control policy, while it keeps observing the network state, which involves the task queuing state, the cumulative energy consumption of the UE, the wireless channel occupied by the UE, and the qualities of all wireless channels. In addition, we define an immediate utility function to evaluate the QoE of UEs, which is inversely related to the execution delay of tasks (including the wireless transmission delay and the computation delay), the task queuing delay, the energy consumption of the UE, and the number of dropped and failed tasks. By taking advantage of Deep Q-Learning and its improved versions, e.g., Double DQN, the control policy can be trained to increase the utility of UEs for long-term performance optimization.
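The Double DQN idea mentioned above can be illustrated by its target computation; the reward, Q-values, and discount factor below are made-up numbers:

```python
def double_dqn_target(reward, next_q_main, next_q_target, gamma=0.9, done=False):
    """Double DQN target: the MainNet selects the best next action, the
    TargetNet evaluates it, reducing the overestimation bias of vanilla DQN."""
    if done:
        return reward
    a_star = max(range(len(next_q_main)), key=lambda a: next_q_main[a])
    return reward + gamma * next_q_target[a_star]

# Illustrative values: the MainNet prefers action 1, the TargetNet scores
# it 2.0, so the target is 1.0 + 0.9 * 2.0 = 2.8.
y = double_dqn_target(reward=1.0, next_q_main=[0.5, 3.0], next_q_target=[4.0, 2.0])
```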
III In-Edge AI with Federated Learning
In Section II, two use cases of DRL in the MEC system were put forward. However, one key challenge is still pending in practice, namely the deployment of the DRL agent. Taking the computation offloading use case in Section II-B as an example, if the DRL agent is trained on edge nodes or remote cloud servers, as depicted in Fig. 4(a), three deficiencies arise due to the wireless communication characteristics of the MEC system: 1) the training data is large in quantity when considering massive UEs, which increases the burden on uplink wireless channels; 2) the training data that would be uploaded to edge nodes or the cloud is privacy-sensitive, which might cause privacy breaches; 3) if the training data is transformed out of privacy considerations, server-side proxy data is less relevant than on-device data. On the other hand, if the DRL agent is trained on each UE individually, two further defects remain: 1) the computation capability of a UE is relatively weak, so training the DRL agent on large-scale data takes a long time or is even impossible; 2) the training process of a DRL agent may introduce extra energy consumption on a UE.
DRL techniques require intensive computation capacity to find optimal solutions. In particular, when there are huge numbers of data factors, parameters, and criteria for optimizing resources over large-scale MEC systems (e.g., an operator's network across cities), advanced distributed Deep Learning (DL) approaches should be utilized. Hence, it is natural to consider training DRL agents efficiently in a distributed fashion. As Fig. 3 illustrates, although keeping and training a DRL agent in every UE and edge node could achieve the best performance, it is only practical to apply distributed DRL in the MEC system, because a single device does not have enough time and data for training. However, most distributed DRL architectures, as depicted in Fig. 4(b), can neither handle unbalanced and non-IID data nor cope with privacy issues. In addition, they usually degrade the performance of DRL agents when UEs and network states are heterogeneous. Therefore, Federated Learning (FL) is introduced in this article for training DRL agents in the MEC system, on account of the fact that it can address the following key challenges, which differentiate it from other distributed DL approaches:
Non-Independent and Identically Distributed (Non-IID): The training data (transition memories in DRL) of a UE depends on the wireless environment it has experienced and on its own computation capability and energy consumption. Hence, the training data of any individual UE cannot represent the training data of all UEs. In FL, this challenge can be met by merging the model updates with the FedAvg algorithm.
Limited communication: UEs are often unpredictably offline or allocated poor communication resources. Nevertheless, additional computation can be used to decrease the number of communication rounds needed to train a model. In addition, FL only asks a fraction of clients to upload their updates in each round, which handles clients that are unpredictably offline.
Unbalanced: Some UEs may have more computation tasks to handle, and some may experience more states of the mobile network, resulting in varying amounts of training data among UEs. This challenge, too, can be handled by the FedAvg algorithm.
Privacy and Security: The information that needs to be uploaded in FL is the minimal update necessary to improve the DRL agent. Furthermore, techniques such as secure aggregation and differential privacy can be applied naturally, preventing privacy-sensitive data from being contained in local updates. Nonetheless, privacy and security are not our focus in this work; more information on these issues can be found in the related references.
III-A Integration of Federated Learning within Edge for In-Edge AI
Between AI in the cloud and AI in UEs, there is an "edge". With the aid of the FL framework, edge nodes equipped with AI computation in the "edge" can bind the cloud and massive UEs together, forming a powerful AI entity whose strong cognitive ability is provided by the massive UEs and edge nodes. In this envisioned architecture, each edge node can dynamically support AI tasks at the system level, not only for its own optimization but also for the global optimization and balancing of the whole MEC system.
III-A1 From Edge to Cloud
Considering edge caching in the MEC system, the DRL agent employed in an edge node dynamically makes decisions on caching appropriate contents according to the contents requested by UEs. However, the space-time popularity dynamics of requested contents call for collaborative edge caching. Thus, a cloud server belonging to the MNO can act as the central server coordinating the edge nodes. Each edge node keeps a DRL agent and updates it with its own local training data.
III-A2 From UEs to Edge
In the computation offloading scenario, each UE decides whether a computation task should be offloaded to edge nodes, via which wireless channel, and with how much energy, according to the inference results of its local DRL agent. Although UEs such as mobile phones, industrial IoT devices, and smart vehicles are able to perform some AI computation, their computation capability and energy budget still limit their abilities in AI computing (e.g., DRL training on large-scale data). Therefore, we propose to use all edge nodes in the "edge" as an integral server coordinating the massive UEs they cover. By virtue of this scheme, UEs with relatively weak computation capability are able to hold a complex DRL agent.
The methodology for tackling the two aforementioned scenarios is similar. To train a general DL model, as depicted in Fig. 4(c), FL iteratively solicits a random set of clients (distributed devices that train the DL model) to 1) download the parameters of the DL model from a certain server, 2) perform the training process on the downloaded model with their own data, and 3) upload only the new model parameters to the server, which aggregates the uploaded updates to further improve the model. The process inside an individual client is also illustrated in Fig. 4(c), using the same notation as the original FL work for readability. Specifically, the client trains the model it previously downloaded from the central server on its local training data, and uploads the updated weights of the MainNet when they are available.
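The three-step round described above can be sketched as follows. The quadratic local objective, the client datasets, and the sampling fraction are illustrative stand-ins for actual DRL training on transition memories:

```python
import random

def local_update(weights, data, lr=0.1, epochs=1):
    """Stand-in for local training: gradient steps on a one-dimensional
    quadratic loss pulling the weight toward the mean of the local data
    (real clients run SGD on their transition memories)."""
    w = weights
    for _ in range(epochs):
        grad = w - sum(data) / len(data)
        w = w - lr * grad
    return w

def fedavg_round(global_w, client_datasets, fraction=0.5, rng=None):
    """One FedAvg round: sample a fraction of clients, let each train
    locally, then average the results weighted by local dataset size."""
    rng = rng or random.Random(0)
    k = max(1, int(fraction * len(client_datasets)))
    selected = rng.sample(client_datasets, k)
    total = sum(len(d) for d in selected)
    return sum(len(d) / total * local_update(global_w, d) for d in selected)

clients = [[1.0, 2.0], [3.0], [10.0, 12.0, 14.0]]  # non-IID, unbalanced data
w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients, fraction=1.0)
```

Despite the unbalanced, non-IID client data, the weighted averaging drives the global weight toward the mean of all data (7.0 here), which is the property FedAvg relies on.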
To summarize, FL enables resource-constrained edge computing devices (including UEs and edge nodes) to learn a shared model collaboratively, while keeping the training data local.
III-B Pros and Cons of Federated Learning for In-Edge AI
The core benefit of FL lies in sharing learned knowledge across a large number of devices without centralizing the training data. Extending this core benefit, several further benefits arise in the MEC system.
Cognitive: Server-side proxy data is less relevant than on-device data. In the MEC system, massive UEs and edge nodes can act as perception units and acquire diverse, abundant, and personalized data for updating the global DL model. From the perspective of UEs, these data may include the quality of the wireless channel, the remaining battery life and energy consumption, the instantaneous computation capability, and so on. Concerning edge nodes, the cognitive data may include the computation load, the storage occupation, the number of wireless communication links, and the states of the task queues awaiting handling. Thus, using FL based on these raw data, instead of centralized DL, renders the MEC system more cognitive.
Robust: FL inherently addresses key issues concerning the availability of UEs and edge nodes and the unbalanced and non-IID data. Consequently, the performance of In-Edge AI is not easily affected by unbalanced data or occasionally poor communication environments. Further, its ability to handle non-IID data allows massive UEs in different wireless environments to train their own DL models without worrying about negative effects.
Flexible: In FL, additional computation can be used to decrease the number of communication rounds needed to train a model. An effective way to add computation is to increase the computation per UE, i.e., to add more local SGD updates per round. Hence, a trade-off between computation and communication exists. Specifically, UEs that are powerful in both computation and energy can decide to perform more mini-batches in training to further decrease the communication cost, and vice versa.
Certainly, due to the limitations of federated optimization, FL cannot outperform a centrally trained DL model, but it can still achieve near-best performance, as will be demonstrated in Section IV.
III-C Practicability Discussion
To some extent, our proposed In-Edge AI is a future-oriented concept. We envision a near future in which most mobile devices, particularly smartphones, are endowed with the ability not only to run inference but also to train Deep Learning models. As is well known, even state-of-the-art edge chips, such as the Edge TPU (introduced by Google and powered by TensorFlow Lite), can only support elementary DL training processes. Therefore, practicability should be discussed considering both practical deployment and delay requirements.
III-C1 Deployment Challenges
Learning takes a long time for both training and inference, depending on the required accuracy level. Obviously, the DRL model should not be deployed directly with randomly initialized neural network weights. Otherwise, the MEC system would be paralyzed, because the DRL model can only make random decisions during preliminary exploration.
Nonetheless, this may be solved if the DL model is not trained from scratch. We are now working on using transfer learning to boost the training process in MEC systems. The basic idea is to simulate the wireless environment and requests of UEs. Just as evaluating and adjusting the antenna settings in a simulated testbed, the simulated environment is used to train an off-line Deep Reinforcement Learning agent. Then, the established DRL model could be distributed to mobile UEs.
III-C2 Delay Requirement
As noted above, learning takes a long time for both training and inference, depending on the required accuracy level. However, with transfer learning, much less time and computation need to be spent on the training side.
Based on the pre-established DRL model, the training (not inference) requirement is alleviated, i.e., the DRL agent can reach a satisfactory accuracy level after a small number of mini-batches, or can even be used directly for inference. When the wireless environment or the pattern of UEs' requests changes, the UE can also take advantage of the established model to adjust the neural networks in its DRL agent.
Unlike the enormous numbers of hidden layers and neurons in CNNs or RNNs, the neural network in our DRL agent is merely a Multi-Layer Perceptron (MLP, shown in Fig. 4(c)). In our simulated scenarios, the MLP has only one hidden layer with 200 neurons. Therefore, once the model has been trained, inference can be performed quickly due to the low complexity of the MainNet in the DRL agents.
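A pure-Python sketch of such an MLP (one tanh hidden layer with 200 neurons, as stated above; the input and output sizes are illustrative) shows why inference is cheap:

```python
import math
import random

def make_mlp(n_in, n_hidden=200, n_out=4, seed=0):
    """Build random weights for a one-hidden-layer MLP. The 200 hidden
    neurons match the text; other layer sizes are assumptions."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2

def forward(params, x):
    """Forward pass: tanh hidden layer, linear Q-value outputs."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, h)) + b
            for row, b in zip(w2, b2)]

q_values = forward(make_mlp(n_in=8), [0.1] * 8)
```

With a single hidden layer, inference is two small matrix-vector products, which a UE can execute in well under a millisecond.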
Besides, our earlier work aims to meet the challenge of running DL on resource-constrained mobile devices. By incorporating it, the requirement on the computation capacity of UEs can be relaxed while also decreasing the inference delay.
IV Data-Driven Evaluation of the Proof-of-Concept In-Edge AI Framework
IV-A Experiment Settings
In this section, the experiment settings and the evaluation results are presented, respectively. In all simulations, the time horizon is discretized into time epochs.
We captured Xender's trace for one month (from 01/09/2016 to 30/09/2016), including 9,514 active mobile users, 188,447 conveyed content files, and 2,107,100 content requests, whose content popularity distribution can be well fitted by a Zipf distribution. In the edge caching simulations, we use this mined Zipf distribution as the content popularity distribution to generate the content requests of UEs, and consider the cooperation of edge nodes. Once an edge node receives a request from a UE, its local DRL agent decides which content to cache, or not to cache at all, and then obtains the reward of this action for its own training. In edge caching, the cloud server belonging to the MNO acts as the central server coordinating the edge nodes.
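The Zipf popularity model used above can be sketched as follows; the library size and exponent here are illustrative, not the values fitted from the trace:

```python
import random

def zipf_pmf(n_contents, alpha):
    """Zipf popularity: P(rank k) proportional to k^(-alpha), normalized
    over the content library."""
    weights = [k ** (-alpha) for k in range(1, n_contents + 1)]
    z = sum(weights)
    return [w / z for w in weights]

def sample_requests(n_requests, pmf, seed=0):
    """Draw content requests (by popularity rank) from the Zipf PMF."""
    rng = random.Random(seed)
    return rng.choices(range(len(pmf)), weights=pmf, k=n_requests)

pmf = zipf_pmf(n_contents=1000, alpha=0.8)    # alpha is an assumed value
requests = sample_requests(10000, pmf)
# fraction of requests hitting the 10 most popular contents
top10_share = sum(1 for r in requests if r < 10) / len(requests)
```

The heavy concentration of requests on a few top-ranked contents is what makes small edge caches effective.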
To investigate the capabilities of DRL coupled with FL in the MEC system for computation offloading, we suppose the whole bandwidth of an edge node is divided into multiple wireless channels, and take UEs as the clients in the FL framework, which individually train their DRL agents and merge them at edge nodes. The channel gain states between a UE and the edge node are drawn from a common finite set, which quantizes the quality of the wireless channel into discrete levels. Throughout the simulation, the number of tasks generated on each UE follows a Bernoulli distribution with a fixed average arrival rate per time epoch.
As for the DRL settings in UEs, edge nodes, and the cloud, we choose the Double DQN algorithm, with tanh as the activation function and Adam as the optimizer. We use a single-hidden-layer fully-connected feedforward neural network with 200 neurons to serve as both the target (TargetNet) and the evaluation (MainNet) Q-network. The remaining Double DQN hyperparameters (replay memory capacity, mini-batch size, discount factor, exploration probability, learning rate, and the period of replacing the target Q-network) are set to fixed values. In addition, to build a baseline for the performance of the DRL agent with FL, we construct a centralized DRL agent for comparison, which is assumed to be able to receive all data used for reinforcement learning.
IV-B Evaluation Results
To elucidate the performance of our In-Edge AI framework, experiments on edge caching and computation offloading are carried out under various settings.
Fig. 5 depicts the performance of DDQN with FL for both edge caching and computation offloading. Three edge nodes and three UEs are chosen to exhibit the capabilities of their own DRL agents. For edge caching, the hit rates of the three edge nodes improve and finally stabilize within a certain range as the DRL agents are trained. In the computation offloading simulation, the average utilities of the three UEs increase as the training loss decreases, and likewise stabilize within a certain range. This means that the efficiency of handling offloaded tasks in the MEC system is improved.
Further, Fig. 6 gives the details of performance comparison as follows.
In Fig. 6(a), the edge caching performance of DDQN with FL is compared with centralized DDQN and other well-known caching policies, namely LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In, First Out). It can be seen that the performance of FL is close to that of centralized DDQN in terms of the hit rates achieved by the clients (edge nodes), and outperforms LRU, LFU, and FIFO.
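As an illustration of one of these baselines, a minimal LRU cache can be implemented with an ordered map; the request trace below is made up:

```python
from collections import OrderedDict

class LRUCache:
    """Least Recently Used baseline: on a miss with a full cache, evict
    the content that was requested least recently."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = self.requests = 0

    def request(self, content):
        self.requests += 1
        if content in self.cache:
            self.hits += 1
            self.cache.move_to_end(content)     # mark as most recently used
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[content] = True

cache = LRUCache(capacity=3)
for c in [1, 2, 3, 1, 4, 1, 2]:
    cache.request(c)
hit_rate = cache.hits / cache.requests
```

LFU and FIFO differ only in the eviction rule (least request count and oldest insertion, respectively); none of the three adapts to shifting popularity the way a learned policy can.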
In the experiment on computation offloading, the performance of DDQN with FL is compared with centralized DDQN and three other baseline computation offloading policies, i.e., mobile execution, edge node execution, and greedy execution. Here, mobile execution means the UE processes all computation tasks locally, edge node execution means all computation tasks are offloaded from the UE to edge nodes, and greedy execution means the UE decides whether to execute a computation task locally or offload it to edge nodes so as to maximize the immediate utility. In Fig. 6(b), it can be observed that DDQN with FL achieves performance close to that of centralized DDQN and is superior to the three baseline policies.
To investigate the detailed performance of the training process, we assume that the wireless communication capability is not the bottleneck, i.e., both the massive training data for centralized DDQN and the lightweight model updates for DDQN with FL can be uploaded to the target position within a discretized time epoch. As illustrated in Fig. 6(c), the performance of centralized DDQN is better than that of DDQN with FL at the beginning of training. However, once the model merging of FL has been performed several times, the performance of DDQN with FL approaches that of centralized DDQN. Certainly, if a client wishes to obtain the expected performance using DDQN with FL, it must wait for the model merging, i.e., exploit the training results of other clients. Nonetheless, this experiment assumes an ideal wireless environment; in practice, massive training data cannot actually be uploaded without delay. Therefore, performing DDQN with FL is more practical in MEC systems, at least for now, when wireless resources remain a major consideration.
We also gather statistics on the total wireless transmission volume until the DRL agent is well trained, both in the edge caching and in the computation offloading scenario. In the FL framework, every client only needs to upload the updates of its model. Without the FL framework, viz., using centralized DRL, UEs must upload the whole training data via wireless channels and thus consume far more communication resources, as demonstrated in Fig. 6(d).
Without doubt, FL must trade something for these advantages. Specifically, because the coordinating server in FL only merges updates instead of taking over the whole training, the computation load of the clients is inevitably heavier on account of the local training process. This causes additional energy cost on UEs or burdens the computation of both UEs and edge nodes; these issues remain open questions.
V Opportunities and Challenges
We hereby discuss a few promising research directions of In-Edge AI, and highlight several open problems from the perspective of elaborating and widening the usage of AI and Edge Computing.
V-A Accelerating AI Tasks by Edge Communication and Computing Systems
The preceding sections optimize the edge of mobile communication systems by AI techniques, but it is also important to exploit specialized methods for optimizing learning tasks with the support of the edge. DRL can be treated as a type of specialized edge computing task at the service level. How to find the right edge nodes to collaborate with, and how to allocate appropriate resources for the large number of AI tasks with various priorities, deadlines, scales, and different requirements on CPU, memory, and so on, are also indispensable questions. Interestingly, this is the reverse direction of the aforementioned Federated Learning, but it is also an application of "In-Edge AI". Potential research topics may include game-theoretic algorithms for the competition of edge nodes and mobile users over multi-dimensional resources for AI acceleration by the edge; the dynamic and adaptive splitting of AI tasks is also quite challenging.
V-B Efficiency of In-Edge AI for Real-Time Mobile Communication Systems towards 5G
In 5G, the defined Ultra-Reliable Low-Latency Communications (URLLC) scenarios strongly demand very small delays and high reliability from the mobile system. However, general deep learning-based optimization and prediction schemes require many iterations and long running times to converge, which is inappropriate for mobile edge systems, particularly for system-level edge computing tasks requiring rapid responses on the millisecond scale. In order to guarantee the QoS of delays and bandwidth for caching, communication, and computation, In-Edge AI should be able to provide differentiated support for various types of services, and the fine-grained collaborative scheduling of AI tasks (or split sub-tasks) over edge nodes and mobile devices should be accelerated to nearly real-time, which is critical but remains unsolved in the current literature. Furthermore, the integration of the Federated Learning framework with AI chipset hardware should be explored as well.
V-C Incentive and Business Model of In-Edge AI
In-Edge AI is a tight confederation among mobile operators, SPs/CPs, and mobile users: capable entities may contribute more to the overall optimization of edge caching and computing tasks, while a large number of UEs may rely on the AI of edge nodes and other UEs. Also, the AI computation requirements of SPs and CPs should be satisfied across the edge nodes of different mobile operators. Hence, a more reliable and accurate design of the incentive framework of In-Edge AI becomes an interesting direction, and it should further provide a closed loop for the digital copyrights of content and services. Blockchain techniques may be integrated into the In-Edge AI framework, but how to distribute the huge computation load of proof-of-work over the edge system and how to evaluate the contribution of In-Edge AI computation in heterogeneous scenarios are still unexplored.
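As a starting point for contribution evaluation, one could imagine a purely proportional rule that divides a reward pool by measured contribution (e.g., completed AI computation units). This is only an illustrative baseline under assumed, honestly reported measurements; a real incentive mechanism would additionally need truthfulness guarantees and the copyright accounting discussed above.

```python
def split_reward(total_reward, contributions):
    """Divide a reward pool among participants in proportion to their
    measured contribution (illustrative proportional rule only)."""
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: total_reward * c / total
            for name, c in contributions.items()}

# Hypothetical example: one base station and two UEs sharing a reward of 100.
print(split_reward(100.0, {"bs1": 3, "ue1": 1, "ue2": 0}))
```

Heterogeneous scenarios break such a simple rule quickly: a unit of computation on a battery-powered UE costs more than on a grid-powered base station, so contributions would need scenario-dependent weighting before any proportional split.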
In this article, we discussed the potential of integrating Deep Reinforcement Learning techniques and the Federated Learning framework with the mobile edge system, and of using them to optimize mobile edge computing, caching, and communication. We performed experiments investigating the scenarios of edge caching and computation offloading in the mobile edge system, and "In-Edge AI" was evaluated and shown to achieve near-optimal performance. For future work, we will concentrate not only on optimizing learning computation tasks with the support of the edge, but also on scheduling AI tasks over edge nodes and mobile devices in a fine-grained and collaborative manner.
-  ETSI, “Mobile Edge Computing - Introductory Technical White Paper”, Sep. 2014.
-  X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, “Cache in the Air: Exploiting Content Caching and Delivery Techniques for 5G Systems,” in IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, Feb. 2014.
-  M. Chen and Y. Hao, “Task Offloading for Mobile Edge Computing in Software Defined Ultra-Dense Network,” in IEEE J. Sel. Areas Commun., vol. 36, no. 3, pp. 587-597, Mar. 2018.
-  Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A Survey on Mobile Edge Computing: The Communication Perspective,” in IEEE Commun. Surv. Tutorials, vol. 19, no. 4, pp. 2322-2358, Aug. 2017.
-  Q. Mao, F. Hu, and Q. Hao, “Deep Learning for Intelligent Wireless Networks: A Comprehensive Survey,” in IEEE Commun. Surv. Tutorials, Early Access, 2018.
-  F. Tang, B. Mao, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “On Removing Routing Protocol from Future Wireless Networks: A Real-Time Deep Learning Approach for Intelligent Traffic Control,” in IEEE Wireless Communications, vol. 25, no. 1, pp. 154-160, Feb. 2018.
-  A. Sadeghi, F. Sheikholeslami, and G. B. Giannakis, “Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities,” in IEEE J. Sel. Top. Signal Process., vol. 12, no. 1, pp. 180-190, Feb. 2018.
-  Y. He, N. Zhao, and H. Yin, “Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach,” IEEE Trans. Veh. Technol., vol. 67, no. 1, pp. 44-55, Jan. 2018.
-  R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction”. Cambridge, MA, USA: MIT Press, 2016.
-  V. Mnih et al., “Human-Level Control through Deep Reinforcement Learning,” in Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015.
-  H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the International Conference on Artificial Intelligence and Statistics, Apr. 2017.
-  H. Van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-Learning,” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, vol. 16, 2016, pp. 2094-2100.
-  X. Li, X. Wang, P.-J. Wan, Z. Han, and V. C. M. Leung, “Hierarchical Edge Caching in Device-to-Device Aided Mobile Networks: Modeling, Optimization, and Design,” in IEEE J. Sel. Areas Commun., Special Issue on Caching for Communication Systems and Networks, Early Access, 2018.
-  E. Li, Z. Zhou, X. Chen, “Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy,” in ACM SIGCOMM, MECOMM Workshop, 2018.
-  Z. Xiong, Y. Zhang, D. Niyato, P. Wang, and Z. Han, “When mobile blockchain meets edge computing: Challenges and applications,” arXiv preprint arXiv:1711.05938, 2017.