Due to restricted battery power, memory and computational capacity, mobile devices face challenges in executing delay-sensitive and resource-hungry mobile applications such as augmented reality and online gaming. Mobile Edge Computing (MEC) is foreseen as a remedy to this problem. In MEC, the mobile edge is enhanced with analysis and storage capabilities, possibly by a dense deployment of computational servers or by strengthening already-deployed edge entities such as small cell base stations. Consequently, mobile devices are able to offload their computationally expensive tasks to the edge servers while requesting a specific quality of service. This process, referred to as computation offloading, is feasible because edge servers are deployed in close proximity to mobile users, in particular compared with remote cloud servers. An illustration of MEC is provided in Fig. 1.
Notwithstanding its numerous benefits, MEC suffers from some shortcomings that must be addressed for the concept to become realizable. Most importantly, the limited radio and computational resources of mobile edge servers need to be utilized efficiently so that the users’ quality of service requirements are met with minimal effort. Moreover, from the mobile devices’ perspective, the consumed energy should be minimized. The problem is aggravated when the randomness and dynamics of wireless networks are taken into account. Contributing factors include, but are not limited to, users’ mobility, random channel quality, time-varying and random task arrival, and non-deterministic energy resources (for instance, in the case of energy harvesting).
The aforementioned challenges cannot be addressed by conventional centralized resource allocation schemes, since such mechanisms require global information at a central node, which is infeasible to acquire in ultra-dense distributed networks. Moreover, the computational complexity of centralized schemes becomes overwhelming. As a result, it is vital to develop distributed and autonomous approaches, where individual mobile devices and mobile edge servers make decisions such that the system settles at efficient and stable operating points. Game theory and reinforcement learning are two mathematical tools with great potential to address such problems.
Game theory is a well-established, classic tool for mathematically modeling wireless resource allocation problems. Since many wireless resource allocation problems can be reduced to distributed decision-making problems, game theory is a natural fit. Game theory focuses on strategic interactions among players and thus eliminates the need for a central controller, which is a major advantage. Game theory has two main branches: non-cooperative and cooperative. Non-cooperative game theory studies the interactions of rational, self-interested players that compete against each other, and the goal is to achieve an efficient equilibrium point. Important notions of equilibrium include the Nash, correlated, and Walrasian equilibria. Cooperative game theory promotes cooperative behavior that is supposedly beneficial to all agents; a well-known example is coalitional games. In such games, players form coalitions and enforce cooperative behavior within each coalition so as to maximize the value of the coalition, which is regarded as a utility measure. In addition to coalition formation, there are other types of games based on strategic cooperation. In summary, game theory offers a variety of game models, each with its own distinct set of properties that make it suitable for different types of decision-making problems depending on the context.
The minority game (MG) is a type of non-cooperative game that can be used to model distributed resource allocation problems in MEC. More precisely, MG is a congestion game in which an odd number of selfish players choose between two actions in the hope of maximizing their own payoffs. Only the players landing in the minority are rewarded. The players neither communicate with each other nor have any information about the actions of the other players; therefore, decision making is almost entirely autonomous. The winning action, which is broadcast to all players at the end of each round of play, is the only external information provided. This scarcity of information calls for adaptive methods to determine the best action for the next round. To this end, the existing literature offers a significant number of learning methods, including reinforcement learning methods and stochastic strategies. These methods help players improve the coordination among their actions and achieve better social and individual welfare by forming larger minorities.
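To make the round structure concrete, the following Python sketch plays a single round of a basic MG. It assumes actions encoded as +1/-1 and a unit payoff for players in the minority; the function name `minority_round` is illustrative only and is not taken from any library or from the cited works.

```python
import random


def minority_round(actions):
    """Play one round of a basic minority game (illustrative sketch).

    actions: list of +1/-1 choices, one per player. The number of players
    must be odd, so the two groups can never be of equal size.
    Returns the winning (minority) action and each player's payoff
    (1 if the player is in the minority, else 0).
    """
    n = len(actions)
    if n % 2 == 0:
        raise ValueError("a basic minority game needs an odd number of players")
    attendance = sum(actions)  # positive means +1 is the majority choice
    winning_action = -1 if attendance > 0 else 1
    payoffs = [1 if a == winning_action else 0 for a in actions]
    return winning_action, payoffs


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [rng.choice([-1, 1]) for _ in range(7)]
    winner, payoffs = minority_round(actions)
    # Only `winner` is broadcast back to the players at the end of the round.
    print(actions, winner, sum(payoffs))
```

Note that the players themselves never see `actions`; in an MG they only learn the returned winning action.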
MG is a natural tool for modeling resource allocation problems. In most congestion problems, being in the minority tends to be more beneficial. Many wireless resource allocation problems are in fact congestion problems, where a number of users compete for a limited resource. If the resource is uncrowded, users are rewarded, which is analogous to the minority being rewarded in an MG. For instance, consider the previously mentioned MEC system, where the offloading users attempt to utilize the limited computational resources of the edge servers. This scenario maps to a congestion problem, since the utility of each user depends on the number of users sharing the same resource. Hence, such problems can be readily modeled using MG. The advantages of MG include simple implementation, low overhead, and scalability to large sets of players, which are of vital importance in a dense wireless MEC system. More details on MG can be found in . Later in this article, we provide an MG model for the distributed server activation problem in MEC and compare the performance of several learning algorithms used in MG.
Reinforcement learning is another well-known technique applicable to distributed decision making. In reinforcement learning, autonomous agents learn the best action from the rewards and penalties received in each round of play. Since agents do not know which action is best, they learn by balancing exploration of unknown actions against exploitation of their current knowledge of previously used actions. In other words, agents use a trial-and-error approach to maximize their utilities over the horizon. Well-known reinforcement learning methods include Q-learning, learning automata, and Roth-Erev learning. Reinforcement learning mechanisms are well suited for learning in MG, since adaptation to the collective action of the other agents in the presence of information scarcity can be achieved using such methods .
In this article, we first provide a concise review of the state-of-the-art in computation offloading and efficient resource allocation of edge servers, with an emphasis on solutions developed using game theory and reinforcement learning. Then, we explore the research outlook and open problems. Finally, as an example, we formulate a distributed server activation problem as a minority game, apply reinforcement learning to solve the game, and present numerical results on the performance of the different learning techniques.
In this section, we briefly explore cutting-edge research on computation offloading and resource management for MEC, focusing on methods developed based on game theory and/or reinforcement learning. Note that a comprehensive survey of the state-of-the-art is beyond the scope of this article; our goal is to capture the research trend by reviewing some exemplary works.
In , the authors consider a multi-cell, quasi-static environment and cast the computation offloading problem as a dynamic sequential game. They further establish the existence of a Nash equilibrium and develop a distributed, convergent offloading scheme. In , the authors consider the offloading problem where the set of mobile devices varies randomly during the offloading period. The problem is modeled using a stochastic game framework, which is then shown to be equivalent to a potential game. The existence of a Nash equilibrium is proved and a stochastic learning algorithm is developed. For cloud-enhanced vehicular networks with edge computing capability, an offloading mechanism based on a Stackelberg game is proposed in . The servers and the offloading vehicles are modeled as the leaders and the followers, respectively. Similar to the aforementioned references, the existence of a Nash equilibrium is proved, and a distributed algorithm is designed that maximizes the edge server’s utility while meeting the tasks’ latency constraints. In , the authors investigate multi-user offloading decision making in a dynamic environment, where users’ states and offloading requests are time-variant. The number of tasks offloaded to each server (machine) is modeled as an a priori unknown, time-varying Markov process. The authors then formulate the offloading problem as a Markov decision process and develop online learning algorithms to find the optimal offloading policy for both centralized and decentralized scenarios.
Radio and Computational Resource Management
In , the authors use coalitional game theory to solve a resource allocation problem in MEC-enabled IoT networks with software-defined networking (SDN) capability. In such a network, IoT applications offload delay-sensitive tasks to the edge servers. The developed game-theoretical framework is guaranteed to adaptively provision the available computational resources in the MEC servers in order to satisfy the quality of service requirements of IoT applications. Moreover, a deterministic algorithm is proposed to minimize the task processing cost and the latency. Reference  investigates joint offloading decision making and dynamic edge server provisioning in a mobile edge network with energy harvesting capability. The authors model the problem as a Markov decision process and develop a reinforcement learning algorithm for offloading computation jobs and activating edge servers while minimizing the overall cost and delay. The authors of  propose a resource allocation mechanism using auction theory. Therein, service providers in the mobile edge network design contracts with the edge node infrastructure providers. The contracts enable the edge servers to efficiently provision their assigned computational resources and to schedule the offloaded tasks so that latency is minimized. In , the focus is on dynamically changing vehicular networks with MEC capabilities, including computation and caching. A network operator allocates computation, caching, and network resources to the vehicles for different vehicular applications. To cope with the resulting high complexity, the authors develop a deep reinforcement learning algorithm based on deep Q-learning.
| Objective | Approach | Setting |
|---|---|---|
| minimize users’ energy and latency cost | dynamic sequential game | quasi-static |
| minimize users’ energy and latency cost | stochastic game | dynamic |
| maximize utilities of users and servers | Stackelberg game | vehicular |
| minimize unprocessed offloading requests | Markov decision process | dynamic |
| optimize resource usage and QoS guarantee | coalitional game | edge IoT |
| minimize overall cost and latency | Markov decision process | energy harvesting MEC |
| minimize latency | auction theory | dynamic workload arrival |
| efficient resource allocation | deep Q-learning | vehicular |
| minimize servers’ energy and QoS guarantee | minority game | random |
Despite its great potential for reducing latency and energy consumption, realizing MEC is associated with a variety of challenges. In particular, decision making for computation offloading as well as joint radio-computational resource allocation are challenging. The challenge arises mainly from resource scarcity and the distributed nature of MEC, as well as from the uncertainty and randomness in wireless networks, including, but not limited to, the randomness in channel quality and in the amount and type of offloaded tasks. In what follows, we briefly discuss some important problems, including computation offloading and a few other closely related issues. We also investigate the ability of game theory and reinforcement learning to address these challenges and obtain efficient solutions.
Computational Resource Allocation: Being deployed at the edge, MEC servers have restricted computational resources, in particular compared with centralized mobile cloud computing. It is therefore imperative to allocate the limited resources efficiently. This includes, but is not limited to, MEC server activation and scheduling, load balancing, request management, and task allocation. Such problems can be addressed in particular by cooperative games, where a set of entities form coalitions to achieve a specific goal and then share the reward. Moreover, by combining reinforcement learning with game theory, the uncertainty and lack of prior information can be addressed.
Radio Resource Allocation: Enhancing the wireless network with MEC complicates radio resource management. For instance, uploading and downloading task-related data consumes radio bandwidth and causes interference. Consequently, smart bandwidth allocation needs to be performed for mobile devices and servers. Moreover, the energy consumption at the servers should be kept to a minimum. To increase energy efficiency, servers might share energy resources and/or harvest ambient energy. Such remedies, however, introduce uncertainty into the system, in contrast to deterministic power sources such as the grid. The problem can be addressed by using models from cooperative game theory and reinforcement learning.
Computation Offloading: While the allocation of computational resources is performed on the MEC servers’ side, mobile devices decide about computation offloading. In essence, each device decides which tasks, and which part of each task, should be offloaded to an edge server. In some cases, the device can also determine the specific server to which the task is uploaded. Moreover, mobile devices might be able to demand a specific quality of service guarantee. Naturally, mobile devices might compete with each other for limited computational services, whereas each server would compete with the others to increase its number of offloaded tasks. Moreover, a conflict arises between the set of servers and the set of devices, since the latter prefer low prices for services, whereas the former benefit from high service prices. All such scenarios can be modeled and solved by using competitive games and models from economic markets. As before, convergence to an efficient solution can be achieved by playing the game repeatedly and learning from the outcomes.
Information-Centric MEC: Inspired by the caching of popular files, in information-centric MEC the data and/or services can be stored at different edge servers to make computation more efficient. Using this concept, the amount of data that must be uploaded or downloaded can be reduced dramatically. Naturally, not all data/services can be cached at every server, and the users’ service demand might change over time. Thus, the problem to address is: how much, and which, data/services should be stored at each server? In addition, the servers should be motivated to cooperate with each other so that, if necessary, tasks/data/services can be exchanged among servers. Such problems can be addressed by using cooperative game theory, repeated auctions, and exchange economy.
Economics of MEC Server Virtualization: Mobile network operators (MNOs) or service providers (SPs) may lease MEC servers/resources from infrastructure providers (InPs). The InPs then need to virtualize their MEC resources among different MNOs/SPs. The economics of MEC resource virtualization can be modeled and analyzed using game-theoretic models. For example, in a scenario with multiple InPs and multiple MNOs/SPs, a multi-leader multi-follower Stackelberg game can be formulated to determine the equilibrium prices that the MNOs/SPs pay to the InPs. In a more general scenario, virtualization of MEC resources/servers can be combined with virtualization of other resources, including infrastructure (e.g., base stations), spectrum, and caching storage. Modeling and analysis of such a general virtualized network under users’ quality of experience (QoE) constraints is an interesting research problem.
Energy-Efficient Computation Offloading
Following the previous discussions, in this section we formulate an energy-efficient server activation problem. We then solve the formulated problem by using minority games in conjunction with reinforcement learning.
Consider a virtual pool of $N$ edge computational servers, gathered in a set $\mathcal{N}$. At consecutive rounds $t = 1, 2, \ldots$, the pool receives a fixed number of offloaded computational tasks to perform. Tasks are delay-sensitive with some execution deadline. At every time slot $t$, $n(t) \le N$ servers are active, and the offloaded computing tasks are equally divided among the active servers. On one hand, since each task requires a random time to be performed, the number of active servers should be large enough to guarantee an acceptable user experience. On the other hand, the initial activation of a server, as well as performing each task, requires some fixed amount of energy, while every active server is reimbursed for its performed tasks. Thus, the number of tasks per server should be large enough to ensure an acceptable revenue. Based on this trade-off, one can determine the required number of active servers at each offloading round so that (i) the system is energy-efficient; and (ii) the users’ quality of experience is satisfactory with high probability. We denote this threshold number by $\omega$ and take it as given in this article. An example calculation of $\omega$ can be found in .
In a distributed MEC system, prior to task arrival, every server independently decides whether to
accept computation jobs (active mode); or
not to accept any computation job (inactive mode).
That is, each server has two possible actions. Based on the discussion above, it is desirable that, at every offloading round, the number of active servers equals the threshold number. In what follows, we model this problem as a minority game and use a variety of learning algorithms to solve the game.
Modeling the Problem as a Minority Game
An MG can model the interaction among a large number of players competing for limited shared resources. In a basic MG, the players select between two alternatives, and the players belonging to the minority group win. The minority is typically defined using some cut-off value. The sum of the actions selected by all players is referred to as the attendance.
We model the formulated server mode selection problem as an MG, where the servers represent the players, with a cut-off value $\omega$ for the number of active servers. The game is repeated at consecutive rounds. The action of agent $i$ at time $t$ is denoted by $a_i(t)$. A server being active and inactive corresponds to $a_i(t) = 1$ and $a_i(t) = 0$, respectively. Thus, the number of active servers, $A(t) = \sum_{i=1}^{N} a_i(t)$ with $N$ the number of servers, is equivalent to the attendance. If $A(t) \le \omega$, active servers are winners, and each receives a unit reward. In contrast, $A(t) > \omega$ makes inactive servers the winners, yielding a unit reward for each of them. We use $\sigma$ to denote the standard deviation of the attendance value and define the volatility as $\sigma^2/N$. Note that the volatility is inversely related to the global efficiency (social welfare) of the MG, since smaller volatility implies a larger minority size, and thereby a larger number of satisfied agents. It should be mentioned that zero volatility is considered to correspond to the Nash equilibrium of the MG.
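The round structure and the volatility of the server-activation MG can be sketched in a few lines of Python. This is an illustration under stated assumptions: actions are encoded as 1 (active) and 0 (inactive), the attendance is the number of active servers, active servers win when the attendance does not exceed the cut-off `omega` (the tie rule is an assumption), and the standard deviation is taken over the attendance values of past rounds. The helper names are hypothetical.

```python
from statistics import pvariance


def server_round(actions, omega):
    """One round of the server-activation minority game (sketch).

    actions: list of 0/1 values, 1 = active and 0 = inactive, one per server.
    omega:   cut-off on the number of active servers.
    Returns (attendance, winning_action, rewards), with a unit reward per winner.
    """
    attendance = sum(actions)  # number of active servers
    # Assumption: active servers win when the attendance does not exceed omega.
    winning_action = 1 if attendance <= omega else 0
    rewards = [1 if a == winning_action else 0 for a in actions]
    return attendance, winning_action, rewards


def volatility(attendance_history, n_servers):
    """sigma^2 / N, with sigma the standard deviation of past attendance values."""
    return pvariance(attendance_history) / n_servers
```

A history in which the attendance never changes gives zero volatility, whereas large swings of the attendance from round to round increase it.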
Distributed Learning Algorithms
In an MG, the agents apply an algorithm to learn the best action to play in the next round. In the seminal studies of MG, a distributed learning algorithm is introduced in which each agent plays the MG with the help of a given set of strategies. Each strategy specifies an action to be played for every possible history string (the sequence of the most recent winning actions). The agents score their strategies by the accuracy of their predictions as the game evolves, and in each round use the strategy with the highest score. Apart from this seminal mechanism, a variety of learning algorithms are available in the MG literature that agents can use to learn the best action. Many of these algorithms fall into the category of reinforcement learning, where the learners balance the exploration-exploitation trade-off in order to maximize their utilities. In addition, learning methods based on stochastic strategies are available, where agents choose their actions with some probability. In what follows, we introduce some of these algorithms and their applicability in an MG setting.
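A minimal sketch of the seminal inductive mechanism follows. It assumes binary actions (1/0), strategies stored as random lookup tables indexed by the last `memory` winning actions, virtual scoring of every strategy in each round, and ties between equally scored strategies resolved in favor of the first one. All names and default parameter values are illustrative, not taken from the original studies.

```python
import random


class InductiveAgent:
    """Seminal MG agent (sketch): S fixed strategies over the last m winners."""

    def __init__(self, memory, n_strategies, rng):
        n_histories = 2 ** memory
        # Each strategy is a lookup table: history index -> action in {0, 1}.
        self.strategies = [[rng.randint(0, 1) for _ in range(n_histories)]
                           for _ in range(n_strategies)]
        self.scores = [0] * n_strategies

    def act(self, history):
        # Play the highest-scoring strategy; ties go to the first one.
        best = self.scores.index(max(self.scores))
        return self.strategies[best][history]

    def update(self, history, winning_action):
        # Virtual scoring: every strategy that would have predicted the
        # winning action gains one point, whether or not it was played.
        for s, table in enumerate(self.strategies):
            if table[history] == winning_action:
                self.scores[s] += 1


def next_history(history, winning_action, memory):
    """Shift the newest winning action into the m-bit history index."""
    return ((history << 1) | winning_action) & ((1 << memory) - 1)


def simulate(n_agents=101, memory=3, n_strategies=2, rounds=500, seed=1):
    """Run a basic MG with an odd number of agents; return the attendance."""
    rng = random.Random(seed)
    agents = [InductiveAgent(memory, n_strategies, rng) for _ in range(n_agents)]
    history = rng.randrange(2 ** memory)
    attendance = []
    for _ in range(rounds):
        actions = [agent.act(history) for agent in agents]
        ones = sum(actions)
        winner = 1 if ones < n_agents / 2 else 0  # the minority side wins
        for agent in agents:
            agent.update(history, winner)
        history = next_history(history, winner, memory)
        attendance.append(ones)
    return attendance
```

The history is kept as an integer whose bits are the most recent winning actions, so each strategy table needs exactly `2 ** memory` entries.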
Exponential Learning: In , exponential learning is applied in MG. Each agent is given $S$ strategies and scores each of them based on the accuracy of its prediction of the winning action. Each agent selects strategy $s$ with probability
$$\pi_s(t) = \frac{e^{\Gamma U_s(t)}}{\sum_{s'=1}^{S} e^{\Gamma U_{s'}(t)}},$$
where $U_s(t)$ is the score of strategy $s$ at time slot $t$, and $\Gamma > 0$ is the learning rate of each agent. Note that $\Gamma \to \infty$ corresponds to selecting the strategy with the highest score, which is the seminal MG learning algorithm.
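The selection rule and the scoring step of exponential learning can be sketched as follows. The function names are hypothetical; the score update (one point per correct prediction) is an assumed scoring scheme. Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large learning rates.

```python
import math
import random


def softmax_choice(scores, gamma, rng):
    """Pick a strategy index with probability exp(gamma*U_s) / sum exp(gamma*U_s').

    scores: current strategy scores U_s; gamma: learning rate (> 0).
    Returns the chosen index and the full probability vector.
    """
    top = max(scores)  # shift by the max score to avoid overflow
    weights = [math.exp(gamma * (u - top)) for u in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    choice = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    return choice, probs


def update_scores(scores, predictions, winning_action):
    """Add one point to every strategy whose prediction matched the winner."""
    return [u + (1 if pred == winning_action else 0)
            for u, pred in zip(scores, predictions)]
```

With a very large learning rate, the returned probabilities concentrate on the highest-scoring strategy, mirroring the limiting case of the seminal algorithm.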
Q-Learning: In , Q-learning is applied in an MG, where each agent keeps track of the Q-values of the two actions. Every agent $i$ updates the Q-value of the played action $a$ according to
$$Q_i(a) \leftarrow Q_i(a) + \alpha \left[u_i(a) - Q_i(a)\right],$$
where $u_i(a)$ is the utility received by agent $i$ as a result of action $a$, and $\alpha \in (0, 1]$ is the learning rate. This rule uses the utility information $u_i(a)$ possessed by the agents to learn the best action (i.e., exploitation of the available information). Q-learning can be applied to MG in two ways: (i) Q-values are determined for the two actions (we refer to this as Action-based Q-learning), and (ii) Q-values are determined for the agents’ strategies (we refer to this as Strategy-based Q-learning). In the second scenario, an agent keeps track of the Q-value of each of her strategies $s$ in the same way:
$$Q_i(s) \leftarrow Q_i(s) + \alpha \left[u_i(s) - Q_i(s)\right].$$
Given the Q-values and some $\varepsilon \in [0, 1]$, every agent selects the action with the highest Q-value with probability $1 - \varepsilon$, and with probability $\varepsilon$ selects an action uniformly at random (i.e., exploration).
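Action-based Q-learning with ε-greedy selection can be sketched as below. The class name, the zero initialization of the Q-values, and the update form `Q(a) += alpha * (u - Q(a))` are assumptions made for illustration; ties between equal Q-values are broken uniformly at random.

```python
import random


class QLearner:
    """Action-based Q-learning for a two-action MG (sketch).

    Update: Q(a) <- Q(a) + alpha * (u - Q(a)); selection: epsilon-greedy.
    """

    def __init__(self, alpha, epsilon, rng, n_actions=2):
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration probability
        self.rng = rng
        self.q = [0.0] * n_actions

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))  # explore
        best = max(self.q)
        # Exploit: choose among the actions with the highest Q-value.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, action, utility):
        self.q[action] += self.alpha * (utility - self.q[action])
```

In an MG round, each server would call `act()`, the winners would then be determined from the attendance, and each server would call `update()` with the unit reward it received (1 or 0).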
Adaptive Strategy: The authors of  developed an adaptive learning strategy for MG. Therein, for each action, each agent calculates a parameter called attractiveness, defined in terms of the action’s attitude and the fraction of rounds in which that action has won in a given history of the game; the attitude of each action is initially selected at random. The action with the highest attractiveness is chosen by the agent in the next round of play. As the game evolves, in each round an agent adapts her attitude values: if she selects an action and wins, the attitude of that action is increased by some constant, whereas if she selects an action and loses, its attitude is decreased by some constant.
Win-Stay Lose-Shift Strategy: In , this learning method is presented as a simple behavioral model for agents playing an MG. It is a stochastic strategy-based learning method. If an agent wins in the current round of the game, she selects the same action in the next round. In contrast, if the agent loses, she switches to the other action with some probability $p$. The authors analytically showed that for small enough values of $p$, the social welfare of the system (as measured by the volatility) approaches the optimal value. More precisely, for the MG with odd players and cutoff value, is chosen such that where .
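The following Python sketch simulates win-stay lose-shift agents in a basic MG with an odd number of players, assuming 0/1 actions and a random initial action profile; it returns the attendance in each round, from which the volatility can be estimated. The function name and arguments are illustrative.

```python
import random


def wsls_simulation(n_players, p, rounds, seed=0):
    """Win-stay lose-shift agents in a basic MG (sketch).

    n_players should be odd; actions are 0/1 and the minority action wins.
    Returns the attendance (number of players choosing 1) in each round.
    """
    rng = random.Random(seed)
    actions = [rng.randint(0, 1) for _ in range(n_players)]
    attendance = []
    for _ in range(rounds):
        ones = sum(actions)
        winner = 1 if ones < n_players / 2 else 0  # the minority action wins
        attendance.append(ones)
        # Winners stay; each loser shifts to the other action with probability p.
        actions = [a if a == winner or rng.random() >= p else 1 - a
                   for a in actions]
    return attendance
```

Setting `p = 0` freezes the initial action profile, while small positive values of `p` let a few losers per round move to the other side, gradually rebalancing the two groups.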
Roth-Erev Learning: This learning method is applied in MG in . Similar to Q-learning, an agent determines a weight for each of her actions $a$, denoted by $q_i(a)$ and referred to as the action weight. However, unlike in Q-learning, $q_i(a)$ is defined as the sum of the initial action weight and the discounted sum of all past utility values received for playing action $a$, where $\lambda$ is referred to as the discount factor. Agents use the following rule to update the action weights:
$$q_i(a) \leftarrow \lambda\, q_i(a) + u_i(a),$$
with $u_i(a) = 0$ for the action that was not played.
Given the values of $q_i(a)$, the selection probability of action $a$ is defined as $p_i(a) = q_i(a) / \sum_{a'} q_i(a')$.
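A sketch of the Roth-Erev learner follows, assuming that in each round every action weight is multiplied by the discount factor `lam` and the received utility is added to the weight of the played action, and that actions are chosen with probability proportional to their weights. The class name and the unit initial weight are hypothetical choices.

```python
import random


class RothErev:
    """Roth-Erev learner for a two-action MG (sketch, notation assumed).

    Weights: q(a) <- lam * q(a) + u(a), where u(a) is the received utility for
    the played action and 0 for the other one. Choice: p(a) = q(a) / sum(q).
    """

    def __init__(self, lam, rng, initial_weight=1.0, n_actions=2):
        self.lam = lam  # discount factor
        self.rng = rng
        self.q = [initial_weight] * n_actions

    def probabilities(self):
        total = sum(self.q)
        return [w / total for w in self.q]

    def act(self):
        # Sample an action with probability proportional to its weight.
        return self.rng.choices(range(len(self.q)), weights=self.q, k=1)[0]

    def update(self, action, utility):
        for a in range(len(self.q)):
            self.q[a] = self.lam * self.q[a] + (utility if a == action else 0.0)
```

Because past utilities are discounted, an action that stops winning gradually loses probability mass, which lets the agent track the changing behavior of the other servers.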
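Learning Automata: According to , learning automata can be applied as an MG learning mechanism by updating, after each round of play, the probability $p_i(a)$ of playing every action $a$. With two actions, if agent $i$ played action $a$ and won, then
$$p_i(a) \leftarrow p_i(a) + \alpha\,[1 - p_i(a)], \qquad p_i(a') \leftarrow (1 - \alpha)\, p_i(a'),$$
whereas if she lost, then
$$p_i(a) \leftarrow (1 - \beta)\, p_i(a), \qquad p_i(a') \leftarrow p_i(a') + \beta\,[1 - p_i(a')],$$
where $a'$ denotes the other action.
Here $\alpha$ and $\beta$ are known as the reward rate and penalty rate, respectively.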
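The two-action linear reward-penalty automaton can be sketched as follows. The update form, in which a win moves probability mass toward the played action at rate `alpha` and a loss moves it toward the other action at rate `beta`, is a standard choice assumed here; the class name is hypothetical.

```python
import random


class LinearRewardPenalty:
    """Two-action linear reward-penalty learning automaton (sketch)."""

    def __init__(self, alpha, beta, rng):
        self.alpha = alpha   # reward rate
        self.beta = beta     # penalty rate
        self.rng = rng
        self.p = [0.5, 0.5]  # probabilities of playing action 0 and action 1

    def act(self):
        return 0 if self.rng.random() < self.p[0] else 1

    def update(self, action, won):
        other = 1 - action
        if won:  # move probability mass toward the played action
            self.p[action] += self.alpha * (1.0 - self.p[action])
            self.p[other] *= 1.0 - self.alpha
        else:    # move probability mass toward the other action
            self.p[other] += self.beta * (1.0 - self.p[other])
            self.p[action] *= 1.0 - self.beta
```

Both branches keep the two probabilities summing to one, so no renormalization step is needed.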
Random selection: In a random selection scenario, agents simply select one of the two actions uniformly at random.
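Random selection also serves as a useful baseline. When every server independently picks active or inactive with probability 1/2, the number of active servers is binomially distributed with variance N/4, so the volatility (the attendance variance divided by the number of servers N) is about 0.25; the learning methods above aim to push the volatility below this level. The following sketch estimates it empirically; the function name is illustrative.

```python
import random
from statistics import pvariance


def random_selection_volatility(n_servers, rounds, seed=0):
    """Volatility sigma^2 / N when every server picks active/inactive at random."""
    rng = random.Random(seed)
    # Attendance = number of servers that chose to be active in each round.
    attendance = [sum(rng.randint(0, 1) for _ in range(n_servers))
                  for _ in range(rounds)]
    return pvariance(attendance) / n_servers
```

For a few thousand rounds the estimate typically lies close to 0.25, in line with the binomial variance argument.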
We choose and . Simulations are carried out for runs and in each run, the servers repeatedly execute the MG for offloading periods. We compare all aforementioned learning methods based on the social- and individual welfare of servers as well as users’ QoE measure. For different learning schemes, the parameters are selected as follows, using the best values as suggested in the literature:
Exponential learning: .
Q-learning: , .
Adaptive strategy: Initial attitude values , and , .
Win-stay lose-shift strategy: .
Roth-Erev learning: .
Learning automata: and .
Seminal MG: .
In Fig. 2, we show the variations in the volatility as a function of the parameter $2^m/N$, with $m$ being the memory size, i.e., the length of the historical data used by the agents for learning, and $N$ the number of servers. It can be seen that the exponential learning method achieves the best social welfare (inversely related to the volatility), with its lowest volatility approaching .
In addition to the social welfare of the system, we also investigate the performance of each learning method in terms of the individual welfare of the servers. To this end, Fig. 3 illustrates the average utility per server over the entire game. It can be concluded that, with an appropriate learning method, the servers can achieve a near-optimal average utility despite not having any prior information.
Learning methods such as exponential learning, the adaptive strategy, the win-stay lose-shift strategy, and Q-learning exhibit better performance than the seminal inductive learning method, since they help servers achieve better coordination and thus form larger minorities. This reduces the waste of computation server resources and hence improves resource allocation efficiency. Therefore, these methods can be recommended as more efficient learning rules for the formulated MG-based server selection problem.
Extension of the Model
Since MEC networks typically consist of a variety of edge nodes such as small base stations, macro base stations, wireless access points, etc., the edge servers are not homogeneous in practice. Therefore, heterogeneities in their computational capability, power and storage should be taken into account when developing efficient resource allocation mechanisms. To model such scenarios, games that incorporate different types of players could be applied. In addition, to ensure fairness among the servers, analyses using various equilibrium notions need to be carried out. Moreover, mathematical tools such as queuing theory and Markov decision processes can be used to more accurately model the randomness in the offloading system such as random arrival of computation tasks and the users’ status change.
We have outlined the major challenges that arise in MEC, focusing primarily on computation offloading. We have reviewed the state-of-the-art and studied the applicability of distributed solution approaches, such as game theory and reinforcement learning, for deriving efficient solutions to the identified challenges. Moreover, we have formulated the energy-efficient edge server activation problem in an MEC offloading system as a minority game and obtained some preliminary results by applying a number of reinforcement learning techniques. Extensions of the model that consider several practical aspects of efficient resource management for MEC servers have also been discussed.
-  D. Catteeuw and B. Manderick, Heterogeneous Populations of Learning Agents in the Minority Game. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 100–113.
-  S. Ranadheera, S. Maghsudi, and E. Hossain, “Minority games with applications to distributed decision making and control in wireless networks,” IEEE Wireless Communications, vol. PP, no. 99, pp. 2–10, 2017.
-  M. Deng, H. Tian, and X. Lyu, “Adaptive sequential offloading game for multi-cell mobile edge computing,” in 2016 23rd International Conference on Telecommunications (ICT), May 2016, pp. 1–5.
-  J. Zheng, Y. Cai, Y. Wu, and X. S. Shen, “Stochastic computation offloading game for mobile cloud computing,” in 2016 IEEE/CIC International Conference on Communications in China (ICCC), July 2016, pp. 1–6.
-  K. Zhang, Y. Mao, S. Leng, S. Maharjan, and Y. Zhang, “Optimal delay constrained offloading for vehicular edge computing networks,” in 2017 IEEE International Conference on Communications (ICC), May 2017, pp. 1–6.
-  C. Tekin and M. van der Schaar, “An experts learning approach to mobile service offloading,” in 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sept 2014, pp. 643–650.
-  S. O. Aliyu, F. Chen, Y. He, and H. Yang, “A game-theoretic based QoS-aware capacity management for real-time EdgeIoT applications,” in 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), July 2017, pp. 386–397.
-  J. Xu, L. Chen, and S. Ren, “Online learning for offloading and autoscaling in energy harvesting mobile edge computing,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 3, pp. 361–373, Sept 2017.
-  J. Xu, B. Palanisamy, H. Ludwig, and Q. Wang, “Zenith: Utility-aware resource allocation for edge computing,” in 2017 IEEE International Conference on Edge Computing (EDGE), June 2017, pp. 47–54.
-  Y. He, N. Zhao, and H. Yin, “Integrated networking, caching and computing for connected vehicles: A deep reinforcement learning approach,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2017.
-  S. Ranadheera, S. Maghsudi, and E. Hossain, “Computation offloading and activation of mobile edge computing servers: A minority game,” CoRR, vol. abs/1710.05499, 2017. [Online]. Available: http://arxiv.org/abs/1710.05499
-  D. Challet, M. Marsili, and Y. C. Zhang, Minority Games: Interacting Agents in Financial Markets. Oxford, UK: Oxford University Press, 2014.
-  M. Marsili, D. Challet, and R. Zecchina, “Exact solution of a modified El Farol’s bar problem: Efficiency and the role of market impact,” Physica A: Statistical Mechanics and its Applications, vol. 280, no. 3–4, pp. 522–553, 2000.
-  K.-m. Lam and H.-f. Leung, “An adaptive strategy for minority games,” in Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, ser. AAMAS ’07. New York, NY, USA: ACM, 2007, pp. 194:1–194:3.
-  G. Reents, R. Metzler, and W. Kinzel, “A stochastic strategy for the minority game,” Physica A: Statistical Mechanics and its Applications, vol. 299, no. 1, pp. 253–261, 2001.