I Introduction
The proliferation of mobile devices and the advancement of the Internet of Things are promoting the emergence of resource-intensive and delay-sensitive mobile applications, such as object recognition, augmented reality, and mobile gaming. Mobile cloud computing proposes to offload these applications to central clouds; however, it suffers from uncontrolled wide area network delay and can hardly guarantee the quality of service of delay-sensitive applications [1, 2, 3]. Moreover, according to Cisco's prediction, the growth rate of mobile data that needs to be processed will far exceed the capacity of central clouds in 2021 [4]. Limiting the outsourcing traffic to central clouds has become a critical concern of network operators. Mobile edge computing has emerged as a promising solution to the above concerns [5, 6]. A typical form of mobile edge computing is to endow mobile base stations (also called edge nodes) with cloud-like functions by deploying storage and computation capacities in a distributed manner. By caching the services (including the program code and the related databases) of mobile applications at edge nodes, mobile edge computing can process the corresponding computation tasks at the network edge, which reduces both the service response time and the outsourcing traffic to central clouds.
Compared with mobile cloud computing, which has elastic resource capacity, the main limitation of mobile edge computing is the limited resource capacity of edge nodes. When there is no cooperation among edge nodes, the edge resources are prone to being underutilized for two reasons. First, the heterogeneity of edge resource capacities can cause resource underutilization. For an edge node that has insufficient storage capacity to cache a service or cannot provide sufficient computation capacity for an application, the corresponding computation tasks have to be outsourced to central clouds rather than to nearby powerful edge nodes, resulting in underutilization of edge resources [7]. Second, the mismatch between the storage and computation capacities of an edge node further aggravates the waste of edge resources. An edge node with large computation capacity cannot process a substantial number of computation tasks when it has insufficient storage capacity to cache the services, leading to underutilization of edge computation capacity. To fully utilize both the storage and computation capacities of edge nodes, it is crucial to explore the potential of cooperation among edge nodes.
In this paper, we consider cooperation among edge nodes and investigate cooperative service caching and workload scheduling in mobile edge computing. As shown in Fig. 1, nearby edge nodes are connected by a local area network (LAN) or wired peer-to-peer connections [8]. For an edge node that has not cached a service or does not provide sufficient computation capacity, the corresponding computation tasks can be offloaded to nearby underutilized edge nodes that have cached the service, or outsourced to the cloud. By exploiting the cooperation among edge nodes, the heterogeneous edge resource capacities can be fully utilized and the mismatch between the storage and computation capacities of individual edge nodes can be alleviated. Existing work that considers edge cooperation and jointly optimizes service caching and workload scheduling has sought to maximize the overall number of requests served at edge nodes while keeping the service caching cost within a budget [9, 10]. However, it is hard to determine the exact value of the budget in practical scenarios. Furthermore, although reduced delay is the main advantage of mobile edge computing, the service response time is not considered as a performance criterion in the existing work. In this paper, we investigate cooperative service caching and workload scheduling with the objective of minimizing the service response time as well as the outsourcing traffic (denoted as problem P1).
Solving this problem is challenging in three respects. First, service caching and workload scheduling are coupled. Service caching policies determine the decision space of workload scheduling, and in return, the workload scheduling results reflect the performance of the service caching policies. Solving problem P1 requires considering the interplay between the two subproblems. Second, minimizing the service response time requires a proper tradeoff between the computation delay and the transmission delay. While offloading computation tasks from overloaded edge nodes to nearby underutilized edge nodes helps reduce the computation delay, task offloading causes additional transmission delay on the LAN. Solving problem P1 optimally should therefore deal with the computation-communication tradeoff. Third, solving problem P1 needs to deal with the heterogeneity of edge resource capacities. Edge nodes are heterogeneous in both storage and computation capacities. Optimizing problem P1 needs to balance the workloads among the heterogeneous edge nodes, which causes exponential computation complexity. How to deal with edge heterogeneity and design algorithms with reduced computation complexity is challenging.
To deal with the challenge of subproblem coupling, we formulate problem P1 as a mixed integer nonlinear programming problem to jointly optimize service caching and workload scheduling. A two-layer Iterative Caching updatE (ICE) algorithm is designed to capture the interplay of the two subproblems, with the outer layer updating the edge caching policies iteratively based on Gibbs sampling (the service caching subproblem) and the inner layer optimizing the workload scheduling policies (the workload scheduling subproblem). To properly trade off the computation and communication delay, we use queuing models to analyze the delay in each part of the system and thereby compute the average service response time. A proper computation-communication tradeoff is achieved when the average service response time is minimized. To deal with the exponential complexity of workload scheduling caused by edge heterogeneity, we exploit the convexity of the workload scheduling subproblem and propose a heuristic workload scheduling algorithm with polynomial computation complexity based on the idea of water filling.
The contributions of this paper are summarized as follows:

We investigate cooperative service caching and workload scheduling in mobile edge computing, aiming at minimizing service response time and outsourcing traffic. We formulate this problem as a mixed integer nonlinear programming problem and show its non-polynomial complexity by analyzing simplified cases of this problem.

We use queuing models to analyze the delay in each part of the system, based on which the convexity of the workload scheduling subproblem is proved.

We propose the two-layer ICE algorithm to solve problem P1, with the outer layer updating the service caching policies iteratively based on Gibbs sampling and the inner layer optimizing the workload scheduling policies. By exploiting the convexity of the workload scheduling subproblem, we further propose a heuristic workload scheduling algorithm with reduced computation complexity based on the idea of water filling.

We conduct extensive simulations to evaluate the effectiveness and the convergence of the proposed algorithm.
This paper is organized as follows. Section II reviews the related work. Section III analyzes the system model and provides problem formulation. In Section IV, algorithm design is presented in detail, and Section V illustrates simulation results. Finally, the concluding remarks are given in Section VI.
II Related Work
Mobile edge computing has been envisioned as a promising computing paradigm with the benefits of reduced delay and lower outsourcing traffic. Due to the limited storage and computation capacities of edge nodes, properly placing services of mobile applications and scheduling computation tasks among edge nodes are crucial to optimize the quality of services with high resource efficiency. There has been extensive work devoted to workload scheduling, service caching, or joint service caching and workload scheduling.
Although mobile edge computing enables mobile users to access powerful resources within one-hop range [11], a lot of prior work has evolved to allow task offloading to edge nodes (or the remote cloud) more than one hop away and to solve the resulting workload scheduling problem. Online workload scheduling among edge-clouds has been studied in [12, 13] to accommodate dynamic requests in mobile edge computing. In [14], Tong et al. have developed a hierarchical architecture of edge cloud servers and optimized workload placement in this architecture. Cui et al. [15] have proposed software-defined control over request scheduling among cooperative mobile cloudlets. All the above work on workload scheduling shares the assumption that each edge node (also called an edge-cloud or cloudlet) has cached all the services and can process any type of computation task, which is impractical due to the limited storage capacities of edge nodes. Service caching among edge nodes should also be taken into consideration.
Caching services at edge nodes is an effective approach to relieving the burden on the backhaul network and the central clouds, and increasing efforts are devoted to edge service caching. Borst et al. [16] have presented popularity-based distributed caching algorithms for content distribution networks (CDNs). Dynamic edge service caching has been extensively studied in [17, 18, 19, 20]. Prediction-based content placement has been investigated in [17], and approximations of dynamic content allocation have been provided for the hybrid system of cloud-based storage and CDN. History-based dynamic edge caching has been proposed in [18] without predicting future requests or adopting stochastic models. In mobile edge computing, due to the limitation of both storage and computation capacities of edge nodes, service caching and workload scheduling should be jointly optimized to improve the system quality of service with high resource efficiency.
Joint optimization of edge caching and request routing for data-intensive applications (such as video streaming) has been studied in [21] to minimize the average access delay. Nevertheless, this work cannot be directly applied to applications that are both data-intensive and computation-intensive (such as augmented reality) and need to consider the computation delay at edge nodes. To address this issue, joint optimization of service caching and workload scheduling has been investigated in [7, 9, 10]. The work in [7] has jointly optimized service caching and task offloading without considering cooperation among edge nodes, which can lead to underutilization of heterogeneous edge resource capacities. The works in [9] and [10] have investigated joint service caching and request scheduling without taking the service response time (including the transmission delay and the computation delay) as the performance criterion, and thus cannot highlight the benefit of reduced delay in mobile edge computing. Different from the existing work, we study cooperative service caching and workload scheduling in mobile edge computing, aiming at minimizing the service response time and the outsourcing traffic to central clouds. We solve this problem by developing an iterative caching update algorithm based on Gibbs sampling and further proposing a heuristic workload scheduling algorithm with polynomial complexity based on the idea of water filling.
III System Model and Problem Formulation
III-A System Model
In this paper, we investigate cooperative service caching and workload scheduling in mobile edge computing. As shown in Fig. 1, nearby edge nodes are connected by local area network or wired peer-to-peer connection. For an edge node that is not caching a service or does not provide sufficient computation capacity, the corresponding computation tasks can be offloaded to nearby underutilized edge nodes that have cached the service or outsourced to the cloud. We consider a multi-edge system consisting of a set of edge nodes, each of which is equipped with the computation capacity () and storage capacity (). The system provides a library of services, such as mobile gaming, object recognition, video streaming, etc., which are differentiated by the computation and storage requirements. To process a type of mobile application at network edge, an edge node should provision certain storage capacity to cache the service of the application. Let be the required storage capacity to cache service . For each service , we consider that the computation requests of the corresponding computation task (in CPU cycles) follow exponential distribution with the expectation of , and the task arrival at each edge node is a Poisson process with the expected rate , which is a general assumption [7]. There is a centralized cloud with ample storage and computation capacity, thus the cloud stores all the services and the processing delay in the cloud is mainly caused by the transmission delay from edge nodes to the cloud.
III-A1 Edge Caching and Workload Scheduling Policies
Two questions should be answered in this study: 1) which edge nodes cache each type of service? and 2) how to schedule the computation workloads among the connected edge nodes that have cached the same services? We use two sets of variables to model the edge caching and workload scheduling results: indicates whether service is cached at edge node , and represents the workload ratio of service that is executed at edge node . We refer to the respective vectors as the edge caching and workload scheduling policies:
(1) 
Denote by the caching decision of edge node , and the action space of , i.e., . The services cached at each edge node cannot exceed the storage capacity, i.e.,
(2) 
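The storage constraint above defines the action space of each edge node. As a minimal sketch of that idea, assuming services are identified by list index and that a caching decision is simply the subset of services held by the node, the feasible decisions can be enumerated by brute force. The function name and the example numbers are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def feasible_caching_decisions(storage_requirements, storage_capacity):
    """Enumerate every subset of services whose total storage fits on one edge node.

    storage_requirements[s] is the storage needed to cache service s (hypothetical
    indexing); the result lists subsets as tuples of service indices.
    """
    services = range(len(storage_requirements))
    feasible = []
    for size in range(len(storage_requirements) + 1):
        for subset in combinations(services, size):
            if sum(storage_requirements[s] for s in subset) <= storage_capacity:
                feasible.append(subset)
    return feasible


# Three services of 20, 50 and 80 GB on a node with 100 GB of storage.
decisions = feasible_caching_decisions([20, 50, 80], storage_capacity=100)
```

The enumeration grows exponentially with the number of services, which is acceptable for a small service library but is one reason the caching decision is hard to optimize exhaustively.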
Let denote the workload ratio of service outsourced to the cloud. Then
(3) 
III-A2 Service Response Time
Denote by the set of nearby edge nodes that have direct connection with edge node , and the transmission delay on LAN to edge node . The computation workload executed at edge node should be no more than the overall arriving tasks of nearby edge nodes,
(4) 
where is the overall computation workload of service in the system, i.e., . Note that if , all the tasks are from edge node ; otherwise, the excess tasks () are from nearby edge nodes.
At each edge node, the computation capacity is shared by the cached services. Let the function represent the computation allocation mechanism at edge node , i.e., the computation capacity allocated to service is . For each service , as the computation requests of the corresponding computation task follow an exponential distribution, the service time at edge node also follows an exponential distribution with the expectation . Moreover, the task arrival of service at edge node is a Poisson process with the expectation . Thus for each service , the serving process of computation tasks at edge node can be modeled as an M/M/1 queue, and the computation delay is
(5) 
where . To ensure the stability of the queue, there should be
(6) 
By combining Eq. (4) and Eq. (6), is constrained as
(7) 
where .
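The M/M/1 model above can be illustrated numerically. The sketch below uses the standard M/M/1 result that the mean time a task spends in the system is 1 / (service rate - arrival rate), with the service rate taken as the allocated computation capacity divided by the mean computation request of a task. The function and parameter names are assumptions introduced for illustration, and the stability check mirrors the queue stability condition stated in the text.

```python
def mm1_mean_delay(arrival_rate, allocated_capacity, mean_cycles_per_task):
    """Mean sojourn time (waiting plus service) of an M/M/1 queue.

    arrival_rate: Poisson task arrival rate (tasks per second).
    allocated_capacity: computation capacity given to the service (cycles per second).
    mean_cycles_per_task: mean of the exponentially distributed task size (cycles).
    """
    service_rate = allocated_capacity / mean_cycles_per_task
    if arrival_rate >= service_rate:
        # The queue is unstable: the backlog would grow without bound.
        raise ValueError("arrival rate must be strictly below the service rate")
    return 1.0 / (service_rate - arrival_rate)


# 10 Giga cycles/s allocated, 0.5 Giga cycles per task, 15 tasks/s arriving.
delay = mm1_mean_delay(15.0, 10.0, 0.5)
```

In this example the service rate is 20 tasks/s, so the mean delay is 1 / (20 - 15) = 0.2 s; pushing the arrival rate toward 20 tasks/s makes the delay grow without bound, which is why the scheduled workload must stay strictly below the allocated capacity.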
When outsourcing tasks to the cloud, the processing time is mainly caused by the transmission delay in the core network. Similar to the task arrivals at edge nodes, the task arrival in the core network is also a Poisson process, with the expected rate . Let be the amount of transmission requests (e.g., input data) when outsourcing one unit of computation requests for service (in CPU cycles). Here, is a constant related to the specific service [8], [15]. Then for the service , the transmission requests of a corresponding task follow an exponential distribution with the expectation . The transmission time of a task in the core network also follows an exponential distribution with the expectation , where represents the core network bandwidth to transmit service . Hence, the transmission delay in the core network is given as
(8) 
where
(9) 
The average response time of service can be computed as a weighted sum of delay at each part of the system, including the computation delay at edge nodes, the transmission delay on LAN and the transmission delay to the cloud, i.e.,
(10) 
Here, represents the ratio of the workload offloaded to edge node from nearby edge nodes.
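As a rough sketch of the weighted sum described above, the function below combines the three delay components for a single service. It rests on several assumptions that are not confirmed by the extracted text: the workload offloaded to an edge node is taken as the part of its scheduled ratio that exceeds its own share of the arrivals, the computation delay follows the M/M/1 formula, and the cloud delay is passed in as a precomputed number rather than derived from the core network queue. All names are hypothetical.

```python
def average_response_time(edge_ratios, cloud_ratio, local_arrivals,
                          allocated_capacities, mean_cycles, lan_delays, cloud_delay):
    """Weighted-sum response time of one service (illustrative sketch).

    edge_ratios[n]: fraction of the service's total workload executed at node n.
    cloud_ratio: fraction outsourced to the cloud; all fractions must sum to 1.
    local_arrivals[n]: task arrival rate of the service at node n.
    allocated_capacities[n]: computation capacity node n gives to the service.
    lan_delays[n]: LAN transmission delay for tasks offloaded to node n.
    """
    if abs(sum(edge_ratios) + cloud_ratio - 1.0) > 1e-9:
        raise ValueError("workload ratios must sum to 1")
    total_arrivals = sum(local_arrivals)
    response = cloud_ratio * cloud_delay
    for ratio, local, capacity, lan in zip(edge_ratios, local_arrivals,
                                           allocated_capacities, lan_delays):
        if ratio == 0.0:
            continue
        workload = ratio * total_arrivals
        service_rate = capacity / mean_cycles
        if workload >= service_rate:
            raise ValueError("scheduled workload exceeds the allocated capacity")
        computation_delay = 1.0 / (service_rate - workload)   # M/M/1 sojourn time
        # Assumed: only the workload beyond the node's own arrivals crosses the LAN.
        offloaded_ratio = max(ratio - local / total_arrivals, 0.0)
        response += ratio * computation_delay + offloaded_ratio * lan
    return response


# Two nodes with arrivals 6 and 2 tasks/s, workload split evenly, nothing sent to the cloud.
t = average_response_time([0.5, 0.5], 0.0, [6.0, 2.0], [10.0, 10.0], 1.0, [0.1, 0.2], 1.0)
```

In the example, the second node receives a quarter of the total workload from its neighbor, so it alone contributes a LAN term; the first node processes fewer tasks than arrive locally and contributes only computation delay.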
III-B Problem Formulation
This paper jointly optimizes the edge service caching and workload scheduling policies, aiming at minimizing the service response time and the overall outsourcing traffic to the cloud:
(11)  
Here is a weight constant which is positively related to the transmitted data traffic when outsourcing tasks of service . Constraint C1 ensures the cached services at each edge node do not exceed the storage capacity. C2 is the combined result of Eq. (4) and Eq. (6), ensuring that each edge node only admits computation requests from nearby edge nodes, and the computation workload scheduled to each edge node does not exceed the computation capacity for each service.
III-C Complexity Analysis
Problem P1 is a mixed integer nonlinear programming problem. In this section, we show the non-polynomial computation complexity of P1 by analyzing two simplified cases: no cooperation among edge nodes, and a single type of service.
III-C1 Simplified Case 1: Non-cooperation among Edge Nodes
In the first case, we assume that there is no cooperation among edge nodes. With this assumption, the computation tasks of different services are either processed locally or directly outsourced to the cloud. Thus, the computation tasks outsourced to the cloud are not only decided by the edge computation capacity, but also highly dependent on the storage capacity of each individual edge node. In this scenario, problem P1 is reduced to the service caching and task outsourcing problem, similar to [7]. Specifically, workload scheduling among edge nodes in P1 is reduced to independent task outsourcing subproblems. Each edge node only needs to decide the outsourced computation requests ( which is given as ) according to its own service caching policy and the computation capacity limitation. It is indicated in [7] that the reduced service caching and task outsourcing problem remains challenging since it is still a mixed integer nonlinear programming problem and has a non-polynomial computation complexity.
III-C2 Simplified Case 2: Considering One Single Type of Service
In this simplified case, we assume that only one single type of service is considered in the system. Then, the caching result at each edge node can be simply determined by the relationship between the service storage requirement and the edge storage capacity: the service is cached at an edge node if the node has ample storage capacity; otherwise, the service is not cached at the edge node. With this assumption (i.e., the service caching policy is given), problem P1 is reduced to a workload scheduling problem, which schedules computation workloads among the edge nodes that have sufficient storage capacity to cache the service.
Solving the workload scheduling problem is challenging in two aspects. First, edge nodes are heterogeneous in both computation task arrivals and edge computation capacities. Balancing the workloads among the heterogeneous edge nodes is critical to minimizing the service response time and the outsourced traffic to the cloud, which, however, can cause exponential computation complexity when achieved in a centralized manner. Second, scheduling workloads among edge nodes should consider the computation-transmission tradeoff. Offloading computation tasks from overloaded edge nodes to nearby lightly loaded edge nodes or to the cloud helps reduce the computation delay, but it also causes additional transmission delay. Minimizing the service response time requires a proper tradeoff between the computation and transmission delay.
To summarize the above two simplified cases of problem P1, both the reduced service caching and task outsourcing problem and the reduced workload scheduling problem have non-polynomial computation complexity. Therefore, problem P1 also has non-polynomial computation complexity, and it is crucial to solve this problem with reduced computation complexity.
IV Algorithm Design
As clarified in the above section, even the simplified cases of problem P1 have non-polynomial computation complexity. This section presents the main idea of the algorithm design, which jointly optimizes the service caching and workload scheduling policies with reduced computation complexity. Specifically, we design a two-layer Iterative Caching updatE algorithm (ICE), with the outer layer updating service caching policies based on Gibbs sampling [22]. In the inner layer, the edge caching policies are given and problem P1 is reduced to the workload scheduling subproblem among the edge nodes that have cached a certain type of service (similar to Simplified Case 2). We demonstrate the exponential computation complexity of the reduced problem with convexity analysis and further propose a heuristic workload scheduling algorithm (Algorithm 2) with reduced computation complexity based on the idea of water filling.
IV-A Iterative Caching updatE Algorithm (ICE)
Gibbs sampling is a Markov chain Monte Carlo technique, which can deduce the joint distribution of several variables from conditional distribution samples. The main idea of Gibbs sampling is to simulate the conditional samples by sweeping through each variable while keeping the remaining variables unchanged in each iteration. Markov chain Monte Carlo theory guarantees that the stationary distribution deduced from Gibbs sampling is the target joint distribution [23]. In this work, we exploit the idea of Gibbs sampling to determine the optimal service caching policies iteratively, as shown in Algorithm 1. The key point of the algorithm is to associate the conditional probability distribution of edge caching policies with the objective of P1 (Step 7). Through properly designing the conditional probability in each iteration, the deduced stationary joint distribution can converge to the optimal edge caching policies with high probability.
The ICE algorithm works as follows. In each iteration, randomly select an edge node () and a feasible edge caching decision while keeping the caching decisions of the remaining edge nodes unchanged (Step 3). With the given caching policies of all the edge nodes, P1 is reduced to the workload scheduling subproblem:
(12)  
After solving P2, we can compute the optimal objective value (defined as ). Assume that when the selected edge node changes its caching decision from to , the optimal objective value varies from to . Associate the conditional probability distribution of edge caching policies with the objective value as follows: the selected edge node changes its caching decision from to with the probability () and keeps the current caching decision with (Step 7). Finally, the iteration ends when the stopping criterion is satisfied.
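The outer loop described above can be sketched as follows. This is not the paper's Algorithm 1: the switching probability is assumed here to be the logistic form 1 / (1 + exp((candidate - current) / omega)), a common choice in Gibbs-sampling-based schemes, because the exact expression did not survive in the text. The inner workload scheduling solver is represented by a caller-supplied function `solve_scheduling`, the stopping criterion is replaced by a fixed iteration count, and all names are hypothetical.

```python
import math
import random


def switch_probability(current_value, candidate_value, omega):
    """Assumed logistic rule: a lower candidate objective is accepted more often."""
    z = (candidate_value - current_value) / omega
    if z > 700.0:      # exp(z) would overflow; the probability is effectively 0
        return 0.0
    if z < -700.0:     # exp(z) underflows; the probability is effectively 1
        return 1.0
    return 1.0 / (1.0 + math.exp(z))


def ice_outer_loop(feasible_decisions, solve_scheduling, omega, iterations, seed=0):
    """Iteratively update one randomly chosen node's caching decision.

    feasible_decisions[n]: list of caching decisions node n may take.
    solve_scheduling(policy): returns the objective value for a caching policy
        (stands in for solving the inner workload scheduling subproblem).
    """
    rng = random.Random(seed)
    policy = [rng.choice(options) for options in feasible_decisions]
    value = solve_scheduling(policy)
    for _ in range(iterations):
        node = rng.randrange(len(policy))
        candidate = list(policy)
        candidate[node] = rng.choice(feasible_decisions[node])
        candidate_value = solve_scheduling(candidate)
        if rng.random() < switch_probability(value, candidate_value, omega):
            policy, value = candidate, candidate_value
    return policy, value


# Toy objective: each node has a fixed cost per decision and costs simply add up.
costs = [{"a": 3.0, "b": 1.0, "c": 2.0}, {"a": 2.0, "b": 5.0, "c": 4.0}]
toy_objective = lambda policy: sum(costs[n][d] for n, d in enumerate(policy))
best_policy, best_value = ice_outer_loop([["a", "b", "c"]] * 2, toy_objective,
                                         omega=1e-3, iterations=200)
```

With a very small `omega` the rule behaves almost greedily, accepting improvements and rejecting deteriorations; a larger `omega` lets the chain accept worse decisions more often, which helps it escape local optima at the cost of slower or noisier convergence.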
ICE has the following property.
Theorem 1.
ICE can converge to the globally optimal solution of problem P1 with a higher probability as decreases. When , the algorithm converges to the globally optimal solution with the probability of 1.
Proof.
Please refer to Appendix A. ∎
Remark: Theorem 1 demonstrates that in each iteration of the Gibbs sampling technique, through properly selecting in which associates the service caching update process with the objective value, the algorithm can converge to the optimal edge caching policy with high probability.
IV-B Heuristic Workload Scheduling Algorithm
When the edge caching policy is given, problem P2 should be solved to compute the optimal workload scheduling policy and the corresponding objective value. In this part, we first demonstrate the exponential complexity of P2 through theoretical analysis and further propose a heuristic workload scheduling algorithm by exploiting the convexity of the problem.
IV-B1 Computation Complexity of P2
Theorem 2.
Problem P2 is a convex optimization problem over the workload scheduling policy .
Proof.
Please refer to Appendix B. ∎
A convex optimization problem can be solved by searching for results satisfying the Karush-Kuhn-Tucker (KKT) conditions [24]. We first provide the KKT conditions of P2. When the caching policy is given, the computation resources allocated to each service are determined according to . Thus for one service, the workload scheduling policy among edge nodes that have cached the service is independent of the other services. Solving problem P2 is equivalent to optimizing the workload scheduling policy for each type of service. Take a service () as a representative. Define the Lagrange function as
(14)  
where and are Lagrange multipliers, and is the upper bound of the inequality constraints defined as () and .
Then the KKT conditions are given as
(15)  
Here, (C4), (C5) and (C6) arise from the inequality constraints of P2. For each inequality constraint, there are two possible results in Eq. (15): 1) (or ), indicating that the optimal results are at the extreme points derived from (C1); 2) , (or ), indicating the optimal results are at the boundary. As there are inequality constraints in problem P2 (i.e., the computation capacity constraints of edge nodes), directly searching for the results satisfying the KKT conditions can cause computation complexity . To reduce the computation complexity of P2, we propose the heuristic workload scheduling algorithm.
IV-B2 Algorithm Design
The main idea of the algorithm is to first remove the computation capacity constraints of edge nodes and the transmission bandwidth constraint of the core network (i.e., the inequality constraints in P2) to derive the correlation of workload scheduling results of edge nodes and the cloud. Then we search for the optimal results satisfying the KKT conditions within the resource constraints.
When removing the inequality constraints, the KKT conditions only keep (C1) and (C3) in (15), with (C1) changed to
(16) 
for each . However, is not partially differentiable with respect to when (), which is caused by in (10). We solve this problem by dividing into two cases, and , and () can be derived as
(17) 
and is given as
(18) 
After removing the inequality constraints, the workload scheduling policy is given as functions of (Eq. (17), (18)) to satisfy (C1) in the KKT conditions. To obtain the optimal solution of which satisfies the equality constraint and the inequality constraints in P2, we search for the workload scheduled to () based on the idea of water filling. As shown in Fig. 2, scheduling workloads to edge nodes (or the cloud) is similar to filling water into tubes. When the water level is above the upper bound or beneath the lower bound of a tube, the water in that tube cannot be increased or decreased any further. By combining Eq. (17), (18) with (C2), we have the following conclusion: Let , then is constant or monotone decreasing with (the proof is omitted). Thus, we can search for the optimal by the bisection method, with the details summarized in Algorithm 2.
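The water-filling idea with bisection can be illustrated on a deliberately simplified stand-in problem, not the paper's Algorithm 2: split a total load among parallel M/M/1 queues to minimize the sum of x_i / (mu_i - x_i), subject to per-queue upper bounds, ignoring LAN and cloud delays. Setting the derivative mu_i / (mu_i - x_i)^2 equal to a common "water level" gives x_i = mu_i - sqrt(mu_i / level), clipped to its bounds, and the total allocated load is monotone in the level, so the level can be found by bisection. All names and the objective are assumptions chosen for illustration.

```python
import math


def allocation_at_level(level, service_rates, upper_bounds):
    """Per-queue load at a given water level, clipped to [0, upper bound]."""
    return [min(max(mu - math.sqrt(mu / level), 0.0), ub)
            for mu, ub in zip(service_rates, upper_bounds)]


def water_filling(total_load, service_rates, upper_bounds, rel_tol=1e-12):
    """Bisection on the water level so that the clipped loads sum to total_load."""
    if any(ub >= mu for mu, ub in zip(service_rates, upper_bounds)):
        raise ValueError("each upper bound must stay below its service rate")
    if not 0.0 <= total_load <= sum(upper_bounds):
        raise ValueError("total load does not fit within the upper bounds")
    if total_load == 0.0:
        return [0.0] * len(service_rates)
    low, high = 0.0, 1.0
    # Raise the upper level until enough load is allocated (allocation grows with level).
    while sum(allocation_at_level(high, service_rates, upper_bounds)) < total_load:
        high *= 2.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if sum(allocation_at_level(mid, service_rates, upper_bounds)) < total_load:
            low = mid
        else:
            high = mid
        if high - low <= rel_tol * high:
            break
    return allocation_at_level(high, service_rates, upper_bounds)


equal_split = water_filling(8.0, [10.0, 10.0], [9.0, 9.0])
uneven_split = water_filling(11.0, [10.0, 4.0], [9.0, 3.0])
```

Each bisection step costs one pass over the queues, so the search is linear in the number of queues per step and logarithmic in the required precision, which is the kind of saving over enumerating all active-constraint combinations that motivates the heuristic.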
V Simulation Results
In this section, extensive simulations are conducted to evaluate our algorithms. We simulate a 100 m × 100 m area covered with 12 edge nodes, which serve a total of 8 services. The edge nodes have heterogeneous storage and computation capacities, both of which follow uniform distributions. The total arrival rates of computation tasks at different edge nodes () are uniformly distributed. At each edge node, the popularity of services follows Zipf's distribution, i.e., , where is the rank of service and is the skewness parameter [10]. Thus, the arrival rate of computation tasks of service at edge node can be computed as , where is the total arrival rate of computation tasks at edge node . The main parameters are listed in Table I.

Table I
Parameter                                 Value
Service storage requirement,              [20, 80] GB
Service computation requirement,          [0.1, 0.5] Giga CPU cycles/task
Edge node storage capacity,               [100, 200] GB
Edge node computation capacity,           [50, 100] Giga CPU cycles
Data transmission ratio of service,       [0.1, 1.0] Mb/GHz
Core network bandwidth for service,       160 Mbps
Skewness parameter,                       0.5
Smooth parameter,
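The per-service arrival rates used in the simulation setup can be sketched as below. The sketch assumes the usual normalized Zipf form, in which the popularity of the service with rank r is proportional to r raised to the negative skewness; the exact expression in the paper did not survive extraction, so this normalization and the function names are assumptions.

```python
def zipf_popularity(num_services, skewness):
    """Normalized Zipf popularity for ranks 1..num_services (rank 1 is most popular)."""
    weights = [rank ** (-skewness) for rank in range(1, num_services + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def service_arrival_rates(total_arrival_rate, num_services, skewness):
    """Split a node's total task arrival rate across services by popularity."""
    return [total_arrival_rate * p for p in zipf_popularity(num_services, skewness)]


# 8 services with skewness 0.5, as in Table I; the total rate of 40 tasks/s is illustrative.
popularity = zipf_popularity(8, 0.5)
rates = service_arrival_rates(40.0, 8, 0.5)
```

A skewness of 0 gives a uniform popularity, while larger values concentrate the arrivals on the top-ranked services.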
We compare the performance of ICE with two benchmark algorithms.
Non-cooperation algorithm [8]: Edge nodes cache services according to Gibbs sampling. At each edge node, the computation workloads of a service are either processed locally or outsourced to the cloud.

Greedy algorithm: Edge nodes cache services according to popularity. Popular services have higher priority to be cached at edge nodes. For the cached services, each edge node optimizes the workloads processed locally and outsourced to the cloud to minimize the edge processing delay and the outsourcing traffic.
V-A Performance Comparison
We compare the three algorithms in terms of the objective value, total service response time and outsourcing traffic by varying the average arrival rate of tasks at edge nodes (i.e., average ), and the results are shown in Fig. 3.
Compared with the Non-cooperation algorithm and the Greedy algorithm, our Cooperation algorithm always yields the minimum objective value and outsourcing traffic, and a close-to-minimum total service response time. In the Greedy algorithm, all the edge nodes cache the popular services with high priority, thus the computation tasks of less popular services have to be outsourced to the cloud. Moreover, the Greedy algorithm only relies on service popularity to determine the edge caching policy without considering the storage requirements of services. Caching multiple less popular services with low storage requirements at edge nodes can be more beneficial for fully utilizing both the computation and the storage capacities than caching one popular service with a large storage requirement. The Cooperation and Non-cooperation algorithms cache services based on Gibbs sampling, taking both the storage requirements of services and service popularity into consideration. Therefore, the Greedy algorithm generally induces more outsourcing traffic and a longer service response time than the other two algorithms. The Non-cooperation algorithm cannot fully utilize the computation capacities of edge nodes that have low storage capacity due to the absence of cooperation among edge nodes. In the Cooperation algorithm, both the storage and computation capacities of edge nodes can be coordinated and fully utilized through careful design of service caching and workload scheduling among the connected edge nodes.
V-B Convergence of ICE
According to the theoretical analysis in Theorem 1, the Gibbs sampling based service caching algorithm (Algorithm 1) can converge to the optimal service caching results with probability 1 when the smooth parameter is close to 0. This part illustrates the influence of on the convergence of ICE with the results shown in Fig. 4.
As shown in Fig. 4, the objective value can converge to near-optimal results when , and the convergence rate is faster as decreases. When , the objective value converges slowly to a higher value () or even fails to converge (). These results can be explained by Step 7 of ICE and Eq. (23). According to ICE, the smaller is , the more probable it is that the selected edge node updates to the better caching decision in each iteration. Thus, when is small, the objective value converges quickly (within fewer iterations). In addition, it can be concluded from Eq. (23) that the stationary probability of the optimal caching result increases with , and the probability when . Therefore, the smaller is , the more probable it is that ICE converges to the optimal caching result.
V-C The Impact of Edge Node Connectivity
This part analyzes the impact of edge node connectivity on the performance of ICE. As shown in Fig. 5, the system with all the edge nodes connected converges to the minimum objective value while the system with no edge nodes connected has the highest objective value. In the system with all the edge nodes connected, the benefits of cooperation can be achieve at system level through scheduling workloads among all the edge nodes. When edge nodes are partially connected, the cooperation benefits can only be explored within clusters (the edge nodes within a cluster are connected and different clusters are not connected with each other). Therefore, the higher extent that edge nodes are connected with each other, the more cooperation benefits can be achieved by ICE.
VI Conclusions
In this paper, we have investigated cooperative service caching and workload scheduling in mobile edge computing. Based on queuing analysis, we have formulated this problem as a mixed integer nonlinear programming problem, which is proved to have non-polynomial computational complexity. To deal with the challenges of subproblem coupling, the computation-communication tradeoff, and edge node heterogeneity, we have proposed ICE, which is based on Gibbs sampling, to achieve a near-optimal service caching policy in an iterative manner. We have further presented a water-filling-based workload scheduling algorithm, which has polynomial computational complexity. Extensive simulations have been conducted to evaluate the effectiveness and convergence of the proposed algorithm, and the impact of edge connectivity has been further analyzed.
Appendix A Proof of Theorem 1
Consider the caching decision space of the edge nodes. In each iteration, a randomly selected edge node randomly chooses a caching decision from this space. As Algorithm 1 iterates over the edge nodes and the caching decision space, the edge caching policy evolves as a multi-dimensional Markov chain, in which each dimension represents the caching decision of one edge node. For convenience of presentation, we analyze the scenario with 2 edge nodes and the corresponding 2-dimensional Markov chain. In each iteration, one randomly selected edge node virtually changes its current caching decision to a random caching decision from the caching decision space, so that
(19) 
where is the objective value when the caching policy is . In this scenario, . Denote by the stationary probability distribution of caching policy , then can be derived by the detailed balance condition of the Markov chain as
(20) 
Substituting (19) into (20) yields
(21) 
It can be observed that Eq. (21) is symmetric and can be balanced if has the form of , where is a constant. Let be the caching policy space. To ensure that , the stationary probability distribution should be given as
(22) 
Eq. (22) can be rewritten as
(23) 
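Consider the globally optimal solution, i.e., the caching policy whose objective value is no larger than that of any other caching policy. It follows from Eq. (23) that the stationary probability of this solution increases as the smooth parameter decreases, and that it approaches 1 as the smooth parameter approaches 0.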
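For reference, the limiting argument can be written out in the standard Gibbs form. The block below is a sketch in assumed notation and is not a reconstruction of Eqs. (22) and (23), whose exact form is not reproduced here: $f(c)$ denotes the objective value of caching policy $c$, $\omega > 0$ the smooth parameter, $\mathcal{C}$ the caching policy space, and $c^{*}$ the globally optimal policy. All of these symbols are assumptions.

```latex
% Assumed notation: f(c) objective value, \omega > 0 smooth parameter,
% \mathcal{C} caching policy space, c^{*} globally optimal caching policy.
\begin{align*}
  \pi(c) &= \frac{\exp\!\left(-f(c)/\omega\right)}
                 {\sum_{c' \in \mathcal{C}} \exp\!\left(-f(c')/\omega\right)}, \\
  \pi(c^{*}) &= \frac{1}{\sum_{c' \in \mathcal{C}}
                 \exp\!\left(\bigl(f(c^{*}) - f(c')\bigr)/\omega\right)} .
\end{align*}
% The second line follows from the first by dividing the numerator and the
% denominator by \exp(-f(c^{*})/\omega).
% Since f(c^{*}) \le f(c') for every c', each exponent is non-positive, so each
% term of the sum is non-decreasing in \omega; hence \pi(c^{*}) increases as
% \omega decreases. If c^{*} is the unique minimizer, every term with
% c' \ne c^{*} vanishes as \omega \to 0 and \pi(c^{*}) \to 1.
```

In this form the stationary probabilities sum to 1 by construction, and the dependence on the smooth parameter is explicit: a smaller value concentrates the probability mass on the policy with the lowest objective value.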
Appendix B Proof of Theorem 2
A convex optimization problem must have a convex objective function, convex inequality constraint functions, and affine equality constraint functions over the decision variables. It is easy to verify that the inequality and equality constraint functions satisfy these conditions, so we only need to prove the convexity of the objective function.
In Eq. (13), it is intuitive that and are convex over . Let . Denote by the Hessian matrix of , and (, ) can be given as
(24) 
In Eq. (24), if , and otherwise, . Therefore, is a positive definite matrix, and is convex over [24]. The objective function is the sum of several convex functions over , so is also convex over . Thus, we can conclude that problem P2 is a convex optimization problem over the workload scheduling policy .
References
 [1] E. Cuervo, A. Balasubramanian, D. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl, “Maui: making smartphones last longer with code offload,” in Proc. ACM International Conference on Mobile Systems, Applications, and Services (MobiSys’10), 2010, pp. 49–62.
 [2] B. G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “Clonecloud: elastic execution between mobile device and cloud,” in Proc. ACM European Conference on Computer Systems (EuroSys’11), 2011, pp. 301–314.
 [3] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
 [4] Cisco, “Cisco global cloud index: Forecast and methodology, 2016–2021,” White Paper, 2018.
 [5] ETSI, “Mobile edge computing (MEC); framework and reference architecture,” ETSI GS MEC 003 V1.1.1, 2016.
 [6] K. Ha, P. Pillai, W. Richter, Y. Abe, and M. Satyanarayanan, “Just-in-time provisioning for cyber foraging,” in Proc. ACM International Conference on Mobile Systems, Applications, and Services (MobiSys’13), 2013, pp. 153–166.
 [7] J. Xu, L. Chen, and P. Zhou, “Joint service caching and task offloading for mobile edge computing in dense networks,” in IEEE Conference on Computer Communications (INFOCOM’18), 2018, pp. 207–215.
 [8] L. Chen, S. Zhou, and J. Xu, “Computation peer offloading for energy-constrained mobile edge computing in small-cell networks,” IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1619–1632, 2018.
 [9] T. He, H. Khamfroush, S. Wang, T. La Porta, and S. Stein, “It’s hard to share: joint service placement and request scheduling in edge clouds with sharable and non-sharable resources,” in IEEE International Conference on Distributed Computing Systems (ICDCS’18), 2018, pp. 365–375.
 [10] V. Farhadi, F. Mehmeti, T. He, T. La Porta, H. Khamfroush, S. Wang, and K. S. Chan, “Service placement and request scheduling for data-intensive applications in edge clouds,” in IEEE Conference on Computer Communications (INFOCOM’19), 2019, pp. 1279–1287.
 [11] B. Liang, Mobile edge computing, V. W. S. Wong, R. Schober, D. W. K. Ng, and L.-C. Wang, Eds. Cambridge University Press, 2017.
 [12] L. Wang, L. Jiao, J. Li, and M. Mühlhäuser, “Online resource allocation for arbitrary user mobility in distributed edge clouds,” in IEEE International Conference on Distributed Computing Systems (ICDCS’17), 2017, pp. 1281–1290.
 [13] H. Tan, Z. Han, X.-Y. Li, and F. C. Lau, “Online job dispatching and scheduling in edge-clouds,” in IEEE Conference on Computer Communications (INFOCOM’17), 2017, pp. 1–9.
 [14] L. Tong, Y. Li, and W. Gao, “A hierarchical edge cloud architecture for mobile computing,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2016, pp. 1–9.
 [15] Y. Cui, J. Song, K. Ren, M. Li, Z. Li, Q. Ren, and Y. Zhang, “Software defined cooperative offloading for mobile cloudlets,” IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1746–1760, 2017.
 [16] S. Borst, V. Gupta, and A. Walid, “Distributed caching algorithms for content distribution networks,” in IEEE Conference on Computer Communications (INFOCOM’10), 2010, pp. 1–9.
 [17] G. Dán and N. Carlsson, “Dynamic content allocation for cloud-assisted service of periodic workloads,” in IEEE Conference on Computer Communications (INFOCOM’14), 2014, pp. 853–861.
 [18] I. Hou, T. Zhao, S. Wang, K. Chan et al., “Asymptotically optimal algorithm for online reconfiguration of edge-clouds,” in ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc’16), 2016, pp. 291–300.
 [19] S. Wang, R. Urgaonkar, T. He, K. Chan, M. Zafer, and K. K. Leung, “Dynamic service placement for mobile micro-clouds with predicted future costs,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 4, pp. 1002–1016, 2016.
 [20] Q. Zhang, Q. Zhu, M. F. Zhani, R. Boutaba, and J. L. Hellerstein, “Dynamic service placement in geographically distributed clouds,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 762–772, 2013.
 [21] M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, “On the complexity of optimal request routing and content caching in heterogeneous cache networks,” IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1635–1648, 2017.
 [22] S. M. Lynch, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Springer, 2007.
 [23] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996.
 [24] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.