A growing challenge for mobile computing is the proliferation of data/computation-intensive and delay-sensitive applications, such as cognitive assistance, real-time video/audio processing, and augmented reality (AR). On the one hand, running these applications completely within mobile devices may be infeasible due to the limited computation, storage, and battery capacity of such devices. On the other hand, offloading computation tasks of these applications to remote data centers may result in excessive end-to-end latency and hence poor user experience.
Such a dilemma has given rise to the popularity of mobile edge computing [1, 2]. In mobile edge computing, edge servers are deployed close to wireless base stations. These servers can host some popular services and process the corresponding computation tasks directly without having to forward them to remote data centers. Due to their close proximity to end users, edge servers are able to provide these services with much lower latency.
Despite the obvious advantage of mobile edge computing, there remain multiple important challenges that need to be addressed. First, edge servers can often host only a small number of services and process requests at a moderate rate due to their resource constraints. Second, mobile users generate requests for services in arbitrary, and typically time-varying, patterns. Without knowledge of future requests, an edge server needs to make decisions on which services to host (or cache) and whether to process a request locally at the edge. Third, it is typically time-consuming and expensive for an edge server to change the set of services it hosts, which would involve downloading all necessary data from a remote data center and setting up appropriate virtual machines or containers. The switching cost of changing cached services needs to be explicitly taken into account.
Most existing works only focus on one or two of the challenges above. Some studies assume that the edge server has infinite computation power, and therefore only address the caching problem. For example, Cao et al. [3] maximize the expected revenue of a service provider by optimizing the content caching strategy. Krolikowski et al. [4] aim to maximize the traffic offloading with minimum caching cost based on the distribution of requests. Bao et al. [5] propose a policy that jointly optimizes file caching probabilities and the request limit. Zhao et al. [6] study an online caching problem by explicitly taking the switching cost into account. Many other studies focus on request processing and offloading, but not service caching. For instance, Wang et al. [8] propose an algorithm that optimizes the edge cloud resource allocation. Li et al. [9] propose an optimization framework that minimizes energy consumption and response delay in mobile edge computing. Some other papers consider joint designs of service caching and request processing under the assumption that the request arrival patterns are either predictable or follow a certain stationary random process. For example, Ning et al. [10] and Xu et al. [11] propose online algorithms for optimizing service caching and request routing based on a Lyapunov optimization framework.
In this paper, we aim to address all three challenges by proposing online algorithms for (1) dynamic service caching, which determines the set of services to be hosted at the edge, and (2) service routing, which determines whether to process a request at the edge or route it to a data center. For the design and analysis of online algorithms, we propose an analytical model that jointly considers the unknown future requests, the storage capacity at the edge server, the load-dependent queueing and processing delay of the edge server, and the switching cost of modifying the set of cached services.
We note that there is a natural timescale separation between service caching and service routing, where the former is a much slower operation than the latter. Using this observation, we formulate a two-stage online optimization problem where the decision of service caching is updated periodically, while the decision of service routing can be changed in real time. The two-stage structure without future requests patterns differentiates our online optimization problem from most existing ones.
To solve this two-stage online optimization problem, we employ a fractional relaxation to turn the original problem into a convex one. We then propose a two-stage online policy. Our policy consists of two parts: The first part is a low-complexity algorithm that finds the optimal service routing in real time given the current service caching decision, and the second part is another low-complexity algorithm that updates the service caching decisions periodically by taking into account how service routing will react to future request arrivals. We theoretically prove that, even after taking the switching cost into account, our policy still achieves sublinear regret.
Furthermore, to make the fractional solution from our two-stage online policy implementable, we propose a randomized online algorithm. Under this randomized online algorithm, the probability that the edge server caches a service is the same as the fractional solution by the two-stage online policy. In addition, we prove that the switching cost of this randomized online algorithm is at most twice the switching cost of the two-stage online policy.
Our online algorithms are evaluated through simulations under various scenarios. We compare them against two other algorithms, including an offline algorithm that knows all the request arrivals in advance. Simulation results show that our randomized algorithm performs much better than the other online algorithm, and performs virtually the same as the offline algorithm.
The rest of the paper is organized as follows: Section II presents our system model and formulates the two-stage online problem. Section III gives an overview of our solutions with sublinear regret and addresses some crucial challenges. Section IV provides our detailed solutions for the online problem. Section V presents a randomized integral solution that ensures integer solutions for the service caching problem. Section VI shows our simulation results under a variety of scenarios. Finally, Section VII concludes the paper.
II System Model
II-A System Overview
We consider an edge system with a backhaul connection. This edge system includes multiple clients, an edge server and remote data centers. Clients generate requests for different services according to some unknown and unpredictable patterns, and then send these requests to the edge server. We use to denote the total number of different services. The edge server may cache some services and process some requests for these services locally, while forwarding the remaining requests to remote data centers. Requests processed at the edge encounter a processing latency due to the limited computation capacity of the edge server, while requests forwarded to remote data centers encounter a forwarding latency due to network latencies. The goal of the edge server is to determine which services to be cached and which requests to be processed at the edge so as to reduce the total latencies experienced by all requests. Fig. 1 illustrates the topology of the system.
There are two practical challenges for service caching that make this problem significantly different from traditional data caching. First, due to the limited computation capacity of the edge server, the processing latency increases as more requests are processed at the edge. As a result, the edge server may need to forward some requests for services it caches, especially when it is overloaded. Second, retrieving a new service can be a costly operation, which typically involves downloading code and databases and setting up virtual machines or containers. These challenges need to be explicitly addressed for the solutions to be practical.
II-B Service Caching and Processing
We assume that time is slotted and the system runs for time slots. Each time slot is denoted by . The duration of a time slot is chosen so that, in any given time slot, the patterns for the service requests (originating from the different clients) remain roughly the same.
Since the clients’ request patterns may differ across time slots, the edge server may change the services it caches in each time slot. As discussed earlier, changing cached services at the edge is a costly and slow process. Thus, we assume that the edge server can only change the services it caches at the beginning of each time slot.
Let be a binary decision variable that indicates whether the edge server will cache service at the beginning of the time slot , and let . To take into account the edge server’s limited storage capacity, we assume that the edge server can cache at most services, i.e.,
We call the problem of determining the service caching problem.
After the edge server determines at the beginning of time slot , it observes the requests from clients and calculates the request arrival rate of each service. We use to denote the request arrival rate of service in time slot , and let . We assume that an upper bound on the total arrival rate is known, that is,
During each time slot , the edge server needs to decide which requests to be processed locally. Due to the limited computation power of the edge server, it may not be desirable to process all requests for services that it caches. For a service , the edge server will process a fraction of the requests locally, and forward the remaining portion of the requests to the data center. Since the edge server can only process requests whose corresponding services have already been cached at the edge, we require that
Let . We call the problem of determining the service routing problem. Since the edge server can adjust service routing in real time, we consider that the edge server determines after it observes .
II-C Cost and Problem Formulation
The goal of the edge server is to minimize the total cost of the system, which consists of switching cost and latency cost.
The switching cost refers to the operation cost incurred when the edge server changes the set of services being cached at the beginning of each time slot. Since the total switching cost relates to the number of services the edge server changes, we assume that every cached service change incurs a cost of . Hence, the total switching cost over time slots is .
The latency cost refers to the total latency experienced by all requests. In the system, when a request is forwarded to the remote data center, it experiences a forwarding latency, which is denoted as for service . When a request is processed at the edge, it experiences a computation latency due to the limited computation power of the edge server. It is reasonable to assume that the per-request computation latency at the edge depends on the total computation load, and can be described by a convex, increasing, and differentiable function with and . The assumption indicates that the computation latency is smaller than the forwarding latency when the edge server is lightly loaded. Since the total computation load at the edge server is , the total latency of all requests can be written as
Under the above assumptions on the function , it can be verified that is a convex function on .
As a result, the total cost over all time slots can be written as . The edge server aims to find and that minimize the total cost. To make the optimization problem convex, we further relax the integer constraint on and allow to be any real number in . After the relaxation, the (offline) problem of minimizing the total cost can be written as follows
Here, note that a fractional can be interpreted as the probability that the edge server caches service in time slot . In Section V, we will propose a randomized algorithm such that the probability of caching in time slot is exactly .
While the above problem is a standard convex optimization problem, solving it requires the knowledge of all request arrival rates in every time slot, that is, all , in advance. In practice, however, the edge server needs to make service caching and routing decisions, which is and , without the knowledge of future arrival rates. Moreover, as noted before, the service caching problem and the service routing problem operate on different timescales. The edge server needs to decide at the beginning of each time slot , without any knowledge about arrival rate in this time slot. In contrast, as changing routing decisions is a fast operation and the request arrival rates remain the same within a time slot, the edge server can decide the value of
after observing the first few requests in a time slot to estimate the arrival rate.
Based on these observations, we formulate an online service caching and routing problem as shown below. An algorithm for this problem is called an online algorithm.
Online Service Caching and Routing Problem
The performance of an online algorithm is evaluated by comparing its cost against the cost of the optimal offline algorithm that knows all in advance. We assume that the optimal offline algorithm needs to choose a fixed solution for service caching, but can change its solution for service routing dynamically, as the latter is a fast operation. We formally define the optimal offline algorithm as follows:
Definition 1 (Optimal offline algorithm)
The optimal offline algorithm is the algorithm that, after knowing for all , chooses a non-negative vector with and , and non-negative vectors with , that minimize .
The goal of this paper is to find an online algorithm with provably small regret under any sequence of arrival rates. The regret is defined as the difference of cost under the online algorithm and that under the optimal offline algorithm, given a sequence of arrival rates . Specifically, let and be the solutions produced by an online algorithm, then the regret of this online algorithm is denoted by
III Solution Overview
In this section, we describe a framework for solving the online service caching and routing problem based on the well-known online gradient descent method. We also highlight challenges that need to be addressed before we can employ this framework.
We first consider the service routing problem, which entails finding after is fixed and is observed. Since and are known, finding the optimal is equivalent to solving the following standard convex optimization problem:
We first show that defined above is a convex function by using the following lemma; see e.g., [12, §3.2.5].
If a function is convex in and , and is a convex set, then the function is convex in , provided that for all .
is convex in .
Since is a convex function, the set is convex, and , we conclude that is convex based on Lemma 1.
As a result, the online service caching problem, which entails finding before is revealed, can be expressed as the following online convex optimization problem.
Online Service Caching at the Edge
There exist many online algorithms for solving the above online convex optimization problem. In this paper, we employ Online Gradient Descent with Lazy Projections for finding . Combining it with the solution of , we propose the following Online Gradient Descent with Routing (OGDR) algorithm. In the pseudocode, is the subgradient of the function , is the step size, and is an internal vector with .
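The structure of the lazy-projection update can be sketched as follows. This is a minimal sketch, not the paper's exact pseudocode: the function names are ours, and the projection and per-slot subgradient are supplied by the caller.

```python
import numpy as np

def ogdr_caching(N, T, eta, project, subgradient):
    """Online gradient descent with lazy projection (sketch of OGDR's
    caching update). `project` maps the internal vector onto the feasible
    set of caching decisions, and `subgradient` returns a subgradient of
    the slot-t cost at the current decision; both are caller-supplied."""
    z = np.zeros(N)                 # internal (unprojected) vector
    decisions = []
    for t in range(T):
        x = project(z)              # caching decision for slot t
        g = subgradient(t, x)       # computed after arrivals are observed
        z = z - eta * g             # lazy update: step on z, not on x
        decisions.append(x)
    return decisions
```

For instance, with a single service, projection onto [0, 1], and the illustrative per-slot cost (x - 0.5)^2, the iterates converge to 0.5.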
We now establish a regret upper bound for OGDR.
If is upper-bounded by for all , then, by choosing , the regret of OGDR is bounded by .
The proof of the regret upper bound is similar to the proofs in [13]. The main difference is that we also need to incorporate the switching cost.
Recall that is the optimal offline solution for the service caching problem. Let be the solutions produced by OGDR and be the solutions of (9). The regret is then defined by
We first obtain a bound for . By Corollaries 2.13 and 2.17 in [13] and the definition of , we have
Next, we obtain a bound on the switching cost. We have
Thus, choosing , we have:
Theorem 1 shows that OGDR can achieve a regret bound that is sublinear with . However, there are two major challenges that need to be addressed before one can implement OGDR:
Running OGDR requires solving the routing problem in (9)–(10) and finding , both of which depend on . Since these two terms cannot be calculated until the edge server observes the request arrival rates, we need low-complexity algorithms for calculating them. In particular, we note that , being the infimum of , might not have a closed-form expression, and therefore its gradient can be difficult to characterize. We also need to characterize an upper bound of .
Under OGDR, can be fractional. Of course, most practical systems cannot cache a partial service. A commonly used approach for obtaining integer solutions from fractional is to employ a randomized algorithm and cache a service with probability , independently of any prior events. However, such an approach would result in a switching cost of , which can be much larger than . We need randomized algorithms whose switching cost is close to .
We address these two challenges in the next two sections.
IV Low-Complexity Algorithms for OGDR
In this section, we develop a low-complexity algorithm for implementing OGDR.
IV-A Algorithm for Service Routing
Without loss of generality, we assume that . Recall that the service routing problem is
where the latency cost is given in (4) with partial derivatives computed as , where
In the following, we show that this routing problem can be solved efficiently by using its KKT conditions. Let and be the Lagrange multipliers associated with (15) and (16), respectively. Thus, the KKT conditions for (14) – (16) are
Using the property that for all , we propose Alg. 2 below to find , and that satisfy the KKT conditions, thereby solving the service routing problem.
For the algorithm above, we have the following results.
It is obvious that Alg. 2 satisfies conditions (17) – (19). Thus, we only need to show that it also satisfies the condition (20). In particular, we need to prove the following two claims: If , then . If and , then .
We first consider the case . Let be the last service that chooses . Since , we have and hence . By the design of Steps 1 – 9 in Alg. 2, we have when Steps 1 – 9 are completed. This proves the first claim.
Next, we consider the case and . By the design of Steps 1 – 9 in Alg. 2, we have by the end of the -th iteration of the for loop. Since can only be increased in later iterations and is an increasing function, we have when Steps 1 – 9 are completed. This proves the second claim.
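The greedy structure of Steps 1 – 9 can be illustrated by the following sketch of a KKT-style routing solver. The function names and the bisection inner loop are our own illustrative choices under the model's assumptions; the paper's Alg. 2 solves the same conditions exactly.

```python
import numpy as np

def route_requests(lam, x, d, marginal):
    """Greedy routing sketch: serve load at the edge in decreasing order
    of forwarding latency d[n], and stop adding load for a service once
    the marginal edge latency marginal(Y) (the derivative of the total
    edge latency at load Y, assumed increasing) reaches d[n], or the
    cached load x[n]*lam[n] is exhausted. Returns the edge-processed
    load b[n] for each service."""
    b = np.zeros(len(lam))
    Y = 0.0                                  # total load at the edge
    for n in np.argsort(-np.asarray(d)):     # costliest-to-forward first
        cap = x[n] * lam[n]                  # uncached load cannot be served
        if cap <= 0 or marginal(Y) >= d[n]:
            continue                         # forwarding is already cheaper
        lo, hi = Y, Y + cap                  # bisect for the stopping load
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if marginal(mid) < d[n] else (lo, mid)
        b[n], Y = lo - Y, lo
    return b
```

For example, with the M/M/1-style marginal mu/(mu - Y)^2, a single cached service with d = 0.5 and mu = 10 is served at the edge up to the load where mu/(mu - Y)^2 = 0.5, i.e., Y = 10 - sqrt(20).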
In addition to solving the service routing problem, Alg. 2 also produces .
Let be the solution given by Alg. 2. A subgradient of at is , where
Moreover, we have the following bound:
IV-B Regret and Complexity Analysis
Since we have , we can obtain the following regret bound.
It remains to determine the complexity of step 2 in Alg. 1, which entails finding the projection of a vector onto the set . To this end, for any vector , define as the projection of onto , i.e., . It then follows from [14, p. 150] that we can compute as
where is any positive root in the interval of the nonincreasing function
This can be done efficiently using a bisection method. Thus, it remains to bound . Note from line 4 of Alg. 1 and (21) that . Thus, given a desired root finding accuracy , the number of bisection steps is bounded by , where each step costs only .
Let us now show that using this approximate projection in Alg. 1 adds only a negligible error to our regret bound. Following the proof of Theorem 1 and considering approximation errors, the bounds in (11) and (12) are replaced by
Taking , where is an arbitrarily small constant, the regret bound then becomes . Note that the number of bisection steps is .
Thus, we conclude that the complexity of Alg. 1 with approximate projection is per time slot.
V Randomized Algorithm for Service Caching
The online algorithms for finding as proposed in Alg. 1 may produce fractional solutions. When is fractional, the value can be interpreted as the probability that the edge server caches service at time
. In this section, we propose a randomized algorithm that satisfies this probability interpretation while guaranteeing a provably small switching cost.
V-a Randomized Algorithm
The basic idea of our randomized algorithm is to simultaneously maintain sample paths, where each sample path represents a probability mass of . We then quantize each into a multiple of . Specifically, letting be the quantized version of , we require to be a non-negative integer and .
Let be the indicator function that service is cached at the edge at time in sample path . Let be the vector . In every time slot , our randomized algorithm receives from Alg. 1. The randomized algorithm then constructs based on and to ensure three properties: First, the probability of caching service is indeed , that is, . Second, the storage capacity constraint is satisfied for all sample paths, that is, . Third, the expected switching cost, which can be expressed as , is bounded. Let be the difference between and . Alg. 3 shows the complete randomized algorithm, including all decisions on service caching and routing.
V-B Performance Analysis
First, we consider the influence of Alg. 3 on the switching cost, which is shown below.
The expected switching cost at each time slot in Alg. 3 is at most .
As the switching cost is only incurred when we change , we aim to bound the number of changes in . Under Alg. 3, can be changed either in lines 8 – 12 or in lines 14 – 18. In lines 8 – 12, the total number of changes is . Moreover, every change in lines 8 – 12 can result in at most two changes in lines 14 – 18. Hence, the total number of changes in lines 14 – 18 is at most .
Thus, the maximum number of changes in Alg. 3 is over all sample paths. Since each sample path represents a probability mass of , the expected switching cost is at most .
Then, we analyze the complexity of Alg. 3. Since and , at most variables will be increased to 1 and at most variables will be decreased to 0 in Steps 7–13. This is a total of changes. To implement the while loop in Steps 14 – 18, we can first divide all sample paths into three groups: those with , those with , and those with . Then, Step 15 is a operation. Step 16 takes time. We note that each increase in Steps 7 – 13 will result in at most one iteration of the while loop in Steps 14 – 18. Hence, steps 14 – 18 will be executed at most times and the overall complexity of this while loop is .
Thus, the complexity of Alg. 3 is per time slot.
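The marginal-preserving, capacity-respecting assignment at the heart of Alg. 3 can be illustrated by the following simplified single-slot sketch. The circular-filling scheme is our own stand-in: it reproduces the quantized marginals and the per-path capacity bound, but not the temporal coupling that Alg. 3 uses to bound the switching cost.

```python
import numpy as np

def assign_sample_paths(x_frac, M, K):
    """Spread each service's quantized mass c[n] = round(M * x_frac[n])
    over M sample paths in circular order. When sum(x_frac) <= K, every
    path caches at most K services, and a uniformly chosen path caches
    service n with probability exactly c[n] / M."""
    c = np.rint(np.asarray(x_frac) * M).astype(int)
    paths = [set() for _ in range(M)]
    p = 0                                    # running circular offset
    for n, cn in enumerate(c):
        for j in range(cn):                  # cn <= M, so no duplicates
            paths[(p + j) % M].add(n)
        p += cn
    return paths
```

With consecutive circular filling, each path receives either the floor or the ceiling of the average load, so the per-path capacity bound follows from sum(c) <= K * M.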
VI Simulation Results
Stationary Offline Policy (SOP): This algorithm is based on the Iterative Caching updatE (ICE) algorithm proposed by Ma et al. [15]. This is an offline policy that has knowledge of all future arrival rates. In the context of this work, ICE is equivalent to a policy that caches the same services with the largest in all time slots. Since ICE does not consider the routing problem, we employ the optimal routing decisions for ICE.
Online Gradient Ascent (OGA): This is an algorithm proposed by Paschos et al. [16]. It uses online gradient ascent, setting the gradient to the vector in each time slot for the service caching problem. Since OGA does not consider the routing procedure, we apply our routing policy to this algorithm to obtain its best performance. OGA produces fractional , and its cost is based on the fractional solutions.
We model the computation latency at the edge server by assuming that the edge server operates like a queueing system with service rate . Thus, from [17], we have .
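Under this model, the per-request latency, the total edge latency, and its increasing derivative take the following form (a sketch, valid for load y strictly below the service rate mu; the function names are ours):

```python
def mm1_per_request_latency(y, mu):
    """Per-request M/M/1 delay C(y) = 1 / (mu - y), defined for y < mu."""
    return 1.0 / (mu - y)

def total_edge_latency(y, mu):
    """Total latency of edge-processed load, y * C(y) = y / (mu - y)."""
    return y / (mu - y)

def marginal_edge_latency(y, mu):
    """Derivative d/dy [y * C(y)] = mu / (mu - y)**2; it is increasing
    in y, so the total edge latency is convex on [0, mu)."""
    return mu / (mu - y) ** 2
```

The routing step compares this marginal against the forwarding latency of each service to decide how much load to keep at the edge.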
An important parameter of our online algorithms is the step size . While we have demonstrated a specific choice of that leads to sublinear regret, we note that this choice may be too conservative because it is based on the upper bound of and the total number of time slots . In our simulations, we use the time-average empirical value of to determine the step size. In addition, we change the term to . Effectively, this means we aim to achieve good performance over a time horizon of 50 slots. Specifically, letting , we choose the step size to be in time slot .
In addition, we select the forwarding latency as a uniform random variable between 0.01 and 0.1 seconds, set the accuracy parameter , and vary the values of , and .
Our simulations consist of two scenarios. The first scenario is based on a Google trace data set from [18], containing a sequence of requests for different services, which we treat as the trace of request arrivals. This data set includes more than three million requests for unique services within a seven-hour timespan. Since time in the data set is slotted into 300-second intervals, which is a large jump, we divide each interval into 60 parts with equal numbers of requests, following the original sequence. Thus, the duration of a time slot in our experiments is five seconds. In addition, the upper bound on the number of requests in each time slot is . The second scenario is based on a synthetic trace consisting of 500 services. Each service generates requests periodically, with different services having different periods.
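The re-slotting step can be sketched as follows (the function name is ours): each coarse 300-second bin's ordered request list is split into 60 consecutive five-second slots with equal request counts.

```python
def resplit_interval(requests, parts=60):
    """Divide one coarse interval's ordered request list into `parts`
    consecutive slots with equal (within one) numbers of requests,
    preserving the original arrival order."""
    n = len(requests)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    return [requests[bounds[i]:bounds[i + 1]] for i in range(parts)]
```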
The simulation results for the two scenarios are shown in Fig. 2 and Fig. 3, respectively. Several important observations can be made. First, our RSCR outperforms OGA significantly in all settings. While both RSCR and OGA are based on online gradient methods, RSCR is able to achieve better performance because it explicitly considers the processing latency at the edge server. This result shows that any online algorithm for edge computing needs to address both memory and computation power constraints of edge servers. Second, our RSCR outperforms SOP in Fig. 3 and has virtually the same performance as SOP in Fig. 2. This result is very surprising when one considers that SOP is an offline policy that knows all future request arrivals.
Finally, we note that RSCR and OGDR have very similar performance in all cases. OGDR produces fractional solutions for the service caching problem, and then RSCR transforms such fractional solutions into randomized solutions with integer solutions on every sample path. As discussed in Section V, by carefully choosing which services to host at the edge on every sample path, RSCR is able to incur a switching cost that is at most three times larger than the switching cost of OGDR. Our simulation results further show that the overall costs of RSCR and OGDR are almost identical in practical scenarios.
VII Conclusion
This paper studies the problem of service caching and routing without any knowledge about future requests. Motivated by a practical timescale separation, we formulate this problem as a two-stage online optimization problem that jointly considers the storage and computation constraints of the edge server, as well as the switching cost. We propose a low-complexity online algorithm for this problem that achieves sublinear regret bounds under a fractional relaxation. We further introduce a randomized algorithm that is guaranteed to produce integer solutions with provably small switching cost. Simulation results demonstrate that our RSCR and OGDR algorithms have similar or even better performance compared to other recently proposed policies.
- [1] S. Kitanov, E. Monteiro, and T. Janevski, “5G and the fog — survey of related technologies and research directions,” in 18th Mediterranean Electrotechnical Conference (MELECON), 2016, pp. 1–6.
- [2] M. T. Beck, M. Werner, S. Feld, and T. Schimper, “Mobile edge computing: A taxonomy,” in Proc. of the Sixth International Conference on Advances in Future Internet, 2014.
- [3] X. Cao, J. Zhang, and H. V. Poor, “An optimal auction mechanism for mobile edge caching,” in IEEE 38th ICDCS, 2018, pp. 388–399.
- [4] J. Krolikowski, A. Giovanidis, and M. D. Renzo, “Optimal cache leasing from a mobile network operator to a content provider,” in IEEE INFOCOM, 2018, pp. 2744–2752.
- [5] W. Bao, D. Yuan, K. Shi, W. Ju, and A. Y. Zomaya, “Ins and outs: Optimal caching and re-caching policies in mobile networks,” in Proc. of the Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2018, pp. 41–50.
- [6] T. Zhao, I.-H. Hou, S. Wang, and K. Chan, “Red/LeD: An asymptotically optimal and scalable online algorithm for service caching at the edge,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1857–1870, 2018.
- [7] I.-H. Hou, T. Zhao, S. Wang, and K. Chan, “Asymptotically optimal algorithm for online reconfiguration of edge-clouds,” in Proc. of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2016, pp. 291–300.
- [8] L. Wang, L. Jiao, J. Li, and M. Mühlhäuser, “Online resource allocation for arbitrary user mobility in distributed edge clouds,” in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 1281–1290.
- [9] Y. Li, Y. Chen, T. Lan, and G. Venkataramani, “MobiQoR: Pushing the envelope of mobile edge computing via quality-of-result optimization,” in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 1261–1270.
- [10] Z. Ning, K. Zhang, X. Wang, L. Guo, X. Hu, J. Huang, B. Hu, and R. Y. K. Kwok, “Intelligent edge computing in internet of vehicles: A joint computation offloading and caching solution,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–14, 2020.
- [11] J. Xu, L. Chen, and P. Zhou, “Joint service caching and task offloading for mobile edge computing in dense networks,” in IEEE INFOCOM, 2018, pp. 207–215.
- [12] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
- [13] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
- [14] A. Beck, First-Order Methods in Optimization. SIAM, 2017.
- [15] X. Ma, A. Zhou, S. Zhang, and S. Wang, “Cooperative service caching and workload scheduling in mobile edge computing,” in IEEE INFOCOM, 2020, pp. 2076–2085.
- [16] G. S. Paschos, A. Destounis, L. Vigneri, and G. Iosifidis, “Learning to cache with no regrets,” in IEEE INFOCOM, 2019, pp. 235–243.
- [17] M. U. Thomas, “Queueing systems. Volume 1: Theory (Leonard Kleinrock),” SIAM Review, vol. 18, no. 3, pp. 512–514, 1976.
- [18] J. L. Hellerstein, “Google cluster data,” Google Research Blog, Jan. 2010. [Online]. Available: https://github.com/google/cluster-data/blob/master/TraceVersion1.m