The “pay-as-you-go” cloud computing model has played a significant role in data storage and computation offloading over the past decade. Recently, with the proliferation of smart devices and the development of the Internet of Things, many new computationally intensive applications, such as mobile gaming and AR/VR, have posed stringent quality of service (QoS) requirements that cloud computing is unable to meet. To solve these problems and alleviate traffic congestion on transport networks, mobile edge computing (MEC) (a.k.a. fog computing) has emerged as a new paradigm that provides cloud computing services in close proximity to end users [9, 27].
Different from the traditional cloud computing framework, where massive computing resources are placed in remote data centers, MEC deploys computing servers throughout the network. These servers are usually base stations (BSs), but can also be other dedicated devices with computing and storage resources. Offloading computation tasks to nearby BSs rather than to the cloud substantially reduces end-to-end latency and thus improves the quality of experience (QoE) of end users. Extra tasks exceeding the computing capacity of local BSs are further offloaded to the cloud, forming a hierarchical offloading structure among end users, BSs, and the cloud. Therefore, MEC is an extension of cloud computing rather than a substitute for it. In addition to low-latency computing services, densely deployed BSs also provide other benefits such as location awareness and mobility support. Thus, MEC is considered a promising approach to address the challenges posed by modern applications.
Although MEC is able to meet severe QoS requirements, one significant problem is that the computing resources available at the network edge are very limited compared with the data centers used in cloud computing. Recently, peer offloading [30, 5, 18, 14] has been proposed as an effective technique to handle bursty and spatially imbalanced arrivals of computation tasks. By exploiting cooperation among BSs, peer offloading allows overloaded BSs to forward part of their workload to their neighbors, thereby improving both the utilization of existing computing resources and the user experience.
Most existing studies of peer offloading are based on the fluid-flow model. They assume the workload of computation tasks is divisible and regard the task arrival process as a fluid flow with a certain rate. As a result, control algorithms based on the fluid-flow model only consider the expected arrival rate and ignore the variance of task arrivals. If the actual arrival process is bursty, the number of arrived tasks may be substantially larger than the average level within a short time interval. In this case, the performance of these algorithms degrades significantly and results in a large worst-case response time. We present a simple example in Section IV to further illustrate this argument. A similar discussion is given in prior work, which addresses the problem by incorporating task deadlines into decision-making. However, that algorithm is specially designed for computation-intensive tasks whose processing time ranges from minutes to hours or even days. Moreover, it serves tasks on a best-effort basis and does not ensure that all accepted tasks will be processed before their deadlines. Therefore, there is still no peer offloading algorithm that provides worst-case response time guarantees for real-time applications, which generally require the response time of tasks to be less than 100 milliseconds (ms) [28, 22].
In this paper, we formulate the peer offloading problem based on the stochastic arrival model. Control decisions are made for individual tasks instead of abstracted task flows. We deliver two efficient online algorithms that yield close-to-optimal performance while providing a worst-case response time bound. The main contributions of our work are summarized as follows.
(1) We formalize the peer offloading problem in MEC networks based on the stochastic arrival model. The objective is to maximize a utility function of the time-average throughput under a long-term energy consumption constraint and a worst-case response time requirement. Our algorithms can easily be extended to include other time-average constraints. To the best of our knowledge, this is the first work that provides worst-case response time guarantees for real-time applications.
(2) We present a simple yet efficient algorithm for the case where the expected arrival rate of computation tasks at each BS is known in advance. Theoretical analysis shows the algorithm is optimal in both system performance and response time.
(3) When the arrival rate is unknown, we develop an online algorithm based on Lyapunov optimization that requires no prior information. We show that the key subroutine of the algorithm is equivalent to the classical assignment problem, and thus can be solved in polynomial time (e.g., $O(N^3)$ with the Hungarian algorithm). Theoretical analysis of the algorithm presents an $[O(1/V), O(V)]$ tradeoff between system performance and the worst-case response time bound, where $V$ is a tunable parameter.
(4) We carry out extensive simulations with a real-world dataset to verify theoretical results and demonstrate that the proposed algorithm can produce close to optimal performance under strict worst-case response time constraint.
The rest of this paper is organized as follows. In Section II, we review related works in more detail. In Section III, we present the system model and formalize the problem. In Section IV, we propose an optimal algorithm when the arrival rate of computation tasks is known. In Section V, we develop an online algorithm based on Lyapunov optimization, and give related theoretical analysis. In Section VI, several techniques are proposed to improve the practicality of our algorithms. In Section VII, numerical results are presented to demonstrate the performance of our algorithm. Section VIII concludes the paper and shows open problems for future work.
II Related Works
The emerging MEC paradigm offers the possibility of supporting a large variety of new applications such as mobile gaming and AR/VR. One of the main research topics in MEC is the task offloading problem. The works in [11, 15, 7, 8, 6] take the position of end users and decide which tasks should be offloaded to nearby BSs in order to optimize objectives like latency (in the rest of this paper, we use “response time” and “latency” interchangeably) and energy consumption. In contrast, we take the point of view of BSs and study how cooperative BSs can handle their tasks collaboratively to provide the best user experience. Although collaborative computing is a common practice in geographical load balancing, originally proposed for data centers, the main concern there is reducing operational cost with respect to spatial diversities of workload patterns and electricity price differences across regions. In contrast, we care about system performance metrics like throughput and energy consumption in cooperative MEC. Additionally, while the cooperative task offloading problem in MEC is online in nature, the problem considered in geographical load balancing is usually offline. Therefore, techniques developed for geographical load balancing cannot be directly applied to MEC.
Recently, extensive research has been conducted on cooperation strategies between edge servers and on incentive mechanism design [29, 23, 26, 3, 32, 4, 12]. The works closest to ours are those that design control algorithms for peer offloading [30, 5, 18, 14]. One of them considers the users’ QoE and the BSs’ power efficiency in MEC networks; the authors observe a fundamental tradeoff between these two metrics and develop a distributed optimization framework to achieve it. Another presents a framework for online computation peer offloading; it theoretically characterizes the optimal peer offloading strategy and shows that the role of a computing server is determined by its pre-offloading marginal computation cost. A third considers distributed optimization for cost-effective offloading decisions. These three works aim to optimize the expected latency, while the remaining one discusses the necessity of considering the variability of response time. To enhance the satisfaction ratio, it incorporates task deadlines into decision-making. However, that algorithm is specially designed for computation-intensive tasks whose processing time ranges from minutes to hours or even days. In addition, it serves tasks in a best-effort way and does not offer any service-level guarantees. Although the works in [2, 19, 10, 13, 33] also adopt the stochastic arrival model and consider the worst-case latency of computation tasks in MEC networks, they either investigate the user-to-BS offloading problem or only study control policies for a single BS. Therefore, to the best of our knowledge, ours is the first work that presents peer offloading algorithms able to provide worst-case service guarantees for real-time applications, which generally require the response time to be less than 100 ms [28, 22].
III System Model
We consider a local MEC network with $N$ BSs, which operates in slotted time as illustrated in Fig. 1.
We first assume that all computation tasks have equal workload. Then, in Section VI-B, we show how to construct a general algorithm that handles tasks of varying workload from the strategies designed in Sections IV and V.
To react quickly to the arrival of tasks, the time scale of each slot considered in this paper is fairly short (e.g., 1-5 ms). Thus, in each time slot $t$, we assume at most one task may arrive at each BS $i$, and denote the arrival by $a_i(t) \in \{0, 1\}$. Arrived tasks may be blocked if BSs are overloaded, and accepted tasks can be either processed locally or offloaded to nearby BSs. For convenience of description, we temporarily assume all arrived tasks are accepted and allow BSs to drop accepted tasks. In Section VI-A, we present a method that converts drop decisions into block decisions, so that the refusal of service happens at the request stage and all accepted tasks are guaranteed to be served on time.
Let $b_i(t)$ and $d_i(t)$ be the numbers of tasks processed and dropped by BS $i$ in time slot $t$, and let $c_{ij}(t)$ be the number of tasks peer offloaded from BS $i$ to BS $j$. Like $a_i(t)$, we require that $b_i(t)$, $d_i(t)$, and $c_{ij}(t)$ are binary variables. We use $Q_i(t)$ to denote the number of tasks stored in the queue of BS $i$. The update process of $Q_i(t)$ is
$$Q_i(t+1) = \max\Big\{Q_i(t) - b_i(t) - d_i(t) - \sum_{j \neq i} c_{ij}(t),\, 0\Big\} + a_i(t) + \sum_{j \neq i} c_{ji}(t - \tau_{ji}), \qquad (1)$$
where $\tau_{ji}$ is the one-way trip time from BS $j$ to BS $i$. Thus $c_{ji}(t - \tau_{ji})$ is the number of peer-offloaded tasks leaving BS $j$ in slot $t - \tau_{ji}$ and arriving at BS $i$ in slot $t$.
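The queue dynamics described above can be sketched in code. This is only an illustrative sketch: the function and variable names are ours, and `history` is a simple buffer for in-flight offloaded tasks so that a task sent in slot $t - \tau_{ji}$ arrives in slot $t$.

```python
# Sketch of the per-slot queue update with offloading delays.
# Q[i]: backlog at BS i; a/b/d: arrival, serve, drop decisions;
# c[i][j]: tasks offloaded from BS i to BS j; tau[j][i]: trip time j -> i.
def update_queues(Q, a, b, d, c, history, t, tau, N):
    """Apply one slot of the queue update; returns the new backlog vector."""
    newQ = [0] * N
    for i in range(N):
        outgoing = b[i] + d[i] + sum(c[i][j] for j in range(N) if j != i)
        # Peer-offloaded tasks sent at slot t - tau[j][i] arrive at slot t.
        arrivals = a[i] + sum(
            history.get((t - tau[j][i], j, i), 0) for j in range(N) if j != i
        )
        newQ[i] = max(Q[i] - outgoing, 0) + arrivals
    # Record this slot's offloading decisions for future delivery.
    for i in range(N):
        for j in range(N):
            if i != j:
                history[(t, i, j)] = c[i][j]
    return newQ
```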
Our goal is to maximize a utility function of the throughput under constraints on the time-average energy consumption and the worst-case response time. Standing in the position of BSs, the response time of a task in this paper refers to the time from the moment the task is received by a BS to the moment the computation result of the task is transmitted back to the user. We omit the transmission time of the computation result, as its size is usually very small. Given the maximum latency $D$ allowed by users, we want to solve the following stochastic optimization problem with the extra requirement that all non-dropped tasks must be processed within $D$ time slots:
The two quantities above are the time-average expectations of the throughput and the energy consumption of BS $i$, respectively. Here, $\lambda_i$ is the expected task arrival rate of BS $i$, and $g_i(\cdot)$ is a concave function that represents the utility of BS $i$. Note that we have assumed a stationary $\lambda_i$ in order to simplify the presentation, but all algorithms and their performance analysis also hold when $\lambda_i$ is time-varying. $E_i$ is the upper bound on the time-average energy consumption. The energy consumption depends on the computation activity $b_i(t)$. Since $b_i(t)$ is binary, we use $e_i^{a}$ to denote the active energy consumption when $b_i(t) = 1$ and $e_i^{s}$ to denote the static energy consumption when $b_i(t) = 0$. Then the per-slot energy consumption is $e_i^{a} b_i(t) + e_i^{s} (1 - b_i(t))$, so the energy consumption constraint (2) actually requires that the time-average service level $\bar{b}_i$ satisfies $\bar{b}_i \le (E_i - e_i^{s})/(e_i^{a} - e_i^{s})$.
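The service-level bound implied by the energy constraint can be checked numerically. The parameter names and values below (`e_active`, `e_static`, `E_max`) are illustrative placeholders, not taken from the paper:

```python
# Numerical check of the service-level bound implied by the energy
# constraint (all values are hypothetical, chosen for illustration).
e_active, e_static, E_max = 5.0, 1.0, 4.0  # Joules per slot

# Per-slot energy: e(t) = e_active * b(t) + e_static * (1 - b(t)),
# so the time average is e_static + (e_active - e_static) * b_bar.
# Requiring the average to stay below E_max gives:
b_bar_max = (E_max - e_static) / (e_active - e_static)
print(b_bar_max)  # the BS may be active in at most this fraction of slots
```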
The difficulty of solving the problem comes not only from the uncertainty of future task arrivals, but also from the coupling of decision variables along the timeline. From (1) we can see that the state of $Q_i(t)$ depends on past peer offloading decisions $c_{ji}(t - \tau_{ji})$. To avoid this problem, we consider a relaxed problem in which we set $\tau_{ji} = 0$ for every pair of BSs. Then, the update of $Q_i(t)$ becomes
$$Q_i(t+1) = \max\Big\{Q_i(t) - b_i(t) - d_i(t) - \sum_{j \neq i} c_{ij}(t),\, 0\Big\} + a_i(t) + \sum_{j \neq i} c_{ji}(t). \qquad (4)$$
The following theorem shows that algorithms for the original problem can be constructed from algorithms for the relaxed problem.
If there is an algorithm for the relaxed problem that achieves objective function value $U$ with worst-case response time $D'$, then we can design an algorithm for the original problem that achieves $U$ with worst-case response time $D' + 2\tau_{\max}$, where $\tau_{\max} = \max_{i,j} \tau_{ij}$.
To better describe the state change of $Q_i(t)$, we rewrite (4) without the max operator as
$$Q_i(t+1) = Q_i(t) - \tilde{b}_i(t) - \tilde{d}_i(t) - \sum_{j \neq i} \tilde{c}_{ij}(t) + a_i(t) + \sum_{j \neq i} \tilde{c}_{ji}(t), \qquad (5)$$
where $\tilde{d}_i(t)$, $\tilde{b}_i(t)$, and $\tilde{c}_{ij}(t)$ are the actual numbers of tasks being dropped, being processed locally, and being peer offloaded, respectively. For example, suppose there is only one task in $Q_i(t)$ but $b_i(t) = 1$ and $c_{ij}(t) = 1$ simultaneously. Since we cannot both offload and process this task, one of the above control decisions must fail in execution. Thus, we have either $\tilde{b}_i(t) = 0$ or $\tilde{c}_{ij}(t) = 0$. One can prove that the time averages of the control decisions and of the actual execution results are equal. The introduction of these notations is purely for the simplification of this proof.
Since $\tau_{ij} = 0$ in the relaxed problem, the transmission of tasks is completed instantly. So there is no need to transmit tasks in advance, and we can require that tasks be offloaded only when they will be served by other BSs in the next slot. Then, every task is peer offloaded at most once. Let $\pi'$ and $\pi$ be the decision variables of the relaxed problem and the original problem, respectively. For a given $\pi'$, let $\pi$ apply each decision of $\pi'$ with a delay of $\tau_{\max}$ slots. It is easy to check that $\pi$ is feasible for the original problem. Next we focus on the performance of $\pi$.
Since tasks can be peer offloaded at most once and the actual transmission time never exceeds $\tau_{\max}$ slots, a task served by some BS in slot $t$ under $\pi'$ is also available at that BS in slot $t + \tau_{\max}$ under $\pi$. This means tasks served in slot $t$ by $\pi'$ will be served in slot $t + \tau_{\max}$ by $\pi$. Thus the throughput, as well as the objective value, of $\pi$ is the same as that of $\pi'$. Note that the computing result has to be transmitted back to the original BS, which costs no more than $\tau_{\max}$ slots. Therefore, the worst-case response time of $\pi$ is $D' + 2\tau_{\max}$.
Theorem 1 enables us to focus on algorithm design for the relaxed problem, which is much easier because the update of $Q_i(t)$ no longer depends on past decision variables. In the next two sections, we design two online algorithms for the relaxed problem, for the cases with and without prior information on the task arrival rate.
IV Algorithm Under Known Arrival Rate
In this section, we assume the task arrival rates $\lambda_i$ are known. We consider the following optimization problem:
where the throughput variables $r_i$ and the service-level variables $b_i$ are free real variables. Let $(r^*, b^*)$ be the optimal solution of this problem and $U^*$ the corresponding optimal value. The following theorem shows that $U^*$ is an upper bound on the system performance.
No algorithm for the relaxed problem can achieve an objective value greater than $U^*$.
Suppose there is an algorithm with objective value $U > U^*$. Let $\bar{r}_i$ and $\bar{b}_i$ be the time-average throughput and service level of this algorithm. The definition of throughput in (3) implies $\bar{r}_i$ satisfies (8), and constraint (2) implies $\bar{b}_i$ satisfies (7). Summing (5) over $t = 0, \dots, T-1$ results in
Taking the expectation, dividing by $T$, and letting $T \to \infty$, the left-hand side becomes $\lim_{T \to \infty} \mathbb{E}[Q_i(T) - Q_i(0)]/T$, which equals zero because the queue lengths are bounded. Then (10) implies (9) by substituting (3) into the right-hand side of (10). Therefore, $\bar{r}_i$ and $\bar{b}_i$ are feasible variables with objective value $U > U^*$, contradicting the assumption that $U^*$ is the optimal value.
From the above proof we can see that $r^*$ and $b^*$ are the time averages of the optimal control decisions. Suppose the task arrival processes of different BSs are independent; we will show there is an algorithm that achieves $U^*$ and serves all tasks within one slot. The intuition behind the algorithm is illustrated by the following example. Consider a two-BS MEC network, with BSs denoted $B_1$ and $B_2$, whose arrival rates and energy consumption constraints require part of the load of $B_1$ to be shifted to $B_2$. If we peer offload the task arriving at $B_1$ to $B_2$ with a fixed probability chosen so that the time-average numbers of tasks served by $B_1$ and $B_2$ satisfy the energy consumption constraint, we obtain the kind of strategy that is based only on the expected task arrival rates and is usually produced by algorithms adopting the fluid-flow model. We now show that although this strategy achieves optimal throughput, the induced response time may be very large. Assume in some slot both BSs receive a task and we offload one task from $B_1$ to $B_2$. Since the task arrival processes of different BSs are independent, such an event happens with some constant positive probability $p$. Because two tasks enter $B_2$ in that slot and each BS can process only one task per slot, one of the two tasks has to wait one extra slot. If the same event happens again in the next time slot, one of the four tasks that entered $B_2$ over these two slots has to wait two slots. Generally, for any finite integer $k$, there is a probability of at least $p^k$ that the response time of some task exceeds $k$ slots.
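The failure mode in this example can be reproduced with a small Monte-Carlo sketch. The arrival rates `lam1`, `lam2` and the offloading probability `q` below are hypothetical placeholders (the example's concrete numbers are not preserved here); `worst` records the largest post-service backlog as a proxy for waiting time.

```python
import random

# Compare rate-based offloading (ignores actual arrivals) with an
# arrival-aware rule that offloads only when the target BS has no arrival.
random.seed(0)
lam1, lam2, q, T = 0.9, 0.4, 0.5, 20_000  # illustrative parameters

def simulate(arrival_aware):
    Q = [0, 0]   # backlogs at B1, B2
    worst = 0    # largest backlog ever observed after serving
    for _ in range(T):
        a1 = random.random() < lam1
        a2 = random.random() < lam2
        # Rate-based: offload with prob q whenever B1 has a task.
        # Arrival-aware: additionally require that B2 has no own arrival.
        offload = a1 and random.random() < q and (not arrival_aware or not a2)
        Q[0] += int(a1 and not offload)
        Q[1] += int(a2) + int(offload)
        Q = [max(x - 1, 0) for x in Q]   # each BS serves one task per slot
        worst = max(worst, max(Q))
    return worst

print("rate-based worst backlog:", simulate(False))
print("arrival-aware worst backlog:", simulate(True))
```

Under the arrival-aware rule each BS receives at most one task per slot, so the backlog never builds up; the rate-based rule accumulates backlog whenever both BSs receive tasks simultaneously.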
The problem with the above strategy is that the control decisions depend only on the expected arrival rate and disregard the actual task arrivals in each time slot. As shown in the example, when the actual arrivals differ from the expectation over a sequence of time slots, a large response time is inevitable. In contrast, if we offload tasks of $B_1$ only when $B_2$ has no arrival of its own, then each BS is assigned at most one task in every slot, and thus all newly arrived tasks can be served within one slot. In our example, we first list the probabilities of all arrival events.
Our strategy offloads an arrived task from $B_1$ to $B_2$ with an appropriately chosen probability, and only when $B_1$ has an arrival and $B_2$ does not. Then, under all situations, at most one task enters the waiting queue of each BS, so all tasks can be served in the next slot. One can verify that the resulting time-average service rates of $B_1$ and $B_2$ equal their optimal values. So in this case both the throughput and the response time are optimal. Now we extend this method to the general case.
Let $B_1, \dots, B_N$ denote the BSs. Our goal is to compute how many tasks should be served by each BS given the actual arrival vector $a(t)$. We first decide how many tasks should be dropped so that the expected throughput equals $r^*$. In every slot $t$, observe $a(t)$, then choose the value of the drop decisions according to the following rule:
We use $m_i(t)$ to denote the number of tasks accepted by BS $i$, where $m_i(t) = a_i(t) - d_i(t)$. It can be easily confirmed that for every $i$ and $t$, $m_i(t)$ is a random variable with expectation $r_i^*$.
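One drop rule consistent with this description is Bernoulli thinning: keep an arrived task with probability $r_i^*/\lambda_i$, so that the expected number of accepted tasks per slot equals $r_i^*$. A sketch, with illustrative values for `lam` and `r_star`:

```python
import random

# Bernoulli thinning: drop an arrived task with probability 1 - r_star/lam,
# so that the expected per-slot throughput equals r_star.
lam, r_star = 0.8, 0.6          # illustrative arrival rate and target throughput
p_drop = 1.0 - r_star / lam     # keep an arrival with probability r_star/lam

random.seed(1)
T = 200_000
accepted = 0
for _ in range(T):
    if random.random() < lam:            # a task arrives in this slot
        if random.random() >= p_drop:    # keep it with probability r_star/lam
            accepted += 1
print(accepted / T)  # ~0.6
```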
Next we develop a peer offloading strategy that makes the time-average number of tasks processed by each BS equal $b^*$. The whole algorithm consists of $N$ steps. In each step, we make offloading decisions based on the outcome of the previous step. We use a vector $x^k$ to denote both the output of the $k$-th step and the input of the $(k+1)$-th step. The component $x^k_i$ is the number of tasks assigned to BS $i$ by the end of step $k$. The input of the first step is $x^0 = m(t)$. Define an operation that swaps the $i$-th and $j$-th components of any vector.
For ease of statement, when the expectation of a variable is invariant over time, its time index is omitted; for example, we write $\mathbb{E}[m_i]$ instead of $\mathbb{E}[m_i(t)]$. Now we explain the $k$-th step of our algorithm in detail. The overall procedure is summarized in Algorithm 1.
(1) If the expected number of tasks assigned to $B_k$ already equals $b_k^*$, keep the assignment unchanged and skip to the next step.
(2) Else, if the expected number of tasks assigned to $B_k$ is lower than $B_k$'s optimal time-average service rate $b_k^*$, we should assign more tasks to $B_k$ by offloading from other BSs. Find the smallest index such that
The left-hand side is the probability that at least one task arrives at these BSs. Our strategy offloads tasks arriving at these BSs to $B_k$ so that the time-average number of tasks assigned to $B_k$ equals $b_k^*$. Specifically, in every time slot $t$, observe the arrivals. If none of these BSs has an arrived task, then no peer offloading is performed. Else, find the smallest index among them with an arrived task. If that index is strictly smaller than the threshold found above, offload the task to $B_k$; if it equals the threshold, offload the task to $B_k$ with probability
(3) Else, the expected number of tasks assigned to $B_k$ must exceed $b_k^*$, and we should offload tasks of $B_k$ to other BSs. Similarly, find the smallest index such that
The left-hand side is the probability that there is a newly arrived task at every one of these BSs. When the corresponding arrival event occurs, let the target be the BS with the least index satisfying the condition, and offload the task of $B_k$ to it with probability
Likewise, this value must be non-negative. In this case
Otherwise, no task of $B_k$ is offloaded and the assignment is unchanged.
Starting from the first step, one can verify that after each step the components already processed retain the desired expectations. Repeating the process for all $N$ steps, it is guaranteed that the final output satisfies
Finally, offload tasks so that the number of tasks assigned to each BS matches the final output, and let the BSs serve their assigned tasks in the next slot. The performance of the algorithm is analyzed as follows:
Since we assign at most one task to each BS in every slot according to (17), all non-dropped tasks will be served within one slot.
Since all non-dropped tasks are served by the BSs (18), our choice of drop probabilities guarantees that the throughput of each BS equals $r_i^*$, which produces the optimal system performance $U^*$.
Therefore, our algorithm is optimal in both system performance and response time.
It can be easily checked that the time complexity of Algorithm 1 is polynomial in $N$. One can also run the algorithm offline and store the output strategy for each of the $2^N$ possible arrival vectors. After that, when the task arrival is observed, one can directly look up the corresponding offloading strategy without running the whole algorithm again; the per-slot time complexity in this case is only that of a table lookup.
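The lookup-table variant can be sketched as follows. Here `plan_for` is a hypothetical stand-in for Algorithm 1 (it trivially serves every arrival locally); only the precompute-then-lookup structure is the point.

```python
from itertools import product

# Precompute an offloading strategy for each of the 2**N possible arrival
# vectors, then answer per-slot queries with a dictionary lookup.
N = 3

def plan_for(arrival):
    # Hypothetical placeholder for Algorithm 1: serve each arrival locally.
    return {i: i for i, a in enumerate(arrival) if a}

table = {arrival: plan_for(arrival) for arrival in product((0, 1), repeat=N)}
print(len(table))        # 2**N = 8 stored strategies
print(table[(1, 0, 1)])  # per-slot lookup, no algorithm rerun needed
```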
V Algorithm Under Unknown Arrival Rate
The optimality of the algorithm designed in the previous section depends heavily on prior knowledge of the arrival rate. In this section, we solve the problem without such prior knowledge, based on Lyapunov optimization. Different from the traditional Lyapunov framework, which only provides a time-average response time bound, we design a virtual queue that enables us to bound the response time in the worst case. As stated in the proof of Theorem 1, we can require that tasks be peer offloaded only if they will be served by other BSs in the next slot. As a result, the decisions of peer offloading and task serving can be represented by a single variable. Let $s_{ij}(t)$ be the number of tasks at BS $i$ that are offloaded to and served by BS $j$ in slot $t$. Then $\sum_{j} s_{ij}(t)$ is the number of tasks in $Q_i$ served in slot $t$. Tasks offloaded to BS $j$ are served immediately and do not enter $Q_j$. Now, the update of $Q_i(t)$ is
Consider the following constraints:
where all variables are binary. The first two constraints require that, in every slot $t$, at most one task of each queue can be served, and each BS can serve at most one task. The last constraint ensures that the number of tasks leaving each queue, whether served or dropped, is at most one. We will see later that this constraint does not harm the optimal value, and it is useful in transforming drop decisions into block decisions.
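Under these constraints, a per-slot serving decision is a bipartite matching between queued tasks and BSs, which is why the key subroutine reduces to the classical assignment problem. Below is a brute-force sketch for a small instance; the weight matrix `w` is an illustrative stand-in for the per-pair scores (real instances would use a polynomial-time solver such as the Hungarian algorithm instead of enumerating permutations).

```python
from itertools import permutations

# Brute-force assignment: each queue contributes at most one task and each
# BS serves at most one, so a full matching is a permutation of BS indices.
w = [
    [4, 1, 0],   # w[i][j]: illustrative value of serving Q_i's task at BS j
    [2, 3, 1],
    [0, 2, 5],
]
N = len(w)
best = max(permutations(range(N)),
           key=lambda p: sum(w[i][p[i]] for i in range(N)))
print(best, sum(w[i][best[i]] for i in range(N)))
```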
In the following subsections, we first transform our problem into an equivalent form. Then we set up a virtual queue that records the waiting time of the head-of-line task. We define a drift function of the queues and combine it with our objective function to form a drift-plus-penalty bound. An algorithm is designed to minimize this bound. Theoretical analysis shows that the algorithm presents an $[O(1/V), O(V)]$ tradeoff between system performance and the worst-case response time bound, where $V$ is a tunable parameter.
V-A Problem Transformation
Assume the right derivative of each utility function $g_i$ is bounded by a non-negative constant $\nu$. Define the concave extension of $g_i$ over the whole real line as
where the extension coincides with $g_i$ on its original domain. Clearly, the extension is non-decreasing and concave. We extend the objective function to allow variables to take negative values, which will be useful in bounding the response time. For the sake of convenience, we still write $g_i$ for the extended function in the following subsections.
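A minimal sketch of such an extension, assuming the common construction from the Lyapunov-optimization literature: keep the utility on its original domain and continue it linearly below zero with slope `nu` (the bound on the right derivative). The example utility `log(1 + y)` is ours, chosen purely for illustration.

```python
import math

def extend(g, nu):
    """Concave extension: g on [0, inf), linear with slope nu below zero."""
    def g_hat(y):
        return g(y) if y >= 0 else g(0.0) + nu * y
    return g_hat

g = lambda y: math.log(1.0 + y)   # example utility; right derivative <= 1
g_hat = extend(g, nu=1.0)
print(g_hat(2.0))    # log(3): unchanged on the original domain
print(g_hat(-0.5))   # -0.5: linear continuation below zero
```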
With the extended objective function, we introduce a vector of auxiliary variables to transform the problem into the following form: