I. Introduction
With the ever-increasing demand for intelligent services, devices such as smartphones face challenges in both battery life and computing power [1]. Rather than offloading computation to remote clouds, fog computing distributes computing, storage, control, and communication services along the Cloud-to-Thing continuum [3, 2].
In recent years, task offloading has become a promising technology and has attracted significant attention from researchers. In general, tasks with high complexity are offloaded to other nodes so that the battery lifetime and computational resources of an individual user can be saved [4]. For example, ThinkAir [5] provided a code offloading framework capable of on-demand computational resource allocation and parallel execution for each task. In some works, task offloading was modeled as a deterministic optimization problem, e.g. the maximization of energy efficiency in [6], the joint minimization of energy and latency in [7], and the minimization of energy consumption under delay constraints in [8]. However, a task offloading strategy needs to rely on the real-time states of the users and the servers, e.g. the length of the computation queue. From this perspective, task offloading is a typical stochastic programming problem, and conventional optimization methods with deterministic parameters are not applicable. To circumvent this dilemma, the Lyapunov optimization method was invoked in [9, 10, 12, 11] to transform the challenging stochastic programming problem into a sequential decision problem, i.e. a series of deterministic problems, one per time slot. Besides, the authors in [13] provided a game-theoretic decentralized approach in which each user can make offloading decisions autonomously.
The aforementioned task offloading schemes all assume perfect knowledge of the system parameters. However, there are cases where these parameters are unknown or only partially known at the user. For example, some particular values (a.k.a. bandit feedback) are revealed only for the nodes that are queried. Specifically, the authors in [14] treated the communication delay and the computation delay of each task as posterior information. In [15], the mobility of each user was assumed to be unpredictable. When the number of nodes that can be queried is limited by the finite available resources, there exists a trade-off between exploiting the empirically best node as often as possible and exploring other nodes to find more profitable actions [16, 17]. To balance this trade-off, one popular approach is to model the exploration-versus-exploitation dilemma as a multi-armed bandit (MAB) problem, which has been extensively studied in statistics [18].
There are very few prior works addressing this exploration vs. exploitation trade-off during task offloading in a fog-enabled network. In this paper, we assume the processing delay of each task is unknown when we start to process the task, and we endeavor to find an efficient task offloading scheme with bandit feedback that minimizes the user's long-term latency. Our main contributions are as follows. Firstly, we introduce a non-stationary bandit model to capture the unknown latency variation, which is more practical than the previous model-based ones, e.g. [7, 8, 9, 10, 11]. Secondly, an efficient task offloading algorithm is put forward based on the upper confidence bound (UCB) policy. Note that our proposed scheme is not a straightforward application of the UCB policy, so the conventional analysis is not applicable. We also provide performance guarantees for the proposed algorithm.
The rest of this paper is organized as follows. Section II introduces the task offloading model and system assumptions. Section III presents an efficient algorithm and the corresponding performance guarantee. Numerical results are presented in Section IV, and Section V concludes the paper.
Notations: $|\mathcal{A}|$, $\mathcal{U}(a, b)$, $\mathbb{E}[X]$, and $\Pr(E)$ stand for the cardinality of set $\mathcal{A}$, the uniform distribution on $[a, b]$, the expectation of random variable $X$, and the probability of event $E$, respectively. The notation $x_n \xrightarrow{\text{a.s.}} x$ indicates that the sequence $x_n$ converges almost surely to $x$. The indicator function $\mathbb{1}\{\cdot\}$ takes the value $1$ when the specified condition is met ($0$ otherwise).

II. System Model

II-A Network Model
We are interested in a fog-enabled network (see Fig. 1) where both task nodes and helper nodes coexist. Computation tasks are generated at each fog node, and each fog node can communicate with nearby nodes. Unfinished tasks are assumed to be cached in a first-in first-out (FIFO) queue at each node. Due to the limited computation and storage resources of an individual node, tasks processed locally usually experience high latency, which degrades the quality of service (QoS) and the quality of experience (QoE). To enable low-latency processing, a task node may offload some of its computation tasks to nearby helper nodes, which typically possess more computation and storage resources and are deployed to help task nodes on demand. In typical applications such as online gaming, tasks are usually generated periodically and cannot be split arbitrarily. Thus, we assume one task is generated at the beginning of each time slot and can be allocated as one whole piece to one neighboring helper node.
Our goal is to minimize the long-term latency at a particular task node. In particular, the set of fog nodes can be classified as

$$\mathcal{N} = \{0\} \cup \{1, 2, \ldots, K\}, \qquad (1)$$

where node $0$ denotes the task node of interest and nodes $1, \ldots, K$ denote the helper nodes.
In this paper, we assume the task node cannot offload tasks to a helper node while that helper is communicating with others. We also assume each task is generated independently and the task nodes do not cooperate with each other.¹

¹The cooperation among multiple task nodes is beyond the scope of the current paper and is left for future work.
We use $c_k$ to represent the amount of time needed to deliver one bit of information to node $k$. It is a distance-dependent value and can be measured before transmission.² Denote the data length of task $n$ by $d_n$. We also assume the task size is such that the transmission delay is no more than one time slot. Note that the transmission delay is zero for a locally processed task, i.e. $c_0 = 0$.

²We assume different task nodes occupy pre-allocated orthogonal time or spectrum resources for the communication to the helper nodes, e.g. TDMA or FDMA. Note that optimal time/spectrum reuse is itself a non-trivial research problem [21].
Let $q_k(n)$ denote the queue length (in bits) of node $k$ at the beginning of time slot $n$. Meanwhile, we denote the time needed to process one bit already waiting in the queue at node $k$ by $w_k(n)$, and the time needed to process one bit of task $n$ at node $k$, once all the tasks ahead in the queue are completed, by $p_k(n)$. Furthermore, we treat $w_k(n)$ and $p_k(n)$ as random variables in this paper. Accordingly, their expectations are defined as

$$\bar{w}_k(n) = \mathbb{E}\left[w_k(n)\right], \qquad \bar{p}_k(n) = \mathbb{E}\left[p_k(n)\right]. \qquad (2)$$
We assume the total latency of each task is dominated by the delays mentioned above, i.e. the transmission delay $c_k d_n$, the waiting delay $w_k(n) q_k(n)$ in the queue, and the processing delay $p_k(n) d_n$. We ignore the latency introduced by transmitting the computing results back. Therefore, the total latency when allocating task $n$ to node $k$ can be written as

$$l_{n,k} = c_k d_n + w_k(n)\, q_k(n) + p_k(n)\, d_n. \qquad (3)$$
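As a sanity check on the latency model in (3), the following minimal sketch computes the total latency from its three components; the variable names are illustrative assumptions rather than notation fixed by the paper.

```python
# Sketch of the total-latency model in (3). Illustrative names:
# c_k: per-bit transmission time to node k (zero for local processing),
# d_n: data length of task n in bits,
# q_k: queue length of node k (in bits) at the start of the slot,
# w_k: per-bit time to clear bits already waiting in the queue,
# p_k: per-bit processing time of the new task at node k.

def total_latency(c_k: float, d_n: float, q_k: float,
                  w_k: float, p_k: float) -> float:
    """Latency of allocating one task to node k:
    transmission delay + waiting delay + processing delay."""
    transmission = c_k * d_n   # zero when the task is processed locally
    waiting = w_k * q_k        # finish the bits already queued at node k
    processing = p_k * d_n     # process this task's own bits
    return transmission + waiting + processing
```

Offloading is worthwhile exactly when the helper's transmission plus queueing overhead is outweighed by its faster per-bit processing.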
Before we proceed further, here we make the following assumptions:

AS1: The total latency of a task is unknown before the task is completed;

AS2: The queue length is broadcast by each node at the beginning of each slot and is available to all nearby fog nodes;

AS3: The waiting delay and the processing delay follow unknown distributions, and their expectations change abruptly at unknown time instants (a.k.a. breakpoints).
Different from the model-based task offloading problems addressed in [7, 8, 9, 10, 11], we do not require any specific relationship between CPU frequencies and processing delays in AS1 and AS3, similar to [14]. This is a more realistic setting for the following reasons. Firstly, the data lengths and computation complexities of tasks should be modeled as a sequence of independent random variables, because their distributions may change abruptly and become completely different as the task types change. Additionally, the computation capability, e.g. CPU frequency, number of CPU cores, and memory size, differs across nodes and may also follow abruptly-changing distributions. All of these uncertainties make it very difficult for an individual node to forecast the amount of time spent processing different tasks. It also costs considerable overhead for an individual node to obtain global information about the whole system. As a result, the processing delay and the waiting delay cannot be calculated accurately in a practical system with the conventional model, where the delays are simply determined by the data length and the configured CPU frequency [10].
In our paper, the processing delay and the waiting delay are reported only after the corresponding task is finished. Namely, the observations of the waiting delay and the processing delay are treated as posterior information. Note that these delays can be obtained via timestamp feedback from the corresponding node after it finishes the task. Accordingly, we obtain the realizations of the per-bit waiting and processing times as
$$\hat{w}_k(n) = \frac{t_n^{\mathrm{w}}}{q_k(n)}, \qquad \hat{p}_k(n) = \frac{t_n^{\mathrm{p}}}{d_n}, \qquad (4)$$

where $t_n^{\mathrm{w}}$ and $t_n^{\mathrm{p}}$ denote the observed waiting and processing times of task $n$, $q_k(n)$ the queue length upon arrival, and $d_n$ the data length.
II-B Problem Formulation
The general minimization of the long-term average latency of tasks can be formulated as

$$\min_{\{a_n\}} \ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} l_{n, a_n}, \qquad (5)$$

where $a_n$ represents the index of the node chosen to process task $n$ and $l_{n,k}$ denotes the total latency of processing task $n$ at node $k$. There are two difficulties in solving the above problem. Firstly, it is a stochastic programming problem: the exact information about the latency is not available before the $n$-th task is completed. Additionally, even if the latency were known a priori, this would still be a combinatorial optimization problem whose complexity grows combinatorially with the number of tasks. This is due to the fact that the previous offloading decisions determine the queue length at each fog node and thus affect the decisions for future tasks (see [15] for an example). To make the task offloading strategy amenable to online updating, one popular approach is to convert this challenging stochastic and combinatorial optimization problem into a low-complexity sequential decision problem at each time slot [9, 10, 12, 11]. Given the task offloading decisions made in the previous time slots, the optimal strategy becomes allocating task $n$ to the node with minimal latency in time slot $n$. Meanwhile, under the stochastic framework [20], it is more natural to focus on the expectation $\mathbb{E}[l_{n,k}]$. Accordingly, the problem in (5) becomes the following one in the $n$-th time slot:

$$a_n = \arg\min_{k} \ \mathbb{E}\left[l_{n,k}\right]. \qquad (6)$$
However, the above formulation is still a stochastic programming problem. Although the previously offloaded tasks do enable an empirical average as an estimate of the expected latency, this estimate may be inaccurate due to the limited number of observations. Note that the information about a node comes only from the feedback it returns upon finishing the corresponding tasks. In order to gather more information about a specific node, the task node has to offload more tasks to that node, even though it may not be the empirically best node. Therefore, an exploration-exploitation trade-off exists in this problem. In the following, we endeavor to find an efficient scheme to solve the problem in (6).

III. Efficient Offloading Algorithm
III-A Task Offloading with Discounted-UCB
To strike a balance between the aforementioned exploration and exploitation, we model the task offloading as a non-stationary multi-armed bandit (MAB) problem [17], where each node is regarded as one arm. When a particular task is generated, we need to determine one fog node, either a helper node or the task node itself, to deal with it. This corresponds to choosing one arm to play in the MAB.
Recall that the task node generates one task at the beginning of each time slot. Let $t_n$ be the time slot in which the feedback of the $n$-th task is received, and let $l_{\max}$ be the maximum permitted latency. If the latency exceeds $l_{\max}$, the task fails and is discarded. According to [17], we can estimate the expected per-bit waiting and processing times with discounted empirical averages as

$$\bar{w}_k^{\gamma}(n) = \frac{1}{N_k^{\gamma}(n)} \sum_{m \in \Phi_k(n)} \gamma^{\,n - t_m}\, \hat{w}_k(m), \qquad \bar{p}_k^{\gamma}(n) = \frac{1}{N_k^{\gamma}(n)} \sum_{m \in \Phi_k(n)} \gamma^{\,n - t_m}\, \hat{p}_k(m), \qquad (7)$$

where $\gamma \in (0, 1)$ represents the discount factor, $\Phi_k(n)$ denotes the set of tasks offloaded to node $k$ whose feedback has been received by slot $n$, and

$$N_k^{\gamma}(n) = \sum_{m \in \Phi_k(n)} \gamma^{\,n - t_m}. \qquad (8)$$
Then the latency can be estimated as

$$\hat{l}_{n,k} = c_k d_n + \bar{w}_k^{\gamma}(n)\, q_k(n) + \bar{p}_k^{\gamma}(n)\, d_n, \qquad (9)$$

where $\bar{w}_k^{\gamma}(n)$ and $\bar{p}_k^{\gamma}(n)$ are the discounted estimates in (7), $c_k$ is the per-bit transmission time, $d_n$ the data length of task $n$, and $q_k(n)$ the queue length of node $k$.
Note that the latency in (9) is estimated from the historical information of the per-bit waiting and processing times instead of the previous latency values. This is because the individual latency closely depends on the queue length and the task length, which may vary significantly across different types of tasks, so it is not reliable to estimate the latency from previous latency values directly. On the other hand, the time needed to process one bit of a task is typically determined by the node's capability, which is relatively stable and thus suitable to be estimated with the sample mean.
At node $k$, the total amount of time used to process task $n$ is compared with the maximal tolerable latency $l_{\max}$, and the time difference is defined as a reward, i.e. $r_{n,k} = l_{\max} - l_{n,k}$. Clearly, a negative reward indicates a task failure. Based on the estimated latency in (9), the estimated reward is given by

$$\hat{r}_{n,k} = l_{\max} - \hat{l}_{n,k}. \qquad (10)$$
The parameters $N_k^{\gamma}$, $\bar{w}_k^{\gamma}$, and $\bar{p}_k^{\gamma}$ can be updated iteratively with low complexity. In particular, let $\Psi(n)$ denote the set of indices of tasks whose feedback arrives within the interval $(n-1, n]$. Maintaining the discounted sums $W_k^{\gamma}(n)$ and $P_k^{\gamma}(n)$ such that $\bar{w}_k^{\gamma}(n) = W_k^{\gamma}(n)/N_k^{\gamma}(n)$ and $\bar{p}_k^{\gamma}(n) = P_k^{\gamma}(n)/N_k^{\gamma}(n)$, we have

$$N_k^{\gamma}(n{+}1) = \gamma\, N_k^{\gamma}(n) + \sum_{m \in \Psi(n)} \mathbb{1}\{a_m = k\}, \qquad (11)$$

$$W_k^{\gamma}(n{+}1) = \gamma\, W_k^{\gamma}(n) + \sum_{m \in \Psi(n)} \mathbb{1}\{a_m = k\}\, \hat{w}_k(m), \qquad (12)$$

$$P_k^{\gamma}(n{+}1) = \gamma\, P_k^{\gamma}(n) + \sum_{m \in \Psi(n)} \mathbb{1}\{a_m = k\}\, \hat{p}_k(m). \qquad (13)$$
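The discounted statistics in (7) and their iterative updates in (11)-(13) can be sketched as below. This is a minimal illustration of the standard discounted averaging of [17]; the class and attribute names are assumptions, and for brevity one object tracks a single node.

```python
# Discounted per-node statistics: every slot, old samples are
# down-weighted by gamma; each arriving feedback adds a fresh sample.

class DiscountedStats:
    def __init__(self, gamma: float):
        self.gamma = gamma   # discount factor in (0, 1)
        self.weight = 0.0    # discounted sample count, cf. (8)/(11)
        self.sum_w = 0.0     # discounted sum of waiting-time samples
        self.sum_p = 0.0     # discounted sum of processing-time samples

    def discount(self) -> None:
        """Apply one slot of discounting (call once per time slot)."""
        self.weight *= self.gamma
        self.sum_w *= self.gamma
        self.sum_p *= self.gamma

    def update(self, w_sample: float, p_sample: float) -> None:
        """Fold in a newly received delayed feedback for this node."""
        self.weight += 1.0
        self.sum_w += w_sample
        self.sum_p += p_sample

    def means(self):
        """Discounted empirical means of the per-bit delays, cf. (7)."""
        if self.weight == 0.0:
            return None
        return self.sum_w / self.weight, self.sum_p / self.weight
```

Because only running sums and one multiplication per slot are kept, the memory and per-slot cost are constant regardless of how many feedbacks have been received.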
The exploration-exploitation trade-off is then handled by applying the UCB policy as in [17]. An upper confidence bound is constructed as $\hat{r}_{n,k} + c_{n,k}$. The padding function $c_{n,k}$ characterizes the exploration bonus, which is defined as

$$c_{n,k} = B \sqrt{\frac{\xi \log N^{\gamma}(n)}{N_k^{\gamma}(n)}}, \qquad (14)$$

where $\xi$ stands for an exploration constant, $B$ is a constant bounding the rewards, and

$$N^{\gamma}(n) = \sum_{k} N_k^{\gamma}(n). \qquad (15)$$

The node selected to process task $n$ is then determined by

$$a_n = \arg\max_{k} \left\{ \hat{r}_{n,k} + c_{n,k} \right\}. \qquad (16)$$
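Since the reward in (10) is the tolerable latency minus the estimated latency, maximizing the reward UCB in (16) is equivalent to minimizing the estimated latency minus the exploration bonus of (14). The sketch below illustrates this selection rule; the constants `B` and `xi`, and all function and parameter names, are illustrative assumptions.

```python
import math

def padding(xi: float, B: float, n_total: float, n_k: float) -> float:
    """Exploration bonus of (14): larger for rarely sampled nodes."""
    return B * math.sqrt(xi * math.log(n_total) / n_k)

def select_node(est_latency, n_counts, xi=0.6, B=1.0):
    """Pick the node minimizing (estimated latency - exploration bonus),
    equivalent to maximizing the reward UCB in (16).

    est_latency: dict node -> estimated latency, cf. (9)
    n_counts:    dict node -> discounted number of samples, cf. (8)
    """
    n_total = sum(n_counts.values())
    best, best_index = None, float("inf")
    for k, est in est_latency.items():
        if n_counts[k] == 0:       # force at least one sample per node
            return k
        index = est - padding(xi, B, n_total, n_counts[k])
        if index < best_index:
            best, best_index = k, index
    return best
```

A node with a small discounted sample count receives a large bonus, so it is eventually re-probed even when its current latency estimate looks poor.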
Our proposed strategy, i.e. Task Offloading with Discounted-UCB (TOD), is summarized in Algorithm 1.
Although the proposed task offloading model is essentially a non-stationary MAB model, there are two main differences compared with the conventional model in [17]. First, in the conventional model the feedback is obtained instantaneously upon decision making, while in our model, as indicated in (7), the feedback is not available until the task is finished. The corresponding latency cannot be ignored, since it is exactly the information we need. Note that delayed feedback affects the performance analysis, as discussed in [19]. Second, the best arm is assumed to change only at the breakpoints in [17], whereas our model allows the best node to vary across tasks. Therefore, the performance guarantee for the conventional discounted-UCB algorithm cannot be applied directly to our proposed TOD.
III-B Performance Analysis
According to (2) and (3), the expected latency can be expressed as

$$\mathbb{E}\left[l_{n,k}\right] = c_k d_n + \bar{w}_k(n)\, q_k(n) + \bar{p}_k(n)\, d_n. \qquad (17)$$
Given the offloading strategies for the first $n-1$ tasks, according to (6), the best node to handle task $n$ is given by $k_n^* = \arg\min_k \mathbb{E}[l_{n,k}]$. Additionally, we use $\tilde{N}_k(N)$ to denote the number of tasks offloaded to node $k$ while it is not the best node during the first $N$ time slots. From AS3, we know the expectations of the system parameters may change abruptly at each breakpoint. We use $\Upsilon_N$ to denote the number of breakpoints before time $N$. The following proposition provides an upper bound for $\mathbb{E}[\tilde{N}_k(N)]$.
Proposition 1.
Assume and satisfies
For each node $k$, we have the following upper bound on $\mathbb{E}[\tilde{N}_k(N)]$, the expected number of tasks offloaded to node $k$ while it is not the best node:
(18) 
where
A detailed proof of the above proposition can be found on arXiv (arXiv:1804.08416, https://arxiv.org/abs/1804.08416) and is omitted here due to space limitations. Clearly, the upper bound depends on the total number of tasks $N$, the number of breakpoints $\Upsilon_N$, and the choice of the discount factor $\gamma$. From (18), we see that the first term decreases as the feasible $\gamma$ increases, while the last two terms increase. This is consistent with the intuition that a higher discount factor yields a better estimate in the stationary case, but results in a slower reaction to abrupt changes of the environment. Therefore, there is a trade-off between the different terms in (18). To strike a balance between stable and abruptly-changing environments, similar to [17], we choose $\gamma$ as
$$\gamma = 1 - \frac{1}{4B}\sqrt{\frac{\Upsilon_N}{N}}. \qquad (19)$$
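Assuming the Garivier-Moulines rule $\gamma = 1 - \frac{1}{4B}\sqrt{\Upsilon_N/N}$ with reward bound $B = 1$, the discount factor can be computed as below; the function name is an illustrative assumption.

```python
import math

def discount_factor(num_breakpoints: int, num_tasks: int) -> float:
    """Discount factor trading off stationary accuracy against
    reaction speed to breakpoints (reward bound B = 1 assumed)."""
    return 1.0 - 0.25 * math.sqrt(num_breakpoints / num_tasks)
```

Fewer breakpoints per task push the factor toward 1 (long memory); frequent breakpoints shrink it (fast forgetting).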
Accordingly, we can establish the following proposition.
Proposition 2.
When , and , the value of is in the order of
Proof.
Let , then the three terms in (18), i.e. , , and , are in the order of , , and , respectively. Thus is in the order of . ∎
To show the optimality of our proposed Algorithm 1, we define the pseudo-regret in offloading the first $N$ tasks as [20]

$$\bar{R}_N = \frac{1}{N} \sum_{n=1}^{N} \left( \mathbb{E}\left[l_{n, a_n}\right] - \mathbb{E}\left[l_{n, k_n^*}\right] \right), \qquad (20)$$

where $k_n^*$ denotes the best node for task $n$.
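As an offline sanity check, the pseudo-regret of (20) can be computed directly when the expected latencies are known, as in a simulation. The sketch below uses illustrative names.

```python
def pseudo_regret(expected_latency, choices):
    """Average expected-latency gap to the best node, cf. (20).

    expected_latency[n][k]: expected latency of node k for task n
    choices[n]:             node actually selected for task n
    """
    gaps = []
    for exp_l, chosen in zip(expected_latency, choices):
        best = min(exp_l.values())        # expected latency of best node
        gaps.append(exp_l[chosen] - best)
    return sum(gaps) / len(gaps)
```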
We have the following result regarding the pseudo-regret.
Proposition 3.
Under the choice of $\gamma$ in (19), the proposed approach in Algorithm 1 is asymptotically optimal in the sense that the pseudo-regret converges to zero almost surely as the number of tasks goes to infinity.
IV. Numerical Results
In this section, we evaluate the performance of our proposed offloading algorithm by testing rounds of task offloading.
One task is generated in each round.
Some common system parameters are set as follows.
The network consists of task node and helper nodes;
Each time slot is ms. Data size follows KB;
Maximal latency is slots, ;
The delay of processing one bit of a task is simulated following , where characterizes the complexity of the task and reflects the CPU capability of the node.
Both variables follow ;
The CPU capability of node is changed as or at each breakpoint.
We compare the performance of TOD with two other schemes, i.e. Greedy and Round-Robin. In the greedy scheme, we assume full information of every realization and offload each task to the node achieving minimal latency in that time slot. Note that the greedy scheme is not causal and cannot be applied in practice. In the round-robin scheme, tasks are offloaded to the fog nodes in a cyclic manner with equal shares.
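The round-robin baseline is straightforward to state in code; a minimal sketch with illustrative names:

```python
from itertools import cycle

def round_robin_schedule(num_nodes: int, num_tasks: int):
    """Assign tasks to fog nodes cyclically with equal shares."""
    order = cycle(range(num_nodes))
    return [next(order) for _ in range(num_tasks)]
```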
In Fig. 2, with different numbers of breakpoints, we demonstrate the effectiveness and robustness of TOD by showing the cumulative distribution functions (CDFs) of the latency of the processed tasks under different schemes. TOD-Opt and TOD-Cal in Fig. 2 represent two different criteria for choosing the discount factor in TOD: in the former, the discount factor is searched to achieve the minimal average latency, while in the latter it is calculated following (19). Both the left and right parts of Fig. 2 show that the proposed TOD algorithm performs much better than the round-robin scheme, and close to the greedy method, which achieves the minimal realization of latency in each round. Additionally, we can learn from Fig. 2 that the discount factor calculated following (19) performs as well as the optimal one. In Fig. 2(a), breakpoints occur frequently on average, which indicates that TOD is able to learn the system under frequent changes of the parameter distributions. It is also worth noting that, in Fig. 2(b), TOD achieves even lower average latency than the greedy scheme in the case of limited abrupt changes. This phenomenon reveals that the decision with minimal latency in each time slot may not be the global optimum of (5). It also corroborates our previous analysis that every choice affects the future state of the node, and further affects the following offloading decisions.

Fig. 3 presents the ratio of the number of successfully processed tasks to the total number of tasks. A task is successful if its latency is less than the maximal tolerable latency. Although the success ratios of TOD are lower than those of the greedy scheme due to the exploration of nodes, they show a tendency to approach the greedy scheme as time goes on.
Fig. 4 depicts the regrets of the different schemes. Note that the regret is computed against the optimal realization (the greedy method), while the pseudo-regret is computed against the optimal expectation. In the IIR scheme, we separate exploration and exploitation into two phases: in the exploration phase, the round-robin method is adopted; in the exploitation phase, we maximize the estimated reward defined in (10), which is effectively an estimate based on an infinite impulse response (IIR) filter. The ratio between the two phases is searched to achieve the minimal regret. It can be observed that, in terms of both the regret and the pseudo-regret, the proposed TOD algorithm achieves much lower regrets than the round-robin and IIR schemes. This shows that our proposed method deals well with the exploration-exploitation trade-off. Besides, as the discount factor is set to an inappropriate value, the TOD performance deteriorates considerably, which further indicates the importance of the exploration bonus.
V. Conclusion
In this paper, an efficient online task offloading strategy and the corresponding performance guarantees in a fog-enabled network have been studied. Considering that the expectations of the processing speeds change abruptly at unknown time instants, and that the system information is available only after the corresponding tasks are finished, we have formulated the problem as a stochastic program with delayed bandit feedback. To solve this problem, we have proposed TOD, an efficient online task offloading algorithm based on the UCB policy. Given a particular number of breakpoints, we have proven an upper bound on the number of tasks offloaded to any particular non-optimal node. We have also proven that the pseudo-regret goes to zero almost surely as the number of tasks goes to infinity. Simulations have demonstrated that the proposed TOD algorithm is capable of learning and picking the right node to offload tasks under non-stationary circumstances.
References
 [1] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloud computing: Architecture, applications, and approaches,” Wireless Commun. Mobile Comput., vol. 13, no. 18, pp. 1587–1611, Dec. 2013.
 [2] M. Chiang and T. Zhang, “Fog and IoT: An overview of research opportunities,” IEEE Internet Things J., vol. 3, no. 6, pp. 854–864, Dec. 2016.
 [3] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets in mobile computing,” IEEE Pervasive Comput., vol. 8, no. 4, pp. 14–23, Oct. 2009.
 [4] M. V. Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not to offload? The bandwidth and energy costs of mobile cloud computing,” in Proc. IEEE INFOCOM, Turin, Italy, Apr. 2013, pp. 1285–1293.
 [5] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “ThinkAir: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading,” in Proc. IEEE INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 945–953.
 [6] Y. Yang, K. Wang, G. Zhang, X. Chen, X. Luo, and M. Zhou, “MEETS: Maximal energy efficient task scheduling in homogeneous fog networks,” submitted to IEEE Internet Things J., 2017.
 [7] T. Q. Dinh, J. Tang, Q. D. La, and T. Q. S. Quek, “Offloading in mobile edge computing: Task allocation and computational frequency scaling,” IEEE Trans. Commun., vol. 65, no. 8, pp. 3571–3584, Aug. 2017.
 [8] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, Mar. 2017.
 [9] J. Kwak, Y. Kim, J. Lee, and S. Chong, “DREAM: Dynamic resource and task allocation for energy minimization in mobile cloud systems,” IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2510–2523, Dec. 2015.
 [10] Y. Mao, J. Zhang, S. H. Song, and K. B. Letaief, “Stochastic joint radio and computational resource management for multi-user mobile-edge computing systems,” IEEE Trans. Wireless Commun., vol. 16, no. 9, pp. 5994–6009, Sept. 2017.
 [11] Y. Yang, S. Zhao, W. Zhang, Y. Chen, X. Luo, and J. Wang, “DEBTS: Delay energy balanced task scheduling in homogeneous fog networks,” IEEE Internet Things J., in press.
 [12] L. Pu, X. Chen, J. Xu, and X. Fu, “D2D fogging: An energy-efficient and incentive-aware task offloading framework via network-assisted D2D collaboration,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3887–3901, Dec. 2016.
 [13] X. Chen, “Decentralized computation offloading game for mobile cloud computing,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 4, pp. 974–983, Apr. 2015.
 [14] T. Chen and G. B. Giannakis, “Bandit convex optimization for scalable and dynamic IoT management”, arXiv preprint arXiv:1707.09060, 2017.
 [15] C. Tekin and M. van der Schaar, “An experts learning approach to mobile service offloading,” in Proc. Annu. Allerton Conf. Commun., Control, Comput., 2014, pp. 643–650.
 [16] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Mach. Learn., vol. 47, no. 2, pp. 235–256, May 2002.
 [17] A. Garivier and E. Moulines, “On upper-confidence bound policies for switching bandit problems,” in Proc. Int. Conf. Algorithmic Learn. Theory, Espoo, Finland, Oct. 2011, pp. 174–188.
 [18] D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments. London, U.K.: Chapman & Hall, 1985.
 [19] P. Joulani, A. Gyorgy, and C. Szepesvari, “Online learning under delayed feedback,” in Proc. Int. Conf. Mach. Learn., Atlanta, GA, USA, Jun. 2013, pp. 1453–1461.
 [20] S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
 [21] Z. Zhu, S. Jin, Y. Yang, H. Hu, and X. Luo, “Time reusing in D2D-enabled cooperative networks,” IEEE Trans. Wireless Commun., in press.
VI. Appendix
Proof.
According to the definition of , it can be decomposed as
(25) 
where is a particular function with respect to . The number of missing feedbacks when task is offloaded can be defined as
(26) 
Clearly, the number of missing feedbacks is no larger than , i.e. . According to Lemma 1 in [17], for any , the following inequality is derived:
(27) 
Due to the fact that
(28) 
we have
(29) 
Letting , we have
(30) 
Let denote the number of breakpoints before time , and denote the set of “well offloaded” tasks. Mathematically, these tasks are defined as follows.
(31) 
where indicates the number of tasks for which the delay is poorly estimated. Because of this, the D-UCB policy may not offload tasks to the optimal node, which leads to the following bound:
(32) 
Next, we need to upper-bound the last term in (32).
There are three facts:
i) The event occurs if and only if the event occurs;
ii) ;
iii) .
Based on these facts, the following inequality is obtained:
(33) 
Namely, when node is tested enough times by the task node, the event only occurs under three circumstances: i) the delay of the optimal node is substantially overestimated; ii) the delay of node is substantially underestimated; iii) both delay expectations, i.e. and , are close enough.
However, if is chosen appropriately, the event never occurs. Denote the minimal difference between the expected delay of node and the expected delay of the best node by , i.e.
(34) 
Let , where . Recalling , we have
(35) 
However, from the definition of we obtain:
(36) 
which contradicts (35). Thus the events and never occur simultaneously, which indicates that we only need to upper-bound the probabilities of events and . Define as
(37) 
where , then
(38) 
Combining with the following two facts:
(39) 
we obtain
(40) 
Let
(41) 
the inequality in (40) becomes
(42) 
Defining , the following inequality can be deduced:
(43) 
where holds due to Theorem 4 in [17]. Letting , we further obtain:
(44) 
The expectation of can now be upper-bounded as
(45) 
Assuming
(46) 
we have
(47) 
and
(48) 
Then the following inequality holds: