Learn and Pick Right Nodes to Offload

04/20/2018
by Zhaowei Zhu, et al.

Task offloading is a promising technology to exploit the benefits of fog computing. An effective task offloading strategy is needed to utilize the computation resources efficiently. In this paper, we endeavor to seek an online task offloading strategy to minimize the long-term latency. In particular, we formulate a stochastic programming problem, where the expectations of the system parameters change abruptly at unknown time instants. Meanwhile, we consider the fact that the queried nodes can only feed back the processing results after finishing the tasks. We then put forward an effective algorithm to solve this challenging stochastic programming under the non-stationary bandit model. We further prove that our proposed algorithm is asymptotically optimal in a non-stationary fog-enabled network. Numerical simulations are carried out to corroborate our designs.



I Introduction

With the ever-increasing demand for intelligent services, devices such as smartphones face challenges in both battery life and computing power [1]. Rather than offloading computation to remote clouds, fog computing distributes computing, storage, control, and communication services along the Cloud-to-Thing continuum [3, 2].

In recent years, task offloading has become a promising technology and attracted significant attention from researchers. In general, high-complexity tasks are offloaded to other nodes so that the battery lifetime and computational resources of an individual user can be saved [4]. For example, ThinkAir [5] provided a code offloading framework capable of on-demand computational resource allocation and parallel execution for each task. In some works, task offloading was modeled as a deterministic optimization problem, e.g. the maximization of energy efficiency in [6], the joint minimization of energy and latency in [7], and the minimization of energy consumption under delay constraints in [8]. However, a task offloading strategy needs to rely on the real-time states of the users and the servers, e.g. the length of the computation queue. From this perspective, task offloading is a typical stochastic programming problem, and conventional optimization methods with deterministic parameters are not applicable. To circumvent this dilemma, the Lyapunov optimization method was invoked in [9, 10, 12, 11] to transform the challenging stochastic programming problem into a sequential decision problem, consisting of a deterministic problem in each time slot. Besides, the authors in [13] provided a game-theoretic decentralized approach, where each user can make offloading decisions autonomously.

The aforementioned task offloading schemes all assumed perfect knowledge of the system parameters. However, in some cases these parameters are unknown or only partially known at the user. For example, some particular values (a.k.a. bandit feedback) are only revealed for the nodes that are queried. Specifically, the authors in [14] treated the communication delay and the computation delay of each task as posterior information. In [15], the mobility of each user was assumed to be unpredictable. When the number of nodes that can be queried is limited by the finite available resources, there exists a tradeoff between exploiting the empirically best node as often as possible and exploring other nodes to find more profitable actions [16, 17]. To balance this tradeoff, one popular approach is to model the exploration versus exploitation dilemma as a multi-armed bandit (MAB) problem, which has been extensively studied in statistics [18].

There are very few prior works addressing this exploration vs. exploitation tradeoff during task offloading in a fog-enabled network. In this paper, we assume the processing delay of each task is unknown when the task starts to be processed, and we endeavor to find an efficient task offloading scheme with bandit feedback to minimize the user’s long-term latency. Our main contributions are as follows. Firstly, we introduce a non-stationary bandit model to capture the unknown latency variation, which is more practical than the previous model-based ones, e.g. [7, 8, 9, 10, 11]. Secondly, an efficient task offloading algorithm is put forward based on the upper-confidence bound (UCB) policy. Note that our proposed scheme is not a straightforward application of the UCB policy, so the conventional analysis is not applicable. We also provide performance guarantees for the proposed algorithm.

The rest of this paper is organized as follows. Section II introduces the task offloading model and system assumptions. Section III presents one efficient algorithm and the corresponding performance guarantee. Numerical results are presented in Section IV and Section V concludes the paper.

Notations: We use standard notations for the cardinality of a set, the uniform distribution on a set, the expectation of a random variable, and the probability of an event. A sequence converging almost surely towards a limit is marked accordingly. The indicator function takes the value 1 when the specified condition is met and 0 otherwise.

II System Model

II-A Network Model

Fig. 1: A fog-enabled network. Different colors indicate the tasks from different types of nodes. The task node is busy dealing with computation tasks, some of which are offloaded to its nearby helper nodes.

We are interested in a fog-enabled network (see Fig. 1) where task nodes and helper nodes co-exist. Computation tasks are generated at each fog node, and each fog node can also communicate with nearby nodes. Unfinished tasks are assumed to be cached in a first-input first-output (FIFO) queue at each node. Due to the limited computation and storage resources within an individual node, locally processed tasks usually experience high latency, which degrades the quality of service (QoS) and the quality of experience (QoE). To enable low-latency processing, a task node may offload some of its computation tasks to nearby helper nodes. These helper nodes typically possess more computation and storage resources and are deployed to help the task nodes on demand. In typical applications such as online gaming, tasks are usually generated periodically and cannot be split arbitrarily. We therefore assume one task is generated at the beginning of each time slot and can be allocated as a whole to one neighboring helper node.
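The queueing behavior described above can be sketched in a few lines of Python. This is a toy illustration under assumed parameters; the class and attribute names (`FogNode`, `bits_per_slot`) are ours, not the paper's:

```python
from collections import deque

class FogNode:
    """Illustrative fog node holding a FIFO queue of unfinished tasks."""
    def __init__(self, bits_per_slot):
        self.bits_per_slot = bits_per_slot  # assumed processing capacity per slot
        self.queue = deque()                # FIFO queue of remaining task sizes (bits)

    def enqueue(self, task_bits):
        # one indivisible task arrives at the beginning of a slot
        self.queue.append(task_bits)

    def step(self):
        """Process up to bits_per_slot bits from the head of the queue."""
        budget = self.bits_per_slot
        while self.queue and budget > 0:
            done = min(budget, self.queue[0])
            self.queue[0] -= done
            budget -= done
            if self.queue[0] == 0:
                self.queue.popleft()

    @property
    def backlog_bits(self):
        return sum(self.queue)

node = FogNode(bits_per_slot=100)
node.enqueue(150)
node.enqueue(30)
node.step()                # slot 1: 100 bits of the first task are processed
print(node.backlog_bits)   # 80 bits remain (50 + 30)
```

The FIFO discipline is why the waiting delay of a newly offloaded task is driven by the backlog ahead of it, which motivates the queue-length broadcast in AS-2 below.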

Our goal is to minimize the long-term latency at a particular task node. In particular, the set of

fog nodes can be classified as

(1)

In this paper, we assume the task node cannot offload tasks to a helper node when it is communicating with others. We also assume each task is generated independently and that the task nodes do not cooperate with each other. (The cooperation among multiple task nodes is beyond the scope of the current paper and is left for future work.)

We use a per-bit transmission delay to represent the amount of time needed to deliver one bit of information to a given node; it is a distance-dependent value and can be measured before transmission. (We assume different task nodes occupy pre-allocated orthogonal time or spectrum resources for the communication to the helper nodes, e.g. TDMA or FDMA. Note the optimal time/spectrum reuse is itself a non-trivial research problem [21].) Each task has a known data length, and we assume the task size is such that the transmission delay is no more than one time slot. Note the transmission delay is zero for a locally processed task.

Let denote the queue length of node- at the beginning of time slot-. Meanwhile, we denote the time needed to process one bit waiting in the queue at node- by , and denote the time needed to process one bit in task- at node- by when all the tasks ahead in the queue are completed. Furthermore, we treat and as random variables in this paper. Accordingly, the expectations are defined as:

(2)

We assume the total latency of each task is dominated by the delays mentioned above, i.e. the transmission delay , the waiting delay in the queue, and the processing delay . We ignore the latency introduced during the transmitting of the computing results. Therefore, the total latency when allocating task- to node- can be written as follows.

(3)
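Since the total latency in (3) is the sum of the transmission, waiting, and processing delays described above, it can be illustrated as follows. The function and parameter names (`tx_per_bit`, `wait_per_bit`, `proc_per_bit`) and the sample values are assumptions for illustration:

```python
def total_latency(tx_per_bit, queue_bits, wait_per_bit, task_bits, proc_per_bit):
    """Illustrative decomposition in the spirit of (3): transmission delay
    for the task's own bits, waiting delay for the bits already queued
    ahead of it, and processing delay for the task itself.
    tx_per_bit is 0 for a locally processed task."""
    transmission = tx_per_bit * task_bits
    waiting = wait_per_bit * queue_bits
    processing = proc_per_bit * task_bits
    return transmission + waiting + processing

# local processing: no transmission delay; 1.0 slot of waiting + 0.8 of processing
print(total_latency(0.0, 500, 0.002, 200, 0.004))  # ~ 1.8
```

Note the waiting term scales with the queue length broadcast in AS-2, while the other two terms scale with the size of the new task.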

Before we proceed further, here we make the following assumptions:

  • AS-1: The total latency is unknown before the task is completed;

  • AS-2: The queue length is broadcasted by node- at the beginning of each slot and is available for all the nearby fog nodes;

  • AS-3: The waiting delay and the processing delay, i.e. and , follow unknown distributions. The corresponding expectations, i.e. and , change abruptly at unknown time instants (a.k.a. breakpoints).

Different from the model-based task offloading problems addressed in [7, 8, 9, 10, 11], we do not require any specific relationship between the CPU frequencies and the processing delays in AS-1 and AS-3, as in [14]. This is a more realistic setting for the following reasons. Firstly, the data lengths and the computation complexities of tasks should be modeled as a sequence of independent random variables, because their distributions may change abruptly and become completely different when the task types change. Additionally, the computation capability of each node, e.g. CPU frequency, number of CPU cores, and memory size, is different and may also follow abruptly-changing distributions. All of these uncertainties make it very difficult for an individual node to forecast the amount of time spent in processing different tasks. It also incurs substantial overhead for an individual node to obtain global information about the whole system. As a result, the processing delay and the waiting delay cannot be calculated accurately in a practical system with the conventional model, where the delays are simply determined by the data length and the configured CPU frequency [10].

In this paper, the processing delay and the waiting delay are only reported after the corresponding task is finished. Namely, the observations of the waiting delay and the processing delay are treated as posterior information. Note these delays can be obtained via timestamp feedback from the corresponding node after it finishes the task. Accordingly, we obtain the realizations of the per-bit waiting and processing delays as

(4)
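A minimal sketch of how the realizations in (4) could be recovered from timestamp feedback, assuming the node reports when the task arrived, when its service started, and when it finished (the timestamp names are hypothetical):

```python
def per_bit_realizations(t_arrival, t_start, t_finish, queue_bits, task_bits):
    """Illustrative recovery of the per-bit waiting and processing delays
    from timestamp feedback: the task waited from its arrival at the node
    until service started (behind queue_bits queued bits), then was
    processed until t_finish."""
    wait_per_bit = (t_start - t_arrival) / queue_bits if queue_bits > 0 else 0.0
    proc_per_bit = (t_finish - t_start) / task_bits
    return wait_per_bit, proc_per_bit

w, p = per_bit_realizations(t_arrival=10.0, t_start=14.0, t_finish=15.0,
                            queue_bits=2000, task_bits=500)
print(w, p)  # 0.002 slots per queued bit, 0.002 slots per task bit
```

These per-bit realizations, rather than the raw end-to-end latencies, are what the estimator in Section III averages, since they are comparable across tasks of different sizes.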

II-B Problem Formulation

The general minimization of the long-term average latency of tasks can be formulated as follows.

(5)

where the decision variable represents the index of the node chosen to process each task. There are two difficulties in solving the above problem. Firstly, it is a stochastic programming problem: the exact latency is not available before the corresponding task is completed. Secondly, even with the latency known a priori, the problem remains a combinatorial optimization problem of prohibitive complexity, since the previous offloading decisions determine the queue length at each fog node and thereby affect the decisions for future tasks; see [15] for an example. To make the task offloading strategy amenable to online updating, one popular approach is to convert this challenging stochastic and combinatorial optimization problem into a low-complexity sequential decision problem at each time slot [9, 10, 12, 11]. Given the task offloading decisions made in the previous time slots, the optimal strategy reduces to allocating the current task to the node with minimal latency in the current time slot. Meanwhile, under the stochastic framework [20], it is more natural to focus on the expected latency. Accordingly, the problem in (5) becomes the following one in each time slot:

(6)

However, the above formulation is still a stochastic programming problem. Although the tasks offloaded previously do enable an empirical average as an estimate of the expectation, this information may be inaccurate due to the limited number of observations. Note the information about a node comes only from the feedback it returns upon finishing the corresponding tasks. In order to get more information about one specific node, the task node has to offload more tasks to that node even if it is not the empirically best node. Therefore, an exploration-exploitation tradeoff exists in this problem. In the following parts, we endeavor to find an efficient scheme to solve the problem in (6).

III Efficient Offloading Algorithm

III-A Task Offloading with Discounted-UCB

To strike a balance between the aforementioned exploration and exploitation, we model the task offloading as a non-stationary MAB problem [17], where each fog node is regarded as one arm. When a particular task is generated, we need to determine one fog node, either one helper node or the task node itself, to deal with it. This corresponds to choosing one arm to play in the MAB.

Recall that the task node generates one task at the beginning of each time slot. Let be the time when the feedback of the -th task is received, where is the maximum permitted latency. If , the task fails and is discarded. According to [17], we can estimate and with the UCB policy as

(7)

where represents the discount factor, and

(8)

Then the latency can be estimated as

(9)

Note that the latency in (9) is estimated from the historical information of the per-bit waiting and processing delays instead of the previous latency values. This is because the individual latency closely depends on the queue length and the task length, which may vary significantly across different types of tasks, so estimating the latency directly from previous latency values is not reliable. On the other hand, the time needed to process one bit of a task is typically determined by the node capability, which is relatively stable and thus suitable to be estimated with the sample mean.
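The discounted estimates in (7)-(8) weight an observation made s slots ago by a factor gamma**s, so stale feedback fades out after a breakpoint. A minimal sketch of this idea (a plain discounted mean; the paper's exact bookkeeping in (11)-(13) is not reproduced):

```python
def discounted_mean(observations, gamma):
    """Discounted empirical mean in the spirit of (7)-(8): an observation
    made `age` steps ago is weighted by gamma**age, so old feedback fades
    and the estimate can track abrupt changes. observations[-1] is the
    newest sample."""
    num = 0.0
    den = 0.0
    for age, x in enumerate(reversed(observations)):
        w = gamma ** age
        num += w * x
        den += w
    return num / den

old_regime = [1.0] * 50   # per-bit delay observed before a breakpoint
new_regime = [3.0] * 5    # per-bit delay observed after the breakpoint
est = discounted_mean(old_regime + new_regime, gamma=0.5)
print(est)  # ~ 2.94: the five recent samples dominate the fifty old ones
```

An undiscounted sample mean over the same history would report about 1.2, badly lagging the new regime; the discount is what buys the tracking ability analyzed in Section III-B.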

At node-, the total amount of time utilized to process task- is compared with the maximal tolerable latency and the time difference is defined as a reward, i.e. . Clearly, a negative reward indicates a task failure. Based on the estimated latency in (9), the estimated reward is given by

(10)

The parameters , , and can be updated iteratively with low complexity. Particularly, let denote the set of indices of tasks completed within the interval and we can have

(11)
(12)
(13)

The exploration-exploitation tradeoff is then handled by applying the UCB policy as in [17]. A UCB index is constructed from the estimated reward plus a padding function. The padding function characterizes the exploration bonus and is defined as

(14)

where stands for an exploration constant and

(15)

The node selected to process task- is then determined by

(16)

Our proposed strategy, i.e. Task Offloading with Discounted-UCB (TOD), is summarized in Algorithm 1.

Although the above task offloading model is essentially a non-stationary MAB model, there are two main differences from the conventional model in [17]. First, in the conventional model the feedback is obtained instantaneously upon decision making, whereas in our model, as indicated in (7), the feedback is not available until the task is finished. The corresponding latency should not be ignored, since it is exactly the information we need. Note that delayed feedback affects the performance analysis, as discussed in [19]. Second, the best arm is assumed to change only at the breakpoints in [17], whereas our model allows the best node to vary across tasks. Therefore, the performance guarantee for the conventional discounted-UCB algorithm cannot be applied directly to our proposed TOD.

1:Initialization: Set appropriate . Set , .
2:Repeat
3: Let , offload task- to node-; ;
4:Until ;
5:Update , and as (7) and (8);
6:Repeat
7: Update and as (10) and (14);
8: Determine as (16), offload task- to node-;
9:;
10: Update , , and as (11)-(13);
11:Until ;
Algorithm 1 TOD (Task Offloading with Discounted-UCB) Algorithm
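To make the flow of Algorithm 1 concrete, here is a heavily simplified sketch of a discounted-UCB offloading loop. It departs from TOD in one important way: feedback is assumed to arrive at the end of the same slot rather than after the task finishes, and all parameter values (`gamma`, `xi`, the delay model) are illustrative:

```python
import math
import random

def d_ucb_offload(delay_fn, n_nodes, horizon, gamma=0.995, xi=0.6):
    """Simplified discounted-UCB offloading loop (illustrative, not the
    paper's exact TOD): per-node statistics decay by gamma each slot, and
    the node with the smallest lower confidence bound on its delay is
    queried. Feedback is assumed immediate, unlike the delayed feedback
    treated in the paper."""
    num = [0.0] * n_nodes   # discounted sums of observed per-bit delays
    cnt = [0.0] * n_nodes   # discounted numbers of observations
    choices = []
    for t in range(horizon):
        num = [gamma * v for v in num]
        cnt = [gamma * v for v in cnt]
        if t < n_nodes:
            k = t  # query every node once first
        else:
            total = sum(cnt)
            def lcb(j):
                mean = num[j] / cnt[j]
                bonus = math.sqrt(xi * math.log(total) / cnt[j])
                return mean - bonus  # optimism: the delay could be this low
            k = min(range(n_nodes), key=lcb)
        num[k] += delay_fn(k, t)
        cnt[k] += 1.0
        choices.append(k)
    return choices

random.seed(0)

def delay_fn(k, t):
    # node 1 is fast until the breakpoint at t = 500, then node 0 is fast
    base = [2.0, 1.0] if t < 500 else [1.0, 2.0]
    return base[k] + random.uniform(-0.1, 0.1)

choices = d_ucb_offload(delay_fn, n_nodes=2, horizon=1500)
print(choices[400], choices[1400])  # tracks the best node across the breakpoint
```

The discounting shrinks the count of any node that stops being queried, which inflates its exploration bonus and forces an occasional re-probe; that is how the loop notices the breakpoint and switches nodes.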

III-B Performance Analysis

According to (2) and (3), the expected latency can be expressed as

(17)

Given the offloading strategies for the previous tasks, according to (6), the best node to handle the current task is the one with minimal expected latency. Additionally, we count the number of tasks offloaded to a node while it is not the best node during the first time slots. From AS-3, we know the expectations of the system parameters could change abruptly at each breakpoint; we denote the number of breakpoints before a given time accordingly. The following proposition provides an upper bound for this number of mismatched offloading decisions.

Proposition 1.

Assume and satisfies

For each node , we have the following upper bound for :

(18)

where

A detailed proof of the above proposition can be found in the arXiv version (arXiv:1804.08416, https://arxiv.org/abs/1804.08416) and is omitted here due to the space limitation. Clearly, the upper bound depends on the total number of tasks, the number of breakpoints, and the choice of the discount factor. From (18), we see one term decreases as the feasible discount factor increases, while the last two terms increase with it. This is consistent with the intuition that a higher discount factor yields a better estimate in the stationary case but a slower reaction to abrupt changes of the environment. Therefore, there is a tradeoff between the different terms in (18). To strike a balance between stable and abruptly-changing environments, similar to [17], we choose the discount factor as

(19)
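The exact form of (19) did not survive in this copy; as an assumption consistent with the discounted-UCB choice in [17], a discount factor of the form 1 - sqrt(breakpoints/horizon)/4 captures the tradeoff: rare breakpoints allow a long memory, frequent breakpoints call for a short one.

```python
import math

def discount_factor(n_breakpoints, horizon):
    """Assumed form of the discount-factor rule, following the canonical
    discounted-UCB choice in [17] (this copy does not reproduce (19)):
    gamma = 1 - sqrt(breakpoints / horizon) / 4."""
    return 1.0 - 0.25 * math.sqrt(n_breakpoints / horizon)

print(discount_factor(10, 10000))   # ~ 0.9921: few breakpoints, long memory
print(discount_factor(400, 10000))  # 0.95: frequent changes, shorter memory
```

Either way, the effective memory length scales like 1/(1-gamma), so more breakpoints per horizon shrink the averaging window.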

Accordingly, we can establish the following proposition.

Proposition 2.

When , and , the value of is in the order of

Proof.

Let , then the three terms in (18), i.e. , , and , are in the order of , , and , respectively. Thus is in the order of . ∎

To show the optimality of our proposed Algorithm 1, we define the pseudo-regret in offloading the first tasks as [20]

(20)

We have the following result regarding the pseudo-regret .

Proposition 3.

When , the proposed approach in Algorithm 1 is asymptotically optimal in the sense that .

Proof.

Note . We have

(21)

According to Proposition 1 and Proposition 2, we obtain

(22)

Then for any , there exists a finite integer , such that

(23)

Therefore,

(24)

The above equation indicates . ∎

IV Numerical Results

In this section, we evaluate the performance of our proposed offloading algorithm over repeated rounds of task offloading. One task is generated in each round. The common system parameters are set as follows:

  • The network consists of one task node and several helper nodes;
  • Time is slotted (in ms), and the data size of each task is drawn at random (in KB);
  • A maximal latency (in slots) is enforced;
  • The delay of processing one bit of a task is simulated from two random variables, one characterizing the complexity of the task and one reflecting the CPU capability of the node;
  • The CPU capability of each node changes at each breakpoint.

We compare the performance of TOD with two other schemes, i.e. Greedy and Round-Robin. In the greedy scheme, we assume full information of every realization and offload the task to the node achieving minimal latency in each time slot. Note that the greedy scheme is not causal and cannot be applied in practice. In the round-robin scheme, each task is offloaded to the fog nodes cyclically with equal shares.
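The two baselines can be stated in a couple of lines (the function names are illustrative):

```python
def greedy_choice(realized_latencies):
    """Non-causal greedy baseline: with full knowledge of every realization,
    pick the node with minimal latency in the current slot (illustrative)."""
    return min(range(len(realized_latencies)),
               key=lambda k: realized_latencies[k])

def round_robin_choice(t, n_nodes):
    """Round-robin baseline: cycle through the fog nodes with equal shares."""
    return t % n_nodes

print(greedy_choice([2.3, 1.1, 4.0]))                # 1
print([round_robin_choice(t, 3) for t in range(5)])  # [0, 1, 2, 0, 1]
```

Greedy lower-bounds the per-slot latency but needs information no causal scheme has, while round-robin needs no information at all; TOD is meant to land near the former while operating under the latter's constraints.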

(a) .
(b) .
Fig. 2: CDFs of the latency of processing tasks. TOD-Opt.: solution with TOD using (left), (right); TOD-Cal.: solution with TOD using (left), (right).
Fig. 3: Cumulative success ratio versus time. The number of breakpoints is set as .
Fig. 4: Average regret versus time. The regret is calculated as . (R): Regret by taking ; (P): Pseudo-regret by taking . TOD-Sel.: solution with TOD using ; IIR: solution with separated exploration and exploitation using the same discount factor as TOD-Opt. The number of breakpoints is set as .

In Fig. 2, with different numbers of breakpoints, we demonstrate the effectiveness and robustness of TOD by showing the cumulative distribution functions (CDFs) of the latency of processing tasks under the different schemes. TOD-Opt and TOD-Cal in Fig. 2 represent two different criteria for choosing the discount factor in TOD: in the former, the discount factor is searched to achieve the minimal average latency, while in the latter it is calculated following (19). Both the left and the right parts of Fig. 2 show that the proposed TOD algorithm performs much better than the round-robin scheme and close to the greedy method, which achieves the minimal realized latency in each round. Additionally, we can learn from Fig. 2 that the discount factor calculated following (19) performs as well as the optimal one. In Fig. 2(a), a breakpoint occurs after only a few tasks on average, which indicates TOD is able to learn the system under frequent changes of the parameter distributions. It is also worth noting that, in Fig. 2(b), TOD achieves even lower average latency than the greedy scheme in the case of limited abrupt changes. This phenomenon reveals that the decision with minimal latency in each time slot may not be the global optimum of (5). It also corroborates our previous analysis that every choice affects the future state of the chosen node and thereby the subsequent offloading decisions.

Fig. 3 presents the ratio of the number of successfully processed tasks to the total number of tasks, where a task is deemed successful if its latency is below the maximal tolerable latency. Although the success ratios of TOD are lower than that of the greedy scheme due to the exploration of nodes, they tend to approach the greedy scheme as time goes on.

Fig. 4 depicts the regrets of the different schemes. Note the regret is computed against the optimal realization (greedy method), whereas the pseudo-regret is computed against the optimal expectation. In IIR, we separate the exploration and the exploitation into two phases: in the exploration phase, the round-robin method is adopted; in the exploitation phase, we maximize the estimated reward defined in (10), which is in effect an estimate based on an infinite impulse response (IIR) filter. The ratio between the two phases is searched to achieve the minimal regret. It can be observed that, in terms of both the regret and the pseudo-regret, the proposed TOD algorithm achieves much lower regret than the round-robin scheme and the IIR scheme. This shows that our proposed method handles the exploration-exploitation tradeoff well. Besides, under the parameter setting of TOD-Sel, the TOD performance deteriorates significantly, which further indicates the importance of the exploration bonus.

V Conclusion

In this paper, an efficient online task offloading strategy and the corresponding performance guarantee in a fog-enabled network have been studied. Considering that the expectations of the processing speeds change abruptly at unknown time instants, and that the system information is available only after the corresponding tasks are finished, we have formulated the problem as a stochastic program with delayed bandit feedback. To solve this problem, we have provided TOD, an efficient online task offloading algorithm based on the UCB policy. Given a particular number of breakpoints, we have proven a bound on the number of tasks offloaded to any non-optimal node. We have also proven that the pseudo-regret goes to zero almost surely as the number of tasks goes to infinity. Simulations have demonstrated that the proposed TOD algorithm is capable of learning and picking the right node to offload tasks under non-stationary circumstances.

References

  • [1] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloud computing: Architecture, applications, and approaches,” Wireless Commun. Mobile Comput., vol. 13, no. 18, pp. 1587–1611, Dec. 2013.
  • [2] M. Chiang and T. Zhang, “Fog and IoT: An overview of research opportunities,” IEEE Internet Things J., vol. 3, no. 6, pp. 854–864, Dec. 2016.
  • [3] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets in mobile computing,” IEEE Pervasive Comput., vol. 8, no. 4, pp. 14–23, Oct. 2009.
  • [4] M. V. Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not to offload? The bandwidth and energy costs of mobile cloud computing,” in Proc. IEEE INFOCOM, Turin, Italy, Apr. 2013, pp. 1285–1293.
  • [5] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “ThinkAir: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading,” in Proc. IEEE INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 945–953.
  • [6] Y. Yang, K. Wang, G. Zhang, X. Chen, X. Luo, and M. Zhou, “MEETS: Maximal energy efficient task scheduling in homogeneous fog networks,” submitted to IEEE Internet Things J., 2017.
  • [7] T. Q. Dinh, J. Tang, Q. D. La, and T. Q. S. Quek, “Offloading in mobile edge computing: Task allocation and computational frequency scaling,” IEEE Trans. on Commun., vol. 65, no. 8, pp. 3571–3584, Aug. 2017.
  • [8] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, Mar. 2017.
  • [9] J. Kwak, Y. Kim, J. Lee, and S. Chong, “DREAM: Dynamic resource and task allocation for energy minimization in mobile cloud systems,” IEEE J. Sel. Areas Commun, vol. 33, no. 12, pp. 2510–2523, Dec. 2015.
  • [10] Y. Mao, J. Zhang, S. H. Song, and K. B. Letaief, “Stochastic joint radio and computational resource management for multi-user mobile-edge computing systems,” IEEE Trans. Wireless Commun., vol. 16, no. 9, pp. 5994–6009, Sept. 2017.
  • [11] Y. Yang, S. Zhao, W. Zhang, Y. Chen, X. Luo, and J. Wang, “DEBTS: Delay energy balanced task scheduling in homogeneous fog networks,” IEEE Internet Things J., in press.
  • [12] L. Pu, X. Chen, J. Xu, and X. Fu, “D2D fogging: An energy-efficient and incentive-aware task offloading framework via network-assisted D2D collaboration,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3887–3901, Dec. 2016.
  • [13] X. Chen, “Decentralized computation offloading game for mobile cloud computing,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 4, pp. 974–983, Apr. 2015.
  • [14] T. Chen and G. B. Giannakis, “Bandit convex optimization for scalable and dynamic IoT management”, arXiv preprint arXiv:1707.09060, 2017.
  • [15] C. Tekin and M. van der Schaar, “An experts learning approach to mobile service offloading,” in Proc. Annu. Allerton Conf. Commun., Control, Comput., 2014, pp. 643–650.
  • [16] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Mach. Learn., vol. 47, no. 2, pp. 235–256, May 2002.
  • [17] A. Garivier and E. Moulines, “On upper-confidence bound policies for switching bandit problems,” in Proc. Int. Conf. Algorithmic Learn. Theory, Espoo, Finland, Oct. 2011, pp. 174–188.
  • [18] D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments. London, U.K.: Chapman & Hall, 1985.
  • [19] P. Joulani, A. Gyorgy, and C. Szepesvari, “Online learning under delayed feedback,” in Proc. Int. Conf. Mach. Learn., Atlanta, GA, USA, Jun. 2013, pp. 1453–1461.
  • [20] S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
  • [21] Z. Zhu, S. Jin, Y. Yang, H. Hu, and X. Luo, “Time reusing in D2D-enabled cooperative networks,” IEEE Trans. Wireless Commun., in press.

VI Appendix

Proof.

According to the definition of , it can be decomposed as

(25)

where is a particular function with respect to . The number of missing feedbacks when task- is offloaded can be defined as

(26)

Clearly, the number of missing feedbacks is no larger than , i.e. . According to Lemma 1 in [17], for any , the following inequality is derived:

(27)

Due to the fact that

(28)

we have

(29)

Let , we have

(30)

Let denote the number of breakpoints before time , and denote the set of “well offloaded” tasks. Mathematically, these tasks are defined as follows.

(31)

where the corresponding term denotes the number of tasks whose delay is poorly estimated. Because of this, the D-UCB policy may not offload tasks to the optimal node, which leads to the following bound:

(32)

Next, we need to upper-bound the last term in (32). There are three facts:
i) The event occurs if and only if the event occurs;
ii) ;
iii) .
Based on these facts, the following inequality is obtained:

(33)

Namely, when node- is tested enough times by the task node, the event only occurs under three circumstances: i) the delay of the optimal node is substantially overestimated; ii) the delay of node- is substantially underestimated; iii) both delay expectations, i.e. and , are close enough.

However, if is chosen appropriately, the event never occurs. Denote the minimal difference between the expected delay of node- and the expected delay of the best node- by , i.e.

(34)

Let , where . Recalling , we have

(35)

However, from the definition of we obtain:

(36)

which contradicts (35). Thus the third circumstance never occurs, which indicates that we only need to upper-bound the probabilities of the first two events. Define

(37)

where , then

(38)

Combining with the following two facts:

(39)

we obtain

(40)

Let

(41)

the inequality in (40) becomes

(42)

Defining , the following inequality can be deduced:

(43)

where the inequality holds due to Theorem 4 in [17]. We then further obtain:

(44)

Up to now, the expectation can be upper-bounded as

(45)

Assuming

(46)

we have

(47)

and

(48)

Then the following inequality holds: