Learning-Based Task Offloading for Vehicular Cloud Computing Systems

Vehicular cloud computing (VCC) has been proposed to effectively utilize and share the computing and storage resources of vehicles. However, due to vehicle mobility, the network topology, the wireless channel states and the available computing resources vary rapidly and are difficult to predict. In this work, we develop a learning-based task offloading framework using multi-armed bandit (MAB) theory. It enables vehicles to learn the potential task offloading performance of their neighboring vehicles with surplus computing resources, namely service vehicles (SeVs), and minimizes the average offloading delay. We propose an adaptive volatile upper confidence bound (AVUCB) algorithm and augment it with load-awareness and occurrence-awareness by redesigning the utility function of the classic MAB algorithms. The proposed AVUCB algorithm can effectively adapt to the dynamic vehicular environment, balance the exploration-exploitation tradeoff in the learning process, and converge quickly to the optimal SeV with a theoretical performance guarantee. Simulations under both a synthetic scenario and a realistic highway scenario show that the proposed algorithm achieves close-to-optimal delay performance.




I Introduction

Mobile devices are expected to support a vast variety of mobile applications. Many of them require a large amount of computation within a very short time, which the devices themselves cannot provide due to limited processing power and battery capacity. Mobile cloud computing (MCC) was thus proposed [1], enabling mobile devices to offload computation tasks to powerful remote cloud servers through the Internet. However, centralized cloud deployments introduce long data transmission latency, which cannot meet the requirements of emerging delay-sensitive applications such as augmented/virtual reality, connected vehicles and the Internet of Things. By deploying computing and storage resources at the network edge, mobile edge computing (MEC) can provide low-latency computing services [2]. A major challenge in the MEC system is task offloading, i.e., deciding when and where to offload computation tasks and how to allocate radio and computing resources, which has been widely investigated [3, 4, 5]. The ad hoc cloudlet has been proposed in [6] to make use of the computing resources of mobile devices through device-to-device communications [7].

Vehicles have huge potential to enhance edge intelligence. The global number of connected vehicles is increasing rapidly and is expected to reach around 250 million by 2020 [8]. Meanwhile, vehicles are equipped with increasing amounts of computing and storage resources [9]. To improve the utilization of vehicle resources, the concept of vehicular cloud computing (VCC) has been proposed [10], in which vehicles serve as vehicular cloud (VC) servers by sharing their surplus computing resources, and users such as other vehicles and pedestrians can offload computation tasks to them. The vehicles providing services are called service vehicles (SeVs), and the vehicles that offload their tasks are called task vehicles (TaVs). Existing architectures include the software-defined VC [11] and VANET-Cloud [12].

Compared with the MEC system, the highly dynamic vehicular environment brings more uncertainty to the VCC system. First, the topology of vehicular networks and the wireless channel states vary rapidly over time due to vehicle mobility. Second, the computing resources of SeVs are heterogeneous and fluctuate over time. These factors are typically difficult to predict, yet they significantly affect the delay performance of computation tasks. Furthermore, each TaV may be surrounded by multiple candidate SeVs, since the density of SeVs can be much higher than that of MEC servers, which makes it hard to estimate the performance of different SeVs for task offloading.

Existing papers have investigated task offloading algorithms in the VCC system. In [13], the VC and the remote cloud layer are jointly considered, and a centralized task offloading algorithm is proposed to minimize the average system cost related to delay, energy consumption and resource occupation. However, centralized control requires large signaling overhead for updating vehicular states, and the proposed algorithm has high complexity. A distributed task offloading algorithm with much lower complexity is proposed in [14] based on ant colony optimization; however, it still requires the exchange of vehicular states. To further overcome the uncertainties in the VCC system and improve service reliability, replicated task offloading is proposed in [15], in which task replicas are assigned to multiple SeVs at the same time.

In this work, we focus on the task offloading problem of TaVs in the VCC system, and propose a learning-based task offloading algorithm to minimize the average offloading delay. Our main contributions are summarized as follows:

1) We propose a learning-based distributed task offloading framework based on the multi-armed bandit (MAB) theory [16], which enables TaVs to learn the performance of SeVs and to make task offloading decisions individually, in order to obtain low delay without exchanging vehicular states.

2) We propose an adaptive volatile upper confidence bound (AVUCB) algorithm by redesigning the utility function of the classic MAB algorithms so that it adapts to the time-varying load and action space. The learning algorithm is augmented with both load-awareness and occurrence-awareness, so that AVUCB can effectively cope with the highly dynamic vehicular environment and balance the exploration-exploitation tradeoff in the learning process. We also prove that the performance loss is upper bounded.

3) Simulations are carried out under both a synthetic scenario and a realistic highway scenario, showing that the proposed algorithm achieves close-to-optimal delay performance.

The rest of this paper is organized as follows. The system model and problem formulation are described in Section II. The AVUCB algorithm is proposed in Section III, and its performance is analyzed in Section IV. Simulation results are provided in Section V, and Section VI concludes the paper.

II System Model and Problem Formulation

II-A System Overview

Fig. 1: An illustration of task offloading in the VCC system.

We consider a discrete-time VCC system in which moving vehicles are classified into two categories: SeVs and TaVs. SeVs are employed as vehicular cloud servers to provide computing services, while the on-board user equipments (UEs) of TaVs, such as smart phones and laptops, generate computation tasks that need to be offloaded for cloud execution. Note that the role of SeV or TaV is not fixed for each vehicle; it depends on whether the vehicle's computing resources are sufficient and shareable. For each TaV, the surrounding SeVs moving in the same direction within its communication range are considered as candidate servers. Here the moving direction, together with the vehicle ID, speed and location of each candidate SeV, can be made known to each TaV through vehicular communication protocols such as the beacons of the dedicated short-range communication (DSRC) standards [17].

According to the task offloading algorithm, each computation task is offloaded to one of the candidate SeVs and processed on this single SeV, without being further offloaded to other SeVs, MEC servers or the remote cloud. An exemplary VCC system is shown in Fig. 1, where TaV 1 discovers 3 candidate SeVs (SeV 1-3) in its neighborhood, and the current task is offloaded to and executed on SeV 3.

In this work, task offloading decisions are made in a distributed manner, i.e., each TaV makes its own task offloading decisions, in order to avoid large signaling exchange overhead. We focus on a representative TaV that moves on the road for a total of $T$ time periods. In time period $t$, the TaV generates a computation task, selects an SeV $n \in \mathcal{N}(t)$, offloads the task and then receives the computing result. Here, $\mathcal{N}(t)$ denotes the candidate SeV set that can provide computing services to the TaV in time period $t$. We assume that $\mathcal{N}(t) \neq \emptyset$ for $t = 1, \ldots, T$; otherwise the TaV can offload tasks to the MEC server or the remote cloud. Note that $\mathcal{N}(t)$ changes across time since vehicles are moving.

II-B Computation Task Offloading

The computation task generated in time period $t$ is described by three parameters: the input data size $x_t$ (in bits) that needs to be transmitted from the TaV to the SeV, the output data size $y_t$ (in bits) that is fed back from the SeV to the TaV, and the computation intensity $w_t$ (in CPU cycles per bit), i.e., the number of CPU cycles required to compute one bit of input data. The total number of CPU cycles required by the task in time period $t$ is then $x_t w_t$ [3].

For each candidate SeV $n \in \mathcal{N}(t)$, the maximum computation capability is denoted by $F_n$ (in CPU cycles per second), which is the maximum available CPU speed of its on-board server. Multiple computation tasks can be processed simultaneously using processor sharing, and the computation capability allocated to the considered TaV in time period $t$ is denoted by $f_{t,n}$. The computation delay of SeV $n$ is then

$$D^{(c)}_{t,n} = \frac{x_t w_t}{f_{t,n}}. \tag{1}$$

However, in a real system, $f_{t,n}$ may be unknown to the TaV in advance (this is discussed in detail in Section II-C).
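As a quick numeric check of the computation delay in (1), the sketch below divides the required CPU cycles (input size times computation intensity) by the allocated computation capability. The task size, intensity and CPU speed are hypothetical values chosen only for illustration, not parameters from the paper.

```python
def computation_delay(x_bits: float, w_cycles_per_bit: float, f_cycles_per_s: float) -> float:
    """Computation delay x * w / f, the form of (1)."""
    if f_cycles_per_s <= 0:
        raise ValueError("allocated computation capability must be positive")
    return x_bits * w_cycles_per_bit / f_cycles_per_s


# Hypothetical task: 0.6 Mbit of input, 1000 cycles/bit, 2 GHz allocated to this TaV
delay = computation_delay(0.6e6, 1000, 2e9)
```

Here 6e8 cycles at 2e9 cycles/s give a delay of 0.3 s.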

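In time period $t$, the uplink transmission rate between the TaV and each candidate SeV $n$ is denoted by $r^{(u)}_{t,n}$, which mainly depends on the uplink channel state $h^{(u)}_{t,n}$ between the TaV and SeV $n$ and the interference power $I^{(u)}_{t,n}$ at SeV $n$. Given the channel bandwidth $W$, the transmission power $P$ of the TaV and the noise power $\sigma^2$, the uplink transmission rate can be written as

$$r^{(u)}_{t,n} = W \log_2\left(1 + \frac{P h^{(u)}_{t,n}}{\sigma^2 + I^{(u)}_{t,n}}\right). \tag{2}$$

Similarly, the downlink transmission rate is given by

$$r^{(d)}_{t,n} = W \log_2\left(1 + \frac{P h^{(d)}_{t,n}}{\sigma^2 + I^{(d)}_{t}}\right), \tag{3}$$

where $h^{(d)}_{t,n}$ is the downlink channel state between SeV $n$ and the TaV, and $I^{(d)}_{t}$ is the interference at the TaV.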
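The uplink and downlink rates in (2) and (3) share the same Shannon-capacity form, so one helper covers both. This is a minimal sketch assuming that the SeV transmits the downlink with the same power as the TaV; the link-budget numbers are hypothetical.

```python
import math


def shannon_rate(bandwidth_hz: float, power_w: float, gain: float,
                 noise_w: float, interference_w: float) -> float:
    """Rate W * log2(1 + P*h / (sigma^2 + I)) in bit/s, the form of (2) and (3)."""
    sinr = power_w * gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical link budget: W = 10 MHz, P = 0.1 W, h = 1e-6, sigma^2 = 1e-8 W
r_up = shannon_rate(10e6, 0.1, 1e-6, 1e-8, interference_w=0.0)     # no interference at the SeV
r_down = shannon_rate(10e6, 0.1, 1e-6, 1e-8, interference_w=1e-8)  # interference at the TaV
```

Adding interference at the receiver lowers the SINR, so `r_down` comes out below `r_up` in this example.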

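Therefore, the total transmission delay for uploading the task to SeV $n$ and receiving the result feedback is

$$D^{(t)}_{t,n} = \frac{x_t}{r^{(u)}_{t,n}} + \frac{y_t}{r^{(d)}_{t,n}}. \tag{4}$$

Again, both $r^{(u)}_{t,n}$ and $r^{(d)}_{t,n}$ are unknown to the TaV in advance.

Finally, the sum offloading delay of SeV $n$ in time period $t$ is the computation delay plus the transmission delay:

$$D_{t,n} = D^{(c)}_{t,n} + D^{(t)}_{t,n}. \tag{5}$$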
II-C Problem Formulation

The TaV decides which SeV should serve each computation task, in order to minimize the average offloading delay. The problem is formulated as

$$\text{P1:}\quad \min_{a_1, \ldots, a_T}\ \frac{1}{T}\sum_{t=1}^{T} D_{t,a_t}, \tag{6}$$

where $a_t \in \mathcal{N}(t)$ is the index of the SeV selected for task offloading in time period $t$.

If the TaV knew the exact computation capability $f_{t,n}$ and the uplink and downlink transmission rates $r^{(u)}_{t,n}$, $r^{(d)}_{t,n}$ of all candidate SeVs before offloading the task in time period $t$, it would only need to calculate the sum delay $D_{t,n}$ for each $n \in \mathcal{N}(t)$ and select $a_t = \arg\min_{n \in \mathcal{N}(t)} D_{t,n}$. However, in real systems, the wireless channel states and the interference change rapidly due to vehicle movements, and the computing resource of each SeV is shared by multiple tasks. Thus the transmission rates $r^{(u)}_{t,n}$, $r^{(d)}_{t,n}$ and the computation capability $f_{t,n}$ vary quickly over time and are not easy to predict. On the other hand, if each TaV requested $f_{t,n}$, $r^{(u)}_{t,n}$ and $r^{(d)}_{t,n}$ from all candidate SeVs in each time period, the signaling overhead would be very high.
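To make the full-information benchmark concrete, the sketch below shows what the TaV would do if it knew $f_{t,n}$, $r^{(u)}_{t,n}$ and $r^{(d)}_{t,n}$ exactly. It evaluates the sum delay of every candidate SeV and picks the minimum. The function name and the candidate values are hypothetical, since this per-period oracle is exactly what the TaV cannot compute in practice.

```python
from typing import Dict, Tuple

# Hypothetical per-SeV knowledge: (allocated CPU speed f, uplink rate, downlink rate)
Link = Tuple[float, float, float]


def oracle_select(x_bits: float, y_bits: float, w: float,
                  candidates: Dict[int, Link]) -> Tuple[int, Dict[int, float]]:
    """Return the SeV with the smallest sum delay D_{t,n} and all computed delays."""
    delays = {
        n: x_bits * w / f + x_bits / r_up + y_bits / r_down
        for n, (f, r_up, r_down) in candidates.items()
    }
    best = min(delays, key=delays.get)
    return best, delays


best, delays = oracle_select(
    1e6, 1e5, 1000,
    {1: (3e9, 10e6, 10e6), 2: (4e9, 5e6, 5e6), 3: (6e9, 20e6, 20e6)},
)
```

SeV 3 wins here (about 0.22 s versus 0.44 s and 0.47 s), because it has both the fastest CPU share and the best links.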

Without prior knowledge of the transmission rates $r^{(u)}_{t,n}$, $r^{(d)}_{t,n}$ and the computation capability $f_{t,n}$ of each candidate SeV $n$, the TaV does not know which SeV performs best when making the current offloading decision. Therefore, in the following section we design a learning-based task offloading algorithm in which the TaV learns the delay performance of candidate SeVs based only on historical delay observations. That is, the offloading decision at time $t$ is based on the observed delay sequence $D_{1,a_1}, D_{2,a_2}, \ldots, D_{t-1,a_{t-1}}$, not on the exact values of $f_{t,n}$, $r^{(u)}_{t,n}$ and $r^{(d)}_{t,n}$ of any candidate SeV in the current time period $t$.

III Learning-Based Task Offloading Algorithm

In this section, we develop a learning-based task offloading algorithm that enables the TaV to learn the delay performance of candidate SeVs and minimizes the average offloading delay.

We assume that tasks have diverse input data sizes, but the ratio of output to input data size, as well as the computation intensity, are identical. This is a valid assumption when the offloaded tasks are of the same kind. Let $y_t / x_t = \alpha_0$ and $w_t = \omega_0$ for $t = 1, \ldots, T$. Define the bit offloading delay as

$$u_{t,n} = \frac{\omega_0}{f_{t,n}} + \frac{1}{r^{(u)}_{t,n}} + \frac{\alpha_0}{r^{(d)}_{t,n}}, \tag{7}$$

which is the sum delay of offloading one bit of input data to SeV $n$ in time period $t$, and reflects the comprehensive service capability of each candidate SeV. Therefore, the sum delay is

$$D_{t,n} = x_t u_{t,n}. \tag{8}$$

When making the offloading decision for the current task at time $t$, the TaV knows the input data size $x_t$. But for $n \in \mathcal{N}(t)$, the exact value of $u_{t,n}$ and its distribution are not known to the TaV a priori, and need to be learned.

Our task offloading problem can be formulated as an MAB problem. Specifically, the TaV is the player, and each candidate SeV corresponds to an action with an unknown loss distribution. The player makes sequential decisions on which action to take in order to minimize the average loss. The main challenge of the classic MAB problem is to balance the exploration-exploitation tradeoff: exploring different actions to learn good estimates of each distribution, while selecting the empirically best action as often as possible. The problem has been widely studied, and many algorithms with strong performance guarantees have been proposed, such as the upper confidence bound (UCB) based UCB1 and UCB2 algorithms [16]. The MAB framework has already been applied in wireless networks to learn unknown environments, e.g., for channel access [18] and mobility management [19].

Although our problem is similar to the classic MAB problem, we face two new challenges. First, the candidate SeV set $\mathcal{N}(t)$ changes across time due to the relative movements of vehicles, rather than having a fixed number of actions as in the classic MAB problem. SeVs may appear in and disappear from the communication range of the TaV unexpectedly, causing a volatile action space, and existing solutions cannot efficiently exploit the empirical information of the remaining SeVs. Second, the performance loss in each time period has equal weight in the classic MAB problem, whereas in our model the input data size $x_t$ of each task acts as a weighting factor on the offloading delay. Intuitively, the task offloading algorithm should explore more when $x_t$ is small and exploit more when $x_t$ is large, so that the cost of exploration is reduced.

To overcome these two challenges, we propose an Adaptive Volatile UCB (AVUCB) algorithm for task offloading, as shown in Algorithm 1. Parameter $\beta$ is a constant factor, $k_{t,n}$ is the number of tasks that have been offloaded to SeV $n$ up to time $t$, and $t_n$ records the occurrence time of each SeV $n$. Parameter $\tilde{x}_t$ is the normalized input data size within $[0, 1]$, defined as

$$\tilde{x}_t = \max\left\{\min\left\{\frac{x_t - x^-}{x^+ - x^-},\ 1\right\},\ 0\right\}, \tag{9}$$

where $x^+$ and $x^-$ are the upper and lower thresholds for normalizing $x_t$.
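The normalization in (9) is a clipped linear map. Inputs at or below the lower threshold map to 0, which gives full exploration weight. Inputs at or above the upper threshold map to 1, which gives pure exploitation. The thresholds used below (in kbit) are hypothetical.

```python
def normalized_load(x: float, x_low: float, x_high: float) -> float:
    """Clip (x - x_low) / (x_high - x_low) to [0, 1], the form of (9)."""
    if x_high <= x_low:
        raise ValueError("upper threshold must exceed lower threshold")
    return max(min((x - x_low) / (x_high - x_low), 1.0), 0.0)


# Hypothetical thresholds: x^- = 200 kbit, x^+ = 1000 kbit
small = normalized_load(100, 200, 1000)    # below x^-  -> 0.0
medium = normalized_load(600, 200, 1000)   # midway     -> 0.5
large = normalized_load(1200, 200, 1000)   # above x^+  -> 1.0
```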

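1:Input: $\omega_0$, $\alpha_0$ and $\beta$.
2:for $t = 1, \ldots, T$ do
3:     if any SeV $n \in \mathcal{N}(t)$ has not connected to the TaV then
4:         Connect to SeV $n$ once.
5:         Update $t_n \leftarrow t$, $k_{t,n} \leftarrow 1$, $\bar{u}_{t,n} \leftarrow D_{t,n}/x_t$.
6:     else
7:         Observe $x_t$.
8:         Calculate the utility function of each candidate SeV $n \in \mathcal{N}(t)$: $\hat{u}_{t,n} = \bar{u}_{t-1,n} - \sqrt{\frac{\beta(1-\tilde{x}_t)\ln(t-t_n)}{k_{t-1,n}}}$ (10)
9:         Offload the task to SeV: $a_t = \arg\min_{n \in \mathcal{N}(t)} \hat{u}_{t,n}$ (11)
10:         Observe delay $D_{t,a_t}$.
11:         Update $\bar{u}_{t,a_t} \leftarrow \frac{\bar{u}_{t-1,a_t} k_{t-1,a_t} + D_{t,a_t}/x_t}{k_{t-1,a_t} + 1}$.
12:         Update $k_{t,a_t} \leftarrow k_{t-1,a_t} + 1$.
13:     end if
14:end for
Algorithm 1 AVUCB Algorithm for Task Offloading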

Inspired by the volatile MAB [20] and opportunistic MAB [21] frameworks, our proposed AVUCB algorithm can effectively balance exploration and exploitation under variations of the candidate SeV set and the input data size. In Algorithm 1, Lines 3-5 are the initialization phase, in which the TaV connects to each newly appeared SeV once. Lines 7-12 represent the continuous learning phase. The utility function defined in (10) combines the empirical delay performance $\bar{u}_{t-1,n}$ and a padding function (the latter term). Compared with existing UCB algorithms, the padding function is redesigned by jointly taking into account the occurrence time $t_n$ of SeV $n$ and the input data size $x_t$ (the load). It thus dynamically adjusts the weight of exploration and brings occurrence-awareness and load-awareness to the algorithm. The task offloading decision is then made in Line 9, which is a minimum-seeking problem with computational complexity $O(|\mathcal{N}(t)|)$, where $|\mathcal{N}(t)|$ is the number of candidate SeVs in time period $t$.
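A compact sketch of Algorithm 1 follows. It implements the utility in (10) as the empirical mean bit delay minus the load- and occurrence-scaled padding, and it records $t_n$ at the first connection, which is our reading of Line 5. The class name, the toy environment (three static SeVs with hypothetical mean bit delays and uniform noise) and the thresholds are assumptions for illustration only. This is not the authors' simulation code.

```python
import math
import random


class AVUCB:
    """Minimal sketch of AVUCB (Algorithm 1) for delay minimization."""

    def __init__(self, beta: float, x_low: float, x_high: float):
        self.beta, self.x_low, self.x_high = beta, x_low, x_high
        self.t_occ = {}   # t_n: time of first connection (our reading of Line 5)
        self.k = {}       # k_n: number of tasks offloaded to SeV n
        self.u_bar = {}   # empirical mean bit offloading delay of SeV n

    def _norm(self, x: float) -> float:
        return max(min((x - self.x_low) / (self.x_high - self.x_low), 1.0), 0.0)

    def select(self, t: int, candidates, x: float):
        """Return (chosen SeV, whether this is an initialization connection)."""
        for n in candidates:                  # Lines 3-5: try each new SeV once
            if n not in self.k:
                return n, True
        x_tilde = self._norm(x)

        def utility(n):                       # Line 8: empirical mean minus padding
            pad = math.sqrt(self.beta * (1.0 - x_tilde)
                            * math.log(t - self.t_occ[n]) / self.k[n])
            return self.u_bar[n] - pad

        return min(candidates, key=utility), False   # Line 9: argmin

    def update(self, t: int, n, delay: float, x: float, first: bool) -> None:
        u = delay / x                         # observed bit offloading delay
        if first:
            self.t_occ[n], self.k[n], self.u_bar[n] = t, 1, u
        else:                                 # Lines 11-12: running mean and count
            k = self.k[n]
            self.u_bar[n] = (self.u_bar[n] * k + u) / (k + 1)
            self.k[n] = k + 1


# Hypothetical environment: static candidate set, mean bit delays in arbitrary units
random.seed(0)
means = {"SeV1": 1.0, "SeV2": 1.5, "SeV3": 2.0}
agent = AVUCB(beta=2.0, x_low=200.0, x_high=800.0)
counts = {n: 0 for n in means}
for t in range(1, 2001):
    x = random.uniform(100.0, 1000.0)
    n, first = agent.select(t, list(means), x)
    u = means[n] * random.uniform(0.8, 1.2)   # realized bit delay of the chosen SeV
    agent.update(t, n, x * u, x, first)
    counts[n] += 1
```

In this toy run, most tasks end up at the SeV with the lowest mean bit delay. Large inputs ($\tilde{x}_t$ close to 1) receive almost no exploration bonus, which is the load-aware behavior described above.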

IV Performance Analysis

In this section, we present the performance analysis of the proposed AVUCB algorithm. We first define an epoch as a duration in which the candidate SeV set remains the same. Let $B$ denote the total number of epochs during $T$ time periods, $\mathcal{N}_b$ the candidate SeV set in the $b$th epoch, and $t_b$, $t'_b$ the beginning and ending time periods of the $b$th epoch, with $t_1 = 1$ and $t'_B = T$. We assume that the bit offloading delay $u_{t,n}$ of each candidate SeV is i.i.d. over time and independent across SeVs, with expectation $\mu_n$. We will show later through simulations that AVUCB still works well without this assumption. In each epoch, let $a^*_b = \arg\min_{n \in \mathcal{N}_b} \mu_n$ and $\mu^*_b = \min_{n \in \mathcal{N}_b} \mu_n$.

Define the total learning regret as

$$R(T) = \mathbb{E}\left[\sum_{b=1}^{B}\sum_{t=t_b}^{t'_b} x_t\left(u_{t,a_t} - \mu^*_b\right)\right], \tag{12}$$

which is the expected loss in delay performance due to the lack of information about the service capabilities of the candidate SeVs. In the following subsections, we upper bound the learning regret of the AVUCB algorithm.
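Given the true mean bit delays, the quantity inside (12) can be evaluated for any realized decision sequence. Each period contributes $x_t(\mu_{a_t} - \mu^*_b)$, where $\mu^*_b$ is the best mean in the epoch containing $t$. The sketch below computes this pseudo-regret for a hypothetical four-period trace that spans two epochs. The function name and data are illustrative assumptions.

```python
from typing import Dict, Sequence


def pseudo_regret(xs: Sequence[float], chosen: Sequence[str],
                  mu: Dict[str, float], epoch_best: Sequence[float]) -> float:
    """Sum over t of x_t * (mu[a_t] - mu*_b), with epoch_best[t] = mu*_b for t's epoch."""
    return sum(x * (mu[a] - best) for x, a, best in zip(xs, chosen, epoch_best))


mu = {"A": 1.0, "B": 1.5, "C": 0.5}
# Epoch 1 (t = 1, 2): candidates {A, B}, best mean 1.0
# Epoch 2 (t = 3, 4): C appears, best mean 0.5
regret = pseudo_regret(xs=[1.0, 2.0, 1.0, 3.0],
                       chosen=["A", "B", "A", "C"],
                       mu=mu,
                       epoch_best=[1.0, 1.0, 0.5, 0.5])
```

Choosing B at $t = 2$ costs $2 \times 0.5$, and staying on A after C appears costs $1 \times 0.5$, for a total of 1.5.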

IV-A Regret Analysis under Identical Load

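Compared with the existing UCB algorithms [16], the AVUCB algorithm adds the occurrence time $t_n$ and the normalized input data size $\tilde{x}_t$ to the padding function. In this subsection, we first investigate the impact of the occurrence time by assuming that the load is not time-varying, i.e., tasks have identical input data size, with $x_t = x_0$ and $\tilde{x}_t = 0$ for $t = 1, \ldots, T$. The padding function in (10) then simplifies to $\sqrt{\beta \ln(t - t_n) / k_{t-1,n}}$, and the learning regret becomes $R(T) = x_0\,\mathbb{E}\left[\sum_{b=1}^{B}\sum_{t=t_b}^{t'_b}\left(u_{t,a_t} - \mu^*_b\right)\right]$.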
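Under identical load, occurrence-awareness is the only difference from a UCB1-style padding. The confidence term of an SeV grows with the time elapsed since that SeV appeared, not with the time since the start of the horizon. The sketch below compares the two counts for a hypothetical SeV that appeared at $t_n = 1000$ and has been used once by $t = 1010$. Setting $t_n = 0$ to mimic counting from the start is our illustrative assumption.

```python
import math


def padding(beta: float, x_tilde: float, t: int, t_n: int, k: int) -> float:
    """Padding sqrt(beta * (1 - x_tilde) * ln(t - t_n) / k) from the utility (10)."""
    return math.sqrt(beta * (1.0 - x_tilde) * math.log(t - t_n) / k)


occurrence_aware = padding(2.0, 0.0, 1010, 1000, 1)  # time counted from appearance
from_start = padding(2.0, 0.0, 1010, 0, 1)           # UCB1-style: time counted from t = 0
no_exploration = padding(2.0, 1.0, 1010, 1000, 1)    # maximum load: x_tilde = 1
```

The occurrence-aware term (about 2.15) is much smaller than the count-from-start term (about 3.72). With $\tilde{x}_t = 1$ the padding vanishes entirely, which is how the load term switches the algorithm to pure exploitation.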

Let , and . We first provide the upper bound of the learning regret within each epoch, as shown in Lemma 1.

Lemma 1.

Let , in the th epoch, the learning regret of AVUCB with identical load has an upper bound as follows:


See Appendix A. ∎

Then we can upper bound the learning regret over time periods in the following theorem.

Theorem 1.

Let , the total learning regret of AVUCB with identical load has an upper bound as follows:


See Appendix B. ∎

Theorem 1 implies that when tasks have equal input data size, the proposed AVUCB algorithm provides a bounded performance loss compared with the optimal solution in which the TaV knows a priori which SeV performs best. Specifically, the performance loss grows linearly with the number of epochs $B$ and logarithmically with $T$.

IV-B Impact of the Load

In this subsection, we show the impact of the load on the learning regret by considering a random and continuous input data size $x_t$. For simplicity, we focus on a single epoch and assume . Thus there exists a single best SeV with , , and the learning regret simplifies to .

Recall that the normalized input data size $\tilde{x}_t$ is defined in (9), where the upper and lower thresholds $x^+$ and $x^-$ should be carefully selected to balance exploration and exploitation. In the following theorem, $x^+$ and $x^-$ are selected such that and . Particularly, when , let if and if .

The learning regret under random and continuous input data size is shown in Theorem 2.

Theorem 2.

Let , with random continuous and , we have:

(1) With , the expected number of tasks offloaded to any SeV can be upper bounded as


(2) With , the learning regret


where is the expected under condition , , and .


See Appendix C. ∎

Theorem 2 shows that, within a single epoch, the learning regret grows logarithmically with $T$. This implies that when the input data size varies across time, the proposed AVUCB algorithm can still effectively balance exploration and exploitation by adjusting the normalized factor $\tilde{x}_t$, and provides a bounded deviation from the optimal solution.

V Simulations

In this section, we evaluate the average delay and learning regret of the proposed AVUCB algorithm through simulations. We start from a synthetic scenario, and then simulate a realistic highway scenario.

V-A Simulation under Synthetic Scenario

Simulations under the synthetic scenario are carried out in MATLAB. We consider one TaV and 5 SeVs that appear (and may also disappear) during $T$ time periods. The communication range . Within the TaV’s communication range, the distance between the TaV and each SeV ranges in , and changes randomly by to every time period. According to [22], the wireless channel state is modeled by an inverse power law , where is the distance between TaV and SeV and . Other default parameters are the computation intensity $\omega_0$, transmit power $P$, channel bandwidth $W$, noise power $\sigma^2$ and parameter $\beta$ in (10).
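The synthetic mobility and channel model can be sketched in a few lines. Each time period, the TaV-SeV distance takes a bounded random step, and the channel gain follows an inverse power law in that distance. The paper's actual range, step size, path-loss exponent and reference gain are not recoverable here, so every number and helper name below is a hypothetical placeholder.

```python
import random


def channel_gain(distance_m: float, a0: float = 1.0, exponent: float = 2.0) -> float:
    """Inverse power-law channel h = a0 * d^(-exponent); a0 and exponent are hypothetical."""
    return a0 * distance_m ** (-exponent)


def next_distance(distance_m: float, d_min: float, d_max: float,
                  max_step: float, rng: random.Random) -> float:
    """Random step in [-max_step, max_step], clipped to [d_min, d_max] (hypothetical rule)."""
    proposal = distance_m + rng.uniform(-max_step, max_step)
    return min(max(proposal, d_min), d_max)


rng = random.Random(1)
d, trace = 50.0, []
for _ in range(100):
    d = next_distance(d, 10.0, 100.0, 5.0, rng)
    trace.append(channel_gain(d))
```

With distances clipped to [10, 100] m and exponent 2, every gain in `trace` stays between 1e-4 and 1e-2.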

We first evaluate the effect of the occurrence time in the proposed AVUCB algorithm by assuming that all tasks have equal input data size $x_t = x_0$, and thus $\tilde{x}_t = 0$ for $t = 1, \ldots, T$. The whole duration is divided into 3 epochs, each having time periods. The indices of the candidate SeVs and their maximum computation capabilities are shown in Table I. In epoch 2, two SeVs indexed by and appear, while in epoch 3, a new SeV appears and SeV disappears. In each time period, the allocated computation capability $f_{t,n}$ is randomly distributed from to .

Index of SeV 1 2 3 4 5
$F_n$ (GHz) 3 4 6 5 2
Epoch 1
Epoch 2
Epoch 3
TABLE I: Candidate SeVs and Maximum Computation Capability

Fig. 2(a) shows the learning regret of the proposed AVUCB algorithm and the existing UCB1 algorithm under diverse SeV occurrence times and identical load. In the first epoch, the two algorithms perform the same, since the occurrence time of all SeVs is 1. From the second epoch on, by taking the occurrence time into account, AVUCB learns the performance of the newly appeared SeVs faster, while effectively making use of the information about the remaining SeVs. Compared with the UCB1 algorithm, it can reduce about of the learning regret. The average delay of each epoch is shown in Fig. 2(b), in which the average delay of AVUCB converges to the optimal delay faster than that of UCB1.

(a) Learning regret
(b) Average delay
Fig. 2: Performance of AVUCB under diverse SeV occurrence time and identical load.

We then focus on a single epoch with , and evaluate the effect of the normalized input data size $\tilde{x}_t$ under random load. The candidate SeV set and the maximum computation capabilities are the same as in epoch of Table I. The input data size $x_t$ is uniformly distributed within . The upper and lower thresholds are set to and , respectively. As shown in Fig. 3, AVUCB achieves similar learning regret under the two sets of thresholds, since both settings can effectively adjust the weight of exploration and exploitation. In contrast, the UCB1 algorithm suffers much higher learning regret without load-awareness, since it may still explore even when the input data size is large.

Fig. 3: Performance of AVUCB under random load.

V-B Simulation under Realistic Highway Scenario

In this subsection, we simulate a realistic highway scenario and emulate the traffic flow using Simulation of Urban MObility (SUMO) (http://www.sumo.dlr.de/userdoc/SUMO.html). We use a stretch of the two-lane G6 Highway in Beijing with two ramps, obtained from OpenStreetMap (OSM) (http://www.openstreetmap.org/). The network consists of SeVs and TaVs. Each vehicle has an equal probability of reaching a maximum speed of or . In each time period, the probability of generating a vehicle from each ramp is , and whenever a vehicle approaches a ramp, it leaves the highway with probability .

The locations of vehicles are simulated by SUMO, from which we calculate the distance between each TaV and SeV in each time period. For each TaV, the SeVs within its communication range are considered candidate SeVs. We then focus on a single TaV moving through the highway over $T$ time periods. The occurrence and departure times of each candidate SeV, and its maximum computation capability, are listed in Table II. As above, the allocated computation capability $f_{t,n}$ is randomly distributed from to , and the input data size $x_t$ is uniformly distributed within . Let .

Index of SeV 1 2 3 4 5
Occurrence time 1 1 1 118 320
Departure time 400 400 400 400 343
$F_n$ (GHz) 3 2 2.5 4.5 3.5
TABLE II: Candidate SeVs and Maximum Computation Capability

We compare the average delay performance of the proposed AVUCB algorithm with 4 other algorithms: 1) UCB1 [16], which considers neither the occurrence time nor the input data size; 2) VUCB1 [20], which is occurrence-aware but not load-aware; 3) a naive Random Policy, in which the TaV randomly selects an SeV in each time period; 4) the Optimal Policy, in which the TaV knows the performance of each candidate SeV in advance and selects the optimal one.
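The two non-learning baselines bracket the achievable delay. The sketch below simulates them on a hypothetical static candidate set. Here, "knows the performance in advance" is interpreted as knowing the mean bit delays, which is our assumption. Random selection pays the average of the means, while the oracle always pays the smallest one.

```python
import random
from typing import Dict, Tuple


def simulate_baselines(T: int, means: Dict[int, float],
                       x_range: Tuple[float, float], seed: int = 0) -> Tuple[float, float]:
    """Average delay of a Random Policy and a mean-aware Optimal Policy (toy model)."""
    rng = random.Random(seed)
    best = min(means, key=means.get)           # oracle knows the mean bit delays
    rand_sum = opt_sum = 0.0
    for _ in range(T):
        x = rng.uniform(*x_range)              # input data size of this task
        u = {n: m * rng.uniform(0.8, 1.2) for n, m in means.items()}  # realized bit delays
        rand_sum += x * u[rng.choice(sorted(means))]
        opt_sum += x * u[best]
    return rand_sum / T, opt_sum / T


avg_rand, avg_opt = simulate_baselines(1000, {1: 1.0, 2: 1.5, 3: 2.0}, (100.0, 1000.0))
```

A learning policy such as AVUCB should land between these two numbers and approach `avg_opt` as it learns.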

Fig. 4: Average delay performance of AVUCB algorithm under a realistic highway scenario.

Fig. 4 shows the average delay of the 5 algorithms. Our proposed AVUCB algorithm achieves close-to-optimal delay performance and outperforms the other learning and random algorithms. This is because, by introducing the occurrence time of each SeV and the normalized input data size, AVUCB is both occurrence-aware and load-aware, and can effectively balance exploration and exploitation in the learning process.

VI Conclusions

In this work, we have studied the task offloading problem in the VCC system and proposed a learning-based AVUCB algorithm that minimizes the average offloading delay based only on historical delay observations. The proposed algorithm has low complexity and is easy to implement with low signaling overhead. We have extended the classic MAB algorithms to be both load-aware and occurrence-aware by taking into account the input data size of tasks and the occurrence time of SeVs in the utility function. Therefore, AVUCB can effectively adapt to the highly dynamic vehicular environment and balance the exploration-exploitation tradeoff in the learning process with performance guarantees. Simulations under both a synthetic scenario and a realistic highway scenario have shown that the proposed algorithm achieves close-to-optimal delay performance. Future research directions include a heterogeneous cloud architecture combining VCC, MEC and the remote cloud, as well as cooperation among SeVs.


This work is sponsored in part by the National Natural Science Foundation of China (No. 91638204, No. 61571265, No. 61621091), NSF through grants CNS-1547461, CNS-1718901 and CCF-1423542, and the Intel Collaborative Research Institute for Mobile Networking and Computing.

Appendix A Proof of Lemma 1

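The learning regret in the $b$th epoch can be written as

$$R_b = x_0 \sum_{n \in \mathcal{N}_b}\left(\mu_n - \mu^*_b\right)\mathbb{E}\left[k_{b,n}\right], \tag{17}$$

where $k_{b,n}$ is the number of tasks offloaded to SeV $n$ in epoch $b$. Following the proof of Lemma 1 in [20] and Theorem 1 in [16], when , the expectation of $k_{b,n}$ can be upper bounded by


Substituting (18) into (17), we prove Lemma 1: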


Appendix B Proof of Theorem 1

According to Lemma 1, the learning regret of the $b$th epoch is


By summing over $b = 1, \ldots, B$, the total learning regret is


Appendix C Proof of Theorem 2

When and , the utility function in (10) is


The offloading decision in (11) can be written as:


The learning regret can be written as


Since , and , our problem is equivalent to the problem defined in Section II in our previous work [21]. By leveraging Lemma 7, Theorem 3 and Appendix C.2 in [21], we can get Theorem 2.


  • [1] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloud computing: Architecture, applications, and approaches,” Wireless Commun. Mobile Comput., vol. 13, no. 18, pp. 1587-1611, 2013.
  • [2] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing: A key technology towards 5G,” ETSI White Paper No.11, vol. 11, 2015.
  • [3] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322-2358, 2017.
  • [4] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, Mar. 2016.
  • [5] T. Zhao, S. Zhou, X. Guo, and Z. Niu, “Tasks scheduling and resource allocation in heterogeneous cloud for delay-bounded mobile edge computing,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017.
  • [6] M. Chen, Y. Hao, Y. Li, C. F. Lai, and D. Wu, “On the computation offloading at ad hoc cloudlet: architecture and service modes,” IEEE Commun. Mag., vol. 53, no. 6, pp. 18-24, Jun. 2015.
  • [7] L. Wang and G. L. Stuber, “Pairing for resource sharing in cellular device-to-device underlays,” IEEE Netw., vol. 30, no. 2, pp. 122-128, Mar. 2016.
  • [8] A. Velosa, J. F. Hines, L. Hung, E. Perkins, and R. M. Satish, “Predicts 2015: The internet of things,” Gartner Inc., Dec. 2014.
  • [9] S. Abdelhamid, H. Hassanein, and G. Takahara, “Vehicle as a resource (VaaR),” IEEE Netw., vol. 29, no. 1, pp. 12-17, Feb. 2015.
  • [10] M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya, “A survey on vehicular cloud computing,” J. Netw. Comput. Appl., vol. 40, pp. 325-344, Apr. 2014.
  • [11] J. S. Choo, M. Kim, S. Pack, and G. Dan, “The software-defined vehicular cloud: A new level of sharing the road,” IEEE Veh. Technol. Mag., vol. 12, no. 2, pp. 78-88, Jun. 2017.
  • [12] S. Bitam, A. Mellouk, and S. Zeadally, “VANET-cloud: a generic cloud computing model for vehicular ad hoc networks,” IEEE Wireless Commun., vol. 22, no. 1, pp. 96-102, Feb. 2015.
  • [13] K. Zheng, H. Meng, P. Chatzimisios, L. Lei, and X. Shen, “An SMDP-based resource allocation in vehicular cloud computing systems,” IEEE Trans. Ind. Electron., vol. 62, no. 12, pp. 7920-7928, Dec. 2015.
  • [14] J. Feng, Z. Liu, C. Wu, and Y. Ji, “AVE: autonomous vehicular edge computing framework with ACO-based scheduling,” IEEE Trans. Veh. Technol., vol. 66, no. 12, pp. 10660-10675, Dec. 2017.
  • [15] Z. Jiang, S. Zhou, X. Guo, and Z. Niu, “Task replication for deadline-constrained vehicular cloud computing: Optimal policy, performance analysis and implications on road traffic,” IEEE Internet Things J., to be published.
  • [16] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, no. 2-3, pp. 235-256, 2002.
  • [17] J. B. Kenney, “Dedicated short-range communications (DSRC) standards in the United States,” Proceedings of the IEEE, vol. 99, no. 7, pp. 1162-1182, Jul. 2011.
  • [18] L. Chen, S. Iellamo, and M. Coupechoux, “Opportunistic Spectrum Access with Channel Switching Cost for Cognitive Radio Networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Kyoto, Japan, Jun. 2011.
  • [19] Y. Sun, S. Zhou, and J. Xu, “EMM: Energy-Aware Mobility Management for Mobile Edge Computing in Ultra Dense Networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 11, pp. 2637-2646, Nov. 2017.
  • [20] Z. Bnaya, R. Puzis, R. Stern, and A. Felner, “Social network search as a volatile multi-armed bandit problem,” HUMAN, vol. 2, no. 2, pp. 84-98, 2013.
  • [21] H. Wu, X. Guo, and X. Liu, “Adaptive exploration-exploitation tradeoff for opportunistic bandits.” [Online]. Available: https://arxiv.org/abs/1709.04004.
  • [22] M. Abdulla, E. Steinmetz, and H. Wymeersch, “Vehicle-to-vehicle communications with urban intersection path loss models,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Washington, DC, USA, Dec. 2016.