I Introduction
Recent developments in vehicular applications such as autonomous driving, location-specific services, and various forms of mobile infotainment are pushing car manufacturers to equip vehicles with increasingly powerful onboard computing resources. In spite of this phenomenal growth in onboard computing capacity, it has recently been noticed that, most of the time, a huge array of onboard computing resources is chronically underutilized [1]. A series of recent papers [1, 2, 3] have put forth the vision of Vehicular Cloud Computing (VCC), which pools underutilized vehicular resources (including computing power, network connectivity, and storage) and rents them to vehicles on the road or to other customers, similar to the way resources are provisioned in a conventional cloud, yet a nontrivial extension of it. VCC is a paradigm shift from the Vehicular Ad Hoc Network (VANET), which includes Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Everything (V2X) communications. Though originally designed for emergency alerts and collision avoidance, VANET is now merging with intelligent transportation systems, leading to the advent of intelligent vehicular networks that build a ubiquitous vehicular communication environment using different protocols, e.g., dedicated short-range communication (DSRC) [4], long-term-evolution-vehicle (LTE-V) [5], and 5G technologies [6]. Armed with these components, vehicular networks are evolving into a connected group of smart vehicles.
With the pooled vehicular resources, the VCC operator follows the Vehicle-as-a-Resource (VaaR) model to provide computing services to end users (e.g., onboard passengers or pedestrians). Without loss of generality, we define the end user as the onboard equipment of a vehicle. In this case, the vehicles in a VCC system can be grouped into two categories: Server Vehicles (SeVs) and Task Vehicles (TaVs). SeVs have surplus computing resources and are therefore pooled by the VCC to provide computing service; (the onboard equipment of) TaVs have task requests that need to be offloaded for processing. The supply and demand of computing resources are matched by the VCC operator, e.g., a roadside unit (RSU), which collects task requests from TaVs and assigns them to SeVs. After the task computation, the RSUs also collect the results and return them to the TaVs. Fig. 1 shows an illustration of the considered scenario.
While VCC offers a basic framework for computing service provisioning, task scheduling policies still need to be carefully designed to guarantee the timeliness of task processing, given the increasing demand for real-time responses in vehicular applications. To capture this important feature, we assume the tasks in VCC are deadline-constrained (i.e., the task result must be returned to the TaV before a hard deadline, otherwise it becomes useless). Therefore, ensuring that tasks can be completed before their deadlines becomes the main concern. In this paper, we use the task replication technique to enhance the performance of VCC for deadline-constrained tasks. The key idea of task replication is to allow one task to be offloaded to multiple SeVs; the task is considered completed as long as one of the SeVs processes the task and feeds back the result before the deadline. In this way, the large number of vehicles can be exploited efficiently to provide satisfactory Quality of Service (QoS).
However, optimally deciding task replications for VCC faces special challenges. First of all, the service delay of a task exhibits large uncertainty due to unpredictable vehicle traces. For example, as shown in Fig. 1, the delay of the task result return depends on the locations of the TaV and SeVs, which decide whether inter-RSU data transmission is necessary. However, this location information is unknown/uncertain to the RSUs when the task replication decisions are made. How to deal with the uncertainty in vehicle mobility is the most critical issue for task replication in VCC systems. Second, VCC systems are extremely volatile: vehicles connect and disconnect at any time, and the role (TaV or SeV) of a vehicle changes frequently. This is very different from existing task scheduling strategies for conventional cloud computing, where the available servers are fixed in advance. The task replication policy for VCC must be carefully designed to work efficiently with the ever-changing system status. Third, the budget constraint is another critical issue that needs to be considered; otherwise, the VCC operator could simply assign a task to all available SeVs. Notice that whether a task is completed by a set of replications follows the At-Least-One rule, which exhibits diminishing returns. A good task replication policy should stop replicating smartly to ensure that replications are always beneficial for TaVs.
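The diminishing returns of the At-Least-One rule can be illustrated with a short sketch; the per-replication success probability below is purely illustrative:

```python
# Illustration of the At-Least-One rule: if each of k independent replications
# succeeds with probability p, the task is completed with probability
# 1 - (1 - p)^k, so each extra replication helps less than the previous one.
def completion_probability(p: float, k: int) -> float:
    """Probability that at least one of k independent replications succeeds."""
    return 1.0 - (1.0 - p) ** k

p = 0.5  # illustrative per-replication success probability
gains = [completion_probability(p, k + 1) - completion_probability(p, k)
         for k in range(4)]
# Marginal gains shrink geometrically: 0.5, 0.25, 0.125, 0.0625.
assert all(gains[i] > gains[i + 1] for i in range(len(gains) - 1))
```

Under a fixed per-replication cost, the shrinking marginal gain eventually falls below the cost, which is why a good policy stops replicating.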
In this paper, a novel learning algorithm, called DATE-V (Deadline-Aware Task rEplication for Vehicular cloud), is proposed for the replication of deadline-constrained tasks based on the Multi-armed Bandit (MAB) framework. We design a novel MAB algorithm, contextual-combinatorial MAB (CC-MAB), to address the special challenges of VCC systems. CC-MAB collects the context (side information) of computational tasks, TaVs, and SeVs, learns over time the completion probabilities of task replications with the collected contexts, and exploits the learned knowledge to select multiple SeVs for a task request. One salient feature of CC-MAB is that it is able to work with infinitely many vehicles and allows them to appear and disappear at any time. However, the sequential decision making in CC-MAB can easily be interrupted by the stochasticity of task arrivals, since new tasks may arrive at the RSU before the results of previous tasks are returned (formally termed delayed feedback in MAB problems). To better fit the practical application in VCC, CC-MAB is also extended to learn with delayed feedback. The key contributions of this paper are summarized as follows:
1) We first construct an RSU-assisted VCC system and formulate the deadline-constrained task replication problem as a submodular function maximization problem with a cardinality constraint. A greedy algorithm is designed to give an oracle solution by assuming that the completion probabilities (i.e., the probability that the task result is returned to the TaV before the deadline) of all possible task replications are known a priori.
2) The formulated task replication problem is then cast as a MAB problem. A learning algorithm, DATE-V, is developed within a novel MAB framework, contextual-combinatorial MAB (CC-MAB), which satisfies the special needs of VCC systems. The main advantage of CC-MAB is that it is able to learn efficiently with an infinitely large number of vehicles and ever-changing VCC systems. We analytically bound the loss due to learning, termed regret, of DATE-V compared to the oracle benchmark that knows precisely the completion probabilities of task replications a priori. A regret bound is first provided for non-delayed feedback by assuming that the rewards of task replications can be observed immediately, and is then extended to the delayed-feedback case. The regret upper bounds in both cases are sublinear, which implies that the proposed learning framework produces asymptotically optimal task replication decisions.
3) We carry out extensive simulations using the real-world mobility traces of San Francisco Yellow Cab taxis. The results show that the proposed DATE-V significantly outperforms the benchmark algorithms.
The rest of this paper is organized as follows. Section II reviews related work. Section III presents the system model and formulates the task replication problem. Section IV designs the learning algorithm DATEV and gives its performance guarantee. Section V evaluates the proposed algorithm via simulations, followed by the conclusion in Section VI.
II Related Work
Recent efforts have been made to investigate VCC systems. The authors of [7, 8] proposed to exploit the spare computing resources of parked cars for task offloading and cooperative sensing. Since the locations of parked cars do not change over time, the task offloading policies for parked cars are similar to those for static cloud servers, which have been well investigated [9, 10]. There exist works considering moving vehicles in VCC systems [11, 12], where the task scheduling process is assumed to be a Markov Decision Process (MDP). However, MDP-based approaches usually suffer from the curse of dimensionality and hence cannot be applied when the number of vehicles is large. By contrast, we solve the task replication problem within a MAB framework, which is a general learning framework and does not rely on additional assumptions on the traffic model or scheduling process. The most related work is probably [13], where the authors use the MAB framework to help make task offloading decisions. While [13] only considers a task offloading problem, we consider both task offloading and task replication for VCC systems. More importantly, the MAB algorithm proposed in [13] only works with a finite arm set. By contrast, the proposed CC-MAB framework is able to learn with an infinitely large arm set, which fits VCC systems.

A large body of work has focused on task replication policies in data retrieval [14] and multi-server data processing systems [15]. These task replication techniques are usually leveraged to deal with the straggler problem, where the service process has a heavy-tailed distribution. For example, in [14], the optimal replication degree, i.e., the number of replicas, is investigated. However, it is not applicable to the VCC system since the servers (vehicles) are not always available and may change over time.
MAB has been widely studied to address the critical tradeoff between exploration and exploitation in sequential decision making under uncertainty [16]. The basic MAB problem concerns learning the single optimal arm among a set of candidate arms with a priori unknown rewards by sequentially trying one arm each time and observing its realized noisy reward [17]. Combinatorial bandits extend the basic MAB by allowing multiple plays each time [18], and contextual bandits extend the basic MAB by considering context-dependent reward functions [19]. While combinatorial bandit and contextual bandit problems are each already much more difficult than the basic MAB problem, this paper tackles the even more difficult contextual-combinatorial MAB problem. Recently, a few other works [20, 21] have also started to study CC-MAB problems. However, these works make strong assumptions that are not suitable for VCC systems. For instance, [22, 20] assume that the reward of an individual action is a linear function of the contexts, and [21] assumes a fixed arm set. In our problem, the reward of a replication is unlikely to be a linear function of contexts and, more importantly, the arms may appear and disappear over time. Delayed feedback [23] is another important branch of the MAB family. It concerns the practical issue that the rewards of arms are not immediately available after the arms are pulled. This issue is also encountered when applying MAB in VCC systems, since transmission/computation delays are incurred to complete the tasks. However, most existing works [23, 24] on MAB with delayed feedback assume a fixed arm set and hence cannot be applied to our problem.
III System Model
III-A Vehicle Cloud and System Overview
We consider a Vehicular Cloud Computing (VCC) system where a set of Road Side Units (RSUs) is deployed along the main streets based on certain deployment rules, e.g., improving the overall network performance or maximizing the deployment distance [25]. The main functionality of an RSU is to receive tasks from TaVs and dispatch them to appropriate SeVs such that the task results can be returned to the TaVs before their deadlines. Consider an arbitrary RSU, and let $t \in \{1, 2, \dots, T\}$ index the sequence of TaVs' tasks received by the RSU. The procedure for completing a task is as follows:
1) TaV-to-RSU (T2R) task offloading: when a TaV issues a task request, it connects to a nearby RSU and offloads its task via the wireless connection. The data transmission between the TaV and the RSU can easily be achieved by existing V2I communication techniques, e.g., DSRC, LTE, and 5G.
2) RSU-to-SeV (R2S) task assignment: the RSU identifies the available SeVs based on the RSU-to-SeV link SINR condition. To ensure successful task transmission, the SINR at SeV $j$ should be greater than a threshold $\theta$:
$$\mathrm{SINR}_j = \frac{P\, d_j^{-\beta}}{\sigma^2 + I} \ge \theta, \qquad (1)$$
where $P$ is the transmission power of the RSU, $d_j$ denotes the distance between the RSU and SeV $j$, $\beta$ is the signal power decay exponent, $\sigma^2$ is the background noise power on the frequency channel, $I$ is the interference, and $\theta$ is the threshold, which depends on the wireless network design (a recommended threshold for vehicular communication is given in [26]).
3) Task processing: once an SeV receives a task, it processes the task with the computing resources on the vehicle. To simplify the system model, it is assumed that the SeV has computing resources immediately available to allocate for task processing; queuing of tasks at SeVs is not considered in this paper.
4) Result return: after the task is processed, the SeV needs to send the task result back to the TaV. At this time, we denote the RSU associated with the SeV as the S-RSU and the RSU associated with the TaV as the T-RSU. The SeV first transmits the task result to the S-RSU via the wireless connection, and the S-RSU then transmits the result to the T-RSU through the backhaul network. When the T-RSU receives the result, it sends the result to the TaV. Note that if the S-RSU and the T-RSU turn out to be the same RSU, the transmission between them is not performed.
In our paper, we consider that each task $t$ has a hard deadline requirement $\tau_t$. A task is completed if the TaV receives the task result before the deadline; otherwise it fails. The probability of task completion is subject to many uncertain factors, e.g., the wireless channel condition and the traces of moving vehicles. Note that whether an SeV can return the task result before the deadline is unknown to the RSU. To increase the completion probability of a task, we allow the RSU to send a TaV's task to multiple SeVs. We call each TaV-SeV pair a replication of the task and, with a slight abuse of notation, write the set of all possible replications of task $t$ as $\mathcal{M}_t$.
III-B Service Delay and Replication Quality
Each task $t$ is denoted by a tuple $\phi_t = (s_t, r_t, c_t, b_t, \tau_t)$, where $s_t$ (in bits) denotes the size of the task input data, $r_t$ (in bits) is the size of the task result, $c_t$ is the number of CPU cycles required to complete the task, $b_t$ is the budget (maximum number of replications) for the task, and $\tau_t$ is the deadline. A service delay is incurred to complete the task. Let $d_m$ denote the service delay of replication $m \in \mathcal{M}_t$. It consists of the following parts:
III-B1 T2R task transmission delay
The task is offloaded from the TaV to the RSU at a transmission rate $R^{\mathrm{t2r}}_t$. Therefore, the T2R transmission delay can be written as $d^{\mathrm{t2r}}_t = s_t / R^{\mathrm{t2r}}_t$. Note that $d^{\mathrm{t2r}}_t$ is actually revealed to the RSU by observing the timestamps of data packets defined by the Network Time Protocol (NTP).
III-B2 R2S task assignment delay
For simplicity of the system model, we assume that the transmission between the RSU and an SeV operates at a fixed transmission rate $R^{\mathrm{r2s}}$ by leveraging power/spectrum allocation strategies [27], so the R2S assignment delay is $d^{\mathrm{r2s}}_t = s_t / R^{\mathrm{r2s}}$. However, our algorithm is compatible with other R2S transmission models that do not give the transmission rate exactly. This is because the service delay is modeled as a gray box to the learning algorithm (discussed later in this section).
III-B3 Computation delay
Let $f_m$ be the available CPU frequency allocated by the SeV for task $t$. Then, the computation delay is simply $d^{\mathrm{cp}}_m = c_t / f_m$. Here, we assume that each SeV reports $f_m$ to the RSU in advance. Again, our algorithm is also able to work when $f_m$ is unknown a priori (discussed later).
III-B4 Result return delay
Let $d^{\mathrm{re}}_m$ be the result return delay of replication $m$. It consists of the following parts: 1) the transmission delay between the SeV and the S-RSU, $d^{\mathrm{s2r}}_m = r_t / R^{\mathrm{s2r}}_m$, where $R^{\mathrm{s2r}}_m$ is the transmission rate; 2) the backhaul transmission delay between the S-RSU and the T-RSU, $d^{\mathrm{bh}}_m = r_t / R^{\mathrm{bh}} + \mathrm{RTT}_m$, where $R^{\mathrm{bh}}$ is the backhaul transmission rate and $\mathrm{RTT}_m$ is the round trip time when sending back the result of replication $m$; if the S-RSU and T-RSU are the same, then $d^{\mathrm{bh}}_m = 0$; 3) the delay for transmitting the result from the T-RSU to the TaV, $d^{\mathrm{r2v}}_m = r_t / R^{\mathrm{r2v}}$, where $R^{\mathrm{r2v}}$ is the fixed transmission rate operated by the RSU for RSU-to-Vehicle data transmission. The result return delay of replication $m$ is obtained as $d^{\mathrm{re}}_m = d^{\mathrm{s2r}}_m + d^{\mathrm{bh}}_m + d^{\mathrm{r2v}}_m$.
Therefore, the total service delay of replication $m$ is $d_m = d^{\mathrm{t2r}}_t + d^{\mathrm{r2s}}_t + d^{\mathrm{cp}}_m + d^{\mathrm{re}}_m$. One can immediately see that the delay model for a replication is a "gray box" to the RSU operator: while some parts of the service delay are revealed to the RSU (e.g., $d^{\mathrm{t2r}}_t$, $d^{\mathrm{r2s}}_t$, and $d^{\mathrm{cp}}_m$), the result return delay $d^{\mathrm{re}}_m$ is unknown to the RSU due to the uncertainty in vehicle movement and backhaul network conditions. If the TaV receives the result of replication $m$ before the deadline, i.e., $d_m \le \tau_t$, then replication $m$ is considered successfully executed. We define the quality of replication $m$ as $q_m = \mathbb{1}\{d_m \le \tau_t\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function. Let $\bar{q}_m = \mathbb{E}[q_m]$ be the expected quality of replication $m$. Since $d^{\mathrm{t2r}}_t$, $d^{\mathrm{r2s}}_t$, and $d^{\mathrm{cp}}_m$ are known to the RSU, the expected quality of replication $m$ can be written as $\bar{q}_m = \Pr\{d^{\mathrm{re}}_m \le \tau_t - d^{\mathrm{t2r}}_t - d^{\mathrm{r2s}}_t - d^{\mathrm{cp}}_m\}$.
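The composition of the service delay can be sketched as follows; the variable names are our own shorthand rather than the paper's notation, and the point is that only the last component is unknown to the RSU when the replication decision is made:

```python
# Sketch of the per-replication service delay "gray box" (names assumed): the
# first three components are known to the RSU, while the result-return delay
# is uncertain until the result actually arrives.
from dataclasses import dataclass

@dataclass
class Replication:
    d_t2r: float     # TaV-to-RSU offloading delay (revealed via NTP timestamps)
    d_r2s: float     # RSU-to-SeV assignment delay (fixed-rate, known)
    d_comp: float    # computation delay, CPU cycles / allocated frequency
    d_return: float  # result-return delay (unknown when the decision is made)

def quality(rep: Replication, deadline: float) -> int:
    """Indicator that the result reaches the TaV before the hard deadline."""
    total = rep.d_t2r + rep.d_r2s + rep.d_comp + rep.d_return
    return 1 if total <= deadline else 0
```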
Remark: The service delay "gray box" can be changed to other configurations, e.g., RSUs do not use a fixed transmission rate or the SeVs do not report the CPU frequency allocated for a task. In this case, $d^{\mathrm{r2s}}_t$ and $d^{\mathrm{cp}}_m$ also become unknown, and the expected quality of replication $m$ can be written as $\bar{q}_m = \Pr\{d^{\mathrm{r2s}}_t + d^{\mathrm{cp}}_m + d^{\mathrm{re}}_m \le \tau_t - d^{\mathrm{t2r}}_t\}$. Our method is able to work even if the service delay is a "black box".
III-C Problem Formulation
For each task $t$, the RSU picks a subset of replications $\mathcal{S}_t \subseteq \mathcal{M}_t$ from all available replications for task $t$; we call this subset the replication decision for task $t$. The reward achieved by the selected replications in $\mathcal{S}_t$ is defined as:
$$u(\mathcal{S}_t) = \mathbb{1}\Big\{\textstyle\sum_{m \in \mathcal{S}_t} q_m \ge 1\Big\} - \eta\, |\mathcal{S}_t| \qquad (2)$$
The term $\eta\,|\mathcal{S}_t|$ in (2) captures the cost of the replication decision $\mathcal{S}_t$, where $\eta$ is the unit cost of one replication. By applying the At-Least-One probabilistic rule, we can write the expected reward of replication decision $\mathcal{S}_t$ as:
$$\bar{u}(\mathcal{S}_t) = 1 - \prod_{m \in \mathcal{S}_t} (1 - \bar{q}_m) - \eta\, |\mathcal{S}_t| \qquad (3)$$
where the first term in (3) denotes the probability that task $t$ is completed by at least one replication in $\mathcal{S}_t$. Consider an arbitrary sequence of $T$ tasks that arrive at an RSU. The RSU makes a task replication decision for each task, aiming to maximize the expected cumulative reward:
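As a concrete numeric illustration of (3) (the qualities and unit cost below are assumed, not from the paper): take two replications with expected qualities 0.8 and 0.6 and unit replication cost 0.1. Then

```latex
\bar{u}(\mathcal{S}_t) = 1 - (1 - 0.8)(1 - 0.6) - 0.1 \times 2
                       = 1 - 0.08 - 0.2 = 0.72 .
```

Adding a third replication of quality 0.6 would raise the completion probability only from 0.92 to 0.968 while costing another 0.1, dropping the net reward to 0.668, which illustrates why a good policy stops replicating.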
P1: $\displaystyle \max_{\{\mathcal{S}_t\}_{t=1}^{T}}\ \sum_{t=1}^{T} \bar{u}(\mathcal{S}_t)$  (4a)
s.t. $\mathcal{S}_t \subseteq \mathcal{M}_t, \quad \forall t$  (4b)
$\phantom{\text{s.t. }} |\mathcal{S}_t| \le b_t, \quad \forall t$  (4c)
where constraint (4c) indicates that the number of replications in $\mathcal{S}_t$ should not exceed the budget $b_t$ of task $t$. Problem P1 can be decoupled into $T$ independent subproblems, one for each task $t$, as follows:
P2: $\displaystyle \max_{\mathcal{S}_t \subseteq \mathcal{M}_t,\ |\mathcal{S}_t| \le b_t}\ \bar{u}(\mathcal{S}_t)$  (5)
The objective in P2 exhibits the property of submodularity: the total reward achieved by the selected replications is not a simple sum of individual qualities but demonstrates diminishing returns, as determined by the At-Least-One rule. The formal definition of submodularity is given below.
Definition 1 (Submodularity).
Let $\mathcal{M}$ be the universe replication set. For all subsets $\mathcal{S} \subseteq \mathcal{S}' \subseteq \mathcal{M}$ and any replication $m \in \mathcal{M} \setminus \mathcal{S}'$, if a reward function $u$ satisfies $u(\mathcal{S} \cup \{m\}) - u(\mathcal{S}) \ge u(\mathcal{S}' \cup \{m\}) - u(\mathcal{S}')$, then $u$ is submodular.
We assume for now that there is an omniscient oracle that knows the expected quality $\bar{q}_m$ of each possible replication $m$. Then P2 becomes a submodular function maximization problem with a cardinality constraint, a well-studied topic that can be efficiently solved by the greedy algorithm presented in Algorithm 1. To facilitate the solution presentation, we define the marginal reward of a replication:
Definition 2 (Marginal Reward).
Consider a task $t$, let $\mathcal{S} \subseteq \mathcal{M}_t$ be a subset of replications, and let $m \in \mathcal{M}_t \setminus \mathcal{S}$ be an available replication. The marginal reward of adding replication $m$ to $\mathcal{S}$ is defined as $\Delta(m \mid \mathcal{S}) = \bar{u}(\mathcal{S} \cup \{m\}) - \bar{u}(\mathcal{S})$.
The greedy algorithm works in an iterative manner. In each iteration, a replication is selected such that the marginal reward is maximized given the current selection. In the general case, the greedy algorithm guarantees no less than $(1 - 1/e)$ of the optimum in only polynomial runtime. However, for our problem P2, the greedy algorithm actually gives the optimal solution, as proved in Proposition 1.
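A minimal Python sketch of this greedy selection (our own rendering of Algorithm 1, with assumed names `q`, `eta`, `budget`): at each step it adds the replication with the largest marginal reward under the At-Least-One objective, and stops early once no replication is still beneficial:

```python
# Our own sketch of the oracle greedy selection. The objective is the
# expected reward 1 - prod_m (1 - q_m) - eta * |S| under the At-Least-One
# rule; q maps replication ids to their expected qualities.
from math import prod

def reward(qualities: list[float], eta: float) -> float:
    """Expected reward of a set of replications with the given qualities."""
    return 1.0 - prod(1.0 - q for q in qualities) - eta * len(qualities)

def greedy_replication(q: dict[str, float], budget: int, eta: float) -> set[str]:
    """Greedily add the replication with the largest marginal reward."""
    selected: set[str] = set()
    while len(selected) < budget:
        current = reward([q[m] for m in selected], eta)
        best, best_gain = None, 0.0
        for m in set(q) - selected:
            gain = reward([q[x] for x in selected | {m}], eta) - current
            if gain > best_gain:
                best, best_gain = m, gain
        if best is None:  # no remaining replication is still beneficial
            break
        selected.add(best)
    return selected
```

The early stop implements the "stop replicating smartly" behavior: with a nonzero unit cost, a low-quality replication whose marginal completion gain is below the cost is never added even if budget remains.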
Proposition 1 (Optimality of Greedy Algorithm).
For an arbitrary task $t$, the task replication decision for the $t$-th subproblem derived by the greedy algorithm is optimal.
Let $\mathcal{S}_t^*$ be the optimal replication decision for the per-slot problem of task $t$. The optimal solution for P1 is therefore $\{\mathcal{S}_t^*\}_{t=1}^{T}$. Since this optimal solution is obtained by an oracle, we call it the oracle solution. Let $\{\mathcal{S}_t\}_{t=1}^{T}$ be the replication decisions derived by a certain algorithm. The performance of this algorithm is evaluated by comparing its loss with respect to the oracle algorithm. This loss is called the regret of the algorithm, formally defined as
$$R(T) = \mathbb{E}\left[\sum_{t=1}^{T} u(\mathcal{S}_t^*) - \sum_{t=1}^{T} u(\mathcal{S}_t)\right] \qquad (6)$$
which is equivalent to $R(T) = \sum_{t=1}^{T} \big(\bar{u}(\mathcal{S}_t^*) - \bar{u}(\mathcal{S}_t)\big)$.
In the above, we discussed the oracle solution to P1 by assuming that the expected qualities of replications are known to the RSU. However, in a real VCC application it is difficult, if not impossible, to know the replication qualities precisely in advance, due to the uncertainty in vehicle movement and network conditions. In this case, the replication decisions cannot be derived by the greedy algorithm alone. In the next section, we cast the task replication problem into a contextual-combinatorial MAB (CC-MAB) framework, such that the RSU is able to learn the expected qualities of task replications over time by observing the contexts of replications, and then make smart replication decisions.
IV CC-MAB for Deadline-Aware Task Replication
Whether a replication can be completed depends on many factors, which are collectively referred to as context. For example, relevant factors can be task information (e.g., the data sizes of the task input and result affect the transmission delay), vehicle information (e.g., the speeds of the TaV and SeVs influence the vehicle locations when sending back the task results, and therefore determine whether inter-RSU transmission is necessary), and road conditions (e.g., high vehicle density causes high wireless transmission interference and therefore increases the transmission delay). This categorization is clearly not exhaustive, and the impact of each individual context dimension on the replication quality is unknown a priori. Our algorithm will learn to discover the underlying connection between such contexts and replication qualities, thereby facilitating task replication decision making.
IV-A Context-aware Task Replication
Let $\mathcal{X}^{\mathrm{task}}$ be the context space of tasks, which includes the task information (e.g., size of input/result data, deadline, etc.) and the TaVs' vehicle information (e.g., speed, location, and available computational resources). Let $\mathcal{X}^{\mathrm{sev}}$ be the context space of the SeVs' vehicle information. The RSU sets the joint space $\mathcal{X} = \mathcal{X}^{\mathrm{task}} \times \mathcal{X}^{\mathrm{sev}}$ as the context space of replications. The context space is assumed to be bounded and hence can be denoted as $\mathcal{X} = [0,1]^D$ without loss of generality, where $D$ is the number of dimensions of the context space $\mathcal{X}$. Since the service delay of a replication is now parameterized by its context $x \in \mathcal{X}$, we write the quality of a replication as $q(x)$ and its expected value as $\bar{q}(x)$. Let $\{\bar{q}(x)\}_{x \in \mathcal{X}}$ collect all the context-specific replication qualities.
Now, we are ready to formulate the task replication problem as a CC-MAB problem. For each task $t$, the RSU operates sequentially as follows: (i) upon receiving the task request, the RSU lists all possible replications $\mathcal{M}_t$ and observes the context $x_m \in \mathcal{X}$ of each replication $m \in \mathcal{M}_t$; let $X_t = \{x_m\}_{m \in \mathcal{M}_t}$ collect all replications' contexts. (ii) The RSU selects a subset of replications based on the observed contexts and the knowledge learned from previous tasks. (iii) The RSU sends the task replications to the selected SeVs and then collects the results when the task is processed. (iv) The RSU sends the task result back to the TaV and observes the qualities of the selected replications. The observed qualities are used to update the current knowledge. Yet, notice that the qualities of task $t$'s replications may not be observed before the arrival of task $t+1$ due to the transmission/computation delays incurred by VCC, which causes the problem of delayed feedback in CC-MAB. For ease of presentation and explanation, we assume for now that the qualities of the replications for task $t$ are observed before the arrival of the next task $t+1$, so that the feedback of CC-MAB is non-delayed. The more practical delayed-feedback case is discussed later in this section.
IV-B DATE-V with Non-delayed Feedback
DATE-V (Deadline-Aware Task rEplication for Vehicular cloud), presented in Algorithm 2, is developed based on the CC-MAB framework. In the initialization phase, DATE-V creates a partition $\mathcal{P}_T$ of the context space $\mathcal{X}$, which splits $\mathcal{X}$ into $(h_T)^D$ sets based on the given time horizon $T$. These sets are $D$-dimensional hypercubes of identical size $\frac{1}{h_T} \times \dots \times \frac{1}{h_T}$. Here, $h_T$ is a parameter to be designed that determines the number of hypercubes in $\mathcal{P}_T$. Additionally, the RSU keeps a counter $C_p(t)$ for each hypercube $p \in \mathcal{P}_T$, which records the number of selected replications with context falling in hypercube $p$ before receiving task $t$. Fig. 2 offers an illustration of the context partition and counter update. Moreover, the algorithm also keeps an estimated quality $\hat{q}_p(t)$ for each hypercube. Let $\mathcal{Q}_p(t)$ be the set of observed qualities of replications with context in $p$. Then, the estimated quality for replications with context in $p$ is $\hat{q}_p(t) = \frac{1}{|\mathcal{Q}_p(t)|}\sum_{q \in \mathcal{Q}_p(t)} q$.

For each task $t$, DATE-V performs the following steps. The contexts of all possible replications are observed. For each context $x_m$, the algorithm determines the hypercube $p_m \in \mathcal{P}_T$ such that $x_m \in p_m$ holds. The collection of these hypercubes for task $t$ is denoted by $\mathcal{P}_t$. Then the algorithm checks whether there exist hypercubes that have not been explored sufficiently often. For this purpose, we define the under-explored hypercubes for task $t$ as:
$$\mathcal{P}_t^{\mathrm{ue}} = \{p \in \mathcal{P}_t : C_p(t) < K(t)\} \qquad (7)$$
where $K(t)$ is a deterministic, monotonically increasing control function that needs to be designed for CC-MAB. In addition, we collect the replications that fall in the under-explored hypercubes in $\mathcal{M}_t^{\mathrm{ue}} = \{m \in \mathcal{M}_t : p_m \in \mathcal{P}_t^{\mathrm{ue}}\}$.
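The uniform partition and the context-to-hypercube mapping can be sketched as follows (the function name and the tuple representation of contexts and indices are our own):

```python
# Sketch of the uniform context-space partition used by the learner: [0,1]^D
# is split into h_T^D hypercubes of side 1/h_T, and each observed context is
# mapped to the integer index of the hypercube containing it.
def hypercube_index(context: tuple[float, ...], h_T: int) -> tuple[int, ...]:
    """Map a context in [0,1]^D to the index of its hypercube."""
    # min(...) keeps the boundary value 1.0 inside the last hypercube.
    return tuple(min(int(x * h_T), h_T - 1) for x in context)

# With h_T = 4 and D = 2 there are 16 hypercubes of side 0.25.
assert hypercube_index((0.0, 0.99), 4) == (0, 3)
assert hypercube_index((1.0, 0.5), 4) == (3, 2)
```

Counters and quality estimates are then simply keyed by these index tuples.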
Depending on the under-explored replications for task $t$, DATE-V is either in an exploration phase or an exploitation phase. If $\mathcal{M}_t^{\mathrm{ue}}$ is non-empty, DATE-V enters an exploration phase. Let $n_t = |\mathcal{M}_t^{\mathrm{ue}}|$ be the number of under-explored replications. If $\mathcal{M}_t^{\mathrm{ue}}$ contains at least $b_t$ replications ($n_t \ge b_t$), then DATE-V randomly selects $b_t$ replications from $\mathcal{M}_t^{\mathrm{ue}}$. If $\mathcal{M}_t^{\mathrm{ue}}$ contains fewer than $b_t$ replications ($n_t < b_t$), then DATE-V selects all replications from $\mathcal{M}_t^{\mathrm{ue}}$. Since the budget is not fully utilized, the remaining $b_t - n_t$ replications are picked using the greedy algorithm (Algorithm 1) with the estimated qualities:
$$\mathcal{S}_t = \mathcal{M}_t^{\mathrm{ue}} \cup \operatorname*{arg\,max}_{\mathcal{S} \subseteq \mathcal{M}_t \setminus \mathcal{M}_t^{\mathrm{ue}},\ |\mathcal{S}| \le b_t - n_t} \hat{u}(\mathcal{S}) \qquad (8)$$
where $\hat{u}(\mathcal{S})$ denotes the expected reward in (3) evaluated with the estimated qualities $\hat{q}_{p_m}(t)$ in place of $\bar{q}_m$, and $n_t = |\mathcal{M}_t^{\mathrm{ue}}|$. If $\mathcal{M}_t^{\mathrm{ue}}$ is empty, DATE-V enters an exploitation phase. It selects up to $b_t$ replications using the greedy algorithm with the estimated qualities:
$$\mathcal{S}_t = \operatorname*{arg\,max}_{\mathcal{S} \subseteq \mathcal{M}_t,\ |\mathcal{S}| \le b_t} \hat{u}(\mathcal{S}) \qquad (9)$$
After selecting the replications, DATE-V observes the qualities realized by the selected replications and then updates the estimated quality and the counter of each hypercube in $\mathcal{P}_t$. Note that the task index of the quality estimates and counters is dropped in the pseudocode (Line 15), since previous values of the counters and quality estimates do not need to be stored.
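One decision round with non-delayed feedback can be sketched as below. This is a deliberate simplification with assumed names: the exploitation step here ranks replications by their estimated hypercube quality, whereas the actual algorithm fills the remaining budget with the greedy procedure of Algorithm 1:

```python
# Simplified sketch of one DATE-V round: explore replications whose hypercubes
# have counters below the control function K(t); if budget remains, exploit by
# ranking the rest by estimated hypercube quality q_hat.
import random

def datev_select(replications, C, q_hat, t, budget, K):
    """replications: dict mapping replication id -> its context hypercube."""
    under = {m for m, p in replications.items() if C.get(p, 0) < K(t)}
    if len(under) >= budget:                      # pure exploration phase
        return random.sample(sorted(under), budget)
    selected = list(under)                        # explore all of them, then
    rest = sorted((m for m in replications if m not in under),
                  key=lambda m: q_hat.get(replications[m], 0.0), reverse=True)
    return selected + rest[:budget - len(selected)]   # exploit the estimates
```

After the round, the counters of the touched hypercubes are incremented and their quality estimates updated with the observed indicators.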
It remains to design the parameter $h_T$ and the control function $K(t)$ to achieve a regret that is sublinear in the time horizon $T$, i.e., $R(T) = o(T)$, such that DATE-V guarantees asymptotically optimal performance ($\lim_{T \to \infty} R(T)/T = 0$).
IV-C Parameter Design and Regret Analysis
In this section, we design the algorithm parameters $h_T$ and $K(t)$ and give a corresponding upper bound on the regret incurred by DATE-V. The regret analysis is carried out under the natural assumption that the expected qualities of arms are similar if they have similar contexts [21]. This assumption is formalized by the Hölder condition as follows:
Assumption 1 (Hölder Condition).
There exist $L > 0$ and $\alpha > 0$ such that for any two contexts $x, x' \in \mathcal{X}$, it holds that $|\bar{q}(x) - \bar{q}(x')| \le L \|x - x'\|^{\alpha}$, where $\|\cdot\|$ is the Euclidean norm.
Assumption 1 is needed for the regret analysis, but it should be noted that DATE-V can also be applied if this assumption does not hold; however, a regret bound might not be guaranteed in that case. Now, we set $h_T = \lceil T^{\frac{1}{3\alpha + D}} \rceil$ for the context space partition, and $K(t) = t^{\frac{2\alpha}{3\alpha + D}} \log t$ in each time slot for identifying the under-explored hypercubes and arms. Then, we have a sublinear regret upper bound for DATE-V:
Proposition 2 (Regret Upper Bound).
Let $h_T = \lceil T^{\frac{1}{3\alpha+D}} \rceil$ and $K(t) = t^{\frac{2\alpha}{3\alpha+D}} \log t$. If the Hölder condition holds, the regret is bounded by
$$R(T) = O\!\left(B^{\max}\, T^{\frac{2\alpha+D}{3\alpha+D}} \log T\right),$$
where $B^{\max}$ is the maximum possible budget for a task. The leading order of the regret is $O\!\left(T^{\frac{2\alpha+D}{3\alpha+D}} \log T\right)$, which is sublinear.
The regret upper bound given in Proposition 2 is sublinear in the number of tasks $T$. In addition, the bound is valid for any finite number of tasks, thereby providing a bound on the performance loss for any finite $T$. Therefore, this proposition can also be used to characterize the convergence speed of DATE-V.
IV-D DATE-V with Delayed Feedback
We have analyzed the performance of DATE-V with non-delayed feedback. However, the non-delayed feedback assumption can easily be violated in practice, since the RSU can observe the qualities of replications only after the task results are returned, by which time new task requests may already have arrived. Therefore, in the following, we analyze the performance of DATE-V with delayed feedback.
For an arbitrary task $t$, DATE-V keeps counters $C_p(t)$ that count the number of selected replications with context in hypercube $p$. Since the feedback is delayed, the number of observed qualities may be less than the number of selected replications. Therefore, we introduce a new counter $\tilde{C}_p(t)$ to record the number of observed qualities for replications with context in hypercube $p$ before receiving task $t$. Clearly, we have $\tilde{C}_p(t) \le C_p(t)$. Let $\tilde{\mathcal{Q}}_p(t)$ be the set of observed qualities; the estimated quality is now $\hat{q}_p(t) = \frac{1}{\tilde{C}_p(t)} \sum_{q \in \tilde{\mathcal{Q}}_p(t)} q$.
Now, we compare the performance of DATE-V under the non-delayed and delayed feedback cases by analyzing the exploration and exploitation phases separately. We first consider the exploration phase of DATE-V in the delayed-feedback setting. Whether DATE-V enters exploration for task $t$ is determined by the counters $C_p(t)$ and does not depend on the number of observed qualities. Therefore, the regrets incurred by exploration in the non-delayed and delayed feedback cases are the same. Next, we consider the exploitation phase of DATE-V with delayed feedback. For a task $t$, if its counters satisfy $C_p(t) \ge K(t)$ for all $p \in \mathcal{P}_t$, then DATE-V enters the exploitation phase. Due to the delayed feedback, there are two cases for the exploitation: i) the numbers of observed qualities satisfy $\tilde{C}_p(t) \ge K(t)$ for all $p \in \mathcal{P}_t$. In this case, though some qualities remain unobserved, the number of observed qualities is large enough; therefore, exploiting the estimated qualities guarantees the regret bound, as proved in the non-delayed feedback case. ii) The number of observed qualities satisfies $\tilde{C}_p(t) < K(t)$ for some $p \in \mathcal{P}_t$. Since the number of observed qualities is less than $K(t)$, using $\hat{q}_p(t)$ for task replication cannot guarantee the regret bound in the exploitation. We call an exploitation phase with $\tilde{C}_p(t) < K(t)$ a mis-exploitation. To bound the regret of DATE-V with delayed feedback, we only need to bound the extra regret incurred by mis-exploitation.
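The delayed-feedback bookkeeping can be sketched with the two counters (names assumed): `C[p]` counts replications selected in hypercube `p`, `C_obs[p]` counts qualities actually observed, and an exploitation round is a mis-exploitation when some involved hypercube has too few observations:

```python
# Sketch of the mis-exploitation test under delayed feedback. Exploitation is
# entered only when C[p] >= K(t) for every involved hypercube; it is a
# mis-exploitation if some hypercube has C_obs[p] < K(t) observed qualities.
def is_misexploitation(cubes, C, C_obs, t, K) -> bool:
    assert all(C[p] >= K(t) for p in cubes), "not in the exploitation phase"
    return any(C_obs[p] < K(t) for p in cubes)
```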
Proposition 3.
If DATE-V is run with the parameters given in Proposition 2, the regret due to mis-exploitation satisfies
$$R^{\mathrm{me}}(T) = O\!\left(\lambda \tau^{\max}\, T^{\frac{2\alpha+D}{3\alpha+D}} \log T\right),$$
where $\lambda$ is the task arrival rate and $\tau^{\max}$ is the maximum task deadline. The regret of DATE-V with delayed feedback is bounded by $R^{\mathrm{delay}}(T) \le R(T) + R^{\mathrm{me}}(T)$, where $R(T)$ is the regret upper bound of DATE-V with non-delayed feedback.
Proposition 3 shows that the regret of DATE-V with delayed feedback is the regret of DATE-V with non-delayed feedback plus an additional term that grows with the task arrival rate. Note that this additional term is still sublinear in $T$, which means the regret of DATE-V with delayed feedback remains sublinear. Moreover, the leading order of $R^{\mathrm{me}}(T)$ is the same as that of $R(T)$.
V Simulation
V-A Simulation Setup
Our simulation uses the mobility traces of San Francisco Yellow Cab taxis [29], which record the GPS coordinates of 550 cabs, logged approximately every 45 seconds, over a period of 30 days in the San Francisco Bay Area. These cab traces are used to simulate vehicle movement in the VCC system. We focus on a rectangular area bounded by fixed latitude/longitude coordinates. Fig. 3 depicts a portion of all cab traces in this area, which also shows the road layout. We deploy a total of 12 RSUs along the main roads. The distance between two neighboring RSUs is set to around 200 m, and the maximum coverage radius of an RSU is set to 300 m such that most vehicles in this area can access at least one RSU. One RSU is randomly selected as an example to run the proposed algorithm.
For simplicity, we assume the tasks from TaVs are of the same type, with fixed input data size (in Mb), task result size (in Mb), and required CPU cycles. The deadline of each task is randomly chosen from a fixed interval (in seconds). The RSU-to-Vehicle data transmission operates at a fixed rate of 3 Mbps. The Vehicle-to-RSU transmission rate is determined by the Shannon capacity, parameterized by the channel bandwidth (in MHz), a vehicle transmission power of 10 dBm, and the noise power (in dBm). The backhaul transmission rate and the round trip time for backhaul transmission are each chosen from fixed ranges. We collect the speeds and locations of TaVs/SeVs and the task deadlines as context. Fig. 4 depicts the impact of location and task deadline on the replication quality, showing that the quality of a replication is strongly related to its context.
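The Vehicle-to-RSU rate computation can be sketched as below; the bandwidth, noise, and path-loss arguments are placeholders, since we only reproduce the Shannon-capacity form, not the simulation's exact parameter values:

```python
# Shannon-capacity form of the Vehicle-to-RSU transmission rate: B * log2(1 +
# SNR), with the link budget expressed in dB. The numeric arguments used by
# callers are assumptions, not the simulation's exact parameters.
from math import log2

def v2r_rate_bps(bandwidth_hz: float, tx_power_dbm: float,
                 noise_dbm: float, path_loss_db: float) -> float:
    """Shannon capacity of the Vehicle-to-RSU link in bit/s."""
    snr_db = tx_power_dbm - path_loss_db - noise_dbm   # link budget in dB
    snr = 10.0 ** (snr_db / 10.0)                      # convert to linear SNR
    return bandwidth_hz * log2(1.0 + snr)
```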
DATEV is compared with the following benchmarks:
1) Oracle: Oracle knows precisely the expected quality of each replication before making task replication decisions. For each task, Oracle selects replications based on expected qualities using the greedy algorithm presented in Algorithm 1.
2) mLinUCB: LinUCB [19] is a contextual bandit algorithm that recommends exactly one arm in each round. To select multiple replications, mLinUCB repeats the LinUCB algorithm  times in each round. By sequentially removing selected replications, we ensure that the replications returned by mLinUCB are distinct for each task .
3) UCB: The UCB algorithm [17] is a classical MAB algorithm (non-contextual and non-combinatorial) that achieves a logarithmic regret bound. Similar to mLinUCB, we repeat UCB  times to select multiple replications for each task.
4) Random: The Random algorithm picks replications randomly from the available replications for each task .
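The construction shared by the mLinUCB and UCB benchmarks, repeating a single-play bandit and removing each selected arm so the chosen replications are distinct, can be sketched generically. Here `index_fn` stands in for the inner bandit's scoring rule (a LinUCB or UCB index); it is a placeholder, not an API from either paper.

```python
def select_k_distinct(arms, index_fn, k):
    """Repeat a single-play bandit k times, removing each selected arm
    so that the k returned replications are distinct."""
    remaining = list(arms)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=index_fn)  # inner bandit's recommendation
        chosen.append(best)
        remaining.remove(best)               # ensure distinctness
    return chosen
```

Note that this wrapper scores arms independently, which is precisely why these benchmarks cannot account for the submodular interaction between replications.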
V-B Performance Comparison
Fig. 5 shows the cumulative rewards achieved by DATEV and the four benchmarks. As expected, Oracle achieves the highest reward, which serves as an upper bound for the other algorithms. Among the others, the proposed algorithm significantly outperforms the benchmarks by taking the context of tasks and vehicles into account. The figure shows that the cumulative reward of DATEV is similar to that of the Random algorithm over the first 2,500 tasks. This is because the RSU does not yet have enough knowledge to link the replications’ contexts to their qualities, so it randomly explores the available replications, which is exactly what the Random algorithm does. After a period of exploration, the proposed algorithm is able to exploit the learned knowledge, and its cumulative reward begins to approach that of Oracle. The cumulative reward of the UCB algorithm is similar to that of the Random algorithm. The failure of UCB is mainly due to the large arm set (TaV-SeV pairs), which keeps UCB stuck in exploration. Further analyzing the rewards achieved by mLinUCB, we see that learning a separate context model for each possible replication is ineffective, again due to the large arm set. We also show the average reward per replication in Fig. 6. The average reward of a task replication achieved by Oracle stabilizes at around 0.25, while DATEV increases the average replication reward from 0.14 to 0.23. This means that DATEV learns context-specific replication qualities over time and, after sufficiently many tasks, selects replications almost as well as Oracle does.
V-C DATEV with Delayed Feedback
Fig. 7 shows the cumulative rewards achieved by DATEV with non-delayed and delayed feedback. In general, we see that delayed feedback does not incur significant performance loss. We also evaluate the effect of the task arrival rate in the delayed-feedback scenario. The simulation result is consistent with the theoretical analysis in Proposition 3: a higher task arrival rate leads to a larger regret.
V-D Impact of Budget
Fig. 8 depicts the cumulative rewards achieved by the 5 algorithms under different budgets. The cumulative rewards achieved by DATEV and Oracle grow with the budget, since more beneficial replications can be selected to maximize the reward. It is worth noting that the cumulative rewards saturate once the budget exceeds 3, since the proposed algorithm exploits the submodularity of the reward function and therefore stops smartly when the marginal reward is low. By contrast, UCB, mLinUCB, and Random always use the full budget and select replications without considering the submodular reward. Therefore, with a larger budget, these algorithms keep adding replications even when the marginal rewards become negative, which decreases the cumulative reward.
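The early-stopping behavior described above can be illustrated with a small greedy sketch. We assume, purely for illustration, a submodular reward of the form "completion probability minus a per-replication cost"; the paper's actual reward function (3) is not reproduced here, and the cost value is arbitrary.

```python
from math import prod

def greedy_replicate(qualities, budget, cost=0.15):
    """Greedy replication selection that stops before exhausting the
    budget once the best marginal reward is non-positive, in the spirit
    of Algorithm 1 (assumed reward: 1 - prod(1-q_i) minus cost per pick)."""
    selected = []
    remaining = dict(qualities)
    while remaining and len(selected) < budget:
        miss = prod(1.0 - qualities[s] for s in selected)  # P(no success yet)
        r, gain = max(((r, q * miss - cost) for r, q in remaining.items()),
                      key=lambda t: t[1])
        if gain <= 0:
            break  # submodularity: later marginals only get smaller
        selected.append(r)
        del remaining[r]
    return selected
```

With qualities {0.6, 0.5, 0.2, 0.1} and a budget of 4, the sketch stops after two replications because the third marginal gain is already negative, mirroring the saturation seen in Fig. 8.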
V-E Impact of Task Deadline
Fig. 9 shows the cumulative rewards achieved by Oracle, DATEV, and Random with different task deadlines. The cumulative rewards achieved by all three algorithms grow with the mean task deadline. The reason for this trend is intuitive: tasks are more likely to be completed if the deadline is loose. In addition, the gap in cumulative reward between DATEV and Random diminishes as the deadline increases. This is because most replications can be completed under loose deadlines, and hence the benefit of smart replication provided by DATEV decreases.
VI Conclusion
In this paper, we investigated task replication for deadline-constrained tasks in VCC systems. An RSU-assisted task scheduling framework is constructed, and a novel task replication algorithm, called DATEV, is proposed to guarantee the timeliness of task processing. DATEV addresses several key concerns in VCC systems. It uses side information (context) of tasks and vehicles to learn the completion probability of a replication under the uncertainty of vehicle movements. The combinatorial feature of DATEV allows multiple replications to be made for each task, which increases the completion probability of the task. DATEV is practical, easy to implement, and scalable to large vehicular networks, while achieving provably asymptotically optimal performance. Beyond task replication for VCC, our framework can also be applied to many other sequential decision-making problems under uncertainty that involve multiple plays given a limited budget and context information.
References
 [1] G. Yan, D. Wen et al., “Security challenges in vehicular cloud computing,” IEEE Trans. on Intelligent Transportation Systems, vol. 14, no. 1, pp. 284–294, 2013.
 [2] S. Arif, S. Olariu et al., “Datacenter at the airport: Reasoning about time-dependent parking lot occupancy,” IEEE Trans. on Parallel and Distributed Systems, vol. 23, no. 11, pp. 2067–2080, 2012.
 [3] M. Eltoweissy, S. Olariu et al., “Towards autonomous vehicular clouds,” in Int’l Conf. on Ad Hoc Networks. Springer, 2010, pp. 1–16.
 [4] J. B. Kenney, “Dedicated short-range communications (DSRC) standards in the United States,” Proc. of the IEEE, vol. 99, no. 7, pp. 1162–1182, 2011.
 [5] S. Chen, J. Hu et al., “LTE-V: A TD-LTE-based V2X solution for future vehicular network,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 997–1005, 2016.
 [6] C.-X. Wang, F. Haider et al., “Cellular architecture and key technologies for 5G wireless communication networks,” IEEE Communications Magazine, vol. 52, no. 2, pp. 122–130, 2014.
 [7] D. Eckhoff, C. Sommer et al., “Cooperative awareness at low vehicle densities: How parked cars can help see through buildings,” in Global Telecommunications Conf. (GLOBECOM 2011), 2011 IEEE. IEEE, 2011, pp. 1–6.
 [8] N. Liu, M. Liu et al., “Pva in vanets: Stopped cars are not silent,” in INFOCOM, 2011 Proc. IEEE. IEEE, 2011, pp. 431–435.
 [9] K. Kumar and Y.-H. Lu, “Cloud computing for mobile users: Can offloading computation save energy?” Computer, vol. 43, no. 4, pp. 51–56, 2010.
 [10] L. Chen, S. Zhou et al., “Energy efficient mobile edge computing in dense cellular networks,” in 2017 IEEE Int’l Conf. on Communications (ICC), May 2017, pp. 1–6.
 [11] K. Zheng, H. Meng et al., “An SMDP-based resource allocation in vehicular cloud computing systems,” IEEE Trans. on Industrial Electronics, vol. 62, no. 12, pp. 7920–7928, 2015.
 [12] Z. Jiang, S. Zhou et al., “Task replication for deadline-constrained vehicular cloud computing: Optimal policy, performance analysis, and implications on road traffic,” IEEE Internet of Things J., vol. 5, no. 1, pp. 93–107, 2018.
 [13] Y. Sun, X. Guo et al., “Learning-based task offloading for vehicular cloud computing systems,” arXiv preprint arXiv:1804.00785, 2018.
 [14] N. B. Shah, K. Lee et al., “When do redundant requests reduce latency?” IEEE Trans. on Communications, vol. 64, no. 2, pp. 715–722, 2016.
 [15] D. Wang, G. Joshi et al., “Using straggler replication to reduce latency in large-scale parallel computing,” ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 3, pp. 7–11, 2015.
 [16] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in applied mathematics, vol. 6, no. 1, pp. 4–22, 1985.
 [17] P. Auer, N. Cesa-Bianchi et al., “Finite-time analysis of the multi-armed bandit problem,” Machine Learning, vol. 47, no. 2-3, pp. 235–256, 2002.
 [18] Y. Gai, B. Krishnamachari et al., “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations,” IEEE/ACM Trans. on Networking (TON), vol. 20, no. 5, pp. 1466–1478, 2012.
 [19] L. Li, W. Chu et al., “A contextual-bandit approach to personalized news article recommendation,” in Proc. of the 19th Int’l Conference on World Wide Web. ACM, 2010, pp. 661–670.
 [20] S. Li, B. Wang et al., “Contextual combinatorial cascading bandits,” in Int’l Conf. on Machine Learning, 2016, pp. 1245–1253.
 [21] S. Müller, O. Atan et al., “Context-aware proactive content caching with service differentiation in wireless networks,” IEEE Trans. on Wireless Communications, vol. 16, no. 2, pp. 1024–1036, 2017.
 [22] L. Qin, S. Chen et al., “Contextual combinatorial bandit and its application on diversified online recommendation,” in Proc. of the 2014 SIAM Int’l Conf. on Data Mining. SIAM, 2014, pp. 461–469.
 [23] P. Joulani, A. Gyorgy et al., “Online learning under delayed feedback,” in Int’l Conf. on Machine Learning, 2013, pp. 1453–1461.
 [24] T. Desautels, A. Krause et al., “Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization,” The J. of Machine Learning Research, vol. 15, no. 1, pp. 3873–3923, 2014.
 [25] Y. Wang, J. Zheng et al., “Delivery delay analysis for roadside unit deployment in vehicular ad hoc networks with intermittent connectivity,” IEEE Trans. on Vehicular Technology, vol. 65, no. 10, pp. 8591–8602, 2016.

 [26] M. Andrews and M. Dinitz, “Maximizing capacity in arbitrary wireless networks in the SINR model: Complexity and game theory,” in INFOCOM 2009, IEEE. IEEE, 2009, pp. 1332–1340.
 [27] L. Liang, H. Peng et al., “Vehicular communications: A physical layer perspective,” IEEE Trans. on Vehicular Technology, vol. 66, no. 12, pp. 10 647–10 659, 2017.
 [28] Online appendix. [Online]. Available: https://www.dropbox.com/sh/lcwym6te48vv0f1/AADR1pY4ZKb3HGd2tInMeA73a?dl=0
 [29] M. Piorkowski, N. Sarafijanovic-Djukic et al., “CRAWDAD dataset epfl/mobility (v. 2009-02-24),” Downloaded from https://crawdad.org/epfl/mobility/20090224, Feb. 2009.

 [30] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” J. of the American Statistical Association, vol. 58, no. 301, pp. 13–30, 1963.
Appendix A Proof of Proposition 1
Proof.
The optimality of the greedy algorithm is due to a unique property of the submodular reward function (3), stated in the following lemma.
Lemma 1.
For any two replications , if  holds true, we have .
Proof.
This property can be easily verified from the definitions of the reward function and the marginal reward. ∎
We now consider a special case where the number of replications to select is fixed in advance, i.e., where  is a constant. In this case,  is a constant and the solution to problem P2 is to select the  replications with the highest expected qualities:
(10) 
In addition, the SeVs selected by the greedy algorithm can be rewritten as:
(11) 
From Lemma 1, we know that the two sequences  and  are identical. Next, we need to determine the number of replications that maximizes the reward. Note that the reward function can be written as a sum of marginal rewards: . Moreover, by the algorithm design, we have . Therefore, to maximize the reward, the greedy algorithm should stop at the th iteration if . We can now conclude that the greedy algorithm achieves the optimal solution for problem P2. ∎
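The telescoping step of the proof, writing the reward as a sum of marginal rewards with non-increasing marginals, can be sanity-checked numerically. We assume, for illustration only, a completion-probability reward U(S) = 1 - prod(1 - q_i); the paper's reward function (3) is not reproduced here, so this checks the structure of the argument rather than the exact function.

```python
from math import prod

def U(qs):
    """Assumed illustrative reward: probability that at least one
    replication with qualities qs completes."""
    return 1.0 - prod(1.0 - q for q in qs)

# Qualities in decreasing order, as produced by the greedy selection.
qs = [0.6, 0.5, 0.2]

# Marginal reward of adding the i-th replication to the first i-1.
marginals = [U(qs[:i + 1]) - U(qs[:i]) for i in range(len(qs))]

# Telescoping: the total reward equals the sum of marginal rewards.
total = sum(marginals)
```

Since the marginals shrink as replications are added, the greedy algorithm can safely stop at the first iteration whose marginal gain falls below the stopping threshold.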
Appendix B Proof of Proposition 2
Before proceeding, we first define some auxiliary variables. For each hypercube , we define  and  to be the best and worst expected qualities over all contexts . In some steps of the proof, we need to compare qualities at different positions within a hypercube. As a point of reference, we define the context at the (geometric) center of a hypercube as  and its expected quality as . Given the replication set , context set , and hypercube set  for each task , let
In addition, for a task , we define a replication set satisfying:
(12) 
The replication set  is used to identify subsets of replications that are bad choices. Let
(13) 
be the set of suboptimal subsets of arms for hypercube set , where  and  are parameters used only in the regret analysis. We call a subset of replications in  suboptimal for , since the sum of the worst expected rewards in  is at least an amount  higher than the sum of the best expected rewards for the subset . We call subsets in  near-optimal for . Here,  denotes the set of all subsets with fewer than  elements. Then, the expected regret can be divided into the following three summands:
(14) 
where the term  is the regret due to exploration phases, and the terms  and  both correspond to regret in exploitation phases:  is the regret due to suboptimal choices, i.e., when subsets of replications from  are selected;  is the regret due to near-optimal choices, i.e., when subsets of replications from  are selected. In the following, we prove that each of the three summands is bounded.
We first bound , which depends on the choice of the two parameters  and .
Lemma 2 (Bound for ).
Let and , where and . If the algorithm is run with these parameters, the regret is bounded by
(15) 
Proof of Lemma 2.
Suppose the algorithm enters the exploration phase for task , and let  be the hypercubes of the currently available replications. Then, based on the design of DATEV, the set of under-explored hypercubes is non-empty, i.e., there exists at least one replication with context , such that a hypercube satisfying  has . Clearly, there can be at most  exploration phases in which replications in  are selected due to under-exploration of . Since there are  hypercubes in the partition, there can be at most  exploration phases. Note that the maximum achievable reward of a replication decision is bounded by  and the minimum achievable reward is , where  is the maximum possible budget for a task. The maximum regret of a wrong replication selection in one exploration phase is thus bounded by . Therefore, we have
Using , it holds
(16) 
∎
Next, we bound . This bound also depends on the choice of the two parameters  and ; additionally, a condition on these parameters has to be satisfied.
Lemma 3 (Bound for ).
Let  and , where  and . If the algorithm is run with these parameters, Assumption 1 holds true, and the additional condition  is satisfied for all , where , then the regret  is bounded by
(17) 
Proof of Lemma 3.
For , let  be the event that slot  is an exploitation phase. By the definition of , in this case it holds that . Let  be the event that subset  is selected at  for task . Then, it holds that