I Introduction
By deploying computing resources at the edge of the network, mobile edge computing (MEC) can provide low-latency, high-reliability computing services for mobile devices [2, 3]. A major problem in MEC is task offloading, i.e., deciding whether or not to offload each task and how to manage radio and computing resources to execute tasks. This problem has been widely investigated recently; see surveys [4, 5, 6] and technical papers [7, 8, 9].
To support autonomous driving and a vast variety of onboard infotainment services, vehicles are equipped with substantial computing and storage resources. It is forecast that each self-driving car will have computing power of Dhrystone million instructions executed per second (DMIPS) in the near future [10], which is tens of times that of current laptops. Vehicles and infrastructure such as road side units (RSUs) can contribute their computing resources to the network. This forms the Vehicular Edge Computing (VEC) system [11, 12, 13], which can process computation tasks from vehicular driving systems, onboard mobile devices and pedestrians for various applications.
In this paper, we focus on task offloading among vehicles, i.e., the driving systems or passengers of some vehicles generate computation tasks, while other surrounding vehicles can provide computing services. We call the vehicles that require task offloading task vehicles (TaVs), and the vehicles that can help to execute tasks service vehicles (SeVs). We design a distributed task offloading algorithm to minimize the average delay, where the task offloading decision is made by each TaV individually.
Multiple SeVs might be available to process each task, and a key challenge is the lack of accurate state information of SeVs in the dynamic VEC environment. The network topology and the wireless channel states vary rapidly due to the movements of vehicles [14], and the computation workloads of SeVs fluctuate across time. These factors are difficult to model or predict, so the TaV does not know a priori which SeV offers the lowest delay.
Our solution is learning while offloading, i.e., the TaV is able to learn the delay performance while offloading tasks. To be specific, we adopt the multi-armed bandit (MAB) framework to design our task offloading algorithm [15]. The classical MAB problem aims at balancing the exploration-exploitation tradeoff in the learning process: to explore different candidate actions so as to obtain good estimates of their reward distributions, while exploiting the learned information to select the empirically optimal actions. Upper confidence bound (UCB) based algorithms, such as UCB1 and UCB2, have been proposed with strong performance guarantees [15], and applied to wireless networks to learn unknown environments [16, 17, 18]. However, in our task offloading problem, the movements of vehicles lead to a dynamic candidate SeV set, and the workload of each task is time-varying, leading to a varying cost of exploring suboptimal actions. These factors have not been addressed by existing MAB schemes, which motivates us to adapt the MAB framework specifically to the vehicular task offloading scenario. Our key contributions include:
1) We propose an adaptive learning-based task offloading (ALTO) algorithm based on MAB theory, in order to guide the task offloading of TaVs and minimize the average offloading delay. The ALTO algorithm works in a distributed manner and enables the TaV to learn the delay performance of candidate SeVs while offloading tasks. The proposed algorithm has low computational complexity, and does not require vehicles to exchange accurate state information such as channel states and computing workloads, so it is easy to implement in a real VEC system.
2) The proposed ALTO algorithm is augmented with two kinds of adaptivity, input-awareness and occurrence-awareness, by adjusting the exploration weight according to the workloads of tasks and the appearance time of SeVs. Different from our previous theoretical work [19], which only considers time-varying workloads of tasks with fixed actions, we consider a more general case with dynamic candidate SeVs (actions), and prove that ALTO can effectively balance exploration and exploitation in the dynamic vehicular environment with sublinear learning regret.
3) Extensive simulations are carried out under a synthetic scenario, as well as a realistic highway scenario using the system-level simulator Veins. Results illustrate that our proposed algorithm achieves low delay, and provide guidelines for the settings of key design parameters.
The rest of this paper is organized as follows. We introduce the related work in Section II. The system model and problem formulation are introduced in Section III, and the ALTO algorithm is then proposed in Section IV. The learning regret is analyzed in Section V. Simulation results are provided in Section VI, and conclusions are drawn in Section VII.
II Related Work
II-A VEC Architecture and Use Cases
An illustration of the VEC architecture is shown in Fig. 1. The development of vehicle-to-everything (V2X) communication techniques enables vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I) and vehicle-to-pedestrian (V2P) communications, so that tasks can be offloaded to other vehicles through different kinds of routes. Specifically, there are three major offloading modes:

Vehicle-Vehicle (V-V) Offloading: Vehicles directly offload tasks to surrounding vehicles with surplus computing resources in a distributed manner. In this case, each individual vehicle may not be able to acquire the global state information for task offloading decisions, and there might be no coordination for task scheduling.

Pedestrian/Vehicle-Infrastructure-Vehicle (P/V-I-V) Offloading: When there are no neighboring vehicles for task offloading, one solution is to first offload tasks to the roadside infrastructure, which then assigns them to other vehicles in a centralized manner.

Pedestrian/Vehicle-Infrastructure (P/V-I) Offloading: In this mode, tasks are offloaded to the infrastructure for direct processing.
Similar to traditional cloud computing services, the VEC system can provide infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) [13], and support a wide variety of applications. For example, cooperative collision avoidance and collective environment perception are necessary for safe driving, where sensing data is generated by a group of vehicles and processed by some of them [20, 21]. In vehicular crowd sensing, video recordings and images are generated by vehicles and need to be analyzed in real time, in order to supervise the traffic, monitor the road conditions and guide parking [22]. Since the aforementioned vehicular applications may underutilize the computing resources of vehicles [11], the spare resources can further serve entertainment and multimedia applications, such as cloud gaming, virtual reality, augmented reality and video transcoding [23].
II-B Task Offloading Algorithms
There are some existing efforts investigating the task scheduling and computing resource management problem in VEC. A software-defined VEC architecture is proposed in [13]. Inspired by software-defined networking, a centralized controller is designed to periodically collect the state information of vehicles, including mobility and resource occupation, and manage radio and computing resources upon task requests. In terms of P/V-I-V offloading, a semi-Markov decision based centralized task assignment problem is formulated in [24], in order to minimize the average system cost by jointly considering the delay of tasks and the energy consumption of mobile devices. Ref. [25] further introduces a task replication technique to improve the service reliability of VEC, where task replicas can be offloaded to multiple vehicles and processed simultaneously. However, a key drawback of the centralized framework is that it requires frequent state information updates to optimize the system performance, which incurs high signaling overhead.
An alternative method is to let the task generators make task offloading decisions in a distributed manner. An autonomous vehicular edge framework which enables V-V and V-I offloading is proposed in [23], followed by a task scheduling algorithm based on ant colony optimization. However, when the number of vehicles is large, the computational complexity can be quite high. In contrast, we design a distributed task offloading algorithm with low complexity.
III System Model and Problem Formulation
III-A V-V Offloading: System Overview
We consider V-V offloading in the VEC system, where vehicles involved in the task offloading are classified into two categories:
TaVs are the vehicles that generate and offload computation tasks for cloud execution, while SeVs are the vehicles with sufficient computing resources that can provide computing services. Note that the role of each vehicle depends on the sufficiency of its computing resources, and is not fixed to TaV or SeV during the trip. TaVs can offload tasks to their neighboring SeVs. Each TaV may have multiple candidate SeVs that can process the tasks, and each task is offloaded to a single SeV and executed by it. As shown in Fig. 1, for TaV 1, there are 3 candidate SeVs (SeVs 1-3), and currently the task is offloaded to SeV 3.
In this work, we design a distributed task offloading algorithm to minimize the average delay, by letting each TaV decide independently which SeV should serve each task, without inter-TaV cooperation. Moreover, we do not make any assumptions on the service disciplines of SeVs, nor on the mobility models of vehicles.
III-B Task Offloading Procedure
Since offloading decisions are made in a distributed manner, we focus on a single TaV of interest and model its task offloading problem. Consider a discrete-time VEC system. There are four procedures for task offloading within each time period:
SeV discovery: The TaV discovers neighboring SeVs within its communication range, and selects those moving in the same direction as candidates. The driving states of each vehicle, including speed, location and moving direction, can be acquired by neighboring vehicles through vehicular communication protocols. For example, in the dedicated short-range communication (DSRC) standard [26], periodic beaconing messages provide this state information. Denote the candidate SeV set in time period $t$ by $\mathcal{N}(t)$, which may change across time since vehicles are moving. Because the mobility model is unknown, the future candidate SeVs are not known a priori. Besides, assume that $\mathcal{N}(t) \neq \emptyset$ for all $t$; otherwise the TaV can seek help from RSUs along the road, which is beyond the scope of this paper.
Task upload: After updating the candidate SeV set at the beginning of each time period, the TaV selects one SeV and uploads the computation task. Denote the input data size of the task generated in time period $t$ by $x_t$ (in bits), which must be transmitted from the TaV to the selected SeV. The uplink wireless channel state between the TaV and SeV $n \in \mathcal{N}(t)$ is denoted by $h_{t,n}^{(u)}$, and the interference power at SeV $n$ is $I_{t,n}^{(u)}$. We assume that the wireless channel state remains static during the uploading process of each computation task. Given the fixed transmission power $P$, channel bandwidth $W$ and noise power $\sigma^2$, the uplink transmission rate between the TaV and SeV $n$ is
(1) $r_{t,n}^{(u)} = W \log_2\left(1 + \frac{P h_{t,n}^{(u)}}{\sigma^2 + I_{t,n}^{(u)}}\right)$
The transmission delay of uploading the task to SeV $n$ in time period $t$ is then given by
(2) $d_{\mathrm{up}}(t,n) = \frac{x_t}{r_{t,n}^{(u)}}$
Task execution: The selected SeV processes the task after receiving the input data from the TaV. For the task generated in time period $t$ with input data size $x_t$, the total workload is given by $x_t w_t$, where $w_t$ is the computation intensity (in CPU cycles per bit), representing how many CPU cycles are required to process one bit of input data [4]. The computation intensity of the task mainly depends on the nature of the application.
The computing capability of SeV $n$ is described by its maximum CPU frequency $F_n$ (in CPU cycles per second), and the CPU frequency allocated to the task of the TaV in time period $t$ is denoted by $f_{t,n}$. The SeV may deal with multiple computation tasks simultaneously, and adopt the dynamic frequency and voltage scaling (DVFS) technique to dynamically adjust the CPU frequency [27], and thus we have $f_{t,n} \le F_n$. We assume that $f_{t,n}$ remains static during each time period $t$, and that each computation task can be completed within one time period due to timeliness requirements. Tasks of larger workloads can be further partitioned into multiple subtasks [18, 28], so that each subtask is offloaded to and processed by a SeV within one time period. Then the computation delay can be written as
(3) $d_{\mathrm{com}}(t,n) = \frac{x_t w_t}{f_{t,n}}$
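Result feedback: Upon the completion of task execution, the selected SeV transmits the result back to the TaV. Let $h_{t,n}^{(d)}$ denote the downlink wireless channel state, which is assumed to be static during the transmission of each result. The interference at the TaV is denoted by $I_{t,n}^{(d)}$. Similar to (1), with transmission power $P$, channel bandwidth $W$ and noise power $\sigma^2$, the downlink transmission rate from SeV $n$ to the TaV can be written as
(4) $r_{t,n}^{(d)} = W \log_2\left(1 + \frac{P h_{t,n}^{(d)}}{\sigma^2 + I_{t,n}^{(d)}}\right)$
The data volume of the computation result in time period $t$ is denoted by $y_t$ (in bits), and thus the downlink transmission delay from SeV $n$ to the TaV is
(5) $d_{\mathrm{dow}}(t,n) = \frac{y_t}{r_{t,n}^{(d)}}$
Then the sum delay of offloading the task to SeV $n$ in time period $t$, i.e., the sum of the upload delay $d_{\mathrm{up}}(t,n)$, the computation delay $d_{\mathrm{com}}(t,n)$ and the result feedback delay, is given by
(6) $d_{\mathrm{sum}}(t,n) = d_{\mathrm{up}}(t,n) + d_{\mathrm{com}}(t,n) + d_{\mathrm{dow}}(t,n)$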
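The delay model of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, a Shannon-type rate with a base-2 logarithm is assumed for the transmission rates, and all quantities are in SI units (bits, Hz, W, s).

```python
import math


def transmission_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w, interference_w):
    """Assumed Shannon-type rate in bits/s: W * log2(1 + P*h / (noise + interference))."""
    sinr = tx_power_w * channel_gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def sum_delay(input_bits, output_bits, cycles_per_bit, cpu_hz, uplink_bps, downlink_bps):
    """Sum delay in seconds: task upload + task execution + result feedback."""
    upload = input_bits / uplink_bps                # input size over uplink rate
    compute = input_bits * cycles_per_bit / cpu_hz  # workload over allocated CPU frequency
    download = output_bits / downlink_bps           # result size over downlink rate
    return upload + compute + download
```

For instance, with hypothetical values, `sum_delay(1e6, 1e5, 1000, 2e9, 2e6, 1e6)` adds 0.5 s of upload, 0.5 s of execution and 0.1 s of result feedback.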
III-C Problem Formulation
Consider a total number of $T$ time periods. Our objective is to minimize the average offloading delay, by guiding the task offloading decisions of the TaV on which SeV should serve each task. With $d_{\mathrm{sum}}(t,n)$ denoting the sum delay of offloading the task to SeV $n$ in time period $t$, the task offloading problem is formulated as
(7) P1: $\min_{a_1, \ldots, a_T} \frac{1}{T} \sum_{t=1}^{T} d_{\mathrm{sum}}(t, a_t)$
where $a_t$ is the optimization variable, which represents the index of the SeV selected in time period $t$, with $a_t \in \mathcal{N}(t)$, the candidate SeV set in time period $t$.
Availability of state information: The state information related to the delay performance can be classified into two categories based on its ownership. Parameters of each task, including the input and output data volumes and the computation intensity, are known by the TaV upon the generation of each task. The uplink and downlink transmission rates and the allocated CPU frequency are closely related to the SeV. If all these states are exactly known by the TaV before offloading each task, the sum delay of each SeV can then be calculated, and the optimization problem P1 is easy to solve with
(8) $a_t = \arg\min_{n \in \mathcal{N}(t)} d_{\mathrm{sum}}(t, n)$
However, due to the mobility of vehicles, the transmission rates vary fast across time and are difficult to predict. Since there is no cooperation between TaVs, the computation loads at SeVs change dynamically, making the allocated CPU frequency vary across time. Moreover, exchanging this state information between the TaV and all candidate SeVs causes high signaling overhead. Therefore, the TaV may lack the state information of SeVs, and cannot tell which SeV provides the lowest delay when making offloading decisions.
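For reference, the complete-state selection rule discussed above is a plain minimum search over the candidate set. The sketch below assumes, hypothetically, that the sum delay of every candidate SeV is already known and stored in a dictionary; the function name is illustrative only.

```python
def best_sev_with_full_state(sum_delays):
    """Return the candidate SeV index with the smallest known sum delay.

    sum_delays: dict mapping a candidate SeV index to its sum delay in seconds
    (assumed to be known exactly, which is not the case in practice).
    """
    if not sum_delays:
        raise ValueError("candidate SeV set must not be empty")
    return min(sum_delays, key=sum_delays.get)
```

The learning approach is needed precisely because the dictionary passed to this function is not available to the TaV before offloading.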
Learning while offloading: To overcome the unavailability of the state information of SeVs, we propose the approach of learning while offloading: the TaV observes and learns the delay performance of candidate SeVs while offloading computation tasks. Specifically, the SeV in time period $t$ is selected according to the historical delay observations $d_{\mathrm{sum}}(1, a_1), \ldots, d_{\mathrm{sum}}(t-1, a_{t-1})$, without acquiring the exact transmission rates and CPU frequency. We aim to design a learning algorithm that minimizes the expectation of the offloading delay, written as
(9) P2: $\min_{a_1, \ldots, a_T} \frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} d_{\mathrm{sum}}(t, a_t)\right]$
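In the rest of the paper, we consider a simplified version of P2 by assuming that the input data size $x_t$ of the task is time-varying, but the computation intensity and the ratio of output to input data volume remain constant across time. In practice, this is a valid assumption when tasks are generated by the same type of application. Let $w_t = w_0$ and $y_t / x_t = \alpha_0$ for all $t$. Then the sum delay of offloading the task to SeV $n$ in time period $t$ can be transformed as
(10) $d_{\mathrm{sum}}(t,n) = x_t \left(\frac{1}{r_{t,n}^{(u)}} + \frac{\alpha_0}{r_{t,n}^{(d)}} + \frac{w_0}{f_{t,n}}\right)$
where $r_{t,n}^{(u)}$ and $r_{t,n}^{(d)}$ are the uplink and downlink transmission rates, and $f_{t,n}$ is the allocated CPU frequency.
Define the bit offloading delay as
(11) $u(t,n) = \frac{1}{r_{t,n}^{(u)}} + \frac{\alpha_0}{r_{t,n}^{(d)}} + \frac{w_0}{f_{t,n}}$
which represents the sum delay of offloading one bit of input data of the task to SeV $n$ in time period $t$. The bit offloading delay reflects the service capability of each candidate SeV, which is what the TaV needs to learn.
Finally, the optimization problem can be written as
(12) P3: $\min_{a_1, \ldots, a_T} \frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} x_t \, u(t, a_t)\right]$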
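The decomposition of the sum delay into input size times bit offloading delay can be checked numerically. The following sketch uses hypothetical function names and example values, and assumes a constant computation intensity and a constant output-to-input ratio, as in the simplification above.

```python
def bit_offloading_delay(uplink_bps, downlink_bps, cpu_hz, cycles_per_bit, out_in_ratio):
    """Delay (in seconds) of offloading one bit of input data to a given SeV."""
    return 1.0 / uplink_bps + out_in_ratio / downlink_bps + cycles_per_bit / cpu_hz


def sum_delay_from_bit_delay(input_bits, bit_delay):
    """Sum delay of a task: input data size times the bit offloading delay."""
    return input_bits * bit_delay
```

The point of the decomposition is that the bit offloading delay depends only on the SeV side (rates and CPU frequency), so it is the quantity worth learning, while the input size is known to the TaV when the task is generated.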
IV Adaptive Learning-Based Task Offloading Algorithm
In this section, we develop a learning-based task offloading algorithm based on MAB, which enables the TaV to learn the delay performance of candidate SeVs and minimizes the expected offloading delay.
Our task offloading problem P3 requires online sequential decision making, which can be solved according to MAB theory. Each SeV corresponds to an arm whose loss (bit offloading delay) is governed by an unknown distribution. The TaV is the decision maker who tries one arm at a time and learns an estimate of its loss, in order to minimize the expectation of the cumulative loss across time. However, the variations of the input data size and of the candidate SeV set make existing MAB algorithms, such as UCB1 and UCB2, inapplicable in the VEC system.
In this work, we propose an Adaptive Learning-based Task Offloading (ALTO) algorithm which is aware of both the input data size of tasks and the occurrence of vehicles, as shown in Algorithm 1. Parameter $\beta$ is a constant weight, and $k_{t,n}$ records the number of tasks that have been offloaded to SeV $n$ up to time $t$. The occurrence time of SeV $n$ is recorded by $t_n$, and the input data size $x_t$ is normalized to $\tilde{x}_t$ within $[0, 1]$ as:
(13) $\tilde{x}_t = \max\left\{\min\left(\frac{x_t - x^-}{x^+ - x^-}, 1\right), 0\right\}$
where $x^+$ and $x^-$ are the upper and lower thresholds to normalize $x_t$. In particular, if $x^+ = x^-$, $\tilde{x}_t = 0$ when $x_t \le x^-$, and $\tilde{x}_t = 1$ when $x_t > x^+$.
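The normalization step can be sketched as a clamped linear map. The function below is an illustration under assumptions: the name and argument order are hypothetical, and the degenerate case with equal thresholds is assumed to behave as a step function (0 at or below the threshold, 1 above it).

```python
def normalize_input(x, lower, upper):
    """Clamp-and-scale the input data size into [0, 1] using two thresholds."""
    if upper < lower:
        raise ValueError("upper threshold must not be below lower threshold")
    if upper == lower:
        # Degenerate case (assumed): a step function at the common threshold.
        return 0.0 if x <= lower else 1.0
    scaled = (x - lower) / (upper - lower)
    return min(max(scaled, 0.0), 1.0)
```

Inputs at or below the lower threshold map to 0 and inputs at or above the upper threshold map to 1, so the thresholds control how much of the input range is treated as "small" or "large".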
In Algorithm 1, Lines 3-5 are the initialization phase, which is called whenever new SeVs occur as candidates. The TaV selects the newly appeared SeV once and offloads the task, in order to get an initial estimate of its bit offloading delay.
Lines 7-12 are the main loop of the learning process, inspired by the volatile UCB (VUCB) algorithm [29] and our previous work on opportunistic MAB [19]. During each time period $t$, the TaV gets the data volume $x_t$ before offloading the task and calculates its normalized value $\tilde{x}_t$. The utility function defined in (14) is used to evaluate the service capability of each SeV, and consists of the empirical bit offloading delay $\bar{u}_{t-1,n}$ and a padding function. Specifically, $\bar{u}_{t-1,n}$ is the average bit offloading delay of SeV $n$ observed until time period $t-1$. The padding function jointly considers the input data size and the occurrence time of each SeV, in order to balance exploration and exploitation in the learning process and adapt to the dynamic VEC environment. The offloading decision is then made according to (15), by selecting the SeV with minimum utility. Finally, the offloading delay is observed upon result feedback, and $\bar{u}_{t,n}$ and $k_{t,n}$ are updated.
(14) $\hat{u}_{t,n} = \bar{u}_{t-1,n} - \sqrt{\frac{\beta (1 - \tilde{x}_t) \ln (t - t_n)}{k_{t-1,n}}}$
(15) $a_t = \arg\min_{n \in \mathcal{N}(t)} \hat{u}_{t,n}$
Here $\beta$ is the constant weight, $t_n$ is the occurrence time of SeV $n$, and $k_{t-1,n}$ is the number of tasks offloaded to SeV $n$ so far.
Two kinds of adaptivity of the algorithm are highlighted as follows.
Input-awareness: The input data size $x_t$ can be regarded as a weight factor on the offloading delay. Intuitively, when $x_t$ is small, even if the TaV selects a poorly performing SeV, the sum offloading delay will not be too large. On the other hand, when $x_t$ is large, selecting a SeV with weak service capability greatly increases the delay. Therefore, the padding function is proportional to $\sqrt{1 - \tilde{x}_t}$, which is non-increasing as $x_t$ grows, so that ALTO explores more when $x_t$ is small and exploits more when $x_t$ is large.
Occurrence-awareness: The random presence of SeVs is also considered, which makes the proposed ALTO algorithm occurrence-aware. To be specific, for any newly appeared SeV, the padding function is large due to the small number of selections, so that ALTO tends to explore more. Meanwhile, ALTO is able to exploit the learned information of any existing SeV, since more connections lead to a smaller value of the padding function.
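The selection and update steps described above can be sketched as follows. This is not the authors' implementation: the class and method names are hypothetical, the default weight is arbitrary, and the padding term is assumed to take the UCB-style form sqrt(beta * (1 - x_norm) * ln(t - t_n) / k), which matches the two kinds of adaptivity described in the text.

```python
import math


class AltoSketch:
    """Illustrative input-aware and occurrence-aware bandit selection."""

    def __init__(self, beta=2.0):
        self.beta = beta      # exploration weight (arbitrary default, an assumption)
        self.count = {}       # SeV -> number of tasks offloaded to it so far
        self.mean = {}        # SeV -> empirical mean bit offloading delay
        self.first_seen = {}  # SeV -> time period in which it first appeared

    def select(self, t, candidates, x_norm):
        """Pick a SeV from the ordered candidate list in time period t."""
        for sev in candidates:
            self.first_seen.setdefault(sev, t)
        # Initialization phase: try every newly appeared SeV once.
        for sev in candidates:
            if self.count.get(sev, 0) == 0:
                return sev

        def utility(sev):
            age = max(t - self.first_seen[sev], 1)  # guard so that log(age) >= 0
            padding = math.sqrt(
                self.beta * (1.0 - x_norm) * math.log(age) / self.count[sev]
            )
            return self.mean[sev] - padding

        # Smaller utility means a more promising SeV (delay is a loss).
        return min(candidates, key=utility)

    def update(self, sev, input_bits, observed_delay):
        """Fold the observed sum delay into the running mean bit delay."""
        k = self.count.get(sev, 0) + 1
        bit_delay = observed_delay / input_bits
        prev = self.mean.get(sev, 0.0)
        self.mean[sev] = prev + (bit_delay - prev) / k
        self.count[sev] = k
```

With `x_norm = 1.0` the padding vanishes and the rule is purely greedy, while `x_norm = 0.0` gives the largest exploration bonus. Under the same assumption, dropping the `(1.0 - x_norm)` factor or replacing `age` by `t` would give the occurrence-aware-only and input-aware-only variants, respectively.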
IV-A Complexity
In our proposed ALTO algorithm, the computational complexity of calculating the utility functions of all candidate SeVs in Line 8 is , where is the number of candidate SeVs in time period . The task offloading decision made in Line 9 is a minimum seeking problem, with complexity . Updating the empirical bit offloading delay and offloaded times has a complexity of . Therefore, within each time period, the total computational complexity of running ALTO to offload one task is . Assume that there are totally tasks required to be offloaded in the VEC system. Since TaVs offload tasks independently, the total amount of computation is .
An ant colony optimization based distributed task offloading algorithm is proposed in [23]. According to Section V.D, the computational complexity is , where is the number of iterations required by the ant colony optimization. Therefore, ALTO is of lower complexity than the existing algorithm in [23].
IV-B Signaling Overhead
Considering the distributed V-V offloading case, under the complete-state task offloading (CSTO) policy the TaV obtains the accurate state information of all candidate SeVs, evaluates their delay performance, and selects the SeV with minimum offloading delay. Compared with the CSTO policy, our proposed ALTO algorithm has lower signaling overhead and is much easier to implement in a real VEC system.
First, the ALTO algorithm does not need to know the uplink and downlink wireless channel states, the allocated CPU frequency or the interference of each candidate SeV. Therefore, for each TaV, offloading a task can save at least signaling messages for the state information of the candidate SeVs, and signaling messages can be saved for tasks. Second, when a SeV is serving multiple TaVs simultaneously, the CSTO policy needs to know the task workloads of the TaVs to allocate the computing resources of the SeV. In this case, more signaling messages are generated by the CSTO policy. Last but not least, frequent signaling exchange may lead to additional collisions and retransmissions, and the delayed state information may not be accurate. The proposed ALTO algorithm enables each TaV to learn the state information of SeVs instead of obtaining it from signaling messages, and thus reduces the signaling overhead.
V Performance Analysis
In this section, we characterize the delay performance of the proposed ALTO algorithm. We adopt the learning regret of delay as the performance criterion, which is widely used in MAB theory. Compared with the existing UCB-based algorithms in [15], the two major modifications in ALTO are the occurrence time $t_n$ and the normalized input $\tilde{x}_t$. We first evaluate their impacts on the learning regret separately, and then jointly analyze these two factors.
V-A Definition of Learning Regret
Define an epoch as the interval during which the candidate SeVs remain identical. The total number of epochs during the considered $T$ time periods is denoted by $B$, and let $\mathcal{N}_b$ be the candidate SeV set of the $b$th epoch, where $b = 1, \ldots, B$. Let $t_b$ and $t'_b$ be the start and end time of the $b$th epoch, with $t_1 = 1$ and $t'_B = T$. For theoretical analysis, we assume that for each SeV $n$, its bit offloading delay $u(t,n)$ is i.i.d. over time and independent of the others. We will show in Section VI through simulation results that ALTO still works well without this assumption.
Define the mean bit offloading delay of each candidate SeV as $\mu_n = \mathbb{E}[u(t,n)]$. During each epoch, let $\mu_b^* = \min_{n \in \mathcal{N}_b} \mu_n$ be the optimal bit offloading delay, and $a_b^* = \arg\min_{n \in \mathcal{N}_b} \mu_n$ the index of the optimal SeV. Note that $\mu_b^*$ and $a_b^*$ are not known a priori.
The learning regret represents the expected cumulative loss in sum offloading delay caused by the learning process, compared with the genie-aided optimal policy where the TaV always selects the SeV with maximum service capability. With $x_t$ the input data size and $a_t$ the SeV selected in time period $t$, the learning regret by time period $T$ can be written as
(16) $R_T = \sum_{b=1}^{B} \mathbb{E}\left[\sum_{t=t_b}^{t'_b} x_t \left(u(t, a_t) - u(t, a_b^*)\right)\right]$
In the following subsections, we will characterize the upper regret bound of the ALTO algorithm.
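The epoch and regret definitions above can be made concrete with a small bookkeeping sketch. It is written under assumptions: the function names are hypothetical, and the regret is evaluated as a pseudo-regret from given mean bit offloading delays (which only a simulator would know), rather than from realized delays.

```python
def split_epochs(candidate_sets):
    """Return 1-indexed (start, end) periods over which the candidate set is unchanged."""
    epochs = []
    start = 0
    for t in range(1, len(candidate_sets) + 1):
        at_end = t == len(candidate_sets)
        if at_end or set(candidate_sets[t]) != set(candidate_sets[start]):
            epochs.append((start + 1, t))
            start = t
    return epochs


def cumulative_regret(input_bits, choices, candidate_sets, mean_bit_delay):
    """Cumulative pseudo-regret after each time period.

    Each period contributes x_t * (mean delay of chosen SeV - best mean delay
    among the SeVs that are candidates in that period).
    """
    regret = 0.0
    curve = []
    for x, chosen, candidates in zip(input_bits, choices, candidate_sets):
        best = min(mean_bit_delay[n] for n in candidates)
        regret += x * (mean_bit_delay[chosen] - best)
        curve.append(regret)
    return curve
```

A sublinear regret means that the returned curve flattens over time within each epoch, i.e., the per-period loss relative to the genie-aided choice shrinks.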
V-B Regret Analysis under Dynamic SeV Set and Identical Input
We first assume that the input data size is not time-varying, and analyze the learning regret under a varying SeV set. Let for , and , then . The utility function (14) becomes
(17) 
and the learning regret
(18) 
Also, define the maximum bit offloading delay during the time periods as , the performance difference between any suboptimal SeV and the optimal SeV in the th epoch . Let , where is a constant.
The learning regret within each epoch is upper bounded in Lemma 1.
Lemma 1.
Let , the learning regret of ALTO with dynamic SeV set and identical input data size has an upper bound in each epoch. Specifically, in the th epoch:
(19) 
Proof.
See Appendix A. ∎
Then we have the following Theorem 1 that provides the upper bound of the learning regret over time periods.
Theorem 1.
Let . For a given time horizon , the total learning regret of ALTO with dynamic SeV set and identical input data size has an upper bound as follows:
(20) 
Proof.
See Appendix B. ∎
Theorem 1 implies that our proposed ALTO algorithm provides a sublinear learning regret compared to the genie-aided optimal policy. To be specific, within each epoch, the learning regret is governed by , and inversely proportional to the performance difference of optimal SeV and suboptimal SeV . Moreover, for any finite time horizon with epochs, ALTO achieves learning regret.
Remark 1.
The random appearance and disappearance of SeVs affect the number of epochs and the learning regret . Within a fixed number of time periods, higher randomness of SeVs results in a more dynamic environment, and thus higher learning regret.
V-C Regret Analysis under Varying Input and Fixed Candidate SeVs
We then characterize the upper bound of the learning regret within a single epoch, and consider that the input data size is random and continuous. Let . The optimal SeV is , and its mean bit offloading delay . The learning regret can be simplified as
(21) 
The following theorem bounds the learning regret under varying input data size and fixed candidate SeV set.
Theorem 2.
Let , and . For any finite time horizon , we have:
(1) When , the expected number of tasks offloaded to any SeV can be bounded as
(22) 
(2) With , the learning regret can be bounded as
(23) 
where is the expectation of on the condition that , , and .
Proof.
See Appendix C. ∎
According to Theorem 2, the time order of the learning regret is , indicating that under time-varying input data volume, the TaV is still able to learn which SeV performs the best, and achieves a sublinear deviation compared to the genie-aided optimal policy.
Recall that compared to the existing UCB-based algorithms, the major modification under varying input is the introduction of normalized input , which dynamically adjusts the weight of exploration and exploitation. As shown in (23), the consideration of brings an coefficient to the learning regret. When the input data size is fixed to , the coefficient of the learning regret of conventional UCB algorithms is . Therefore, by properly selecting the lower threshold , we have . This implies that the proposed ALTO algorithm can take the opportunity to explore when is small, and achieve lower learning regret.
Moreover, when the task offloading scenario is simplified to the case with fixed candidate SeVs and identical input data size, the proposed ALTO algorithm reduces to a conventional UCB algorithm, and the lower bound of the learning regret has been investigated in [30, 31, 32], which is provided in Appendix D. Specifically, the regret lower bound of conventional UCB algorithms is , where is the Kullback-Leibler divergence of the bit offloading delay distributions. Therefore, in the case with varying input, the regret upper bound of ALTO may even be smaller than the lower bound of conventional UCB algorithms, owing to input-awareness.
V-D Joint Consideration of Occurrence-awareness and Input-awareness
Finally, we analyze the learning regret by jointly considering the occurrence of vehicles and the variations of input data size. Although these two factors are independent of each other, they are coupled in the utility function (14), and collectively balance exploration and exploitation in the learning process. Therefore, it is quite difficult to derive the upper bound of the learning regret in this case.
We study a special case with periodic input and fixed bit offloading delay, and derive the theoretical upper bound to provide some insights. To be specific, assume that the input data size when is even, and when is odd, where . Let , and , thus when , and when . Consider two SeVs appear at and respectively, and . Then there are epochs during time periods, and we only need to focus on the second epoch, since the first epoch only has one SeV available. The bit offloading delay of each SeV is fixed, with for , but not known a priori. Without loss of generality, let , and . The learning regret can be written as
(24) 
where represents how many times SeV 2 is selected in the second epoch.
The upper bound for learning regret of ALTO algorithm under periodic input and fixed bit offloading delay is given in the following theorem.
Theorem 3.
Let . With periodic input data size and fixed bit offloading delay, we have:
(25) 
Proof.
See Appendix E. ∎
The learning regret in (25) indicates that, when jointly considering the time-varying feature of input data size and candidate SeV set, the proposed ALTO algorithm still achieves regret, and focuses on the exploration only when the input is low ().
Conjecture 1.
The proposed ALTO algorithm with random continuous input data size and dynamic SeV set achieves learning regret.
The conjecture follows the insight that, when the candidate SeV set is identical over time, the learning regret can be derived in a general case with random continuous input and random bit offloading delay, as shown in (23). When the occurrence time of each SeV is different, within a single epoch, the learning regret in (25) resembles (23), both governed by the time order . Following a generalization method similar to that in [19], we may draw a similar conclusion that with random continuous input data size and dynamic SeV set, the learning regret within an epoch is , and the total learning regret is .
VI Simulations
To evaluate the average delay performance and learning regret of the proposed ALTO algorithm, we carry out simulations in this section. We start from a synthetic scenario to evaluate the impact of key parameters, and then simulate a realistic highway scenario using the system-level simulator Veins (VEhicles in Network Simulations, http://veins.car2x.org/) to further verify the proposed ALTO algorithm.
VI-A Simulation under Synthetic Scenario
We carry out simulations in the synthetic scenario using MATLAB. Consider one TaV of interest, with 8 SeVs that appear as candidates during 3000 time periods. The communication range is set to . The distance of the TaV and each candidate SeV ranges within , and changes randomly from to in each time period. The occurrence and disappearance time of SeVs, as well as their maximum CPU frequency, are shown in Table I. There are 3 epochs, and each lasts 1000 time periods. In the first epoch, there are 5 candidate SeVs. At the beginning of the second epoch, a less powerful SeV 5 disappears and SeVs 6 and 7 with higher computing capability appear. At the beginning of the third epoch, SeVs 1 and 6 disappear, while SeV 8 with suboptimal computing capability arrives. Note that the occurrence and disappearance time of SeVs are not known to the TaV a priori.
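Index of SeV  1  2  3  4  5  6  7  8
Maximum CPU frequency (GHz)  3.5  4.5  5  5.5  3  6.5  6  4
Epoch 1 (time 1-1000)  ✓  ✓  ✓  ✓  ✓  –  –  –
Epoch 2 (time 1001-2000)  ✓  ✓  ✓  ✓  –  ✓  ✓  –
Epoch 3 (time 2001-3000)  –  ✓  ✓  ✓  –  –  ✓  ✓
(✓: the SeV is a candidate in the epoch; –: it is not.)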
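For readers who want to reproduce the setup, the candidate schedule of Table I can be written down as a small lookup structure. The sketch below uses hypothetical names, takes the CPU frequencies from the table, and takes the candidate sets per epoch from the description in the text.

```python
# Maximum CPU frequency of each SeV in GHz (values from Table I).
MAX_CPU_GHZ = {1: 3.5, 2: 4.5, 3: 5.0, 4: 5.5, 5: 3.0, 6: 6.5, 7: 6.0, 8: 4.0}

# (start period, end period, candidate SeV set) for each epoch.
EPOCHS = [
    (1, 1000, {1, 2, 3, 4, 5}),
    (1001, 2000, {1, 2, 3, 4, 6, 7}),
    (2001, 3000, {2, 3, 4, 7, 8}),
]


def candidates_at(t):
    """Return the candidate SeV set in time period t."""
    for start, end, sevs in EPOCHS:
        if start <= t <= end:
            return sevs
    raise ValueError("time period outside the simulated horizon")


def strongest_candidate(t):
    """Candidate with the highest maximum CPU frequency in period t.

    Note: this is not necessarily the delay-optimal SeV, since the delay
    also depends on the channel and on the CPU share actually allocated.
    """
    return max(candidates_at(t), key=MAX_CPU_GHZ.get)
```

A learning algorithm under test would call `candidates_at(t)` in every period without being told the epoch boundaries in advance.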
The input data size follows uniform distribution within . The computation intensity is set to , and the upper and lower thresholds are selected such that and . Recall that for each SeV, the allocated CPU frequency to the TaV is a fraction of the maximum CPU frequency, which is randomly distributed from to . The wireless channel state is modeled by an inverse power law , with , and is the distance between TaV and SeV [33]. Other default parameters include: transmission power , channel bandwidth , noise power , and weight factor .
In Fig. 2, the proposed ALTO algorithm is compared with three existing learning algorithms under the MAB framework. 1) UCB is proposed in [15], which is neither input-aware nor occurrence-aware, with padding function . 2) VUCB is aware of the occurrence of SeVs, with padding function [29]. 3) AdaUCB is input-aware, with padding function [19]. Note that in the first epoch, VUCB is equivalent to UCB, and AdaUCB is equivalent to ALTO. Besides, in the Optimal genie-aided policy, the TaV always connects to the SeV with minimum expected delay, which is the delay lower bound of the learning algorithm.
The comparison of learning regret is shown in Fig. 2(a), which provides two major observations. First, the proposed ALTO algorithm performs the best among the four learning algorithms. Specifically, both VUCB and AdaUCB achieve lower learning regret than the UCB algorithm, which means that either input-awareness or occurrence-awareness brings adaptivity to the dynamic VEC environment and reduces the loss of delay performance through learning. The joint consideration of these two factors further optimizes the exploration-exploitation tradeoff, and decreases the learning regret by , and from that of UCB, VUCB and AdaUCB respectively. Second, the learning regret of ALTO grows sublinearly with time , indicating that the TaV can asymptotically converge to the SeV with optimal delay performance. As shown in Fig. 2(b), during each epoch, the average delay of ALTO converges faster to the optimal delay than that of the other learning algorithms, and achieves close-to-optimal delay performance.
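A regret curve of the kind discussed above can be computed from a trace of offloading decisions. The sketch below is a hedged illustration: it assumes that regret at each time period is the input data size multiplied by the gap in expected per-bit delay between the selected SeV and the best SeV currently available, and the function name `cumulative_regret` and its arguments are hypothetical.

```python
def cumulative_regret(choices, per_bit_delay, candidate_sets, sizes=None):
    """Return the cumulative regret after each time period.

    choices[t]        : SeV selected at time period t
    per_bit_delay[k]  : expected per-bit offloading delay of SeV k
    candidate_sets[t] : SeVs available at time period t
    sizes[t]          : input data size at time period t (defaults to 1.0)
    """
    if sizes is None:
        sizes = [1.0] * len(choices)
    curve, total = [], 0.0
    for chosen, available, size in zip(choices, candidate_sets, sizes):
        # The benchmark is the best SeV among those present at this time.
        best = min(per_bit_delay[k] for k in available)
        total += size * (per_bit_delay[chosen] - best)
        curve.append(total)
    return curve
```

Sublinear growth means that `curve[T - 1] / T` shrinks as `T` increases, i.e., the per-task loss relative to the genie-aided policy vanishes over time.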
We then consider a single epoch and set SeVs 2 to 7 in Table I as candidates for time periods. Fig. 3 evaluates the impact of the weight factor on the learning regret. When , there is no exploration in the learning process, and the learning regret is much higher than for , since ALTO may stick to a suboptimal SeV for a long time. When , the learning regret grows slightly as increases. Although existing work shows that sublinear learning regret is achieved when [31], in our simulation, the learning regret is lower when . The reason may be that only a small number of explorations are needed for the TaV to find the optimal SeV under our settings.
Finally, we try different pairs of upper and lower thresholds for normalizing the input data size, and evaluate their effect on the learning regret. Define and , as the probability that the input data size is higher (or lower) than the upper (or lower) threshold. Two kinds of thresholds are selected: 1) , indicating that and explorations happen only when . 2) , where explorations also happen when the input data size is between and . As shown in Fig. 4, the proposed ALTO algorithm always outperforms the UCB algorithm. Moreover, the learning regret under is lower than in the case when , and is lowest when under our settings, which we set as the default.
VI-B Simulation under Realistic Highway Scenario
In this subsection, further simulations are carried out using the system-level simulator Veins, in order to evaluate the average delay of ALTO under a realistic highway scenario.
The simulation platform Veins integrates the traffic simulator Simulation of Urban MObility (SUMO, http://www.sumo.dlr.de/userdoc/SUMO.html) and the network simulator OMNeT++ (https://www.omnetpp.org/documentation), and enables the use of real maps from Open Street Map (OSM, http://www.openstreetmap.org/). Vehicular communication protocols, including IEEE 802.11p for the PHY layer and IEEE 1609.4 for the MAC layer, are supported by Veins, together with a two-ray interference model [34] that better captures the characteristics of the vehicular channel.
A segment of the G6 Highway in Beijing is downloaded from OSM and used in our simulation, with two lanes and two ramps, as shown in Fig. 5. The maximum speed of TaVs and SeVs is set to . The TaV moves from A to D, and SeVs have three routes: A to D, A to C and B to D. The arrival of SeVs is modeled by a Bernoulli distribution, with probability , and ranging from to (e.g., is the probability that a SeV which departs at A and leaves the road at C is generated in each second). Besides the aforementioned UCB, VUCB and AdaUCB algorithms, we also adopt a naive Random policy as a baseline, where the TaV randomly selects a SeV for task offloading in each time period.
Fig. 6 shows the average delay performance with a single TaV, in which case the density of SeVs is much higher than that of TaVs. In Fig. 7, we consider 10 TaVs that depart every 10 seconds. In this case, each TaV is within the communication range of some other TaVs, and thus they might compete for bandwidth and computing resources. We make three major observations as follows. First, the proposed ALTO algorithm always outperforms the other learning algorithms and the Random policy, illustrating that ALTO adapts better to the vehicular environment. Specifically, compared with the UCB algorithm, when , ALTO can reduce the average delay by about in the single-TaV case (Fig. 6(a)), and in the multi-TaV scenario (Fig. 7(a)). Second, the average delay grows when the density of TaVs becomes high, since each SeV may serve multiple TaVs simultaneously. Third, as shown in Fig. 7, when the density of TaVs is high, the average delay decreases as the arrival probability of SeVs increases, since more computing resources are available.
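The Bernoulli arrival model and the Random baseline described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions and does not reproduce the Veins/SUMO setup: the route labels in `ROUTES`, the function names `generate_arrivals` and `random_policy`, and any probability values passed in are hypothetical.

```python
import random

# Hypothetical labels for the three SeV routes named in the text.
ROUTES = ("A-D", "A-C", "B-D")


def generate_arrivals(probs, duration_s, seed=0):
    """Return a list of (second, route) pairs for generated SeVs.

    In each second, a SeV is generated on each route independently
    with probability probs[route] (a Bernoulli trial per route).
    """
    rng = random.Random(seed)
    arrivals = []
    for second in range(duration_s):
        for route in ROUTES:
            if rng.random() < probs[route]:
                arrivals.append((second, route))
    return arrivals


def random_policy(candidates, rng):
    """Random baseline: pick one of the current candidate SeVs uniformly."""
    return rng.choice(sorted(candidates))
```

For example, `generate_arrivals({"A-D": 0.1, "A-C": 0.05, "B-D": 0.05}, 600)` would produce one possible arrival trace for a ten-minute run; the probabilities here are placeholders, not the values used in the paper.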
VII Conclusions
In this paper, we have studied the task offloading problem in vehicular edge computing (VEC) systems, and proposed an adaptive learning-based task offloading (ALTO) algorithm to minimize the average offloading delay. The proposed algorithm enables each task vehicle (TaV) to learn the delay performance of service vehicles (SeVs) in a distributed manner, without frequent exchange of state information. Considering the time-varying features of task workloads and candidate SeVs, we have modified the existing multi-armed bandit (MAB) algorithms to be input-aware and occurrence-aware, so that the ALTO algorithm is able to adapt to the dynamic vehicular task offloading environment. Theoretical analysis has been carried out, providing a sublinear learning regret bound for the proposed algorithm. We have evaluated the average delay and learning regret of ALTO under a synthetic scenario and a realistic highway scenario, and shown that the proposed algorithm can achieve low delay, and decrease the learning regret by up to and the average delay by up to , compared with the classical upper confidence bound algorithm.
As future work, we plan to formulate the task offloading problem based on the adversarial MAB framework [32], where no stochastic assumptions are made on the delay performance of SeVs. The adversarial setting makes learning more difficult, but may perform better under more complicated vehicular environments such as urban scenarios. In addition, we plan to consider the joint resource allocation of vehicles and infrastructures in the VEC system, in order to further optimize the delay performance.
Appendix A Proof of Lemma 1
In the th epoch, the learning regret is
(26) 
where is the number of tasks offloaded to SeV in the th epoch. According to Lemma 1 in [29] and Theorem 1 in [15], when , the expected number of tasks offloaded to a suboptimal SeV has an upper bound as follows
(27) 
Thus we can prove Lemma 1.
Appendix B Proof of Theorem 1
We have for . Following Lemma 1, the learning regret in the th epoch can be bounded from above as:
(29) 
By summing over the learning regrets of the epochs, we have:
(30) 
Thus Theorem 1 is proved.
Appendix C Proof of Theorem 2
When and , the utility function in (14) is
(31) 
The decision making function in (15) can be written as
(32) 
The learning regret can be written as
(33) 
Since , and , the task offloading problem can be transformed to the opportunistic bandit problem defined in Section III in our previous work [19], with equivalent definitions of learning regret, utility and decision making (as shown in [19], eq. (13)). By leveraging Lemma 7 and Appendix C.2 in [19], we can get the upper bound of , as shown in Theorem 2(1). By leveraging Theorem 3 and Appendix C.2 in [19], we can get the upper bound of the learning regret , as shown in Theorem 2(2).
Appendix D Regret Lower Bound
The regret lower bound of classical UCB algorithms has been investigated in [30, 31, 32]. In the following, we provide a regret lower bound of ALTO in a simple task offloading case, with identical input data size and fixed candidate set of SeVs (and thus the index of epoch is omitted).
Lemma 2.
When the candidate SeV set is not time-varying, and the input data size is identical over time, the learning regret can be bounded from below as:
(34) 
where is the Kullback-Leibler divergence of the bit offloading delay distributions of SeV and optimal SeV .
Proof.
With fixed SeV set and identical input data size, the proposed ALTO algorithm reduces to the classical UCB algorithm. According to [30], Theorem 5, when , the number of tasks offloaded to a suboptimal SeV can be bounded as follows
(35) 
Substituting (35) into (A), the learning regret can be bounded as
(36) 
∎
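For orientation, the classical asymptotic lower bound for stochastic bandits (due to Lai and Robbins) has the general shape shown below. The notation is our own assumption and is not copied from (34)-(36): $N_k(T)$ is the number of times a suboptimal arm $k$ is selected up to time $T$, $\Delta_k$ is its gap in expected delay to the optimal arm $k^*$, $\nu_k$ is the delay distribution of arm $k$, and $R(T)=\sum_{k\neq k^*}\Delta_k N_k(T)$ is the regret. The bound applies to policies whose regret is sub-polynomial on every problem instance.

```latex
\liminf_{T\to\infty} \frac{\mathbb{E}[N_k(T)]}{\ln T}
  \;\ge\; \frac{1}{\mathrm{KL}(\nu_k,\nu_{k^*})}
\quad\Longrightarrow\quad
\liminf_{T\to\infty} \frac{\mathbb{E}[R(T)]}{\ln T}
  \;\ge\; \sum_{k\neq k^*} \frac{\Delta_k}{\mathrm{KL}(\nu_k,\nu_{k^*})}.
```

The second inequality follows from the first by multiplying each term by $\Delta_k$ and summing over the suboptimal arms, which mirrors the substitution step used in the proof above.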
Appendix E Proof of Theorem 3
The proof of Theorem 3 follows an idea similar to that in [19], while the major difference is that the two SeVs appear at and respectively. Let . We only need to bound the learning regret in the second epoch, from time to time .
We first bound the number of tasks offloaded to the suboptimal SeV.
Lemma 3.
With periodic input of tasks and fixed bit offloading delay of SeVs,
(37) 
Proof.
First, (37) holds for and . For , we prove the lemma by contradiction. For simplicity, we use rather than . If (37) does not hold, there exists at least one , such that
(38)  
(39) 
Since , SeV 2 is selected at time .
Similar proof can be carried out when . Thus we prove Lemma 3. ∎
Then we prove that the proposed ALTO algorithm can explore sufficiently, such that when the input data size is large, it always selects the optimal SeV 1.
Lemma 4.
With periodic input of tasks and fixed bit offloading delay of SeVs, there exists , such that when and .
Proof.
First, define an auxiliary function
(41) 
and . We prove that . It is easy to see that holds when and . Assume that there exists , such that , but . Since , and , , are integers, we have . Thus SeV 1 is selected at time .