Intelligent Task Offloading for Heterogeneous V2X Communications

by   Kai Xiong, et al.

With the rapid development of autonomous driving technologies, it becomes increasingly difficult to reconcile the conflict between the ever-growing demand for high processing rates in intelligent automotive tasks and resource-constrained on-board processors. Fortunately, vehicular edge computing (VEC) has been proposed to meet these pressing resource demands. Owing to the delay-sensitive nature of automotive tasks, only a heterogeneous vehicular network with multiple access technologies may be able to handle these demanding challenges. In this paper, we propose an intelligent task offloading framework for heterogeneous vehicular networks with three Vehicle-to-Everything (V2X) communication technologies, namely Dedicated Short Range Communication (DSRC), cellular-based V2X (C-V2X) communication, and millimeter wave (mmWave) communication. Based on stochastic network calculus, this paper first derives the delay upper bound of the different offloading technologies for a given failure probability. Moreover, we propose a federated Q-learning method that optimally utilizes the available resources to minimize the communication/computing budgets and the offloading failure probabilities. Simulation results indicate that our proposed algorithm significantly outperforms existing algorithms in terms of offloading failure probability and resource cost.




I Introduction

Modern transportation systems have evolved vehicles with automotive artificial intelligence, which can make decisions through on-board processors and share road information via Vehicle-to-Everything (V2X) communications. The emergence of services such as intelligent cruise scheduling, road traffic management, and cooperative driving with shared on-board computing resources has prompted the need for significant computing capacity and rigorous delay tolerance [Zhang7907225]. Previous literature [Zhou8535091] remarked that an autonomous vehicle can create GBs of data per second from its on-board processors, and the volume of data will grow exponentially with the increasing number of autonomous vehicles. Without sufficient communication and computing resources, road risk estimation becomes laggy and inaccurate, which may incur life-threatening problems.

Due to the constrained on-board processing power and communication bandwidth in vehicular networks, it is impractical to raise the computing and communication capacity of an individual vehicle to match its automotive applications. To cope with the scarcity of computing resources, Vehicular Edge Computing (VEC) has been proposed to support computation-hungry tasks in intelligent transportation systems: the computing capability of a vehicle can be improved by offloading its excess tasks to adjacent vehicles or edge servers [Qiao8879573]. However, previous work [Dai8493149Dai, Chai8918338] mainly pays attention to task offloading schemes under the assumption of sufficient communication resources, whereas the performance of VEC is significantly constrained by the limited communication bandwidth of a practical vehicular network.

On the other hand, heterogeneous V2X communications have great potential to enhance the communication capability needed for the large-scale deployment of autonomous vehicles [Chen8784150]. Vehicle-to-Vehicle (V2V) communication is the major transmission form in the platoon-based edge computing paradigm, where a platoon of vehicles with sufficient on-board computing resources and communication bandwidth can cooperatively offer additional mobile computing resources [Qiao8436044]. Specifically, there are three widely used V2V communication technologies: Dedicated Short Range Communications (DSRC), cellular-based V2V (C-V2V) communication, and mmWave communication. Vehicle-to-Infrastructure (V2I) communication, in turn, supports infrastructure-based edge computing [Zhang8403956], in which an excess task is offloaded from a vehicle to a nearby base station that provides the communication service. In this paper, we investigate cellular-based V2I communication (C-V2I), which requires vehicles to stay within cellular coverage. DSRC and mmWave V2V operate on license-free bands [Mavromatis8904214, Ghasempour8088544], whereas C-V2V and C-V2I work on licensed bands and provide a paid communication service.

In the context of VEC with heterogeneous V2X connections and computing servers, offloading reliability is tightly coupled to the selection of access technologies and offloading targets. A malfunction in any part (a transmission may fail or task processing may be interrupted) deteriorates VEC performance. However, only a few works have investigated the combined failure probability of the computing and communication processes in a heterogeneous VEC network, or incorporated this failure probability into the VEC optimization objective. Therefore, under constraints on offloading failure probability and resource cost, it remains an open challenge to attain reliable and economical VEC services through the optimal selection of access technologies and offloading targets.

To fill these research gaps, our study exploits heterogeneous VEC to optimize task offloading with multiple V2X technologies. First, we derive the upper bound of the offloading delay of each access technology under a variable failure probability. Based on the analysis of these upper bounds, a federated learning-based intelligent offloading scheme is proposed to minimize the failure probability and the resource cost in terms of communication and computing costs. Our federated Q-learning algorithm can exploit and share local knowledge among vehicles in parallel. The main contributions are summarized as follows:

  • We derive the upper bound of the offloading delay for different V2X technologies by leveraging stochastic network calculus. To the best of our knowledge, this is the first closed-form upper bound on offloading delay in VEC. These upper bounds can guide the design of optimal offloading scheduling and the selection of communication forms for task offloading.

  • We propose a new optimization model taking into account the communication and computing budgets as well as the failure probability. This is the first work to consider the offloading failure probability in heterogeneous VEC. Simulations show that cellular-based V2X (C-V2X) communication has the best performance in terms of failure probability under different traffic loads.

  • We design a federated learning-based parallel scheme with high scalability and fast convergence. The optimization model is decoupled and trained in parallel by a set of local Q-learning processes, which accumulate global knowledge by simultaneously exploiting different local action-state spaces. The proposed consensus Q-table amplifies knowledge sharing and avoids the heavy communication overhead of the training phase.

The remainder of this paper is organized as follows. Section II presents the related work. Section III introduces the system model. Section IV analyzes the platoon-based and infrastructure-based edge computing formulations. Section V presents the optimization model and our two solutions for V2X offloading selection. Section VI demonstrates the simulation results and discusses the performance. Finally, we draw the conclusion in Section VII.

II Related Work

Existing work on VEC focuses on selecting optimal offloading targets while taking resource constraints into account. Dai et al. [Dai8493149Dai] investigated the multi-vehicle task offloading problem of VEC and proposed a two-step iterative algorithm to optimize the offloading ratio and computation resources. Zhang et al. [Zhang7997360] proposed a Stackelberg game approach for offloading candidate selection that optimizes the utilities of both vehicles and VEC servers. Zhang et al. [Zhang8403956] regarded vehicles as caching servers and proposed a caching service migration scheme in which the communication, computing, and caching resources at the wireless network edge are jointly scheduled. However, these VEC solutions ignore the problem of selecting the optimal access technology for task offloading.

Furthermore, each V2X technology has its own limitations. DSRC is a typical competition-based communication technology that is not suitable for multi-task communication due to scalability issues [Sial8859331]. The C-V2X technology eliminates these drawbacks but charges vehicles a fee [Zheng7293220]. In contrast, mmWave communication works in high-frequency unlicensed bands with large available bandwidth [Chen8784150], but it needs antenna beam alignment, which incurs an additional link budget [chogwentsp2019]. Consequently, the research community has explored heterogeneous V2X architectures in which the individual V2X technologies complement each other's shortcomings. Abboud et al. [Zhuang7513432] discussed the interworking of DSRC and cellular communication solutions. Perales et al. [Perales8642796] proposed a heterogeneous DSRC and mmWave vehicular network that uses side information from the DSRC channel to speed up mmWave beam alignment. Sim et al. [Sim8472783] integrated mmWave into C-V2X. Katsaros et al. [Katsaros7277110] developed stochastic network calculus to derive a stochastic upper bound on the end-to-end delay of hybrid DSRC and C-V2X communications.

However, to the best of our knowledge, no literature has discussed the interworking of DSRC, C-V2X, and mmWave communications in a heterogeneous VEC system. In addition, existing work on VEC is devoted to computation offloading between vehicles and VEC servers, ignoring the offloading failure probability due to communication issues and the reliability of VEC. To fill this research gap, our work addresses the task offloading problem in a heterogeneous VEC environment while accounting for the offloading failure probability and resource cost. This is a challenging problem due to the complicated dependency between multiple access technologies and offloading targets.

III System Model

As shown in Fig. 1, the proposed VEC framework consists of vehicles, cellular base stations (BSs), and VEC servers, where the VEC servers are co-located with the BSs at the roadside to reduce end-to-end communication and processing latency. Since C-V2I communication has several advantages over other V2I technologies, including longer range and enhanced reliability [Sial8859331], this paper regards only the cellular BS as roadside infrastructure. In addition, the VEC servers are connected with each other through the X2 interfaces of the accompanying BSs to form a resource pool, namely the VEC pool. To simplify the model, we use a virtual link to depict the logical connection between the VEC pool and the connected VEC servers. In this paper, centralized resource management is implemented in the VEC pool, which provides efficient resource utilization for C-V2I offloading tasks, while resource management within a platoon is performed by the platoon header, a resource-rich vehicle elected from the platoon vehicles [Qiao8879573].

Furthermore, we propose the Synchronous Federated Q-Learning (Sync-FQL) algorithm, which steers the offloading of network traffic while taking into account the offloading failure probability and resource cost. This algorithm is deployed in the VEC pool or the platoon header. As shown in Fig. 2, the input of the Sync-FQL algorithm includes the delay requirements of the offloading tasks, the traffic attributes of the offloading tasks (arrival rate and burstiness measure), and the service attributes of the servers (envelope service rate and peak service capacity of each server) [Rizk6868978]. Note that we treat both V2X communication and task processing as services.

Fig. 1: Illustration of the heterogeneous VEC framework.

Since the V2X technologies and the offloading targets (vehicles or VEC servers) are tightly coupled, the system is a typical object of study for stochastic network calculus. Recent advances in network performance research have adopted network calculus to estimate end-to-end delay [Katsaros7277110, Yang8252754]. However, these previous works did not investigate stochastic network calculus in the VEC system from the perspective of offloading failure probability together with communication and computing resources. In this paper, we apply stochastic network calculus to obtain the offloading performance metrics. Once the theoretical performance of each V2X technology is available, it can be used to guide search pruning in the training process.

Moreover, offloading scheduling is impacted by the properties of the input network traffic, so it is important to capture its statistical characteristics. From the perspective of network calculus, an arrival traffic curve $A(t)$ can be modelled as the cumulative volume of the input traffic during the interval $[0,t)$, and $A(s,t) = A(t) - A(s)$ is the cumulative volume of the input traffic during $[s,t)$, where $0 \le s \le t$. Moreover, $A_i(s,t)$ has a statistical envelope, referred to as the exponentially bounded burstiness model [Rizk6868978], which provides the validated inequality $\Pr\{A_i(s,t) > \rho_i (t-s) + \sigma_i\} \le \varepsilon$, where $\rho_i$ is the arrival rate of task $i$, $\sigma_i$ is the burstiness measure of task $i$ [Rizk6868978], and $\varepsilon$ is the violation probability. By virtue of Chernoff's bound [Rizk6868978], we get

$$\mathbb{E}\left[e^{\theta A_i(s,t)}\right] \le e^{\theta\left(\rho_i (t-s) + \sigma_i\right)},$$

where $\mathbb{E}[\cdot]$ is the expectation of a random variable and $\theta$ is a constant parameter. We assume that there are multiple categories of automotive tasks in the transportation system, such as vehicular Internet and infotainment, remote diagnostics and management, cooperative lane change assist, and cooperative adaptive cruise control [Campolo2017]. Each task generates its own network traffic, characterized by the arrival rate $\rho_i$ and burstiness measure $\sigma_i$. In addition, a task can be divided into several parts and processed separately.
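The exponentially bounded burstiness envelope can also be checked empirically. The sketch below is illustrative and not from the paper: the traffic model (exponential per-slot arrivals) and the envelope parameters are assumptions. It measures how often the cumulative arrivals over a window exceed the linear envelope rate·(t − s) + burst:

```python
import random

def ebb_violation_freq(arrivals, rho, sigma):
    """Empirical frequency with which the cumulative arrivals A(s, t)
    exceed the linear envelope rho * (t - s) + sigma."""
    n = len(arrivals)
    cum = [0.0]                      # prefix sums: cum[t] = A(0, t)
    for a in arrivals:
        cum.append(cum[-1] + a)
    violations = checks = 0
    for s in range(n):
        for t in range(s + 1, n + 1):
            checks += 1
            if cum[t] - cum[s] > rho * (t - s) + sigma:
                violations += 1
    return violations / checks

random.seed(0)
# bursty per-slot arrivals with a mean rate of ~1.0 Mb per slot (assumed)
traffic = [random.expovariate(1.0) for _ in range(200)]
eps = ebb_violation_freq(traffic, rho=1.2, sigma=5.0)
print(f"empirical violation probability: {eps:.4f}")
```

Sweeping the burst allowance `sigma` upward drives the empirical violation probability toward zero, which is exactly the trade-off the statistical envelope captures.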

Hereafter, we investigate the dynamic service curve $S(s,t)$ provided by a channel or a processor [Jiang2008Stochastic]; $S(s,t)$ is non-negative and increases with $t$. A dynamic service envelope is defined as $S(s,t) \ge R(t-s) - B$ for all $0 \le s \le t$, where $R$ is the envelope service rate and $B$ is the peak service capacity of the server [Rizk6868978]. According to Chernoff's bound, the envelope is rewritten as

$$\mathbb{E}\left[e^{-\theta S(s,t)}\right] \le e^{-\theta\left(R(t-s) - B\right)}.$$
When the dynamic server processes the input traffic , the processing delay with a failure probability satisfies the following inequality [Fidler4015760]


where .
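The derivations in Section IV repeatedly rely on two network-calculus operations: the min-plus convolution of service curves (the concatenated property) and a delay bound read off as the horizontal deviation between the arrival and service curves. The sketch below is a minimal deterministic illustration; the rates, the access delay, and the token-bucket-like arrival envelope are all assumed values, whereas the curves in this paper are stochastic:

```python
def minplus_conv(S1, S2, horizon):
    """(S1 ⊗ S2)(t) = min_{0<=s<=t} S1(s) + S2(t-s): concatenation of two servers."""
    return [min(S1[s] + S2[t - s] for s in range(t + 1)) for t in range(horizon)]

def latency_rate(R, T, horizon):
    """Latency-rate service curve beta(t) = R * max(t - T, 0)."""
    return [R * max(t - T, 0) for t in range(horizon)]

def horizontal_deviation(A, S):
    """Worst-case delay: max over t of the time until service catches up with arrivals."""
    d = 0
    for t in range(len(A)):
        tau = 0
        while t + tau < len(S) and S[t + tau] < A[t]:
            tau += 1
        d = max(d, tau)
    return d

H = 60
link = latency_rate(R=10, T=3, horizon=H)   # transmission: rate 10, access delay 3 (assumed)
cpu = latency_rate(R=8, T=0, horizon=H)     # on-board processor: rate 8 (assumed)
end_to_end = minplus_conv(link, cpu, H)     # concatenated transmission + processing
arrivals = [min(5 * t, 4 * t + 20) for t in range(H)]  # token-bucket-like envelope
print("delay bound (slots):", horizontal_deviation(arrivals, end_to_end))
```

Note how the convolution of two latency-rate curves is again a latency-rate curve whose rate is the minimum of the two rates and whose latency is the sum of the two latencies, which is the pattern the concatenated service curves below follow.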

Fig. 2: Demonstration of the offloading scheduling.

IV Performance Analysis on VEC

In this section, we derive the end-to-end delay upper bounds for platoon-based edge computing and infrastructure-based edge computing, respectively.

IV-A Platoon-based Edge Computing

Vehicles can form a platoon to share their computing resources with surrounding vehicles and cooperatively process tasks; the offloading targets of platoon-based edge computing are the neighboring vehicles. As shown in Fig. 2, three kinds of communication technologies support platoon-based edge computing. C-V2V communication operates in a licensed band and requires vehicles to pay a fee to the mobile network operator, while DSRC and mmWave V2V communications are free to access.

IV-A1 Delay Upper Bound of DSRC

The DSRC standard is based on the 802.11p amendment to the IEEE 802.11 standard, which adopts the exponential back-off algorithm to cope with access competition. The initial size of the back-off window is assumed as . The retry limit is set to . Thus, the number of back-off stages is [Katsaros7277110]. And, the available bandwidth of DSRC is denoted by . Therefore, the size of the window in back-off state can be expressed as , where is a threshold of the back-off counter, . When the back-off counter exceeds , the size of the window does not grow anymore. Hence, the access delay of input traffic at the head-of-line is , in which is the longest duration of back-off stage . And, [Dianati7277110], where


represents the largest number of back-off slots at stage . is the average length of a back-off slot. And, is the average length of a transmission slot, where is the average burstiness measure of the input traffic [Rizk6868978]. Without loss of generality, we assume and that the probability of a collision occurring in a transmission slot is . Therefore, represents the probability of a collision occurring in a unit period. If , we get .
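The exponential back-off behaviour described above can be sketched numerically. All parameter values below (initial window of 16 slots, doubling threshold of 5 stages, retry limit of 6, slot and transmission durations) are illustrative assumptions, not the settings used in this paper:

```python
def backoff_window(stage, w0=16, threshold=5):
    """Window size doubles with each back-off stage up to a threshold m',
    after which it stops growing: W_k = 2^min(k, m') * W_0."""
    return (2 ** min(stage, threshold)) * w0

def worst_case_access_delay(retries, w0=16, threshold=5,
                            slot_len=13e-6, tx_len=300e-6):
    """Pessimistic head-of-line delay: at every stage the frame draws the
    largest back-off counter (W_k - 1 idle slots) and each earlier attempt
    ends in a colliding transmission of duration tx_len."""
    delay = 0.0
    for k in range(retries + 1):
        delay += (backoff_window(k, w0, threshold) - 1) * slot_len
        delay += tx_len  # the (possibly colliding) transmission itself
    return delay

for k in range(7):
    print(f"stage {k}: window = {backoff_window(k)} slots")
print(f"worst-case access delay: {worst_case_access_delay(retries=6) * 1e3:.2f} ms")
```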

Moreover, the DSRC transmission process can be regarded as a classical latency-rate service in which is the access delay [Jiang2008Stochastic]. Therefore, the latency-rate service curve of DSRC is given as


in which when . Otherwise, . By virtue of the superposition property [Rizk6868978], the whole input traffic except for task is regarded as the background arrival curve for task :


where is the percentage of task using the DSRC offloading. And, is the accumulated traffic volume of task in interval , i.e., , where is an envelope arrival rate of task ; is a burstiness measure of task [Rizk6868978]. Then, based on the theory of Leftover Service [Jiang2008Stochastic], the service curve of DSRC transmission for task is expressed as


If task has been offloaded to the target vehicle, it needs to compete with the other offloaded tasks (i.e., those offloaded via mmWave, DSRC, and C-V2V) and the locally processed tasks for on-board CPU cycles (computing resources). Similar to Eq. (6), the background on-board processing service curve of task is


where is the number of vehicles in the road segment. is the percentage of the task using category offloading, where represents one of the DSRC, C-V2I, C-V2V, mmWave, and local processing. Hence, for any task , the equation holds. Afterwards, based on the Leftover Service property [Jiang2008Stochastic], the on-board processing service curve for task using DSRC offloading is given as


where , and is the computing capacity for the on-board processor. Furthermore, according to the concatenated property [Fidler4015760], the total service curve of the DSRC transmission and the on-board processing is


which represents task traversing the services of DSRC transmission and the on-board processor, where . Therefore, based on Eq. (3), the validated inequality of DSRC offloading is


where , , , and . The first inequality of Eq. (11) is obtained by Chernoff's bound, and the second inequality is derived from the fact that:


The third inequality of Eq. (11) is based on the assumption with . The last equality of Eq. (11) is obtained by the infinite geometric sum, where is assumed for convergence. In addition, we can regard the last expression as the offloading failure probability of task :


where represents the probability that the offloading delay (transmission delay plus on-board processing delay) exceeds the value of . Through simple algebraic manipulation, we get the upper bound of the offloading delay :


where . The inequality of Eq. (14) holds because . Hence, the delay upper bound of DSRC offloading for task with offloading failure probability is expressed as


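Eq. (15) inverts the failure probability in closed form. More generally, whenever the failure probability is a monotonically decreasing function of the delay budget, the bound can be inverted numerically by bisection, which is convenient when the closed form is unwieldy. Below is a sketch using an assumed exponential-decay failure probability (the constants c, theta, and R are placeholders, not values from this paper):

```python
import math

def invert_failure_prob(eps_f, target_eps, t_hi=1.0, tol=1e-9):
    """Smallest delay budget T with eps_f(T) <= target_eps, found by bisection,
    for any failure probability eps_f that decreases monotonically in T."""
    lo = 0.0
    while eps_f(t_hi) > target_eps:   # grow the bracket until it contains T
        t_hi *= 2.0
    hi = t_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if eps_f(mid) <= target_eps:
            hi = mid
        else:
            lo = mid
    return hi

# assumed exponential-decay failure probability: eps_f(T) = c * exp(-theta * R * T)
c, theta, R = 4.0, 0.5, 6.0
eps_f = lambda T: c * math.exp(-theta * R * T)
T_bound = invert_failure_prob(eps_f, target_eps=1e-3)
# closed form for this toy eps_f: T = ln(c / eps) / (theta * R)
print(f"numeric T = {T_bound:.6f}, closed form = {math.log(c / 1e-3) / (theta * R):.6f}")
```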
IV-A2 Delay Upper Bound of C-V2V

C-V2V communication can be regarded as a specific reservation-based V2V technology. Release 14 of the C-V2X standardization in the Third Generation Partnership Project defines two new modes (mode 3 and mode 4) for C-V2V communication [Chen7992934]. To simplify the analysis, we focus on C-V2V communication under mode 3, in which the communication resource for each V2V link is pre-assigned by the nearby base station, so there is no access competition in the transmission procedure. Hence, the service curve of C-V2V communication under mode 3 is , where is the exclusive bandwidth for task . As in DSRC task offloading, task also needs to compete with the other offloaded tasks and the locally processed tasks on the on-board processor. Based on the Leftover Service property, the on-board processing service curve of C-V2V offloading is


Hereafter, by virtue of the concatenated property, the total service curve of C-V2V offloading for task is . Therefore, the validated inequality of C-V2V offloading is given as


where , , , and . We then denote the right-hand side of Eq. (17) as the offloading failure probability . Hence, the delay upper bound of C-V2V offloading is


IV-A3 Delay Upper Bound of mmWave

The implementation of mmWave communication first requires antenna beam alignment, which incurs additional time overhead compared to the other V2V communications [chongwenTVT]. The usage of a control channel for beam alignment can significantly reduce the complexity of the alignment [Gruteser8642796]. This control channel delivers Request-To-Send-like (RTS-like) and Clear-To-Send-like (CTS-like) beacons that contain vehicular kinetic information (location, speed, acceleration, and heading direction) and the communication type [Gruteser8642796]. The sub-6 GHz control channel for RTS/CTS-like beacons can be either competition-based or reservation-based. To simplify the notation, mmWave offloading with competition-based channel-aided alignment is denoted by CmmW, and RmmW denotes mmWave offloading with reservation-based channel-aided alignment.

Delay Upper Bound of CmmW

In this subsection, we use the DSRC channel as the competition-based control channel for beam alignment, as depicted in Fig. 3. The mmWave communication includes two procedures: beam alignment and data transmission. A vehicle initiates beam alignment by sending an RTS-like beacon that contains its kinetic information. When the target vehicle has received the RTS-like beacon, it returns a CTS-like beacon with its own kinetic information to confirm the mmWave communication, completing the alignment procedure [Perales8642796]. Due to the concatenated property, the service curve of mmWave communication is the convolution of the beam alignment procedure and the data transmission procedure, where the beam alignment procedure is composed of the RTS-like and CTS-like beacon transmissions. Consequently, the service curve of mmWave communication is given as

Fig. 3: Beam alignment of CmmW communications.

where and are the service curves of the RTS and CTS transmissions, respectively. In general, the RTS and CTS transmissions are symmetrical, so we assume their service curves are the same. However, because the beam alignment uses the DSRC channel, the original DSRC traffic can impact the alignment performance. According to the Leftover Service theory, the service curve of the DSRC-based RTS is , where is the burstiness measure of the RTS-like traffic, which is identical to that of the CTS-like traffic. The burstiness measure of the RTS/CTS-like beacons is usually too small, compared with that of the DSRC traffic, to interfere with the channel capacity. To simplify the analysis, the burstiness measure of the RTS/CTS-like traffic is neglected, i.e., . In addition, the service curve of mmWave data transmission is , where is the channel capacity of mmWave. Therefore, according to Eq. (19), the service curve of mmWave transmission is


When task has arrived at the serving vehicle through mmWave transmission, it also competes with the other offloaded or locally processed tasks for on-board CPU cycles. The on-board processing service curve of CmmW offloading for task is


Hence, based on the concatenated property, we get the total service curve of the CmmW offloading, :


The validated inequality for the CmmW offloading is


where , , , and . Since the right-hand side of Eq. (23) can be regarded as the failure probability , the upper bound of the CmmW offloading delay for task is


Compared with C-V2V offloading, the mmWave offloading bound has an additional term caused by the DSRC traffic; this additional DSRC traffic deteriorates the performance of CmmW offloading.

Delay Upper Bound of RmmW

The reservation-based control channel is purchased from the mobile network operator or may be licensed by future standards. Following the same derivation as for CmmW offloading, the service curve of the RTS/CTS transmission in the RmmW alignment is , where is the capacity of the reserved control channel. Therefore, the service curve of the RmmW transmission is


In addition, the on-board processing service curve of the RmmW offloading is the same as that of CmmW offloading. Thus, the delay upper bound of RmmW offloading for task is


where is the burstiness measure of the RTS/CTS-like traffic. Usually, the volume of the control traffic is constrained to be very small to mitigate the communication budget, so the constant is negligible. The delay upper bound of RmmW offloading is therefore similar to that of C-V2V offloading, i.e., Eq. (18). Consequently, the performance of C-V2V offloading is a good reference for that of RmmW, and we do not individually explore RmmW offloading in the remainder of this paper.

IV-B Infrastructure-based Edge Computing

As shown in Fig. 1, cellular base stations deployed at the roadside collect data from vehicles and deliver the data to the VEC pool for processing. Thus, C-V2I offloading has two components: uplink transmission and VEC pool processing. Since the C-V2I uplink is a typical reservation-based communication, its service curve can be expressed as


where is the assigned uplink bandwidth for task . As for VEC pool processing, the computing resources of the VEC pool are shared among all uploaded tasks. Thus, the VEC pool processing service curve for task is


in which is the total computing capacity of the VEC pool, which is larger than the computing capacity of the on-board processor . The total service curve of C-V2I offloading for task is denoted by . Referring to the derivation of Eq. (11), the validated inequality of C-V2I offloading is


where , , , and . Therefore, the delay upper bound of C-V2I offloading is:


IV-C Local Processing

Vehicular applications can also be digested by the vehicle's own on-board processor. However, a locally processed task still has to compete with the other offloaded and local tasks. Hence, the local processing service curve of task is


And, the corresponding validated inequality is


where , , , and . Letting the right-hand side of Eq. (32) equal , the delay upper bound for local processing is given as


V Model Optimization

This section proposes a new optimization model that takes into account the communication and computing costs as well as the failure probability. When vehicles use C-V2X communication, the cellular operator charges a per-Mbps fee for the transmission service. Hence, the communication cost of task is generated only by C-V2X communication, due to its licensed band, i.e.,


Regarding the computing cost, the unit computing costs of the VEC pool and the on-board processor are and per Mbps, respectively. The VEC pool computing cost is produced by C-V2I offloading, while the computing cost of the on-board processor is incurred by DSRC, C-V2V, and mmWave offloading. Since local processing employs only the local computing resource, it generates no communication or computing cost. Thus, the computing cost for task is


where is the computation complexity of task [Liu2018].
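The cost structure above can be sketched as follows. The unit prices, the traffic volume, the complexity factor, and the offloading split are all illustrative placeholders; only the structure (licensed-band transmission fee for C-V2V/C-V2I, VEC-pool computing for C-V2I, on-board computing for the V2V offloads, free local processing) follows the text:

```python
def offload_cost(props, volume_mbps, complexity,
                 p_comm=0.10, p_pool=0.05, p_obu=0.02):
    """Total resource cost of one task given the offloading split `props`
    (fractions over the five technologies, summing to 1). The per-Mbps
    prices p_comm, p_pool, and p_obu are assumed placeholder values."""
    assert abs(sum(props.values()) - 1.0) < 1e-9
    # licensed-band transmission fee: C-V2V and C-V2I only
    comm = p_comm * volume_mbps * (props["c_v2v"] + props["c_v2i"])
    # VEC-pool computing for C-V2I; on-board computing for the V2V offloads
    comp = complexity * volume_mbps * (
        p_pool * props["c_v2i"]
        + p_obu * (props["dsrc"] + props["c_v2v"] + props["mmwave"])
    )
    # local processing uses the vehicle's own CPU at no monetary cost
    return comm + comp

split = {"dsrc": 0.2, "c_v2v": 0.1, "c_v2i": 0.4, "mmwave": 0.1, "local": 0.2}
print(f"cost: {offload_cost(split, volume_mbps=8.0, complexity=1.5):.4f}")
```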

Furthermore, we treat the failure probability as another cost in the VEC, because a failed offloading severely deteriorates the quality of service of automotive tasks. The minimized cost model for the heterogeneous VEC is proposed in .


where . . is the delay requirement of task . and the offloading failure probabilities of the different access technologies are obtained from Eqs. (15), (18), (26), (24), (30), and (33), respectively. represents the priority of application . This paper is the first to propose the offloading failure probability as an optimization objective, a consideration that is rational but has been neglected in previous literature. However, is non-convex and therefore difficult to solve.

According to Eqs. (15), (18), (24), (30), and (33), is inversely proportional to . When has been minimized, constraint becomes tight, i.e., . Consequently, can be removed by replacing with . Thus, is equivalently transformed into .


Considering the non-convex objective of , it is difficult to solve directly. In this paper, we propose two canonical solutions to address . One is a parallel learning-based method, federated Q-learning (FQL), presented in Alg. 1, in which is the number of training rounds of the federated Q-learning and is the consensus Q-table used for aggregation. The other is a relaxation-based optimization that applies a relaxation trick to transform the original non-convex problem into a convex problem with low complexity.

1 Initialize action-state Q-table ; CQ-table ;
2 for  to  do
3       for each offloading technology in parallel do
4             Select an action with the maximum or randomly roll out with a certain probability;
5             Update the reward using Eq. (38); compute the local update of using Eq. (39);
7      if the aggregator receives all local updates  then
8             Update the global using Eq. (40); set for all ;
Algorithm 1 Sync Federated Q-Learning (Sync-FQL)
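The synchronous structure of Algorithm 1 can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the aggregation rule (element-wise averaging), the toy environments, and all hyper-parameters; the paper's actual update and aggregation rules are Eqs. (38)-(40):

```python
import random

def sync_fql(envs, n_states, n_actions, rounds=200,
             alpha=0.3, gamma=0.9, eps=0.2, seed=0):
    """Synchronous federated Q-learning sketch: each worker refines a local
    Q-table in parallel; after every round the aggregator averages the local
    tables into a consensus Q-table (CQ) that is pushed back to all workers."""
    rng = random.Random(seed)
    K = len(envs)
    cq = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(rounds):
        locals_q = []
        for step_fn in envs:                  # one local update per worker
            q = [row[:] for row in cq]        # start from the consensus table
            s = rng.randrange(n_states)
            if rng.random() < eps:            # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q[s][x])
            r, s2 = step_fn(s, a, rng)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            locals_q.append(q)
        # aggregation: element-wise average of the local tables
        cq = [[sum(q[s][a] for q in locals_q) / K for a in range(n_actions)]
              for s in range(n_states)]
    return cq

# toy workers: reward is highest when the action index matches the state index
def make_env(noise):
    def step(s, a, rng):
        return (1.0 if a == s else 0.0) + rng.uniform(-noise, noise), rng.randrange(3)
    return step

cq = sync_fql([make_env(0.05), make_env(0.1)], n_states=3, n_actions=3)
greedy = [max(range(3), key=lambda a: cq[s][a]) for s in range(3)]
print("greedy policy per state:", greedy)
```

Because every worker restarts each round from the shared consensus table, only the small Q-table (not the raw experience) crosses the network, which is the communication saving the consensus Q-table is meant to provide.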

V-A Federated Q-Learning

The V2X offloading selection is modeled as a Markov decision process, which consists of a tuple

where and are the sets of states and actions, respectively. The transition probability is the probability that an agent enters state after taking action in state . The reward is the feedback for selecting action in state [Hendrik2012].

V-A1 State

is a state of the offloading scheduling, where is the assigned proportion of the different offloading technologies used by task , and is the reserved bandwidth for C-V2X and DSRC offloading. In addition, according to the current assignment , the available bandwidths for C-V2X and DSRC offloading are updated to and , respectively.

V-A2 Action

An action stands for a traffic assignment across the different offloading technologies. When one offloading technology increases/decreases its share of the traffic volume by ( or ) percent, the other offloading technologies equally decrease/increase theirs by ( or ) percent, respectively, where is the total number of offloading technologies. With a certain probability, the agent selects the action with the maximum value in the Q-table; otherwise, it randomly chooses an action from the available action set with probability .
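The action semantics above, shifting a share delta onto one technology while taking delta/(N - 1) equally from the others, can be sketched as follows. The feasibility handling (rejecting a move that would push any share outside [0, 1]) is an assumption consistent with the transition rule of the MDP:

```python
def apply_action(props, idx, delta):
    """Shift `delta` of the traffic share onto technology `idx`; each of the
    other N-1 technologies gives up (or gains) delta / (N - 1) so the
    shares keep summing to 1."""
    n = len(props)
    new = [p - delta / (n - 1) for p in props]
    new[idx] = props[idx] + delta
    # an action is infeasible if any share would leave [0, 1]: state unchanged
    if any(p < -1e-12 or p > 1 + 1e-12 for p in new):
        return props
    return [min(max(p, 0.0), 1.0) for p in new]

p = [0.2, 0.2, 0.2, 0.2, 0.2]
p = apply_action(p, idx=2, delta=0.04)   # move 4% onto technology 2
print([round(x, 3) for x in p])
```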

V-A3 Transition Probability

At each state, is non-zero unless the offloading percentage of one offloading technology equals or , in which case it cannot increase or decrease further, respectively. Otherwise, the current offloading percentage of the technology is randomly updated by adding one of the elements in .

V-A4 Reward

When the proportion (offloading percentage) of offloading technology is updated, the algorithm returns the reward , which represents the gain of selecting offloading technology . Note that the reward is the negative of the total cost.


V-A5 Q-Table

The Q-table can be regarded as an action-state performance index function. In this paper, the Q-table is a three-dimensional matrix of size , where is the number of offloading technologies (including local processing), is the number of application categories, and is the number of feasible values, which is determined by the accuracy of . In this paper, the accuracy is set to 0.01, which means increases/decreases by at least 0.01 in each iteration. The number of feasible values is 101, counting from 0 to 100. Therefore, the element