
Prediction-based Hybrid Slicing Framework for Service Level Agreement Guarantee in Mobility Scenarios: A Deep Learning Approach

Network slicing is a critical driver for guaranteeing the diverse service level agreements (SLA) in 5G and future networks. Inter-slice radio resource allocation (IS-RRA) in the radio access network (RAN) is very important. However, user mobility brings new challenges for optimal IS-RRA. This paper first proposes a soft and hard hybrid slicing framework where a common slice is introduced to realize a trade-off between isolation and spectrum efficiency (SE). To address the challenges posed by user mobility, we propose a two-step deep learning-based algorithm: joint long short-term memory (LSTM)-based network state prediction and deep Q network (DQN)-based slicing strategy. In the proposal, LSTM networks are employed to predict traffic demand and the location of each user in a slicing window level. Moreover, channel gain is mapped by location and a radio map. Then, the predicted channel gain and traffic demand are input to the DQN to output the precise slicing adjustment. Finally, experiment results confirm the effectiveness of our proposed slicing framework: the slices' SLA can be guaranteed well, and the proposed algorithm can achieve near-optimal performance in terms of the SLA satisfaction ratio, isolation degree and SE.



1 Introduction

With the emergence of 5G telecommunication technology, cellular networks are envisioned to serve a wide variety of innovative vertical applications, such as Cellular Vehicle-to-Everything (C-V2X) and augmented/virtual reality (AR/VR), with heterogeneous performance requirements including high data rates, ultra-low latency and high reliability [foukas2017network]. These highly diverse performance requirements challenge 5G in terms of scalability, availability, and cost-efficiency [Peter2017_CM].

Network slicing is recognized as a promising technique to guarantee differentiated quality of service (QoS) and service level agreements (SLAs). It enables multiple logical networks, each corresponding to a different network service, to run on top of a common physical network infrastructure, so that the slices can be customized to satisfy various SLAs through virtualization and isolation techniques [8685766]. From the perspective of radio resource management, the fundamental challenge of network slicing lies in the trade-off between isolation and resource efficiency. On the one hand, to avoid interference between slices, the slicing system aims to ensure complete isolation between network slices. On the other hand, the inherent scarcity of radio spectrum requires that all slices share the limited radio resources on demand to ensure efficient utilization. Therefore, inter-slice radio resource allocation (IS-RRA) in the radio access network (RAN) remains an open technical challenge [ISRRA2020WC].

1.1 Related Work

The development of core network slicing is more mature and can be implemented through advanced computing, caching, and virtualization container technologies [Peter2017_CM]. In contrast, there are still challenges in the design of radio management for RAN slicing that need to be studied. As mentioned before, these challenges include the utilization efficiency of radio resources, the performance isolation between slices, and the dynamics of service traffic flow [ISRRA2020WC]. Recently, deep reinforcement learning (DRL), a promising artificial intelligence tool, has been widely applied to network slicing [rli2018access, jie2021tcom, TVT2019LIANG, DRL-eMBBURLLC-TWC2021, DRLLSTM, sun2019mix, myWCL2021, liu2020edgeslice, rongpeng2021tvt, chen2019DRL, JSAC2020GANDRL] to address the above challenges due to its capacity to solve model-free problems.

[rli2018access] investigated the application of DRL to radio resource slicing under dynamic traffic demands, aiming to maximize the users’ QoS and spectrum efficiency (SE). The results demonstrated the advantage of DRL in solving model-free resource allocation problems. Moreover, the authors in [jie2021tcom] proposed a DRL-driven hierarchical control strategy to guarantee the long-term QoS and SE. Similarly, [TVT2019LIANG] proposed a collaborative learning framework that combines supervised learning with DRL to perform large and small time-scale resource allocation, respectively. In [DRL-eMBBURLLC-TWC2021], the authors proposed a model-based DRL algorithm where eMBB resource allocation and URLLC preemptive scheduling were jointly optimized without considering the performance isolation between slices. To address the dynamic nature of the environment, the authors of [DRLLSTM] incorporated the long short-term memory (LSTM) into DRL to track user mobility. Furthermore, [sun2019mix] and [myWCL2021] applied DRL methods to heterogeneous network scenarios to jointly solve user association and network slicing, achieving network-level QoS guarantees and maximizing SE. Similarly, the authors of [liu2020edgeslice] applied DRL in the alternating direction method of multipliers (ADMM) model to realize a decentralized orchestration of the radio, transport and computing resources of slices. Innovatively, [rongpeng2021tvt] combined the advantages of DRL and graph attention networks to solve the joint handover and slicing problems in a dense cellular network scenario.

Furthermore, some works take into account the issues faced in the practical deployment of DRL, including convergence speed and performance loss in the training stage. Based on [rli2018access], [chen2019DRL] proposed a DRL scheme with faster convergence by integrating discrete normalized advantage functions (DNAF) and the deterministic policy gradient descent (DPGD) algorithm. To reduce the adverse effects of random exploration actions, [JSAC2020GANDRL] proposed a generative adversarial network-powered deep distributional Q network (GAN-DDQN) to train DQN offline.

Based on our literature review and analysis, the above works on network slicing still have the following limitations.

  • In existing works, the fixed-assignment slicing scheme can achieve perfect performance isolation among different slices. However, it is prone to low resource efficiency. On the other hand, the sharing-based slicing scheme can maximize resource utilization, but requires complex algorithms to ensure the slices’ SLA and cannot guarantee performance isolation. Therefore, how to realize a trade-off between resource efficiency and slice isolation in a general way is still an open question.

  • To improve the policy in an unknown environment, a DRL algorithm tries some actions randomly to estimate the reward of different actions. During exploration, the DRL algorithm may try bad actions, which significantly deteriorate the QoS and may lead to unexpected accidents in real systems. Thus, guaranteeing the network performance during the exploration stage is a bottleneck for applying DRL in practical systems.

  • In a highly dynamic scenario, the mobility of users can exacerbate service request volatility and make the pre-allocated radio resources of each slice inefficient. However, most existing works do not consider the high mobility scenario. Therefore, it is critical for the network to intelligently adjust resource slicing to provide seamless services in a high mobility environment.

1.2 Contributions

In this paper, we focus on an OFDMA-based cellular network with user mobility. To address the above challenges, we first propose a soft and hard hybrid slicing framework to find a trade-off between isolation and resource efficiency and to facilitate the slices’ SLA guarantee in the exploration phase. Meanwhile, we design a customized deep learning framework consisting of the LSTM and DQN. Specifically, LSTM networks are used to perform network state-awareness at a large timescale, and DQN is adopted to achieve the precise adjustment of resources based on the predicted network state. The major contributions of the paper are summarized as follows:

  • Proposing a soft and hard hybrid slicing framework. It divides the resource into two parts, i.e., the hard and soft parts corresponding to dedicated resources for each slice and shared resources for all slices. On the one hand, the common slice setting significantly reduces the SLA violation in the DRL exploration phase. On the other hand, with reasonable resource allocation, the hybrid slicing framework can maximize resource efficiency while guaranteeing SLA under a required isolation degree constraint.

  • Design of a joint LSTM-based network state prediction and DQN-based slicing control strategy. In the proposed two-step deep learning (DL)-based solution, LSTM networks are utilized to predict each user’s traffic demand and location. Moreover, channel gain is mapped by location and a radio map. Then, the predicted future channel gain and traffic demand are input to the DQN to give an accurate slicing adjustment to adapt to user mobility.

  • Performance evaluation of the proposed DL-based hybrid framework. For an accurate evaluation, we compare against three representative algorithms: the Optimal algorithm and the state-of-the-art algorithms Hard-LSTM-A2C [DRLLSTM] and Hard-DQN [sun2019mix], whose details are explained in Section 5. The numerical results show that the proposed method outperforms Hard-DQN and Hard-LSTM-A2C, and achieves near-optimal performance.

The rest of the paper is organized as follows. Section 2 introduces the communication, slice and SLA model. Section 3 first proposes the soft and hard hybrid slicing framework and then based on the hybrid slicing framework, IS-RRA is formulated as a utility maximization problem. Section 4 proposes a two-step DL-based algorithm, which combines LSTM-based network state prediction and DQN-based resource allocation of slices. The numerical results are given in Section 5, followed by the conclusion in Section 6.

2 System Model

Figure 1: Overview of the system scenario.

2.1 Communication Model

The index of user.
The index of slice.
The index of slicing window.
The SLA satisfaction ratio of slice over slicing window .
The isolation degree of slice over slicing window .
The resource utilization ratio of slice over slicing window .
The resource allocated to slice at slicing window .
The resource that slice occupies from the common slice .
The spectrum efficiency of slicing window .
The position of user at the end of slicing window .
The total traffic demand of user at the end of slicing window .
The resource adjustment action of slice at the start of slicing window .
Table 1: Description of key notations.

We consider a typical OFDMA-based downlink cellular network consisting of multiple base stations (BSs), denoted by , as shown in Fig. 1. The BSs serve multiple users, denoted as the set . Assume that the cellular network consists of a set of network slices denoted as , that each user equipment (UE) can associate with one or more slices of , and that denotes the UEs that belong to slice . The radio resource is divided into Transmission Time Intervals (TTIs), denoted by , in the time domain, and represents the number of resource blocks (RBs) partitioned from the bandwidth of a BS. The period during which the resource allocated to each slice remains constant is called a slicing window, denoted by , and each slicing window contains consecutive TTIs. Key notations used in this paper are presented in Table 1. Considering equal power allocation, the received SINR of user associated with BS at time is given as


where is the transmit power of BS and is the channel gain between BS and user . is the power of additive white Gaussian noise.

For the traditional traffic with a large packet size, e.g. eMBB traffic, the achievable rate of the user can be directly estimated according to Shannon’s capacity [jie2021tcom, shannon1948mathematical]. For the short-sized packet transmission, such as URLLC and MTC services, the data rate falls in the finite blocklength channel coding regime [polyanskiy2010channel]. Therefore, the data rate of UEs is denoted as (2),


where is the bandwidth allocated to UE within the -th TTI, is the transmission error probability, is the inverse of the Gaussian Q-function, represents the length of the codeword block in symbols, and is the channel dispersion, given by .
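The two rate regimes described above can be sketched numerically. This is a minimal sketch, assuming the widely used normal approximation for the finite-blocklength regime; the function names, argument names and the exact form of the dispersion term are assumptions for illustration, because the extracted text does not preserve Eq. (2).

```python
import math
from statistics import NormalDist


def shannon_rate(bandwidth_hz: float, sinr: float) -> float:
    """Shannon capacity in bit/s, used for long packets such as eMBB traffic."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def q_inverse(eps: float) -> float:
    """Inverse Gaussian Q-function: Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def finite_blocklength_rate(bandwidth_hz: float, sinr: float,
                            blocklength: int, error_prob: float) -> float:
    """Short-packet rate in bit/s under the normal approximation (assumed form)."""
    # Channel dispersion of the AWGN channel (assumed expression).
    dispersion = 1.0 - 1.0 / (1.0 + sinr) ** 2
    # Rate penalty caused by the finite codeword length and target error rate.
    penalty = math.sqrt(dispersion / blocklength) * q_inverse(error_prob)
    rate = bandwidth_hz / math.log(2) * (math.log(1.0 + sinr) - penalty)
    return max(rate, 0.0)  # a negative value means no reliable rate is achievable
```

For a fixed SINR and error probability, the short-packet rate stays below the Shannon rate and approaches it as the blocklength grows, which is why URLLC-type traffic needs more bandwidth per bit than eMBB traffic.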

2.2 Slice and Mobility Model

As shown in Fig. 1, we consider a mixed-traffic scenario consisting of traditional traffic, e.g., file and multimedia traffic, and C-V2X traffic in Mode-3, where BSs, e.g., eNodeB (eNB) and gNodeB (gNB), directly allocate radio resources to vehicles for their V2X communication through the Uu interface (the link between the UE and the terrestrial radio access network) in a centralized way. The former is typical Enhanced Mobile BroadBand (eMBB) traffic, which is latency and reliability tolerant; its SLA focuses on the minimum throughput. In contrast, the SLA of C-V2X is far more stringent than that of eMBB in terms of latency and reliability, e.g., 99.999% reliability and 1 ms latency in [3gpp.38.824]. In this work, we consider two slices: eMBB and C-V2X Ultra-Reliable Low-Latency Communication (URLLC), as shown in Fig. 1. The BS reserves a number of RBs for each slice based on its SLA requirements and objectives: high throughput for eMBB, low latency and high packet reliability for URLLC, and a certain isolation degree for each slice.

An overview of the mobility model of UEs is given in Fig. 1. For the eMBB UEs, we consider a low-speed mobility model, such as pedestrian movement. Vehicle UEs, in contrast, move faster than eMBB UEs: each vehicle drives along the road from one place to another, and its motion is affected by intersections, traffic lights and routes. Denote the position of UE at the end of slicing window k as .

2.3 SLA Model

Generally speaking, classical QoS metrics for slices’ SLA include throughput, packet latency and transmission reliability. For the throughput, it can be easily derived by aggregating the amount of data that is successfully transmitted over time. For the packet delay, a detailed queuing model of UEs’ packets needs to be clarified.

In this paper, the arrival distribution of traffic is characterised by the service pattern, and there is no prior knowledge of the volatile demand. The arriving packets of UEs are cached in the BS’s buffer and are delivered according to the first-come-first-serve (FCFS) policy. Assume that each UE corresponds to one data queue at the BS. The packet delay consists of two parts, i.e., queuing time and transmission time, where the former is influenced by the scheduling policy and the latter is determined by the instantaneous data rate. For example, the delay of the -th packet of UE is calculated as


where is the queuing time of the -th packet and is the transmission time. For the slice , the queuing time is close to zero if the average packet arrival rate of UEs is low. And the packet delay is mainly decided by transmission time. With the increase of packet arrival rate, the packet delay is determined by both the queuing and transmission time.

From the perspective of the network, the packet is dropped if its delay exceeds the predefined maximum packet latency [netw2020mei]. The reliability is determined by the percentage of packets that are successfully delivered. Therefore, the transmission reliability of UE is expressed as


where is the delay of the -th packet of UE , and corresponds to the maximum packet delay of UEs in slice .
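The delay and reliability definitions above can be illustrated with a short sketch. The function and argument names are hypothetical, and treating a UE with no packets as fully reliable is an assumption made only to keep the example well defined.

```python
def packet_delay(queuing_time_s: float, transmission_time_s: float) -> float:
    """Packet delay as the sum of queuing time and transmission time."""
    return queuing_time_s + transmission_time_s


def transmission_reliability(delays_s, max_delay_s: float) -> float:
    """Fraction of packets delivered within the maximum packet latency.

    Packets whose delay exceeds max_delay_s count as dropped.
    """
    if not delays_s:
        return 1.0  # assumption: no packets means no violation
    delivered = sum(1 for d in delays_s if d <= max_delay_s)
    return delivered / len(delays_s)
```

For example, with a 5 ms latency bound, delays of 3 ms, 4 ms, 7 ms and 1 ms give a reliability of 0.75, since only the 7 ms packet is dropped.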

For the eMBB slice, its main performance metric is the throughput. In this paper, its SLA satisfaction ratio is defined as the ratio of the achieved data rate to the required rate specified in the SLA. For the C-V2X slice, its performance metrics are latency and reliability. The corresponding SLA satisfaction ratio is defined as the percentage of packets that are reliably transmitted within the specified delay in the SLA. Therefore, the SLA satisfaction ratio of slice within one slicing window is denoted as (5), where is the minimum data rate requirement of slice . represents the transmission reliability of packets of UE within slicing window and denotes the specified maximum packet delay.

Thus, we use the throughput, latency and reliability as the QoS metrics to evaluate the SLA satisfaction in the following.
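As a rough illustration of the two SLA satisfaction ratios, the sketch below follows the verbal definitions in this subsection. Capping the eMBB ratio at 1 and averaging the per-UE reliabilities of the C-V2X slice are assumptions, since Eq. (5) is not preserved in the extracted text; all names are hypothetical.

```python
def embb_sla_ratio(achieved_rate_bps: float, required_rate_bps: float) -> float:
    """Ratio of the achieved data rate to the rate required by the SLA."""
    # Assumption: over-achieving the required rate does not raise the ratio above 1.
    return min(achieved_rate_bps / required_rate_bps, 1.0)


def v2x_sla_ratio(per_ue_reliability) -> float:
    """Share of packets delivered reliably within the SLA delay bound.

    Assumption: the slice-level ratio is the mean of the per-UE reliabilities.
    """
    return sum(per_ue_reliability) / len(per_ue_reliability)


def sla_met(ratio: float, threshold: float) -> bool:
    """Check a slice's SLA satisfaction ratio against its threshold."""
    return ratio >= threshold
```

With the thresholds listed in Table 2, an eMBB slice achieving 4.5 Mbps against a 5 Mbps requirement has a ratio of 0.9 and misses its 95% threshold.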


2.4 Impact of High Mobility on Slicing

As mentioned before, the resource dedicated to one slice remains constant during a slicing window. Depending on the dynamic of the environment and realization of network slicing, the slicing window can be configured with different time granularity, e.g., milliseconds, seconds, minutes and hours [ISRRA2020WC]. Generally speaking, the longer slicing window represents a higher level of isolation and lower resource flexibility, while the shorter implies the opposite.

In a high mobility scenario, when a UE moves to a new cell, if the resources corresponding to its slice are sufficient, it can be served immediately, as enabled by the seamless handover enhancements of 5G new radio (NR) [3gpp.38.913]. However, if the resources of the slice are not sufficient, the SLA of the corresponding slice deteriorates significantly. Two intuitive remedies are to configure the slicing window as short as possible or to over-provision resources. However, both methods have their challenges. On the one hand, too short slicing windows cause frequent resource slicing reconfiguration, which in turn increases network complexity. Furthermore, limited hardware capacity and practical signalling procedures do not allow slicing windows to be configured at very short time scales, e.g., the millisecond level. For example, such reconfiguration involves radio resource control (RRC) procedures, which introduce about 80-100 ms of delay [3gpp.36.331]. On the other hand, over-provisioning results in low SE.

To guarantee the SLA requirement of high mobility scenarios and improve the SE, we first propose a hard and soft hybrid slicing framework in Section 3. Then, a two-step DL-based RAN resource slicing is given in Section 4.

3 Hybrid Slicing Framework and Problem Formulation

In this section, we first propose a hard and soft hybrid slicing framework. Then, we formulate the IS-RRA problem as a utility maximization problem.

Figure 2: The illustration of the hybrid slicing framework.

3.1 Hybrid Slicing Framework

The purely hard slicing strategy, where each slice can only occupy the resource allocated to it, guarantees full isolation among slices but results in low SE. In contrast, the soft slicing method, driven by flexible resource sharing, can maximize SE but offers limited isolation. Therefore, we propose a novel hybrid slicing framework that takes advantage of both hard and soft strategies. In particular, the soft decision, i.e. the common slice setting, is proposed to guarantee SLA and improve resource efficiency in high mobility scenarios and in the exploration phase of Section 4. The hybrid slicing framework can be understood from the following two aspects.

1) Common Slice Setting: Fig. 2 shows a hybrid scheme, where the resources are divided into two parts, i.e. resources dedicated to slices and resources of the common slice, corresponding to hard and soft strategies. For the hard part, the resource allocated to each slice can only be occupied by UEs of the corresponding slice. For the soft part, all UEs can utilize the resource of the common slice according to their demand and priority.

In a high mobility scenario, the necessary RBs in worst-case situations, e.g., when UEs move to the cell edge or new UEs move to the cell, are more than the average occupied RBs over the entire slicing window. The general solution of hard strategy is to allocate the required RBs of the worst-case to guarantee the SLA of the entire slicing window, which leads to a low SE. However, the reasonable resource configuration of the hybrid scheme enables both SLA satisfaction and resource efficiency with a small sacrifice of isolation. For example, resources required in worst-case are allocated to each slice as the hard part, which can realize SLA guarantee in most TTIs of slicing window. Moreover, each slice can occupy the RBs of the common slice when the hard part cannot satisfy its QoS metric. Therefore, the optimal resource configuration of the hybrid scheme, e.g., the right part of Fig. 2, can achieve the SLA guarantee and maximize the SE under a specific isolation constraint.

2) Periodically Adjusting Resource Slicing: As Fig. 2 shows, the network allocates resources for each slice at the beginning of each slicing window. Therefore, radio resources among slices can be periodically adjusted to adapt to a dynamic wireless environment. Generally speaking, the hybrid slicing framework goes through three stages, namely initialization, exploration and convergence, as shown in Fig. 2. In the initialization phase, because the wireless environment is unknown, the soft decision part of the network dominates to guarantee the slices’ SLA, and the required performance isolation cannot be achieved. Then, the network enters the exploration phase and continuously adjusts the inter-slice resource allocation to approach the SLA and isolation requirements based on the available performance feedback. As the exploration phase progresses, the network’s awareness of the wireless environment gradually increases and eventually, the slice resource allocation enters the convergence phase, where the slices’ SLA and isolation can be satisfied.

3.2 Problem Formulation

For a slice , the degree of isolation in slicing window is defined as follows


where is the allocated resources of slice and denotes resources that slice occupies from the common slice . The objective of the RAN slicing is to guarantee the SLA of diverse slices and simultaneously maximize the SE, which are defined as follows


where is the fluctuating traffic demand of slice , and function represents the complicated relationship between the SLA and traffic demand, allocated resources to slices and scheduling algorithms within slices.

The utility function of one slicing window is defined as follows


where is the utility coefficient of slice and corresponds to SE. is the indicator function to denote whether the SLA of slice is satisfied. Define the threshold of SLA of slice as , we have


The objective of a slice network is to maximize the long-term utility. A general approach is to maximize the average utility within a finite time period , e.g., an hour, a day, or a week [8931583]. Hence, the network slicing problem is formulated as follows.

s. t. (13)

where represents the threshold of required isolation.
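The quantities in this formulation can be made concrete with a small sketch. The displayed equations are not preserved in the extracted text, so the forms below are assumptions chosen to match the verbal definitions: the isolation degree is taken as the dedicated share of all resources a slice uses, and the window utility as a weighted sum of SLA indicators plus a weighted SE term. All names are hypothetical.

```python
def isolation_degree(dedicated_rbs: float, common_rbs_used: float) -> float:
    """Assumed form: share of a slice's used resources that is dedicated to it."""
    total = dedicated_rbs + common_rbs_used
    return 1.0 if total == 0 else dedicated_rbs / total


def sla_indicator(sla_ratio: float, sla_threshold: float) -> int:
    """1 if the slice's SLA satisfaction ratio reaches its threshold, else 0."""
    return 1 if sla_ratio >= sla_threshold else 0


def window_utility(sla_ratios, sla_thresholds, slice_coeffs,
                   se: float, se_coeff: float) -> float:
    """Assumed form: weighted SLA indicators plus a weighted SE term."""
    sla_term = sum(c * sla_indicator(r, t)
                   for r, t, c in zip(sla_ratios, sla_thresholds, slice_coeffs))
    return sla_term + se_coeff * se


def average_utility(window_utilities) -> float:
    """Objective: average utility over the slicing windows of a finite period."""
    return sum(window_utilities) / len(window_utilities)


def isolation_satisfied(degrees, isolation_threshold: float) -> bool:
    """Constraint (13): every slice must meet the required isolation threshold."""
    return all(d >= isolation_threshold for d in degrees)
```

Under these assumptions, a slice holding 18 dedicated RBs and borrowing 2 RBs from the common slice has an isolation degree of 0.9.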

The difficulty of problem is reflected in the following aspects.

  • Heterogeneous QoS: The heterogeneous QoS, i.e., throughput, packet delay, reliability, of slices highly complicates the problem.

  • Highly dynamic environment: Due to the varying network dynamics caused by high mobility, it is very tricky to satisfy the strict latency and ultra-reliability requirements of C-V2X UEs. Accordingly, such a situation makes it intractable to guarantee the SLA of the C-V2X slice.

  • Customized scheduling algorithms: Each slice can adopt a customized scheduling algorithm within the slice according to its specific QoS metric. The customized scheduling algorithms and volatile traffic demand make extremely complex. An analytical model of in practical networks is almost impossible to derive [mao2019learning].

  • Markovian characteristics of resource slicing: The inter-slice resource allocation of slicing systems exhibits Markovian characteristics, i.e. the allocation strategy affects not only the current SLAs and resource efficiency but also the future network state and utility, e.g., the queue of UEs and the delay of packets.

To address the challenges above, we propose a prediction-based IS-RRA through a two-step DL solution in the next section.

4 Deep Learning-based Proactive Solution

Figure 3: The overview of the proposed two-step deep learning-based solution.

A prediction-based slicing framework through a two-step DL scheme is proposed to solve the above problem (Problem ). As shown in Fig. 3, both the LSTM neural network and DRL are incorporated in the two-step DL solution.

First, LSTM networks are adopted to predict the network state, including the location and traffic of UEs at the slicing window level, according to historical data. Then we map the location to channel gain using a perfect radio map that records pathloss and shadowing at different locations [outdoorsurvey]. It is worth noting that the predicted channel gain and traffic demand are at the slicing window level, corresponding to a large time scale. Therefore, a precise slicing scheme cannot be derived directly from the predicted channel gain and traffic demand. Considering the Markovian characteristic of the problem , as well as the superiority of DRL in model-free problems, a DQN-based solution is proposed to give the final resource allocation of different slices based on the proposed hybrid slicing framework. Specifically, the predicted channel gain and traffic demand of the next slicing window and the real channel gain and traffic of the current slicing window are jointly input to the DQN as the state to output the resource slicing of the next slicing window. Finally, the DQN learns a slicing policy to achieve optimal utility.

4.1 LSTM Based Location and Traffic Prediction

Before the start of each slicing window, the location and traffic demand of each UE are predicted to enable seamless resource adjustment in a high mobility scenario. To this end, a central processor deployed in the cloud stores the historical data and predicts the location and traffic of each UE before the start of each slicing window.

The historical data of location and traffic of user are denoted as and , where represents the total traffic demand of user over slicing window and is the number of samples in the history. Considering one-step prediction, the prediction results are denoted as and , respectively.

Considering the effectiveness of LSTM on forecasting the continuous-time series, e.g., traffic prediction in [trafficpre2020icc] and user mobility prediction in [locapre2021TITS, vehiclepre2021TITS], we apply an LSTM-based neural network to predict traffic and location for each user with historical data records and .

Two LSTM networks with the same architecture referring to [trajecoty2018ICSP] are designed and trained for traffic and mobility prediction, respectively. The input of the LSTM is denoted as , i.e., or , and the output is , i.e., and , respectively. The relation between input and output can be written as


where and represent the nonlinear transformation and the parameters of LSTM networks. With labeled historical data where denotes the ground truth of user in slicing window , i.e., or , LSTM networks are trained by minimizing the mean-square-error (MSE) loss as follows,


After LSTM networks are trained well, the online prediction outputs the location and traffic of the next slicing window. Furthermore, the channel gain of each user can be mapped through location and a radio map. Then the predicted channel gain and traffic are sent to the DRL network to derive the final resource slicing.
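The data flow of this prediction step can be sketched without a deep learning library. The sketch below builds one-step-ahead training pairs from a history series, evaluates the MSE loss, and maps a predicted position to a channel gain through a grid-based radio map. The helper names, the grid representation of the radio map and the `persistence_predictor` stand-in (which simply repeats the last sample in place of a trained LSTM) are all assumptions for illustration.

```python
def make_windows(history, window_len: int):
    """Turn a series into (input window, next value) pairs for one-step prediction."""
    return [(history[end - window_len:end], history[end])
            for end in range(window_len, len(history))]


def mse_loss(predictions, targets) -> float:
    """Mean-square error between predictions and ground-truth labels."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def persistence_predictor(window):
    """Placeholder for a trained LSTM: predicts that the last value repeats."""
    return window[-1]


def lookup_channel_gain(radio_map, position, cell_size_m: float) -> float:
    """Map an (x, y) position in metres to the gain stored for its grid cell.

    radio_map is assumed to be a dict keyed by integer grid indices (ix, iy).
    """
    ix = int(position[0] // cell_size_m)
    iy = int(position[1] // cell_size_m)
    return radio_map[(ix, iy)]
```

A usage example: the traffic history `[1.0, 2.0, 4.0, 7.0, 11.0]` with a window length of 3 yields two training pairs, and the placeholder predictor scores an MSE of 12.5 on them; a trained LSTM would replace `persistence_predictor` and be fitted to reduce that loss.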

4.2 Deep Reinforcement Learning-based Slicing

As mentioned before, the predicted channel gain and traffic are utilized by DRL to give the refined policy to achieve the optimal utility over the entire slicing window. In this paper, an initial resource allocation of slices, e.g., combining NVS [kokku2011nvs] and the predicted channel gain and traffic information, is first given. Then the DRL agent dynamically adjusts the resource allocated to slices to guarantee the SLA and isolation of slices. To achieve efficient and intelligent slicing, the agent observes the environment, i.e., the dynamics of channel gain and traffic of contiguous slicing windows, and makes a decision according to the observed state at the start of each slicing window. The states, actions and reward of the DRL scheme are defined as follows.

State: The state at slicing window contains the channel and traffic information of last slicing window and the predicted ones of slicing window, and upper layer performance feedback information of slicing window . The DRL network is expected to learn the resource adjustment strategy by observing the channel and traffic variations of two adjacent slicing windows and the performance feedback of the last slicing window.

Considering the dynamic user distribution caused by user mobility, directly using the channel and traffic information of the users as the input to the network would lead to dynamic input dimensions, making the network intractable. Therefore, we introduce a hierarchical channel gain set where . Here, denotes the number of users in slice at slicing window whose channel gain is between and . Similarly, denotes the total traffic demand of the corresponding users. Based on these factors, the state is defined as a tuple as follows

where and . denotes the resource utilization ratio of slice over the slicing window .
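The fixed-size state encoding can be sketched as a histogram over channel gain levels. This is a minimal sketch under assumptions: the level boundaries are hypothetical (the values of the hierarchical channel gain set are not preserved in the extracted text), and users whose gain falls outside the range are clamped into the nearest bin.

```python
from bisect import bisect_right


def bin_users(gains_db, traffic_bits, levels_db):
    """Count users and sum their traffic demand per channel gain level.

    levels_db is an ascending list of boundaries; bin i covers
    [levels_db[i], levels_db[i + 1]). The output length is fixed by the
    number of levels, not by the number of users.
    """
    num_bins = len(levels_db) - 1
    counts = [0] * num_bins
    traffic = [0.0] * num_bins
    for gain, demand in zip(gains_db, traffic_bits):
        idx = bisect_right(levels_db, gain) - 1
        idx = min(max(idx, 0), num_bins - 1)  # assumption: clamp out-of-range gains
        counts[idx] += 1
        traffic[idx] += demand
    return counts, traffic
```

Because the histogram has one entry per gain level, the DQN input dimension stays constant while users enter and leave the cell.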

Action: The agent intelligently adjusts the resource allocation of slices by selecting an action according to the current state . The action for a slice is defined as decreasing, maintaining or increasing the allocated resource. It is worth noting that the counterpart of each action is the common slice. For example, slice releases surplus resources to the common slice, and slice requires more dedicated resources from the common slice at slicing window . The action set of one slice is defined as , where and is a positive integer. For example, if the action of slice is , where , we have . Therefore, the action of the agent at is the combination of the actions of all slices, and it is defined as follows
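A minimal sketch of such an action space is given below. The step size, the number of steps and the rule for rejecting infeasible adjustments are assumptions for illustration; the source only states that each slice decreases, keeps or increases its resources and that the exchange happens with the common slice.

```python
from itertools import product


def slice_actions(step_rbs: int, max_steps: int):
    """Per-slice adjustments in RBs: negative releases, positive requests."""
    return [k * step_rbs for k in range(-max_steps, max_steps + 1)]


def joint_actions(num_slices: int, step_rbs: int, max_steps: int):
    """Agent action set: every combination of the per-slice adjustments."""
    return list(product(slice_actions(step_rbs, max_steps), repeat=num_slices))


def apply_action(dedicated_rbs, common_rbs: int, action):
    """Move RBs between each slice's dedicated part and the common slice.

    Assumption: an adjustment that would make any pool negative is rejected
    and the previous allocation is kept.
    """
    new_dedicated = [d + a for d, a in zip(dedicated_rbs, action)]
    new_common = common_rbs - sum(action)
    if new_common < 0 or any(d < 0 for d in new_dedicated):
        return list(dedicated_rbs), common_rbs
    return new_dedicated, new_common
```

With two slices and three adjustments per slice, the agent chooses among nine joint actions, and the total number of RBs is conserved because every RB a slice gains is taken from the common slice.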

Reward: The reward of agent is defined as follows

where , and is a punishment constant. The operation normalizes SE by dividing it by the predefined maximum value . The positive part of the reward is designed according to the utility defined in Section 3.2. Moreover, we incorporate the isolation constraint (13) into the objective function with the reward shaping technique [griffith2013policy], so a penalty is added to the reward if the isolation constraint is violated. Besides, the required SLA satisfaction ratio of the C-V2X slice can be very high, e.g., 99.99%. With a linear reward, increasing the SLA satisfaction ratio of the C-V2X slice from 99% to 99.99% raises the reward by only 0.99%. As a result, the gradient of the DQN in the training phase would be very small, which results in a long convergence time. Therefore, an exponential reward function is utilized to train the network more efficiently as approaches 1.
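The motivation for the exponential shaping can be checked numerically. The exact reward expression is not preserved in the extracted text, so the shaping function `exp(k * (x - 1))`, the sharpness `k`, the unit weight on the normalized SE term and the way the penalty is applied are all assumptions for illustration.

```python
import math


def shaped_sla(sla_ratio: float, sharpness: float) -> float:
    """Hypothetical exponential shaping: equals 1 at ratio 1 and decays quickly below."""
    return math.exp(sharpness * (sla_ratio - 1.0))


def reward(sla_ratios, slice_coeffs, se: float, se_max: float,
           isolation_ok: bool, penalty: float, sharpness: float) -> float:
    """Weighted shaped SLA terms plus normalized SE, minus a penalty if
    the isolation constraint is violated (assumed form)."""
    positive = sum(c * shaped_sla(r, sharpness)
                   for r, c in zip(sla_ratios, slice_coeffs))
    positive += se / se_max
    return positive if isolation_ok else positive - penalty


linear_gain = 0.9999 - 0.99                                    # about 0.0099
shaped_gain = shaped_sla(0.9999, 100) - shaped_sla(0.99, 100)  # about 0.62
```

Under this assumed shaping with `k = 100`, moving from 99% to 99.99% changes the shaped term by roughly 0.62 instead of 0.0099, so the learning signal near the target ratio is much stronger.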

A DQN is applied to design and train the agent, where a neural network (NN) is used to approximate the action-value function, and represents the parameters of the NN. The state is input to the DQN, and the network outputs the predicted Q values of each action. With experience replay and a quasi-static target network, the DQN is trained by minimizing the error between the predicted Q values and the target Q values as follows,


where is the batch size. The target value is


where represents the parameters of the target network and is the discount factor.
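The training objective can be sketched with NumPy using the standard DQN formulation: a bootstrapped target computed from the target network and a mean squared error over the mini-batch. The array layout and function names are assumptions, and no terminal-state handling is included because slicing windows are treated here as a continuing task.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, gamma):
    """Target value: reward plus discounted maximum Q value of the target network."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    return rewards + gamma * np.max(next_q_target, axis=1)

def dqn_loss(q_pred, actions, targets):
    """Mean squared error between Q(s, a) of the taken actions and the targets."""
    q_pred = np.asarray(q_pred, dtype=float)
    chosen = q_pred[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))

# Toy batch of size 2 with two actions per state.
targets = dqn_targets(rewards=[1.0, 0.0],
                      next_q_target=[[0.5, 2.0], [1.0, 0.0]],
                      gamma=0.5)
loss = dqn_loss(q_pred=[[1.0, 0.0], [0.0, 0.5]], actions=[0, 1], targets=targets)
```

In practice the target-network outputs are held fixed between periodic parameter copies, which is what keeps the regression target quasi-static.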

Parameter | eMBB slice | C-V2X slice
Traffic model | OAI data | Periodic process
Packet size | 6k bits | 256 bits
Arrival rate | 100 packets/s | 100 packets/s
SLA | 95% {5 Mbps} | {5 ms, 99.99%}
Reward factor | 2 | 3
 | 90% | 90%
Avg. number of UEs | 20 | 50
Table 2: Slices Parameters

5 Numerical Results

5.1 Experiments Setup

We consider a road topology consisting of streets, intersections, traffic lights and routes, as shown in Fig. 1. There are nine road segments, indexed A to I, and six intersections. The lengths of roads A, {C, D, F, G}, {B, E, H} and I are 0.5 km, 1.5 km, 3 km and 1 km, respectively. The width of the road is 7 m. We consider the following motion scenario: eMBB users move randomly at a speed of 1-2 m/s, and V2X vehicles move from the point "Start" to the point "End" by randomly selecting one of three routes: {A-B-D-G-I}, {A-C-E-G-I} and {A-C-F-H-I}. The speed of vehicles ranges from 40 km/h to 70 km/h, and the acceleration and deceleration are 2 m/s² and 4 m/s², respectively. Vehicles ignore the traffic light when they turn right, while they randomly stop for 160 s when they go straight or turn left at a red light. The speed is 20 km/h when vehicles cross an intersection. The arrival rate of vehicles at the point "Start" is 1 vehicle/s.

The transmission power of each BS is dBm, and interference among BSs is not considered in this work. RBs with each bandwidth kHz are considered as total bandwidth resources for a BS. The pathloss model is consistent with our previous work [myWCL2021]. Two slices corresponding to two types of services, i.e., eMBB and C-V2X services, are considered in the simulation. We assume that the coverage of each BS is 500 m and omit the handover delay. To achieve millisecond-level delay, the earliest deadline first (EDF) algorithm [kargahi2005non] is utilized as the scheduling algorithm within the C-V2X slice and is performed in each ms. The eMBB slice adopts the classical round-robin (RR) algorithm. We select one BS around the intersection, i.e., BS2 in Fig. 1, to evaluate the proposed algorithm. The detailed slice parameters for one BS, chosen to simulate a near-full-load scenario, are summarized in Table 2. The values of , and are , and , respectively. Considering the more stringent SLA requirements of the C-V2X slice, we set its reward factor to be larger than that of the eMBB slice, as shown in Table 2. The network architectures follow the LSTM in [trajecoty2018ICSP] and the DQN in [DLMA]. The hierarchical channel gain set is dB. The resource allocated to the common slice is 40 RBs in the initial phase, and the action set for one slice is RB. Three baseline algorithms are compared in our experiments:

  • Optimal a Priori (OP): Given a priori knowledge of traffic and SINR distributions of UEs, the optimal resource slicing is derived by exhaustive search.

  • Hard-LSTM-A2C [DRLLSTM]: This algorithm uses a purely hard slicing framework that incorporates the LSTM into A2C to track user mobility and improve the system utility.

  • Hard-DQN [sun2019mix]: This algorithm uses a purely hard slicing framework in which a centralized DQN refines the slice ratio of the whole network, and the global slice ratio is then mapped to each BS according to users’ SINR and data rate requests. To eliminate the effect of imperfect mapping algorithms, in this paper we deploy the Hard-DQN algorithm at the single-BS level.

To compare the performance of the different algorithms on a common basis, the rewards of all compared algorithms are calculated with the reward function designed in this paper.
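The EDF scheduling used within the C-V2X slice in the setup above can be sketched as a slot-based priority queue. This is a simplified illustration under stated assumptions: capacity is abstracted as a number of packets per slot instead of RBs, and the packet tuple layout and function name are hypothetical.

```python
import heapq

def edf_schedule(packets, capacity_per_slot, n_slots):
    """Slot-based earliest-deadline-first sketch.

    packets: list of (arrival_slot, deadline_slot, packet_id); a packet may be
             served in any slot from its arrival up to and including its deadline.
    capacity_per_slot: packets served per slot (hypothetical abstraction of RBs).
    Returns (served, unserved): served maps packet_id -> slot, unserved lists
    the ids that expired or were still waiting when the simulation ended.
    """
    pending = sorted(packets)              # ordered by arrival slot
    queue, served, unserved = [], {}, []
    i = 0
    for slot in range(n_slots):
        # Admit every packet that has arrived by this slot.
        while i < len(pending) and pending[i][0] <= slot:
            _, deadline, pid = pending[i]
            heapq.heappush(queue, (deadline, pid))
            i += 1
        budget = capacity_per_slot
        while queue and budget > 0:
            deadline, pid = heapq.heappop(queue)   # earliest deadline first
            if deadline < slot:
                unserved.append(pid)               # deadline already missed
                continue
            served[pid] = slot
            budget -= 1
    unserved.extend(pid for _, pid in queue)
    unserved.extend(p[2] for p in pending[i:])
    return served, unserved

served, unserved = edf_schedule(
    [(0, 1, "a"), (0, 0, "b"), (1, 1, "c")], capacity_per_slot=1, n_slots=3)
```

In the toy run, packet "b" is served before "a" because its deadline is earlier, and "c" misses its deadline because only one packet fits in each slot.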

(a) Location Prediction Error of C-V2X UEs
(b) Traffic Prediction of eMBB UEs
Figure 4: The prediction results of location and traffic.

5.2 Prediction Results

The data set contains historical location and traffic data of the two slices: 6000 trajectories of V2X UEs and traffic data of corresponding length for eMBB UEs. After the LSTM networks are well trained, the online test results are shown in Fig. 4.

For the location prediction, Fig. 4(a) shows the results of one vehicle UE when it is within the coverage of BS2 (see Fig. 1). We can see that the location prediction error is very small, i.e., around 1 m, except the and slicing windows. The data analysis reveals that on slicing windows, the vehicle is at the intersection. The direction randomness, e.g., the vehicle stops for the red light or turns right, and the velocity variation increase the prediction error. However, the channel gain error caused by a 5 m location error is negligible under a perfect radio map.
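The claim that a location error of a few metres has little effect on the channel gain can be checked with a simple calculation. The sketch below uses a generic log-distance path-loss model as a stand-in for the radio map; the exponent, the reference loss, and the 200 m distance are assumptions and do not come from the pathloss model used in this paper.

```python
import math

def path_gain_db(distance_m, exponent=3.5, ref_loss_db=40.0):
    """Log-distance channel gain in dB (hypothetical model, reference loss at 1 m)."""
    return -(ref_loss_db + 10.0 * exponent * math.log10(distance_m))

def gain_error_db(true_distance_m, location_error_m, **kwargs):
    """Gain error when the predicted position is off by location_error_m
    along the line between the UE and the BS (the worst-case direction)."""
    return abs(path_gain_db(true_distance_m + location_error_m, **kwargs)
               - path_gain_db(true_distance_m, **kwargs))

error_5m = gain_error_db(200.0, 5.0)   # about 0.38 dB at 200 m from the BS
error_1m = gain_error_db(200.0, 1.0)   # below 0.1 dB
```

Under these assumptions the error stays well below 1 dB at typical distances; it grows when the UE is very close to the BS, where the same offset is a larger fraction of the distance.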

For the traffic prediction at the slicing-window level, the prediction of periodic URLLC traffic is quite easy. Therefore, we give the prediction result of eMBB traffic in Fig. 4(b), where a real traffic dataset of high-dimensional live streaming generated by the software-defined radio platform OpenAirInterface [OAIweb] is utilized. This traffic dataset is used for eMBB UEs in all experiments. It can be observed that the error tends to increase when the traffic demand changes dramatically, e.g., the error from slicing window to slicing window. On the whole, large-scale traffic predictions are very accurate, and the normalized average prediction error is 0.07.
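One common way to compute a normalized average prediction error is the mean absolute error divided by the mean actual demand. The exact normalization behind the reported value of 0.07 is not stated here, so the definition below is an assumption, and the sample values are made up for the example.

```python
import numpy as np

def normalized_mean_abs_error(actual, predicted):
    """Mean absolute error divided by the mean actual demand (one possible normalization)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual)) / np.mean(actual))

# Toy per-window traffic demand (arbitrary units) and its prediction.
example_error = normalized_mean_abs_error([10.0, 20.0, 30.0], [11.0, 18.0, 30.0])
```

With this definition, a value of 0.05 means that the prediction is off by 5% of the average demand per slicing window.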

Figure 5: The convergence process of two DRL based algorithms.
(a) The proposed algorithm
(b) Hard-LSTM-A2C
(c) Hard-DQN
Figure 6: The SLA satisfaction ratio of convergence process for the proposed DRL algorithm, the Hard-LSTM-A2C algorithm and Hard-DQN algorithm.

5.3 The Analysis of DRL Convergence Process

Fig. 5 illustrates the convergence process of the three DRL-based algorithms in the experiments: the proposed DRL algorithm, the Hard-LSTM-A2C algorithm and the Hard-DQN algorithm. First, the rewards of the three DRL algorithms are low initially and increase with training until convergence. Second, the proposed algorithm is superior to the Hard-LSTM-A2C and Hard-DQN algorithms during training in two respects, discussed below.

On the one hand, the reward of the proposed algorithm is significantly greater than that of the Hard-LSTM-A2C and Hard-DQN algorithms. This is because DRL algorithms try some actions randomly in the exploration phase to improve the policy, and the bad actions taken during random exploration by the Hard-LSTM-A2C and Hard-DQN algorithms deteriorate the SLA satisfaction ratio significantly. In contrast, the common slice setting and the limited action space of the proposed algorithm enable a much better SLA performance even in the exploration phase. For example, assume that the optimal resource allocation in the hard scheme is RBs for the eMBB slice and C-V2X slice, respectively. In the exploration phase, the Hard-LSTM-A2C and Hard-DQN algorithms may try a bad action such as , which dramatically deteriorates the slices’ SLA and the reward. For the proposed algorithm, the initial action is based on a baseline algorithm and a bigger common slice setting, e.g., for the eMBB slice, C-V2X slice and common slice. Its exploration phase is a fine-tuning of the current allocation in each slicing window, such as increasing or decreasing the allocation of some slice. Thus, a bad action only slightly reduces the reward. Moreover, the bigger common slice setting of the initial phase greatly reduces the probability of an extreme allocation, e.g., . The detailed SLA satisfaction ratio and isolation degree are shown in Fig. 6.

On the other hand, the proposed DRL algorithm converges faster than the Hard-LSTM-A2C and Hard-DQN algorithms. This is because the predicted channel gain and traffic demand serve as part of the state information. Moreover, the common slice setting increases the overall reward level. These two facts accelerate the convergence of the proposed algorithm.

Fig. 6 shows the SLA satisfaction ratio of the three DRL algorithms in the early stage of the training process. As Fig. 6(a) shows, the SLA of the URLLC slice under the proposed algorithm is always guaranteed. There are two reasons. First, the packet size of the URLLC slice is much smaller than that of the eMBB slice, so it requires fewer resources than the eMBB slice. Second, the shared resource of the common slice prevents extreme scenarios, e.g., most RBs being allocated to the eMBB slice. Similarly, the SLA of the eMBB slice is guaranteed after about 200 slicing windows. In comparison, the URLLC slice’s SLA under the Hard-LSTM-A2C algorithm is guaranteed only after 500 slicing windows, and the SLA satisfaction ratio of the URLLC slice under the Hard-DQN algorithm keeps fluctuating during the first 1000 slicing windows. For the eMBB slice, neither of the two baseline algorithms achieves the required SLA satisfaction ratio in the first 1000 slicing windows. The reasons for the superiority of the proposed scheme over the other two algorithms have been elaborated in the discussion of Fig. 5. That is, the hybrid slicing setup prevents extremely harsh exploration actions and improves the SLA guarantee in the training phase. In addition, the SLA satisfaction ratio of the Hard-DQN algorithm is lower and more volatile than that of the Hard-LSTM-A2C algorithm. This is because the latter utilizes the LSTM structure and can better track users’ mobility.

Figure 7: The rewards of four algorithms under 50 slicing windows after all algorithms are converged.

We also give the isolation degree of the proposed DRL algorithm in Fig. 6(a). The isolation degree of the Hard-LSTM-A2C and Hard-DQN algorithms is always 1, so we omit it in Fig. 6(b) and Fig. 6(c). Naturally, slices’ isolation degree of the proposed DRL algorithm cannot approach the required thresholds, i.e., , in the first 1000 slicing windows. However, it is pointless to discuss isolation when the slices’ SLA cannot be guaranteed. Furthermore, the isolation degree of the proposed algorithm achieves the required thresholds after the DRL network converges.

/ /
0.972 0.934 0.916 0.872
Table 3: The performance metric of four algorithms after all algorithms are converged.

5.4 Performance Comparison

Fig. 7 shows the achievable reward of the four algorithms from slicing windows to slicing windows, where the three DRL-based algorithms have converged. First, the proposed DRL algorithm achieves approximately optimal performance, and its reward is slightly larger than those of the Hard-LSTM-A2C and Hard-DQN algorithms, since the common slice setting brings a higher SE. Second, the Hard-LSTM-A2C algorithm slightly outperforms the Hard-DQN algorithm, since the LSTM structure utilizes the historical state information.

Table 3 gives the average performance of the four algorithms from slicing windows to slicing windows in terms of SLA satisfaction ratio, resource utilization ratio, isolation degree and normalized SE. and represent the SLA satisfaction ratio of the eMBB slice and C-V2X slice, respectively. Similarly, , , and denote the resource utilization ratio of the eMBB slice, C-V2X slice and common slice. and represent the isolation degree of the two slices. We can see that both the proposed DRL algorithm and Hard-LSTM-A2C achieve near-optimal performance in terms of SLA satisfaction ratio, isolation and SE. The proposed prediction-based hybrid slicing framework exhibits better performance on SLA satisfaction and SE.

6 Conclusion

In this paper, we first propose a soft and hard hybrid slicing framework. Based on this, we propose a joint LSTM-based network state prediction and DQN-based network slicing adjustment strategy to adapt to user mobility. In the proposed solution, LSTM networks are trained to predict the traffic demand and location of each user. Moreover, channel gain is mapped by location and a radio map. Then, the predicted channel gain and traffic demand are input to the DQN to output the precise slicing adjustment strategy. The numerical results verify the effectiveness of the proposed DL-based hybrid slicing framework: 1) the hybrid slicing framework can not only balance the trade-off between isolation and SE of network slices but also significantly reduce the SLA violation in the DRL exploration phase; 2) it achieves near-optimal performance in terms of SLA satisfaction, isolation degree and SE.


This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61871262, 62071284, and 61901251, the National Key R&D Program of China grants 2017YFE0121400 and 2019YFE0196600, the Innovation Program of Shanghai Municipal Science and Technology Commission grant 20JC1416400, Pudong New Area Science & Technology Development Fund, Key-Area Research and Development Program of Guangdong Province grant 2020B0101130012, Foshan Science and Technology Innovation Team Project grant FS0AA-KJ919-4402-0060, and research funds from Shanghai Institute for Advanced Communication and Data Science (SICS).