
A Hard and Soft Hybrid Slicing Framework for Service Level Agreement Guarantee via Deep Reinforcement Learning

by   Heng Zhang, et al.

Network slicing is a critical driver for guaranteeing the diverse service level agreements (SLAs) in 5G and future networks. Recently, deep reinforcement learning (DRL) has been widely utilized for resource allocation in network slicing. However, existing works do not consider the performance loss associated with the initial exploration phase of DRL. This paper proposes a new performance-guaranteed slicing strategy with a soft and hard hybrid slicing setting. Specifically, a common slice setting is applied to guarantee the slices' SLA while the neural network is being trained. Moreover, the resources of the common slice are gradually and precisely redistributed to the slices as DRL training proceeds, until the agent converges. Experimental results confirm the effectiveness of the proposed slicing framework: the slices' SLA can be guaranteed during the training phase, and the proposed algorithm achieves near-optimal performance in terms of the SLA satisfaction ratio, isolation degree and spectrum maximization after convergence.



I Introduction

With the emergence of 5G telecommunication technology, cellular networks are envisioned to serve a wide variety of innovative vertical applications, such as Cellular Vehicle-to-Everything (C-V2X) and augmented/virtual reality (AR/VR), with heterogeneous performance requirements including high data rates, ultra-low latency and high reliability [foukas2017network]. Network slicing is recognized as a promising technique to guarantee differentiated quality of service (QoS) and service level agreements (SLAs), since it enables multiple logical networks, each corresponding to a different network service, to run on top of a common physical network infrastructure. The slices can thus be customized to satisfy various SLAs through virtualization and isolation techniques [8685766].

From the perspective of radio resource management, the fundamental challenge of network slicing lies in the trade-off between isolation and resource efficiency. On the one hand, to avoid interference between slices, the slicing system aims to ensure complete isolation between network slices. On the other hand, the inherent scarcity of radio spectrum requires all slices to share the limited radio resources on demand to ensure efficient utilization. Therefore, inter-slice radio resource allocation (IS-RRA) in the radio access network (RAN) remains an open technical challenge [ISRRA2020WC].

To address the above problem, deep reinforcement learning (DRL) has been widely applied owing to its ability to handle model-free problems [rli2018access, chen2019DRL, jie2021tcom, sun2019mix, myWCL2021]. [rli2018access] investigates the application of DRL to radio resource slicing and priority-based core network slicing, and the results exhibit the advantage of DRL in solving model-free resource allocation problems. Building on [rli2018access], [chen2019DRL] proposes a DRL scheme with faster convergence by integrating discrete normalized advantage functions (DNAF) and the deterministic policy gradient descent (DPGD) algorithm. The authors of [jie2021tcom] propose a hierarchical control strategy to guarantee the long-term QoS of services and spectrum efficiency (SE), where DQN and DDPG networks are applied to solve the long-term and short-term problems, respectively. [sun2019mix] and [myWCL2021] apply DRL methods to heterogeneous network (HetNet) scenarios to solve joint user association and network slicing problems.

However, existing works on IS-RRA focus on purely hard isolation schemes, where each slice is allocated dedicated resources, and the performance loss that such schemes incur during action exploration or network fine-tuning may be unbearable. To minimize the performance loss during exploration phases, we propose a hard and soft hybrid slicing framework by introducing a common slice setting under a specific isolation degree constraint, in which UEs of all slices can utilize the resources of the common slice. In particular, the amount of resources in the common slice can be large in the initial training phase to guarantee the slices’ SLA. As training proceeds, the resources of the common slice are gradually adjusted until the DRL network converges to an optimal state.

Overall, this paper proposes a hard and soft hybrid slicing framework to guarantee the slices’ SLA and maximize the SE as much as possible under a specific isolation constraint. Compared with purely hard DRL-based algorithms, the proposed scheme is capable of guaranteeing the slices’ SLA at all times, even in the initial training phase. Moreover, it achieves near-optimal performance in terms of SLA satisfaction, SE and isolation.

II System Model

II-A Communication Model

We consider a typical OFDMA-based downlink cellular network consisting of a single base station (BS) that serves multiple users (UEs). The network hosts a set of network slices, and each UE belongs to one slice. In the time domain, radio resources are divided into Transmission Time Intervals (TTIs), and the bandwidth is partitioned into resource blocks (RBs). The duration of a slicing window, during which the resources allocated to each slice remain constant, is called an epoch, and each epoch contains multiple consecutive TTIs. Assuming equal power allocation, the SINR of a UE at a given time is determined by the transmit power of the BS, the channel gain of the UE, and the power of the additive white Gaussian noise.

For traditional traffic with a large packet size, e.g., eMBB traffic, the achievable rate of a UE can be directly estimated according to Shannon’s capacity. For short-packet transmission, such as uRLLC and MTC services, the data rate falls in the finite blocklength channel coding regime [polyanskiy2010channel]. Therefore, the data rate of a UE is modeled as in (1),


where the rate depends on the time duration of one TTI, the RBs allocated to the UE within each TTI, the transmission error probability, the inverse of the Gaussian Q-function, the length of the codeword block in symbols, and the channel dispersion.
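The gap between the two rate regimes can be illustrated with a minimal sketch. It assumes the widely used normal approximation for the finite blocklength regime and the standard AWGN channel dispersion V = 1 - 1/(1 + SNR)^2; the function names, the per-channel-use units, and the omission of the TTI duration and RB count are simplifications made here for illustration, not the paper's exact expression (1).

```python
import math
from statistics import NormalDist


def shannon_rate(snr: float) -> float:
    """Shannon capacity in bits per channel use for a linear-scale SNR."""
    return math.log2(1.0 + snr)


def finite_blocklength_rate(snr: float, blocklength: int, error_prob: float) -> float:
    """Normal approximation to the achievable rate at finite blocklength.

    Assumption: AWGN channel dispersion V = 1 - 1 / (1 + snr)^2.
    """
    dispersion = 1.0 - 1.0 / (1.0 + snr) ** 2
    # Q^{-1}(eps) equals the standard normal quantile at probability 1 - eps.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    penalty = math.sqrt(dispersion / blocklength) * q_inv * math.log2(math.e)
    # The approximation can go negative at very low SNR; clamp it at zero.
    return max(0.0, shannon_rate(snr) - penalty)


if __name__ == "__main__":
    snr = 10.0  # linear scale, i.e. 10 dB
    print("Shannon rate:", shannon_rate(snr))
    print("Short-packet rate:", finite_blocklength_rate(snr, 200, 1e-5))
```

The penalty term shrinks as the blocklength grows, which is why large eMBB packets can be modeled with the Shannon rate while short uRLLC packets cannot.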

II-B SLA Model

Generally speaking, classical QoS metrics for slices’ SLA include throughput, packet latency and transmission reliability. Throughput can be easily derived by aggregating the amount of data that is successfully transmitted over time. For the packet delay, a detailed queuing model of UEs’ packets needs to be specified.

In this paper, the arrival distribution of traffic is characterised by the pattern of the service, and there is no prior knowledge of the volatile demand. The arriving packets of UEs are cached in the BS’s buffer and are delivered according to the first-come-first-serve (FCFS) policy. Each UE is assumed to have one corresponding data queue at the BS. The packet delay consists of two parts, i.e., queuing time and transmission time, where the former is influenced by the scheduling policy and the latter is determined by the instantaneous data rate.

From the perspective of the network, a packet is dropped if its delay exceeds the predefined maximum packet latency [netw2020mei]. The reliability is determined by the percentage of packets that are successfully delivered. Therefore, the transmission reliability of a UE is expressed as


where the delay of each packet of the UE is compared against the maximum packet delay of UEs in its slice.
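The reliability definition described above can be sketched as the fraction of packets delivered within the delay budget. The function name, the millisecond units, and the convention for an empty queue are assumptions made here for illustration.

```python
from typing import Sequence


def transmission_reliability(packet_delays_ms: Sequence[float], max_delay_ms: float) -> float:
    """Fraction of packets whose delay does not exceed the maximum packet delay.

    Packets exceeding the budget are treated as dropped, as in the text.
    Assumption: a UE with no packets in the window counts as fully reliable.
    """
    if not packet_delays_ms:
        return 1.0
    delivered = sum(1 for delay in packet_delays_ms if delay <= max_delay_ms)
    return delivered / len(packet_delays_ms)


if __name__ == "__main__":
    # Four packets, one of which misses a 5 ms budget.
    print(transmission_reliability([1.0, 2.0, 6.0, 4.0], max_delay_ms=5.0))
```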

For the throughput, the SLA satisfaction ratio of a slice within one epoch is defined as follows


where the rate threshold is the minimum data rate requirement.

For the latency and reliability, given the maximum packet delay, the SLA satisfaction ratio can be represented by the reliability. Therefore we have


where the reliability term represents the transmission reliability of the packets of a UE under the maximum delay constraint. Thus, we use the throughput, latency and reliability as the QoS metrics to evaluate the SLA satisfaction in the following.

III Hybrid Slicing Framework and Problem Formulation

Figure 1: The illustration of the hybrid slicing framework.

III-A Hybrid Slicing Framework

The purely hard slicing strategy can guarantee full isolation among slices, but it copes poorly with a dynamic environment, which results in SLA deterioration and low resource efficiency. In contrast, the soft slicing method can maximize resource efficiency but offers limited isolation. Therefore, we propose a novel hybrid slicing framework that takes advantage of both hard and soft strategies. In particular, the soft decision, i.e., the common slice setting, is utilized to guarantee the SLA and improve resource efficiency in the exploration phase. The hybrid slicing framework can be understood from the following two aspects.

1) Common Slice Setting: In purely hard schemes, the resources dedicated to a slice need to be large enough, or even over-provisioned, to fully guarantee the SLAs, even in the worst-case scenario of the entire slicing window. Fig. 1 shows a hybrid scheme, where the resources are divided into two parts, i.e., resources dedicated to slices and resources of the common slice, corresponding to the hard and soft strategies, respectively. All UEs can utilize the resources of the common slice according to their demand and priority. A reasonable resource configuration of the hybrid scheme enables both SLA satisfaction and resource efficiency with a small sacrifice of isolation. For example, the dedicated resources can realize the SLA guarantee in most cases, while the resources of the common slice are shared to guarantee the slices’ performance in worst cases, so that both SLA satisfaction and resource efficiency can be maximized under a specific isolation constraint.

2) Periodically Adjusting Resource Slicing: As Fig. 1 shows, radio resources can be periodically reallocated to each slice to adapt to a dynamic wireless environment. For example, the resources of the common slice can be large in the initial phase to guarantee the slices’ SLA. As the agent's awareness of the environment increases, the slice configuration converges to a precise scheme that meets the isolation requirement, which corresponds to the last slicing window in Fig. 1.

III-B Problem Formulation

For a slice, the degree of isolation in an epoch is defined as follows


where the definition involves the resources allocated to the slice and the resources that the slice occupies from the common slice. The objective of RAN slicing is to guarantee the SLA of diverse slices and simultaneously maximize the SE, which are defined as follows


where the SLA depends on the fluctuating traffic demand of the slice through a function that represents the complicated relationship between the SLA and the traffic demand, the resources allocated to slices and the scheduling algorithms within slices.

The utility function of one epoch is defined as follows


where the two weights are utility coefficients, and an indicator function denotes whether the SLA of a slice is satisfied.

The objective of a sliced network is to maximize the long-term utility. A general method is to maximize the average utility within a finite time period, e.g., an hour, a day, or a week [8931583]. Hence, the network slicing problem is formulated as follows.

s. t. (11)

where the constraint uses the threshold of required isolation.
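The isolation constraint can be illustrated with a small sketch. The exact expression is not reproduced here, so the ratio below is an assumption: the share of a slice's resources that is dedicated rather than borrowed from the common slice. This form is at least consistent with the observation later in the paper that a purely hard scheme always has an isolation degree of 1. All names are hypothetical.

```python
def isolation_degree(dedicated_rbs: int, common_rbs_used: int) -> float:
    """Assumed isolation degree: dedicated RBs over all RBs the slice uses.

    Assumption: a slice using no resources at all is treated as fully isolated.
    """
    total = dedicated_rbs + common_rbs_used
    if total == 0:
        return 1.0
    return dedicated_rbs / total


def meets_isolation(dedicated_rbs: int, common_rbs_used: int, threshold: float) -> bool:
    """Check the isolation constraint against a required threshold in [0, 1]."""
    return isolation_degree(dedicated_rbs, common_rbs_used) >= threshold


if __name__ == "__main__":
    # A slice with 16 dedicated RBs that borrows 4 RBs from the common slice.
    print(isolation_degree(16, 4), meets_isolation(16, 4, threshold=0.8))
```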

The difficulty of the problem is reflected in two aspects. First, the heterogeneous QoS of slices, i.e., throughput, packet delay and reliability, highly complicates the problem. Second, customized scheduling algorithms within slices and volatile traffic demand make the SLA function extremely complex, and an analytical model of this function in practical networks is almost impossible to derive [myWCL2021]. Moreover, resource allocation in slicing systems exhibits a Markovian characteristic, i.e., the allocation strategy affects not only the current SLAs and resource efficiency but also future network states and utility, e.g., the queues of UEs and the delay of packets. Therefore, a DRL-based solution is designed in the following section.

IV DRL-Based Solution

IV-A Design of the DRL Scheme

As mentioned before, the resource slicing problem can be solved by the DRL technique. In this paper, an initial slice resource allocation, e.g., NVS [kokku2011nvs], is first given. Then the DRL agent dynamically adjusts the resources allocated to slices to guarantee the SLA and isolation of slices. To achieve efficient and intelligent slicing, the agent observes the environment, e.g., performance feedback and resource utilization, and makes a decision according to the observed state at the start of each epoch. The states, actions and reward of the DRL scheme are defined as follows.

State: The state is defined as a tuple as follows


where the resource utilization of a slice is defined as the ratio of used resources to allocated resources.

Action: The agent intelligently adjusts the resource allocation of slices by selecting an action according to the current state. The action for a slice is defined as decreasing, maintaining or increasing its allocated resources. It is worth noting that the counterpart of every action is the common slice. For example, in the same epoch one slice may offload surplus resources to the common slice while another slice requires more dedicated resources from the common slice. The action set of one slice is a finite set defined through a positive integer parameter. Therefore, the action of the agent at an epoch is defined as follows


Reward: The reward of agent is defined as follows

where the reward includes a punishment constant, and SE is normalized by dividing it by a predefined maximum value. The exponential reward function is used to train the network more efficiently as its argument approaches 1.
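The action mechanism described above, where every slice trades resources only with the common slice, can be sketched as follows. The signed-integer encoding of actions, the clipping of infeasible requests, and the order in which releases and requests are processed are assumptions made here for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def apply_actions(dedicated: List[int], common: int, deltas: List[int]) -> Tuple[List[int], int]:
    """Move RBs between each slice and the common slice.

    A positive delta requests RBs from the common slice, a negative delta
    returns RBs to it, and zero keeps the allocation unchanged.
    Assumption: infeasible requests are clipped instead of rejected.
    """
    new_dedicated = list(dedicated)
    # Process releases first so that returned RBs can serve requests.
    for i, delta in enumerate(deltas):
        if delta < 0:
            released = min(-delta, new_dedicated[i])
            new_dedicated[i] -= released
            common += released
    for i, delta in enumerate(deltas):
        if delta > 0:
            granted = min(delta, common)
            new_dedicated[i] += granted
            common -= granted
    return new_dedicated, common


if __name__ == "__main__":
    # Slice 0 takes 2 RBs from the common slice while slice 1 returns 3 RBs.
    print(apply_actions([10, 10], 30, [2, -3]))
```

Because RBs only move between a slice and the common slice, the total number of RBs is conserved by construction.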

Figure 2: The convergence process of two DRL based algorithms.

IV-B Training of Agents

A deep Q network (DQN) is applied to design and train the agent, where a neural network (NN) with trainable parameters is used to approximate the action-value function. The state is input to the DQN, and the network outputs the predicted Q values of each action. With experience replay and a quasi-static target network, the DQN is trained by minimizing the error between the predicted Q values and the target Q values as follows,


where the error is averaged over the batch size. The target value is


where the target value is computed with the parameters of the target network and the discount factor.
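The training objective can be sketched with the standard DQN formulation: the target is the reward plus the discounted maximum Q value predicted by the target network for the next state, and the loss is the mean squared error over a batch. The plain-list representation, the function names, and the absence of a terminal-state flag are simplifications assumed here; a practical implementation would use a tensor library.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float], gamma: float) -> float:
    """Target value: reward plus discounted best target-network Q value."""
    return reward + gamma * max(next_q_values)


def dqn_loss(
    q_taken: Sequence[float],
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    gamma: float,
) -> float:
    """Mean squared error between targets and predicted Q values of taken actions.

    q_taken holds the online network's Q values for the actions actually taken;
    next_q_target holds the target network's Q values for each next state.
    """
    targets = [td_target(r, nq, gamma) for r, nq in zip(rewards, next_q_target)]
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(q_taken)


if __name__ == "__main__":
    loss = dqn_loss(
        q_taken=[1.0, 2.0],
        rewards=[1.0, 0.0],
        next_q_target=[[0.0, 2.0], [1.0, 0.5]],
        gamma=0.5,
    )
    print(loss)
```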

| Parameter | eMBB | uRLLC |
|---|---|---|
| Traffic Model | Poisson process | Periodic process |
| Packet Size | 55k bits | 256 bits |
| Arrival Rate | 100 packets/s | 100 packets/s |
| SLA | 95% {5 Mbps} | 99% {5 ms and 99.99%} |
| | 2 | 3 |
| | 80% | 90% |
| Number of UEs | 20 | 50 |
| Schedule | Proportional Fairness | Earliest Deadline First |

Table I: Slice Parameters

V Numerical Results

V-A Experiment Setup

Figure 3: The SLA ratio during the convergence process for (a) the proposed algorithm and (b) the Hard-DQN algorithm.

One BS is located at the center of the simulated area, and the total bandwidth resource consists of RBs of equal bandwidth. The pathloss model is consistent with [myWCL2021]. Two slices corresponding to two types of services, i.e., eMBB and uRLLC, are considered in the simulation, and the detailed slice parameters are summarized in Table I. The network architecture follows the DQN in [DLMA]. The resource allocated to the common slice is 30 RBs in the initial phase. Three baseline algorithms are compared in our experiments:

  • Optimal a Priori (OP): Given a priori knowledge of traffic and SINR distributions of UEs, the optimal resource slicing is derived by exhaustive search.

  • Hard-DQN [sun2019mix]: In this algorithm, a purely hard slicing framework using DQN is utilized.

  • NVS [kokku2011nvs]: NVS considers a static weight-based slicing with the assumption that the channel status of each user in the slice is known a priori.
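The exhaustive search behind the OP baseline can be illustrated for the two-slice case. This is only a sketch of the search pattern: the `utility` callable is a hypothetical placeholder for the a priori evaluation of a candidate split, and the toy utility in the usage example carries no physical meaning.

```python
from typing import Callable, Tuple


def exhaustive_two_slice_search(
    total_rbs: int, utility: Callable[[int, int], float]
) -> Tuple[int, int]:
    """Try every split of total_rbs between two slices and keep the best one.

    utility(rbs_a, rbs_b) is assumed to score a candidate split using
    a priori knowledge of traffic and channel statistics.
    """
    best_a = max(
        range(total_rbs + 1),
        key=lambda rbs_a: utility(rbs_a, total_rbs - rbs_a),
    )
    return best_a, total_rbs - best_a


if __name__ == "__main__":
    # Toy utility that peaks when the first slice receives 7 RBs.
    def toy_utility(rbs_a: int, rbs_b: int) -> float:
        return -float((rbs_a - 7) ** 2)

    print(exhaustive_two_slice_search(10, toy_utility))
```

The search evaluates every candidate split once, so its cost grows quickly with the number of slices, which is why it serves as an offline reference rather than an online method.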

V-B The Analysis of Convergence Process

As Fig. 2 shows, the rewards of the proposed and the Hard-DQN algorithms are low initially and increase with training until they converge to the same level. It can be observed that the proposed algorithm converges slightly faster than the Hard-DQN algorithm, since the common slice setting increases the SLA satisfaction in the exploration phase compared with the purely hard scheme.

Fig. 3(a) and Fig. 3(b) show the SLA satisfaction ratio of the two algorithms in the first 100 epochs of agent training. In Fig. 3(a), the SLA of uRLLC is always guaranteed, for two reasons. First, the packet size of the uRLLC slice is much smaller than that of the eMBB slice, so that it requires fewer resources. Second, the shared resource of the common slice prevents extreme scenarios, e.g., most RBs being allocated to the eMBB slice. Similarly, the SLA of the eMBB slice is guaranteed after about 50 epochs. In comparison, with the Hard-DQN algorithm the uRLLC slice’s SLA is guaranteed only after 70 epochs, and the eMBB SLA fluctuates throughout the first 100 epochs. Naturally, the isolation degree of the two slices under the proposed algorithm cannot reach the required thresholds in the initial phase, whereas the isolation degree of Hard-DQN is always 1. However, it is pointless to discuss isolation when the slices’ SLA cannot be guaranteed. Furthermore, the isolation degree of the proposed algorithm reaches the required thresholds after 60 epochs, as shown in Fig. 3(a).

V-C Performance Comparison

Fig. 4 shows the achievable reward of the four algorithms after the two DQN-based algorithms converge. First, both the proposed algorithm and Hard-DQN achieve approximately optimal performance. However, the performance of Hard-DQN fluctuates at one epoch due to its purely hard scheme. Second, the proposed algorithm far outperforms the NVS algorithm. Since NVS considers static bandwidth provisioning based on the aggregate throughput, it cannot satisfy the demand of mixed SLAs, e.g., latency and reliability metrics.

Figure 4: The rewards of four algorithms

VI Conclusion

In this paper, we proposed a hard and soft hybrid slicing framework that introduces the common slice setting. A DRL-based solution is carefully designed. The comparison experiments indicate the proposed solution can guarantee slices’ SLA all the time, even in the initial training phase. Moreover, it achieves near-optimal performance in terms of SLA satisfaction, spectrum efficiency and isolation.


This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61871262, 62071284, and 61901251, the National Key R&D Program of China grants 2017YFE0121400 and 2019YFE0196600, the Innovation Program of Shanghai Municipal Science and Technology Commission grant 20JC1416400, Pudong New Area Science & Technology Development Fund, Key-Area Research and Development Program of Guangdong Province grant 2020B0101130012, Foshan Science and Technology Innovation Team Project grant FS0AA-KJ919-4402-0060, and research funds from Shanghai Institute for Advanced Communication and Data Science (SICS).