Towards Organic 6G Networks: Virtualization and Live Migration of Core Network Functions

by Michael Gundall, et al.

In the context of Industry 4.0, more and more mobile use cases are appearing on industrial factory floors. These use cases place high demands on various quantitative requirements, such as latency and availability. In addition, qualitative requirements such as flexibility are arising. Since virtualization technology is a key enabler for the flexibility required by novel use cases and on the way to the organic networking addressed by 6G, we investigate container virtualization technology in this paper. We focus on containers because OS-level virtualization has multiple benefits compared to hardware virtualization, such as virtual machines (VMs). We therefore discuss several aspects of container-based virtualization, e.g., the selection of suitable network drivers and orchestration tools, with respect to the most important 5G core (5GC) functions. These functions have different quantitative and qualitative requirements depending on whether they are stateless or stateful, and on whether a specific function is located in the control or the user plane. We therefore also analyze different live migration concepts for the 5GC functions and evaluate them based on well-defined metrics, such as migration time and process downtime.



1 Introduction

In the scope of Industry 4.0, more and more mobile use cases appear in industrial factory halls [etfa2018, etfa2021]. These use cases place stringent demands on different requirements, such as latency, availability, and more. Therefore, high-performance wireless communication systems are required. Here, mobile radio communications, such as 5G [access2021, Mobilkom2019] and 6G [jiang2021road], can play an important role. Besides the aforementioned quantitative requirements, there are also qualitative requirements that raise novel challenges and opportunities. Examples of such requirements are security, integration possibilities, and flexibility.

Fig. 1 therefore shows an exemplary use case that requires both low-latency communication and high flexibility.

Figure 1: Exemplary industrial use case, where a drone moves between factory halls.

If a mobile device, such as a drone, offloads certain algorithms, it is important that these algorithms are executed by an edge server located as close as possible to the device. If the drone moves between factory halls, or even between factories, the algorithm has to be processed by another server. Besides the required flexibility on the application side, the communication network also has to support this mobility. In order to deliver data packets in time, several network functions have to be deployed close to the mobile device. This is where so-called Network Function Virtualization (NFV) comes into play. Together with virtualization technologies, such as OS-level virtualization and hardware virtualization, it is possible to automatically deploy and run Virtualized Network Functions (VNFs) on nearly any device that offers computational resources. Thus, we investigate whether existing technologies are suitable for applying NFV to functions of the 5G core (5GC) in industrial environments.

The paper is structured as follows: Sec. 2 gives an overview of related work on this topic, while Sec. 3 presents key technologies for the realization of organic networking. Sec. 4 then describes the 5G Service-Based Architecture (SBA) in detail and introduces both the chances and challenges that virtualization and live migration bring for the relevant 5GC functions. Finally, a conclusion is given (Sec. 5).

2 Related Work

In order to achieve the flexibility demanded by emerging mobile use cases, virtualization technology can be used, where hardware and OS-level virtualization are well-known concepts in the IT environment. It has been shown that OS-level virtualization using Linux containers is more efficient than traditional virtual machines (VMs), which belong to hardware virtualization [7164727, 10.1145/2851613.2851737, indin2020]. Furthermore, the authors in [10.1145/2851613.2851737, indin2020] investigated the use of OS-level virtualization technology for industrial applications. Even though both works target industrial automation systems, the results can be transferred to VNFs of the 5GC, since these impose comparable requirements.

In order to improve flexibility, 5G applies the SBA paradigm. Consequently, the functions are not only service-based but also more fine-grained compared to earlier technologies, such as 4G. For this reason, it can be assumed that applying virtualization technologies to the 5GC is more advantageous than applying them to 4G, even though there are also approaches for applying XaaS to the 4G core network [taleb2015ease].

3 Key Technologies for the Realization of Organic Networking

In order to realize organic networking, several technologies that are well known in IT have to be introduced into the communication domain. Therefore, this section introduces the related technologies and concepts.

3.1 Container Virtualization

As already mentioned, several works indicate that virtualization using containers is suitable if efficiency and performance of the VNFs are important [7164727, 10.1145/2851613.2851737, indin2020]. Here, the network drivers play a central role. They differ not only in performance, but also in their networking capabilities and security level, such as network isolation.

Thus, Tab. 1 gives an overview of the standard network drivers of Docker containers regarding RTT (measured between containers deployed on two different hosts), networking capabilities, and security level.

Netw. Driver   RTT [µs]   Networking    Security
Host           522        L2 / L3       -
Bridge         600        L2 / L3
Macvlan        520        L2 / L3
Ipvlan (L2)    520        L2 / L3
Ipvlan (L3)    539        L3
Overlay        656        (L2)¹ / L3    +
¹ Only valid for L2 overlay network drivers of K8s.
Table 1: Network driver overview [indin2020].

While efficiency and performance, such as RTT and overhead, may be most important for several applications [reichardt2021benchmarking], some industrial applications require special networking capabilities, such as L2 support, i.e., the exchange of Ethernet frames without an IP layer (L3). Typical examples are Industrial Ethernet protocols and TSN. Since this feature is, as a rule, not supported by all Docker network drivers, it is a selection criterion that should also be considered.
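As a rough illustration of how RTT values like those in Tab. 1 can be obtained, the following Python sketch measures the round-trip time of a UDP echo over the loopback interface. This is a stand-in of our own for the container-to-container measurement; absolute numbers depend on the environment and will differ from Tab. 1.

```python
import socket
import threading
import time

def udp_echo_once(sock: socket.socket) -> None:
    """Echo a single datagram back to its sender."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)

def measure_rtt_us(runs: int = 100) -> float:
    """Median round-trip time in microseconds for a UDP ping over loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # OS picks a free port
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = []
    for _ in range(runs):
        echo = threading.Thread(target=udp_echo_once, args=(server,))
        echo.start()
        start = time.perf_counter()
        client.sendto(b"ping", server.getsockname())
        client.recvfrom(1024)
        samples.append((time.perf_counter() - start) * 1e6)
        echo.join()
    server.close()
    client.close()
    samples.sort()
    return samples[len(samples) // 2]

if __name__ == "__main__":
    print(f"median loopback RTT: {measure_rtt_us():.0f} µs")
```

For the numbers in Tab. 1, the same ping-pong would run between containers attached to the respective network driver on two hosts.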

3.2 Container Orchestration

If automated deployment and scaling of a service is required, an orchestration tool, such as Docker Swarm or Kubernetes (K8s), is needed. It is important to note that these tools typically bring their own additional network drivers that build overlay networks. In the case of Docker Swarm, the "Overlay" network driver is not able to transmit L2 packets, while K8s has several L2 overlay network drivers, e.g., Multus. For Docker Swarm, it is also possible to use several of Docker's standard network drivers for a scalable service, but this requires more configuration effort.

Furthermore, both orchestration tools can automatically deploy services and create as many replicas as required. This can be used for load balancing as well as for fail-over mechanisms. Here, K8s provides more possibilities to create highly individualized and complex service compositions, called "Deployments". The reason for this is probably its higher industry support.


3.3 Live Migration Approaches

The aforementioned service compositions can typically only be applied to replicate containers that are not state-synchronized. If, on the other hand, a stateful container should be redeployed, e.g., due to mobility requirements, live migration is a possible method. Here, the so-called checkpoint/restore (C/R) tactic has become widely accepted for the live migration of processes. A process is "frozen" and its current state is checkpointed to disk. This data can then be transferred to a new target system, where the process can be restarted at exactly the point at which it was previously frozen. In recent years, developments have increasingly moved in the direction of user-space-based methods. These offer the enormous advantage of high transparency combined with only minimally invasive intervention in the central components of the operating system.
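The freeze/checkpoint/restore cycle can be illustrated with a minimal Python sketch. This is a toy stand-in of our own for a tool like CRIU: the "process" is reduced to a dictionary whose state is serialized to disk and resumed elsewhere.

```python
import pickle
import tempfile
from pathlib import Path

def run_steps(state: dict, steps: int) -> dict:
    """Stand-in for a running service: each iteration mutates the state."""
    for _ in range(steps):
        state["counter"] += 1
    return state

def checkpoint(state: dict, path: Path) -> None:
    """'Freeze' the process: serialize its full state to a checkpoint image."""
    path.write_bytes(pickle.dumps(state))

def restore(path: Path) -> dict:
    """Restart on the target system from exactly the frozen state."""
    return pickle.loads(path.read_bytes())

if __name__ == "__main__":
    image = Path(tempfile.mkdtemp()) / "checkpoint.img"
    state = run_steps({"counter": 0}, 100)  # runs on the source system
    checkpoint(state, image)                # freeze + dump to disk
    resumed = restore(image)                # transfer image, restart on target
    resumed = run_steps(resumed, 50)        # continues where it was frozen
    assert resumed["counter"] == 150
```

A real C/R tool additionally captures memory mappings, file descriptors, and open sockets, which is what makes transparent migration of unmodified processes possible.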

CRIU (Checkpoint/Restore In Userspace) was introduced in 2012 and has since been further developed into a powerful tool for the live migration of processes. In the meantime, CRIU is either integrated into OpenVZ, LXC/LXD, Docker, and Podman or can be used in combination with them without much effort [criu1]. While live migration with CRIU is already widespread in the area of high-performance computing [hpc1], its use in other application areas has been rather limited so far.

The main focus of research here is memory transfer, which is indispensable for process migration. In a classical (inter-copy) C/R procedure, which is shown in Fig. 2,

Figure 2: C/R migration with inter-copy memory transfer.

the process is frozen and all data in memory is completely transferred from one system to the other before the process is restarted. The downtime of the process and the migration time are therefore almost identical. To further minimize the downtime, two primary strategies can be used: pre- and post-copying of the memory.

In the pre-copy tactic (see Fig. 3),

Figure 3: C/R migration with pre-copy memory transfer.

as much data as possible is first transferred to the target system, primarily data that is not expected to be needed in the coming process iterations. The process is then shut down on the source system, the remaining data is transferred, and the process is restarted on the target system. With the post-copy tactic (see Fig. 4),

Figure 4: C/R migration with post-copy memory transfer.

on the other hand, the process is frozen immediately at the start of the migration, similar to the inter-copy method. Afterwards, however, only the parts of the memory that are important for the next process iterations are transferred. The remaining parts of the memory are transferred after the process has already been restarted on the target system [reber1].

Both strategies are the subject of intensive research [performance1, precopy1]. The post-copy strategy in particular increases the risk of a complete process failure if missing data cannot be transferred in time. The pre-copy strategy brings few advantages in terms of downtime if large parts of the data change within just a few process steps. In addition, both methods require precise prediction of future process steps.
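A simplified first-order model (our own, with hypothetical bandwidth and dirty-rate parameters) makes the downtime trade-off between the three memory-transfer strategies concrete:

```python
def inter_copy_downtime(mem_mb: float, bw_mbps: float) -> float:
    """Inter-copy: the process stays frozen for the whole memory transfer,
    so downtime and migration time are almost identical."""
    return mem_mb / bw_mbps

def pre_copy_downtime(mem_mb: float, bw_mbps: float,
                      dirty_mbps: float, rounds: int = 5) -> float:
    """Pre-copy: memory is copied while the process keeps running; each
    round re-transfers only the pages dirtied during the previous round.
    The process is frozen only for the final residual transfer."""
    remaining = mem_mb
    for _ in range(rounds):
        transfer_time = remaining / bw_mbps              # copied while running
        remaining = min(mem_mb, dirty_mbps * transfer_time)  # newly dirtied data
    return remaining / bw_mbps                           # final frozen transfer

def post_copy_downtime(working_set_mb: float, bw_mbps: float) -> float:
    """Post-copy: only the working set needed for the next iterations is
    transferred while frozen; the rest follows after restart (with the risk
    of a stall if a needed page arrives too late)."""
    return working_set_mb / bw_mbps

if __name__ == "__main__":
    mem, bw = 512.0, 1000.0  # MB and MB/s, hypothetical values
    print(f"inter-copy downtime: {inter_copy_downtime(mem, bw):.4f} s")
    print(f"pre-copy  downtime: {pre_copy_downtime(mem, bw, dirty_mbps=50.0):.4f} s")
    print(f"post-copy downtime: {post_copy_downtime(32.0, bw):.4f} s")
```

The model also reproduces the caveat stated above: if the dirty rate approaches the available bandwidth, the pre-copy rounds never converge and the downtime degenerates to that of inter-copy.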

Therefore, the latest approaches go one step further and use the PPM methodology [parallel1, parallel2]. In the previous approaches, only one instance of the process was active at a time. In contrast, Fig. 5

Figure 5: PPM procedure including the handover mechanism.

depicts the idea that the process is already running on the target system and both processes are supplied with the same data. If a migration is triggered, ideally only a very small part of the memory still has to be transferred to the target system. This leads to a considerably reduced downtime. However, there are multiple challenges, which lie on the one hand in managing a smooth handover, including time and state synchronization, and on the other hand in ensuring that all instances of the process running in parallel are always supplied with identical data at the same time.
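The data-consistency requirement of PPM can be sketched as follows. This is a minimal model of our own: both instances apply the same input events, and a handover is only permitted when their state digests match.

```python
import hashlib
import json

class Replica:
    """One instance of the process; its state evolves deterministically
    from the input events it receives."""

    def __init__(self) -> None:
        self.state: dict = {}

    def apply(self, event: dict) -> None:
        """Apply one input event to the local state."""
        self.state[event["key"]] = event["value"]

    def digest(self) -> str:
        """Canonical hash of the state, used to compare instances cheaply."""
        blob = json.dumps(self.state, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

def handover_possible(source: Replica, target: Replica) -> bool:
    """A smooth handover requires both instances to hold identical state."""
    return source.digest() == target.digest()

if __name__ == "__main__":
    src, dst = Replica(), Replica()
    for event in [{"key": "session", "value": 1},
                  {"key": "qos", "value": "low-latency"}]:
        src.apply(event)
        dst.apply(event)  # the same data is delivered to both instances
    assert handover_possible(src, dst)
```

If even one event reaches only one instance, the digests diverge and the handover must be delayed until the residual state delta has been transferred, which is exactly the small final transfer described above.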

4 5G SBA

This section introduces the 5G SBA and discusses the possibilities and challenges of organic networking for the most relevant 5GC functions. Fig. 6 shows the mandatory components of a 5G system and their corresponding interfaces.

Figure 6: Mandatory components of a 5G network architecture and corresponding interfaces [rommer20195g].

The functions explained in the following sections (Sec. 4.1-4.7) can be mapped to either the user plane or the control plane. While user plane traffic is most important for end-user applications, the control plane contains the functions relevant for proper operation of the 5G system. Therefore, a decrease in QoS in the user plane has a direct impact on end-user applications, while performance variations in the control plane do not necessarily affect them.

4.1 UPF

The main task of the UPF, which is located in the user plane, is the processing and forwarding of user data, with the SMF controlling its functionality. The UPF can therefore be considered stateless, but it has high demands on latency and availability, since a failure would cause a direct loss of connectivity for end users. It connects to external IP networks, serving as an anchor for the UE towards the external network and hiding the mobility. As a result, IP packets with a destination address belonging to a UE are always routed from the Internet to the specific UPF serving that device, regardless of whether the device is moving around the network. The UPF generates records of charging data and traffic usage, which can be sent to the SMF. It also performs packet inspection, which can be used for applying configured policies, gating, redirecting traffic, and applying data rate limits. In addition, it can apply QoS policies in the downlink direction. Furthermore, 5G systems support not only IP-based PDU Sessions but also the Ethernet PDU Session type [rommer20195g, 3gpp.23.501].

Since the UPF is stateless, live migration is not required. However, it is advisable to use virtualization technology in order to automatically deploy and restart UPFs on each targeted hardware node. Moreover, multiple UPF instances can be deployed on one device, e.g., to apply redundancy or load balancing mechanisms. Since K8s has benefits regarding deployment policies, this orchestration tool can be the preferred option for this function. However, if all PDU Session types are to be supported, the standard network driver of K8s cannot be used, and a specialized third-party network driver is required in order to transmit L2 data packets. Alternatively, Docker Swarm in combination with one of the standard Docker network drivers could be an appropriate solution.

4.2 SMF

The SMF, which is part of the control plane, is mainly responsible for the management of end-user sessions. Its main tasks are creating, updating, and deleting PDU Sessions, and managing the session context with the UPF. It communicates indirectly with end-user devices through the AMF, which forwards session-related messages between the devices and the SMF. Separating further control plane functions from the user plane, the SMF takes over some of the functions previously performed by the MME and assumes the role of DHCP server and IP address management system. Additionally, the SMF plays a crucial role in the charging-related functions within the network: by collecting its own charging data, it manages the charging functions of the UPF.

As already indicated, the SMF is stateful. Thus, live migration approaches should be applied if this function is to be redeployed on a different hardware node. This can be required, e.g., if another hardware node is closer to the UE and very fast and dynamic reconfigurations of the corresponding UPF are required, as is the case for mobile devices that have high demands on latency and cover a wide serving area. If high service availability must be guaranteed, pre-copy C/R migration or PPM are suitable live migration approaches.

4.3 AMF

The AMF is responsible for the interaction with the NG-RAN via the N2 interface as well as the interaction with the UE via the N1 interface. The AMF is part of most signaling call flows in a 5G network, providing support for encrypted signaling connections to devices in order to register, authenticate, and switch between different radio cells in the network. It is also responsible for paging UEs in the idle state. The AMF relays all session management related signaling between the UE and the SMF, which differs from the 4G core network architecture. A further difference is that the AMF does not perform authentication itself, but orders it as a service from the AUSF [3gpp.23.501].

Due to the fact that all control plane data flows between the UE and the 5GC as well as between the NG-RAN and the 5GC are forwarded by the AMF to other NFs, e.g., the SMF, the requirements on service availability are even higher than for the SMF. Therefore, PPM can be the preferred live migration approach.

4.4 AUSF

The AUSF's functions are rather limited, but very important. It provides the authentication service for a specific UE using the authentication data created by the UDM, as well as providing services that allow secure updating of roaming information and other parameters in the UE.

Since the AUSF is highly security-relevant, it must not be compromised by an attacker. Therefore, both network and guest/host isolation should be high for this function. Here, overlay networks can be superior to other network drivers. Since a service outage would only prevent new devices from joining the network, there are no special needs regarding latency and service availability. Thus, inter-copy migration is the best option for live migration, since it minimizes the migration time and the overhead of the process, because all data has to be sent only once. However, the cases in which a live migration of the AUSF is required seem quite limited.

4.5 UDM

The UDM manages data for access authorization, data network profiles, and user registration, all of which are used by the SMF. In addition, access is authorized for specific users based on subscription data; for instance, different access rules may apply to roaming subscribers and home subscribers. The UDM can be stateful or stateless [3gpp.29.503]. In the stateful version, data is stored locally, whereas the stateless version stores the data externally in the UDR. With a stateful architecture, data is shared between services that manage the communication between network layers. The disadvantage is that, in case of a problem, all services sharing information must be taken down from the network at once. With a stateless architecture, subscriber data is kept separate from the functions that support it. This provides more stability and flexibility, because database access is separate from the operational network, but it also prevents the same information from being updated at the same time by multiple nodes, which can cause delays in the network. With more than one instance of the AMF and SMF in the network, the UDM keeps track of which instance is serving a particular device.
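The operational difference between the two variants can be sketched in a few lines. This is an illustrative model of our own, not a 3GPP interface: the point is that a stateless function instance can be discarded and recreated without losing subscriber data, because the data lives in the external UDR.

```python
class UDR:
    """External central storage for structured subscriber data."""

    def __init__(self) -> None:
        self._db: dict = {}

    def put(self, imsi: str, profile: dict) -> None:
        self._db[imsi] = profile

    def get(self, imsi: str) -> dict:
        return self._db[imsi]

class StatefulUDM:
    """Stores subscriber data locally: a redeployment must migrate this state."""

    def __init__(self) -> None:
        self._local: dict = {}

    def register(self, imsi: str, profile: dict) -> None:
        self._local[imsi] = profile

    def subscriber(self, imsi: str) -> dict:
        return self._local[imsi]

class StatelessUDM:
    """Keeps no local data: any freshly started replica can serve requests
    as long as it can reach the shared UDR."""

    def __init__(self, udr: UDR) -> None:
        self._udr = udr

    def register(self, imsi: str, profile: dict) -> None:
        self._udr.put(imsi, profile)

    def subscriber(self, imsi: str) -> dict:
        return self._udr.get(imsi)

if __name__ == "__main__":
    udr = UDR()
    udm = StatelessUDM(udr)
    udm.register("001010123456789", {"slice": "urllc"})
    # "Restarting" the stateless UDM loses nothing: the data lives in the UDR.
    replacement = StatelessUDM(udr)
    assert replacement.subscriber("001010123456789") == {"slice": "urllc"}
```

For the stateful variant, by contrast, the `_local` dictionary is exactly the state that an inter-copy C/R migration would have to transfer.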

In the case of the stateful version, it is most important that the states are transferred correctly. Since a small service downtime does not cause a direct loss of connectivity, traditional inter-copy C/R migration is sufficient; moreover, no synchronization errors can occur. In the stateless version, either the K8s or the Docker Swarm orchestration tool can be used, since there are no special needs regarding networking performance or capabilities. However, in this case the UDR is stateful, and inter-copy C/R migration can be applied to this function.

4.6 UDR

The UDR is the central storage where structured data is stored. For instance, the UDM can store and retrieve subscriber data such as access and mobility data or network slice selection data. Equally, the PCF can store policy-related data, and the NEF can store structured data for exposure as well as application data. Multiple UDR systems may be deployed in the network, each holding different data sets or subsets, or serving different NFs.

4.7 NRF

The NRF is one of the most important components of the 5G architecture. It provides a single record of all NFs, along with the services provided by each element, which can be instantiated, scaled, and terminated with minimal or no manual intervention in the operator's network.

The NRF places the same demands on virtualization and live migration as the UDM/UDR. However, the migration time and the corresponding downtime might be higher, depending on its size and the amount of data that has to be transferred. In this case, it has to be decided whether process downtime or migration time should be minimized. If the migration time is most important, C/R migration with inter-copy memory transfer can be used. Otherwise, pre-copy C/R or PPM is beneficial.
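The per-function recommendations of Sec. 4 can be condensed into a small decision helper. The rule set below is our own illustrative summary of the qualitative reasoning in this section, not part of any specification.

```python
def migration_strategy(stateful: bool, availability_critical: bool,
                       minimize_migration_time: bool = False) -> str:
    """Suggest a redeployment/migration strategy for a 5GC function.

    Rules mirror the discussion in Sec. 4:
    - stateless functions (e.g., the UPF) need no live migration at all;
    - availability-critical stateful functions (e.g., SMF, AMF) should
      minimize downtime via pre-copy C/R or PPM;
    - otherwise (e.g., AUSF, UDM/UDR) inter-copy C/R minimizes the total
      migration time, since all data is sent exactly once.
    """
    if not stateful:
        return "automated redeployment"
    if availability_critical:
        return "pre-copy C/R or PPM"
    if minimize_migration_time:
        return "inter-copy C/R"
    return "inter-copy C/R"

if __name__ == "__main__":
    # UPF: stateless user plane function
    print("UPF:", migration_strategy(stateful=False, availability_critical=True))
    # AMF: stateful, highest availability demands
    print("AMF:", migration_strategy(stateful=True, availability_critical=True))
    # AUSF: stateful, outage only delays new registrations
    print("AUSF:", migration_strategy(stateful=True, availability_critical=False,
                                      minimize_migration_time=True))
```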

5 Conclusion

In this paper, we investigated key technologies that are required for the organic networking targeted by 6G. We presented the recent state of research for both virtualization and live migration technologies. Additionally, we introduced the most important 5GC functions and analyzed them based on their latency and availability requirements.