
RunPHI: Enabling Mixed-criticality Containers via Partitioning Hypervisors in Industry 4.0

Orchestration systems are becoming a key component to automatically manage distributed computing resources in many fields with criticality requirements like Industry 4.0 (I4.0). However, they are mainly linked to OS-level virtualization, which is known to suffer from reduced isolation. In this paper, we propose RunPHI with the aim of integrating partitioning hypervisors, as a solution for assuring strong isolation, with OS-level orchestration systems. The purpose is to enable container orchestration in mixed-criticality systems with isolation requirements through partitioned containers.


I Introduction

Nowadays, we are witnessing the spread of Information Technologies in several industrial domains (e.g., railways, avionics, automotive). This transforms industrial scenarios into edge-cloud environments populated by many Industrial Internet of Things (IIoT) devices, in line with the Industry 4.0 (I4.0) vision [iiot_survey, stavdas2022networked]. Thus, these systems must meet not only mandatory regulatory requirements involving functional safety and control timeliness, but also requirements for performance scalability, interoperability, low latency, and reconfigurability through fast and efficient deployment.

Virtualization is an enabling technology for I4.0 since it responds to the needs of reconfiguration, modularity, and consolidation through resource partitioning and multiplexing. It allows the execution of heterogeneous Operating Systems (OSes), both real-time and general-purpose, on the same system-on-a-chip (SoC), and is becoming a prominent way for the industry to realize mixed-criticality systems due to the current COVID-19-induced silicon shortage [bloomberg_chip_shortage, cinque2021virtualizing, cilardo2021virtualization]. Partitioning hypervisors (e.g., Jailhouse [ramsauer2017look], Bao [martins2020bao], XtratuM [crespo2010partitioned]) have gained the attention of both academia and industry [hermes_project, selene] due to the strong isolation they provide through static resource allocation, at the cost of reduced deployment flexibility compared to classical virtualization solutions (recently named consolidating hypervisors).

However, the I4.0 vision stresses the automatic management, reconfiguration, and self-healing of IT systems. Thus, criticality-aware orchestration systems are paramount: they automatically place, deploy, monitor, and migrate packaged software across the infrastructure [barletta2022achieving], while remaining aware of the isolation guarantees that critical workloads require to prevent interference, in terms of faults and attacks, from non-critical jobs. Currently, containers are seamlessly integrated into orchestration systems, but ensuring their isolation is still an open issue, which threatens the practicability of OS-level virtualization under strict real-time, safety, and security requirements. Unikernels seem to be a solution since they do not share the underlying host kernel, but their portability and real-time support remain unresolved [chen2022unikernel].

In this position paper, we propose RunPHI, a framework that integrates partitioning hypervisors into container orchestration systems with the aim of leveraging the strong isolation provided by partitioning solutions while taking advantage of orchestration techniques. This project advances the state of the art since i) it simplifies the deployment of critical workloads in edge/cloud environments, useful for maintenance, upgrades, and new deployments; ii) it enables failure mitigation through migration and spawning of new partitions; iii) it is a driving force for the full reconfigurability of I4.0 for workloads with isolation requirements.

II Related Work

In the literature, several solutions (summarized in Table I) adapt general-purpose hypervisors with the aim of providing true isolation between containers, aka sandboxed containers. IBM Nabla (https://nabla-containers.github.io/) builds containers on top of unikernels. Google gVisor (https://gvisor.dev/) creates a dedicated guest kernel to run containers. Amazon Firecracker (https://firecracker-microvm.github.io/) is a lightweight hypervisor for sandboxing applications. Both KubeVirt (https://kubevirt.io/) and vSphere Integrated Containers (VIC) (https://vmware.github.io/vic-product/) integrate VMs and containers under a single orchestration infrastructure. Kata Containers (https://katacontainers.io/) provides a secure container runtime based on lightweight VMs. RunX (https://github.com/Xilinx/runx) uses the Xen hypervisor to run containers in multiple separate VMs, either with the provided custom-built Linux-based kernel or with a container-specific kernel/ramdisk.

| Solution | Guest Type | Used Hypervisor | Orchestration Support |
|---|---|---|---|
| Nabla Container | Unikernel | Nabla Tender | Docker |
| gVisor | Container + user-space kernel | KVM | Kubernetes, Docker |
| Firecracker | Light VM | KVM | OCI compliant |
| KubeVirt | VMs and Containers | KVM | Kubernetes |
| vSphere Integrated Containers (VIC) | VMs and Containers | VMware ESXi | VMware Orchestrator, Docker |
| KataContainer | Light VM | QEMU/KVM | Kubernetes, Docker |
| RunX | Light VM | Xen | Kubernetes, Docker |
| RunPHI | Light VM | Partitioning Hypervisors | Kubernetes, Docker, OCI compliant |

TABLE I: State-of-the-art solutions for partitioned containers.

These solutions are mainly based on general-purpose hypervisors, which do not fit well with mixed-criticality real-time requirements. In contrast, current partitioning hypervisor solutions seem to provide enough guarantees about both safety and security isolation. To the best of our knowledge, there are no solutions that support sandboxed containers in conjunction with partitioning hypervisors. The objective of RunPHI is to provide both isolation requirements and flexible orchestration capabilities for next-generation I4.0 scenarios.

III Proposal

Fig. 1: Proposed RunPHI Architecture.

Figure 1 shows a first design of RunPHI. Users provide partition descriptions via classical tools inherited from container-based orchestration (e.g., Dockerfiles, Kubernetes manifests). Partitioned container descriptions can be extended with requirements related to physical resources, criticality levels (e.g., low, mid, or high), real-time constraints, etc. RunPHI leverages a partitioning hypervisor to provide strong isolation between containers. In particular, RunPHI, implemented in the privileged partition, tries to allocate physical resources according to the partition descriptions and the resources currently free on the host node. The inference engine fills the gaps with predicted values for resources not specified in the partitioned container description, according to requests from the orchestration platforms and current hardware usage. RunPHI manages the lifecycle of partitioned containers and is designed to be flexible enough to orchestrate partitioned containers with: i) low-level criticality (e.g., partition A in Figure 1), which includes a classical container abstraction with several applications running on top of it and a number of virtual CPUs (vCPUs) with no specific affinity to physical CPUs (pCPUs) and no specific real-time guarantees; ii) mid-level criticality (e.g., partition B in Figure 1), which runs a single-application RTOS/unikernel with strict temporal and memory isolation requirements (e.g., by using 1-to-1 vCPU-pCPU mapping and cache/RAM coloring mechanisms, respectively) and uses GPU accelerators for running machine learning algorithms; iii) high-level criticality (e.g., partition C in Figure 1), which uses the same mechanisms as mid-level criticality, with the addition of running bare-metal tasks on real-time CPUs and using programmable logic blocks such as FPGAs.
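As a rough illustration of how a criticality level could drive placement, the following minimal Python sketch maps the three levels described above onto CPU pinning and coloring decisions. It is not the RunPHI implementation: the names `PartitionSpec`, `Placement`, and `place` are hypothetical, and the rule "reserve the lowest-numbered free pCPUs" is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Criticality(Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"


@dataclass
class PartitionSpec:
    """Hypothetical partitioned-container description (field names are assumptions)."""
    name: str
    criticality: Criticality
    vcpus: int


@dataclass
class Placement:
    pinned_pcpus: List[int]   # empty list: vCPUs have no pCPU affinity
    cache_coloring: bool      # cache/RAM coloring requested for memory isolation
    bare_metal_allowed: bool  # bare-metal tasks only at high criticality


def place(spec: PartitionSpec, free_pcpus: Set[int]) -> Optional[Placement]:
    """Return a placement, or None if the host node lacks free pCPUs.

    Mutates free_pcpus by removing any pCPUs reserved for the partition.
    """
    if spec.criticality is Criticality.LOW:
        # Low criticality: no affinity, no real-time guarantees, nothing reserved.
        return Placement([], cache_coloring=False, bare_metal_allowed=False)
    if spec.vcpus > len(free_pcpus):
        return None
    # Mid/high criticality: 1-to-1 vCPU-pCPU mapping on exclusive pCPUs.
    chosen = sorted(free_pcpus)[: spec.vcpus]
    free_pcpus.difference_update(chosen)
    return Placement(
        pinned_pcpus=chosen,
        cache_coloring=True,
        bare_metal_allowed=spec.criticality is Criticality.HIGH,
    )
```

For example, with four free pCPUs, a mid-criticality partition requesting two vCPUs would be pinned to two exclusive pCPUs, leaving two for later requests, while a low-criticality partition would leave the free set untouched.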

IV Research Questions and Objectives

In the following, we delineate research questions to be considered in the next steps of our project.

RQ1. How can I4.0 mixed-criticality systems be deployed via RunPHI?

Objectives:

To support partitioned containers running at different criticality levels with real-time constraints

To support running bare-metal applications

To support accelerator devices like FPGAs and GPUs

To induce minimal overhead in terms of CPU, memory, and I/O

RQ2. How to quantify the isolation between partitioned containers provided by RunPHI?

Objectives:

To support temporal, memory, and fault isolation assessment (e.g., fault injection testing)

To support security isolation assessment (e.g., fuzzing)
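The paper does not yet fix a metric for temporal isolation. One simple candidate, shown here purely as an assumption, is the ratio between the worst-case latency observed while other partitions generate interference and the worst-case latency observed when the partition runs alone; the function name `slowdown` and its inputs are hypothetical.

```python
from typing import Sequence


def slowdown(isolated: Sequence[float], interfered: Sequence[float]) -> float:
    """Worst-case latency under interference divided by worst-case latency alone.

    A value close to 1.0 suggests that co-located partitions barely affect
    the partition under test; larger values indicate weaker temporal isolation.
    """
    if not isolated or not interfered:
        raise ValueError("both sample sets must be non-empty")
    return max(interfered) / max(isolated)
```

Latency samples could come, for instance, from a cyclic task measured first in an otherwise idle system and then again while a stressing workload or a fault injection campaign runs in a neighboring partition.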

RQ3. How to support orchestration for partitioned containers in RunPHI?

Objectives:

To support different container runtimes and OCI compliance

To support partitioned container descriptions via existing container runtime APIs and existing configuration-file tools (e.g., Dockerfile, Kubernetes manifest)

To implement an inference engine to determine (sub)optimal resource allocation for partitioned containers

To support migration, checkpointing, and high-availability mechanisms for partitioned containers
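To make the role of the inference engine more concrete, the sketch below completes a partially specified resource request and checks it against what is free on the host node. The prediction method is left open in this paper, so a rounded-up historical mean stands in as a placeholder; `fill_missing`, the resource names, and the fallback value of 1 are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def fill_missing(
    requested: Dict[str, Optional[int]],
    history: Dict[str, List[int]],
    free: Dict[str, int],
) -> Dict[str, int]:
    """Complete a resource request and validate it against free resources.

    requested: resource name -> amount, or None when left unspecified.
    history:   resource name -> past usage samples (assumed to be available).
    free:      resource name -> amount currently free on the host node.
    """
    result: Dict[str, int] = {}
    for resource, amount in requested.items():
        if amount is None:
            samples = history.get(resource, [])
            # Placeholder predictor: ceiling of the historical mean,
            # or 1 when no history exists for this resource.
            amount = -(-sum(samples) // len(samples)) if samples else 1
        if amount > free.get(resource, 0):
            raise ValueError(f"not enough free {resource}: need {amount}")
        result[resource] = amount
    return result
```

A request that specifies vCPUs but omits memory would keep the vCPU count as given and receive a predicted memory value, and the call fails early if either exceeds what the node can currently offer.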

Acknowledgment

This work has been supported by the project COSMIC of UNINA DIETI.

References