
On Byzantine Fault Tolerance in Multi-Master Kubernetes Clusters

by Gor Mack Diouf, et al.

Docker container virtualization technology is widely adopted in cloud computing environments because of its lightweight and efficiency. However, it requires adequate control and management via an orchestrator. As a result, cloud providers have adopted the open-source Kubernetes platform as the standard orchestrator of containerized applications. To ensure applications' availability, Kubernetes relies on the replication mechanism of the Raft protocol. Despite its simplicity, Raft assumes that machines fail only by shutting down, yet such an event is rarely the only reason for a machine's malfunction. Indeed, software errors or malicious attacks can cause machines to exhibit Byzantine (i.e. random) behavior and thereby corrupt the accuracy and availability of the replication protocol. In this paper, we propose a Kubernetes multi-Master Robust (KmMR) platform to overcome this limitation. KmMR is based on the adaptation and integration of the BFT-SMaRt fault-tolerant replication protocol into the Kubernetes environment. Unlike the Raft protocol, BFT-SMaRt is resistant to both Byzantine and non-Byzantine faults. Experimental results show that KmMR is able to guarantee the continuity of services, even when the total number of tolerated faults is exceeded. In addition, under such conditions, KmMR provides on average a consensus time 1000 times shorter than that achieved by the conventional platform (with Raft). Finally, we show that KmMR incurs only a small additional cost in resource consumption compared to the conventional platform.





1 Introduction

Faced with the continuous increase in capital expenditure (CAPEX) and operating expenditure (OPEX) of fully reliable and available Information Technology (IT) systems, companies tend to outsource their IT services to specialized companies such as Cloud Service Providers (CSPs). The main advantage of this strategy is to obtain excellent service quality while paying only for the necessary and consumed resources. The purpose of the CSP is to meet clients' needs by providing the required resources on demand. A common approach is for the CSP to pool (or slice) its resources and share them between several clients. In this context, many challenges emerge in providing a reliable cloud environment, e.g. quality-of-service (QoS) guarantees, resource management, and service continuity.

Docker virtualization has gained popularity among CSPs, since it addresses performance issues such as inefficient use of resources bernstein2014containers ; peinl2016docker . The resource allocation unit in the cloud is the Docker container. Given the need to manage Docker containers, CSPs such as Google, Docker, Mesosphere, Microsoft, VMware, IBM and Oracle have set up the Cloud Native Computing Foundation (CNCF) standardization group cncf . CNCF adopts Kubernetes as the standard platform to orchestrate containerized applications sill2015emerging ; burns2016borg ; k8s . Kubernetes is an open-source project initiated by Google, advocating the vision of a modular, customizable and therefore scalable orchestration platform bernstein2014containers . It uses the Raft protocol to replicate states between its machines and ensure the availability of hosted applications oliveira2016evaluating ; ongaro2014search . In spite of its simplicity and rapidity in the replication process, the Raft protocol has a major limitation when it comes to machine failures. Indeed, Raft can only detect and correctly deal with shutdown events. In other words, if a machine exhibits a Byzantine (random) behavior, Raft is unable to guarantee service continuity.

Conscious of the risks of software errors and malicious attacks that can push a machine into Byzantine behavior, we propose in this paper the adaptation and integration of the BFT-SMaRt fault-tolerant replication protocol into Kubernetes. By doing so, we expect our proposed platform to resist any type of fault while guaranteeing service continuity. To the best of our knowledge, this is the first work that proposes a Kubernetes platform tolerant to both Byzantine and non-Byzantine faults.

The main contributions of this paper are summarized as follows:

  1. We present an overview of Docker virtualization, Kubernetes platform, fault-tolerance within this platform and its limits.

  2. We propose the Kubernetes multi-Master Robust (KmMR) platform, a platform tolerant to Byzantine and non-Byzantine faults. KmMR is based on the integration of BFT-SMaRt into the Kubernetes environment.

  3. We propose an efficient method to adapt and integrate the replication protocol BFT-SMaRt (written in Java) into Kubernetes (written in Golang).

  4. We implement the proposed KmMR solution in an OpenStack-based cloud environment, evaluate its performance and compare it to the conventional platform, called Kubernetes multi-Master Conventional (KmMC). The comparison is realized through experiments in non-Byzantine and Byzantine environments, where both crashes and Distributed Denial-of-Service (DDoS) attacks are performed to destabilize the machines and corrupt their replication process. The obtained results confirm the effectiveness and robustness of KmMR.

The rest of the paper is organized as follows. Section 2 describes Docker containerization technology. In Section 3, Kubernetes, the Docker container orchestration platform, is explained. Section 4 discusses fault tolerance in Kubernetes. Section 5 presents our KmMR platform. Experimental evaluation and results are discussed in Section 6. Finally, Section 7 closes the paper.

2 Background

In this section, we present an overview of Docker container technology and its orchestration.

2.1 Docker Containers

Container virtualization (also known as containerization) relies directly on kernel functionalities to create isolated virtual environments (VEs). VEs are called containers, while the features provided by the operating system (OS) kernel are named namespaces and control groups (cgroups) moga2016level . The namespaces isolate what a process can see of the system, while the cgroups control and limit the resources used by a group of processes. Hence, a container provides the resources needed to run applications as if they were the only processes running in the host machine's OS. Docker is the most popular modern incarnation of containerization.

| Parameter | Virtual Machines | Docker Containers |
|---|---|---|
| Operating System (OS) | Every VM virtualizes the host hardware and loads its own guest OS | No container emulates the host hardware; the host OS is used |
| Communication | Through Ethernet devices | Through standard Inter-Process Communication (IPC) mechanisms, e.g. sockets, pipes, shared memory |
| Resource usage (CPU and RAM) | High | Quasi-native |
| Startup time | A few minutes | A few seconds |
| Storage | High, for guest OS and associated software installation and execution | Low, since the host OS is used |
| Isolation | Sharing libraries and files among VMs is impossible | Libraries and files can be seamlessly mounted and shared |
| Security | Depends on the hypervisor's configuration | Requires access control |

Table 1: Characteristics of Virtualization Technologies

Indeed, traditional virtualization uses a hypervisor to create virtual machines (VMs) and provide isolation between them, whereas Docker containerization requires only the installation of the Docker software on the host machine's OS. Both provide autonomous environments that rely on a higher-level system (that of the host machine) to perform their tasks. The difference is that VMs must contain a whole OS (a guest OS), while containers use the host machine's OS.

In Table 1, we summarize the characteristics of VMs and Docker containers. A container can be created and destroyed almost in real time and introduces a negligible overhead with respect to the host machine's resource usage felter2015updated ; 7164727 ; sharma2016containers . Compared to VMs, containers are advantageous in terms of network management, boot speed, deployment/migration flexibility, and resource usage (RAM, storage, etc.) joy2015performance . However, they suffer from weaker isolation from the host machine. Indeed, if a Docker container is compromised, an attacker can get full access to the host machine's operating system 7092943 ; manu2016docker . Consequently, there is an urgent need for a robust and secure environment for Docker containers. Moreover, Docker is unable on its own to deploy containers on distributed machines and ensure their interaction 7092943 . Hence, an orchestration mechanism is needed to manage Docker containers in distributed systems.

2.2 Containers Orchestration

Handling a few Docker containers on one machine is an easy task. However, when these containers move into production on a set of distributed hosts, many questions arise. To provide availability, scaling, and networking, an integration and management tool is required, not only for defining the initial container deployment, but also for managing multiple containers as one entity. Handling everything manually is not conceivable, as it would be very difficult to ensure the viability, maintenance and sustainability of the system. Thus, the process of deploying multiple containers to implement an application can be optimized through automation, especially at scale. This type of automation is referred to as orchestration and includes features such as work node placement, load balancing, inter-container communication, service discovery, updates, migrations, scaling, and especially tolerance to malfunctions.

Several orchestrators have been proposed and implemented. Examples include Fleet fleet , Mesos mesos , Swarm luzzardiswarm and Kubernetes kubernetes . Nevertheless, in the remainder of this paper, we are interested in Kubernetes only. The latter is a stable and free solution that can automate the deployment, migration, monitoring, networking, scalability, and availability of applications, based on Docker containers technology peinl2016docker ; burns2016borg .

3 Kubernetes: An Open-Access Orchestrator of Docker Containers

Kubernetes, abbreviated K8s, is a project initiated by Google in 2014 when it saw the advantages of Docker containers over traditional virtualization. The Kubernetes Orchestrator automates the deployment and management of large-scale containerized applications. Its platform runs and coordinates containers on sets of physical and/or virtual machines. Kubernetes is designed to fully manage the life cycle of containerized applications, using predictability, extensibility, and high availability methods kratzke2017understanding .

Figure 1: Architecture of Kubernetes (example: one master node and one work node)

3.1 Kubernetes Architecture

Kubernetes architecture is based on the master/slave model bila2017leveraging . It consists of a cluster of one master node and several work nodes, called minions, as shown in Fig. 1. Their roles are as follows:
Kubernetes master: This node is responsible for the overall management and availability of the Kubernetes cluster. Its components, i.e. the API server, the controller and the scheduler, support the interaction, monitoring and scheduling tasks within the cluster. The API server provides the interface to the shared state of the cluster through which the other components, e.g. work nodes, interact. The controller monitors the shared state of the cluster through the API server and makes decisions to bring the cluster back from an unstable state to a stable one. The scheduler manages the cluster load. It takes into account individual and collective resource requirements, QoS requirements, hardware/software constraints, policies, etc. The Kubernetes cluster data is stored in a database, e.g. etcd etcd , whereas cluster administration is performed at the master level via the K8s command-line interface kubectl. The latter stores its configuration and authentication information to access the API server in the kubeconfig file.
Kubernetes minions: Containerized applications run on these work nodes. On one hand, client nodes communicate with a work node via its kubelet, through the master node. The kubelet receives commands from the master node and executes them through its Docker engine. It also reports the state of the work node to the API server. On the other hand, the kube-proxy runs on each work node to manage clients' access to deployed services. Each service is compiled into one or many Pods. A Pod is a logical set of one or several containers, and is the smallest unit that can be programmed as a deployment in Kubernetes. Containers in the same Pod share resources such as storage capacity, IP address, etc.
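As an illustration of the Pod concept above, a minimal manifest defining a two-container Pod might look as follows; the Pod name and image choices are illustrative only, not taken from the paper:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # hypothetical Pod name
spec:
  containers:
  - name: web               # first container of the Pod
    image: nginx:1.25
    ports:
    - containerPort: 80
  - name: sidecar           # second container in the same Pod
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
```

Both containers share the Pod's IP address and storage volumes, and the Pod as a whole is the unit the scheduler places on a minion.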

3.2 Pods Instantiation

In Kubernetes, the placement of Pods follows a specific strategy. Consider a Kubernetes cluster consisting of a master node and a finite set of minions, and a pod asking for CPU cycles, RAM, a specific communication port and storage capacity, which needs to be deployed within the cluster. To select the minion on which the pod will be instantiated, the K8s master node proceeds in two steps: 1) it filters the minions; then, 2) it ranks the remaining minions to determine the one best suited for the pod. These two steps are detailed as follows:
Filtering: In this operation, nodes without the required resources are removed. Kubernetes uses multiple predicates to perform filtering, including:

  • PodFitsResources: does the node have enough resources (CPU and RAM) to accommodate the pod?

  • PodFitsHostPorts: is the node able to run the pod via the requested port without conflicts?

  • NoVolumeZoneConflict: does the node have the amount of storage that the pod requests?

  • MatchNodeSelector: does the node match the parameters of the selector query defined in the pod description?

These predicates can be combined to set up sophisticated filters.
Ranking: After filtering, Kubernetes uses priority functions to determine the best node among those able to host the pod. A priority function assigns a score between 0 and 10, where 0 is the least preferred and 10 the most preferred. Each priority function is weighted by a positive number and the final score is the sum of the weighted scores. The main priority functions that can be activated in Kubernetes are:

  • BalancedResourceAllocation: it aims at balancing the load. Indeed, it places the pod on a node such that the resource utilization rate (CPU and RAM) is balanced among the minions.

  • LeastRequestedPriority: it favors the node that has the most resources available.

  • CalculateSpreadPriority: it minimizes the number of pods belonging to the same service on the same node.

  • CalculateAntiAffinityPriority: it minimizes the number of pods belonging to the same service on nodes sharing a particular attribute or label.

  • CalculateNodeLabelPriority: it favors nodes with a specific label.

Once the final scores of all nodes are calculated, the minion with the highest score is selected to instantiate the pod. If more than one minion has the highest score, the master node selects one of them randomly.
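The two-step placement above can be sketched in Go as follows. The type and function names (nodeInfo, leastRequestedScore, etc.) are illustrative, not Kubernetes source identifiers, and only one predicate and one priority function are modeled:

```go
package main

import "fmt"

// nodeInfo is a simplified view of a minion's resources.
type nodeInfo struct {
	name               string
	freeCPU, freeRAM   int // millicores / MiB still available
	totalCPU, totalRAM int
}

// podSpec is a simplified pod resource request.
type podSpec struct {
	cpu, ram int
}

// filter mimics a PodFitsResources-style predicate: drop nodes
// lacking the requested CPU or RAM.
func filter(nodes []nodeInfo, p podSpec) []nodeInfo {
	var fit []nodeInfo
	for _, n := range nodes {
		if n.freeCPU >= p.cpu && n.freeRAM >= p.ram {
			fit = append(fit, n)
		}
	}
	return fit
}

// leastRequestedScore favors nodes with the most free resources,
// scaled to the 0-10 range used by Kubernetes priority functions.
func leastRequestedScore(n nodeInfo) int {
	cpu := 10 * n.freeCPU / n.totalCPU
	ram := 10 * n.freeRAM / n.totalRAM
	return (cpu + ram) / 2
}

// schedule filters the minions, then picks the highest-scoring one.
func schedule(nodes []nodeInfo, p podSpec) (string, bool) {
	best, bestScore := "", -1
	for _, n := range filter(nodes, p) {
		if s := leastRequestedScore(n); s > bestScore {
			best, bestScore = n.name, s
		}
	}
	return best, best != ""
}

func main() {
	nodes := []nodeInfo{
		{"minion-1", 200, 512, 2000, 4096},  // nearly full: filtered out
		{"minion-2", 1500, 3072, 2000, 4096}, // mostly free
	}
	name, ok := schedule(nodes, podSpec{cpu: 500, ram: 1024})
	fmt.Println(name, ok) // minion-2 true
}
```

A real scheduler combines several weighted priority functions; here a single score stands in for the weighted sum described above.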

4 Fault Tolerance in Kubernetes

In this section, we explain the fault tolerance mechanism in Kubernetes. We start with a brief description of faults. Next, we present the associated consensus problem. Finally, the built-in fault tolerance protocol "Raft" is detailed.

4.1 Background

The robustness of a system refers to its ability to continue functioning when part of the system fails cristian1991understanding . A system fails when its outputs no longer conform to the original specification. The occurrence of a failure can be: 1) transient, i.e. it appears, disappears and never occurs again; 2) intermittent, i.e. reproducible in a given context; or 3) persistent, i.e. it remains until repair. A non-faulty (non-failing) node or process is called correct when it follows its specifications, whereas a faulty node/process may stop or exhibit random behavior. In general, failures/faults may be caused by software defects, malicious attacks, or human-machine interaction errors. In distributed systems orchestrated by Kubernetes, faults may occur at the master node or at the minions. They can be classified into two categories:

  1. Fail-stop faults: They are characterized by the complete stop (or crash) of a node's activity. This state is perceived by the others as the absence of expected messages, until the eventual termination of the application. A system able to detect only these faults considers that a node/process can be in one of two states: either it works and gives the correct result, or it does nothing.

  2. Byzantine faults: They are characterized by any behavior deviating from the node/process's specifications and producing non-conform results lamport1982byzantine . We distinguish between natural Byzantine faults, such as undetected physical errors in message transmissions, memory and instructions, and malicious Byzantine faults designed to defeat the system, such as viruses, worms and sabotage instructions.

In large and/or uncontrolled systems, the risk of faults is high and must be mitigated to ensure service continuity. One way to achieve this is to use the State Machine Replication (SMR) mechanism aublin2014vers . It consists of using multiple copies of a system, implemented as a state machine, to tolerate faults and keep the system available. Each copy of the system, called a replica, is placed on a different node schneider1990implementing . SMR allows a set of nodes to execute the same instruction sequences on each request sent by a client. There are two approaches to execute requests: 1) active replication, where all nodes execute requests, update their state machines, and respond to clients; and 2) passive replication, where only one node, called the leader, executes the requests, forwards state machine changes to the other nodes, and then responds to clients.
To avoid inconsistency in replication, nodes/replicas need to be sure that their state machines are identical before responding to clients. The following section describes this state machine replication problem, called the Consensus problem.
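The active-replication idea above can be sketched in a few lines of Go; the types are illustrative only. Because every correct replica applies the same agreed-upon request sequence to a deterministic state machine, all correct replicas end in the same state:

```go
package main

import "fmt"

// replica holds one copy of the replicated state machine
// (here, simply a counter).
type replica struct {
	id    int
	state int
}

// apply executes one request against the local state machine.
func (r *replica) apply(op int) { r.state += op }

func main() {
	replicas := []*replica{{id: 0}, {id: 1}, {id: 2}}
	log := []int{5, -2, 7} // agreed-upon request order (the consensus output)
	for _, op := range log {
		for _, r := range replicas {
			r.apply(op) // active replication: every replica executes every request
		}
	}
	for _, r := range replicas {
		fmt.Printf("replica %d state=%d\n", r.id, r.state) // all end at state=10
	}
}
```

The hard part, of course, is agreeing on the request order in the first place; that is exactly the consensus problem of the next section.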

4.2 Consensus Problem

Consensus is a fundamental requirement in fault-tolerant distributed systems. It consists of bringing all replicas to agree on the same value, proposed by one of the nodes. The consensus problem can be formulated as follows: we assume a system composed of a set of n replicated nodes, of which at most f can fail, where f < n. The consensus problem consists of finding a protocol that allows the following:

  1. Any node can propose a replica’s value to the other nodes.

  2. When all nodes agree on the same value, a consensus is achieved.

Protocols that satisfy these conditions possess four properties pease1980reaching :

  1. Termination: Each correct node eventually decides a value.

  2. Validity: The decided value has been proposed by one or many other nodes.

  3. Integrity: The decision is unique and final.

  4. Agreement: Two correct nodes cannot decide different values.

According to schneider1990implementing , any protocol that satisfies the following safety and liveness conditions has the previous four properties:

  1. Safety: All the correct replicas execute the requests they receive in the same order.

  2. Liveness: Each request is correctly executed by correct nodes.

Such a protocol is commonly referred to as a consensus/replication protocol. Its decisions are based on messages exchanged between all or a subset of the nodes in the system. Indeed, a consensus is achieved if the quorum, defined as the minimum number of correct nodes required to build the consensus, participates in the consensus process. The quorum depends on the size of the system and the maximum number of tolerated faults.
Two fault-tolerant classes of replication protocols exist. In the first, called non-Byzantine, nodes fail only when they stop functioning. For n nodes, at most f = ⌊(n−1)/2⌋ crash faults can be tolerated. Examples of non-Byzantine protocols include Raft ongaro2014search , Paxos lamport2001paxos , and Zab van2015vive . In the second class, called Byzantine, any type of failure can be tolerated. However, these protocols typically tolerate only f = ⌊(n−1)/3⌋ faults bracha1985asynchronous . Examples of Byzantine protocols include Practical Byzantine Fault-Tolerance (PBFT) castro2002practical , Efficient Byzantine Fault-Tolerance (EBFT) veronese2013efficient , UpRight clement2009upright , Prime amir2011prime , and Byzantine Fault-Tolerant State Machine Replication (BFT-SMaRt) sousa2013state ; sousa2018byzantine .
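The fault-tolerance thresholds of the two classes can be expressed as a small Go sketch; the formulas are the standard crash and Byzantine bounds, while the function names are ours:

```go
package main

import "fmt"

// maxCrashFaults returns the number of crash (fail-stop) faults an
// n-node cluster tolerates under a non-Byzantine protocol such as
// Raft: f = floor((n-1)/2).
func maxCrashFaults(n int) int { return (n - 1) / 2 }

// maxByzantineFaults returns the number of Byzantine faults an n-node
// cluster tolerates under protocols such as PBFT or BFT-SMaRt:
// f = floor((n-1)/3).
func maxByzantineFaults(n int) int { return (n - 1) / 3 }

func main() {
	for _, n := range []int{3, 4, 5, 7} {
		fmt.Printf("n=%d crash=%d byzantine=%d\n",
			n, maxCrashFaults(n), maxByzantineFaults(n))
	}
}
```

For example, a 5-node cluster tolerates 2 crash faults but only 1 Byzantine fault, which is why Byzantine tolerance requires more replicas for the same resilience.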

4.3 Built-in Fault Tolerance in Kubernetes: Raft Protocol

Raft is the replication protocol built into Kubernetes ongaro2014search ; raftsite . Basically, it ensures that the replicas maintain identical state machines, while tolerating only crash faults. It is based on passive replication, where a node may be a leader, a follower or a candidate, as illustrated in Fig. 2:

  • Leader: In a cluster, a single active node directs the communication, by receiving requests, processing them, forwarding state machine changes to other nodes, and responding to clients.

  • Follower: When a leader is active, all other nodes are set as followers. They wait for the changes sent by the leader to update their state machines.

  • Candidate: When the leader breaks down, the followers become candidates and trigger votes to elect a new leader.

Figure 2: Raft Protocol’s Election Process

The mandate of a leader lasts from its election until its breakdown. In order to organize elections, Raft assigns an index to each mandate; these indexes are called terms. Any leader or candidate node includes the term index in its messages, whereas a follower waits for a random time, typically between 150 and 300 ms, before transiting to candidate. An active leader periodically sends heartbeat messages (empty AppendEntries) to all nodes in the cluster. Any node receiving this message resets its wait timer to a random value. Otherwise, at the expiration of its wait timer, the follower changes status to candidate and triggers a new election. The candidate proceeds as follows: 1) it increments its current term number, 2) votes for itself, and 3) sends vote requests (RequestVote) to all other nodes. The latter vote for any request containing a term index greater than theirs, update their term index and return to the follower status. Once a candidate receives the votes of the majority, defined as ⌊n/2⌋+1 votes, it becomes the new leader. However, if no candidate obtains the majority of votes, e.g. in a tie situation, no leader is elected in this term, and a new term will be triggered by the node whose timer expires first. The requirement for a majority of votes ensures that a single leader is elected in a term (safety condition), while the randomized wait time of followers guarantees that a leader will eventually be elected (liveness condition).

To run in the Kubernetes environment, some changes have been made to the Raft protocol:

  1. Unlike conventional Raft, where requests to followers are redirected to the leader, Raft is converted to active replication to conform to the load balancing property of Kubernetes raftsite .

  2. Raft is re-implemented in Golang, the same programming language used to develop Kubernetes and the Docker engine.

Besides Raft, another non-Byzantine replication protocol was proposed for Kubernetes netto2017state . This protocol is similar to Raft, but requires sharing the master node's memory with all containers instantiated on work nodes, in order to store their state machines. This approach achieves shorter consensus times than Raft, but aggravates the containers' isolation issue.

Despite their simplicity, these protocols are particularly powerless against Byzantine behaviors lim2014scalable . Indeed, a failing node may not stop, and may instead continually adopt a Byzantine (random) behavior, e.g. not following the protocol, corrupting its local state, or producing incorrect or inconsistent outputs schneider1990implementing . To mitigate this problem, we propose in the next section a novel Kubernetes platform, where both non-Byzantine and Byzantine faults are tolerated, while ensuring service continuity.

5 KmMR: A K8s multi-Master Robust Platform

Kubernetes deploys and orchestrates groups of containers with a single master node. The latter replicates the containers on different work nodes to provide service continuity. However, if the master node fails, containers are no longer available and all management data is lost. To avoid such a case, the deployment of multi-master clusters, where several master nodes cooperate, becomes necessary. But duplicating master nodes alone does not provide complete fault tolerance perronne2016vers . In fact, it must be associated with a replication protocol to ensure consistency between the master nodes' states. Multi-master systems are important for critical applications such as telecommunication and energy services, where continuous availability is required 24 hours a day, 7 days a week.

In this section, we propose a Kubernetes multi-master platform resistant to all kinds of faults, in order to guarantee service continuity. We consider a Kubernetes cluster consisting of replicated K8s master nodes and several work nodes. Work nodes process clients' service requests and send their reports (requests) to the master nodes, as shown in Fig. 3. We assume that communications between nodes may experience significant delays, thus causing communication failures.

Figure 3: System Model
Figure 4: Consensus Process by BFT-SMaRt

5.1 BFT-SMaRt: Replication Protocol for KmMR

Among the known Byzantine protocols, only PBFT castro2002practical , UpRight clement2009upright and BFT-SMaRt sousa2013state provide implemented Byzantine fault-tolerant replication systems. The choice of BFT-SMaRt is motivated by the following:

  • BFT-SMaRt is very well suited for modern hardware, e.g. multi-core systems, unlike other protocols such as PBFT.

  • BFT-SMaRt outperforms other protocols, e.g. UpRight, in terms of consensus time, defined as the required time to process a client’s request.

  • BFT-SMaRt guarantees a high accuracy in replicated data, when a Byzantine faulty behavior is exhibited within the system.

  • Unlike other Byzantine protocols, BFT-SMaRt supports reconfiguration of the replica sets, i.e. addition and removal of nodes bessani2013efficiency .

In BFT-SMaRt, a consensus is established according to the following steps, as illustrated in Fig. 4. First, a work node broadcasts its request to the master nodes, which triggers the execution of the consensus protocol. Each consensus instance begins with the leader master node proposing a batch of requests to the other nodes in a PROPOSE message. Master nodes validate the authenticity of the PROPOSE message and its content. If valid, they register the proposed batch and broadcast WRITE messages, containing cryptographic hashes of the proposed batch, to all other nodes. If a master node receives a quorum of WRITE messages with the same hash, it sends an ACCEPT message to all other nodes. This message contains its decision batch for the consensus instance. If the leader master node is not correct, a new election must be triggered, and all nodes need to converge to the same execution by consensus. This procedure is described in detail in sousa2012byzantine .
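The WRITE-counting step above can be sketched in Go. The quorum size, more than (n+f)/2 matching messages, follows the BFT-SMaRt design described in sousa2012byzantine ; the types here are illustrative, not the library's actual API:

```go
package main

import "fmt"

// byzantineQuorum returns the smallest integer strictly greater than
// (n+f)/2, the Byzantine quorum used by BFT-SMaRt-style protocols.
func byzantineQuorum(n, f int) int { return (n+f)/2 + 1 }

// consensusInstance tracks WRITE messages for one consensus instance.
type consensusInstance struct {
	n, f   int
	writes map[string]int // hash of proposed batch -> WRITE count
}

// onWrite records a WRITE message carrying the hash of the proposed
// batch; it returns true once a quorum of matching hashes has arrived,
// i.e. when the node may broadcast its ACCEPT message.
func (c *consensusInstance) onWrite(hash string) bool {
	if c.writes == nil {
		c.writes = map[string]int{}
	}
	c.writes[hash]++
	return c.writes[hash] >= byzantineQuorum(c.n, c.f)
}

func main() {
	c := consensusInstance{n: 4, f: 1} // 4 masters tolerating 1 Byzantine fault
	fmt.Println(c.onWrite("h1"))       // false: 1 of the 3 needed
	fmt.Println(c.onWrite("h1"))       // false: 2 of 3
	fmt.Println(c.onWrite("h1"))       // true: quorum reached, send ACCEPT
}
```

With n = 3f + 1 replicas the quorum equals 2f + 1, which guarantees that any two quorums intersect in at least one correct node, preventing two different batches from both being accepted.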

Figure 5: Integration Methodology of BFT-SMaRt into Kubernetes

5.2 Proposed Integration Methodology of BFT-SMaRt into K8s

The BFT-SMaRt protocol is implemented in Java, an object-oriented programming language, while Kubernetes and the Docker engine are written in Golang golang . In order to integrate BFT-SMaRt into Kubernetes, two options can be considered:

  1. Rewrite all BFT-SMaRt library’s source code in Golang.

  2. Wrap the BFT-SMaRt library in a Docker container.

Unlike Raft, whose source code is under 3000 lines and easily rewritten in Golang, the BFT-SMaRt source code is larger and more complex, with approximately 100 files and a total of 13500 lines of Java code. Consequently, the second option is more practical. This choice is supported by the advantages offered by Docker: containers start fast and their overhead is negligible felter2015updated ; joy2015performance . The proposed procedure to integrate BFT-SMaRt into Kubernetes is illustrated in Fig. 5. First, we retrieve the BFT-SMaRt library and all its dependencies from Github BFT-SMaRtlibrary . Then, we customize it by setting the parameters of the master nodes. Next, we create our Docker file Dockerfile, as detailed in Fig. 6. Afterwards, we execute the Dockerfile to produce the containerized BFT-SMaRt image. Finally, we instantiate the Docker image, with its configuration, in each K8s master node.
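Since the Dockerfile of Fig. 6 is not reproduced in this text, the sketch below gives a plausible minimal version. The base image, exposed ports and demo entry-point class are assumptions based on the public BFT-SMaRt repository, not the paper's exact file:

```dockerfile
# Hypothetical sketch; the paper's actual Dockerfile (Fig. 6) may differ.
FROM openjdk:8-jre-slim                # Java runtime for the BFT-SMaRt library
WORKDIR /bft-smart
COPY . .                               # library JARs plus config/ (hosts.config, system.config)
EXPOSE 11000 11001                     # default replica ports from hosts.config (assumption)
# Launch one replica; the numeric replica id is passed per master node.
ENTRYPOINT ["java", "-cp", "bin/*:lib/*", "bftsmart.demo.counter.CounterServer"]
CMD ["0"]
```

Each K8s master would then run this image with a distinct replica id, matching the per-master customization step described above.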

Figure 6: Dockerfile to Create the BFT-SMaRt Container

6 Experimental Evaluation

6.1 Simulation Settings

We implemented the KmMR solution in an OpenStack cloud environment provided by Ericsson Canada sefraoui2012openstack . The available resources are as follows: 50 GB of RAM and 20 virtual processors (VCPU), usable on a maximum of 10 machines.

The experiment is carried out on clusters composed of several Kubernetes master nodes (5 and 7), connected to each other via the OpenStack GigabitEthernet network and accessible from the Internet. Each node is a virtual machine equipped with the Ubuntu Server 18.04 LTS 64-bit OS, two virtual CPUs (VCPUs) clocked at 2.4 GHz, 4 GB of RAM and 20 GB of storage capacity. The Docker engine 18.05.0-ce is installed on the Kubernetes nodes for container instantiation. We deployed Kubernetes 1.11.0 to orchestrate the Docker containers. The Kubernetes master role has been enabled with kubeadm on all master nodes (multi-master configuration). The remaining machines act as work nodes and DDoS attackers. BFT-SMaRt has been containerized and integrated into the master nodes to provide coordination and consensus. Work nodes send their requests in closed loop, i.e. they wait for the response to a request before sending a new one, as defined in schroeder2006open .

In each cluster, we initialize the replication protocol on the master nodes. Then, two work nodes broadcast their requests. Upon request reception, master nodes exchange messages to build the consensus. To measure the performance of KmMR, we used the micro-benchmark where both request and response messages are empty castro2002practical .

In order to model Byzantine behavior, we generate DDoS attacks using the hping3 tool hping3 ; ops2016denial . The DDoS attacking machines simultaneously target a single master node. Each attacker successively and continuously sends requests of size 65495 bytes in open loop, i.e. without waiting for responses, through the command hping3 -f <IP address of the targeted master node> -d 65495.
We evaluate the performance of our solution and compare it to the Kubernetes multi-Master Conventional (KmMC) platform, where the non-Byzantine replication protocol Raft is used. Two scenarios are considered in our experiments:

  • Scenario 1: In this scenario, we consider a Kubernetes platform where, initially, the number of (crash) faults in the cluster is lower than the maximum number of faults tolerated by the replication protocol in place, i.e. fewer than ⌊(n−1)/2⌋ faults for KmMC (Raft) and fewer than ⌊(n−1)/3⌋ faults for KmMR (BFT-SMaRt). Then, we perform a DDoS attack on one master node and evaluate the consensus times of each platform.

  • Scenario 2: Unlike Scenario 1, the initial number of (crash) faults is set to the maximum that can be tolerated by the replication protocol in use. Then, DDoS attacks are performed on one master node. In this scenario, we evaluate the established consensus times as well as the resource consumption of the DDoS victim (CPU, RAM and available communication bandwidth). Resources are measured using the iperf3 command for bandwidth, and top for CPU and RAM [49, 50].
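The fault thresholds behind the two scenarios follow from standard replication bounds: Raft stays available as long as a majority of the n replicas is up, while BFT-SMaRt requires n ≥ 3f + 1 replicas to tolerate f Byzantine faults. A quick check for the two cluster sizes used in the experiments:

```python
def max_crash_faults_raft(n):
    # Raft needs a majority of n replicas alive: f = floor((n - 1) / 2)
    return (n - 1) // 2

def max_byz_faults_bft(n):
    # BFT-SMaRt needs n >= 3f + 1 replicas: f = floor((n - 1) / 3)
    return (n - 1) // 3

for n in (5, 7):
    print(n, max_crash_faults_raft(n), max_byz_faults_bft(n))
# n = 5: Raft tolerates 2 crash faults, BFT-SMaRt 1 Byzantine fault
# n = 7: Raft tolerates 3 crash faults, BFT-SMaRt 2 Byzantine faults
```

This gap (⌊(n−1)/2⌋ versus ⌊(n−1)/3⌋) is the price KmMR pays for tolerating arbitrary, not just crash, faults with the same cluster size.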

6.2 Results and Discussions

DDoS attack rate (Gbps) | KmMC, 5 masters | KmMC, 7 masters | KmMR, 5 masters | KmMR, 7 masters
0                       | 1701.91         | 2048.25         | 2746.45         | 3161.83
2                       | 2004.38         | 2132.93         | 2940.87         | 3179.45
4                       | 2178.72         | 2471.39         | 3362.42         | 4521.79
4.5                     | 2201.37         | 2501.73         | 3525.17         | 4632.38
5                       | 2287.65         | 2623.87         | 3612.93         | 4729.98
5.5                     | 2304.12         | 2702.99         | 3867.32         | 4970.93
6                       | 2331.12         | 2732.25         | 4053.53         | 4970.93
Table 2: Consensus Times (sec) versus DDoS Attack Rate (Gbps) (Scenario 1)

Considering Scenario 1, we present in Table 2 the consensus times achieved by KmMC and KmMR versus the DDoS attack rate, for clusters of 5 and 7 master nodes. For both platforms, consensus times increase slightly and proportionally to the DDoS attack rate. Indeed, even with the additional Byzantine fault, both platforms remain within their maximum number of tolerated faults (note that KmMC sees the DDoS attack as a crash event in this case). Hence, the platforms' operation continues without significant degradation. However, KmMC achieves shorter consensus times than KmMR. This is expected, since the Raft replication protocol is designed with fewer consensus message exchanges between master nodes than BFT-SMaRt. We therefore conclude that KmMC is the recommended platform when the risk of exceeding the maximum number of faults tolerated by Raft is very low.

Figure 7: Consensus time versus DDoS attack rate (Scenario 2, 5 master nodes)

For Scenario 2, we present in Fig. 7 the consensus time versus the DDoS attack rate, for a cluster of 5 master nodes. The results show that the consensus time increases with the DDoS attack rate. When the attack rate is below 4.25 Gbps, KmMC provides slightly better performance than KmMR. Indeed, in this case, the DDoS victim withstands the attack thanks to its sufficient resources. However, for an attack rate above 4.25 Gbps, KmMC's performance deteriorates rapidly and significantly. This is mainly due to the vulnerability of the Raft replication protocol to Byzantine faults. Indeed, the DDoS victim behaves improperly, e.g. by not responding to other nodes in a timely manner. From this moment on, Raft triggers changes in the cluster's leadership, since it is no longer able to reach consensus with its current leader. This considerably slows down consensus in the KmMC platform. Meanwhile, KmMR withstands all DDoS attacks and achieves, on average, a consensus time 1000 times shorter than KmMC.

Figure 8: Consensus time versus DDoS attack rate (Scenario 2, 7 master nodes)

Fig. 8 illustrates the consensus time versus the DDoS attack rate in the same environment as Fig. 7, but for a cluster of 7 master nodes. The same behavior is exhibited for 5 and 7 master nodes. However, with 5 master nodes, consensus is established faster thanks to the smaller number of exchanged messages. As the cluster size increases, KmMC becomes more susceptible to DDoS attacks. Indeed, the rapid degradation of KmMC's performance starts at an attack rate of 4.1 Gbps with 7 master nodes, compared to 4.25 Gbps with 5. In contrast, KmMR is able to establish consensus in a reasonable time, even at high attack rates.

Figure 9: CPU consumption rate versus DDoS attack rate (Scenario 2)
Figure 10: RAM consumption versus DDoS attack rate (Scenario 2)
Figure 11: Available bandwidth versus DDoS attack rate (Scenario 2)

Figs. 9-11 present the CPU, RAM and bandwidth consumption of the DDoS victim node for Scenario 2. When the DDoS attack rate is below 4.5 Gbps, KmMR uses as much or more resources than KmMC. This is expected, since establishing consensus in KmMR using BFT-SMaRt requires a larger number of message exchanges. However, for attack rates above 4.5 Gbps, KmMR and KmMC have almost the same level of resource utilization. Indeed, Raft starts to make changes in the cluster in order to regain stability, resulting in higher resource consumption than usual.
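The CPU and RAM figures reported by top can be extracted programmatically. The sketch below parses the summary header printed by procps top in batch mode (top -b -n 1); the sample numbers are illustrative placeholders, not measured values from the experiment:

```python
import re

# Example header lines in the format printed by procps `top` in batch
# mode; the figures are illustrative, not measurements from the testbed.
SAMPLE = """\
%Cpu(s): 12.5 us,  3.1 sy,  0.0 ni, 83.2 id,  0.9 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  4039764 total,   851236 free,  2214528 used,   974000 buff/cache
"""

def parse_top_summary(text):
    # CPU utilization = user + system time; RAM utilization = used / total.
    us = float(re.search(r"([\d.]+) us", text).group(1))
    sy = float(re.search(r"([\d.]+) sy", text).group(1))
    total = int(re.search(r"(\d+) total", text).group(1))
    used = int(re.search(r"(\d+) used", text).group(1))
    return {"cpu_pct": us + sy, "ram_pct": 100.0 * used / total}

stats = parse_top_summary(SAMPLE)
print(stats)
```

In the experiment, the same two figures are sampled on the DDoS victim at each attack rate; available bandwidth is measured separately with iperf3 [49, 50].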

7 Conclusion

With the increased importance of virtualization in cloud computing, Docker containerization is favored for its lightweight and efficient virtualization. This has led to the emergence of new architectures organizing cloud services in containers, ready to be instantiated in virtual and/or physical machines. Since the main objective is to guarantee service continuity, orchestrating these containers is challenging. Recently, Kubernetes has been adopted as the orchestration platform for Docker containers. Although efficient in managing containers, Kubernetes guarantees service continuity only in the presence of non-Byzantine (crash) faults occurring within the system. In fact, the current replication protocol within Kubernetes, Raft, cannot handle Byzantine faults. In this paper, we proposed a new orchestration platform capable of overcoming this limitation in Kubernetes. The KmMR platform, based on the Byzantine fault-tolerant replication protocol BFT-SMaRt, was presented. We detailed our approach to integrate the BFT-SMaRt library (written in Java) into Docker and Kubernetes (written in Golang). Then, we implemented a Kubernetes multi-master platform in an OpenStack-based cloud environment. The system was evaluated in two different scenarios, where the maximum number of tolerated faults is initially either reached or not, and on two orchestration platforms, KmMC and KmMR. The results show that the conventional approach KmMC is efficient and robust in a non-Byzantine and controlled environment, i.e. when the maximum number of tolerated faults is not exceeded. However, in a Byzantine and not fully controlled environment, KmMR guarantees the continuity of services, while KmMC collapses under severe Byzantine faults. In such an environment, KmMR's resource consumption remains stable compared to KmMC. In future work, we will design a hybrid and intelligent replication protocol, capable of acting adaptively as Raft or BFT-SMaRt according to the environment's behavior.


This work was partially funded by the NSERC-CRD program.



  • (1) D. Bernstein, Containers and Cloud: From LXC to Docker to Kubernetes, IEEE Cloud Computing 1 (3) (2014) 81–84.
  • (2) R. Peinl, F. Holzschuher, F. Pfitzer, Docker cluster management for the cloud-survey results and own solution, J. Grid Comput. 14 (2) (2016) 265–282.
  • (3) CloudNativeCon, CNCF: The Cloud Native Computing Foundation (2018).
  • (4) A. Sill, Emerging Standards and Organizational Patterns in Cloud Computing, IEEE Cloud Computing 2 (4) (2015) 72–76.
  • (5) B. Burns, B. Grant, D. Oppenheimer, E. Brewer, J. Wilkes, Borg, Omega, and Kubernetes, Queue 14 (1) (2016) 10.
  • (6) Google, Kubernetes (2018).
  • (7) C. Oliveira, L. C. Lung, H. Netto, L. Rech, Evaluating Raft in Docker on Kubernetes, in: Proc. Int. Conf. Syst. Science, Springer, 2016, pp. 123–130.
  • (8) D. Ongaro, J. K. Ousterhout, In Search of An Understandable Consensus Algorithm, in: Proc. USENIX Annual Technical Conf., 2014, pp. 305–319.
  • (9) A. Moga, T. Sivanthi, C. Franke, OS-Level Virtualization for Industrial Automation Systems: Are We There Yet?, in: Proc. 31st Annual ACM Symp. Applied Comput., ACM, 2016, pp. 1838–1843.
  • (10) W. Felter, A. Ferreira, R. Rajamony, J. Rubio, An Updated Performance Comparison of Virtual Machines and Linux Containers, in: Proc. Int. Symp. Perf. Analysis of Syst. and Soft. (ISPASS), IEEE, 2015, pp. 171–172.
  • (11) A. M. Joy, Performance Comparison Between Linux Containers and Virtual Machines, in: Int. Conf. Advances in Comput. Eng. and Appl., 2015, pp. 342–346.
  • (12) P. Sharma, L. Chaufournier, P. Shenoy, Y. Tay, Containers and Virtual Machines at Scale: A Comparative Study, in: Proc. 17th Int. Middleware Conf., ACM, 2016, p. 1.
  • (13) A. M. Joy, Performance Comparison Between Linux Containers and Virtual Machines, in: Proc. Int. Conf. Advances in Comput. Eng. and Appl. (ICACEA), IEEE, 2015, pp. 342–346.
  • (14) W. Li, A. Kanso, Comparing Containers versus Virtual Machines for Achieving High Availability, in: Proc. IEEE Int. Conf. Cloud Eng., 2015, pp. 353–358. doi:10.1109/IC2E.2015.79.
  • (15) A. Manu, J. K. Patel, S. Akhtar, V. Agrawal, K. B. S. Murthy, Docker Container Security via Heuristics-based Multilateral Security-conceptual and Pragmatic Study, in: Proc. Int. Conf. Circuit, Power and Comput. Tech. (ICCPCT), IEEE, 2016, pp. 1–14.
  • (16) CoreOS, Fleet Project (2014).
  • (17) Apache, Mesos Project (2014).
  • (18) A. Luzzardi, V. Victor, Swarm: A Docker-native Clustering System (2014).
  • (19) K8s, Kubernetes Source Code (2014).
  • (20) N. Kratzke, P.-C. Quint, Understanding Cloud-native Applications After 10 Years of Cloud Computing – A Systematic Mapping Study, J. Systems and Software 126 (2017) 1–16.
  • (21) N. Bila, P. Dettori, A. Kanso, Y. Watanabe, A. Youssef, Leveraging the Serverless Architecture for Securing Linux Containers, in: Proc. IEEE 37th Int. Conf. Dist. Comput. Syst. Wrkshps. (ICDCSW), IEEE, 2017, pp. 401–404.
  • (22) CoreOS, Coreos ETCD (2018).
  • (23) F. Cristian, Understanding fault-tolerant distributed systems, Commun. of the ACM 34 (2) (1991) 56–78.
  • (24) L. Lamport, R. Shostak, M. Pease, The Byzantine Generals Problem, ACM Trans. Programm. Languages and Syst. (TOPLAS) 4 (3) (1982) 382–401.
  • (25) P.-L. Aublin, Towards Efficient and Robust Fault-Tolerant Protocols (in French), Ph.D. thesis, Université de Grenoble (2014).
  • (26) F. B. Schneider, Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial, ACM Comput. Surveys (CSUR) 22 (4) (1990) 299–319.
  • (27) M. Pease, R. Shostak, L. Lamport, Reaching Agreement in the Presence of Faults, Journal of the ACM (JACM) 27 (2) (1980) 228–234.
  • (28) L. Lamport, et al., Paxos Made Simple, ACM Sigact News 32 (4) (2001) 18–25.
  • (29) R. Van Renesse, N. Schiper, F. B. Schneider, Vive la différence: Paxos vs. Viewstamped Replication vs. Zab, IEEE Trans. Dependable and Secure Comput. 12 (4) (2015) 472–484.
  • (30) G. Bracha, S. Toueg, Asynchronous Consensus and Broadcast Protocols, J. ACM (JACM) 32 (4) (1985) 824–840. doi:10.1109/EDCC.2012.32.
  • (31) M. Castro, B. Liskov, Practical Byzantine Fault Tolerance and Proactive Recovery, ACM Trans. Computer Syst. (TOCS) 20 (4) (2002) 398–461.
  • (32) G. S. Veronese, M. Correia, A. N. Bessani, L. C. Lung, P. Verissimo, Efficient Byzantine Fault-Tolerance, IEEE Trans. Computers 62 (1) (2013) 16–30.
  • (33) A. Clement, M. Kapritsos, S. Lee, Y. Wang, L. Alvisi, M. Dahlin, T. Riche, Upright Cluster Services, in: Proc. 22nd Symp. Operating Systems Principles (SIGOPS), ACM, 2009, pp. 277–290.
  • (34) Y. Amir, B. Coan, J. Kirsch, J. Lane, Prime: Byzantine Replication Under Attack, IEEE Trans. Dependable and Secure Comput. 8 (4) (2011) 564–577.
  • (35) A. Bessani, J. Sousa, E. E. P. Alchieri, State Machine Replication for the Masses with BFT-SMART, in: Proc. 44th Annual IEEE/IFIP Int. Conf. Dependable Syst. and Net., 2014, pp. 355–362.
  • (36) J. Sousa, A. Bessani, M. Vukolic, A Byzantine Fault-Tolerant Ordering Service For The Hyperledger Fabric blockchain platform, in: Proc. 48th Annual Int. Conf. Dependable Syst. and Net. (DSN), IEEE, 2018, pp. 51–58.
  • (37) X. L. Blake Mizerany, Y. Qin, The Raft Consensus Algorithm (2018).
  • (38) H. V. Netto, L. C. Lung, M. Correia, A. F. Luiz, L. M. S. de Souza, State Machine Replication in Containers Managed by Kubernetes, J. of Syst. Arch. 73 (2017) 53–59.
  • (39) J. Lim, T. Suh, J. Gil, H. Yu, Scalable and Leaderless Byzantine Consensus in Cloud Computing Environments, Inf. Syst. Frontiers 16 (1) (2014) 19–34.
  • (40) L. Perronne, Towards Efficient and Robust Fault-Tolerant Protocols (in French), Ph.D. thesis, Université Grenoble Alpes (2016).
  • (41) A. N. Bessani, M. Santos, J. Felix, N. F. Neves, M. Correia, On the Efficiency of Durable State Machine Replication, in: Proc. USENIX Annual Tech. Conf., 2013, pp. 169–180.
  • (42) J. Sousa, A. Bessani, From Byzantine Consensus To BFT State Machine Replication: A Latency-Optimal Transformation, in: Proc. 9th European Dependable Comput. Conf. (EDCC), IEEE, 2012, pp. 37–48.
  • (43) Golang, The Go Programming Language (2018).
  • (44) GitHub, BFT-Smart Library (2018).
  • (45) O. Sefraoui, M. Aissaoui, M. Eleuldj, OpenStack: Toward An Open-source Solution for Cloud Computing, Int. J. Computer Appl. 55 (3) (2012) 38–42.
  • (46) B. Schroeder, et al., Open versus Closed: A Cautionary Tale, in: Proc. NSDI, USENIX Association, 2006, pp. 239–252.
  • (47) Sanfilippo, Hping3 (2014).
  • (48) B. Ops, Denial-of-Service Attack – DoS Using Hping3 with Spoofed IP in Kali Linux, BlackMORE Ops 17.
  • (49) iPerf, IPerf (2018).
  • (50) S. Biswas, A Guide to The Linux Top Command (2018).