Seamless Paxos Coordinators

10/21/2017
by Gustavo M. D. Vieira, et al.

The Paxos algorithm requires a single correct coordinator process to operate. After a failure, replacing the coordinator may make the application implemented atop Paxos temporarily unavailable. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovering the newly elected coordinator's state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows this recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem: its adoption entails uninterrupted execution of the application. Our solution removes the restriction that coordinator replacements are something to be avoided, decoupling the execution of the application from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures, and it is of special importance to the autonomous operation of replicated applications that must adapt to varying network conditions and partial failures.
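As background on what "recovering the newly elected coordinator's state" involves, the sketch below shows the classic Multi-Paxos recovery rule, not the seamless technique of this paper. After a new coordinator collects phase 1 promises from a majority of acceptors, it must re-propose, for each consensus instance, the value accepted in the highest ballot, and fill any gaps with a no-op before the log is contiguous. The message layout and the names `Promise`, `recover` and `NOOP` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical placeholder command used to fill gaps in the instance log.
NOOP = "<no-op>"


@dataclass
class Promise:
    """Phase 1b reply from one acceptor (hypothetical message layout)."""
    acceptor: str
    # instance number -> (ballot in which a value was accepted, accepted value)
    accepted: Dict[int, Tuple[int, str]] = field(default_factory=dict)


def recover(promises: List[Promise], n_acceptors: int) -> Tuple[Dict[int, str], int]:
    """Return the values the new coordinator must re-propose per instance,
    plus the first instance number free for new commands.

    Assumes every promise answers a phase 1a message for a ballot higher
    than any ballot used before.
    """
    if len({p.acceptor for p in promises}) <= n_acceptors // 2:
        raise ValueError("need promises from a majority of acceptors")

    best: Dict[int, Tuple[int, str]] = {}
    for p in promises:
        for inst, (ballot, value) in p.accepted.items():
            # Paxos rule: keep the value accepted in the highest ballot.
            if inst not in best or ballot > best[inst][0]:
                best[inst] = (ballot, value)

    next_free = max(best, default=-1) + 1
    # Instances nobody reported get a no-op; re-proposing an already
    # chosen value is safe, because Paxos will choose the same value again.
    proposals = {inst: best[inst][1] if inst in best else NOOP
                 for inst in range(next_free)}
    return proposals, next_free


# Usage: two of three acceptors reply; instance 1 is a gap.
promises = [
    Promise("a", {0: (1, "x"), 2: (1, "w")}),
    Promise("b", {0: (2, "z")}),
]
proposals, next_free = recover(promises, n_acceptors=3)
print(proposals)  # {0: 'z', 1: '<no-op>', 2: 'w'}
print(next_free)  # 3
```

Every entry in `proposals` still has to go through phase 2 before it counts as decided, and this is the recovery cost the abstract refers to. The sketch deliberately stops there and does not attempt to reproduce how the paper overlaps this work with new consensus rounds.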
