Recoverable Consensus in Shared Memory

by Wojciech Golab, et al.

Herlihy's consensus hierarchy is one of the most widely cited results in distributed computing theory. It ranks the power of various synchronization primitives for solving consensus in a model where asynchronous processes communicate through shared memory and fail by halting. This paper revisits the consensus hierarchy in a model with crash-recovery failures, where the specification of consensus, called recoverable consensus in this paper, is weakened by allowing non-terminating executions when a process fails infinitely often. Two variations of this model are considered: independent failures, and simultaneous (i.e., system-wide) failures. Universal primitives such as Compare-And-Swap solve consensus easily in both models, and so the contributions of the paper focus on lower levels of the hierarchy. We make three contributions in that regard: (i) We prove that any primitive at level two of Herlihy's hierarchy remains at level two if simultaneous crash-recovery failures are introduced. This is accomplished by transforming (one instance of) any 2-process conventional consensus algorithm to a 2-process recoverable consensus algorithm. (ii) For any n > 1 and f > 0, we show how to use f+1 instances of any conventional n-process consensus algorithm and Θ(f + n) read/write registers to solve n-process recoverable consensus when crash-recovery failures are independent, assuming that every execution contains at most f such failures. (iii) Lastly, we prove for any f > 0 that any 2-process recoverable consensus algorithm that uses TAS and read/write registers requires at least f+1 TAS objects, assuming that crash-recovery failures are independent and every execution contains at most f such failures per process (or at most 2f failures total).



1 Introduction

Herlihy’s consensus hierarchy [waitfree] ranks the power of various synchronization primitives for solving consensus – a problem where processes agree on a decision chosen from a set of proposed values. The position of a primitive in the hierarchy is defined by its consensus number, which is the largest number n such that the primitive, used in conjunction with read/write registers, solves n-process consensus in the asynchronous shared memory model. Consensus numbers answer the question of implementability in the following sense: if primitives P and Q have consensus numbers p and q, respectively, where p < q, then Q can be used to implement P in a wait-free manner for up to q processes, but P cannot be used to implement Q in a wait-free manner for more than p processes. Common synchronization primitives appear in Herlihy’s consensus hierarchy at three distinct levels: read/write registers have consensus number 1, so-called “interfering” primitives (e.g., Test-And-Set) as well as stacks and queues have consensus number 2, whereas Compare-And-Swap (CAS) has infinite consensus number.
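To make the level-two claim concrete, the following is a minimal sketch of the textbook 2-process consensus construction from one Test-And-Set object and two read/write registers, in the conventional (halting-failure) model. It is an illustration rather than an algorithm taken from this paper: the class names, the `decide` method, and the use of a lock to simulate hardware atomicity are all assumptions made for the example.

```python
import threading


class TestAndSet:
    """Simulated TAS bit. A lock stands in for hardware atomicity (assumption)."""

    def __init__(self):
        self._bit = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        # Atomically set the bit to 1 and return its previous value.
        with self._lock:
            old = self._bit
            self._bit = 1
            return old


class TwoProcessConsensus:
    """Conventional 2-process consensus from one TAS object and two registers."""

    def __init__(self):
        self.proposals = [None, None]  # one read/write register per process
        self.tas = TestAndSet()

    def decide(self, pid, value):
        # Announce the proposal before competing, so the loser can read it.
        self.proposals[pid] = value
        if self.tas.test_and_set() == 0:
            return value                   # winner decides its own proposal
        return self.proposals[1 - pid]     # loser adopts the winner's proposal


def run_once():
    """Run both processes concurrently and return their decisions."""
    consensus = TwoProcessConsensus()
    results = [None, None]

    def worker(pid, value):
        results[pid] = consensus.decide(pid, value)

    threads = [
        threading.Thread(target=worker, args=(pid, value))
        for pid, value in enumerate(["a", "b"])
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The loser can safely read the other register because the winner wrote its proposal before its own TAS operation, which in turn preceded the loser's TAS operation.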

The forthcoming adoption of non-volatile main memory (NVRAM) has revived interest in models of computation where processes may fail by crashing, and then recover. Synchronization remains challenging in such models because the response of a shared memory operation is returned to a process using a volatile CPU register, which is part of the local state of the process and is lost if a failure occurs before the response can be saved to non-volatile memory. This observation calls into question the power of Read-Modify-Write (RMW) primitives for solving various synchronization tasks in the presence of crash-recovery failures, and exposes a new perspective for re-examining Herlihy’s consensus hierarchy.
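The lost-response problem can be illustrated with a small simulation. In this sketch, which is an assumption-laden toy rather than anything from the paper, a process applies Test-And-Set, crashes before it can persist the response, and then retries after recovery. The names `TasBit`, `compete`, `retry_after_crash`, and the dictionary `nvm` standing in for non-volatile memory are hypothetical.

```python
class Crash(Exception):
    """Models a crash-recovery failure of the calling process."""


class TasBit:
    """Sequential stand-in for a Test-And-Set object held in shared memory."""

    def __init__(self):
        self.bit = 0

    def test_and_set(self):
        old, self.bit = self.bit, 1
        return old


def compete(tas, nvm, crash_after_tas=False):
    # The TAS response lives only in a local variable (a volatile register).
    won = tas.test_and_set() == 0
    if crash_after_tas:
        raise Crash()        # the value of 'won' is lost with the local state
    nvm["won"] = won         # persisting the response is a separate step
    return won


def retry_after_crash(other_won_first):
    """Return what a recovering process observes: (TAS response, saved state)."""
    tas, nvm = TasBit(), {}
    if other_won_first:
        tas.test_and_set()   # the other process wins the TAS object first
    try:
        compete(tas, nvm, crash_after_tas=True)
    except Crash:
        pass                 # process restarts with empty local state
    return tas.test_and_set(), dict(nvm)
```

In both scenarios the recovering process sees the same thing: the TAS object returns 1 and nothing was saved, so it cannot tell whether it was the winner or the loser before the crash.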

Building on the results of Berryhill, Golab, and Tripunitara [berry], who proved that consensus remains sufficiently powerful to implement any shared object type for any number of processes (i.e., remains universal) in the crash-recovery failure model, this paper advances the state of the art by establishing three fundamental results. The contributions are phrased with respect to recoverable consensus – a natural weakening of consensus that accommodates executions with crash-recovery failures.

  1. We show how to use a single instance of any conventional 2-process consensus algorithm, and a constant number of read/write registers, to solve 2-process recoverable consensus when failures are simultaneous (i.e., system-wide). Thus, any primitive at level two of Herlihy’s hierarchy remains at level two when simultaneous crash-recovery failures are introduced.

  2. For any n > 1 and f > 0, we show how to use f+1 instances of any conventional n-process consensus algorithm and Θ(f + n) read/write registers to solve n-process recoverable consensus when failures are independent, assuming that every execution has at most f such failures.

  3. Lastly, we prove for any f > 0 that any 2-process recoverable consensus algorithm that uses TAS and read/write registers requires at least f+1 TAS objects, assuming that failures are independent and every execution contains at most f such failures per process (or at most 2f failures total).

Results (ii) and (iii) establish a tight bound of Θ(f) on the space complexity of any 2-process recoverable consensus algorithm based on TAS for executions with independent failures. This implies that 2-process recoverable consensus is not solvable in general using a finite number of TAS objects and read/write registers if failures are independent. In contrast, (i) shows that the problem is solvable using only a single TAS object and read/write registers when failures are simultaneous. To our knowledge, this is the first separation between the two variations of the crash-recovery failure model under consideration with respect to the computability of consensus.

2 Model

Our model is based closely on Herlihy’s [waitfree].

Processes and objects.  The system comprises n asynchronous processes, each with a distinct label, that communicate by accessing a finite number of typed shared objects. The type of an object is described by a set of states, a distinguished initial state, a set of operations, and a state transition function that dictates the response of a given operation applied in a given state. Both processes and objects are deterministic. Processes and objects can be modeled formally using I/O automata [lynch], but in this paper we describe their behavior less formally using pseudo-code.

System and process states.  The state of the system (state for short) comprises the local state of each process (i.e., its program counter and local variables) as well as the state of each shared object (i.e., its current value). There is a well-defined initial state of the system in which the program counter of each process points to the beginning of its algorithm, its local variables hold their initial values, and each shared object is in the initial state prescribed by its type. The state of the system changes in response to processes taking steps, each of which entails bounded local computation followed by an atomic operation on one shared object (ordinary step), or a crash-recovery failure (crash step). A crash step leaves the state of all shared objects unchanged, and resets the local state of one or more processes back to the initial state. A process may recover after suffering a crash failure, in which case it resumes execution of its algorithm from the beginning. (A process that recovers from a failure does not know the value of its own program counter immediately prior to failure, even if it uses a dedicated read/write register to record its own progress. This is because only one object or memory location can be accessed in one atomic step.) Two variations of the crash failure are considered: an independent failure affects a single process, whereas a simultaneous failure affects all processes system-wide. A crash step in the independent failure model identifies the failed process uniquely.
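The effect of a crash step on the system state can be sketched as follows. This is only an illustrative model under stated assumptions: the `LocalState` and `SystemState` classes, the `initial_local` argument holding each process's initial local state, and the function names are hypothetical and do not appear in the paper.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LocalState:
    pc: int = 0                                    # program counter
    variables: dict = field(default_factory=dict)  # local variables


@dataclass
class SystemState:
    local: list    # one LocalState per process
    shared: dict   # shared object name -> current value


def crash_step(state, failed, initial_local):
    """Reset the local state of each failed process; shared objects are untouched."""
    nxt = copy.deepcopy(state)
    for pid in failed:
        nxt.local[pid] = copy.deepcopy(initial_local[pid])
    return nxt


def independent_failure(state, pid, initial_local):
    # An independent failure affects exactly one, uniquely identified process.
    return crash_step(state, [pid], initial_local)


def simultaneous_failure(state, initial_local):
    # A simultaneous failure affects every process in the system.
    return crash_step(state, range(len(state.local)), initial_local)
```

Both functions return a new state and leave the argument unchanged, which keeps a sequence of states usable as a record of an execution.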

Executions.  An execution is a sequence of steps that is possible with respect to the algorithm prescribed for each process and the types of the shared objects, starting from the initial system state. The set of such executions is prefix-closed. A step s is enabled in state S if S is the state of the system at the end of some finite execution E, and the sequence obtained by appending s to E (denoted E ∘ s) is also an execution. A crash step is enabled in any state.

Consensus.  The algorithm executed by each process is a procedure that takes a proposal value as input, and returns a decision. The proposal value for each process is fixed for the duration of an execution, and is known to that process in the initial system state. Different proposals map to different initial states. In the classic model without failures (or, equivalently, with halting failures), the correctness properties of the algorithm are captured by the specification of the widely-studied consensus problem:

  1. Agreement: distinct processes never output different decisions.

  2. Validity: each decision returned is the proposal value of some process.

  3. Wait-freedom: each process returns a decision after a finite number of its own steps.

In this paper, we introduce a weakening of this problem called recoverable consensus (or r-consensus for short) that accommodates recovery from crash failures, which means that each process may attempt to compute a decision multiple times, as well as the inherent loss of liveness that may occur if a process fails repeatedly. Specifically, the third correctness property is revised as follows:

  3. Recoverable wait-freedom: each time a process executes its algorithm from the beginning, it either returns a decision after a finite number of its own steps, or crashes.
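The two safety properties can be checked mechanically on a recorded execution. The sketch below assumes a hypothetical log format, a list of `(pid, value)` pairs with one entry per decision returned, so a process that recovers and decides again contributes several entries. As a further assumption, it requires all logged decisions to be equal, including repeated decisions by the same process. The liveness property is not checked, since it cannot be verified from a finite log.

```python
def check_agreement(decisions):
    """True if every decision in the log (list of (pid, value)) is the same value."""
    return len({value for _, value in decisions}) <= 1


def check_validity(decisions, proposals):
    """True if every decision equals the proposal of some process.

    'proposals' maps each process ID to its fixed proposal value.
    """
    proposed = set(proposals.values())
    return all(value in proposed for _, value in decisions)
```

For example, with proposals `{0: "a", 1: "b"}`, the log `[(0, "a"), (1, "a"), (0, "a")]` satisfies both properties, where the last entry represents process 0 deciding again after a recovery.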

Consensus Numbers.  For any shared object type T, the consensus number of T with respect to a given failure model is the maximum number n such that a finite number of objects of type T and a finite number of additional read/write registers can be used to solve consensus (or recoverable consensus in our model) for n processes, but not for n+1 processes. If no such n exists then the consensus number is infinite, and T is called universal.

Valency.  A state S of the system at the end of a finite execution E is called v-potent if there is some execution E′ that extends E and in which some process decides v. State S is called v-valent if it is v-potent but not v′-potent for any v′ ≠ v. State S is called univalent if it is v-valent for some v. If n = 2 (i.e., the model has only two processes), then state S is called bivalent if it is not univalent.
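The valency definitions can be explored on a small, explicitly enumerated state graph. The following sketch is a toy under stated assumptions, not a model of any real algorithm: states are plain labels, `successors` maps a state to the states reachable in one step, and `decisions` maps a state to the set of values decided on reaching it. A value is treated as potent for a state if some reachable state decides it.

```python
def potent_values(state, successors, decisions):
    """Return every value v such that 'state' is v-potent in the toy graph."""
    seen, stack, values = set(), [state], set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        values |= decisions.get(s, set())     # values decided at this state
        stack.extend(successors.get(s, ()))   # explore all extensions
    return values


def classify(state, successors, decisions):
    """Label a state 'v-valent' if exactly one value is potent, else 'bivalent'."""
    values = potent_values(state, successors, decisions)
    if len(values) == 1:
        (v,) = values
        return f"{v}-valent"
    return "bivalent"


# Hypothetical toy graph: from S0 the execution can still go either way.
SUCCESSORS = {"S0": ["S1", "S2"], "S1": ["D0"], "S2": ["D1"]}
DECISIONS = {"D0": {0}, "D1": {1}}
```

Here `classify("S0", SUCCESSORS, DECISIONS)` yields `"bivalent"`, while `S1` and `S2` are `"0-valent"` and `"1-valent"` respectively, since each can reach only one decision.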

3 Solving Recoverable Consensus Under Simultaneous Failures

Shared variables:

  • : array of proposal values, init

  • : conventional 2-process consensus

  • : decision, init

Private variables:

  • other: process ID

  • : decided value