1 Wait-free Computing Model and the Consensus Hierarchy
Crash-prone asynchronous read/write-based systems
This paper considers the classical distributed computing model called the read/write wait-free model. It is composed of a set of $n$ sequential processes denoted $p_1$, …, $p_n$, which communicate through atomic read/write registers [8, 9, 11, 14].
Each process is asynchronous, which means that it proceeds at its own speed, which can be arbitrary and always remains unknown to the other processes, and executes its local algorithm until it possibly crashes, where a crash is a premature halt. Any number of processes may crash in a run, and after crashing a process does not recover. A process that crashes in a run is said to be faulty. Otherwise, it is correct or non-faulty. Let us notice that, due to process crashes and asynchrony, no process can know whether another process has crashed or is only very slow.
The notion of a universal object with respect to fault-tolerance was introduced by M. Herlihy. An object type $T$ is universal if it is possible to implement any object (defined by a sequential specification) in the read/write wait-free model enriched with any number of objects of type $T$. An algorithm providing such an implementation is called a universal construction. Herlihy showed that consensus objects are universal. These objects allow the processes to propose values and agree on one of them. More precisely, such an object provides the processes with a single operation, denoted propose(), that a process can invoke only once, and which returns a value to it. When $p_i$ invokes propose($v$), we say that it “proposes the value $v$”, and if $w$ is the returned value we say that it “decides $w$”. The consensus object is defined by the following three properties:
Validity. If a process decides a value, this value was proposed by a process.
Agreement. No two processes decide different values.
Termination. If a correct process invokes propose(), it decides a value.
Termination states that if a correct process invokes propose(), it decides a value whatever the behavior of the other processes (wait-freedom progress condition). Validity connects the output to the inputs, while Agreement states that the processes cannot decide differently.
A sequence of consensus objects is used in the following way in a universal construction. According to its current view of the operations invoked on, and not yet applied to, the object $O$ of type $T$ that is built, each process proposes to the next consensus instance a sequence of operations to be applied to $O$, and the winning sequence is actually applied. A helping mechanism is used to ensure that all the operations invoked on $O$ by any correct process are eventually applied to $O$.
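The Validity and Agreement properties can be checked mechanically on the outcome of a finished run. The following is a minimal sketch under stated assumptions: the function name `check_consensus_run` and the dictionary-based encoding of a run are hypothetical, introduced here for illustration only. Termination is not checked, since it cannot be observed on a finite trace.

```python
def check_consensus_run(proposed, decided):
    """Check Validity and Agreement on the outcome of one finished run.

    proposed: dict mapping a process id to the value it proposed
    decided:  dict mapping a process id to the value it decided
              (a process that crashed before deciding is simply absent)
    Returns a pair (validity_holds, agreement_holds).
    """
    # Validity: every decided value was proposed by some process.
    validity = all(v in proposed.values() for v in decided.values())
    # Agreement: no two processes decide different values.
    agreement = len(set(decided.values())) <= 1
    return validity, agreement


# Three processes propose; process 3 crashes before deciding.
good_run = check_consensus_run({1: "a", 2: "b", 3: "c"}, {1: "b", 2: "b"})
# Two processes decide differently: Agreement is violated.
bad_run = check_consensus_run({1: "a", 2: "b"}, {1: "a", 2: "b"})
```

Here `good_run` is `(True, True)` and `bad_run` is `(True, False)`.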
Consensus numbers and consensus hierarchy
The notion of a consensus number associated with an object type $T$ (denoted CN($T$) in the following) was introduced by Herlihy. It is the greatest integer $n$ such that consensus can be implemented in a system of $n$ processes with atomic read/write registers and objects of type $T$. If there is no such finite $n$, the consensus number of $T$ is $+\infty$. Hence, a type $T$ such that CN($T$) $\geq n$ is universal in a system of $n$ (or fewer) processes.
It appears that the consensus numbers define an infinite hierarchy (Herlihy’s hierarchy) in which atomic read/write registers have consensus number $1$, object types such as Test&Set, Fetch&Add, and Swap have consensus number $2$, etc., up to object types such as Compare&Swap and Linked Load/Store Conditional (and a few others), which have consensus number $+\infty$. In between, read/write registers provided with $m$-assignment, with $m > 1$, have consensus number $2m-2$. (Such an assignment atomically updates $m$ read/write registers. It is sometimes written $X_1, \ldots, X_m \leftarrow v_1, \ldots, v_m$, where the $X_i$ are the registers and each $v_i$ is the value assigned to $X_i$.) Recent developments on synchronization objects and consensus numbers can be found in [2, 4, 10].
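As a compact summary of the levels named in this paragraph, the block below lists the corresponding consensus numbers. The values are the classical ones from Herlihy's hierarchy. The notation CN(·) is the one used in the text, while the object labels are informal.

```latex
\begin{align*}
\mathrm{CN}(\text{atomic read/write register}) &= 1\\
\mathrm{CN}(\text{Test\&Set}) = \mathrm{CN}(\text{Fetch\&Add}) = \mathrm{CN}(\text{Swap}) &= 2\\
\mathrm{CN}(m\text{-assignment}) &= 2m-2 \qquad (m > 1)\\
\mathrm{CN}(\text{Compare\&Swap}) &= +\infty
\end{align*}
```

For instance, $m = 2$ gives consensus number $2$ and $m = 3$ gives $4$. Since $2m-2$ is always even, $m$-assignment alone reaches only the even levels of the hierarchy.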
Content of the paper
This paper addresses the following question: Does there exist a simple object family, parameterized by an integer $k$, that covers the whole consensus hierarchy (i.e., whose object instantiated with number $k$ has exactly consensus number $k$)? The paper answers this question positively by presenting a simple object family, and shows that, for any $k \geq 1$, its $k$-parameterized instance has consensus number $k$. This object is a very simple and natural generalization of the most basic computing object, namely the atomic read/write register, extended to become a sliding window register of size $k$. This object family has two noteworthy properties. One is its simplicity. The other lies in the fact that (to our knowledge) it is the only generic object spanning all consensus numbers. This has several advantages, among which are its pedagogical dimension (easy to understand and teach to students), its universality dimension (no need to introduce a specific object at each level of the consensus hierarchy to capture it), and its definition itself (a simple and natural generalization of an atomic read/write register).
2 The Atomic $k$-Sliding Read/Write Register (RW($k$))
As previously indicated, a $k$-sliding read/write register (in short RW($k$)) is a natural generalization of an atomic read/write register, which corresponds to the case $k=1$. Let $R$ be such an object. It can be seen as a sequence of values, accessed by two atomic operations denoted $R$.write() and $R$.read(). “Atomic” means that these operations appear as if they have been executed in some sequential order, and this total order is such that, if operation $op_1$ terminates before operation $op_2$ starts, then $op_1$ appears before $op_2$ [9, 11, 14].
The invocation of $R$.write($v$) by a process adds the value $v$ at the end of the sequence $R$, while an invocation of $R$.read() returns the ordered sequence of the last $k$ written values (if only $x < k$ values have been written, the default value $\bot$ replaces each of the $k-x$ missing values).
Hence, an RW($k$) object is a sequence containing all the values that have been written (in their atomicity-defined writing order), and each of its read operations returns the $k$ values that have been written just before it, according to the atomicity order. As already indicated, it is easy to see that, for $k=1$, RW($k$) is a classical atomic read/write register. For $k=+\infty$, each read operation returns the whole sequence of values written so far. Let us notice that RW($k$) objects appear in some applications (e.g., the object that models the content of a screen in an instant messaging service where only the last $k$ received messages are displayed, or the screen describing plane departure times in airports). (An object close to RW($k$) objects was concurrently and independently introduced by Ellen, Gelashvili, Shavit, and Zhu to address complexity issues in the context of multiprocessor synchronization.)
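The behavior of the two operations can be sketched in a few lines of Python for a finite window size, called `k` here. This is an illustration under stated assumptions, not the paper's definition: the class name `SlidingRegister` is hypothetical, the default value is modeled by `None`, missing entries are padded on the left (oldest positions), and atomicity is simulated with a lock. A lock gives mutual exclusion but not wait-freedom, so the sketch shows only the sequential specification.

```python
import threading
from collections import deque

BOTTOM = None  # assumed stand-in for the default value of missing entries


class SlidingRegister:
    """Sketch of a sliding read/write register with a window of size k."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._window = deque(maxlen=k)  # retains only the last k written values
        self._lock = threading.Lock()   # makes each operation appear atomic

    def write(self, v):
        with self._lock:
            self._window.append(v)      # the oldest value slides out when full

    def read(self):
        with self._lock:
            values = list(self._window)
        # Pad on the left so that the result always has exactly k entries.
        return [BOTTOM] * (self.k - len(values)) + values


reg = SlidingRegister(3)
reg.write("a")
first_read = reg.read()    # one value written, two default entries
for value in ["b", "c", "d"]:
    reg.write(value)
second_read = reg.read()   # "a" has slid out of the window
```

With a window of size 1 the same class behaves as an ordinary read/write register: each read returns a one-element list holding the last written value.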
Ranking the objects of the RW family
Let RW($k$) $\geq$ RW($k'$) denote the fact that an RW($k'$) object can be built from an RW($k$) object. The following property (Property 1) follows directly from the length of the sequences returned by these objects: whenever $k \geq k'$, RW($k$) $\geq$ RW($k'$).
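The ranking can be illustrated by building a register with a smaller window on top of one with a larger window: a read keeps only the newest entries of the sequence returned by the underlying object. The sketch below makes its assumptions explicit: the names `SlidingRegister` and `NarrowedRegister` are hypothetical, the base register is sequential (no concurrency control), and missing entries are padded on the left with `None`.

```python
from collections import deque

BOTTOM = None  # assumed default value for entries not yet written


class SlidingRegister:
    """Sequential sketch of a sliding register with a window of size k."""

    def __init__(self, k):
        self.k = k
        self._window = deque(maxlen=k)

    def write(self, v):
        self._window.append(v)

    def read(self):
        values = list(self._window)
        return [BOTTOM] * (self.k - len(values)) + values


class NarrowedRegister:
    """Window of size k_small served by a register with a larger window."""

    def __init__(self, base, k_small):
        if not 1 <= k_small <= base.k:
            raise ValueError("k_small must be between 1 and base.k")
        self._base = base
        self.k = k_small

    def write(self, v):
        self._base.write(v)                 # exactly one base operation

    def read(self):
        return self._base.read()[-self.k:]  # keep the k_small newest entries


wide = SlidingRegister(4)
narrow = NarrowedRegister(wide, 2)
for value in ["a", "b", "c"]:
    narrow.write(value)
narrow_view = narrow.read()
wide_view = wide.read()
```

Each operation of the narrowed object is a single operation on the underlying object, so it inherits the atomicity of that object without extra synchronization.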
3 The Consensus Number of RW($k$) is $\geq k$
This section shows that the consensus number of an RW($k$) object is at least $k$. To this end, Algorithm 1 builds a consensus object for $k$ processes from an RW($k$) object $R$.
Proof Let us consider a read/write wait-free system of $k$ processes. The consensus Termination property follows from the Termination properties of the operations $R$.write() and $R$.read() of the underlying atomic RW($k$) object, and the fact that the algorithm contains neither loops nor wait statements.
As at most $k$ processes invoke the consensus operation propose(), the underlying object $R$ contains at most $k$ values. Moreover, the oldest of them is the value $v$ written by the first process that executed $R$.write() (line 1). It follows that the value extracted from its local sequence by any process is $v$, which proves the consensus Agreement property. The proof of the consensus Validity property follows from the same reasoning.
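The construction described by the proof (write the proposed value, read the window, return the oldest written value) can be sketched as follows. This is a reconstruction from the proof text, not a transcription of Algorithm 1. The names `SlidingRegister`, `Consensus`, and `run_consensus` are hypothetical, the default value is modeled by a private sentinel, and atomicity of the register is simulated with a lock, so the sketch illustrates Agreement and Validity rather than wait-freedom.

```python
import threading
from collections import deque

BOTTOM = object()  # assumed default value for missing window entries


class SlidingRegister:
    """Lock-based stand-in for an atomic sliding register of size k."""

    def __init__(self, k):
        self.k = k
        self._window = deque(maxlen=k)
        self._lock = threading.Lock()

    def write(self, v):
        with self._lock:
            self._window.append(v)

    def read(self):
        with self._lock:
            values = list(self._window)
        return [BOTTOM] * (self.k - len(values)) + values


class Consensus:
    """One-shot consensus for at most k processes, built on one register."""

    def __init__(self, k):
        self._reg = SlidingRegister(k)

    def propose(self, v):
        self._reg.write(v)         # publish the proposed value
        window = self._reg.read()  # the last k written values
        # At most k writes occur, so the first write is never evicted:
        # the oldest non-default entry is the same for every reader.
        return next(x for x in window if x is not BOTTOM)


def run_consensus(proposals):
    """Run one proposer thread per value and return the decided values."""
    obj = Consensus(len(proposals))
    decisions = [None] * len(proposals)

    def proposer(i):
        decisions[i] = obj.propose(proposals[i])

    threads = [threading.Thread(target=proposer, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


decisions = run_consensus(["a", "b", "c", "d"])
```

Which value wins depends on the thread schedule, but all entries of `decisions` are equal and belong to the proposed values. If more proposers than the window size were allowed, the first written value could slide out and the argument would no longer hold.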
4 The Consensus Number of RW($k$) is $< k+1$
This section shows that, for any finite value $k$, the consensus number of an RW($k$) object is smaller than $k+1$. The proof is a simple adaptation of impossibility proofs found in textbooks (such as [3, 13, 16, 17]), which all rest on the basic concepts (e.g., the notion of valence) and techniques introduced by Fischer, Lynch, and Paterson in the context of message-passing systems (and then used in the context of wait-free read/write systems).
(The definitions that follow are the classical ones.) Without loss of generality, the proof considers binary consensus, i.e., only the values $0$ and $1$ can be proposed by the processes (there are algorithms that implement multivalued consensus on top of binary consensus).
A configuration is a global state made up of the local states of each process and the state of every object shared by the processes. In our case, as RW($k$) $\geq$ RW($1$) (Property 1), we consider that the only objects shared by the processes are RW($k$) objects.
Assuming an algorithm $A$ implementing a consensus object, a configuration $C$ attained by an execution of $A$ is $v$-valent ($v \in \{0,1\}$) if only the value $v$ can be decided from $C$. Such configurations are said to be monovalent. Otherwise, they are said to be bivalent (the dice are not yet cast!). Let us observe that there is an initial configuration that is bivalent. (Assume $p_i$ proposes $0$, while $p_j$ proposes $1$. It follows from the consensus Validity property that, if all the processes except $p_i$ crash initially, only $0$ can be decided. Similarly, if all the processes except $p_j$ crash initially, only $1$ can be decided. It follows that the corresponding initial configuration is bivalent.) Moreover, let us notice that, due to its very definition, any configuration that follows a $v$-valent configuration is $v$-valent.
A schedule is a sequence of operations on shared objects issued by the processes. Let us observe that, given an initial configuration, any consensus algorithm must terminate (all correct processes must decide). Consequently all the schedules it can produce (whatever the failure and asynchrony pattern) must eventually attain a monovalent configuration.
$C$ being a configuration, let $p(C)$ denote the configuration attained from $C$ by executing $op_p$ (the next read or write operation on an RW($k$) object issued by $p$), and $S(C)$ be the configuration attained from $C$ by executing the schedule $S$.
A maximal bivalent schedule is a schedule that ends in a bivalent configuration such that the next operation issued by any process produces a monovalent configuration. Let us notice that, if there is an algorithm $A$ solving consensus, any of its executions has a maximal schedule (otherwise $A$ would have non-terminating executions).
Let $k \geq 1$. CN(RW($k$)) $< k+1$.
The proof can be seen as a straightforward generalization of the classical proof that atomic registers (i.e., RW($1$) registers) have consensus number $1$.
Proof As in that proof, starting with an algorithm $A$ assumed to implement consensus, and an initial bivalent configuration, the proof consists in building an execution of $A$ in which there is no maximal schedule. Consequently, all its configurations are bivalent, from which it follows that the schedule is infinite: $A$ does not satisfy the consensus Termination property.
Hence, let us consider a read/write wait-free system of $k+1$ processes, enriched with any number of RW($k$) objects. As $A$ is assumed to terminate, each of its executions generates a maximal schedule, i.e., produces a bivalent configuration $D$ after which there are no more bivalent configurations. The proof is a classical case analysis depending on whether the next operation issued by each process is a read or write operation, and whether they are on the same or different RW($k$) objects. Let $p$ and $q$ be two processes whose next operations to execute in $D$ are $op_p$ and $op_q$, producing the $0$-valent configuration $p(D)$ and the $1$-valent configuration $q(D)$, respectively.
Case 2: The next operations $op_p$ and $op_q$ issued by $p$ and $q$ are on the same RW($k$) object and one of them (e.g., $op_p$) is a read. In this case, there is a schedule $S_q$, starting from the $1$-valent configuration $q(D)$, in which all the processes except $p$ (which stops for an arbitrarily long period or crashes) issue operations and eventually decide. As $q(D)$ is $1$-valent, they decide $1$.
Let us now consider $q(p(D))$. This configuration differs from $q(D)$ only in the local state of $p$ (which read the RW($k$) object in the configuration $q(p(D))$, while it did not in $q(D)$). See an illustration on the right side of Figure 1. Let us apply the schedule $S_q$ to configuration $q(p(D))$. This is possible because no process (except $p$) can distinguish $q(p(D))$ from $q(D)$. From the schedule $S_q$, it follows that $q$ decides $1$, contradicting the fact that the configuration $q(p(D))$, which follows the $0$-valent configuration $p(D)$, is $0$-valent.
Case 3: In $D$, the next operation by each process is a write, and these write operations are on the same RW($k$) object $R$. (The intuition that underlies this case is the following. While $p$ can be the first process that writes a value (say $v$) in $R$ (thereby producing a $0$-valent configuration) and then pauses for an arbitrarily long period, it is possible that the next process $q$ writes into $R$, and the other $k-1$ processes also write a value, whose net effect is the elimination of the value written by $p$ from the current window.) The reasoning is similar to Case 2. Let $p(D)$ be $0$-valent, and $q(D)$ be $1$-valent. Let $S'$ be a schedule, starting from $q(D)$, in which
(a) the first $k-1$ operations are the writes of $R$ invoked by the processes different from $p$ and $q$,
(b) all processes, except $p$, execute steps until each of them decides, and
(c) $p$ executes no operation.
Let us notice that such a schedule is possible because, in $D$, the next operation of each process is a write into $R$ (Case assumption, which implies item (a)), and the algorithm $A$ terminates (hence each correct process invokes the consensus operation and decides, which implies item (b)). (The important point here is the following: when these writes follow the write of $p$, no process different from $p$ can know the value written in $R$ by $p$.)
Let $S''$ denote the schedule composed of $op_q$ followed by $S'$. As $q(D)$ is $1$-valent, all processes involved in $S''$ (i.e., all processes except $p$) decide $1$.
Let us now consider the monovalent configuration $p(D)$, in which $p$ has applied $op_p$. Let us observe that no process, except $p$, can distinguish $p(D)$ from $D$ (they have the same local states in both). It follows that the schedule $S''$ (executed previously from $D$) can also be executed from $p(D)$. The first $k$ operations of this schedule are a write operation on $R$ issued by each process different from $p$; they remove the value written by $p$ from the window of $R$. Moreover, at the end of this schedule, all the processes (except $p$, which is not involved in $S''$) decide $1$. This contradicts the fact that $p(D)$ is $0$-valent, which concludes the proof.
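The key step of Case 3 is that, with one more process than the window size, the writes of the other processes push the first written value out of the window. The following sketch replays this on a plain sliding window. The function name `window_after`, the window size 3, and the process labels are hypothetical choices made for illustration; each written value is simply the label of its writer.

```python
from collections import deque


def window_after(k, writes):
    """Return the content of a size-k sliding window after the given writes."""
    window = deque(maxlen=k)
    for value in writes:
        window.append(value)
    return list(window)


k = 3
others = ["q", "r1", "r2"]                # the k processes different from p

with_p = window_after(k, ["p"] + others)  # p writes first, then the k others
without_p = window_after(k, others)       # p takes no step at all

# With only k writers in total, the first written value is still visible.
survives = window_after(k, ["p", "q", "r1"])
```

Both `with_p` and `without_p` equal `["q", "r1", "r2"]`, so a process other than `p` that reads the window afterwards cannot tell whether `p` wrote first. In contrast, `survives` still starts with `"p"`, which is why the same window size suffices for consensus among $k$ processes.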
This paper first introduced a new type of concurrent object, parameterized by an integer $k$, namely an atomic read/write sequence which can be accessed by a read and a write operation. Each write adds a new value at the end of the sequence, while a read returns the last $k$ written values. This generic object, called $k$-sliding read/write register, has an instance for each integer $k \geq 1$. The instance $k=1$ corresponds to the classical atomic read/write register, which is the most basic object of computing. Then, the paper has shown that the consensus number of such a $k$-parameterized object is $k$. Hence, this object family covers the whole spectrum of Herlihy’s consensus hierarchy, a noteworthy pedagogical property. From a technical point of view, this result may help better understand the synchronization power of concurrent objects. Moreover, it is sufficient to show that an object can be implemented with a $k$-sliding read/write register to prove its consensus number is at most $k$.
This work has been partially supported by the French ANR project DESCARTES devoted to layered and modular structures in distributed computing.
- Afek Y., Ellen F., and Gafni E., Deterministic objects: life beyond consensus. Proc. 35th ACM Int’l Symposium on Principles of Distributed Computing (PODC’16), ACM Press, pp. 97-106 (2016)
- Attiya H. and Welch J., Distributed computing: fundamentals, simulations and advanced topics (2nd Edition). Wiley-Interscience, 414 pages (2004)
- Censor-Hillel K., Petrank E., and Timnat S., Help! Proc. 34th ACM Int’l Symposium on Principles of Distributed Computing (PODC’15), ACM Press, pp. 241-250 (2015)
- Ellen F., Gelashvili R., Shavit N., and Zhu L., A complexity-based hierarchy for multiprocessor synchronization. Proc. 35th ACM Int’l Symposium on Principles of Distributed Computing (PODC’16), ACM Press, pp. 97-106 (2016)
- Fischer M.J., Lynch N.A., and Paterson M.S., Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382 (1985)
- Herlihy M.P., Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124-149 (1991)
- Herlihy M., Rajsbaum S., and Raynal M., Power and limits of distributed computing shared memory models. Theoretical Computer Science, 509:3-24 (2013)
- Herlihy M.P. and Wing J.M., Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463-492 (1990)
- Imbs D. and Raynal M., The multiplicative power of consensus numbers. Proc. 29th ACM Int’l Symposium on Principles of Distributed Computing (PODC’10), ACM Press, pp. 26-35 (2010)
- Lamport L., On interprocess communication, Part I: basic formalism. Distributed Computing, 1(2):77-85 (1986)
- Loui M. and Abu-Amara H., Memory requirements for agreement among unreliable asynchronous processes. Advances in Computing Research, 4:163-183, JAI Press (1987)
- Lynch N.A., Distributed algorithms. Morgan Kaufmann Pub., San Francisco (CA), 872 pages, ISBN 1-55860-384-4 (1996)
- Misra J., Axioms for memory access in asynchronous hardware systems. ACM Transactions on Programming Languages and Systems, 8(1):142-153 (1986)
- Perrin M., Spécification des objets partagés dans les systèmes répartis sans attente. PhD Thesis, 201 pages (2016)
- Raynal M., Concurrent programming: algorithms, principles and foundations. Springer, 515 pages, ISBN 978-3-642-32026-2 (2013)
- Taubenfeld G., Synchronization algorithms and concurrent programming. Pearson Prentice-Hall, 423 pages, ISBN 0-131-97259-6 (2006)