The $k$-set agreement problem, introduced by Chaudhuri [Cha93], is a well-known synchronization task in which $n$ processes, each with an input value, are required to output at most $k$ different values, each of which is the input of some process. This is a generalization of the classical consensus problem, which is the case $k = 1$.
Two celebrated results in distributed computing are the impossibility of solving consensus deterministically when at most one process may crash [FLP85, LA87] and, more generally, the impossibility of solving $k$-set agreement deterministically when at most $k$ processes may crash [BG93, HS99, SZ00], using only registers. One way to bypass these impossibility results is to design protocols that are obstruction-free [HLM03]. Obstruction-freedom is a termination condition that requires a process to terminate given sufficiently many consecutive steps, i.e., from any configuration, if only one process takes steps, then it will eventually terminate. $x$-obstruction-freedom [Tau17] generalizes this condition: from any configuration, if only $x$ processes take steps, then they will all eventually terminate. It is known that $k$-set agreement can be solved using only registers in an $x$-obstruction-free way for $1 \le x \le k$ [YNG98]. Another way to overcome the impossibility of solving consensus is to use randomized wait-free protocols, where non-faulty processes are required to terminate with probability 1 [BenOr83].
It is possible to solve consensus for $n$ processes using $n$ registers in a randomized wait-free way [Abr88, AH90, SSW91, AC08] or in an obstruction-free way [GR05, Bow11, Zhu15, BRS15]. A lower bound of $\Omega(\sqrt{n})$ was proved by Ellen, Herlihy, and Shavit in [FHS98]. Recently, Gelashvili proved an $\Omega(n)$ lower bound for anonymous processes [Gel15]. Anonymous processes [FHS98, AGM02] have no identifiers and run the same code: all processes with the same input start in the same initial state and behave identically until they read different values. Then Zhu proved that any obstruction-free protocol solving consensus for $n$ processes requires at least $n-1$ registers [Zhu16]. All these lower bounds are actually for protocols that satisfy nondeterministic solo termination [FHS98], which includes both obstruction-free and randomized wait-free protocols.
In contrast, there are big gaps between the best known upper and lower bounds on the number of registers needed for -set agreement. The best obstruction-free protocols require registers [Zhu15, BRS15]. Bouzid, Raynal, and Sutra [BRS15] also give an -obstruction-free protocol that uses registers, improving on the space complexity of Delporte-Gallet, Fauconnier, Gafni, and Rajsbaum’s obstruction-free protocol [DFGR13]. All of these algorithms work for anonymous processes. Delporte-Gallet, Fauconnier, Kuznetsov, and Ruppert [DFKR15] proved that it is impossible to solve -set agreement using register. For anonymous processes, they also proved a lower bound of for -obstruction-free protocols, which still leaves a polynomial gap between the lower and upper bounds.
There are good reasons why proving lower bounds on the number of registers needed for $k$-set agreement may be difficult. At a high level, the impossibility results for $k$-set agreement consider some representation (for example, a simplicial complex) of all possible process states in all possible executions. Then, a combinatorial property (Sperner’s Lemma [Lef49book]) is used to prove that, roughly speaking, for any given number of steps, there exists an execution leading to a configuration in which $k+1$ outputs are still possible. Although there is ongoing work to develop a more general theory [GKM14, SHG16, GHKR16], we do not know enough about the topological representation of protocols that are $x$-obstruction-free or use fewer than $n$ multi-writer registers [HKR13book] to adapt topological arguments to prove space lower bounds for $k$-set agreement. There are similar problems adapting known proofs that do not explicitly use topology [AC13, AP16].
Approximate agreement [DLPSW86] is another important task for which no good space lower bound was known. In -approximate agreement, each process starts with an input in . The processes are required to output values in the interval that are all within of each other. Moreover, each output value must lie between the smallest input and the largest input. This problem can be deterministically solved in a wait-free manner, i.e. every non-faulty process eventually outputs a value. The only space lower bound for this problem, , was in a restricted setting with single-bit registers [Sch96]. The best upper bounds are [Sch96] and [attiya1994wait].
Our contribution. In this paper, we prove a lower bound of on the number of registers necessary for solving -process -obstruction-free -set agreement. As corollaries, we get a tight lower bound of registers for obstruction-free consensus and a tight lower bound of 2 for obstruction-free -set consensus. We also prove a space lower bound of registers for obstruction-free -approximate agreement, for sufficiently small . More generally, we prove space lower bounds for colorless tasks.
In addition, in Section 5, we prove that any lower bound on the number of registers needed for obstruction-free protocols to solve a task also applies to nondeterministic solo terminating protocols and, in particular, to randomized wait-free protocols solving that task. Hence, our space lower bounds for obstruction-free protocols also apply to such protocols. We also show that the same result may be obtained for a large class of objects.
Technical Overview. Using a novel simulation, we convert any obstruction-free protocol for $k$-set agreement that uses too few registers to a protocol that solves wait-free $k$-set agreement using only registers. Since solving wait-free $k$-set agreement is impossible using only registers, this reduction gives a lower bound on the number of registers needed to solve obstruction-free $k$-set agreement. This simulation technique, described in detail in Section 4, is the main technical contribution of the paper. It is the first technique that proves lower bounds on space complexity by applying results obtained by topological arguments. We also use this new technique to prove a lower bound on the number of registers needed for $\epsilon$-approximate agreement by a reduction from a step complexity lower bound for $\epsilon$-approximate agreement. Specifically, we convert any obstruction-free protocol for $\epsilon$-approximate agreement that uses few registers to a protocol that solves $\epsilon$-approximate agreement for two processes such that both processes take few steps.
The executions of the simulated processes in our simulation are reminiscent of the executions constructed by adversaries in covering arguments [BL93, AE14]. In those proofs, the adversary modifies an execution it has constructed by revising the past of some process, so that the old and new executions are indistinguishable to the other processes. It does so by inserting consecutive steps of the process starting from some carefully chosen configuration. In our simulation, a real process may revise the past of a simulated process, in a way that is indistinguishable to other simulated processes. This is possible because each simulated process is simulated by a single real process. In contrast, in the BG simulation [BGLR01], different steps of simulated processes can be performed by different real processes, so this would be much more difficult to do.
A crucial component of our simulation is the use of an augmented snapshot object, which we implement in a non-blocking manner from registers. Like a standard snapshot object, this object consists of a fixed number of components and supports a Scan operation, which returns the contents of all components. However, it generalizes the Update operation to a Block-Update operation, which can update multiple components of the object. In addition, a Block-Update returns some information, which is used by our simulation. The specification and our implementation of an augmented snapshot object appear in Section 3.
An asynchronous shared memory system consists of a set of processes and instances of base objects, which processes use to communicate. An object has a set of possible values and a set of operations, each of which takes some fixed number of inputs and returns a response. The processes take steps at arbitrary speeds and may fail, at any time, by crashing. Every step consists of an operation on some base object by some process plus local computation by that process to determine its next state from the response returned by the operation.
Configurations and Executions. A configuration of a system consists of the state of each process and the value of each object. An initial configuration is determined by the input value of each process. Each object has the same value in all initial configurations. A configuration $C$ is indistinguishable from a configuration $C'$ to a set of processes $\mathcal{P}$ in the system, if every process in $\mathcal{P}$ is in the same state in $C$ as it is in $C'$ and each object in the system has the same value in $C$ as in $C'$.
A step $e$ by a process $p$ is applicable at a configuration $C$ if $e$ can be the next step of process $p$ given its state in $C$. If $e$ is applicable at $C$, then we use $Ce$ to denote the configuration resulting from $p$ taking step $e$ at $C$. A sequence of steps $\alpha = e_1, e_2, \ldots$ is applicable at a configuration $C$ if $e_1$ is applicable at $C$ and, for each $i \geq 1$, $e_{i+1}$ (if it exists) is applicable at $C e_1 \cdots e_i$. In this case, $\alpha$ is called an execution from $C$. An execution $\alpha\beta$ denotes the execution $\alpha$ followed by the execution $\beta$. A configuration $C$ is reachable if there exists a finite execution from an initial configuration that results in $C$.
For a finite execution $\alpha$ from a configuration $C$, we use $C\alpha$ to denote the configuration reached after applying $\alpha$ to $C$. If $\alpha$ is empty, then $C\alpha = C$. We say an execution $\alpha$ is $\mathcal{P}$-only, for a set of processes $\mathcal{P}$, if all steps in $\alpha$ are by processes in $\mathcal{P}$. A $\{p\}$-only execution, for some process $p$, is also called a solo execution by $p$. Note, if configurations $C$ and $C'$ are indistinguishable to a set of processes $\mathcal{P}$, then any $\mathcal{P}$-only execution from $C$ is applicable at $C'$.
Implementations and Linearizability. An implementation of an object specifies, for each process and each operation of the object, a deterministic procedure describing how the process carries out the operation. The execution interval of an invocation of an operation in an execution is the subsequence of the execution that begins with its first step and ends with its last step. If an operation does not complete, for example, if the process that invoked it crashed before receiving a response, then its execution interval is infinite. An implementation of an object is linearizable if, for every execution, there is a point in each operation’s execution interval, called the linearization point of the operation, such that the operation can be said to have taken place atomically at that point [HW90]. This is equivalent to saying that the operations can be ordered (and all incomplete operations can be given responses) so that any operation which ends before another one begins is ordered earlier and the responses of the operations are consistent with the sequential specifications of the object [HW90].
Progress Conditions. An implementation of an object is wait-free if every process is able to complete its current operation on the object after taking sufficiently many steps, regardless of what other processes are doing. An implementation is non-blocking if infinitely many operations are completed in every infinite execution.
A protocol is $x$-obstruction-free if, from any configuration $C$ and for any subset $\mathcal{P}$ of at most $x$ processes, every process in $\mathcal{P}$ that takes sufficiently many steps after $C$ outputs a value, as long as only processes in $\mathcal{P}$ take steps after $C$. A protocol is obstruction-free if it is 1-obstruction-free and wait-free if it is $n$-obstruction-free.
Registers and Snapshot objects. A register is an object that supports two operations, read and write. A write($v$) operation writes value $v$ to the register, and a read operation returns the last value that was written to the register before the read. A multi-writer register allows all processes to write to it, while a single-writer register can only be written to by one fixed process. A process is said to be covering a register if its next step is a write to this register. A block write is a consecutive sequence of write operations to different registers performed by different processes.
An $m$-component multi-writer snapshot object [AADGMS93] stores a sequence of $m$ values and supports two operations, update and scan. An update($j$, $v$) operation sets component $j$ of the object to $v$. A scan operation returns the current view, consisting of the values of all components. A single-writer snapshot object shared by a set of processes has one component for each process and each process may only update its own component. A process is said to be covering component $j$ of a snapshot object if its next step is an update to the component $j$. A block update is a consecutive sequence of update operations to different components of a snapshot object performed by different processes.
It is easy to implement $m$ registers from an $m$-component multi-writer snapshot object, by replacing each write to the $j$’th register by an update to the $j$’th component and replacing a read of the $j$’th register by a scan and then discarding all but the value of the $j$’th component. An $m$-component snapshot object can also be implemented from $m$ registers [AADGMS93].
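The reduction from registers to a snapshot object described above can be sketched in a few lines of Python. This is only a sequential model, under the assumption that each method call is atomic; the class names `MultiWriterSnapshot` and `RegistersFromSnapshot` are hypothetical, and components are indexed from 0 rather than 1.

```python
from typing import Any, List, Tuple


class MultiWriterSnapshot:
    """Sequential model of an m-component multi-writer snapshot object."""

    def __init__(self, m: int, initial: Any = None) -> None:
        self._components: List[Any] = [initial] * m

    def update(self, j: int, v: Any) -> None:
        # Set component j to v (assumed to happen atomically).
        self._components[j] = v

    def scan(self) -> Tuple[Any, ...]:
        # Return the current view: the values of all components.
        return tuple(self._components)


class RegistersFromSnapshot:
    """m registers built on top of one m-component snapshot object."""

    def __init__(self, m: int) -> None:
        self._snapshot = MultiWriterSnapshot(m)

    def write(self, j: int, v: Any) -> None:
        # A write to register j becomes an update to component j.
        self._snapshot.update(j, v)

    def read(self, j: int) -> Any:
        # A read of register j is a scan that keeps only component j.
        return self._snapshot.scan()[j]
```

The converse direction, implementing a snapshot object from registers, is substantially harder and is not attempted in this sketch.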
Tasks and Protocols. A task specifies a set of allowable combinations of inputs to the processes and, for each such combination, what combinations of outputs can be returned by the processes. A protocol for a task provides a procedure for each process to compute its output, so that the task’s specifications are satisfied.
A task is colorless if the input or output of any process may be the input or output, respectively, of another process. Moreover, the specification of the task does not depend on the number of processes in the system. More precisely, a colorless task is a triple $(\mathcal{I}, \mathcal{O}, \Delta)$, where $\mathcal{I}$ contains sets of possible inputs, $\mathcal{O}$ contains sets of possible outputs, and, for each input set $I \in \mathcal{I}$, $\Delta(I)$ specifies a subset of $\mathcal{O}$, corresponding to valid outputs for $I$. Moreover, $\mathcal{I}$, $\mathcal{O}$, and $\Delta(I)$, for each $I \in \mathcal{I}$, are closed under taking subsets; i.e., if a set is present, then so are its non-empty subsets. The following are all examples of colorless tasks:
Consensus: Each process begins with an arbitrary value as its input and, if it does not crash, must output a value such that no two processes output different values and each output value is the input of some process.
$k$-Set agreement: Each process begins with an arbitrary value as its input and, if it does not crash, must output a value such that at most $k$ values are output and each output value is the input of some process.
$\epsilon$-Approximate Agreement: Each process begins with an arbitrary (real) value as its input and, if it does not crash, must output a value such that any two output values are at most $\epsilon$ apart. Moreover, the set of output values is in the interval $[x_{\min}, x_{\max}]$, where $x_{\min}$ and $x_{\max}$ are the smallest and largest input values, respectively.
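To make the three specifications concrete, the following Python sketch checks whether a collection of outputs is valid for a collection of inputs. The function names are hypothetical and the inputs are assumed to be non-empty. Because the tasks are colorless, only the sets of inputs and outputs matter, not which process holds which value.

```python
from typing import Hashable, Iterable


def valid_set_agreement(inputs: Iterable[Hashable],
                        outputs: Iterable[Hashable], k: int) -> bool:
    # At most k distinct outputs, each of which is some process's input.
    ins, outs = set(inputs), set(outputs)
    return len(outs) <= k and outs <= ins


def valid_consensus(inputs: Iterable[Hashable],
                    outputs: Iterable[Hashable]) -> bool:
    # Consensus is the special case k = 1.
    return valid_set_agreement(inputs, outputs, 1)


def valid_approximate_agreement(inputs: Iterable[float],
                                outputs: Iterable[float],
                                eps: float) -> bool:
    ins, outs = list(inputs), list(outputs)
    if not outs:
        return True  # no process has produced an output yet
    # Outputs are within eps of each other and lie between the
    # smallest and largest input.
    return (max(outs) - min(outs) <= eps
            and min(ins) <= min(outs)
            and max(outs) <= max(ins))
```

For example, `valid_set_agreement([1, 2, 3], [1, 2], 2)` holds, while outputs `[1, 2, 3]` for the same inputs violate 2-set agreement.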
The space complexity of a protocol is the maximum number of registers used in any execution of the protocol. Each $m$-component snapshot object it uses counts as $m$ registers. The space complexity of a task is the minimum space complexity of any protocol for the task.
2.1 Our Setting
We consider two asynchronous shared memory systems, the simulated system and the real system.
Simulated system. The simulated system consists of simulated processes, , that communicate through an -component multi-writer snapshot object. Thus, any task that can be solved in the simulated system has space complexity at most .
Without loss of generality, we assume that each simulated process alternately performs scan and update operations on the snapshot object: Between two consecutive update operations, a process can perform a scan and ignore its result. If a process is supposed to perform multiple consecutive scans, it can, instead, perform one scan and use its result as the result of the others. This is because it is possible for all these scans to occur consecutively in an execution, in which case, they would all get the same result.
Real system. The real system consists of real processes, , that communicate through a single-writer snapshot object. For clarity of presentation, real processes use single-writer registers in addition to the single-writer snapshot object. The single-writer registers to which a particular process writes can be treated as additional separate fields of the component of the snapshot object belonging to that process. In Section 3, we define and implement an -component augmented snapshot object shared by the real processes.
In our simulation, the processes in the simulated system are partitioned into sets, , and real process is solely responsible for simulating the actions of all processes in in the simulated system. This is illustrated in Figure 1.
3 Augmented Snapshot Object
In this section, we define an augmented snapshot object and show how it can be deterministically implemented in the real system. This object plays a central role in our simulation. It is used by real processes to simulate steps performed by simulated processes. In particular, a real process uses this object to simulate an or a by any simulated process in , or a block update by any subset of processes in . Our simulation, which is explained in Section 4, is non-standard. Unfortunately, to satisfy its technical requirements, the augmented snapshot has to satisfy some non-standard properties.
An -component augmented snapshot object is a generalization of an -component multi-writer snapshot object. A operation returns the current view, consisting of the values of all components. The components can be updated using a operation. The key difference between a multi-writer snapshot and an augmented snapshot is that a may update multiple components, although not necessarily atomically. In addition, a may return a view from some earlier point in the execution. Otherwise, it returns a special yield symbol, .
A linearizable, non-blocking implementation of an augmented snapshot object in the real system is impossible. This is because a Block-Update operation that updates 2 components would then be the same as a 2-assignment operation. However, 2-assignment, together with read or scan, can be used to deterministically solve wait-free consensus among 2 processes [her91], which is impossible in the real system [AADGMS93, LA87].
Instead, a operation can be considered to be a sequence of atomic operations, which each update one component of the augmented snapshot object. (Analogously, a collect operation [beame1986limits, attiya1994wait] is not atomic, but the individual reads that comprise it are atomic.)
An -component augmented snapshot object, , shared by processes, , consists of components and supports two operations, and , which can be performed by all processes. A view of the augmented snapshot consists of the value of each of the components at some point in an execution. A operation returns the current view. A operation to a sequence of different components of with a sequence of values is comprised of a sequence of operations . Each atomically sets to . These operations may occur in any order. A operation also returns either or a view of .
A that does not return is called atomic. We require that every execution has a linearization in which the operations comprising each atomic are linearized consecutively.
Consider any atomic , . Let be the first in . Let be the last prior to that is part of an atomic or the beginning of the execution, if there is no such . Then must return a view of at some point between and such that no occurs between and .
We prove that, in our implementation, a only returns under certain circumstances, as described in Theorem 20. For example, a by process will always be atomic and, if a experiences no step contention [AGK05], it will be atomic. The simulation in Section 4 relies on this property of our implementation.
In this section, we describe how to implement an -component augmented snapshot object, , shared by processes, , in the real system. Our implementation is non-blocking: every operation is wait-free, while a operation can only be blocked by an infinite sequence of concurrent operations.
The implementation uses a shared single-writer snapshot object . All components of are initially . The ’th component of is used by to record a list of every it performs, each represented by a triple. A triple contains a component of , a value, and a timestamp. Each time performs a to components of , it appends triples to , all with the same timestamp. For clarity of presentation, we also use unbounded arrays of single-writer registers, , for all , each indexed by the non-negative integers. Each register is initially . Only process can write to and only process reads from it. The register is used by to help determine what value to return from its ’th .
The set of arrays , for all , may be viewed as an additional field, , of ’s component . To write value to , updates , appending to the field. If a process , , wishes to read the value of , then it scans and checks if a triple is present in the field of . If not, it considers the value of to be . Otherwise, finds the last such triple and it considers the value of to be .
Observe that, given this representation of , it is possible for to perform a sequence of writes to , for , by performing a single update to . Similarly, it can read the arrays , for all , by performing a single scan on .
Notation. We use upper case letters to denote instances of and on , instances of and on single-writer registers, and instances of , and on . The corresponding lower case letter denotes the result of a , , or operation. For example, denotes the result of a . We use to denote the value of the ’th component of and to denote the number of operations has performed on , which is exactly the number of different timestamps associated with the triples recorded in . The only shared variables are and , the rest are local variables.
Auxiliary Procedures. A timestamp is a label from a partially ordered set, which can be associated with an operation, such that, if one operation completes before another operation begins, the first operation has a smaller timestamp [Lam78]. We use a variant of vector timestamps [Fid91, Mat89, AW04book]: Each timestamp is a vector of non-negative integers, with one component per process. Timestamps are ordered lexicographically. We use $t' \succ t$ to denote that timestamp $t'$ is lexicographically larger than timestamp $t$ and $t' \succeq t$ to denote that $t'$ is lexicographically at least as large as $t$.
Let be the result of a . Process generates a new timestamp from using the locally computed function . It sets to for all and sets to .
For each , let be the value with the lexicographically largest associated timestamp among all update triples in all components of , or if no such triple exists. The view of , denoted , is the vector . It is obtained using the locally computed function .
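The two local functions described above can be illustrated with a short Python sketch. It rests on several assumptions that are not taken verbatim from the text: a scan result is modeled as one list of (component, value, timestamp) triples per process, the timestamp rule is read as "count the distinct timestamps recorded in each process's list, and add one in the generating process's own entry", and the names `count_block_updates`, `new_timestamp`, and `get_view` are hypothetical. Python tuples already compare lexicographically, which matches the ordering on timestamps.

```python
from typing import Any, List, Optional, Sequence, Tuple

Timestamp = Tuple[int, ...]
Triple = Tuple[int, Any, Timestamp]      # (component, value, timestamp)
ScanResult = Sequence[Sequence[Triple]]  # one list of triples per process


def count_block_updates(triples: Sequence[Triple]) -> int:
    # Number of distinct timestamps recorded in one process's list.
    return len({ts for (_, _, ts) in triples})


def new_timestamp(h: ScanResult, i: int) -> Timestamp:
    # Assumed rule: entry j counts the block updates recorded by process j,
    # and the generating process i adds one to its own entry.
    counts = [count_block_updates(h_j) for h_j in h]
    counts[i] += 1
    return tuple(counts)


def get_view(h: ScanResult, m: int, initial: Any = None) -> List[Any]:
    # For each component, keep the value whose timestamp is
    # lexicographically largest; untouched components keep `initial`.
    best: List[Optional[Triple]] = [None] * m
    for h_j in h:
        for triple in h_j:
            comp, _, ts = triple
            if best[comp] is None or ts > best[comp][2]:
                best[comp] = triple
    return [initial if b is None else b[1] for b in best]
```

Under this reading, a timestamp generated from a scan result is larger than every timestamp already contained in that result, which is the property the correctness proof relies on.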
Main Procedures. To perform a of , process repeatedly performs of until two consecutive results are the same. Then returns the of its last . Notice that is not necessarily wait-free. However, it can only be blocked by an infinite sequence of operations that modify between every two operations performed by the . To help other processes determine what to return from a , records the result, , of each in register , for all .
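The core loop of this procedure, repeating scans until two consecutive results agree, can be sketched as follows. This is a simplified illustration only: the helping writes to the single-writer registers are omitted, and `scan_until_stable` and its `scan` argument are hypothetical names standing in for a scan of the underlying single-writer snapshot.

```python
from typing import Any, Callable, Tuple


def scan_until_stable(scan: Callable[[], Any]) -> Tuple[Any, int]:
    """Repeat `scan` until two consecutive results are equal.

    Returns the stable result and the number of scans performed. The loop
    does not terminate if every pair of consecutive scans differs, which
    mirrors the fact that the operation is not wait-free.
    """
    previous = scan()
    scans = 1
    while True:
        current = scan()
        scans += 1
        if current == previous:
            return current, scans
        previous = current
```

Each unsuccessful comparison is caused by some concurrent modification between the two scans, so the number of iterations is bounded by the number of such modifications plus one.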
To perform a of , first performs a scan of . Then it generates a timestamp, , from the result, , of and appends the triples , , to via an . This associates the same timestamp, , with the and each of the operations comprising it. Next, process helps processes with lower identifiers by performing another of and recording its result, , in for all . Then performs a third to check whether any process with a higher identifier has performed an after . If so, returns . This is the only way in which a can return . Consequently, all operations performed by are atomic.
If does not return , it reads for all , where is the number of operations that had previously performed. It determines which among these and is the result of the latest and returns its view. The mechanism for determining the latest is described next.
If $h$ and $h'$ are the results of two scans of the single-writer snapshot and if $h_j$ is a prefix of $h'_j$ for all $j$, then we say that $h$ is a prefix of $h'$. In addition, if $h_j \neq h'_j$ for some $j$, we say that $h$ is a proper prefix of $h'$. Since each update to the single-writer snapshot appends one or more update triples to a component, the following is true.
Observation 1. Let $S$ and $S'$ be scans of the single-writer snapshot with results $h$ and $h'$, respectively. If $S$ occurred before $S'$, then $h$ is a prefix of $h'$. Conversely, if $h$ is a proper prefix of $h'$, then $S$ occurred before $S'$.
Thus, by Observation 1, for any set of scans, the result of the earliest of these scans is a prefix of the result of every other scan in the set.
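The prefix relation on scan results, and the way it identifies the earliest scan in a set, can be sketched in Python. The representation is an assumption made for illustration: a scan result is a sequence with one append-only list per process, and the names `is_prefix`, `is_proper_prefix`, and `earliest` are hypothetical.

```python
from typing import Any, List, Sequence

ScanResult = Sequence[Sequence[Any]]  # one append-only list per process


def is_prefix(h: ScanResult, h2: ScanResult) -> bool:
    # h is a prefix of h2 if every component of h is a prefix of the
    # corresponding component of h2.
    return len(h) == len(h2) and all(
        len(a) <= len(b) and list(b[:len(a)]) == list(a)
        for a, b in zip(h, h2)
    )


def is_proper_prefix(h: ScanResult, h2: ScanResult) -> bool:
    # A proper prefix additionally differs in at least one component.
    return is_prefix(h, h2) and any(
        list(a) != list(b) for a, b in zip(h, h2)
    )


def earliest(results: List[ScanResult]) -> ScanResult:
    # The earliest result is a prefix of every other result. This assumes
    # the results come from scans in one execution, so such a result exists.
    for h in results:
        if all(is_prefix(h, other) for other in results):
            return h
    raise ValueError("results are not totally ordered by the prefix relation")
```

This is the comparison a process can use to decide which of several recorded scan results is the latest or the earliest without any additional communication.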
The next lemma shows that our implementation of is wait-free while our implementation of is non-blocking.
Lemma 2 (Step Complexity).
Each operation consists of 6 steps. If is the number of different updates by other processes (which append update triples) that are concurrent with an operation, then it completes after at most steps.
The writes in the loop on lines 6–7 in the pseudocode for an may be simultaneously performed by a single update. Similarly, the reads in the loop on lines 12–15 may be simultaneously performed with a single scan. Thus, each operation consists of 6 operations on the single-writer snapshot .
An operation begins with a scan of . The writes in the loop on lines 5–6 may be simultaneously performed by a single update. Hence, each iteration of the loop performs two steps: an update and a scan. Each unsuccessful iteration of the loop is caused by a different update by another process that occurs between the scan in that iteration and the scan in the previous iteration. In addition, there is one successful iteration of the loop. Hence, the performs at most steps. ∎
3.3 Proof of Correctness
In this section, we prove that our implementation is correct. We begin by describing the linearization points of our operations.
Linearization Points. A complete operation is linearized at its last of , performed on Line 7. Now consider a , with associated timestamp , that updates components . For , the to component is linearized at the first point in the execution at which contains a triple beginning with and ending with a timestamp . If multiple operations are linearized at the same point, then they are ordered by their associated timestamps (from earliest to latest) and then in increasing order of the components they update.
Each of a , , performed without step contention is linearized at ’s to on Line 4. However, it is not possible to do this for all operations; otherwise, we would be implementing a linearizable, non-blocking augmented snapshot, which, as discussed earlier, is impossible. In our linearization, if an , , that is part of updates a component which is also updated by an , , that is part of a concurrent by a process with a lower identifier, then may be linearized before .
We now prove a useful property of our helping mechanism.
Since only appends new triples on Line 4, we also have the following.
Let be the first performed on Line 4 by process after some of with result . Let be any other of before with result . Then, .
Our linearization rule for s implies the following observations.
Let be an to component with an associated timestamp that is part of a and let be any to that appends an update triple with component and timestamp to . Then is linearized no later than .
If a of occurs after the linearization point of an to component with associated timestamp , then the result of contains an update triple with component and timestamp at least as large as .
We say that the result, , of a of contains a timestamp , if (or, more precisely, some component of ) contains an update triple with timestamp . The corollary of the next lemma says that a timestamp generated from is lexicographically larger than any timestamp contained in .
For any timestamp contained in the result, , of a of , , for all .
Suppose is generated from the result of a scan by some process . Then and , for . Since appends an update triple with timestamp to before is contained in the result of a , and occurs after . By Observation 1, is a prefix of . Hence, , for . ∎
Let be the result of a and let by any process. Then, for any timestamp contained in , .
Now we show that timestamps are unique.
Any two triples appended to that involve the same component of are associated with a different timestamp.
We show that every operation is associated with a different timestamp. Since no operation appends more than one triple for any component of , the claim follows.
Suppose two processes generate timestamps and from and of that return and , respectively. Then , , , and . If , then and . It follows that and . However, by Observation 1, this is impossible. Therefore, .
Now, consider two timestamps generated by the same process. Since the process appends one or more update triples carrying the first timestamp to its own component immediately after generating it, the result of any subsequent scan by that process contains this timestamp. Thus, by Corollary 8, any timestamp the process generates later is lexicographically larger than the first. ∎
Next, we show that, operations that do not return can be considered to take effect atomically at their update on Line 4.
Let be the result of and be the result of . Suppose that performs an after and before . Since every appends triples with a new timestamp to , will hold on Line 9 in , and the must return . ∎
Let be a operation by that does not return and let be the on Line 4 in . Then, all s in are linearized at , consecutively, in order of the components they update.
Let and be the operations on Line 2 and Line 8 in , respectively, and let be the result of . Consider the timestamp associated with . Suppose some to before appends a triple with a timestamp . Then, contains this triple with timestamp and, by Corollary 8, .
Consider any that has appended an update triple with timestamp . If occurs before , then occurs between and . Let be the that contains , let be the of in on Line 2 from which is generated and let be the result of . is concurrent with and thus, not performed by . If , then, since , we have , implying that occurs after . But this is impossible, since occurs before . Therefore .
Since , there exists such that . This is only possible if process performed an after and before , or if is performed by . In the first case, since occurs before , which occurs before , which occurs before , this contradicts Lemma 10. In the second case, is an by between and . Since occurs before , this also contradicts Lemma 10.
Thus, all s with timestamp occur after . All s that are part of have the same timestamp . Therefore, all s by are linearized at . By Lemma 9, timestamps are unique. s linearized at the same point are ordered first by their timestamps and then by the components they update. Hence, all s that are part of will be ordered consecutively, sorted in order of their components. ∎
Next, let us consider operations that return .
Let be an to component with associated timestamp that is part of . is linearized at the first point that contains an update triple with component and timestamp . Note that is generated from on Line 3 in . By Corollary 8, all of the timestamps contained in are lexicographically smaller than . Thus, is linearized after . Since appends an update triple with component and timestamp , is linearized no later than by Observation 5. ∎
Thus, every is linearized within its execution interval.
Let be a by whose execution interval does not contain any s by a process to on Line 4 with . Then, does not return .
Next, we show that our choice of linearization points for s and s produces a valid linearization.
Let be a that returns . Suppose . Then, for each , is the value of the last to component of linearized before , or if no such exists.
Suppose that contains an update triple involving component . This triple was appended to by some update that is part of a . By Lemma 11 and Lemma 12, all in are linearized at or before . Hence, if no to component is linearized before , then .
Now, consider the last to component linearized before . Let be its associated timestamp. Let be the largest timestamp of any update triple with component in . By Observation 6, . By Lemma 9, there is exactly one update triple in with component and timestamp . By definition of , is the value of this update triple. Let be the to that appended during a operation and let be the to component in . Since is contained in , occurs before . By definition of , is linearized at .
Corollary 15 (s).
Consider any that returns . Then, for each , is the value of the last to component of linearized before the operation, or if no such exists.
We now consider the linearization of s. Suppose is a that does not return . Throughout the rest of this section, we use , , , , and as follows. Let be the of in on Line 2, let be the in on Line 4, let be the in on Line 8, let be the value of when returns on Line 16, and let be the last of that returns .
Consider any operation that does not return . Then occurs no earlier than and before .
Suppose is performed by process . Let be the result of and let be the value from for on Line 13 during . By Line 6 and Line 7, a process only to when it takes a of with result such that . appends triples with a new timestamp to , so any of performed after returns a result, , such that . Thus, if , then is the result of a of performed before .
By Lemma 16, occurs no earlier than and before , and thus the interval starting immediately after and ending with is contained within ’s execution interval. We call this interval the window of .
Consider any operation that does not return . Then, no operation is linearized during the window of .
For a contradiction, suppose that a operation is linearized in the window of . Let be the last in , performed on Line 7, and let be the result of . By definition, is the linearization point of , which, by assumption, occurs during the window of . It follows that is not performed by , which performs as its first step after . Let be the process that performs .