Lower Bounds for Shared-Memory Leader Election under Bounded Write Contention

08/05/2021
by Dan Alistarh, et al.

This paper gives tight logarithmic lower bounds on the solo step complexity of leader election in an asynchronous shared-memory model with single-writer multi-reader (SWMR) registers, for both deterministic and randomized obstruction-free algorithms. The approach extends to lower bounds for deterministic and randomized obstruction-free algorithms using multi-writer registers under bounded write concurrency, showing a trade-off between the solo step complexity of a leader election algorithm, and its worst-case write contention.


1 Introduction

Leader election is a classic distributed coordination problem, in which a set of processors must cooperate to choose a single “leader” processor. Each processor must output either a win or lose decision, with the property that, in any execution, at most one processor may return win, while all other processors have to return lose. Moreover, any processor must return win in solo executions, in which it does not observe any other processor.

Due to its fundamental nature, the time and space complexity of variants of this problem in the classic asynchronous shared-memory model has been the subject of significant research interest. Leader election and its linearizable variant, called test-and-set, are weaker than consensus, as processors can decide without knowing the leader’s identifier. Test-and-set differs from leader election in that no processor may return lose before the eventual winner has joined the computation; it has consensus number two, and therefore cannot be implemented deterministically wait-free from read-write registers [19]. Tromp and Vitányi gave the first randomized algorithm for two-processor leader election [29], and Afek, Gafni, Tromp and Vitányi [1] generalized this approach to n processors, using the tournament tree idea of Peterson and Fischer [27].

Their algorithm builds a complete binary tree with n leaves; each processor starts at a leaf, and proceeds to compete in two-processor leader-election objects located at the internal nodes, returning lose whenever it loses at such an object. The winner at the root returns win. Since each two-processor object can be resolved in expected constant time, their algorithm has expected step complexity O(log n) against an adaptive adversary. Moreover, their algorithm only uses single-writer multi-reader (SWMR) registers: throughout any execution, any register may only be written by a single processor, although it may be read by any processor.
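To make the tree structure concrete, here is a toy, purely sequential simulation of the tournament (the two-processor objects below are stand-ins that simply elect the smaller identifier, not the randomized objects of [1]); it only illustrates the logarithmic depth of the tree, not the concurrent semantics:

```python
import math

def two_proc_leader_election(a, b):
    # Stand-in for a two-processor leader-election object: in this
    # sequential sketch, the smaller identifier deterministically wins.
    return min(a, b)

def tournament(processors):
    """Run a complete-binary-tree tournament; returns the leader and
    the number of rounds played, i.e. the depth of the tree."""
    alive = list(processors)
    rounds = 0
    while len(alive) > 1:
        nxt = []
        for i in range(0, len(alive), 2):
            if i + 1 < len(alive):
                nxt.append(two_proc_leader_election(alive[i], alive[i + 1]))
            else:
                nxt.append(alive[i])  # odd processor out advances by a bye
        alive = nxt
        rounds += 1
    return alive[0], rounds

leader, depth = tournament(range(8))
assert leader == 0 and depth == math.ceil(math.log2(8))  # 3 rounds for n = 8
```

Each processor competes in at most one object per level, which is where the O(log n) step complexity of the tournament comes from.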

Follow-up work on time upper bounds has extended these results to the adaptive setting, showing expected step complexity logarithmic in the number of participating processors [4, 16]. Further, Giakkoupis and Woelfel [16] showed that, if the adversary is oblivious to the randomness used by the algorithm, even lower step complexity is achievable, improving upon a previous sub-logarithmic upper bound by Alistarh and Aspnes [2]. Another related line of work has focused on the space complexity of this problem, which is now resolved. Specifically, it is known that Ω(log n) distinct registers are necessary [28, 16], and a breakthrough result by Giakkoupis, Helmi, Higham, and Woelfel [15] provided the first asymptotically matching upper bound of O(log n) registers, improving upon an earlier algorithm by the same authors [14].

The clear gap in the complexity landscape for this problem concerns time complexity lower bounds. Specifically, in the standard case of an adaptive adversary, the best known upper bound is the venerable tournament-tree algorithm described above [1], which has expected time complexity O(log n) and uses SWMR registers. It is not known whether one can perform leader election in classic asynchronous shared-memory faster than a tournament.¹ Due to the simplicity of the problem, none of the classic lower bound approaches, e.g. [23, 24, 22], apply, and resolving the time complexity of shared-memory leader election is known to be a challenging open problem [2, 16]. Moreover, given that the step complexities of shared-memory consensus [8] and renaming [3] have been resolved, leader election remains one of the last basic objects for which no tight complexity bounds are known.

¹Sub-logarithmic step complexity is achievable in other models, e.g. distributed and cache-coherent shared-memory [17] or message-passing [5].

We show tight logarithmic lower bounds on the step complexity of leader election in asynchronous shared-memory with SWMR registers. Our motivating result is a natural potential argument showing that any deterministic obstruction-free algorithm for leader election—in which processors must return a decision if they execute sufficiently many steps without interruption—must have worst-case step complexity Ω(log n) in solo executions, that is, even if processors execute in the absence of concurrency, as long as registers are SWMR.

Our main contribution is a new and non-trivial technique showing that a similar statement holds for randomized algorithms: in the same model, any obstruction-free algorithm for leader election has worst-case expected solo step complexity Ω(log n). In this case as well, the lower bound holds in terms of expected step complexity in solo executions. The lower bound technique is based on characterizing the expected length of solo executions by analyzing the number of reads and writes over distinct registers required by any correct algorithm.

These are the first non-trivial lower bounds on the time complexity of classic shared-memory leader election, although they assume restrictions on the algorithms. They are both matched asymptotically by the tournament-tree approach, as the algorithm of [1] can be modified to be deterministic obstruction-free, by using two-processor obstruction-free leader election objects. This essentially shows that the tournament strategy is optimal for SWMR registers. Our results also apply when algorithms may employ stronger two-processor read-modify-write primitives, such as two-processor test-and-set operations, instead of reads and exclusive writes. Interestingly, the results hold for a weak version of leader election, in which all processors may return lose if they detect a contended execution.

The main limitation of the approach concerns the SWMR restriction on the registers used by the algorithm. We investigate relaxations of this restriction, and show that, for deterministic algorithms, if k is the maximum number of processors which might be poised to write to a register in any given execution, then any algorithm has worst-case solo step complexity Ω((log n)/k). Conversely, assuming s is asymptotically smaller than log n, any algorithm with solo step complexity s has an execution in which Ω((log n)/s) distinct processors are poised to write concurrently to the same register. Since this latter quantity is an asymptotic lower bound on the worst-case stall complexity at a processor [13],² this yields a logarithmic trade-off between the worst-case slow-down due to steps in a solo execution, and the worst-case slow-down at a processor due to high register contention, measured in stalls, for any deterministic algorithm.

²If k processors are poised to write concurrently to a register, then the last processor to write will incur k − 1 stalls.

We generalize this argument to the randomized case as well, showing that any algorithm ensuring at most k worst-case stalls at a processor must have expected solo step complexity Ω((log n)/k). In practical terms, our results show that any gain made by decreasing steps on the solo fast-path is paid for by an increase in the worst-case stall complexity at a processor, for any obstruction-free leader election algorithm.

Additional Related Work.

The previous section already covered known time and space complexity results for the classic leader election problem in the standard asynchronous shared-memory model. This fundamental problem has also been considered under related models and complexity metrics. Specifically, Golab, Hendler and Woelfel [17] have shown that leader election can be solved using a constant number of remote memory references (RMRs) in the cache-coherent (CC) and distributed shared-memory (DSM) models. Their result circumvents our lower bounds due to differences in the model and in the complexity metrics. In the same model, Eghbali and Woelfel [12] have shown a super-constant worst-case time lower bound for abortable leader election. The abortability constraint imposes stronger semantics, and they consider a different notion of complexity cost, but they do allow multi-writer registers.

In addition, our results are related to early work by Yang and Anderson [30], who prove lower bounds for a weak version of mutual exclusion, assuming bounded write contention per register. Upon careful consideration, one can see that their approach can be used to prove a similar logarithmic lower bound for obstruction-free leader election in the read-write model with contention constraints. However, we emphasize that their argument works only for deterministic algorithms.

Specifically, relative to this work, our contribution is the randomized lower bound. The argument of Yang and Anderson [30] does not generalize to randomized algorithms, for the same reason that the simple deterministic argument we provide as motivation does not generalize to the randomized case. Even focusing on the deterministic case, our approach is slightly different from theirs: we use covering plus a potential argument, while they use a different covering argument, based on eliminating contending processors by leveraging Turán’s theorem. However, their approach can provide a better dependency on the contention bound: logarithmic, versus linear in our case.

We note that similar trade-offs between contention and step complexity have been studied by Dwork, Herlihy and Waarts [11], and by Hendler and Shavit [18], although in the context of different objects, and for slightly different notions of cost. We believe this paper is the first to approach such questions for randomized algorithms, and for leader election.

From the technical perspective, the simple deterministic argument we propose can be viewed as a covering argument [23, 10, 9, 24, 7], customized for the leader-election problem, and leveraging the SWMR property. The new observation is the potential argument showing that some processor must incur Ω(log n) distinct steps in a solo execution. To our knowledge, the lower bound approach for randomized algorithms is new. The generalized argument for bounded concurrent-write contention implies bounds in terms of the stall metric of Ellen, Hendler and Shavit [13], which has also been employed by other work on lower bounds, e.g. [7]. These prior approaches do not apply to leader election.

2 Model, Preliminaries, and Problem Statement

We assume the asynchronous shared-memory model, in which n processors may participate in an execution, of which up to n − 1 may fail by crashing. Processors are equipped with unique identifiers, which they may use during the computation. For simplicity, we will directly use the corresponding indices 1, 2, …, n to identify processors in the following, and denote the set of all processors by P. Unless otherwise stated, we assume that processors communicate via atomic read and write operations applied to a finite set of registers. The scheduling of processor steps is controlled by a strong (adaptive) adversary, which can observe the structure of the algorithm and the full state of processors, including their random coin flips, before deciding on the scheduling.

As stated, our approach assumes that the number of processors which may be poised to write to any register during an execution is deterministically bounded. Specifically, for an integer parameter k ≥ 1, we assume algorithms ensure k-concurrent write contention: in any execution of the algorithm, at most k processors may be concurrently poised to write to any given register. We note that, equivalently, we could assume that the worst-case write-stall complexity of the algorithms is k − 1, as having k processors concurrently poised to write to a given register necessarily implies that the “last” processor scheduled to write incurs k − 1 stalls, one for each of the other writes.

Notice that this assumption implies a (possibly random) mapping between each register and the set of processors which write to it in every execution. For k = 1, we obtain a variant of the SWMR model, in which a single processor may write to a given register in an execution. Specifically, we emphasize that we allow this mapping between registers and writers to change between executions: different processors may write to the same register, but in different executions. This is a generalization of the classic SWMR property, which usually assumes that the processor-to-register mapping is fixed across all executions.

Without loss of generality, we will assume that algorithms follow a fixed pattern, consisting of repetitions of the following sequence: 1) a shared read operation, possibly followed by local computation, including random coin flips, and 2) a shared write operation, again possibly followed by local computation and coin flips. Note that any algorithm can be re-written following this pattern, without changing its asymptotic step complexity: if necessary, one can insert dummy read and write operations to dedicated NULL registers.
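The padding transformation can be sketched as follows; this is an illustrative model (step sequences as (operation, register) pairs, with a hypothetical NULL register), not code from the paper:

```python
def canonicalize(steps, null_register="NULL"):
    """Pad a sequence of shared-memory steps, each ('read', r) or ('write', r),
    into the alternating read-then-write pattern of the model, inserting
    dummy operations on a dedicated NULL register where needed."""
    out = []
    expect = "read"                       # every round starts with a shared read
    for op, reg in steps:
        if op != expect:                  # insert a dummy step to keep the pattern
            out.append((expect, null_register))
        out.append((op, reg))
        expect = "read" if op == "write" else "write"
    if expect == "write":                 # close the last round if it ends on a read
        out.append(("write", null_register))
    return out

# Two consecutive writes become two full read-then-write rounds:
assert canonicalize([("write", "r1"), ("write", "r2")]) == [
    ("read", "NULL"), ("write", "r1"), ("read", "NULL"), ("write", "r2")]
```

At most one dummy step is inserted per original step, so the transformation at most doubles the step count and preserves asymptotic step complexity.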

We measure complexity in terms of processor steps: each shared-memory operation is counted as a step. Total step complexity will count the total number of processor steps in an execution, while individual step complexity, which is our focus, is the number of steps that any single processor may perform during any execution.

We now introduce some basics regarding terminology and notation for the analysis, following the approach of Attiya and Ellen [9]. We view the algorithm as specifying the set of possible states for each processor. At any point in time, for any processor, there exists a single next step that the processor is poised to take, which may be either a shared-memory read or write step. Following the step, the processor changes state, based on its previous state, the response received from the shared step (e.g., the results of a read), and its local computation or coin flips. Deterministic protocols have the property that the processor state following a step is exclusively determined by the previous state and the result of the shared step, e.g. the value read. Randomized protocols have the property that the processor has multiple possible next steps, based on the results of local coin flips following the shared-memory step. Each of these possible next steps has a certain non-zero probability. As is standard, we assume that the randomness provided to the algorithm is finite-precision, and so the number of possible next steps at each point is countable.³

³Our analysis would also work in the absence of this requirement. However, the assumption appears to be standard, and it simplifies the presentation: it allows us to sum, rather than integrate, over possible executions.

A configuration of the algorithm is completely determined by the state of each processor, and by the contents of each register. We assume that initially all registers hold some pre-determined value, and thus the initial configuration is determined only by the input state (or value) of each processor. Two configurations C and C′ are said to be indistinguishable to processor p if p has the same state in C and C′, and all registers have the same contents in both configurations.

A processor p is said to be poised to perform step s, which could be a read or a write, in configuration C if s is the next step that p will perform given C. Given a valid configuration C and a valid next step s by p, we denote the configuration after s is performed by p as s(C). An execution is simply a sequence of such valid steps by processors, starting at the initial configuration. Thus, a configuration C is reachable if there exists an execution resulting in C. In the following, we will pay particular attention to solo processor executions, that is, executions in which only a single processor takes steps.

Our progress requirement for algorithms will be obstruction-freedom [20], also known as solo-termination [23]. Specifically, an algorithm satisfies this condition if, from any reachable configuration C, any processor p must eventually return a decision in every p-solo extension of C, i.e. in every extension of C which consists only of steps by p.

In the following, we will prove lower bounds for the following simplified variant of the leader election problem.

Definition 1 (Weak Leader Election).

In the Weak Leader Election problem, each participating processor starts with its own identifier as input, and must return either win or lose. The following must hold:

  1. (Leader Uniqueness) In any execution, at most a single processor can return win.

  2. (Solo Output) Any processor must return win in any execution in which it executes solo.

We note that this variant does not fully specify return values in contended executions—in particular, under this definition, all processors may technically return lose if they are certain that they are not in a solo execution—and does not require linearizability [21], so it is weaker than test-and-set. Our results will apply to this weaker problem variant.

3 Lower Bound for Deterministic Algorithms

As a warm-up result, we provide a simple logarithmic lower bound for the solo step complexity of leader election with SWMR registers. Specifically, the rest of this section is dedicated to proving the following statement:

Theorem 3.1.

Any deterministic obstruction-free leader election protocol in asynchronous shared-memory with SWMR registers has worst-case solo step complexity Ω(log n).

3.1 Adversarial Strategy

We will specify the lower bound algorithmically, as an iterative procedure that the adversary can follow to create a worst-case execution. More precisely, the adversarial strategy will proceed in steps, and will maintain two sets of processors at each step t: the available set A_t and the frozen set F_t. In addition, we maintain a prefix of the worst-case execution, which we denote by E_t.

Initially, all processors are in their initial state and placed in the pool of available processors A_1 = P, while the set of frozen processors F_1 is empty, and the worst-case execution prefix E_1 is empty as well. In addition, we associate a blame counter b_p with each available processor p, initially b_p = 0. Intuitively, b_p represents the number of processors that were placed in the frozen set because of p.

In each step t, we first identify the processor q whose blame count is minimal among processors in the available set A_t, breaking ties arbitrarily. We then execute the sequence of solo steps of processor q, until we first encounter a write step w of q to some register r which is read by some available processor in its solo execution. Note that the step w itself is not added to the execution prefix E_t. Below, in Lemma 3.3, we will show that such a write step by q must necessarily exist: otherwise, we could run q until it returns win, without this fact ever becoming visible to any other processor in the available set.

Having identified this first write step w by q, we “freeze” processor q exactly before w, and place it in the frozen set at the next step, F_{t+1} = F_t ∪ {q}, removing it from the available set, A_{t+1} = A_t \ {q}. We then update the worst-case execution prefix E_{t+1} by appending the solo steps of q up to, but excluding, w. Finally, we increment the blame count b_p by 1 for every processor p ∈ A_{t+1} with the property that p reads register r in its solo execution. At this point, step t is complete, and we can move on to step t + 1. The process stops when only one available processor remains.
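The strategy can be exercised on a toy deterministic model, in which each solo execution is a fixed list of (operation, register) pairs. The instance below (each processor raises its own flag and then scans everyone else's) is hypothetical, and the inline assertion checks a potential invariant of the form used in the analysis below (an assumption about the intended potential: the sum of 2^blame over available processors never drops below n):

```python
import math

def run_adversary(solo):
    """solo: dict mapping processor id -> list of ('read'|'write', register)
    pairs describing its solo execution. Follows the freezing strategy:
    repeatedly freeze the minimal-blame processor just before its first
    write that some other available processor reads."""
    A = set(solo)                         # available processors
    blame = {p: 0 for p in A}
    n = len(A)
    while len(A) > 1:
        # potential invariant: sum of 2^blame over available processors >= n
        assert sum(2 ** blame[p] for p in A) >= n
        q = min(A, key=lambda p: blame[p])        # minimal blame, ties arbitrary
        readers = None
        for op, reg in solo[q]:                   # scan q's solo execution
            if op == 'write':
                readers = {p for p in A - {q} if ('read', reg) in solo[p]}
                if readers:
                    break                         # freeze q just before this write
        assert readers, "such a write step must exist (cf. Lemma 3.3)"
        A.remove(q)
        for p in readers:
            blame[p] += 1                         # p is blamed for q's freezing
    (z,) = A
    return z, blame[z]

# Toy instance: processor i writes its own flag, then reads all other flags.
n = 8
solo = {i: [('write', f'flag{i}')] + [('read', f'flag{j}') for j in range(n) if j != i]
        for i in range(n)}
z, b = run_adversary(solo)
assert b >= math.log2(n)   # the survivor was blamed at least log2(n) times
```

On this instance every freeze blames all remaining processors, so the survivor accumulates n − 1 blame; the analysis below shows that even in the worst case the survivor's blame cannot drop below log₂ n.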

3.2 Analysis

We begin by noting the following invariants, maintained by the adversarial strategy:

Lemma 3.2.

At the beginning of each step t, the adversarial strategy enforces the following:

  1. All available processors in A_t are in their initial state;

  2. The contents of all registers read by processors in A_t during their respective solo executions are the same as in the initial configuration.

Proof.

Both claims follow from the structure of the construction. The first claim follows since the only processor which takes steps in any given step of the strategy is removed from the available set. The second claim follows since, at every step t, we freeze the corresponding processor before it writes to any register read by any of the remaining processors in A_{t+1}. ∎

Notice that this result practically ensures that the execution prefix generated up to every step t is indistinguishable from the initial configuration for processors in the available set A_t. Next, we show that the strategy is well-defined, in the sense that the write step w specified above must exist at each iteration of the strategy.

Lemma 3.3.

Fix a step t, and let q be the chosen processor of minimal blame count in A_t. Then there must exist a step w in the solo execution of q which writes to some register r which is read by some available processor p ∈ A_t \ {q} in its solo execution.

Proof.

We begin by proving a slightly stronger statement: for any processor p ∈ A_t \ {q}, there must exist a register which is written by q in its solo execution and read by p in its solo execution. We will then choose r to be the first such register written to by q in its solo execution, and w to be the corresponding write step.

Assume for contradiction that there exists a processor p ∈ A_t \ {q} which does not read from any of the registers written to by q in its solo execution. By Lemma 3.2, the current execution prefix is indistinguishable from a solo execution for q. Thus, if q runs solo from the prefix until completion, q must return win. However, if p then runs solo after q returns, p must also return win: it does not read from any register which q wrote to, and therefore, by Lemma 3.2, it observes a solo execution as well. This contradicts the leader uniqueness property in the resulting execution.

We have therefore established that every other available processor p must eventually read from some register r_p written to by q in its solo execution. (Notice that these registers need not be distinct across processors.) Let w_p be the step in which q first writes to r_p during its solo execution. To satisfy the requirements of the adversarial strategy, it suffices to pick w to be the first such step w_p, in temporal order, in q’s solo execution. ∎

We now return to the proof, and focus on the blame counts of available processors at any fixed step t. Define the potential at step t as Φ_t = Σ_{p ∈ A_t} 2^{b_p}.

Since A_1 = P and b_p = 0 for all processors p, we have that Φ_1 = n. Next, we show that, due to the way in which we choose the next processor to be executed, we can always lower bound this potential by n.

Lemma 3.4.

For any step t, we have Φ_t ≥ n.

Proof.

We will proceed by induction. The base step is outlined above. Fix therefore a step t such that Φ_t ≥ n.

Again, let q be the processor we freeze at step t. For each p ∈ A_t \ {q}, let w_p ∈ {0, 1} be the amount by which we incremented the blame count of processor p in this step. By Lemma 3.3, there exists a processor p* such that w_{p*} = 1. Further, since we chose to execute the processor with minimal blame count, we have that b_{p*} ≥ b_q. Let us now analyze the difference

Φ_{t+1} − Φ_t = Σ_{p ∈ A_t \ {q}} (2^{b_p + w_p} − 2^{b_p}) − 2^{b_q} ≥ 2^{b_{p*} + 1} − 2^{b_{p*}} − 2^{b_q} = 2^{b_{p*}} − 2^{b_q} ≥ 0.

Hence, Φ_{t+1} ≥ Φ_t ≥ n, as required. ∎

To complete the proof of Theorem 3.1, let z be the last remaining non-frozen processor before the process completes, i.e. A = {z}. By Lemma 3.4, we have that Φ = 2^{b_z} ≥ n, which implies that b_z ≥ log₂ n. Further, notice that processor z must have performed at least b_z distinct read operations: for every increment of b_z, there must exist a unique frozen processor which wrote to some register from which z reads in its solo execution. Since we are assuming SWMR registers, these registers, and hence the corresponding reads performed by z, must all be distinct. Hence, processor z performs Ω(log n) steps in a solo execution, implying an Ω(log n) solo step complexity lower bound for the algorithm. In particular, the tournament-tree approach is asymptotically optimal for deterministic algorithms using SWMR registers.

3.3 Discussion

Bounded Concurrent-Write Contention and Stalls.

It is interesting to observe what happens to the above argument in the case of multi-writer registers. Let k be the bound on the concurrent-write contention over any single register, in any execution, that is, on the maximum number of processors which may be concurrently poised to write to a register. Notice that the overall construction and the blaming mechanism still work, and therefore the potential lower bound still holds; in the proof of the last step, however, the b_z steps counted for the last processor z no longer need to be distinct. Specifically, a single read step by z may be counted at most k times, once for each distinct processor which may be frozen upon its write to the corresponding register. The lower bound is therefore weakened linearly in k.

Corollary 3.4.1.

Any deterministic leader election protocol in asynchronous shared-memory in which at most k processors may be poised to write to a register concurrently has worst-case solo step complexity Ω((log n)/k). Moreover, if the lower bound construction above yields worst-case step complexity s for a processor, then there must exist an execution in which the concurrent-write contention on some register is Ω((log n)/s).

Recall that, when interpreted in the stall model of [13], having k processors poised to write to a register at the same time implies k − 1 (write-)stalls for one of the processors. Thus, this last result implies a logarithmic multiplicative trade-off between the worst-case step complexity of a protocol and its worst-case stall complexity.

Stronger Primitives.

We note that this approach can also be extended to deterministic algorithms employing SWMR registers supporting read and write, and, additionally, two-processor test-and-set objects. We can then apply the same freezing strategy, and note that an access to a test-and-set object can lead to freezing a processor, and incrementing the blame counter of another processor, at most once (otherwise, there would be a combined execution in which more than two processors access the object). Hence, we still obtain an Ω(log n) lower bound on solo step complexity, i.e. the tournament tree remains an optimal strategy.

4 Lower Bound for Randomized Algorithms

We now shift gears and present our main result, which is a logarithmic expected-time lower bound for randomized obstruction-free algorithms. Our approach in this case will be different, as we are unable to build an explicit worst-case adversarial strategy. Instead, we will argue about the expected length of executions by bounding the expected number of reads over distinct registers required for algorithms to be correct. In turn, this will require a careful analysis of the probability distribution over solo executions of a specific well-chosen structure.

We first focus on the SWMR case, and cover it exclusively in Sections 4.1 to 4.3. We will then provide a generalization to MWMR registers under bounded concurrency in Section 4.4.

4.1 Preliminaries

Fix a processor p. We define E_p as the set of all possible solo executions of p, and will focus on understanding the probability distribution over reads and writes for executions in E_p. By the solo output property of the algorithm (Definition 1), all these executions have to be finite in length. For any possible solo execution α of processor p, Pr[α] will be used to denote the probability that, if we let p run solo, it executes α and returns. In particular, Σ_{α ∈ E_p} Pr[α] = 1.

Let R denote the set of all registers which could be used by the algorithm over all solo executions by some processor. Since the randomness provided to the algorithm is finite-precision, the number of possible next steps in every configuration is countable, and, as a countable union of countable sets, R must be countable as well.⁴ Fix a register r ∈ R; by definition, r is read or written by some processor during some solo execution. Let E_r^read be the set of all solo executions which read from register r:

E_r^read = { α ∈ ∪_p E_p : α reads r }.

⁴Our argument works even when R is not countable, but this assumption simplifies notation, e.g. allowing discrete sums.

We define the read potential R(r) of a register r to roughly count the sum of probabilities that r is read during solo executions by any processor. Formally,

R(r) = Σ_{α ∈ E_r^read} Pr[α].

Analogously, let E_r^write be the set of all solo executions which write to register r:

E_r^write = { α ∈ ∪_p E_p : α writes to r }.

We define the write potential of register r as

W(r) = Σ_{α ∈ E_r^write} Pr[α].

For simplicity, we assume that, for any r ∈ R, we have W(r) > 0 (or, equivalently, E_r^write ≠ ∅). Otherwise, the reads from r always return its initial value and do not change the outcome of the solo executions, and we can assume that they do not use r.

Further, for any given solo execution α of processor p, we define the trace of α, trace(α), as the sequence of registers written by p during α, in the order in which they were first written, omitting duplicate registers. For instance, if in execution α processor p wrote to r_2, then to r_3, followed by r_2 again, and finally to r_1, the trace would be (r_2, r_3, r_1) (notice that registers are sorted by the order in which they are first written in α). Also, for each register r and solo execution α that writes to r, let ind(α, r) be the index of register r in the trace of α. That is, if trace(α) = (r_2, r_3, r_1), then ind(α, r_3) = 2.
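In code, the trace of an execution is straightforward to compute; a minimal sketch, with hypothetical register names:

```python
def trace(execution):
    """Sequence of registers written during a solo execution, in order of
    first write, with duplicate registers omitted; reads are ignored."""
    seen, out = set(), []
    for op, reg in execution:
        if op == 'write' and reg not in seen:
            seen.add(reg)
            out.append(reg)
    return out

# The example from the text: writes to r2, r3, r2 again, then r1.
run = [('write', 'r2'), ('write', 'r3'), ('write', 'r2'), ('write', 'r1')]
assert trace(run) == ['r2', 'r3', 'r1']

# ind(run, r) is then simply the (1-based) position of r in the trace:
ind = {r: j for j, r in enumerate(trace(run), start=1)}
assert ind['r3'] == 2
```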

Our lower bound relies heavily on double-counting techniques. To familiarize the reader with the notation and provide some intuition, we isolate and prove the following simple properties of traces. We fix a solo execution α of a processor p, and the corresponding notation, as defined above.

Lemma 4.1.

Given the above notation, we have that Σ_{r ∈ trace(α)} R(r) ≥ n − 1.

Proof.

Recall that every other processor q must, in each of its solo executions, read from some register which p writes to in its solo execution α. Otherwise, there is an interleaving of p’s solo execution α, followed by q’s execution, which neither p nor q can distinguish from their respective solo executions. Therefore, in this interleaved execution, p and q will both return win, which leads to a contradiction.

This means that, for every solo execution β ∈ E_q of every processor q ≠ p, there exists a register r ∈ trace(α) such that β reads r, and hence:

Σ_{r ∈ trace(α)} R(r) ≥ Σ_{q ≠ p} Σ_{β ∈ E_q} Pr[β] = n − 1. ∎

Before proving the next lemma, we provide some intuition from the deterministic setting. For each processor p, assume that there exists a single solo execution α_p such that Pr[α_p] = 1, and consider the sum Σ_p Σ_{r ∈ trace(α_p)} R(r). For each register r, we know that R(r) appears exactly W(r) times in this summation: once for every processor that writes to r in its solo execution. Hence, Σ_p Σ_{r ∈ trace(α_p)} R(r) = Σ_{r ∈ R} W(r) · R(r).

Lemma 4.2.

Σ_p Σ_{α ∈ E_p} Pr[α] · Σ_{r ∈ trace(α)} R(r) = Σ_{r ∈ R} W(r) · R(r).   (1)

Proof.

We have that

Σ_p Σ_{α ∈ E_p} Pr[α] · Σ_{r ∈ trace(α)} R(r) = Σ_{r ∈ R} R(r) · Σ_p Σ_{α ∈ E_p : r ∈ trace(α)} Pr[α] = Σ_{r ∈ R} R(r) · W(r),

where in the first equality we simply rearranged the terms, and in the second we used the definition of the write potential, since r ∈ trace(α) exactly when α writes to r. ∎
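As a sanity check on this double-counting identity—read here as Σ_p Σ_{α ∈ E_p} Pr[α] · Σ_{r ∈ trace(α)} R(r) = Σ_r W(r) · R(r), which is our reading of Lemma 4.2—one can evaluate both sides on a small synthetic distribution over solo executions (the model below is entirely made up for testing purposes):

```python
import random

random.seed(1)
registers = ['r1', 'r2', 'r3', 'r4']

# Synthetic model: for each of 3 processors, a distribution over a few solo
# executions, each a tuple (probability, registers_read, registers_written).
procs = []
for _ in range(3):
    weights = [random.random() for _ in range(3)]
    total = sum(weights)
    execs = []
    for w in weights:
        reads = {r for r in registers if random.random() < 0.5}
        writes = random.sample(registers, k=2)   # distinct registers written
        execs.append((w / total, reads, writes))
    procs.append(execs)

def read_potential(r):
    return sum(pr for execs in procs for (pr, reads, _) in execs if r in reads)

def write_potential(r):
    return sum(pr for execs in procs for (pr, _, writes) in execs if r in writes)

def trace(writes):
    out = []
    for reg in writes:       # first-write order, duplicates dropped
        if reg not in out:
            out.append(reg)
    return out

lhs = sum(pr * sum(read_potential(r) for r in trace(writes))
          for execs in procs for (pr, _, writes) in execs)
rhs = sum(write_potential(r) * read_potential(r) for r in registers)
assert abs(lhs - rhs) < 1e-9   # the two double counts agree
```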

Finally, we will need the following useful property.

Lemma 4.3.

For any sequence of positive real numbers a_1, …, a_m, we have that

Σ_{i=2}^{m} a_i / (a_1 + … + a_{i−1}) ≥ ln(a_1 + … + a_m) − ln(a_1).

Proof.

Let S_i = a_1 + … + a_i. Notice that for any i ≥ 2:

a_i / S_{i−1} = S_i / S_{i−1} − 1 ≥ ln(S_i / S_{i−1}) = ln(S_i) − ln(S_{i−1}),

where we used that y − 1 ≥ ln y for all y > 0. Hence, summing over i, the right-hand side telescopes:

Σ_{i=2}^{m} a_i / S_{i−1} ≥ ln(S_m) − ln(S_1) = ln(a_1 + … + a_m) − ln(a_1). ∎
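One plausible reading of Lemma 4.3—an assumption on our part, reconstructed from the telescoping proof—is Σ_{i=2}^m a_i / (a_1 + … + a_{i−1}) ≥ ln((a_1 + … + a_m) / a_1); this form is easy to stress-test numerically:

```python
import math
import random

def lhs(a):
    """Sum of a_i divided by the preceding prefix sum S_{i-1}."""
    s = a[0]
    total = 0.0
    for x in a[1:]:
        total += x / s
        s += x
    return total

def rhs(a):
    """ln of the full sum over the first element."""
    return math.log(sum(a) / a[0])

random.seed(7)
for _ in range(1000):
    a = [random.uniform(0.01, 10.0) for _ in range(random.randint(1, 8))]
    assert lhs(a) >= rhs(a) - 1e-12   # the inequality holds on random inputs
```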

4.2 The “Carefully-Normalized” Read Potential Lemma

Our lower bound is based on the following key lemma, which intuitively provides a lower bound over the sum of the read potentials of registers written to in a solo execution by processor . Importantly, the read potentials are carefully normalized by, roughly, the probability that these registers are written to by other processors in some other executions.

Lemma 4.4.

Let α be a solo execution of processor p, and let trace(α) = (r_1, …, r_m) be its trace. Then,

Σ_{j=1}^{m} R(r_j) / W(r_j) ≥ ln n.

The rest of this sub-section will be dedicated to proving this lemma. Specifically, we prove the following two claims in the context of the lemma, i.e. for a fixed solo execution α of processor p, whose trace we expand as (r_1, …, r_m).

Claim 1.

Σ_{j=1}^{m} R(r_j) / (1 + R(r_1) + … + R(r_{j−1})) ≥ ln n.

Proof.

Applying Lemma 4.3 to the positive real numbers 1, R(r_1), …, R(r_m), we get:

Σ_{j=1}^{m} R(r_j) / (1 + R(r_1) + … + R(r_{j−1})) ≥ ln(1 + R(r_1) + … + R(r_m)) − ln 1.

By Lemma 4.1, R(r_1) + … + R(r_m) ≥ n − 1, completing the proof. ∎

Next, we prove the following extension:

Claim 2.

Under the above notation, we have

Proof.

Let us substitute the definition of . We want to prove that:

Both the left-hand and right-hand sides of this expression contain sums of probabilities of certain solo executions. On the right-hand side, (the probability of) any execution of processor can appear at most once. This is not necessarily true for the left-hand side, due to the outer summation. Therefore, it suffices to show that for any whose probability is included in the summation on the right-hand side, is also included in the summation on the left-hand side; in other words, there exists , such that .

We prove this fact by contradiction. Suppose processor has a solo execution such that register is written to during , but no register among is read (these are all the registers written prior to in ’s solo execution ). Now consider a combined execution of and , which consists of running as in until it becomes poised to write to register ; crucially, note that up to this point has executed solo. From this point on, processor executes exactly as in its solo execution . This is possible because the only registers written to so far in the system are , none of which reads in . As a result, will write to register , after which we can immediately let also write to . Thus, two processors write to the same register in the same execution, contradicting the SWMR property. ∎

Lemma 4.4 then follows by combining Claim 1, Claim 2, and the definition of the trace.

4.3 Completing the Lower Bound Proof

We now finally proceed to proving the following theorem:

Theorem 4.5.

Any randomized leader election protocol in asynchronous shared-memory with SWMR registers has worst-case expected solo step complexity.

Proof.

We start by summing up inequalities given by Lemma 4.4 for all processors, and their solo executions:

(2)

where in the last step we used that . Hence, by using and Lemma 4.2 we get that

Note that is a lower bound on the expected total number of reads from register . Hence, since the expected total number of reads is at least , there must exist a processor which performs at least reads in expectation. ∎

4.4 Extension for Bounded Concurrent-Write Contention

We now extend our result to the case of bounded concurrent-write contention, i.e., where the maximum number of processors that may be poised to write concurrently to a register is bounded. Specifically, suppose that, in any execution, at most different processors may be poised to write to the same register. We preserve the notation from the previous section. Upon close examination, notice that Lemma 4.1 and Lemma 4.2 still hold in this MWMR model, as does Claim 1, since they do not employ the SWMR property. (By contrast, Claim 2 no longer holds for .) We therefore continue to use only these results.

We will prove a lower bound on the expected solo step complexity, under the above assumptions on . As before, let be the trace of execution .

Lemma 4.6.

We have that

.

Proof.

The proof is similar to the proof of Theorem 4.5, but using Claim 1 directly instead of Lemma 4.4. Specifically, we start by summing up the inequalities resulting from Claim 1 for all processors and solo executions:

The last equality follows by re-arranging terms to be grouped by register instead of by processor. Note that this is similar to the proof of Lemma 4.2. However, in this case the resulting equation cannot be simplified further, since, unlike , the denominator term also depends on the execution . ∎

For any register , we call the set of processors a poise set for if:

  • contains solo executions of different processors, i.e. , such that , and for .

  • Let be the prefix of up to and including ’s first write step to the register . There exists a combined execution by processors , such that at the end of all processors have written to . Moreover, is indistinguishable from to (i.e. takes steps as in and does not read anything written by until it writes to ).

As processors can be poised to write to in the combined execution, no poise set can have size .

Lemma 4.7.

Let be a set of solo executions. Let be the maximum size of a poise set for register among executions in . Then, there exists a subset of executions , such that:

  • ;

  • Every poise set for register among executions in has size at most .

Proof.

Let , i.e. the sum of probabilities of executions in , excluding executions by processors. We define the set as follows:

So, an execution is included in if the sum of read potentials of registers written prior to in is lower bounded by a term that depends on . Notice that for this is analogous to the condition in Claim 2.

The parameter satisfies the following useful property:

(3)

where in (3) we have used that, from the definition of , .
Using this property, we get:

This proves the first part of the lemma. We prove the second part of the lemma by contradiction. Suppose there is a poise set for register among executions in , where is an execution of processor .

Consider any execution for . Execution must read one of the registers written during some time step before the point when is written in . Otherwise, , and more precisely, the prefix of up to the write to , can be added at the end of ’s interleaved execution, implying that would be a poise set of size among executions in , which does not exist by definition. Hence:

Notice how this generalizes Claim 2: we can now apply the pigeonhole principle to the terms on the left-hand side. We get that for some , , giving the desired contradiction, namely that consists of executions from only. This completes the proof of the lemma. ∎

We are now ready to prove the main result of this section.

Theorem 4.8.

Any randomized leader election protocol in asynchronous shared-memory has worst-case expected solo step complexity, when is the maximum number of processors which may be poised to write concurrently to the same register.

Proof.

Fix a register . We start by applying Lemma 4.7 to the set and the maximum poise set size of . Let be the resulting subset of executions , and be the maximum size of a poise set among executions in . Next, we apply Lemma 4.7 again to , and define , and as the maximum size of a poise set among executions in . The next application of Lemma 4.7 will be to , defining and . We repeat the process until some becomes , implying that the set of remaining executions is empty. Since , Lemma 4.7 is applied at most times. We therefore obtain:

as there are at most terms, each of which is upper bounded by , by Lemma 4.7.

Combined with Lemma 4.6, this gives . By the pigeonhole principle, some processor must perform at least reads in expectation, over its solo executions. ∎

5 A Complementary Upper Bound for Weak Leader Election

Figure 1: The classic Lamport splitter [25], restated following [26, 6].

It is interesting to consider whether the lower bound approach can be further improved to address the MWMR model under -concurrent write contention. This is not the case for the specific definition of the weak leader election problem we consider (Definition 1), to which the lower bound applies. To establish this, it suffices to notice that the classic splitter construction of Lamport [25] solves weak leader election for processes in constant time, by leveraging MWMR registers with maximal (concurrent) write contention .

Recall that this construction, restated for convenience in Figure 1, uses two MWMR registers. Given a splitter, we simply map the stop output to win, and the left and right outputs to lose. It is then immediate to show that the splitter ensures the following:

  1. a processor will always return win in a solo execution, and

  2. no two processes may return win in the same execution.

This matches the requirements of the weak leader election problem, but not those of test-and-set objects in general, as this algorithm admits contended executions in which all processors return lose, which is also impractical.
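As a sanity check on these two properties, the splitter can be modeled at shared-memory-step granularity and its executions interleaved exhaustively. The sketch below is a minimal Python model, not the paper's pseudocode; the register names (`race`, `door`) and the scheduler are illustrative.

```python
class Splitter:
    """Sketch of Lamport's splitter using two MWMR registers.

    Register names are illustrative; Figure 1 gives the original
    pseudocode. `race` holds a process id, `door` is a boolean flag.
    """

    def __init__(self):
        self.race = None   # MWMR register: id of the last writer
        self.door = False  # MWMR register: set once some process passes

    def run(self, pid):
        """Generator with one yield per shared-memory step, so a
        scheduler can interleave processes at step granularity."""
        self.race = pid                  # write race
        yield
        closed = self.door               # read door
        yield
        if closed:
            return "lose"                # 'right' output, mapped to lose
        self.door = True                 # write door
        yield
        last = self.race                 # read race
        yield
        # 'stop' is mapped to win; 'left' is mapped to lose
        return "stop" if last == pid else "lose"


def execute(pids, schedule):
    """Run one splitter under the given step schedule, then let any
    unfinished process run to completion (an obstruction-free drain)."""
    splitter = Splitter()
    gens = {p: splitter.run(p) for p in pids}
    out = {}

    def step(p):
        if p in out:
            return
        try:
            next(gens[p])
        except StopIteration as done:
            out[p] = done.value          # generator's return value

    for p in schedule:
        step(p)
    for p in pids:
        while p not in out:
            step(p)
    return out
```

Exhaustively checking all two-process step interleavings confirms that at most one process returns stop, while a solo execution always returns stop.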

One may further generalize this approach by defining -splitter objects for , each of which is restricted to participating processors (and thus also -concurrent write contention), and then arranging them in a complete -ary tree. We can then proceed similarly to a tournament tree to implement a weak leader election object. The resulting construction has step complexity in solo executions, suggesting that the dependency on provided by our argument can be further improved.
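The binary instance of this construction (ordinary splitters at each tree node) can be sketched as follows. The class names and heap-style node indexing are illustrative, and the splitter here is a plain sequential model rather than an interleaved one.

```python
class Splitter:
    """Sequential sketch of a Lamport splitter (steps not interleaved here)."""

    def __init__(self):
        self.race = None
        self.door = False

    def visit(self, pid):
        self.race = pid
        if self.door:
            return "lose"          # 'right' output
        self.door = True
        # 'stop' on success, 'left' (mapped to lose) otherwise
        return "stop" if self.race == pid else "lose"


class SplitterTree:
    """Weak leader election from a complete binary tree of splitters.

    A processor starts at its leaf, ascends to the root, and returns win
    only if it wins ('stop') every splitter on its root path. A solo
    execution wins every splitter, so the solo processor returns win;
    contended executions may cause every processor to lose, which is
    allowed by weak leader election.
    """

    def __init__(self, n):
        assert n >= 2 and n & (n - 1) == 0   # power of two, for simplicity
        self.n = n
        self.nodes = {i: Splitter() for i in range(1, n)}  # internal nodes

    def elect(self, pid):
        node = self.n + pid        # leaf index in implicit heap layout
        while node > 1:
            node //= 2             # ascend to the parent splitter
            if self.nodes[node].visit(pid) != "stop":
                return "lose"
        return "win"
```

With k-splitters at each node the tree depth drops to roughly log n / log k, which is the source of the improved solo step complexity mentioned above.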

This observation suggests that the trade-off between step complexity and concurrent-write contention/worst-case stalls outlined by our lower bound may be the best one can prove for weak leader election, as this problem can be solved in constant time with MWMR registers, at the cost of linear worst-case stalls. At the same time, it shows that lower bound arguments wishing to approach the general version of the problem have to specifically leverage the fact that, even in contended executions, not all processors may return lose.

6 Conclusion

Overview.

We gave the first tight logarithmic lower bounds on the solo step complexity of leader election in an asynchronous shared-memory model with single-writer multi-reader (SWMR) registers, for both deterministic and randomized algorithms. We then extended these results to registers with bounded concurrent-write contention , showing a trade-off between the solo step complexity of algorithms and their worst-case stall complexity. The approach admits additional extensions, and is tight in the SWMR case. The impossibility result is quite strong, in the sense that logarithmic time is required over solo executions of processors, and for a weak variant of leader election, which is not linearizable and allows all processors to return lose in contended executions.

Future Work.

The key question left open is whether sub-logarithmic upper bounds for strong leader election / test-and-set may exist, specifically by leveraging multi-writer registers, or whether the lower bounds can be further strengthened. Another interesting question is whether our approach can be extended to handle different cost metrics, such as remote memory references (RMRs).

References

  • [1] Yehuda Afek, Eli Gafni, John Tromp, and Paul M. B. Vitányi. Wait-free test-and-set (extended abstract). In WDAG ’92: Proceedings of the 6th International Workshop on Distributed Algorithms, pages 85–94, 1992.
  • [2] Dan Alistarh and James Aspnes. Sub-logarithmic test-and-set against a weak adversary. In International Symposium on Distributed Computing, pages 97–109. Springer, 2011.
  • [3] Dan Alistarh, James Aspnes, Keren Censor-Hillel, Seth Gilbert, and Rachid Guerraoui. Tight bounds for asynchronous renaming. Journal of the ACM (JACM), 61(3):1–51, 2014.
  • [4] Dan Alistarh, Hagit Attiya, Seth Gilbert, Andrei Giurgiu, and Rachid Guerraoui. Fast randomized test-and-set and renaming. In Proceedings of the 24th international conference on Distributed computing, DISC’10, pages 94–108, Berlin, Heidelberg, 2010. Springer-Verlag. URL: http://portal.acm.org/citation.cfm?id=1888781.1888794.
  • [5] Dan Alistarh, Rati Gelashvili, and Adrian Vladu. How to elect a leader faster than a tournament. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 365–374, 2015.
  • [6] James Aspnes. Notes on theory of distributed systems. arXiv preprint arXiv:2001.04235, 2020.
  • [7] James Aspnes, Keren Censor-Hillel, Hagit Attiya, and Danny Hendler. Lower bounds for restricted-use objects. SIAM Journal on Computing, 45(3):734–762, 2016.
  • [8] Hagit Attiya and Keren Censor. Tight bounds for asynchronous randomized consensus. J. ACM, 55(5):1–26, 2008. doi:http://doi.acm.org/10.1145/1411509.1411510.
  • [9] Hagit Attiya and Faith Ellen. Impossibility results for distributed computing. Synthesis Lectures on Distributed Computing Theory, 5(1):1–162, 2014.
  • [10] James E Burns and Nancy A Lynch. Bounds on shared memory for mutual exclusion. Information and Computation, 107(2):171–184, 1993.
  • [11] Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in shared memory algorithms. Journal of the ACM (JACM), 44(6):779–805, 1997.
  • [12] Aryaz Eghbali and Philipp Woelfel. An almost tight RMR lower bound for abortable test-and-set. arXiv preprint arXiv:1805.04840, 2018.
  • [13] Faith Ellen, Danny Hendler, and Nir Shavit. On the inherent sequentiality of concurrent objects. SIAM Journal on Computing, 41(3):519–536, 2012.
  • [14] George Giakkoupis, Maryam Helmi, Lisa Higham, and Philipp Woelfel. An O(√n) space bound for obstruction-free leader election. In International Symposium on Distributed Computing, pages 46–60. Springer, 2013.
  • [15] George Giakkoupis, Maryam Helmi, Lisa Higham, and Philipp Woelfel. Test-and-set in optimal space. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 615–623, 2015.
  • [16] George Giakkoupis and Philipp Woelfel. Efficient randomized test-and-set implementations. Distributed Computing, 32(6):565–586, 2019.
  • [17] Wojciech Golab, Danny Hendler, and Philipp Woelfel. An O(1) RMRs leader election algorithm. SIAM Journal on Computing, 39(7):2726–2760, 2010.
  • [18] Danny Hendler and Nir Shavit. Operation-valency and the cost of coordination. In Proceedings of the twenty-second annual symposium on Principles of distributed computing, pages 84–91, 2003.
  • [19] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):123–149, January 1991.
  • [20] Maurice Herlihy, Victor Luchangco, and Mark Moir. Obstruction-free synchronization: Double-ended queues as an example. In 23rd International Conference on Distributed Computing Systems, 2003. Proceedings., pages 522–529. IEEE, 2003.
  • [21] Maurice P Herlihy and Jeannette M Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990.
  • [22] Prasad Jayanti. A time complexity lower bound for randomized implementations of some shared objects. In Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, pages 201–210, 1998.
  • [23] Prasad Jayanti, King Tan, and Sam Toueg. Time and space lower bounds for nonblocking implementations. SIAM Journal on Computing, 30(2):438–456, 2000.
  • [24] Yong-Jik Kim and James H Anderson. A time complexity lower bound for adaptive mutual exclusion. Distributed Computing, 24(6):271–297, 2012.
  • [25] Leslie Lamport. A fast mutual exclusion algorithm. ACM Transactions on Computer Systems (TOCS), 5(1):1–11, 1987.
  • [26] Mark Moir and James H. Anderson. Wait-free algorithms for fast, long-lived renaming. Science of Computer Programming, 25:1–39, 1995.
  • [27] Gary L. Peterson and Michael J. Fischer. Economical solutions for the critical section problem in a distributed system (extended abstract). In Proceedings of the ninth annual ACM symposium on Theory of computing, STOC ’77, pages 91–97, New York, NY, USA, 1977. ACM. URL: http://doi.acm.org/10.1145/800105.803398, doi:http://doi.acm.org/10.1145/800105.803398.
  • [28] Eugene Styer and Gary L Peterson. Tight bounds for shared memory symmetric mutual exclusion problems. In Proceedings of the eighth annual ACM Symposium on Principles of distributed computing, pages 177–191, 1989.
  • [29] John Tromp and Paul Vitányi. Randomized two-process wait-free test-and-set. Distrib. Comput., 15(3):127–135, 2002. doi:http://dx.doi.org/10.1007/s004460200071.
  • [30] Jae-Heon Yang and James H Anderson. Time bounds for mutual exclusion and related problems. In Proceedings of the twenty-sixth annual ACM symposium on Theory of Computing, pages 224–233, 1994.