1 Introduction
Leader election is a classic distributed coordination problem, in which a set of processors must cooperate to decide on the choice of a single “leader” processor. Each processor must output either a win or lose decision, with the property that, in any execution, at most a single processor may return win, while all other processors have to return lose. Moreover, any processor must return win in a solo execution, in which it does not observe any other processor.
Due to its fundamental nature, the time and space complexity of variants of this problem in the classic asynchronous shared-memory model has been the subject of significant research interest. Leader election and its linearizable variant, called test-and-set, are weaker than consensus, as processors can decide without knowing the leader’s identifier. Test-and-set differs from leader election in that no processor may return lose before the eventual winner has joined the computation; it has consensus number two, and therefore cannot be implemented deterministically wait-free from reads and writes [19]. Tromp and Vitányi gave the first randomized algorithm for two-processor leader election [29], and Afek, Gafni, Tromp and Vitányi [1] generalized this approach to $n$ processors, using the tournament-tree idea of Peterson and Fischer [27].
Their algorithm builds a complete binary tree with $n$ leaves; each processor starts at a leaf, and proceeds to compete in two-processor leader-election objects located at the internal nodes, returning lose whenever it loses at such an object. The winner at the root returns win. Since each two-processor object can be resolved in expected constant time, their algorithm has $O(\log n)$ expected step complexity against an adaptive adversary. Moreover, their algorithm only uses single-writer multi-reader (SWMR) registers: throughout any execution, any register may only be written by a single processor, although it may be read by any processor.
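To build intuition for the logarithmic step complexity of this approach, here is a toy, synchronous, round-based simulation of the tournament structure, not the asynchronous algorithm of [1] itself; each two-processor leader-election object is modeled simply as a coin flip:

```python
import random

def tournament_winner(processors, rng):
    """Round-based toy simulation of the tournament tree: at each level,
    pairs compete in a two-processor leader election (modeled as a coin
    flip); losers return 'lose', and the root winner returns 'win'."""
    rounds = 0
    alive = list(processors)
    while len(alive) > 1:
        rounds += 1
        nxt = []
        for i in range(0, len(alive), 2):
            pair = alive[i:i + 2]
            # winner of this two-processor object advances one level
            nxt.append(pair[rng.randrange(len(pair))])
        alive = nxt
    return alive[0], rounds

winner, rounds = tournament_winner(list(range(16)), random.Random(0))
```

With $n = 16$ leaves, the simulation performs exactly $\log_2 16 = 4$ rounds, mirroring the logarithmic depth of the tree.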
Follow-up work on time upper bounds has extended these results to the adaptive setting, showing $O(\log k)$ expected step complexity, where $k$ is the number of participating processors [4, 16]. Further, Giakkoupis and Woelfel [16] showed that, if the adversary is oblivious to the randomness used by the algorithm, $O(\log^* k)$ expected step complexity is achievable, improving upon a previous sublogarithmic upper bound by Alistarh and Aspnes [2]. Another related line of work has focused on the space complexity of this problem, which is now resolved. Specifically, it is known that $\Omega(\log n)$ distinct registers are necessary [28, 16], and a breakthrough result by Giakkoupis, Helmi, Higham, and Woelfel [15] provided the first asymptotically matching upper bound of $O(\log n)$ registers, improving upon an $O(\sqrt{n})$-space algorithm by the same authors [14].
The clear gap in the complexity landscape for this problem concerns time complexity lower bounds. Specifically, in the standard case of an adaptive adversary, the best known upper bound is given by the venerable tournament-tree algorithm described above [1], which has $O(\log n)$ expected time complexity and uses SWMR registers. It is not known whether one can perform leader election in classic asynchronous shared-memory faster than a tournament. (Sublogarithmic step complexity is achievable in other models, e.g. distributed and cache-coherent shared-memory [17] or message-passing [5].) Due to the simplicity of the problem, none of the classic lower bound approaches, e.g. [23, 24, 22], apply, and resolving the time complexity of shared-memory leader election is known to be a challenging open problem [2, 16]. Moreover, given that the step complexities of shared-memory consensus [8] and renaming [3] have been resolved, leader election remains one of the last basic objects for which no tight complexity bounds are known.
We show tight logarithmic lower bounds on the step complexity of leader election in asynchronous shared-memory with SWMR registers. Our motivating result is a natural potential argument showing that any deterministic obstruction-free algorithm for leader election, in which processors must return a decision if they execute enough solo steps, must have $\Omega(\log n)$ worst-case step complexity in solo executions; that is, the bound holds even if processors execute in the absence of concurrency, as long as registers are SWMR.
Our main contribution is a new and nontrivial technique showing that a similar statement holds for randomized algorithms: in the same model, any obstruction-free algorithm for leader election has $\Omega(\log n)$ worst-case expected step complexity. In this case as well, the lower bound holds in terms of expected step complexity in solo executions. The technique is based on characterizing the expected length of solo executions by analyzing the number of reads and writes over distinct registers required by a correct algorithm.
These are the first nontrivial lower bounds on the time complexity of classic shared-memory leader election, although they assume restrictions on the algorithms. They are both matched asymptotically by the tournament-tree approach, as the algorithm of [1] can be modified to be deterministic obstruction-free, by using two-processor obstruction-free leader election objects. This essentially shows that the tournament strategy is optimal for SWMR registers. The results also apply to the case where algorithms may employ stronger two-processor read-modify-write primitives, such as two-processor test-and-set operations, instead of reads and exclusive writes. Interestingly, the result holds even for a weak version of leader election, in which all processors may return lose in a contended execution.
The main limitation of the approach concerns the SWMR restriction on the registers used by the algorithm. We investigate relaxations of this restriction, and show that, for deterministic algorithms, if $k$ is the maximum number of processors which might be poised to write to a register in any given execution, then any algorithm will have worst-case solo step complexity $\Omega((\log n)/k)$. Conversely, any algorithm with worst-case solo step complexity $s = o(\log n)$ will have an execution in which a super-constant number, $\Omega((\log n)/s)$, of distinct processors are poised to write concurrently to the same register. Since this latter quantity is an asymptotic lower bound on the worst-case stall complexity at a processor [13] (if $k$ processors are poised to write concurrently to a register, then the last processor to write will incur $k-1$ stalls), this yields a logarithmic trade-off between the worst-case slowdown due to steps in a solo execution, and the worst-case slowdown at a processor due to high register contention, measured in stalls, for any deterministic algorithm.
We generalize this argument to the randomized case as well, showing that any algorithm ensuring at most $k$ worst-case stalls at a processor must have expected solo step complexity $\Omega((\log n)/k)$. In practical terms, our results show that any gain made due to decreased steps on the solo fast-path is paid for by an increase in the worst-case stall complexity at a processor incurred by any obstruction-free leader election algorithm.
Additional Related Work.
The previous section already covered known time and space complexity results for the classic leader election problem in the standard asynchronous shared-memory model. This fundamental problem has also been considered under related models and complexity metrics. Specifically, Golab, Hendler and Woelfel [17] have shown that leader election can be solved using a constant number of remote memory references (RMRs) in the cache-coherent (CC) and distributed shared-memory (DSM) models. Their result circumvents our lower bounds due to differences in the model and in the complexity metrics. In the same model, Eghbali and Woelfel [12] have shown that abortable leader election requires $\Omega(\log n / \log\log n)$ time in the worst case. The abortability constraint imposes stronger semantics; they consider a different notion of complexity cost, but allow multi-writer registers.
In addition, our results are related to early work by Anderson and Yang [30], who prove lower bounds for a weak version of mutual exclusion, assuming bounded write contention per register. Upon careful consideration, one can see that their approach can be used to prove a similar logarithmic lower bound for obstruction-free leader election in the read-write model with contention constraints. However, we emphasize that their argument works only for deterministic algorithms.
Specifically, relative to this prior work, our contribution is the randomized lower bound. The argument of Anderson and Yang [30] does not generalize to randomized algorithms, for the same reason that the simple deterministic argument we provide as motivation does not generalize to the randomized case. Even focusing on the deterministic case, our approach is slightly different from theirs: we use covering plus a potential argument, while they use a different covering argument, based on eliminating contending processors by leveraging Turán’s theorem. However, their approach can provide a better dependency on contention in the bound: $\Omega(\log n / \log k)$ in their case, versus $\Omega((\log n)/k)$ in ours.
We note that similar tradeoffs between contention and step complexity have been studied by Dwork, Herlihy and Waarts [11], and by Hendler and Shavit [18], although in the context of different objects, and for slightly different notions of cost. We believe this paper is the first to approach such questions for randomized algorithms, and for leader election.
From the technical perspective, the simple deterministic argument we propose can be viewed as a covering argument [23, 10, 9, 24, 7], customized for the leader-election problem, and leveraging the SWMR property. The new observation is the potential argument showing that some processor must incur $\Omega(\log n)$ distinct steps in a solo execution. To our knowledge, the lower bound approach for randomized algorithms is new. The generalized argument for bounded concurrent-write contention implies bounds in terms of the stall metric of Ellen, Hendler and Shavit [13], which has also been employed by other work on lower bounds, e.g. [7]. These prior approaches do not apply to leader election.
2 Model, Preliminaries, and Problem Statement
We assume the asynchronous shared-memory model, in which $n$ processors may participate in an execution, of which up to $n-1$ may fail by crashing. Processors are equipped with unique identifiers, which they may use during the computation. For simplicity, we will directly use the corresponding indices $1, 2, \ldots, n$ to identify processors in the following, and denote the set of all processors by $P$. Unless otherwise stated, we assume that processors communicate via atomic read and write operations applied to a finite set of registers. The scheduling of processor steps is controlled by a strong (adaptive) adversary, which can observe the structure of the algorithm and the full state of processors, including their random coin flips, before deciding on the scheduling.
As stated, our approach assumes that the number of processors which may be poised to write to any given register during an execution is deterministically bounded. Specifically, for an integer parameter $k \ge 1$, we assume algorithms ensure concurrent write contention at most $k$: in any execution of the algorithm, at most $k$ processors may be concurrently poised to write to any given register. We note that, equivalently, we could assume that the worst-case write-stall complexity of the algorithm is $k - 1$, as having $k$ processors concurrently poised to write to a given register necessarily implies that the “last” processor scheduled to write incurs $k - 1$ stalls, one for each of the other writes.
Notice that this assumption implies a (possibly random) mapping between each register and the set of processors which write to it in every execution. For $k = 1$, we obtain a variant of the SWMR model, in which a single processor may write to a given register in any fixed execution. Specifically, we emphasize that we allow this mapping between registers and writers to change between executions: different processors may write to the same register, but in different executions. This is a generalization of the classic SWMR property, which usually assumes that the processor-to-register mapping is fixed across all executions.
Without loss of generality, we will assume that algorithms follow a fixed pattern, consisting of repetitions of the following sequence: 1) a shared read operation, possibly followed by local computation, including random coin flips, and 2) a shared write operation, again possibly followed by local computation and coin flips. Note that any algorithm can be rewritten to follow this pattern without changing its asymptotic step complexity: if necessary, one can insert dummy read and write operations to a dedicated NULL register.
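As an illustration of this rewriting, the following sketch (a hypothetical helper, not from the paper) pads an arbitrary sequence of shared read/write steps into the alternating read-write pattern by inserting dummy accesses to a NULL register, at most doubling the number of steps:

```python
def normalize(steps):
    """Rewrite a sequence of shared-memory steps ('R' or 'W') into the
    canonical alternating pattern R, W, R, W, ... by inserting dummy
    accesses to a dedicated NULL register where needed."""
    out, expect = [], 'R'
    for s in steps:
        if s != expect:
            # insert a dummy step to restore the alternating pattern
            out.append(expect + '(NULL)')
            expect = 'W' if expect == 'R' else 'R'
        out.append(s)
        expect = 'W' if expect == 'R' else 'R'
    return out

pattern = normalize(['W', 'W', 'R'])
```

Each original step causes at most one dummy insertion, so the asymptotic step complexity is unchanged.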
We measure complexity in terms of processor steps: each sharedmemory operation is counted as a step. Total step complexity will count the total number of processor steps in an execution, while individual step complexity, which is our focus, is the number of steps that any single processor may perform during any execution.
We now introduce some basic terminology and notation for the analysis, following the approach of Attiya and Ellen [9]. We view the algorithm as specifying the set of possible states for each processor. At any point in time, for any processor, there exists a single next step that the processor is poised to take, which may be either a shared-memory read or write step. Following the step, the processor changes state, based on its previous state, the response received from the shared step (e.g., the result of a read), and its local computation or coin flips. Deterministic protocols have the property that the processor state following a step is exclusively determined by the previous state and the result of the shared step, e.g. the value read. Randomized protocols have the property that the processor has multiple possible next steps, based on the results of local coin flips following the shared-memory step. Each of these possible next steps has a certain nonzero probability. As standard, we assume that the randomness provided to the algorithm is finite-precision, and so the number of possible next steps at each point is countable. (Our analysis would also work in the absence of this requirement; however, it appears to be standard, and it simplifies the presentation, as it allows us to sum, rather than integrate, over possible executions.)
A configuration of the algorithm is completely determined by the state of each processor, and by the contents of each register. We assume that initially all registers hold some predetermined value, and thus the initial configuration is determined only by the input state (or value) of each processor. Two configurations $C$ and $C'$ are said to be indistinguishable to processor $p$ if $p$ has the same state in $C$ and $C'$, and all registers have the same contents in both configurations.
A processor $p$ is said to be poised to perform step $s$, which could be a read or a write, in configuration $C$ if $s$ is the next step that $p$ will perform given $C$. Given a valid configuration $C$ and a valid next step $s$ by $p$, we denote the configuration after $s$ is performed by $p$ as $C \cdot s$. An execution is simply a sequence of such valid steps by processors, starting at the initial configuration. Thus, a configuration $C$ is reachable if there exists an execution resulting in $C$. In the following, we will pay particular attention to solo processor executions, that is, executions in which only a single processor takes steps.
Our progress requirement for algorithms will be obstruction-freedom [20], also known as solo-termination [23]. Specifically, an algorithm satisfies this condition if, from any reachable configuration $C$, any processor $p$ must eventually return a decision in every solo extension of $C$, i.e. in every extension of $C$ consisting only of steps by $p$.
In the following, we will prove lower bounds for the following simplified variant of the leader election problem.
Definition 1 (Weak Leader Election).
In the Weak Leader Election problem, each participating processor starts with its own identifier as input, and must return either win or lose. The following must hold:

(Leader Uniqueness) In any execution, at most one processor may return win.

(Solo Output) Any processor must return win in any execution in which it executes solo.
We note that this variant does not fully specify return values in contended executions: in particular, under this definition, all processors may technically return lose if they can detect that they are not in a solo execution. It also does not require linearizability [21], so it is weaker than test-and-set. Our results will apply to this weaker problem variant.
3 Lower Bound for Deterministic Algorithms
As a warm-up result, we provide a simple logarithmic lower bound on the solo step complexity of leader election with SWMR registers. Specifically, the rest of this section is dedicated to proving the following statement:
Theorem 3.1.
Any deterministic obstruction-free leader election protocol in asynchronous shared-memory with SWMR registers has $\Omega(\log n)$ worst-case solo step complexity.
3.1 Adversarial Strategy
We will specify the lower bound algorithmically, as an iterative procedure that the adversary can follow to create a worst-case execution. More precisely, the adversarial strategy proceeds in steps, and maintains two sets of processors at each step: the available set $A$ and the frozen set $F$. In addition, we maintain a prefix of the worst-case execution, which we denote by $E$.
Initially, all processors are in their initial state and placed in the pool of available processors $A$, while the set of frozen processors $F$ is empty, and the worst-case execution prefix $E$ is empty as well. In addition, we associate a blame counter $b_p$ with each available processor $p$, initially $b_p = 0$. Intuitively, $b_p$ represents the number of processors that were placed in the frozen set because of $p$.
In each step $t$, we first identify the processor $q$ whose blame count $b_q$ is minimal among processors from the available set $A$, breaking ties arbitrarily. We then execute the sequence of solo steps of processor $q$, until we first encounter a write step $w$ of $q$ to some register $r$ which is read by some available processor in its solo execution. Note that the step $w$ itself is not added to the execution prefix $E$. Below, in Lemma 3.3, we will show that such a write step by $q$ must necessarily exist: otherwise, we could run $q$ until it returns win, without this fact being visible to any other processor in the available set.
Having identified this first write step $w$ by $q$, we “freeze” processor $q$ exactly before $w$, and place it in the frozen set for the next step, removing it from $A$. We then append the solo steps executed by $q$ before $w$ to the worst-case execution prefix $E$. Finally, we increment the blame count $b_p$ by 1 for every available processor $p$ with the property that $p$ reads $r$ in its solo execution. At this point, step $t$ is complete, and we can move on to step $t+1$. The process stops when at most a single available processor remains.
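The strategy above can be exercised on a toy instance. In the sketch below (illustrative only; the register layout and solo executions are invented for the example), processor $i$ first writes its own register and then reads everyone else's register; the simulation tracks the blame counters and a potential of the form $\sum_{p \in A} 2^{b_p}$, one natural choice consistent with the analysis of the next subsection:

```python
# Toy SWMR instance (invented for illustration): processor i's solo
# execution first writes its own register r_i, then reads every other
# processor's register.
def solo(i, n):
    return [('W', 'r%d' % i)] + [('R', 'r%d' % j) for j in range(n) if j != i]

def adversary(n):
    execs = {p: solo(p, n) for p in range(n)}
    avail = set(range(n))
    blame = {p: 0 for p in range(n)}
    potentials = []
    while len(avail) > 1:
        potentials.append(sum(2 ** blame[p] for p in avail))
        q = min(avail, key=lambda p: blame[p])  # minimal blame count
        for op, reg in execs[q]:
            if op == 'W':
                readers = {p for p in avail - {q} if ('R', reg) in execs[p]}
                if readers:
                    break  # freeze q just before this write step
        avail.remove(q)
        for p in readers:
            blame[p] += 1  # p is "blamed" for freezing q
    last = avail.pop()
    potentials.append(2 ** blame[last])
    return blame[last], potentials

final_blame, potentials = adversary(8)
```

On this toy instance every frozen processor blames all remaining available processors, so the potential never decreases and the last surviving processor accumulates blame well above $\log_2 n$.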
3.2 Analysis
We begin by noting the following invariants, maintained by the adversarial strategy:
Lemma 3.2.
At the beginning of each step $t$, the adversarial strategy enforces the following invariants:

All available processors (those in $A$) are in their initial state;

The contents of all registers read by processors in $A$ during their respective solo executions are the same as in the initial configuration.
Proof.
Both claims follow from the structure of the construction. The first claim follows since the only processor which executes in any step is removed from $A$. The second claim follows since, at every step, we freeze the corresponding processor before it writes to any register read by any of the remaining processors in $A$. ∎
Notice that this result practically ensures that the execution prefix generated up to every step is indistinguishable from the initial configuration for the processors in the available set $A$. Next, we show that the strategy is well-defined, in the sense that the write step and processor specified above must exist at each iteration of the strategy.
Lemma 3.3.
Fix a step $t$ and let $q$ be the chosen processor, of minimal blame count $b_q$. Then there must exist a step $w$ in the solo execution of $q$ which writes to some register $r$ which is read by some available processor $p \in A \setminus \{q\}$ in its solo execution.
Proof.
We will begin by proving a slightly stronger statement: for any processor $p \in A \setminus \{q\}$, there must exist a register which is written by $q$ in its solo execution and read by $p$ in its solo execution. We will then choose $r$ to be the first such register written to by $q$ in its solo execution, and $w$ to be the corresponding write step.
Assume for contradiction that there exists a processor $p \in A \setminus \{q\}$ which does not read from any register written to by $q$ in its solo execution. By Lemma 3.2, the current execution is indistinguishable from a solo execution for $q$. Thus, if $q$ runs solo from the prefix $E$ until completion, $q$ must return win. However, if $p$ runs solo after $q$ returns, $p$ also must return win, since it does not read from any register which $q$ wrote to, and therefore, by Lemma 3.2, it observes a solo execution as well. This contradicts the leader uniqueness property in the resulting execution.
We have therefore established that every other available processor must eventually read from a register written to by $q$ in its solo execution. (Notice that these registers need not be distinct across processors.) To satisfy the requirements of the adversarial strategy, it suffices to pick $w$ to be the first such write step, in temporal order, in $q$'s solo execution. ∎
We now return to the proof, and focus on the blame counts of the available processors at any fixed step $t$. Define the potential at step $t$ to be $\Phi_t = \sum_{p \in A} 2^{b_p}$, where the sum is taken over the processors available at step $t$.
Since initially $|A| = n$ and $b_p = 0$ for all processors $p$, we have that $\Phi_0 = n$. Next, we show that, due to the way in which we choose the next processor to be executed, we can always lower bound this potential by $n$.
Lemma 3.4.
For any step $t$, we have $\Phi_t \ge n$.
Proof.
We will proceed by induction. The base case $\Phi_0 = n$ is outlined above. Fix therefore a step $t$ such that $\Phi_t \ge n$.
Again, let $q$ be the processor we freeze at step $t$. For each available processor $p$, let $\delta_p \in \{0, 1\}$ be the amount by which we incremented the blame count of processor $p$ in this step. By Lemma 3.3, there exists a processor $p'$ such that $\delta_{p'} = 1$. Further, since we chose to execute the processor with minimal blame count, we have that $b_{p'} \ge b_q$. Let us now analyze the difference
$\Phi_{t+1} - \Phi_t \;=\; -2^{b_q} + \sum_{p \,:\, \delta_p = 1} \left( 2^{b_p + 1} - 2^{b_p} \right) \;=\; -2^{b_q} + \sum_{p \,:\, \delta_p = 1} 2^{b_p} \;\ge\; 2^{b_{p'}} - 2^{b_q} \;\ge\; 0.$
Hence, $\Phi_{t+1} \ge \Phi_t \ge n$, as required. ∎
To complete the proof of Theorem 3.1, let $z$ be the last remaining non-frozen processor before the process completes, i.e. $A = \{z\}$. By Lemma 3.4, we have that $2^{b_z} = \Phi \ge n$, which implies $b_z \ge \log_2 n$. Further, notice that processor $z$ must have performed at least $b_z$ distinct read operations: for every increment of $b_z$, there must exist a unique frozen processor which writes, in its solo execution, to some register from which $z$ reads in its solo execution. Since we are assuming SWMR registers, the reads performed by $z$ must also be distinct. Hence, processor $z$ performs $\Omega(\log n)$ steps in a solo execution, implying an $\Omega(\log n)$ solo step complexity lower bound for the algorithm. This strongly suggests that the tournament-tree approach is optimal for SWMR registers.
3.3 Discussion
Bounded Concurrent-Write Contention and Stalls.
It is interesting to observe what happens to the above argument in the case of multi-writer registers. Let $k$ be the bound on the concurrent-write contention over any single register, in any execution, that is, on the maximum number of processors which may be concurrently poised to write to a register. Notice that the overall construction and the blaming mechanism would still work. Therefore, the potential lower bound still holds; however, in the proof of the last step, the $b_z$ steps taken by the last processor no longer need to be distinct. Specifically, we note that a single read step by $z$ may be counted at most $k$ times, once for each different processor which may be frozen upon its write to the corresponding register. The lower bound is therefore weakened linearly in $k$.
Corollary 3.4.1.
Any deterministic leader election protocol in asynchronous shared-memory in which at most $k$ processors may be poised to write to a register concurrently has worst-case solo step complexity $\Omega((\log n)/k)$. Moreover, if the lower bound construction above implies worst-case step complexity $s$ for a processor, then there must exist an execution in which the concurrent-write contention on some register is $\Omega((\log n)/s)$.
Recall that, when interpreted in the stall model of [13], having $k$ processors poised to write to a register at the same time implies $\Omega(k)$ (write-)stall complexity for one of the processors. Thus, this last result implies a logarithmic multiplicative trade-off between the worst-case step complexity of a protocol and its worst-case stall complexity.
Stronger Primitives.
We note that this approach can also be extended to deterministic algorithms employing SWMR registers supporting read and write, and additionally two-processor test-and-set objects. We can then apply the same freezing strategy, and note that an access to a test-and-set object can lead to freezing a processor, and incrementing the blame counter of another processor, at most once (otherwise, there would be a combined execution in which more than two processors access the object). Hence, we still obtain a lower bound of $\Omega(\log n)$ on the solo step complexity, i.e. the tournament tree remains the optimal strategy.
4 Lower Bound for Randomized Algorithms
We now shift gears and present our main result, which is a logarithmic lower bound on the expected step complexity of randomized obstruction-free algorithms. Our approach in this case will be different, as we are unable to build an explicit worst-case adversarial strategy. Instead, we will argue about the expected length of executions by bounding the expected number of reads over distinct registers required for the algorithm to be correct. In turn, this will require a careful analysis of the probability distribution over solo executions of a specific, well-chosen structure.
We first focus on the SWMR case, and cover it exclusively in Sections 4.1 to 4.3. We will then provide a generalization to MWMR registers under bounded concurrency in Section 4.4.
4.1 Preliminaries
For each processor $p$, we define $\mathcal{E}_p$ as the set of all possible solo executions of $p$, and will focus on understanding the probability distribution over reads and writes for executions in $\mathcal{E}_p$. By the solo output property of the algorithm (Definition 1), all these executions have to be finite in length. For any possible solo execution $E$ of processor $p$, $\pi(E)$ will be used to denote the probability that, if we let $p$ run solo, it will execute $E$ and return. In particular, $\sum_{E \in \mathcal{E}_p} \pi(E) = 1$.
Let $\mathcal{R}$ denote the set of all registers which could be used by the algorithm over all solo executions by some processor. Since the randomness provided to the algorithm is finite-precision, the number of possible next steps in every configuration is countable, and, by a standard argument, $\mathcal{R}$ must be countable as well. (Our argument works even when $\mathcal{R}$ is not countable, but countability simplifies notation, e.g. the use of discrete sums.) Fix a register $r \in \mathcal{R}$; by definition, $r$ is read or written by some processor during some solo execution. Let $\mathcal{E}_R(r)$ be the set of all solo executions which read from register $r$:
$\mathcal{E}_R(r) \;=\; \left\{ E \in \bigcup_{p \in P} \mathcal{E}_p \;:\; E \text{ reads } r \right\}.$
We define the read potential $R(r)$ to roughly count the sum of probabilities that register $r$ is read from during solo executions by any processor. Formally,
$R(r) \;=\; \sum_{E \in \mathcal{E}_R(r)} \pi(E).$
Analogously, let $\mathcal{E}_W(r)$ be the set of all solo executions which write to register $r$:
$\mathcal{E}_W(r) \;=\; \left\{ E \in \bigcup_{p \in P} \mathcal{E}_p \;:\; E \text{ writes to } r \right\}.$
We define the write potential of register $r$ as
$W(r) \;=\; \sum_{E \in \mathcal{E}_W(r)} \pi(E).$
For simplicity, we assume that for any $r \in \mathcal{R}$, $W(r) > 0$ (and, similarly, $R(r) > 0$). Otherwise, e.g. if $W(r) = 0$, the reads from $r$ do not change the outcome of the solo executions, and we can assume that they do not use $r$.
Further, for any given solo execution $E$ of processor $p$, we define the trace of $E$, denoted $\tau(E)$, as the sequence of registers written by $p$ during $E$, in the order in which they were written, but omitting duplicate registers. For instance, if in execution $E$ processor $p$ wrote to $r_1$, then to $r_2$, followed by $r_1$ again and finally $r_3$, the trace would be $\tau(E) = (r_1, r_2, r_3)$ (notice that registers are sorted by the order in which they are written to for the first time in $E$). Also, for each register $r$ and solo execution $E$, let $\mathrm{ind}_E(r)$ be the index of register $r$ in the trace of $E$. That is, if $\tau(E) = (r_1, \ldots, r_m)$ and $r = r_i$, then $\mathrm{ind}_E(r) = i$.
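The trace and index operations can be sketched directly (the helper names below are ours, not the paper's):

```python
def trace(writes):
    """Trace of a solo execution: the registers written, in the order of
    their first write, omitting duplicates."""
    seen, out = set(), []
    for r in writes:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def ind(r, tr):
    """1-based index of register r in a trace."""
    return tr.index(r) + 1

# the example from the text: writes r1, r2, r1 again, then r3
t = trace(['r1', 'r2', 'r1', 'r3'])
```

On the example from the text, the trace is $(r_1, r_2, r_3)$ and $\mathrm{ind}_E(r_3) = 3$.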
Our lower bound relies heavily on double-counting techniques. To familiarize the reader with the notation and provide some intuition, we isolate and prove the following simple properties of traces. We fix a processor $p$ and a solo execution $E \in \mathcal{E}_p$ with trace $\tau(E) = (r_1, \ldots, r_m)$, and use the corresponding notation, as defined above.
Lemma 4.1.
Given the above notation, we have that $\sum_{i=1}^{m} R(r_i) \ge n - 1$.
Proof.
Fix a processor $q \neq p$. Recall that $q$ has to read, in each of its solo executions, from some register which $p$ writes to in its solo execution $E$. Otherwise, there is an interleaving of $p$'s solo execution $E$, followed by $q$'s execution, which neither $p$ nor $q$ can distinguish from their respective solo executions. Therefore, in this interleaved execution, $p$ and $q$ will both return win, which leads to a contradiction.
This means that, for every solo execution $E' \in \mathcal{E}_q$, there exists a register $r_i \in \tau(E)$ such that $E' \in \mathcal{E}_R(r_i)$, and hence:
$\sum_{i=1}^{m} R(r_i) \;\ge\; \sum_{q \neq p} \; \sum_{E' \in \mathcal{E}_q} \pi(E') \;=\; n - 1.$
∎
Before proving the next lemma, we provide some intuition from the deterministic setting. In this case, each processor $q$ has a single solo execution $E_q$, with $\pi(E_q) = 1$; consider the sum $\sum_{q} |\tau(E_q)|$. For each register $r$, we know that $r$ appears exactly $W(r)$ times in this summation, as here $W(r)$ is simply the number of processors which write to $r$ in their solo executions. Hence, $\sum_{q} |\tau(E_q)| = \sum_{r \in \mathcal{R}} W(r)$.
Lemma 4.2.
(1) $\displaystyle \sum_{p \in P} \sum_{E \in \mathcal{E}_p} \pi(E)\,|\tau(E)| \;=\; \sum_{r \in \mathcal{R}} W(r).$
Proof.
We have that
$\sum_{p \in P} \sum_{E \in \mathcal{E}_p} \pi(E)\,|\tau(E)| \;=\; \sum_{p \in P} \sum_{E \in \mathcal{E}_p} \sum_{r \in \tau(E)} \pi(E) \;=\; \sum_{r \in \mathcal{R}} \sum_{E \in \mathcal{E}_W(r)} \pi(E) \;=\; \sum_{r \in \mathcal{R}} W(r),$
where in the second equality we simply rearranged the terms. ∎
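The rearrangement in this proof is a finite double-counting identity: the expected trace length, summed over all processors, equals the sum of the write potentials. It can be sanity-checked numerically on toy data (the distributions below are invented for illustration):

```python
# Toy distributions over solo executions: for each processor, a list of
# (probability, trace) pairs; the probabilities sum to 1 per processor.
execs = {
    'p': [(0.5, ['a', 'b']), (0.5, ['a'])],
    'q': [(1.0, ['c', 'a'])],
}

# left-hand side: expected trace length, summed over processors
lhs = sum(pi * len(tr) for E in execs.values() for (pi, tr) in E)

# right-hand side: sum of write potentials W(r) over all registers
regs = {r for E in execs.values() for (_, tr) in E for r in tr}
W = {r: sum(pi for E in execs.values() for (pi, tr) in E if r in tr)
     for r in regs}
rhs = sum(W.values())
```

Here both sides evaluate to $0.5 \cdot 2 + 0.5 \cdot 1 + 1 \cdot 2 = W(a) + W(b) + W(c) = 3.5$.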
Finally, we will need the following useful property.
Lemma 4.3.
For any sequence of positive real numbers $a_1, \ldots, a_m$, we have that
$\sum_{i=2}^{m} \frac{a_i}{a_1 + \cdots + a_{i-1}} \;\ge\; \ln\left( \frac{a_1 + \cdots + a_m}{a_1} \right).$
Proof.
Notice that, for any $i \ge 2$, writing $S_i = a_1 + \cdots + a_i$, we have
$\frac{a_i}{S_{i-1}} \;=\; \frac{S_i}{S_{i-1}} - 1 \;\ge\; \ln \frac{S_i}{S_{i-1}},$
using the inequality $x - 1 \ge \ln x$. Hence:
$\sum_{i=2}^{m} \frac{a_i}{S_{i-1}} \;\ge\; \sum_{i=2}^{m} \left( \ln S_i - \ln S_{i-1} \right) \;=\; \ln \frac{S_m}{S_1} \;=\; \ln\left( \frac{a_1 + \cdots + a_m}{a_1} \right).$
∎
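A numeric sanity check of this prefix-sum bound, in the form $\sum_{i \ge 2} a_i/(a_1 + \cdots + a_{i-1}) \ge \ln((a_1 + \cdots + a_m)/a_1)$ (our reconstruction; the paper's exact statement may normalize the denominators slightly differently):

```python
import math
import random

def prefix_sum_bound_holds(a):
    """Check: sum_{i>=2} a_i / (a_1 + ... + a_{i-1}) >= ln(total / a_1).
    Each term satisfies a_i / S_{i-1} = S_i / S_{i-1} - 1 >= ln(S_i / S_{i-1}),
    so the sum telescopes to ln(S_m / a_1)."""
    prefix = a[0]
    lhs = 0.0
    for x in a[1:]:
        lhs += x / prefix
        prefix += x
    return lhs >= math.log(sum(a) / a[0]) - 1e-12

rng = random.Random(1)
ok = all(prefix_sum_bound_holds([rng.uniform(0.1, 10) for _ in range(20)])
         for _ in range(100))
```

The check passes on random positive sequences, as the bound follows termwise from $x - 1 \ge \ln x$.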
4.2 The “Carefully-Normalized” Read Potential Lemma
Our lower bound is based on the following key lemma, which intuitively provides a lower bound on the sum of the read potentials of the registers written to in a solo execution by processor $p$. Importantly, the read potentials are carefully normalized by, roughly, the probability that these registers are written to by other processors in some other executions.
Lemma 4.4.
Let $E$ be a solo execution of processor $p$, and let $\tau(E) = (r_1, \ldots, r_m)$ be its trace.
Then, $\displaystyle \sum_{i=1}^{m} \frac{R(r_i)}{W(r_1) + \cdots + W(r_i)} \;=\; \Omega(\log n).$
The rest of this subsection will be dedicated to proving this lemma. Specifically, we prove the following two claims in the context of the lemma, i.e. for a fixed solo execution $E$ of processor $p$, with trace $\tau(E) = (r_1, \ldots, r_m)$.
Claim 1.
$\displaystyle \sum_{i=1}^{m} \frac{R(r_i)}{W(r_1) + \cdots + W(r_i)} \;\ge\; \frac{n-1}{W(r_1) + \cdots + W(r_m)}.$
Proof.
Each denominator on the left-hand side is at most $W(r_1) + \cdots + W(r_m)$, and by Lemma 4.1 we have $\sum_{i=1}^{m} R(r_i) \ge n - 1$. ∎
Next, we prove the following extension:
Claim 2.
Under the above notation, we have
$\displaystyle \sum_{i=1}^{m} R(r_i) \;\ge\; \sum_{p' \neq p} \; \sum_{E' \in \mathcal{E}_{p'} \,:\, E' \in \bigcup_{j} \mathcal{E}_W(r_j)} \pi(E').$
Proof.
Let us substitute the definition of the read potential. Both the left- and right-hand sides of this expression contain sums of probabilities of certain solo executions. On the right-hand side, (the probability of) any execution of a processor $p' \neq p$ can appear at most once. This is not necessarily true for the left-hand side, due to the outer summation. Therefore, we only need to show that, for any $E'$ whose probability is included in the summation on the right-hand side, $\pi(E')$ is also included in the summation on the left-hand side; in other words, there exists $r_i \in \tau(E)$ such that $E' \in \mathcal{E}_R(r_i)$.
We prove this fact by contradiction. Suppose processor $p' \neq p$ has a solo execution $E'$ such that register $r_j$ is written to during $E'$, but no register among $r_1, \ldots, r_{j-1}$ (which are all the registers written prior to $r_j$ in $p$'s solo execution $E$) is read. Now consider a combined execution of $p$ and $p'$, which consists of running $p$ as in $E$ until it becomes poised to write to register $r_j$; crucially, note that so far $p$ has actually executed solo. From this point, we let processor $p'$ execute identically to its solo run in $E'$. This is possible because the only registers written to so far in the system are $r_1, \ldots, r_{j-1}$, which $p'$ does not read in $E'$. As a result, $p'$ will write to register $r_j$, after which we can immediately allow $p$ to also write to $r_j$. This implies that two processors write to the same register during the same execution, contradicting the SWMR property. ∎
4.3 Completing the Lower Bound Proof
We now finally proceed to proving the following theorem:
Theorem 4.5.
Any randomized obstruction-free leader election protocol in asynchronous shared-memory with SWMR registers has $\Omega(\log n)$ worst-case expected solo step complexity.
Proof.
We start by summing up the inequalities given by Lemma 4.4 over all processors and their solo executions, each weighted by its probability:
(2) $\displaystyle \sum_{p \in P} \sum_{E \in \mathcal{E}_p} \pi(E) \sum_{i=1}^{|\tau(E)|} \frac{R(r_i)}{W(r_1) + \cdots + W(r_i)} \;=\; \Omega(n \log n),$
where in the last step we used that $\sum_{E \in \mathcal{E}_p} \pi(E) = 1$ for every processor $p$. Hence, by using $W(r_1) + \cdots + W(r_i) \ge W(r_i)$ and Lemma 4.2, we get that
$\sum_{r \in \mathcal{R}} R(r) \;=\; \Omega(n \log n).$
Note that $R(r)$ is a lower bound on the expected number of total reads from register $r$. Hence, since the expected total number of reads is $\Omega(n \log n)$, there must exist a processor which performs $\Omega(\log n)$ reads in expectation. ∎
4.4 Extension for Bounded Concurrent-Write Contention
We now extend our result to the case where the maximum number of processors that may be poised to write concurrently to a register, which we defined as the concurrent-write contention, is bounded. Specifically, suppose that, in any execution, at most  different processors may be poised to write to the same register. We preserve the notation of the previous section. Upon close examination, notice that Lemma 4.1 and Lemma 4.2 still hold in this MWMR model, as does Claim 1, since they do not rely on the SWMR property. (By contrast, Claim 2 no longer holds for .) We therefore continue to use only these results.
We will prove a lower bound on the expected solo step complexity under the above assumptions on . As before, let  be the trace of execution .
Lemma 4.6.
We have that
.
Proof.
The proof is similar to that of Theorem 4.5, but uses Claim 1 directly instead of Lemma 4.4. Specifically, we start by summing the inequalities resulting from Claim 1 over all processors and solo executions:
The last equality follows by rearranging terms so that they are grouped by register instead of by processor; note that this is similar to the proof of Lemma 4.2. However, in this case the resulting expression cannot be simplified further since, unlike , the denominator term also depends on the execution . ∎
For any register , we call the set of processors a poise set for if:

contains solo executions of different processors, i.e. , such that , and for .

Let be the prefix of up to and including ’s first write step to the register . There exists a combined execution by processors , such that at the end of all processors have written to . Moreover, is indistinguishable from to (i.e. takes steps as in and does not read anything written by until it writes to ).
As processors can be poised to write to in the combined execution, no poise set can have size .
Lemma 4.7.
Let be a set of solo executions. Let be the maximum size of a poise set for register among executions in . Then, there exists a subset of executions , such that:

;

Every poise set for register among executions in has size at most .
Proof.
Let , i.e. the sum of probabilities of executions in , excluding executions by processors. We define the set as follows:
So, an execution is included in if the sum of read potentials of registers written prior to in is lower bounded by a term that depends on . Notice that for this is analogous to the condition in Claim 2.
The parameter satisfies the following useful property:
(3)  
where in (3) we have used that, from the definition of , .
Using this property, we get:
This proves the first part of the lemma. We prove the second part by contradiction. Suppose there is a poise set  for register  among executions in , where  is an execution of processor .
Consider any execution  for . Execution  must read one of the registers written during some time step before the point when  is written in . Otherwise, , and more precisely the prefix of  up to the write to , could be appended at the end of ’s interleaved execution, implying that  would be a poise set of size  among executions in , which does not exist by definition. Hence:
Notice how this generalizes Claim 2: we can now apply the pigeonhole principle to the terms on the left-hand side. We get that for some , , giving the desired contradiction, namely that  consists of executions from  only. This completes the proof of the lemma. ∎
We are now ready to prove the main result of this section.
Theorem 4.8.
Any randomized leader election protocol in asynchronous shared memory has  worst-case expected solo step complexity, when  is the maximum number of processors that may be poised to write concurrently to the same register.
Proof.
Fix a register . We start by applying Lemma 4.7 to the set  and the maximum poise set size . Let  be the resulting subset of executions, and  be the maximum size of a poise set among executions in . Next, we apply Lemma 4.7 again to , defining , and  as the maximum size of a poise set among executions in . The next application of Lemma 4.7 is to , defining  and . We repeat this process until some  becomes , implying that the set of remaining executions is empty. Since , Lemma 4.7 is applied at most  times. We thus obtain:
since there are at most  terms, each of which is upper bounded by  by Lemma 4.7.
Combined with Lemma 4.6, this gives . By the pigeonhole principle, some processor must perform at least  reads in expectation over its solo executions. ∎
5 A Complementary Upper Bound for Weak Leader Election
It is interesting to ask whether the lower bound approach can be further improved in the MWMR model under bounded concurrent-write contention. This is not the case for the specific definition of the weak leader election problem we consider (Definition 1), to which the lower bound applies. To see this, it suffices to notice that the classic splitter construction of Lamport [25] solves weak leader election for  processes in constant time, using MWMR registers with maximal concurrent-write contention .
Recall that this construction, restated for convenience in Figure 1, uses two MWMR registers. Given a splitter, we simply map the stop output to win, and the left and right outputs to lose. It is then immediate that the splitter ensures the following:

a processor will always return win in a solo execution, and

no two processes may return win in the same execution.
This matches the requirements of the weak leader election problem, but not of test-and-set objects in general, since this algorithm admits contended executions in which all processors return lose, which also makes it impractical.
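The splitter-based weak leader election described above can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's actual figure: the class and function names are ours, and shared MWMR registers are modeled as plain attributes (a real implementation would run concurrently, with each register access being a shared-memory step).

```python
class Splitter:
    """Lamport-style splitter built from two MWMR registers."""

    def __init__(self):
        self.X = None    # MWMR register: last processor id to enter
        self.Y = False   # MWMR register: "door closed" flag

    def split(self, pid):
        """Run the splitter protocol for processor `pid`.

        Returns "stop" (at most one processor), or "right"/"left".
        Constant step complexity: at most four shared-memory steps.
        """
        self.X = pid
        if self.Y:
            return "right"   # someone already closed the door
        self.Y = True        # close the door
        if self.X == pid:
            return "stop"    # no interference observed
        return "left"        # overtaken after entering


def elect(splitter, pid):
    # Map splitter outputs to weak leader election decisions:
    # stop -> win, left/right -> lose.
    return "win" if splitter.split(pid) == "stop" else "lose"
```

In a solo execution a processor writes X, sees Y unset, closes the door, and still finds its own id in X, so it returns win; and since at most one processor can both close the door and find X unchanged, no two processors return win.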
One may further generalize this approach by defining splitter objects for , each of which is restricted to  participating processors (and thus also to concurrent-write contention ), and then arranging them in a complete -ary tree. We can then proceed similarly to the tournament tree to implement a weak leader election object. The resulting construction has  step complexity in solo executions, suggesting that the dependency on  given by our argument can be further improved.
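The solo step complexity of this tree construction can be sketched as follows, assuming (hypothetically) that each splitter costs a constant number of shared-memory steps, say four as in the construction above, and that a solo processor wins every splitter on its leaf-to-root path; function names are illustrative.

```python
def depth(n, k):
    """Smallest d with k**d >= n: levels of a complete k-ary tree
    whose leaves accommodate n processors (integer arithmetic avoids
    floating-point log rounding issues)."""
    d, size = 0, 1
    while size < n:
        size *= k
        d += 1
    return d


def solo_steps(n, k, splitter_cost=4):
    """Steps of a solo processor: one O(1) splitter per tree level."""
    return depth(n, k) * splitter_cost
```

Raising the permitted write contention k flattens the tree: for n = 1024, a binary tree has depth 10, while arity 32 reduces it to depth 2, illustrating the log n / log k trade-off.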
This observation suggests that the tradeoff between step complexity and concurrent-write contention/worst-case stalls outlined by our lower bound may be the best one can prove for weak leader election, since this problem can be solved in constant time with MWMR registers, at the cost of linear worst-case stalls. At the same time, it shows that lower bound arguments aiming at the general version of the problem must specifically leverage the fact that, even in contended executions, not all processors may return lose.
6 Conclusion
Overview.
We gave the first tight logarithmic lower bounds on the solo step complexity of leader election in an asynchronous shared-memory model with single-writer multi-reader (SWMR) registers, for both deterministic and randomized algorithms. We then extended these results to registers with bounded concurrent-write contention , showing a tradeoff between the solo step complexity of algorithms and their worst-case stall complexity. The approach admits additional extensions, and is tight in the SWMR case. The impossibility result is quite strong, in the sense that logarithmic time is required over solo executions of  processors, and for a weak variant of leader election, which is not linearizable and allows all processors to return lose in contended executions.
Future Work.
The key question left open is whether sublogarithmic upper bounds for strong leader election / test-and-set exist, specifically by leveraging multi-writer registers, or whether the lower bounds can be further strengthened. Another interesting question is whether our approach can be extended to handle different cost metrics, such as remote memory references (RMRs).
References
 [1] Yehuda Afek, Eli Gafni, John Tromp, and Paul M. B. Vitányi. Wait-free test-and-set (extended abstract). In WDAG ’92: Proceedings of the 6th International Workshop on Distributed Algorithms, pages 85–94, 1992.
 [2] Dan Alistarh and James Aspnes. Sublogarithmic test-and-set against a weak adversary. In International Symposium on Distributed Computing, pages 97–109. Springer, 2011.
 [3] Dan Alistarh, James Aspnes, Keren Censor-Hillel, Seth Gilbert, and Rachid Guerraoui. Tight bounds for asynchronous renaming. Journal of the ACM (JACM), 61(3):1–51, 2014.
 [4] Dan Alistarh, Hagit Attiya, Seth Gilbert, Andrei Giurgiu, and Rachid Guerraoui. Fast randomized test-and-set and renaming. In Proceedings of the 24th International Conference on Distributed Computing, DISC ’10, pages 94–108, Berlin, Heidelberg, 2010. Springer-Verlag. URL: http://portal.acm.org/citation.cfm?id=1888781.1888794.
 [5] Dan Alistarh, Rati Gelashvili, and Adrian Vladu. How to elect a leader faster than a tournament. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 365–374, 2015.
 [6] James Aspnes. Notes on theory of distributed systems. arXiv preprint arXiv:2001.04235, 2020.
 [7] James Aspnes, Keren Censor-Hillel, Hagit Attiya, and Danny Hendler. Lower bounds for restricted-use objects. SIAM Journal on Computing, 45(3):734–762, 2016.
 [8] Hagit Attiya and Keren Censor. Tight bounds for asynchronous randomized consensus. J. ACM, 55(5):1–26, 2008. doi:http://doi.acm.org/10.1145/1411509.1411510.
 [9] Hagit Attiya and Faith Ellen. Impossibility results for distributed computing. Synthesis Lectures on Distributed Computing Theory, 5(1):1–162, 2014.
 [10] James E Burns and Nancy A Lynch. Bounds on shared memory for mutual exclusion. Information and Computation, 107(2):171–184, 1993.
 [11] Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in shared memory algorithms. Journal of the ACM (JACM), 44(6):779–805, 1997.
 [12] Aryaz Eghbali and Philipp Woelfel. An almost tight RMR lower bound for abortable test-and-set. arXiv preprint arXiv:1805.04840, 2018.
 [13] Faith Ellen, Danny Hendler, and Nir Shavit. On the inherent sequentiality of concurrent objects. SIAM Journal on Computing, 41(3):519–536, 2012.
 [14] George Giakkoupis, Maryam Helmi, Lisa Higham, and Philipp Woelfel. An O(√n) space bound for obstruction-free leader election. In International Symposium on Distributed Computing, pages 46–60. Springer, 2013.
 [15] George Giakkoupis, Maryam Helmi, Lisa Higham, and Philipp Woelfel. Test-and-set in optimal space. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 615–623, 2015.
 [16] George Giakkoupis and Philipp Woelfel. Efficient randomized test-and-set implementations. Distributed Computing, 32(6):565–586, 2019.
 [17] Wojciech Golab, Danny Hendler, and Philipp Woelfel. An O(1) RMRs leader election algorithm. SIAM Journal on Computing, 39(7):2726–2760, 2010.
 [18] Danny Hendler and Nir Shavit. Operation-valency and the cost of coordination. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing, pages 84–91, 2003.
 [19] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):123–149, January 1991.
 [20] Maurice Herlihy, Victor Luchangco, and Mark Moir. Obstruction-free synchronization: Double-ended queues as an example. In 23rd International Conference on Distributed Computing Systems, 2003. Proceedings., pages 522–529. IEEE, 2003.
 [21] Maurice P Herlihy and Jeannette M Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990.
 [22] Prasad Jayanti. A time complexity lower bound for randomized implementations of some shared objects. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, pages 201–210, 1998.
 [23] Prasad Jayanti, King Tan, and Sam Toueg. Time and space lower bounds for nonblocking implementations. SIAM Journal on Computing, 30(2):438–456, 2000.
 [24] YongJik Kim and James H Anderson. A time complexity lower bound for adaptive mutual exclusion. Distributed Computing, 24(6):271–297, 2012.
 [25] Leslie Lamport. A fast mutual exclusion algorithm. ACM Transactions on Computer Systems (TOCS), 5(1):1–11, 1987.
 [26] Mark Moir and James H. Anderson. Wait-free algorithms for fast, long-lived renaming. Science of Computer Programming, 25:1–39, 1995.
 [27] Gary L. Peterson and Michael J. Fischer. Economical solutions for the critical section problem in a distributed system (extended abstract). In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, STOC ’77, pages 91–97, New York, NY, USA, 1977. ACM. URL: http://doi.acm.org/10.1145/800105.803398, doi:http://doi.acm.org/10.1145/800105.803398.
 [28] Eugene Styer and Gary L. Peterson. Tight bounds for shared memory symmetric mutual exclusion problems. In Proceedings of the Eighth Annual ACM Symposium on Principles of Distributed Computing, pages 177–191, 1989.
 [29] John Tromp and Paul Vitányi. Randomized two-process wait-free test-and-set. Distrib. Comput., 15(3):127–135, 2002. doi:http://dx.doi.org/10.1007/s004460200071.
 [30] Jae-Heon Yang and James H. Anderson. Time bounds for mutual exclusion and related problems. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pages 224–233, 1994.