# A Local Lemma for Focused Stochastic Algorithms

We develop a framework for the rigorous analysis of focused stochastic local search algorithms. These are algorithms that search a state space by repeatedly selecting some constraint that is violated in the current state and moving to a random nearby state that addresses the violation, while hopefully not introducing many new ones. An important class of focused local search algorithms with provable performance guarantees has recently arisen from algorithmizations of the Lovász Local Lemma (LLL), a non-constructive tool for proving the existence of satisfying states by introducing a background measure on the state space. While powerful, the state transitions of algorithms in this class must be, in a precise sense, perfectly compatible with the background measure. In many applications this is a very restrictive requirement and one needs to step outside the class. Here we introduce the notion of measure distortion and develop a framework for analyzing arbitrary focused stochastic local search algorithms, recovering LLL algorithmizations as the special case of no distortion. Our framework takes as input an arbitrary such algorithm and an arbitrary probability measure and shows how to use the measure as a yardstick of algorithmic progress, even for algorithms designed independently of the measure.


## 1 Introduction

Let be a large, finite set of objects and let be a collection of subsets of . We will refer to each as a flaw to express that all objects in have negative feature . For example, for a CNF formula on variables with clauses , for each clause we can define to comprise the truth assignments that violate . Following linguistic rather than mathematical convention we will say that flaw is present in object if and that is flawless if no flaw is present in .

The Lovász Local Lemma is a non-constructive tool for proving the existence of flawless objects by introducing a probability measure on and bounding from below the probability of simultaneously avoiding all (“bad”) events corresponding to the flaws in . (Below and throughout we assume that products devoid of factors evaluate to 1.)

###### General LLL.

Given events , for each , let the set be such that if , then . If there exist such that for all ,

$$\frac{\mu(A_i)}{\psi_i} \sum_{S \subseteq \{i\} \cup D(i)} \prod_{j \in S} \psi_j \le 1, \tag{1}$$

then the probability that none of occurs is at least .

Erdős and Spencer  noted that independence in the LLL can be replaced by positive correlation, yielding the stronger Lopsided LLL. The difference is that each set is replaced by a set such that if , then , i.e., “=” is replaced by “≤”. Also, condition (1) is better known as , where . As we will see, formulation (1) better facilitates the statement of refinements of the condition. Specifically, considering graphical properties of the graphs on induced by the relationships and , one can show more permissive conditions, such as the cluster expansion , the lefthanded , and Shearer’s condition .
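The relation between formulation (1) and the classical product form of the LLL condition can be checked numerically. The sketch below assumes the classical form is $\mu(A_i) \le x_i \prod_{j \in D(i)} (1 - x_j)$ with $x_j = \psi_j / (1 + \psi_j)$, and relies on the identity $\sum_{S \subseteq T} \prod_{j \in S} \psi_j = \prod_{j \in T} (1 + \psi_j)$. The function names and the numerical values are illustrative only.

```python
from itertools import combinations
from math import isclose, prod

def subset_sum(indices, psi):
    """Sum over all subsets S of `indices` of prod_{j in S} psi[j] (empty product is 1)."""
    indices = list(indices)
    return sum(
        prod(psi[j] for j in subset)
        for r in range(len(indices) + 1)
        for subset in combinations(indices, r)
    )

def condition_1_lhs(i, mu_i, D_i, psi):
    """Left-hand side of (1): mu(A_i) / psi_i times the subset sum over {i} union D(i)."""
    return mu_i / psi[i] * subset_sum({i} | set(D_i), psi)

def classical_bound(i, D_i, psi):
    """x_i * prod_{j in D(i)} (1 - x_j), assuming x_j = psi_j / (1 + psi_j)."""
    x = {j: p / (1 + p) for j, p in psi.items()}
    return x[i] * prod(1 - x[j] for j in D_i)

# Hypothetical instance: event 0 depends on events 1 and 2.
psi = {0: 0.5, 1: 0.25, 2: 1.0}
D_0 = [1, 2]
mu_0 = 0.1

lhs = condition_1_lhs(0, mu_0, D_0, psi)
bound = classical_bound(0, D_0, psi)
```

Under these assumptions `lhs` equals `mu_0 / bound`, so (1) holds exactly when `mu_0` is at most the classical bound.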

Moser , later joined by Tardos , in groundbreaking work showed that a simple algorithm can be used to make the general LLL constructive when is a product measure. Specifically, in the variable setting of , each event is determined by a set of variables so that iff . Moser and Tardos proved that in the variable setting, if (1) holds, then repeatedly selecting any occurring event and resampling every variable in independently according to , leads to a flawless object after a linear expected number of resamplings. Pegden  proved that this remains true under the weakening of (1) to the cluster expansion condition of Bissacot et al. . Finally, Kolipaka and Szegedy  proved that the resampling algorithm actually works even under Shearer’s tight condition . In an orthogonal development, Harris and Srinivasan in  were the first to make the LLL constructive outside the variable setting, giving an algorithmic LLL for the uniform measure on permutations.
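For concreteness, the resampling algorithm described above can be sketched for CNF formulas under the uniform product measure. This is a minimal illustration, not the authors' code: the DIMACS-style literal encoding, the rule of picking the first violated clause, the step cap, and the small formula are all assumptions made for the example.

```python
import random

def violated(clause, assignment):
    """A clause (list of nonzero ints, DIMACS-style) is violated iff every literal is false."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)

def moser_tardos(num_vars, clauses, rng, max_steps=100_000):
    """Resample the variables of some violated clause until no clause is violated."""
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for step in range(max_steps):
        bad = [c for c in clauses if violated(c, assignment)]
        if not bad:
            return assignment, step
        clause = bad[0]  # any occurring bad event may be selected
        for lit in clause:
            # Resample each variable of the clause independently and uniformly.
            assignment[abs(lit)] = rng.random() < 0.5
    return None, max_steps  # step cap reached (illustrative safeguard only)

# Hypothetical satisfiable 3-CNF formula on 6 variables.
clauses = [[1, 2, 3], [-1, 4, 5], [2, -4, 6], [-3, -5, -6]]
assignment, steps = moser_tardos(6, clauses, random.Random(0))
```

The flaw-selection rule (`bad[0]`) is arbitrary here, reflecting the fact that in the variable setting any occurring event may be chosen.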

Moser’s original analysis of the resampling algorithm in the context of satisfiability  inspired a parallel line of works that formed the so-called entropy compression method, e.g., [11, 16, 13]. In these works, the set of objects typically does not have product structure, there is no measure , and no general condition for algorithmic convergence. Instead, the fact that the algorithm under consideration must reach a flawless object (and, thus, terminate) is established by proving that the entropy of its trajectory grows more slowly than the rate at which it consumes randomness. The rate comparison is done in each case via a problem-specific counting argument.

A common feature of the resampling algorithm of , the swapping algorithm of , and all algorithms analyzed by entropy compression, is that they are instances of focused stochastic local search. The general idea in stochastic local search is that is equipped with a neighborhood structure, so that the search for flawless objects starts at some (flawed) object (state) and moves stochastically from state to state along the neighborhood structure. Restricting the search so that every transition away from a state must target one of the flaws present in is known as focusing the search . The first effort to give a convergence condition for focused stochastic local search algorithms with arbitrary transition probabilities, i.e., not mandated by a background measure , was the flaws/actions framework of . As our work also uses this framework, below we recall some of the relevant definitions. Note that the existence of flawless objects is not presumed in any of these analyses. Instead, the idea is to establish the existence of flawless objects by proving that some focused stochastic local search algorithm (quickly) converges to one.

For , let denote the set of indices of the flaws present in , i.e., . For every , let be a non-empty subset of . The elements of are called actions and we consider the multidigraph on that has an arc for every . We will consider walks on which start at a state selected according to some probability distribution , and which at each non-sink vertex first select a flaw as a function of the trajectory so far (focus), and then select as the next state with probability . Whenever flaw is selected we will say that flaw was addressed. This will not necessarily mean that will be eliminated, i.e., potentially . The multidigraph should be thought of as implicitly defined by the algorithm we wish to analyze each time, not as explicitly constructed. Also, when we refer to running “time”, we will refer to the number of steps on , without concern for exactly how long it takes to perform a single step, i.e., to identify a flaw present and select from its actions.

In this language,  gave a sufficient condition for algorithmic convergence when:

1. is atomic, i.e., for every and every there exists at most one arc .

2. assigns equal probability to every action in , for every and .

By analyzing algorithms satisfying conditions 1 and 2, several results that had been proved by custom versions of the LLL, and thus fell outside the algorithmization framework of , were made constructive and improved in . At the same time, the convergence condition of  makes it possible to recover most results of the entropic method by generic arguments (sometimes with a small parameter loss). Finally, it is worth pointing out that even though the framework of  does not reference a background probability measure , it captures a large fraction of the applications of general LLL. This is because when is uniform and bad events correspond to partial assignments, a very common scenario, the state transitions of the resampling algorithm of Moser and Tardos satisfy both conditions 1 and 2. Overall, though, the convergence condition of  was incomparable with those of the LLL algorithmizations preceding it.

The long line of work on LLL algorithmizations that started with the groundbreaking work of Moser, culminated with the work of Harvey and Vondrák . They showed that the Lopsided LLL can be made constructive even under the most permissive condition (Shearer’s), whenever one can construct efficient resampling oracles. Resampling oracles elegantly capture the common core of all LLL algorithmizations, namely that the state transitions, , are perfectly compatible with the background measure . Below we give the part of the definition of resampling oracles that exactly expresses this notion of compatibility, which we dub (measure) regeneration.

###### Regeneration (Harvey-Vondrák ).

Say that regenerate at flaw if for every ,

$$\frac{1}{\mu(f_i)} \sum_{\sigma \in f_i} \mu(\sigma)\, \rho_i(\sigma, \tau) = \mu(\tau). \tag{2}$$

Observe that the l.h.s. of (2) is the probability of reaching state after first sampling a state according to and then addressing at . The requirement that this probability equals for every means that must be such that in every state the distribution on actions for addressing perfectly removes the conditional . Of course, a trivial way to satisfy this requirement is to sample a new state according to in each step (assuming is efficiently sampleable). Doing this, though, forgoes any notion of iterative progress towards a goal, as the set of flaws present in is completely unrelated to those in . Instead, one would like to respect (2) while limiting the flaws introduced in . To that end, we can consider the projection of the action digraph capturing which flaws may be introduced (caused) when we address each flaw. It is important to note that, below, potential causality is independent of flaw choice and that the causality digraph has an arc if there exists even one transition aimed at addressing that causes to appear in the new state. Naturally, the sparser this causality digraph, the better.
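Condition (2) can be verified by brute force on a small state space. The sketch below is an illustration under stated assumptions: three binary variables, the uniform measure, a single hypothetical flaw ("variables 0 and 1 both equal 1"), and transitions that resample the flaw's variables uniformly while leaving the rest untouched. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

n = 3
states = list(product([0, 1], repeat=n))
mu = {s: Fraction(1, len(states)) for s in states}  # uniform measure (assumed)

flaw_vars = (0, 1)  # hypothetical flaw: variables 0 and 1 are both 1
flaw = [s for s in states if all(s[v] == 1 for v in flaw_vars)]

def rho(sigma, tau):
    """Resample the flaw's variables uniformly; all other coordinates stay fixed."""
    if any(sigma[v] != tau[v] for v in range(n) if v not in flaw_vars):
        return Fraction(0)
    return Fraction(1, 2 ** len(flaw_vars))

def regeneration_lhs(tau):
    """Left-hand side of (2): sample sigma from mu conditioned on the flaw, then address it."""
    mu_flaw = sum(mu[s] for s in flaw)
    return sum(mu[s] * rho(s, tau) for s in flaw) / mu_flaw

regenerates = all(regeneration_lhs(t) == mu[t] for t in states)
```

In this toy setting `regenerates` is `True`: resampling only the flaw's variables restores the uniform measure while changing nothing outside the flaw.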

###### Potential Causality.

For an arc in and a flaw present in we say that causes if or . We say that potentially causes if contains at least one arc wherein causes .

###### Causality Digraph.

The digraph on where iff potentially causes is called the causality digraph. The neighborhood of a flaw is .

Harvey and Vondrák  proved that for essentially every lopsidependency digraph of interest, there exist resampling oracles whose causality digraph is (a subgraph of) . We should emphasize, though, that there is no guarantee that these promised resampling oracles can be implemented efficiently, so as to yield an LLL algorithmization (and, naturally, in the absence of efficiency considerations the LLL is already “algorithmic” by exhaustive search). Indeed, as we discuss below, there are settings in which the existence of efficient resampling oracles seems unlikely. That said, in  Harvey and Vondrák demonstrated the existence of efficient resampling oracles for a plethora of LLL applications in the variable setting, the permutation setting, and several other settings.

Perhaps the simplest demonstration of the restrictiveness of resampling oracles comes from one of the oldest and most vexing concerns about the LLL (see the survey of Szegedy ). Namely, the inability[^2] of the LLL to establish that a graph with maximum degree can be colored with colors. For example, if is the uniform measure on all colorings with colors, then every time a vertex is recolored, its color must be chosen uniformly among all colors, something that induces a requirement of colors. If, instead, one could choose only among colors that do not currently appear in ’s neighborhood, then for all , the causality digraph is empty and rapid termination follows trivially. But it seems very hard to describe a probability measure and resampling oracles for it that respect the empty causality graph.

[^2]: Naturally, whenever the set of flawless objects is non-empty, the uniform measure on demonstrates the existence of flawless objects. So, in a trivial sense, there is nothing that cannot be established by the LLL. But, of course, anyone in possession of a description of allowing the construction of a measure on it does not need the LLL. Indeed, the whole point of the LLL is that it offers incredibly rich conclusions, e.g., , from extremely meager ingredients, e.g., the uniform measure on .

To recap, there are two “schools of thought.” In the first, one starts from the central object of the LLL, the measure on , and tries to design an algorithm that moves from one state to another in a manner that perfectly respects the measure. In the other, there is no measure on at all and both the transitions and their probabilities can be, a priori, arbitrary. In this work, we bring these two schools of thought together by introducing the notion of measure distortion, showing, in particular, that the first school corresponds to the special case of no distortion. The main point of our work, though, is to demonstrate that the generality afforded by allowing measure distortion has tangible benefits. Specifically, in complex applications, the requirement that every resampling must perfectly remove the conditional of the resampled bad event can be impossible to meet by short travel within , i.e., by “local search”. This is because small, but non-vanishing, correlations can travel arbitrarily far in the structure. Allowing measure distortion removes the requirement of perfect deconditioning, with any correlation seepage (distortion) accounted for via a local analysis. This makes it possible to design natural, local algorithms and prove rigorous mathematical statements about their convergence in the presence of long-range correlations.

Concretely, we extend the flaws/actions framework of  to allow arbitrary action digraphs , arbitrary transition probabilities , and the incorporation of arbitrary background measures , allowing us to connect the flaws/actions framework to the Lovász Local Lemma. Our work highlights the role of the measure in gauging how efficiently the algorithm rids the state of flaws, i.e., as a gauge of progress, by pointing out the trade-off between distortion and the sparsity of the causality graph. The end result is a theorem that subsumes both the results of  and the algorithmization of the Lopsided LLL  via resampling oracles, establishing a uniform method for designing and analyzing focused stochastic local search algorithms. Additionally, our work makes progress on elucidating the role of flaw choice in stochastic local search, and establishes several structural facts about resampling oracles.

## 2 Statement of Results

We develop tools for analyzing focused stochastic local search algorithms. Specifically, we establish a sequence of increasingly general conditions under which such algorithms find flawless objects quickly, presented as Theorems 1, 2, and 3. For the important special case of atomic action digraphs we identify structural properties of resampling oracles, presented as Theorem 4. For the same setting we also derive a sharp analysis for the probability of any trajectory, elucidating the role of flaw choice, presented as Theorem 5.

Theorems 1–3 differ in the sophistication of the flaw-choice mechanism they can accommodate. While in works such as  on the variable setting and  on permutations, the setting was sufficiently symmetric that flaw choice could be arbitrary, in more complex applications more sophisticated flaw choice is necessary. For example, to establish our results on Acyclic Edge Coloring we must use our recursive algorithm (Theorem 2), as the simple Markov walk (Theorem 1), let alone arbitrary flaw choice, will not work.

To demonstrate the flexibility of our framework, we derive a bound for Acyclic Edge Coloring of graphs with bounded degeneracy, a class including all graphs of bounded treewidth, presented as Theorem 6 in Section 6. To derive the result we rely heavily on the actions not forming resampling oracles with respect to the measure used. Unlike other recent algorithmic work on the problem [13, 15], our result is established without ideas/computations “customized” to the problem, but as a direct application of Theorem 2, highlighting its capacity to incorporate both global conditions, such as degeneracy, and sophisticated flaw-choice mechanisms, in this case a recursive procedure. We also show how to derive effortlessly an upper bound of for Acyclic Edge Coloring of general graphs, which comes close to the hard-won bound of of Esperet and Parreau  via a custom analysis. Finally, we note that Iliopoulos  recently showed how our main theorem can be used to analyze the algorithm of Molloy  for coloring triangle-free graphs of degree up to the “shattering threshold” for random graphs .

### 2.1 Setup

Recall that we consider algorithms which at each flawed state select some flaw to address and then select the next state with probability . As one may expect, the flaw choice mechanism does have a bearing on the running time of such algorithms and we discuss this point in Section 2.6. Our results focus on conditions for rapid termination that do not require sophisticated flaw choice (but can be used in conjunction with such choice).

To measure a walk’s capacity to rid the state of flaws we introduce a measure on , as in the LLL. Without loss of generality, and to avoid certain trivialities, we assume that for all . The choice of is entirely ours and can be oblivious, e.g., . While will typically assign only exponentially small probability to flawless objects, it will allow us to prove that the walk reaches a flawless object in polynomial time with high probability.

To do this we define a “charge” for each flaw that captures the compatibility between the actions of the algorithm for addressing flaw and the measure . Specifically, just as for regeneration, we consider the probability, , of ending up in state after (i) sampling a state according to , and then (ii) addressing at . But instead of requiring that equals , as in resampling oracles, we allow to be free and simply measure

$$d_i = \max_{\tau \in \Omega} \frac{\nu_i(\tau)}{\mu(\tau)} \ge 1, \tag{3}$$

i.e., the greatest inflation of a state probability incurred by addressing (relative to its probability under , and averaged over the initiating state according to ). The charge of flaw is then defined as

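$$\begin{aligned} \gamma_i &:= d_i \cdot \mu(f_i) && \text{(4)} \\ &= \max_{\tau \in \Omega} \frac{1}{\mu(\tau)} \sum_{\sigma \in f_i} \mu(\sigma)\, \rho_i(\sigma, \tau). && \text{(5)} \end{aligned}$$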

To gain some intuition for observe that if is uniform and is atomic, then is simply the greatest transition probability on any arc originating in .
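The charge in (5) is easy to compute by enumeration on a toy example. The sketch below assumes three binary variables, the uniform measure, and a hypothetical flaw "variables 0 and 1 both equal 1"; it compares uniform resampling of the flaw's variables (no distortion) with a variant that never returns to the current assignment (some distortion). Both transition rules are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

n = 3
states = list(product([0, 1], repeat=n))
mu = {s: Fraction(1, len(states)) for s in states}  # uniform measure (assumed)
flaw_vars = (0, 1)
flaw = [s for s in states if all(s[v] == 1 for v in flaw_vars)]
mu_flaw = sum(mu[s] for s in flaw)

def same_outside_flaw(sigma, tau):
    return all(sigma[v] == tau[v] for v in range(n) if v not in flaw_vars)

def rho_uniform(sigma, tau):
    """Resample the flaw's variables uniformly (4 equally likely outcomes)."""
    return Fraction(1, 4) if same_outside_flaw(sigma, tau) else Fraction(0)

def rho_avoid_self(sigma, tau):
    """Pick uniformly among the 3 assignments of the flaw's variables other than the current one."""
    if not same_outside_flaw(sigma, tau) or tau == sigma:
        return Fraction(0)
    return Fraction(1, 3)

def charge(rho):
    """gamma_i as in (5): max over tau of (1/mu(tau)) * sum_{sigma in f_i} mu(sigma) * rho(sigma, tau)."""
    return max(sum(mu[s] * rho(s, t) for s in flaw) / mu[t] for t in states)

gamma_uniform = charge(rho_uniform)     # equals mu(f_i): no distortion
gamma_avoid = charge(rho_avoid_self)    # largest transition probability out of f_i
d_avoid = gamma_avoid / mu_flaw         # distortion d_i from (4)
```

Here both action digraphs are atomic and the measure is uniform, so each charge coincides with the largest transition probability on an arc leaving the flaw, in line with the observation above.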

To state our results we need a last definition regarding the distribution of the starting state.

###### Definition 1.

The span of a probability distribution , denoted by , is the set of flaw indices that may be present in a state selected according to , i.e., .

### 2.2 A Simple Markov Chain

Our first result concerns the simplest case where in each flawed state , the algorithm addresses the greatest flaw present in , according to an arbitrary but fixed permutation of the flaws. Recall that is the measure on used to measure progress, is the charge of flaw according to , and is the starting state distribution.

###### Theorem 1.

If there exist positive real numbers such that for every ,

$$\zeta_i := \frac{\gamma_i}{\psi_i} \sum_{S \subseteq \Gamma(i)} \prod_{j \in S} \psi_j < 1, \tag{6}$$

then for every permutation , the walk reaches a sink within steps with probability at least , where , and

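$$T_0 = \log_2\left(\max_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)}\right) + \log_2\left(\sum_{S \subseteq \mathcal{S}(\theta)} \prod_{j \in S} \psi_j\right) = \log_2\left(\max_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)}\right) + \sum_{j \in \mathcal{S}(\theta)} \log_2(1 + \psi_j).$$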

Theorem 1 has two features worth discussing, shared by all our results.

Arbitrary starting state. Since can be arbitrary, any foothold on suffices to apply the theorem. Note also that captures the trade-off between starting at a fixed state vs. starting at a state sampled from . In the latter case, i.e., when , the first term in vanishes, but the second term grows to reflect the uncertainty of the set of flaws present in .

Arbitrary number of flaws. The running time depends only on the span , not the total number of flaws . This has an implication analogous to the result of Haeupler, Saha, and Srinivasan  on core events: even when is super-polynomial in the problem’s encoding length, it may still be possible to get a polynomial-time algorithm. For example, this can be done by proving that in every state only polynomially many flaws may be present, or by finding a specific state such that is small.
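As a rough numerical illustration of condition (6) and of $T_0$, consider a hypothetical symmetric instance in which every flaw has the same charge $\gamma$, every neighborhood $\Gamma(i)$ has $D$ elements, and all $\psi_j$ equal $1/D$. Since the sum over all subsets factorizes as $\prod_{j}(1+\psi_j)$, we get $\zeta_i = \gamma D (1 + 1/D)^D < e \gamma D$. The values of $D$, $\gamma$, and the span size below are made up for the example.

```python
from math import isclose, log2, prod

def zeta(gamma_i, psi_i, neighbor_psis):
    """zeta_i from (6), using sum_{S subseteq Gamma(i)} prod psi_j = prod (1 + psi_j)."""
    return gamma_i / psi_i * prod(1 + p for p in neighbor_psis)

def t0(max_ratio, span_psis):
    """T_0 = log2(max theta/mu) + sum over the span of log2(1 + psi_j)."""
    return log2(max_ratio) + sum(log2(1 + p) for p in span_psis)

D = 4            # assumed neighborhood size
psi = 1 / D      # assumed common value of psi_j
gamma = 0.09     # assumed common charge, slightly below 1 / (e * D)

z = zeta(gamma, psi, [psi] * D)
# Starting from theta = mu makes the first term vanish; assume 10 flaws in the span.
T0 = t0(1.0, [psi] * 10)
```

With these numbers `z` is about 0.879, so (6) holds, and `T0` is about 3.22.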

### 2.3 A Non-Markovian Algorithm

Our next result concerns the common setting where the neighbors of each flaw in the causality graph span several arcs between them. We improve Theorem 1 in such settings by employing a recursive algorithm. That is, an algorithm where the flaw choice at each step depends on the entire trajectory up to that point, not just the current state, so that the resulting walk on is non-Markovian. It is for this reason that we required a non-empty set of actions for every flaw present in a state, and why the definition of the causality digraph does not involve flaw choice. The improvement is that rather than summing over all subsets of as in (6), we now only sum over independent such subsets, where are dependent if and . This improvement is similar to the cluster expansion improvement of Bissacot et al.  of the general LLL. As a matter of fact, Theorem 2 implies the algorithmic aspects of  (see  and  ).

Further, the use of a recursive algorithm makes it possible to “shift responsibility” between flaws, so that gains from the aforementioned restriction of the sum can be realized by purposeful flaw ordering. For a permutation of , let denote the index of the greatest flaw in any according to . For a fixed action digraph with causality digraph , the recursive algorithm takes as input any digraph , i.e., any supergraph of , and is the non-Markovian random walk on that occurs by invoking procedure Eliminate. Observe that if in line 8 we do not intersect with the recursion is trivialized, recovering the simple walk of Theorem 1. Its convergence condition, Theorem 2, involves sums over the independent sets of , generalizing the discussion above (as one can always take ).
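The recursive walk described above might be sketched as follows. This is a reconstruction from the prose only, so the structure, the names (`recursive_walk`, `address`, `gamma_R`, `priority`), and the toy CNF instance are assumptions. Addressing flaw `i` takes one action and then recursively addresses the greatest flaw present in the neighborhood of `i` in the input digraph, until none remains.

```python
import random

def recursive_walk(start, flaws_present, sample_action, gamma_R, priority, rng):
    """Return (final_state, steps). `priority` plays the role of the permutation on flaws."""
    steps = 0

    def address(i, state):
        nonlocal steps
        state = sample_action(i, state, rng)  # one step of the walk
        steps += 1
        while True:
            # Restricting to gamma_R[i] is what makes the walk recursive.
            pending = flaws_present(state) & gamma_R[i]
            if not pending:
                return state
            state = address(max(pending, key=priority), state)

    state = start
    while flaws_present(state):
        state = address(max(flaws_present(state), key=priority), state)
    return state, steps

# Toy instance (assumed): flaws are violated clauses, actions resample a clause's variables.
clauses = [[1, 2, 3], [-1, 4, 5], [2, -4, 6], [-3, -5, -6]]

def flaws_present(state):
    return {i for i, c in enumerate(clauses)
            if all(state[abs(l) - 1] != (l > 0) for l in c)}

def sample_action(i, state, rng):
    new = list(state)
    for l in clauses[i]:
        new[abs(l) - 1] = rng.random() < 0.5
    return tuple(new)

# Neighborhoods: clauses sharing a variable (each clause is its own neighbor).
gamma_R = {i: {j for j, d in enumerate(clauses)
               if {abs(l) for l in c} & {abs(l) for l in d}}
           for i, c in enumerate(clauses)}

final_state, steps = recursive_walk((False,) * 6, flaws_present, sample_action,
                                    gamma_R, priority=lambda i: i, rng=random.Random(0))
```

Dropping the intersection with `gamma_R[i]` in `address` would make every call clear all flaws present, which is the sense in which the recursion becomes trivial.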

The reason for allowing the addition of arcs in relative to is that while adding, say, arcs and may make the sums corresponding to and greater, if flaw is such that , then the sum for flaw may become smaller, since are now dependent. As a result, without modifying the algorithm, such arc addition can help establish a sufficient condition for rapid convergence to a flawless object, e.g., in our application on Acyclic Edge Coloring in Section 6. An analogous phenomenon is also true in the improvement of Bissacot et al. , i.e., denser dependency graphs may yield better analysis.

###### Definition 2.

For a digraph on , let be the undirected graph where iff both and exist in . For , let .

###### Theorem 2.

Let be arbitrary. If there exist positive real numbers such that for every ,

$$\zeta_i := \frac{\gamma_i}{\psi_i} \sum_{S \in \mathrm{Ind}(\Gamma_R(i))} \prod_{j \in S} \psi_j < 1, \tag{7}$$

then for every permutation , the recursive walk reaches a sink within steps with probability at least , where , and

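$$T_0 = \log_2\left(\max_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)}\right) + \log_2\left(\sum_{S \subseteq \mathrm{Ind}(\mathcal{S}(\theta))} \prod_{j \in S} \psi_j\right).$$

###### Remark 1.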

Theorem 2 strictly improves Theorem 1 since for : (i) the summation in (7) is only over the subsets of that are independent in , instead of all subsets of as in (6), and (ii) similarly for , the summation is only over the independent subsets of , rather than all subsets of .
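The gain described in Remark 1 can be seen on a tiny example. The sketch below assumes a hypothetical digraph in which flaw 0 has neighbors 1 and 2, and flaws 1 and 2 point to each other, so that they are dependent. It compares the sum over all subsets of the neighborhood, as in (6), with the sum over independent subsets, as in (7). Self-loops are ignored when building the undirected graph, which is a simplifying assumption of this sketch.

```python
from itertools import combinations
from math import prod

def bidirected_edges(arcs):
    """Undirected edges {i, j} such that both (i, j) and (j, i) are arcs (self-loops ignored)."""
    return {frozenset((i, j)) for (i, j) in arcs if i != j and (j, i) in arcs}

def all_subsets(vertices):
    vertices = sorted(vertices)
    for r in range(len(vertices) + 1):
        yield from combinations(vertices, r)

def independent_subsets(vertices, edges):
    """Subsets of `vertices` containing no pair joined by an edge."""
    for subset in all_subsets(vertices):
        if not any(frozenset(pair) in edges for pair in combinations(subset, 2)):
            yield subset

def weighted_sum(subsets, psi):
    return sum(prod(psi[j] for j in s) for s in subsets)

arcs = {(0, 1), (0, 2), (1, 2), (2, 1)}        # hypothetical digraph R
psi = {0: 0.5, 1: 0.5, 2: 0.5}                 # hypothetical weights
neighborhood = {j for (i, j) in arcs if i == 0}
edges = bidirected_edges(arcs)

sum_all = weighted_sum(all_subsets(neighborhood), psi)
sum_ind = weighted_sum(independent_subsets(neighborhood, edges), psi)
```

In this instance the subset `{1, 2}` is excluded from the independent sum, so `sum_ind` is 2.0 while `sum_all` is 2.25.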

###### Remark 2.

Theorem 2 can be strengthened by introducing for each flaw a permutation of and replacing with in line 9 of Recursive Walk. With this change in (7) it suffices to sum only over satisfying the following: if the subgraph of induced by contains an arc , then . As such a subgraph cannot contain both and we see that .

### 2.4 A General Theorem

Theorems 1 and 2 are instantiations of a general theorem we develop for analyzing focused stochastic local search algorithms. Before stating the theorem we briefly discuss its derivation in order to motivate its form. Recall that a focused local search algorithm amounts to a flaw choice mechanism driving a random walk on a multidigraph with transition probabilities and starting state distribution .

To bound the probability that runs for or more steps we partition the set of all -trajectories into equivalence classes, bound the total probability of each class, and sum the bounds for the different classes. Specifically, the partition is according to the -sequence of the first flaws addressed.

###### Definition 3.

For any integer , let denote the set containing all -sequences of flaws that have positive probability of being the first flaws addressed by .

In general, the content of is an extremely complex function of flaw choice. An essential idea of our analysis is to overapproximate it by syntactic considerations capturing the following necessary condition for : while the very first occurrence of any flaw in may be attributed to , every subsequent occurrence of must be preceded by a distinct occurrence of a flaw that “assumes responsibility” for , e.g., a flaw that potentially causes . Definition 4 below establishes a framework for bounding by relating flaw choice with responsibility by (i) requiring that the flaw choice mechanism is such that the elements of can be unambiguously represented by forests with vertices, while on the other hand (ii) generalizing the subsets of flaws for which a flaw may be responsible from subsets of to arbitrary subsets of flaws, thus enabling responsibility shifting.

###### Definition 4.

We will say that algorithm is traceable if there exist sets and such that for every , the flaw sequences in can be injected into unordered rooted forests with vertices that have the following properties:

1. Each vertex of the forest is labeled by an integer .

2. The labels of the roots of the forest are distinct and form an element of .

3. The indices labeling the children of each vertex are distinct.

4. If a vertex is labelled by , then the labels of its children form an element of .

In  it was shown that both the simple random walk algorithm in Theorem 1 and the recursive walk algorithm in Theorem 2 are traceable. Specifically, the set of the former can be injected into so-called Break Forests, so that Definition 4 is satisfied, with and . For the latter, can be analogously injected into so-called Recursive Forests with and . Thus, Theorems 1 and 2 follow readily from Theorem 3 below.

###### Theorem 3 (Main result).

If algorithm is traceable and there exist positive real numbers such that for every ,

$$\zeta_i := \frac{\gamma_i}{\psi_i} \sum_{S \in \mathrm{List}(i)} \prod_{j \in S} \psi_j < 1, \tag{8}$$

then reaches a sink within steps with probability at least , where and

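$$T_0 = \log_2\left(\max_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)}\right) + \log_2\left(\sum_{S \in \mathrm{Roots}(\theta)} \prod_{j \in S} \psi_j\right).$$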

Theorem 3 also implies the “LeftHanded Random Walk” result of  and extends it to non-uniform transition probabilities, since that algorithm is also traceable. Notably, in the LeftHanded LLL introduced by Pegden , which inspired the algorithm, the flaw order can be chosen in a provably beneficial way, unlike in the algorithms of Theorems 1 and 2, which are indifferent to . Establishing this goodness, though, entails attributing responsibility very differently from what is suggested by the causality digraph, making full use of the power afforded by traceability and Theorem 3.

### 2.5 Resampling Oracles via Atomic Actions

To get a constructive result by LLL algorithmization via resampling oracles, i.e., given , and , we must design that regenerate at every flaw . This can be a daunting task in general. We simplify this task greatly for atomic action digraphs. Such digraphs capture algorithms that appear in several settings, e.g., the Moser-Tardos algorithm when flaws correspond to partial assignments, the algorithm of Harris and Srinivasan for permutations , and others (see ). While atomicity may seem an artificial condition, it is actually a natural way to promote search space exploration, as it is equivalent to the following: distinct states must have disjoint actions, i.e., . In most settings atomicity can be achieved in a straightforward manner. For example, in the variable setting atomicity is implied by an idea that is extremely successful in practice, namely “focus” [30, 33, 4]: every state transformation should be the result of selecting a flaw present in the current state and modifying only the variables of that flaw.

Theorem 4 asserts that when the action digraph must be atomic, then in order to regenerate at it is sufficient (and necessary) for the states in each set to have total probability given by (9). Equation (10) then automatically provides appropriate transition probabilities. Combined, equations (9), (10) offer strong guidance in designing resampling oracles in atomic digraphs.

###### Theorem 4.

If is atomic and regenerate at , then for every :

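$$\begin{aligned} \sum_{\tau \in A(i,\sigma)} \mu(\tau) &= \frac{\mu(\sigma)}{\mu(f_i)} && \text{(9)} \\ \rho_i(\sigma, \tau) &= \frac{\mu(\tau)}{\sum_{\sigma' \in A(i,\sigma)} \mu(\sigma')} \quad \text{for every } \tau \in A(i,\sigma). && \text{(10)} \end{aligned}$$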
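Equations (9) and (10) can be exercised on a small non-uniform example. The sketch below assumes a product measure on three binary variables with made-up marginals, a hypothetical flaw "variables 0 and 1 both equal 1", and action sets consisting of all states that agree with the current state outside the flaw's variables (an atomic choice). It checks (9), defines the transition probabilities by (10), and then verifies regeneration (2) by enumeration.

```python
from fractions import Fraction
from itertools import product

p_one = [Fraction(1, 3), Fraction(1, 4), Fraction(1, 2)]  # assumed Pr[variable v = 1]
n = len(p_one)
states = list(product([0, 1], repeat=n))

def mu(s):
    """Product measure with the marginals above."""
    result = Fraction(1)
    for v, bit in enumerate(s):
        result *= p_one[v] if bit else 1 - p_one[v]
    return result

flaw_vars = (0, 1)
flaw = [s for s in states if all(s[v] == 1 for v in flaw_vars)]
mu_flaw = sum(mu(s) for s in flaw)

def actions(sigma):
    """A(i, sigma): states agreeing with sigma outside the flaw's variables."""
    return [t for t in states
            if all(t[v] == sigma[v] for v in range(n) if v not in flaw_vars)]

def rho(sigma, tau):
    """Transition probabilities given by (10)."""
    acts = actions(sigma)
    if tau not in acts:
        return Fraction(0)
    return mu(tau) / sum(mu(t) for t in acts)

eq9_holds = all(sum(mu(t) for t in actions(s)) == mu(s) / mu_flaw for s in flaw)
regenerates = all(
    sum(mu(s) * rho(s, t) for s in flaw) / mu_flaw == mu(t) for t in states
)
```

In this setting the probabilities from (10) amount to resampling the flaw's variables from their marginals, which is the Moser-Tardos resampling step for this toy instance.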

### 2.6 A Sharp Analysis and the Role of Flaw Choice

Let be the random variable that equals the sequence of the first flaws addressed by the algorithm, or if the algorithm reaches a flawless object in fewer than steps. Recall that denotes the set of all -sequences of flaws that have positive probability of being the first flaws addressed by an algorithm , i.e., the range of except . Trivially, the probability that takes at least steps equals

$$\sum_{W \in \mathcal{W}_t(\mathcal{A})} \Pr[W_t = W].$$

###### Theorem 5.

For any algorithm for which is atomic and regenerate at every flaw, for every flaw sequence ,

$$\Pr[W_t = W] \in [\alpha, \beta] \cdot \prod_{i=1}^{t} \mu(w_i), \tag{11}$$

where and .

Theorem 5 tells us that every algorithm where form atomic resampling oracles will converge to a flawless object if and only if the sum

$$\sum_{W \in \mathcal{W}_t(\mathcal{A})} \prod_{i=1}^{t} \mu(w_i)$$

converges to zero as grows. In other words, the quality of the algorithm depends solely on the set which, in turn, is determined by flaw choice (and the initial distribution ).

In the work of Moser and Tardos for the variable setting  and of Harris and Srinivasan for the uniform measure on permutations , flaw choice can be arbitrary and the whole issue “is swept under the rug” . This can be explained as follows. In these settings, due to the symmetry of , we can afford to overapproximate in a way that completely ignores flaw choice, i.e., we can deem flaw choice to be adversarial, and still recover the LLL condition. Theorem 5 shows that this should not be confused with deeming flaw choice “irrelevant” for such algorithms. Exactly the opposite is true, a fact also established experimentally : the Moser-Tardos algorithm, in practice, succeeds on instances far denser than predicted by the LLL condition.

Kolmogorov  gave a sufficient condition, called commutativity, for arbitrary flaw choice. One can think of commutativity as the requirement that there exists a supergraph of the causality graph satisfying a strong symmetry condition (including that all arcs are bidirectional), for which the LLL condition still holds. However, such symmetries cannot be expected to hold in general, something reflected in the requirement of traceability in our Theorem 3, and in the specificity of the flaw choice mechanisms in our Theorems 1 and 2. More generally, in , Harvey and Vondrák provided strong evidence that in the absence of commutativity, specific flaw choice is necessary to match Shearer’s criterion for the LLL.

### 2.7 Comparison with Resampling Oracles

Harvey and Vondrák  proved that in the setting of resampling oracles, i.e., no distortion, when the causality graph is symmetric, if one resamples a maximal independent set of bad events each time, the resulting algorithm succeeds even under Shearer’s condition. (Notably, Shearer’s condition, involving an exponential number of terms, is not used in applications.) As a corollary, they prove that in this setting, in (6), strict inequality (<) can be replaced with inequality (≤). As our results are over arbitrary directed causality graphs, for which no analogue to Shearer’s condition exists, we do not have an analogous result. However, for the case where the causality graph is symmetric (undirected), the third author showed  that the analogue of Shearer’s lemma holds in our framework. That is, if the conditions that result when in the standard Shearer lemma one replaces probabilities with charges are satisfied, then the algorithm that resamples a maximal independent set of bad events each time succeeds.

## 3 Bounding the Probabilities of Trajectories

To bound the probability that an algorithm runs for or more steps we partition its -trajectories into equivalence classes, bound the total probability of each class, and sum the bounds for the different classes. Formally, for a trajectory we let denote its witness sequence, i.e., the sequence of flaws addressed along (note that determines as flaw choice is deterministic). We let if has fewer than steps, otherwise we let be the -prefix of . Slightly abusing notation, as mentioned, we let be the random variable when is the trajectory of the walk, i.e., selected according to and the flaw choice mechanism. Finally, recall that denotes the range of for algorithm except for , i.e., is the set of -sequences of flaws that have positive probability of being the first flaws addressed by , as per Definition 3. Thus,

Key to our analysis will be the derivation of an upper bound for $\Pr[W_t = W]$ that holds for arbitrary $t$-sequences of flaws $W$, i.e., not necessarily sequences that the algorithm can actually produce, and which factorizes over the flaws in $W$. For an arbitrary sequence of flaws $W = w_1, \ldots, w_t$, let us denote by $[i]$ the index $j \in [m]$ such that $w_i = f_j$.

###### Lemma 1.

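Let $\xi = \max_{\sigma \in \Omega} \theta(\sigma)/\mu(\sigma)$. For every sequence of flaws $W = w_1, \ldots, w_t$,

$$\Pr[W_t = W] \le \xi \prod_{i=1}^{t} \gamma_{[i]}.$$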
###### Proof.

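We claim that for every $t \ge 0$, every $t$-sequence of flaws $W$, and every state $\tau \in \Omega$,

$$\Pr[W_t = W \cap \sigma_{t+1} = \tau] \le \xi \cdot \prod_{i=1}^{t} \gamma_{[i]} \cdot \mu(\tau). \tag{12}$$

Summing (12) over all $\tau \in \Omega$ proves the lemma.

To prove our claim we proceed by induction on $t$ after recalling that for every $i \in [m]$ and $\tau \in \Omega$, by the definition of $\gamma_i$,

$$\sum_{\sigma \in f_i} \mu(\sigma) \rho_i(\sigma, \tau) \le \gamma_i \cdot \mu(\tau). \tag{13}$$

For $t = 0$ the claim holds because $\Pr[\sigma_1 = \tau] = \theta(\tau) \le \xi \mu(\tau)$ for all $\tau \in \Omega$, by the definition of $\xi$.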

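Assume that (12) holds for all $s$-sequences of flaws, for some $s \ge 0$. Let $A'$ be any $(s+1)$-sequence of flaws, let $A$ denote its $s$-prefix and $f_i$ its last flaw, and let $\tau \in \Omega$ be arbitrary. The first inequality below is due to the fact that since $f_i$ is the last flaw in $A'$, a necessary (but not sufficient) condition for the event $W_{s+1} = A'$ to occur is that $f_i$ is present in the state that results after the flaws in $A$ have been addressed (it is not sufficient as the algorithm may choose to address a flaw other than $f_i$). The second inequality follows from the inductive hypothesis, while the third follows from (13).

$$\begin{aligned}
\Pr[W_{s+1} = A' \cap \sigma_{s+2} = \tau] &\le \sum_{\sigma \in f_i} \rho_i(\sigma, \tau) \Pr[W_s = A \cap \sigma_{s+1} = \sigma] \\
&\le \xi \cdot \prod_{i=1}^{s} \gamma_{[i]} \cdot \sum_{\sigma \in f_i} \mu(\sigma) \cdot \rho_i(\sigma, \tau) \\
&\le \xi \cdot \prod_{i=1}^{s+1} \gamma_{[i]} \cdot \mu(\tau).
\end{aligned}$$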
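The bound of Lemma 1 can be checked exactly on a small instance. The sketch below is only an illustration under stated assumptions: the state space (three bits), the two flaws, the uniform measure, the resampling transitions, and the lowest-index flaw choice are all hypothetical choices, not taken from the text above. It computes the charges $\gamma_i$ from (13), the exact value of $\Pr[W_t = W]$ by propagating probability mass, and the right-hand side of the lemma.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy instance: 3 bits, uniform mu, theta = mu (so xi = 1).
STATES = list(product((0, 1), repeat=3))
MU = {s: Fraction(1, 8) for s in STATES}
THETA = dict(MU)

# Each flaw is (variables, forbidden values on those variables).
FLAWS = [((0, 1), (0, 0)), ((1, 2), (1, 0))]


def present(i, s):
    """True if flaw i is present in state s."""
    vs, bad = FLAWS[i]
    return tuple(s[v] for v in vs) == bad


def rho(i, s):
    """Addressing flaw i at s: resample its variables uniformly at random."""
    vs, _ = FLAWS[i]
    out = {}
    for vals in product((0, 1), repeat=len(vs)):
        t = list(s)
        for v, x in zip(vs, vals):
            t[v] = x
        out[tuple(t)] = Fraction(1, 2 ** len(vs))
    return out


def charge(i):
    """gamma_i = max over tau of (1/mu(tau)) * sum_{sigma in f_i} mu(sigma) rho_i(sigma, tau)."""
    inflow = {t: Fraction(0) for t in STATES}
    for s in STATES:
        if present(i, s):
            for t, p in rho(i, s).items():
                inflow[t] += MU[s] * p
    return max(inflow[t] / MU[t] for t in STATES)


def prob_witness(W):
    """Exact Pr[W_t = W] when the lowest-index present flaw is always addressed."""
    mass = dict(THETA)
    for w in W:
        nxt = {t: Fraction(0) for t in STATES}
        for s, p in mass.items():
            chosen = next((i for i in range(len(FLAWS)) if present(i, s)), None)
            if chosen == w:  # mass on any other choice leaves the event
                for t, q in rho(w, s).items():
                    nxt[t] += p * q
        mass = nxt
    return sum(mass.values())


XI = max(THETA[s] / MU[s] for s in STATES)
GAMMA = [charge(i) for i in range(len(FLAWS))]


def lemma1_bound(W):
    """Right-hand side of Lemma 1: xi times the product of the charges along W."""
    bound = XI
    for w in W:
        bound *= GAMMA[w]
    return bound
```

Comparing `prob_witness(W)` with `lemma1_bound(W)` over all short sequences `W` gives a quick sanity check of the inductive argument; exact rational arithmetic avoids rounding issues in the comparison.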

## 4 Proof of Theorems 4 and 5

We first identify for every digraph–measure pair certain transition probabilities as special.

###### Harmonic Walks.

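The transition probabilities $\rho$ are harmonic if for every $i \in [m]$ and every transition $(\sigma, \tau)$ with $\sigma \in f_i$ and $\tau \in A(i,\sigma)$,

$$\rho_i(\sigma, \tau) = \frac{\mu(\tau)}{\sum_{\sigma' \in A(i,\sigma)} \mu(\sigma')} \propto \mu(\tau). \tag{14}$$

In words, when the transition probabilities are harmonic, $\rho_i(\sigma, \cdot)$ assigns to each state in $A(i,\sigma)$ probability proportional to its probability under $\mu$. It is easy to see that the transition probabilities are harmonic both in the celebrated algorithm of Moser and Tardos for the variable setting and in the algorithm of Harris and Srinivasan for the uniform measure on permutations. What makes harmonic combinations special is that for any pair $(D, \mu)$, taking $\rho$ to be harmonic can be easily seen to minimize the expression

$$\max_{\tau \in A(i,\sigma)} \left\{ \rho_i(\sigma, \tau) \frac{\mu(\sigma)}{\mu(\tau)} \right\}$$

for every $\sigma \in f_i$ simultaneously. For atomic $D$ this suffices to minimize the charge $\gamma_i$ over all possible $\rho$.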
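To make the minimization claim concrete, the following sketch builds a harmonic row as in (14) and compares its worst-case ratio with that of a uniform row. The three-state measure and the target set are hypothetical toy data, and the function names are illustrative only.

```python
from fractions import Fraction


def harmonic_row(mu, targets):
    """rho(sigma, .) proportional to mu over the allowed targets A(i, sigma), as in (14)."""
    total = sum(mu[t] for t in targets)
    return {t: mu[t] / total for t in targets}


def worst_ratio(mu, sigma, row):
    """max over tau of rho(sigma, tau) * mu(sigma) / mu(tau)."""
    return max(p * mu[sigma] / mu[t] for t, p in row.items())


# Hypothetical toy data: state "a" has the flaw and may move to "b" or "c".
MU = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
TARGETS = ["b", "c"]

harmonic = harmonic_row(MU, TARGETS)
uniform = {t: Fraction(1, len(TARGETS)) for t in TARGETS}

harmonic_worst = worst_ratio(MU, "a", harmonic)  # every ratio equals mu(a) / mu({b, c})
uniform_worst = worst_ratio(MU, "a", uniform)    # the low-measure target "c" dominates
```

Under the harmonic row every target has the same ratio $\mu(\sigma)/\mu(A(i,\sigma))$, so no other distribution on the same targets can lower the maximum; the uniform row overloads the low-measure target and its maximum is larger.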

###### Proof of Theorem 4.

If is atomic, , and regenerate at every flaw , it follows that for every there is exactly one such that . (And also that ). Therefore, regeneration at in this setting is equivalent to:

For every $\tau \in \Omega$ and the unique $\sigma$ such that $\tau \in A(i,\sigma)$:

$$\rho_i(\sigma, \tau) = \mu(f_i) \frac{\mu(\tau)}{\mu(\sigma)}. \tag{15}$$

(Note that for given there may be no satisfying (15), as we also need that .)

Since in (15) we get (10). Summing (15) over yields (9). ∎

###### Proof of Theorem 5.

Lemma 1, valid for any , readily yields the upper bound. For the lower bound, we start by recalling that in the proof of Theorem 4 we showed that if is atomic and regenerate at , then . Therefore, if , since regenerate at every , for every there exists such that and . Trivially,

$$\Pr[\Sigma^{\tau}] = \theta(\sigma_1^{\tau}) \prod_{i=1}^{t} \rho_{[i]}(\sigma_i^{\tau}, \sigma_{i+1}^{\tau}).$$

Since is atomic and regenerate at every flaw, (15) applies, yielding

$$\rho_{[i]}(\sigma_i^{\tau}, \sigma_{i+1}^{\tau}) = \mu(w_i) \frac{\mu(\sigma_{i+1}^{\tau})}{\mu(\sigma_i^{\tau})}.$$

Thus, by telescoping,

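$$\Pr[\Sigma^{\tau}] = \theta(\sigma_1^{\tau}) \prod_{i=1}^{t} \mu(w_i) \frac{\mu(\sigma_{i+1}^{\tau})}{\mu(\sigma_i^{\tau})} = \frac{\theta(\sigma_1^{\tau})}{\mu(\sigma_1^{\tau})} \, \mu(\tau) \prod_{i=1}^{t} \mu(w_i). \tag{16}$$

Summing (16) over all $\tau \in \Omega$ gives the lower bound

$$\Pr[W_t = W] \ge \min_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)} \prod_{i=1}^{t} \mu(w_i).$$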

## 5 Proof of Theorem 3

Per the hypothesis of Theorem 3, the sequences in can be injected into a set of rooted forests with vertices that satisfy the properties of Definition 4. Let be the set of all forests with vertices that satisfy the properties of Definition 4. By Lemma 1, to prove the theorem it suffices to prove that is exponentially small in for .

To proceed, we use ideas from . Specifically, we introduce a branching process that produces only forests in and bound by analyzing it. Given any real numbers we define and write to simplify notation. Recall that neither the trees in each forest, nor the nodes inside each tree are ordered. To start the process we produce the roots of the labeled forest by rejection sampling as follows: For each independently, with probability we add a root with label . If the resulting set of roots is in we accept the birth. If not, we delete the roots created and try again. In each subsequent round we follow a very similar procedure. Specifically, at each step, each node with label “gives birth”, again, by rejection sampling: For each integer , independently, with probability we add a vertex with label as a child of . If the resulting set of children of is in we accept the birth. If not, we delete the children created and try again. It is not hard to see that this process creates every forest in with positive probability. Specifically, for a vertex labeled by , every set receives probability 0, while every set receives probability proportional to

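$$w(S) = \prod_{g \in S} x_g \prod_{h \in [m] \setminus S} (1 - x_h).$$

To express the exact probability received by each $S$ we define

$$Q(S) := \frac{\prod_{g \in S} x_g}{\prod_{g \in S} (1 - x_g)} = \prod_{g \in S} \psi_g \tag{17}$$

and let $Z = \prod_{i \in [m]} (1 - x_i)$. We claim that $w(S) = Q(S) \, Z$. To see the claim observe that

$$\frac{w(S)}{Z} = \frac{\prod_{g \in S} x_g \prod_{h \in [m] \setminus S} (1 - x_h)}{\prod_{i \in [m]} (1 - x_i)} = \frac{\prod_{g \in S} x_g}{\prod_{g \in S} (1 - x_g)} = Q(S).$$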

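Therefore, each admissible set $S$ of children of a vertex $v$, i.e., each $S \in \mathrm{List}(v)$, receives probability equal to

$$\frac{w(S)}{\sum_{B \in \mathrm{List}(v)} w(B)} = \frac{Q(S)}{\sum_{B \in \mathrm{List}(v)} Q(B)}. \tag{18}$$

Similarly, each set $R \in \mathrm{Roots}$ receives probability equal to $Q(R) / \sum_{B \in \mathrm{Roots}} Q(B)$.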
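The identity $w(S) = Q(S) Z$ and the resulting rejection-sampling distribution can be verified with exact arithmetic. In the sketch below the birth probabilities `X` and the list of admissible child sets `ALLOWED` are hypothetical toy values chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def w(S, x):
    """Unnormalized weight: every label in S is born and every other label is not."""
    out = Fraction(1)
    for g, xg in x.items():
        out *= xg if g in S else 1 - xg
    return out


def Q(S, x):
    """Q(S) = prod_{g in S} x_g / (1 - x_g), i.e. the product of psi_g over S."""
    out = Fraction(1)
    for g in S:
        out *= x[g] / (1 - x[g])
    return out


def all_subsets(labels):
    """Yield every subset of the given labels as a frozenset."""
    for r in range(len(labels) + 1):
        for c in combinations(labels, r):
            yield frozenset(c)


def accepted_distribution(allowed, x):
    """Distribution of the accepted set under rejection sampling: Q(S) normalized over allowed sets."""
    total = sum(Q(S, x) for S in allowed)
    return {S: Q(S, x) / total for S in allowed}


# Hypothetical toy data: three labels and three admissible child sets.
X = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 4)}
Z = w(frozenset(), X)  # product of (1 - x_i) over all labels
ALLOWED = [frozenset(), frozenset({1}), frozenset({2, 3})]

dist = accepted_distribution(ALLOWED, X)
# Normalizing the raw weights w(S) instead gives the same distribution, since Z cancels.
w_total = sum(w(S, X) for S in ALLOWED)
dist_from_w = {S: w(S, X) / w_total for S in ALLOWED}
```

Because the common factor $Z$ cancels in the normalization, the accepted set depends on the $x_g$ only through the ratios $\psi_g = x_g/(1 - x_g)$.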

For each forest $\phi$ and each node $v$ of $\phi$, let $N(v)$ denote the set of labels of its children and let $\mathrm{List}(v) = \mathrm{List}(j)$, where $j$ is the label of $v$.

###### Lemma 2.

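The branching process described above produces every forest $\phi$ with probability

$$p_{\phi} = \left( \sum_{S \in \mathrm{Roots}} \prod_{i \in S} \psi_i \right)^{-1} \prod_{v \in \phi} \frac{\psi_v}{\sum_{S \in \mathrm{List}(v)} Q(S)}.$$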
###### Proof.

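Let $R$ denote the roots of $\phi$. By (18),

$$\begin{aligned}
p_{\phi} &= \frac{Q(R)}{\sum_{S \in \mathrm{Roots}} Q(S)} \prod_{v \in \phi} \frac{Q(N(v))}{\sum_{S \in \mathrm{List}(v)} Q(S)} \\
&= \frac{Q(R)}{\sum_{S \in \mathrm{Roots}} Q(S)} \cdot \frac{\prod_{v \in \phi \setminus R} \psi_v}{\prod_{v \in \phi} \sum_{S \in \mathrm{List}(v)} Q(S)} \\
&= \left( \sum_{S \in \mathrm{Roots}} Q(S) \right)^{-1} \prod_{v \in \phi} \frac{\psi_v}{\sum_{S \in \mathrm{List}(v)} Q(S)}.
\end{aligned}$$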

Notice now that

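$$\begin{aligned}
\sum_{W \in \widetilde{\mathcal{W}}_t} \prod_{i=1}^{t} \gamma_{[i]} &= \sum_{W \in \widetilde{\mathcal{W}}_t} \prod_{i=1}^{t} \zeta_{[i]} \frac{\psi_{[i]}}{\sum_{S \in \mathrm{List}([i])} Q(S)} \\
&\le \left( \max_{i \in [m]} \zeta_i \right)^{t} \sum_{W \in \widetilde{\mathcal{W}}_t} \prod_{i=1}^{t} \frac{\psi_{[i]}}{\sum_{S \in \mathrm{List}([i])} Q(S)} \\
&= \left( \max_{i \in [m]} \zeta_i \right)^{t} \sum_{W \in \widetilde{\mathcal{W}}_t} \left( p_W \sum_{S \in \mathrm{Roots}} Q(S) \right) \\
&= \left( \max_{i \in [m]} \zeta_i \right)^{t} \sum_{S \in \mathrm{Roots}} Q(S).
\end{aligned} \tag{19}$$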

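Using (19) and Lemma 1 we see that the binary logarithm of the probability that the walk does not encounter a flawless state within $t$ steps is at most $t \log_2 \left( \max_{i \in [m]} \zeta_i \right) + T_0$, where

$$T_0 = \log_2 \left( \max_{\sigma \in \Omega} \frac{\theta(\sigma)}{\mu(\sigma)} \right) + \log_2 \left( \sum_{S \in \mathrm{Roots}} \prod_{i \in S} \psi_i \right).$$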

Therefore, if , the probability that the random walk on does not reach a flawless state within steps is at most .
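Combining Lemma 1 with (19) bounds the failure probability after $t$ steps by $\xi \, (\max_{i} \zeta_i)^t \sum_{S \in \mathrm{Roots}} Q(S)$, where $\xi = \max_{\sigma} \theta(\sigma)/\mu(\sigma)$. The sketch below evaluates $T_0$ and the resulting base-2 logarithm of this bound; the input distributions, root sets, and $\psi$, $\zeta$ values are hypothetical and the function names are illustrative only.

```python
from math import log2, prod


def t0(theta, mu, roots, psi):
    """T_0 = log2(max_sigma theta/mu) + log2(sum over root sets S of prod_{i in S} psi_i)."""
    xi = max(theta[s] / mu[s] for s in mu)
    root_weight = sum(prod(psi[i] for i in S) for S in roots)
    return log2(xi) + log2(root_weight)


def log2_failure_bound(t, zeta_max, T0):
    """log2 of xi * zeta_max**t * sum_{S in Roots} Q(S); meaningful when zeta_max < 1."""
    return t * log2(zeta_max) + T0


# Hypothetical toy inputs: 8 states, uniform mu, walk started at a single state.
mu = {f"s{k}": 1 / 8 for k in range(8)}
theta = {s: 0.0 for s in mu}
theta["s0"] = 1.0
roots = [frozenset({1}), frozenset({2})]
psi = {1: 1.0, 2: 3.0}

T0 = t0(theta, mu, roots, psi)                # log2(8) + log2(1 + 3)
bound_at_15 = log2_failure_bound(15, 0.5, T0)  # 15 * log2(1/2) + T0
```

The first term of $T_0$ measures how far the starting distribution is from $\mu$, while the second depends only on the admissible root sets; once $\max_i \zeta_i < 1$ each further step lowers the logarithm of the bound by a fixed amount.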

## 6 Application to Acyclic Edge Coloring

### 6.1 Earlier Works and Statement of Result

An edge-coloring of a graph is proper if all edges incident to each vertex have distinct colors. A proper edge coloring is acyclic if it has no bichromatic cycles, i.e., no cycle receives exactly two (alternating) colors. Acyclic Edge Coloring (AEC) was originally motivated by the work of Coleman et al. [10, 9] on the efficient computation of Hessians. The smallest number of colors, $\chi'_a(G)$, for which a graph $G$ has an acyclic edge-coloring can also be used to bound other parameters, such as the oriented chromatic number and the star chromatic number, both of which have many practical applications. The first general linear upper bound for $\chi'_a$ was given by Alon et al. who proved $\chi'_a(G) \le 64\Delta(G)$, where $\Delta(G)$ denotes the maximum degree of $G$. This bound was improved to $16\Delta$ by Molloy and Reed and then to $\lceil 9.62(\Delta - 1) \rceil$ by Ndreca et al. Attention to the problem was recently renewed due to the work of Esperet and Parreau who proved