A New Perspective on Stochastic Local Search and the Lovasz Local Lemma

We present a new perspective on the analysis of stochastic local search algorithms, via linear algebra, and use it to establish a new criterion for their convergence. Our criterion captures and unifies the analysis of all currently known LLL-inspired local search algorithms, including all current applications of the entropy compression method. It can be seen as a generalization of the Lovasz Local Lemma that quantifies the interaction strength of bad events, so that weak interactions form correspondingly small obstacles to algorithmic convergence. As a demonstration of its power, we use our criterion to analyze a complex local search algorithm for the classical problem of coloring graphs with sparse neighborhoods. We prove that any improvement over our algorithm would require a major (and unexpected) breakthrough in random graph theory, suggesting that our criterion reaches the edge of tractability for this problem. Finally, we consider questions such as the number of possible distinct final states and the probability that certain portions of the state space are visited by a local search algorithm. Such information is currently available for the Moser-Tardos algorithm and for algorithms satisfying a combinatorial notion of commutativity introduced by Kolmogorov. Our framework provides a very natural and more general notion of commutativity (essentially matrix commutativity) which allows the recovery of all such results with much simpler proofs.

1 Introduction

Numerous problems in computer science and combinatorics can be formulated as searching for objects lacking certain bad properties, or “flaws”. For example, constraint satisfaction problems like satisfiability and graph coloring can be seen as searching for objects (truth assignments, colorings) that are flawless, in the sense that they do not violate any constraint. A large class of algorithms for finding flawless objects employ “stochastic local search”; such algorithms start with a flawed object and try to make it flawless via small randomized changes that in each step focus on eradicating a specific flaw (while potentially introducing others). Given their great practical success, it is natural to ask whether there are conditions under which stochastic local search algorithms provably work efficiently, and use these conditions to show that interesting families of instances of hard problems are in fact tractable.

The Lovász Local Lemma (LLL) [28] is a powerful tool for proving the existence of flawless objects that has had far-reaching consequences in computer science and combinatorics [12, 55]. Roughly speaking, it asserts that, given a collection of bad events in a probability space, if all of them are individually not too likely, and independent of most other bad events, then the probability that none of them occurs is strictly positive; hence a flawless object exists. For example, the LLL implies that every $k$-CNF formula in which each clause shares variables with fewer than $2^k/e$ other clauses is satisfiable. Remarkably, this is tight [32].

In groundbreaking work, Moser [56], joined by Tardos in [57], showed that a simple local search algorithm can be used to make the LLL constructive for product probability spaces. For example, the Moser-Tardos algorithm for satisfiability amounts to starting at a uniformly random truth assignment and, as long as violated clauses exist, selecting any such clause and resampling all of its variables uniformly at random. Following this work, a large amount of effort has been devoted to making all known variants of the LLL constructive: see, e.g., [50, 51, 22, 60, 42, 4, 44, 5].

Moser’s original analysis also inspired a parallel line of work centered on the entropy compression method. This method has been used primarily to analyze backtracking algorithms, e.g., for non-repetitive sequences [36, 27], acyclic edge coloring [30], non-repetitive list-coloring [34], the Thue choice number [35], and pattern avoidance [59]. More recently, entropy compression was used to analyze resampling algorithms for stochastic control [6] and graph list-coloring [53], in the latter case dramatically simplifying the celebrated result of Johansson [47]. While the spirit of the analysis in all these works is close to [56], they are not derived from a known form of the LLL and indeed often improve on earlier results obtained from the LLL. Instead, the fact that the algorithm under consideration reaches a flawless object is established in each case by a problem-specific counting argument.

In this paper we introduce a new viewpoint for the analysis of local search algorithms, based on linear algebra. Our key insight is the following:

LLL-inspired convergence arguments can be seen as a method for bounding the spectral radius of a matrix specifying the algorithm to be analyzed.

Among the benefits of this new viewpoint, which we will present in a moment, are the following:

• A unified analysis of all entropy compression applications, connecting backtracking algorithms to the LLL in the same fashion that existing analyses connect resampling algorithms to the LLL.

• A new convergence condition that seamlessly handles resampling algorithms that can detect, and back away from, unfavorable parts of the state space.

• Several applications of this condition, notably a new vertex coloring algorithm for arbitrary graphs that uses a number of colors that matches the algorithmic barrier for random graphs. Thus, any improvement on our algorithm’s guarantee requires a breakthrough in random graph theory.

• A generalization of Kolmogorov’s notion of commutative algorithms [52], cast as matrix commutativity, which affords much simpler proofs both of the original results and of recent extensions.

1.1 The Lovász Local Lemma as a Spectral Condition

Let $\Omega$ be a (large) finite set of objects and let $\Omega^* \subseteq \Omega$ be the “bad” part of $\Omega$, comprising the flawed objects; e.g., for a CNF formula $F$ on $n$ variables, $\Omega = \{0,1\}^n$ and $\Omega^*$ comprises all non-satisfying assignments. Imagine a particle trying to escape $\Omega^*$ by following a Markov chain on $\Omega$ with transition matrix $P$. (Our framework does not require the state evolution to be Markovian, but we make this assumption here to simplify exposition.) Our task is to develop conditions under which the particle eventually escapes, thus establishing in particular that $\Omega^* \neq \Omega$. (Motivated by this view, we also refer to objects as states.) Letting $A$ be the submatrix of $P$ that corresponds to transitions from $\Omega^*$ to $\Omega^*$, and $B$ the submatrix that corresponds to transitions from $\Omega^*$ to $\Omega \setminus \Omega^*$, we see that, after a suitable permutation of its rows and columns, $P$ can be written as

\[ P = \begin{bmatrix} A & B \\ 0 & I \end{bmatrix}. \]

Here $I$ is the identity matrix, since we assume that the particle stops after reaching a flawless state.

Let $\theta = [\theta_1 \mid \theta_2]$ be the row vector that corresponds to the probability distribution of the starting state, where $\theta_1$ and $\theta_2$ are the vectors that correspond to states in $\Omega^*$ and $\Omega \setminus \Omega^*$, respectively. Then, the probability that after $t$ steps the particle is still inside $\Omega^*$ is exactly $\|\theta_1 A^t\|_1$. Therefore, for any initial distribution $\theta$, the particle escapes $\Omega^*$ if and only if the spectral radius, $\rho(A)$, of $A$ is strictly less than 1. Moreover, the rate of convergence is dictated by $1 - \rho(A)$. Unfortunately, since $A$ is huge and defined implicitly by an algorithm, the magnitude of its largest eigenvalue, $\rho(A)$, is not readily available.

In linear systems analysis, to sidestep the inaccessibility of the spectral radius, $\rho(A)$, one typically bounds instead some operator norm $\|\cdot\|$ of the matrix $A$, since $\rho(A) \le \|A\|$ for any such norm. (For brief background on matrix norms see Appendix A.) Moreover, instead of bounding an operator norm of $A$ itself, one often first performs a “change of basis” $A' = MAM^{-1}$ and bounds $\|A'\|$, justified by the fact that $\rho(A) = \rho(A') \le \|A'\|$, for any invertible matrix $M$. The purpose of the change of basis is to cast $A$ “in a good light” in the eyes of the chosen operator norm, in the hope of minimizing the cost of replacing the spectral radius with an operator norm. To demonstrate this approach in action, we start by showing how it captures the classical potential function argument.
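The effect of a change of basis can be seen on a toy chain. The following is a minimal sketch, not taken from the paper: the two-state matrix `A`, the diagonal `M = diag(1, 2)`, and all function names are illustrative assumptions. Both plain norms of `A` equal 1 and so say nothing, while the conjugated matrix has norm 1/2, which here equals the spectral radius.

```python
from fractions import Fraction as F

def norm_inf(X):
    """Operator norm induced by the max-norm: the maximum row sum."""
    return max(sum(abs(v) for v in row) for row in X)

def norm_1(X):
    """Operator norm induced by the 1-norm: the maximum column sum."""
    return max(sum(abs(row[j]) for row in X) for j in range(len(X[0])))

def conjugate(diag, X):
    """Return M X M^{-1} for the diagonal matrix M = diag(diag)."""
    n = len(X)
    return [[diag[i] * X[i][j] / diag[j] for j in range(n)] for i in range(n)]

def survival(theta1, X, t):
    """Probability of still being in a flawed state after t steps: ||theta1 X^t||_1."""
    v = list(theta1)
    for _ in range(t):
        v = [sum(v[i] * X[i][j] for i in range(len(X))) for j in range(len(X[0]))]
    return sum(v)

# Hypothetical chain on two flawed states: state 0 always moves to state 1,
# state 1 moves back to state 0 with probability 1/4 and escapes otherwise.
A = [[F(0), F(1)],
     [F(1, 4), F(0)]]
A_conj = conjugate([F(1), F(2)], A)

print(norm_inf(A), norm_1(A))            # both plain norms are uninformative
print(norm_inf(A_conj), norm_1(A_conj))  # after the change of basis both drop below 1
print(survival([F(1), F(0)], A, 4))      # mass still trapped after 4 steps
```

The eigenvalues of this `A` are ±1/2, so the conjugated norms are tight in this example; in general they only give an upper bound on the spectral radius.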

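Consider any function $\phi$ on $\Omega$ such that $\phi(\sigma) > 0$ for $\sigma \in \Omega^*$, while $\phi(\sigma) = 0$ for $\sigma \notin \Omega^*$. In our $k$-SAT example, $\phi(\sigma)$ could be the number of violated clauses under $\sigma$. The potential argument asserts that eventually $\phi = 0$ (i.e., the particle escapes $\Omega^*$) if $\phi$ is always reduced in expectation, i.e., if for every $\sigma \in \Omega^*$,

\[ \sum_{\sigma' \in \Omega} P[\sigma, \sigma'] \, \phi(\sigma') < \phi(\sigma). \tag{1} \]

To express this argument via matrix norms, let $A' = MAM^{-1}$ where $M$ is the diagonal matrix $\mathrm{diag}(1/\phi(\sigma))$. Thus, $A'[\sigma, \sigma'] = A[\sigma, \sigma'] \, \phi(\sigma') / \phi(\sigma)$. Recalling that $\|\cdot\|_\infty$ is the maximum row sum of a matrix, we see that the potential argument’s condition (1) is nothing other than $\|A'\|_\infty < 1$.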
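The equivalence between condition (1) and a bound on the infinity norm of the conjugated matrix can be checked numerically. The sketch below is illustrative only: the two-state matrix `A` and the potentials are made-up assumptions, and the function names are hypothetical. Since the potential vanishes on flawless states, only the flawed-to-flawed block `A` enters the sum in (1).

```python
from fractions import Fraction as F

def expected_potential_drops(A, phi):
    """Condition (1): for every flawed state s, sum_t A[s][t] * phi[t] < phi[s]."""
    n = len(A)
    return all(sum(A[s][t] * phi[t] for t in range(n)) < phi[s] for s in range(n))

def conjugated_inf_norm(A, phi):
    """Max row sum of M A M^{-1} with M = diag(1 / phi); entries of A are non-negative."""
    n = len(A)
    return max(sum(A[s][t] * phi[t] / phi[s] for t in range(n)) for s in range(n))

# Hypothetical flawed-to-flawed block of a chain with two flawed states.
A = [[F(0), F(1)],
     [F(1, 4), F(0)]]

good_phi = [F(2), F(1)]   # a potential that drops in expectation everywhere
flat_phi = [F(1), F(1)]   # a potential that does not drop at state 0

for phi in (good_phi, flat_phi):
    print(expected_potential_drops(A, phi), conjugated_inf_norm(A, phi) < 1)
```

For each potential the two printed booleans agree, which is exactly the statement that (1) holds if and only if the conjugated matrix has infinity norm below 1.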

Our starting point is the observation that all entropy compression arguments, and indeed all arguments in the algorithmic LLL literature, can be seen as dual to the potential function argument. That is, after a suitable change of basis $A' = MAM^{-1}$, they bound not $\|A'\|_\infty$, as the potential argument does, but the dual norm $\|A'\|_1$. As a concrete demonstration, let us consider the Moser-Tardos algorithm for a $k$-CNF formula on $n$ variables with clauses $c_1, \ldots, c_m$, under the uniform measure on $\Omega = \{0,1\}^n$. For simplicity, assume that the lowest indexed violated clause is always resampled, so that the state evolves as a Markov chain.

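For each clause $c_i$, let $A_i$ be the submatrix of $A$ comprising all rows (states) where the resampled clause is $c_i$. (All other rows of $A_i$ are 0.) For $t \ge 1$, let $\mathcal{W}_t$ contain every $t$-sequence of (indices of) clauses that has non-zero probability of comprising the first $t$ clauses resampled by the algorithm. In other words, $\mathcal{W}_t$ is the set of all $t$-sequences of indices from $[m]$ corresponding to non-vanishing $t$-products of matrices from $\{A_1, \ldots, A_m\}$, i.e., $\mathcal{W}_t = \{ W = (w_1, \ldots, w_t) \in [m]^t : \prod_{i=1}^{t} A_{w_i} \neq 0 \}$. With these definitions, the first inequality below follows from the fact that $\rho(A) \le \|A\|$ for any operator norm $\|\cdot\|$, the triangle inequality gives the second inequality and, crucially, the submultiplicativity of operator norms gives the third:

\[ \rho(A)^t = \rho(A^t) \le \|A^t\| = \Bigl\| \Bigl( \sum_{i \in [m]} A_i \Bigr)^{t} \Bigr\| = \Bigl\| \sum_{W \in \mathcal{W}_t} \prod_{i=1}^{t} A_{w_i} \Bigr\| \le \sum_{W \in \mathcal{W}_t} \Bigl\| \prod_{i=1}^{t} A_{w_i} \Bigr\| \le \sum_{W \in \mathcal{W}_t} \prod_{i=1}^{t} \|A_{w_i}\|. \tag{2} \]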

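Observe that (2) holds for every operator norm. To get a favorable bound here, we will apply (2) with the norm $\|\cdot\|_1$, i.e., the maximum column sum. We see that for all $j \in [m]$, every column of $A_j$ has at most one non-zero entry, since $A_j[\sigma, \sigma'] > 0$ only if $\sigma$ is the mutation of $\sigma'$ so that $c_j$ is violated. Recalling that all non-zero entries of $A$ equal $2^{-k}$, we conclude $\|A_j\|_1 = 2^{-k}$ for all $j \in [m]$. Therefore, $\rho(A)^t \le |\mathcal{W}_t| \, 2^{-kt}$. To bound $|\mathcal{W}_t|$ we use a simple necessary condition for membership in $\mathcal{W}_t$ which, by a standard counting argument, implies that if each clause shares variables with at most $\Delta$ other clauses, then $|\mathcal{W}_t| \le 2^m (e\Delta)^t$. Therefore, $\rho(A)^t \le 2^m (e\Delta 2^{-k})^t$, implying that if $\Delta < 2^k/e$, then $\rho(A) < 1$ and the algorithm terminates within $O(m)$ steps with high probability.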
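For concreteness, the variant of the Moser-Tardos algorithm analyzed above (always resample the lowest indexed violated clause) can be sketched as follows. This is a minimal illustration: the clause encoding (signed integers, DIMACS style), the function names, the step cap, and the small example formula are assumptions made here, not part of the paper.

```python
import random

def violated(clause, assignment):
    """A clause is violated when every one of its literals is false.
    Literal v > 0 means variable v is True; v < 0 means variable -v is False."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)

def moser_tardos(n_vars, clauses, rng, max_steps=100_000):
    """Start from a uniformly random assignment; while some clause is violated,
    resample all variables of the lowest indexed violated clause."""
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for step in range(max_steps):
        bad = next((c for c in clauses if violated(c, assignment)), None)
        if bad is None:
            return assignment, step          # flawless state reached
        for lit in bad:
            assignment[abs(lit)] = rng.random() < 0.5
    return None, max_steps                   # step cap hit (a safeguard of this sketch)

# Hypothetical 3-CNF instance on 5 variables.
formula = [[1, 2, 3], [-1, 2, 4], [-2, -3, 5], [3, -4, -5], [1, -2, -5]]
result, steps = moser_tardos(5, formula, random.Random(0))
print(result, steps)
```

Each resampling step corresponds to one row of the matrix $A_i$ for the resampled clause: from a state violating $c_i$, each of the $2^k$ settings of its variables is reached with probability $2^{-k}$.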

1.2 Informal Statement of our Main Theorem

The matrix-norm perspective we introduce in this paper allows us not only to cast the probabilistic method aspect of the algorithmic LLL as a change of basis, and the overall approach as a dual potential function argument, but, more importantly, to significantly expand and refine the analysis, so that it can avoid a hard notion of dependence, i.e., a dependency or causality graph. This is because, unlike past works, our condition quantifies point-to-set correlations, so that interactions can be arbitrarily dense as long as they are sufficiently weak. Before stating our result we need to fix some notation.

Let $\Omega$ be a discrete state space, and let $F = \{f_1, f_2, \ldots, f_m\}$ be a collection of subsets (flaws) of $\Omega$ such that $\bigcup_{i \in [m]} f_i = \Omega^*$. For a state $\sigma$, we denote by $U(\sigma) = \{ i \in [m] : f_i \ni \sigma \}$ the set of (indices of) flaws present in $\sigma$. (Here and elsewhere, we shall blur the distinction between flaws and their indices.) We consider algorithms which, in each flawed state $\sigma \in \Omega^*$, choose a flaw $f_i$ in $U(\sigma)$ and attempt to leave (“fix”) $f_i$, by moving to a new state $\sigma'$ according to a probability distribution $\rho_i(\sigma, \cdot)$. We make minimal assumptions about how the algorithm chooses which flaw to address in each step, e.g., it will be enough for the algorithm to choose the flaw with lowest index according to some fixed permutation. (We discuss this point further in the formal statement of our results.) We refer to an attempt to fix a flaw, successful or not, as addressing it. We say that a transition $\sigma \to \sigma'$, made to address flaw $f_i$, introduces flaw $f_j \ni \sigma'$ if $\sigma \notin f_j$ or if $j = i$. (Thus, a flaw (re)introduces itself when a transition fails to address it.)

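Let us define for every flaw $f_i$, $i \in [m]$, and every set of (indices of) flaws $S \subseteq [m]$, the matrix $A_i^S$ to be the $|\Omega| \times |\Omega|$ matrix having a non-zero entry $A_i^S[\sigma, \sigma'] = \rho_i(\sigma, \sigma')$ for every transition $\sigma \to \sigma'$ such that the set of flaws introduced contains $S$.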

Theorem 1.1.

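Let $M$ be any invertible matrix. Let $\|\cdot\|$ be any operator norm. For $i \in [m]$ and $S \subseteq [m]$, let $\gamma_i^S = \|M A_i^S M^{-1}\|$. If there exist positive real numbers $\{\psi_i\}_{i \in [m]}$ such that for all $i \in [m]$,

\[ \frac{1}{\psi_i} \sum_{S \subseteq [m]} \gamma_i^S \prod_{j \in S} \psi_j < 1, \tag{3} \]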

then a local search algorithm as above reaches a flawless object quickly with high probability.

We quantify the phrase “quickly with high probability” in our formal statement of results. In applications, this typically means that the probability that the algorithm takes more than $O(T_0 + s)$ steps is at most $2^{-s}$, for some $T_0$ that is linear in the size of the input.

We refer to the norm $\gamma_i^S$ appearing in the above theorem as the charge of the pair $(i, S)$. These charges will play a crucial role in our analysis.

In Section 2.4 we will introduce a significant strengthening of Theorem 1.1 that, under certain conditions, allows us to sparsify each matrix $A_i^S$, thus reducing its norm, by zeroing out some or all entries that correspond to transitions where a superset of $S$ was introduced. Before doing so, we first discuss how Theorem 1.1 already captures and generalizes previous work on the algorithmic LLL.

2 Comparison with Previous LLL Conditions

Here we give background on the LLL and explain how our main theorem compares with previous algorithmic LLL conditions. In the interest of space, some additional background material appears in Appendix B.

2.1 Non-constructive Conditions

We start by stating the strongest form of the LLL that holds for arbitrary probability spaces and families of bad events (see e.g., [55, p.228]).

General LLL.

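Let $(\Omega, \mu)$ be a probability space and let $\mathcal{A} = \{A_1, A_2, \ldots, A_m\}$ be a set of $m$ (bad) events. For each $i \in [m]$, let $L(i) \subseteq [m] \setminus \{i\}$ be such that $\mu\bigl(A_i \mid \bigcap_{j \in S} \overline{A_j}\bigr) \le b_i$ for every $S \subseteq [m] \setminus (L(i) \cup \{i\})$. If there exist positive real numbers $\{\psi_i\}_{i \in [m]}$ such that for all $i \in [m]$,

\[ \frac{b_i}{\psi_i} \sum_{S \subseteq L(i) \cup \{i\}} \prod_{j \in S} \psi_j \le 1, \tag{4} \]

then the probability that none of the events in $\mathcal{A}$ occurs is at least $\prod_{i=1}^{m} 1/(1 + \psi_i) > 0$.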

Writing $x_i = \psi_i/(1 + \psi_i)$, condition (4) takes the more familiar form $b_i \le x_i \prod_{j \in L(i)} (1 - x_j)$. The form in (4), though, is more amenable to refinement and comparison. The directed graph on $[m]$ where each vertex $i$ points to the vertices in $L(i)$ is known as the lopsidependency graph.
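The passage between (4) and the familiar form is a short calculation. The derivation below is a sketch that assumes the standard substitution $x_i = \psi_i/(1+\psi_i)$, so that $1 - x_j = 1/(1+\psi_j)$, and uses only the identity $\sum_{S \subseteq T} \prod_{j \in S} \psi_j = \prod_{j \in T} (1 + \psi_j)$ together with $i \notin L(i)$.

```latex
\begin{align*}
\frac{b_i}{\psi_i} \sum_{S \subseteq L(i) \cup \{i\}} \prod_{j \in S} \psi_j \le 1
  &\iff b_i \le \frac{\psi_i}{\prod_{j \in L(i) \cup \{i\}} (1 + \psi_j)} \\
  &\iff b_i \le \frac{\psi_i}{1 + \psi_i} \prod_{j \in L(i)} \frac{1}{1 + \psi_j}
   \;=\; x_i \prod_{j \in L(i)} (1 - x_j).
\end{align*}
```

Under the same substitution, the lower bound on the probability of avoiding all bad events reads $\prod_{i=1}^{m} (1 - x_i)$.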

The above form of the LLL is motivated by the fact that, in complex applications, small but non-vanishing correlations tend to travel arbitrarily far in the space $\Omega$. To isolate these dependencies so that they can be treated locally, it can be crucial [9, 26, 48, 49] to allow mild negative correlations between each bad event $A_i$ and the events outside its “special” set $L(i)$, achieved by allowing $b_i \ge \mu(A_i)$. The Lopsided LLL of Erdős and Spencer [29] corresponds to $b_i = \mu(A_i)$, i.e., to not allowing such negative correlations.

2.2 Constructive Conditions

Using the framework of Section 1.2 for local search algorithms, we say that flaw $f_i$ causes flaw $f_j$ and write $i \to j$, if there exist $\sigma \in f_i$, $\sigma' \in f_j$ such that $\rho_i(\sigma, \sigma') > 0$ and the transition $\sigma \to \sigma'$ introduces flaw $f_j$. (Thus, causality is to be interpreted as potential causality.) Let $\Gamma(i) = \{ j : i \to j \}$ be the set of flaws caused by $f_i$. We call the digraph over $[m]$ in which each vertex $i$ points to the vertices in $\Gamma(i)$ the causality digraph.

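Let $\mu$ be an arbitrary probability measure on $\Omega$. The distortion of $\mu$ associated with flaw $f_i$ is the greatest inflation of a state probability induced by sampling $\sigma \in f_i$ according to $\mu$ and addressing flaw $f_i$ at $\sigma$. More formally, for $i \in [m]$ and $\sigma' \in \Omega$, let $\nu_i(\sigma') = \frac{1}{\mu(f_i)} \sum_{\sigma \in f_i} \mu(\sigma) \rho_i(\sigma, \sigma')$ be the probability of ending up in state $\sigma'$ after sampling a state $\sigma \in f_i$ according to $\mu$, and then addressing $f_i$ at $\sigma$. The distortion associated with $f_i$ is then

\[ d_i := \max_{\sigma' \in \Omega} \frac{\nu_i(\sigma')}{\mu(\sigma')} \ge 1. \]

If $d_i = 1$, i.e., $\nu_i(\sigma') = \mu(\sigma')$ for all $\sigma' \in \Omega$, we say that the algorithm is a resampling oracle [44] for $f_i$. Observe that a resampling oracle perfectly removes the conditioning on the old state belonging to $f_i$, since the new state is distributed according to $\mu$. For example, if $\mu$ is a product measure, this is precisely what is achieved by the resampling algorithm of Moser and Tardos [57].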
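The distortion can be computed directly on a small state space. The sketch below is illustrative and rests on assumptions made here: three Boolean variables under the uniform measure, a single flaw (the clause $x_0 \lor x_1$ is violated), and two hypothetical ways of addressing it. Resampling both variables of the clause has distortion 1, i.e., it is a resampling oracle, whereas deterministically setting $x_0$ to true inflates some state probabilities by a factor of 4.

```python
from fractions import Fraction as F
from itertools import product

def distortion(states, mu, flaw, rho):
    """d_i = max over s' of nu_i(s') / mu(s'), where
    nu_i(s') = (1 / mu(f_i)) * sum over s in f_i of mu(s) * rho(s, s')."""
    mass = sum(mu[s] for s in flaw)
    return max(sum(mu[s] * rho(s, t) for s in flaw) / (mass * mu[t]) for t in states)

states = list(product([0, 1], repeat=3))
mu = {s: F(1, 8) for s in states}                        # uniform (product) measure
flaw = [s for s in states if s[0] == 0 and s[1] == 0]    # clause (x0 or x1) violated

def rho_resample(s, t):
    """Moser-Tardos style: resample x0 and x1 uniformly, leave x2 untouched."""
    return F(1, 4) if t[2] == s[2] else F(0)

def rho_set_true(s, t):
    """Hypothetical deterministic fix: set x0 to 1, leave the rest untouched."""
    return F(1) if t == (1, s[1], s[2]) else F(0)

d_resample = distortion(states, mu, flaw, rho_resample)
d_set_true = distortion(states, mu, flaw, rho_set_true)
print(d_resample, d_set_true)
```

Multiplying each distortion by the flaw probability $\mu(f_i) = 1/4$ gives the quantity $d_i \, \mu(f_i)$ for the two rules: 1/4 for resampling and 1 for the deterministic fix.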

Requiring an algorithm to be a resampling oracle for every flaw may be impossible to achieve by local exploration within $\Omega$, i.e., by “local search.” (Note that restricting to local search is crucial since longer-range resampling, even if it were possible, would tend to rapidly densify the causality digraph.) Allowing distortion frees the algorithm from the strict resampling requirement of perfect deconditioning. Optimizing the tradeoff between distortion and the density of the causality digraph has recently led to strong algorithmic results [5, 53, 46, 31]. (As we will see later, our results make this optimization task easier.)

Algorithmic LLL.

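Let $\gamma_i := d_i \cdot \mu(f_i)$. If there exist positive real numbers $\{\psi_i\}_{i \in [m]}$ such that for all $i \in [m]$,

\[ \frac{\gamma_i}{\psi_i} \sum_{S \subseteq \Gamma(i)} \prod_{j \in S} \psi_j < 1, \tag{5} \]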

then a local search algorithm as above reaches a flawless object quickly with high probability.
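Condition (5) is easy to evaluate once the charges and the causality digraph are known, because the sum over subsets factorizes as $\sum_{S \subseteq \Gamma(i)} \prod_{j \in S} \psi_j = \prod_{j \in \Gamma(i)} (1 + \psi_j)$. The sketch below checks this identity by brute force and evaluates the left-hand side of (5) on a made-up instance: 10 clauses of a 3-CNF formula arranged on a cycle, each causing itself and its two neighbors, with $\gamma_i = 2^{-3}$ and $\psi_i = 1/2$. The instance and all names are assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import combinations
from math import prod

def subset_sum_bruteforce(psi, T):
    """Sum over all subsets S of T of the product of psi[j] for j in S."""
    return sum(prod(psi[j] for j in S)
               for r in range(len(T) + 1)
               for S in combinations(T, r))

def subset_sum_product(psi, T):
    """Closed form of the same sum: product over j in T of (1 + psi[j])."""
    return prod(1 + psi[j] for j in T)

def lhs_condition_5(gamma_i, psi, i, Gamma_i):
    """Left-hand side of condition (5) for flaw i."""
    return gamma_i / psi[i] * subset_sum_product(psi, Gamma_i)

m = 10
psi = {i: F(1, 2) for i in range(m)}
Gamma = {i: [(i - 1) % m, i, (i + 1) % m] for i in range(m)}   # hypothetical causality digraph
gamma = F(1, 8)                                                # d_i = 1 and mu(f_i) = 2^{-3}

values = [lhs_condition_5(gamma, psi, i, Gamma[i]) for i in range(m)]
print(values[0], all(v < 1 for v in values))
```

With these choices every flaw has left-hand side 27/32, so the condition holds on this instance.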

As shown in [31], condition (5) is the algorithmic counterpart of the existential condition (4): a causality digraph is a lopsidependency graph for measure $\mu$ with $b_i = \gamma_i$ for all $i \in [m]$. We include the proof of this fact from [31] in Appendix C for completeness. In particular, when one has resampling oracles for all flaws, i.e., $d_i = 1$ for all $i \in [m]$, then condition (5) is the algorithmic counterpart of the Lopsided LLL, as established by Harvey and Vondrák [44]. Condition (5) also subsumes the flaws/actions condition of [3]: in that setting $\rho_i(\sigma, \cdot)$ is uniform over the set of possible next states, while the analysis does not reference a measure $\mu$. Taking $\mu$ to be uniform and applying condition (5) in fact sharpens the condition of [3].

We note that condition (5) can be improved in certain settings, i.e., under additional assumptions. Let $G$ be an undirected graph on $[m]$ such that $\Gamma(i)$ is a subset of the neighbors of $i$, for every $i \in [m]$. (One can trivially get such a $G$ by ignoring the direction of arcs in the lopsidependency graph, but at the cost of potentially expanding the “neighborhood” of each vertex.) It was proven in [5] that condition (5) can be replaced by the cluster expansion condition [20] on $G$, while in [52] it was proven that condition (5) can be replaced by Shearer’s condition [65]. Both of these conditions benefit by restricting consideration to independent sets of $G$ (see Appendix B). Also, Harris and Srinivasan [39, 41] have developed improved conditions for the convergence of algorithms operating in the so-called variable setting [57], based on refinements of the notion of dependency between bad events. These improvements are incomparable to condition (5), as they do not apply to general local search algorithms (for instance, all algorithms in the variable setting are commutative).

2.3 Our New Condition

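Our Theorem 1.1 is a strict generalization of the above algorithmic LLL condition (5). To see this, observe that if $M$ is the diagonal matrix $\mathrm{diag}(\mu(\sigma))$ and $\|\cdot\| = \|\cdot\|_1$, then $\gamma_i^S \le \gamma_i$ for every set $S$, as

\[ \gamma_i^S = \|M A_i^S M^{-1}\|_1 \le \|M A_i^{\emptyset} M^{-1}\|_1 = \max_{\sigma' \in \Omega} \sum_{\sigma \in f_i} \frac{\mu(\sigma)}{\mu(\sigma')} \rho_i(\sigma, \sigma') = \max_{\sigma' \in \Omega} \frac{\nu_i(\sigma')}{\mu(\sigma')} \, \mu(f_i) = d_i \, \mu(f_i) = \gamma_i. \]

Hence, since also $\gamma_i^S = 0$ for $S \not\subseteq \Gamma(i)$, the l.h.s. of our condition (3) is never larger than the l.h.s. of (5).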

As a quick example of where this may be helpful, let $c_i$ be a clause in a CNF formula and let $S$ be a set of clauses that share variables with $c_i$ but which can never be violated simultaneously, e.g., because two of them disagree on the sign of a variable. Trivially, $\gamma_i^S = 0$, even though $S \subseteq \Gamma(i)$ (assuming the algorithm ever addresses $c_i$). Clearly, the advantageous vanishing of $\gamma_i^S$ here was due to a structural property of $S$. In the absence of such a structural property, we may still be able to achieve $\gamma_i^S = 0$ by designing the algorithm so that it never transitions from a state where $c_i$ is violated to a state where all clauses in $S$ are violated.

Next we discuss a strengthening of Theorem 1.1 that leads to significant algorithmic improvements.

2.4 Refinement of Our Condition

Even though, as we have just seen, our Theorem 1.1 already improves upon all existing general algorithmic LLL conditions, we might hope to do even better. Observe that in Theorem 1.1 a matrix $A_i^S$ has a non-zero entry whenever addressing flaw $f_i$ introduces any superset of $S$. Ideally, we would like a non-zero entry in $A_i^S$ only when the set of flaws introduced is exactly $S$, so that the matrices $\{A_i^S\}_{S \subseteq [m]}$ partition the transitions that address $f_i$. The reason for this apparent weakness is that flaws introduced by fixing $f_i$ may later be fixed “collaterally,” i.e., as a result of addressing other flaws, rather than by being specifically addressed by the algorithm. While it may initially seem that the possibility of collateral fixes of flaws cannot be detrimental, from an analysis perspective they actually represent a loss of control over the progress of the algorithm. Consider for example a step in which addressing flaw $f_i$ introduces a set $S$ of flaws, all of which end up being fixed collaterally. The analysis will charge $\gamma_i^S$ for this step, even though (had we been able to detect that this is what happened) we could have charged $\gamma_i^{\emptyset}$.

Tracking collateral fixes and taking them into account not only wreaks havoc on theoretical bounds, but also appears to be a bad idea in practice [63, 64, 14, 15]: for example, focused local search satisfiability algorithms which select variables to flip based only on the flaws they introduce are known to fare much better than algorithms that weigh this damage against the benefit of the collaterally fixed clauses. As we will see, if an algorithm never makes collateral fixes, then we can sharpen Theorem 1.1 so that each matrix $A_i^S$ has a non-zero entry only when the set of flaws introduced is exactly $S$, as desired. This leads to a significant sparsification of the matrices, and a corresponding reduction of the charges $\gamma_i^S$.

A natural class of local search algorithms with no collateral fixes are backtracking algorithms for CSPs. In these algorithms the state space is the set of all partial assignments that do not violate any constraint, while there is one flaw for each unassigned variable: if fixing a flaw (i.e., assigning a variable) causes one or more constraints to become violated, the algorithm backtracks by unassigning not only the last variable set but several more—typically all variables involved in some violated constraint. Examples of such algorithms include [36, 30, 27, 34, 35, 59, 61, 21, 53, 13, 19]. Our sharpened theorem immediately provides a unified and greatly simplified analysis of such algorithms (see Section 6 for examples). Note in particular that our ability to control point-to-set correlations in Theorem 1.1 is crucial for this: in principle, backtracking steps, by their nature, may introduce many flaws, but because this happens only in very specific circumstances, the associated charges are small.

Having developed an algorithmic local lemma for backtracking algorithms, we extend our framework to cover algorithms that make both “resampling” and backtracking steps. The key for this is to introduce the notion of a primary flaw, which is a flaw that, once present, can only be eradicated by being addressed by the algorithm (i.e., it cannot be fixed collaterally). Note that all flaws in a backtracking algorithm are primary. The strongest form of our theorem (Theorem 3.3 in the next section), which applies to arbitrary local search algorithms, allows us to restrict the non-zero entries of each matrix $A_i^S$ to transitions where addressing flaw $f_i$ introduces precisely the set of primary flaws in $S$, as well as any superset of the non-primary flaws in $S$. This form of the theorem is particularly powerful when analyzing resampling algorithms that include additional backtracking steps in order to retreat from “bad” parts of the state space. The reason is that, with the separation of flaws into primary and non-primary, the charge $\gamma_i^{\emptyset}$ reflects exactly the distortion due to transitions that “make progress”, i.e., that do not introduce primary flaws, and all of whose introduced non-primary flaws are fixed collaterally. Thus, if in some region of the state space the algorithm blows up $\gamma_i^{\emptyset}$, this is a signal that we should modify it to backtrack instead of pressing on. Even though doing so creates dependencies between potentially large sets of flaws, our capacity to quantify point-to-set correlations allows us to charge such steps in proportion to their frequency. We illustrate the power of this approach by adding backtracking steps to Molloy’s recent breakthrough resampling algorithm for coloring triangle-free graphs [53], in order to handle graphs with triangles.

3 Statement of Results

3.1 A New Algorithmic LLL Condition

Below we state our main result, which includes the strengthening of Theorem 1.1 discussed in Section 2.4.

Definition 3.1 (Primary Flaws).

A flaw $f_i$ is primary if for every $\sigma \in f_i$ and every $j \neq i$, addressing $f_j$ at $\sigma$ always results in some $\sigma' \in f_i$, i.e., $f_i$ is never eradicated collaterally. For a given set $S \subseteq [m]$, we write $S^P$ and $S^N$ to denote the indices that correspond to primary and non-primary flaws in $S$, respectively.

Definition 3.2 (Sparsified Matrices).

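For every $i \in [m]$ and every set of flaw indices $S \subseteq [m]$, let $A_i^S$ be the $|\Omega| \times |\Omega|$ matrix where $A_i^S[\sigma, \sigma'] = \rho_i(\sigma, \sigma')$ if the set of primary flaws introduced by the transition $\sigma \to \sigma'$ equals $S^P$ and the set of non-primary flaws introduced by $\sigma \to \sigma'$ contains $S^N$; otherwise $A_i^S[\sigma, \sigma'] = 0$.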

Remark 3.1.

In Theorem 1.1, we used matrices $A_i^S$ where $A_i^S[\sigma, \sigma'] = \rho_i(\sigma, \sigma')$ if the set of flaws introduced by the transition $\sigma \to \sigma'$ contained $S$. The sparsification amounts to zeroing out all entries for which the set of primary flaws introduced is a strict superset of $S^P$. In particular, if $S = \emptyset$, then all entries corresponding to transitions that introduce primary flaws are zeroed-out.

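For a state $\sigma \in \Omega$, let $e_\sigma$ denote the indicator vector of $\sigma$, i.e., $e_\sigma[\sigma] = 1$ and $e_\sigma[\tau] = 0$ for all $\tau \in \Omega \setminus \{\sigma\}$. The span of a probability distribution $\theta$ on $\Omega$, denoted by $\mathrm{Span}(\theta)$, is the set of flaw indices that may be present in a state selected according to $\theta$, i.e., $\mathrm{Span}(\theta) = \bigcup_{\sigma \in \Omega : \theta(\sigma) > 0} U(\sigma)$.

Let $\pi$ be an arbitrary permutation over $[m]$. We say that an algorithm follows the $\pi$-strategy if at each step it picks to address the flaw corresponding to the lowest index element of $U(\sigma)$ according to $\pi$.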

Theorem 3.3 (Main Result).

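Let $M$ be any invertible matrix such that $\sum_{\sigma \in \Omega} \|M e_\sigma\| = 1$. Let $\|\cdot\|$ be any operator norm. For every $i \in [m]$ and $S \subseteq [m]$, let $\gamma_i^S = \|M A_i^S M^{-1}\|$. If there exist positive real numbers $\{\psi_i\}_{i \in [m]}$ such that for every $i \in [m]$,

\[ \zeta_i := \frac{1}{\psi_i} \sum_{S \subseteq [m]} \gamma_i^S \prod_{j \in S} \psi_j < 1, \tag{6} \]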

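then, for every permutation $\pi$ over $[m]$, the probability that an algorithm following the $\pi$-strategy fails to find a flawless object within $(T_0 + s)/\log_2\bigl(1/(1-\delta)\bigr)$ steps is at most $2^{-s}$, where $\delta = 1 - \max_{i \in [m]} \zeta_i$, and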

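\[ T_0 = \log_2 \|\theta^{\top} M^{-1}\|_{*} + \log_2 \Bigl( \sum_{S \subseteq \mathrm{Span}(\theta)} \prod_{j \in S} \psi_j \Bigr) + \log_2 \Bigl( \max_{S \subseteq [m]} \frac{1}{\prod_{j \in S} \psi_j} \Bigr), \]

where $\|\cdot\|_{*}$ denotes the dual norm of $\|\cdot\|$.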
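The three terms of $T_0$ are simple to evaluate in a common special case. The sketch below assumes, as an illustration rather than as part of the theorem, that $M = \mathrm{diag}(\mu(\sigma))$ for a probability measure $\mu$ and that the operator norm is $\|\cdot\|_1$, whose dual is the max-norm; the first term then becomes $\log_2 \max_\sigma \theta(\sigma)/\mu(\sigma)$. The second term uses $\sum_{S \subseteq T} \prod_{j \in S} \psi_j = \prod_{j \in T}(1 + \psi_j)$, and the third is maximized by taking $S$ to be the set of indices with $\psi_j < 1$. The function name and the example values are hypothetical.

```python
import math

def t0_bound(theta, mu, psi, span):
    """T_0 for M = diag(mu) and the 1-norm (assumptions of this sketch).

    theta, mu: dicts mapping states to probabilities
    psi:       dict mapping flaw indices to positive reals
    span:      flaw indices that can be present in a state drawn from theta
    """
    # log2 of the max-norm of theta^T M^{-1}; states with theta = 0 cannot attain the max
    first = math.log2(max(theta[s] / mu[s] for s in theta if theta[s] > 0))
    # log2 of the sum over subsets of Span(theta), via the product formula
    second = sum(math.log2(1 + psi[j]) for j in span)
    # log2 of max over S of 1 / prod psi_j: pick exactly the indices with psi_j < 1
    third = sum(math.log2(1 / psi[j]) for j in psi if psi[j] < 1)
    return first + second + third

# Hypothetical instance: four states, start distribution equal to mu, two flaws.
mu = {s: 0.25 for s in "abcd"}
theta = dict(mu)
psi = {0: 0.5, 1: 0.5}
T0 = t0_bound(theta, mu, psi, span={0, 1})
print(T0)
```

When the algorithm starts from $\theta = \mu$ the first term vanishes, which is one reason such a starting distribution is convenient.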

To get a feeling for Theorem 3.3, we start by noting that in typical applications the sum in (6) is easily computable, as for the vast majority of subsets . Also, will usually be a positive diagonal matrix , so that means that is a probability distribution. Thus, the vector is the ratio of two probability distributions on  so, typically, . In the important special case where, additionally, , the time bound simplifies to the following.

Corollary 3.4.

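Let $\mu$ be an arbitrary measure on $\Omega$, let $M$ be the $|\Omega| \times |\Omega|$ matrix $\mathrm{diag}(\mu(\sigma))$, and let $\gamma_i^S = \|M A_i^S M^{-1}\|_1$. If there exist positive real numbers $\{\psi_i\}_{i \in [m]}$ such that for every $i \in [m]$, condition (6) holds, then the conclusion of Theorem 3.3 holds with $T_0 = \log_2 \bigl( \max_{\sigma \in \Omega} \theta(\sigma)/\mu(\sigma) \bigr) + \log_2 \bigl( \sum_{S \subseteq \mathrm{Span}(\theta)} \prod_{j \in S} \psi_j \bigr) + \log_2 \bigl( \max_{S \subseteq [m]} 1/\prod_{j \in S} \psi_j \bigr)$.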

Remark 3.2.

In applications of Corollary 3.4, typically are such that .

Remark 3.3.

The requirement $\sum_{\sigma \in \Omega} \|M e_\sigma\| = 1$ is not really necessary. We impose it because in applications $M$ is typically diagonal with positive entries, in which case the normalization simplifies the expressions for the running time.

Remark 3.4.

For any fixed permutation $\pi$, the charges $\gamma_i^S$ can be reduced by replacing $A_i^S$ with the matrix that results by zeroing out every row $\sigma$ of $A_i^S$ for which $i$ is not the lowest indexed element of $U(\sigma)$ according to $\pi$.

Remark 3.5.

Theorem 3.3 holds also for algorithms using flaw choice strategies other than $\pi$-strategies. We discuss some such strategies in Section 4.3. However, there is good reason to expect that it does not hold for arbitrary flaw choice strategies, i.e., without additional assumptions (for more details see [52]).

3.2 Application to Graph Coloring

In graph coloring one is given a graph $G = (V, E)$ and the goal is to find a mapping of $V$ to a set of $q$ colors so that no edge in $E$ is monochromatic. The chromatic number, $\chi(G)$, of $G$ is the smallest integer $q$ for which this is possible. Given a set $L_v$ of colors (called a list) for each vertex $v$, a list-coloring maps each $v \in V$ to a color in $L_v$ so that no edge in $E$ is monochromatic. A graph is $q$-list-colorable if it has a list-coloring no matter how one assigns a list of $q$ colors to each vertex. The list chromatic number, $\chi_\ell(G)$, is the smallest $q$ for which $G$ is $q$-list-colorable. Clearly $\chi_\ell(G) \ge \chi(G)$. A celebrated result of Johansson [47] established that there exists a large constant $C > 0$ such that every triangle-free graph with maximum degree $\Delta \ge \Delta_0$ can be list-colored using $C\Delta/\ln \Delta$ colors. Very recently, using the entropy compression method, Molloy [53] improved Johansson’s result, replacing $C$ with $(1 + \epsilon)$ for any $\epsilon > 0$ and all $\Delta \ge \Delta_\epsilon$. (Soon thereafter, Bernshteyn [17] established the same bound for the list chromatic number, non-constructively, via (4), and Iliopoulos [46] showed that the algorithm of Molloy can be analyzed using (5).)

Our main result in this section is a generalization of Molloy’s result to graphs with a bounded number of triangles per vertex. Specifically, in Section 5 we establish the following general theorem for the list chromatic number (the triangle-free case corresponding to $f = \Delta^2 + 1$).

Theorem 3.5.

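Let $G$ be any graph with maximum degree $\Delta$ in which the neighbors of every vertex span at most $\Delta^2/f$ edges. For all $\epsilon > 0$, there exists $\Delta_\epsilon$ such that if $\Delta \ge \Delta_\epsilon$ and $f \in [\Delta^{\frac{2+2\epsilon}{1+2\epsilon}} (\ln \Delta)^2, \Delta^2 + 1]$, then

\[ \chi_\ell(G) \le (1 + \epsilon) \Delta / \ln \sqrt{f}. \]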

Furthermore, such a coloring can be found in polynomial time with high probability.

Theorem 3.5 is interesting for two reasons. First, random graphs suggest that it is sharp, i.e., that no efficient algorithm can color graphs satisfying the conditions of the theorem with $(1 - \epsilon)\Delta/\ln \sqrt{f}$ colors. More precisely, Proposition 3.1 below, proved in Appendix F, implies that any such algorithm would entail coloring random graphs using fewer than twice as many colors as their chromatic number.

Proposition 3.1.

For every and , there exist and such that with probability tending to as , a random graph satisfies the conditions of Theorem 3.5 and .

This would be a major (and unexpected) breakthrough in random graph theory, as beating this factor of two has been an elusive goal for over 40 years. Also, for sparse random graphs, this factor of two corresponds to a phase transition in the geometry of the set of colorings [1], known as the shattering threshold. In other words, our algorithm can be seen as a robust version of previously known algorithms [7] for coloring random graphs up to the shattering threshold, that applies to worst-case graphs as well.

Second, armed with Theorem 3.5, we are able to prove the following result concerning the chromatic number of general graphs, as a function of the maximum degree and the maximum number of triangles in any neighborhood:

Theorem 3.6.

Let be a graph with maximum degree in which the neighbors of every vertex span at most edges. For all , there exist such that if and , then

$$\chi(G) \le (2+\epsilon)\,\Delta / \ln\sqrt{f}. \tag{7}$$

Furthermore, such a coloring can be found in polynomial time with high probability.

Theorem 3.6 improves a classical result of Alon, Krivelevich and Sudakov [11], which established (7) with an unspecified (large) constant in place of . Indeed, our analysis closely follows theirs. Their main idea is to break the input graph into triangle-free subgraphs and color each of them separately, using distinct sets of colors, by applying the result of Johansson [47]. Instead, we break the graph into subgraphs with few triangles per neighborhood and use Theorem 3.5 to color the pieces. The proof of Theorem 3.6 can be found in Appendix E. We note that Theorem 3.5 is essential here: even if we used Molloy’s [53] recent result in place of Johansson’s in the above scheme, the corresponding constant would still be in the thousands.

As a final remark, we note that Vu [66] proved the analogue of the main result of [11] (again with a large constant) for the list chromatic number. While we do not currently see how to sharpen Vu’s result to an analogue of Theorem 3.6 for the list chromatic number using our techniques, we note that our Theorem 3.5 improves over [66] for all .

3.3 Application to Backtracking Algorithms

An important class of algorithms naturally devoid of “collateral fixes” are backtracking algorithms, as discussed in Section 2.4. In particular, consider a Constraint Satisfaction Problem (CSP) over a set of variables , each variable taking values in a domain , with a set of constraints over these variables. The backtracking algorithms we consider operate as follows. (Note that in Step 1, we can always take to be the distribution under which all variables are unassigned; this does not affect the convergence condition (6) but may have a mild effect on the running time.)

Let be the set of partial assignments to that do not violate any constraint in . For each variable , let flaw comprise the partial assignments in which is unassigned. Clearly, each flaw can only be removed by addressing it, as addressing any other flaw can only unassign . Thus, every flaw is primary and a flawless state is a complete satisfying assignment. The fact that every flaw is primary leads to an improvement in the running time bound of the algorithm, i.e., the value of in Theorem 3.3.
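The flaw structure just described can be illustrated with a small Python sketch of a backtracking loop. It is a sketch under assumptions of ours, not the paper's exact procedure: all variables start unassigned, the lowest-indexed unassigned variable is addressed by sampling uniformly from a common domain, a constraint counts as violated only once all of its variables are assigned, and on a violation every variable of the first violated constraint is unassigned. The names `backtrack`, `constraints`, and `max_steps` are hypothetical.

```python
import random


def backtrack(num_vars, domain, constraints, rng, max_steps=100_000):
    """Randomized backtracking on partial assignments.

    constraints: list of (variables, violated) pairs, where `violated` is a
    predicate on the tuple of values of `variables` (all of them assigned).
    Returns a complete assignment, or None if max_steps is exhausted.
    """
    assignment = [None] * num_vars            # every variable starts unassigned
    for _ in range(max_steps):
        unassigned = [v for v in range(num_vars) if assignment[v] is None]
        if not unassigned:
            return assignment                 # flawless state: complete assignment
        v = unassigned[0]                     # address the lowest-indexed flaw
        assignment[v] = rng.choice(domain)
        for variables, violated in constraints:
            values = [assignment[u] for u in variables]
            if None not in values and violated(tuple(values)):
                # Backtrack: unassign the variables of the first violated
                # constraint. Any violated constraint must contain v, so
                # unassigning v restores a non-violating partial assignment.
                for u in variables:
                    assignment[u] = None
                break
    return None


# Hypothetical instance: properly 3-color a 5-cycle.
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
constraints = [((u, w), lambda vals: vals[0] == vals[1]) for u, w in cycle_edges]
result = backtrack(5, [0, 1, 2], constraints, random.Random(0))
```

In this sketch the flaw of a variable disappears only when that variable is assigned without triggering a violation, and addressing another flaw can only unassign it, which matches the observation above that every flaw is primary.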

Corollary 3.7.

Let be the set comprising the sets of flaw-indices that may be present in a state selected according to . If every flaw is primary, then the sum over in the definition of can be restricted to . In particular, if every variable is initially unassigned, this sum equals .

We give three representative applications in Section 6. First, we develop a corollary of Theorem 3.3 that can be used to make applications of the LLL in the variable setting [57] constructive via a backtracking algorithm, i.e., an algorithm of very different flavor from the Moser-Tardos algorithm. We note that very recently and independently, Bissacot and Doin [19] also showed, using the entropy compression method, that backtracking algorithms can make LLL applications in the variable setting constructive. However, their result applies only to the uniform measure and their algorithms are relatively complicated. In contrast, we show that a simple backtracking algorithm works for every product measure. Second, we show how Theorem 3.3 perfectly recovers in a black-box fashion the main result of Esperet and Parreau [30] for acyclic edge coloring. Finally, we show how our application of Theorem 3.3 to acyclic edge coloring can be adapted with minimal effort to make constructive an existential result of Bernshteyn [16] showing improved bounds for the acyclic chromatic index of graphs that do not contain any fixed arbitrary bipartite graph . Specifically, we prove the following result.

Theorem 3.8.

Let be a graph with maximum degree and let be a fixed bipartite graph. If does not contain as a subgraph, then there exists an acyclic edge coloring of using at most colors. Moreover, such a coloring can be found in time with high probability.

3.4 Commutative Algorithms and Distributional Properties

Besides conditions for fast convergence to flawless objects, it is natural to ask further questions about focused search algorithms, such as: “Are they parallelizable?”; “How many distinct solutions can they output?”, etc. These questions and more have been answered for the Moser-Tardos algorithm in a long series of papers [57, 38, 43, 50, 22, 23, 37, 2]. As a prominent example, the result of Haeupler, Saha and Srinivasan [38] shows that the Moser-Tardos algorithm, in a certain sense, approximates well the LLL-distribution, i.e., the distribution obtained by conditioning on avoiding all bad events.

Harvey and Vondrák [44] showed that these distributional results are unlikely to transfer to the more general algorithmic LLL settings of [4, 44, 5], because the so-called Witness Tree Lemma—the key technical ingredient for analyzing the Moser-Tardos algorithm—can fail to hold in such settings. In [46], Iliopoulos established the Witness Tree Lemma for algorithms satisfying Kolmogorov’s notion of “commutativity” [52] and showed how it can be used to establish many distributional properties of such algorithms. Kolmogorov’s notion of commutativity requires that for every , every sequence of state transitions of the form can be mapped to a distinct sequence of state transitions of the form , so that .

Our matrix framework allows us to introduce a more natural notion of algorithmic commutativity, essentially matrix commutativity, that is also more general than the notion of [52]. For , let denote the matrix where for , and 0 otherwise. Recall the definition of .

Definition 3.9.

An algorithm is commutative with respect to a symmetric binary relation if

1. , for every such that .

2. .

Remark 3.6.

In most applications when , in which case condition 1 implies condition 2.

Under this new notion, we recover all the results of [52, 46] with much simpler proofs, at the mild cost of restricting the family of flaw choice strategies to canonical ones, per Definition 3.10 below. (In [52, 46] the flaw choice strategy can be arbitrary.) Note that in the commutative setting, canonical flaw choice strategies suffice to capture the optimal convergence results, so that the restriction to such strategies is indeed mild.

Definition 3.10.

Fix an arbitrary sequence of (possibly stochastic) functions , each mapping to an element of . A flaw choice strategy is canonical if the flaw addressed in the -th step is , where is the state after steps.

In particular, we establish the Witness Tree Lemma, from which all the other results follow. (In fact, we prove a more general version of the Witness Tree Lemma that takes as input an operator norm and diagonal matrix . Using the -norm and , where is a probability measure over , recovers the standard Witness Tree Lemma.) The formal statement and proof of this result can be found in Section 7 (Theorem 7.1).

Theorem 3.11 (Informal Statement).

The Witness Tree Lemma holds for commutative algorithms that follow canonical flaw choice strategies.

4 Proof of Main Theorem

In Sections 4.1 and 4.2 we present the proof of our main result, the new algorithmic LLL condition in Theorem 3.3. In Section 4.3 we show how to extend the theorem to allow flaw choice strategies other than following a fixed permutation over flaws.

Throughout this section we use standard facts about operator norms, summarized briefly in Appendix A.

4.1 Tracking the Set of Current Flaws

We say that a trajectory followed by the algorithm is a bad -trajectory if every state , , is flawed. Thus, our goal is to bound the probability that the algorithm follows a bad -trajectory.

Given a bad trajectory, intuitively, we track the flaws introduced into the state in each step, where a flaw is said to “introduce itself” whenever addressing it fails to remove it. Of the flaws introduced in each step, we disregard those that later get eradicated collaterally, i.e., by an action addressing some other flaw. The rest form the “witness sequence” of the trajectory, i.e., a sequence of sets of flaws.

Fix any permutation on . For any , let , i.e., the lowest index in according to . Recalling that is the set of indices of flaws present in , in the following we assume that the index of the flaw addressed in state is , which we sometimes abbreviate as . Also, to lighten notation, we will denote by .

Definition 4.1.

Let be any bad -trajectory. Let . For , let

$$B_i = U(\sigma_{i+1}) \setminus \bigl[\, U(\sigma_i) - \pi(\sigma_i) \,\bigr],$$

i.e., comprises the indices of the flaws introduced in the -th step. For , let

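$$C_i = \bigl\{\, k \in B_i \;\bigm|\; \exists j \in [i+1, t] : k \notin U(\sigma_{j+1}) \,\wedge\, \forall \ell \in [i+1, j] : k \neq \pi(\sigma_\ell) \,\bigr\},$$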

i.e., comprises the indices of the flaws introduced in the -th step that get eradicated collaterally. The witness sequence of bad -trajectory is the sequence of sets

$$w(\Sigma) = (B_0 \setminus C_0,\; B_1 \setminus C_1,\; \ldots,\; B_t \setminus C_t).$$
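As a concrete illustration of Definition 4.1, the following Python sketch computes the witness sequence from the list of flaw sets of the visited states. Several details are assumptions of ours: the function name, the representation of the permutation as a list `order` (earlier means lower), and the choice that the initial set B_0 equals the flaw set of the first state, which is how we read the base case of the definition.

```python
def witness_sequence(flaw_sets, order):
    """flaw_sets[i - 1] is U(sigma_i) for i = 1..t+1; `order` lists the flaw
    indices in the order given by the permutation pi (lowest first)."""
    rank = {k: pos for pos, k in enumerate(order)}
    U = [None] + [set(s) for s in flaw_sets]        # 1-indexed: U[i] = U(sigma_i)
    t = len(flaw_sets) - 1
    # pi[i] is the flaw addressed in state sigma_i (lowest flaw present).
    pi = [None] + [min(U[i], key=rank.__getitem__) for i in range(1, t + 1)]

    B = [set(U[1])]                                 # assumption: B_0 = U(sigma_1)
    for i in range(1, t + 1):
        B.append(U[i + 1] - (U[i] - {pi[i]}))       # flaws introduced in step i

    def eradicated_collaterally(k, i):
        # k vanishes after some step j > i without having been addressed
        # in any of the steps i+1..j.
        for j in range(i + 1, t + 1):
            never_addressed = all(k != pi[l] for l in range(i + 1, j + 1))
            if k not in U[j + 1] and never_addressed:
                return True
        return False

    return [{k for k in B[i] if not eradicated_collaterally(k, i)}
            for i in range(t + 1)]


# Hypothetical trajectory with t = 2: addressing flaw 1 introduces flaw 3;
# addressing flaw 2 then removes flaw 3 collaterally and introduces flaw 4.
w = witness_sequence([{1, 2}, {2, 3}, {4}], order=[1, 2, 3, 4])
```

In this toy trajectory flaw 3 is dropped from the witness sequence because it is eradicated collaterally, so only flaws 1, 2 and 4 are recorded.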

A crucial feature of witness sequences is that they allow us to recover the sequence of flaws addressed.

Definition 4.2.

Given an arbitrary sequence , let , while for , let

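$$S^*_{i+1} = \begin{cases} \bigl[\, S^*_i - \pi(S^*_i) \,\bigr] \cup S_i & \text{if } S^*_i \neq \emptyset, \\ \emptyset & \text{otherwise.} \end{cases}$$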

If for all , then we say that is plausible and write .
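The recursion of Definition 4.2 is easy to run mechanically, as the following Python sketch shows. The base case and the exact plausibility condition are assumptions of ours, since they are not spelled out above: we take the first starred set to be S_0 and call the sequence plausible when every starred set except possibly the last is nonempty. The function names and the list representation `order` of the permutation are also hypothetical.

```python
def star_sets(S, order):
    """S = [S_0, S_1, ..., S_t]; returns [S*_1, ..., S*_{t+1}]."""
    rank = {k: pos for pos, k in enumerate(order)}
    star = [set(S[0])]                        # assumption: S*_1 = S_0
    for S_i in S[1:]:
        prev = star[-1]
        if prev:
            lowest = min(prev, key=rank.__getitem__)    # pi(S*_i)
            star.append((prev - {lowest}) | set(S_i))
        else:
            star.append(set())
    return star


def is_plausible(S, order):
    # Assumption: plausible means S*_i is nonempty for every i = 1..t.
    return all(star_sets(S, order)[:-1])


def addressed_flaws(S, order):
    """Recover the sequence of addressed flaws pi(S*_1), ..., pi(S*_t)."""
    rank = {k: pos for pos, k in enumerate(order)}
    return [min(s, key=rank.__getitem__) for s in star_sets(S, order)[:-1]]


order = [1, 2, 3, 4]
S = [{1, 2}, set(), {4}]
```

For this input the starred sets are {1, 2}, {2}, {4}, so the recovered sequence of addressed flaws is 1, 2; replacing the input by `[{1}, set(), {2}]` makes the second starred set empty, so that sequence is not plausible under the assumed condition.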

Lemma 4.3.

If is any bad -trajectory, then is plausible, for all , and for every flaw index , the number of times occurs in the multiset minus the number of times it occurs in the multiset equals .

Proof.

Recall that . For , let comprise the elements of eradicated collaterally during the -th step and let comprise the elements of eradicated collaterally during any step . Observe that . We will prove, by induction, that for all ,

$$S^*_i \subseteq U(\sigma_i) \tag{8}$$

$$U(\sigma_i) \setminus S^*_i = H_i. \tag{9}$$

Observe that if (8), (9) hold for a given , then , since by the definition of , and whenever . Moreover, , because otherwise , an impossibility. To complete the proof it suffices to note that for any , the difference in question equals and that since, by definition, . The inductive proof is as follows.

For , (8), (9) hold since , while . If (8), (9) hold for some , then while, by definition, . Thus, the fact that trivially implies , while

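$$U(\sigma_{i+1}) \setminus S^*_{i+1} = \bigl( U(\sigma_i) \setminus S^*_i \setminus L_i \bigr) \cup (B_i \setminus S_i) = (H_i \setminus L_i) \cup C_i = H_{i+1}.$$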

The first step in our proof of Theorem 3.3 is to give an upper bound on the probability that a given witness sequence occurs in terms of the charges defined in Section 3.1.

Lemma 4.4.

Fix any integer and let be the random variable . For any invertible matrix such that and any plausible sequence ,

$$\Pr[w(\Sigma) = \phi] \le \bigl\| \theta^\top M^{-1} \bigr\|_* \prod_{i=1}^{t} \gamma_{(i)}^{S_i}. \tag{10}$$

Proof.

Recall that for any , we denote by and the subsets of that correspond to primary and non-primary flaws, respectively. By Definition 4.1 and Lemma 4.3, a necessary condition for to occur is that and , for every . Moreover, since primary flaws are never eradicated collaterally, i.e., always, it must also be that for . Fix any state . If is the row vector expressing the probability distribution of the initial state , then the probability that equals the -column (coordinate) of the row-vector . More generally, if is the indicator vector of state , we see that for any ,

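$$\Pr\Biggl[\, \bigwedge_{i=1}^{t} \bigl( (i) \in U(\sigma_i) \bigr) \;\bigwedge_{i=1}^{t} \bigl( S_i^P = B_i^P \bigr) \;\bigwedge_{i=1}^{t} \bigl( S_i^N \subseteq B_i^N \bigr) \;\bigwedge\; \sigma_{t+1} = \tau \Biggr] = \theta^\top \prod_{i=1}^{t} A_{(i)}^{S_i}\, e_\tau. \tag{11}$$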

Consider now any vector norm and the corresponding operator norm. By (36),

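$$\theta^\top \prod_{i=1}^{t} A_{(i)}^{S_i}\, e_\tau = \theta^\top M^{-1} \Biggl( \prod_{i=1}^{t} M A_{(i)}^{S_i} M^{-1} \Biggr) M e_\tau \le \Biggl\| \theta^\top M^{-1} \Biggl( \prod_{i=1}^{t} M A_{(i)}^{S_i} M^{-1} \Biggr) \Biggr\|_* \, \bigl\| M e_\tau \bigr\|. \tag{12}$$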
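The equality in (12) rests on a telescoping cancellation that may be worth spelling out. In the display below, A_i is our own shorthand for the i-th matrix in the product; the notation is an assumption made only for readability.

```latex
\prod_{i=1}^{t} \bigl( M A_i M^{-1} \bigr)
  = M A_1 (M^{-1} M) A_2 (M^{-1} M) \cdots (M^{-1} M) A_t M^{-1}
  = M \Bigl( \prod_{i=1}^{t} A_i \Bigr) M^{-1},
\qquad\text{so}\qquad
\theta^\top M^{-1} \Bigl( \prod_{i=1}^{t} M A_i M^{-1} \Bigr) M e_\tau
  = \theta^\top \Bigl( \prod_{i=1}^{t} A_i \Bigr) e_\tau .
```

The inequality in (12) then has the form of the standard duality bound between a vector norm and its dual, namely that the product of a row vector x^T with a column vector y is at most the dual norm of x^T times the norm of y; we take this to be the fact that (36) refers to.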

Summing (12) over all and restricting to matrices for which we conclude that

 Pr[w(Σ)=ϕ]=∑τ