
Fast sampling via spectral independence beyond bounded-degree graphs

11/07/2021
by   Ivona Bezakova, et al.

Spectral independence is a recently-developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal O(n log n) sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using L^p-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of L^p-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the L^p-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph G(n,d/n), where the previously known algorithms run in time n^{O(log d)} or applied only to large d. We refine these algorithmic bounds significantly, and develop fast n^{1+o(1)} algorithms based on Glauber dynamics that apply to all d, throughout the uniqueness regime.


1 Introduction

Spectral independence was introduced by Anari, Liu, and Oveis Gharan [ALG20] as a new framework to obtain polynomial bounds on the mixing time of Glauber dynamics. Originally based on the high-dimensional expander results of Alev and Lau [AL20], it has since then been developed further using entropy decay by Chen, Liu, and Vigoda [CLV21] who obtained optimal mixing results on graphs of bounded maximum degree whenever the framework applies. This paper focuses on relaxing the bounded-degree assumption of these results, in sparse graphs where the maximum degree is not the right parameter to capture the density of the graph.

As a running example we will use the problem of sampling (weighted) independent sets, also known as the sampling problem from the hard-core model. For a graph $G=(V,E)$, the hard-core model with parameter $\lambda>0$ specifies a distribution $\mu_{G,\lambda}$ on the set of independent sets of $G$, where for an independent set $I$ it holds that $\mu_{G,\lambda}(I)=\lambda^{|I|}/Z_{G,\lambda}$, where $Z_{G,\lambda}=\sum_{I'}\lambda^{|I'|}$, with the sum ranging over all independent sets $I'$ of $G$, is the partition function of the model (the normalising factor that makes the probabilities add up to 1). For bounded-degree graphs of maximum degree $d+1$ (where $d\geq 2$ is an integer), it is known that the problems of sampling and approximately counting from this model undergo a computational transition at $\lambda_c(d)=\frac{d^d}{(d-1)^{d+1}}$, the so-called uniqueness threshold [Wei06, Sly10, GGŠ14]: they are poly-time solvable when $\lambda<\lambda_c(d)$, and computationally intractable for $\lambda>\lambda_c(d)$. Despite this clear complexity picture, prior to the introduction of spectral independence, the algorithms for $\lambda<\lambda_c(d)$ were based on elaborate enumeration techniques whose running times scale as $n^{O(\log d)}$ [Wei06, LLY13, PR17, PR19]. The analysis of Glauber dynamics (footnote 1) using spectral independence in the regime $\lambda<\lambda_c(d)$ initially yielded polynomial-time algorithms [ALG20], and then $O(n\log n)$ algorithms for bounded-degree graphs [CLV21] (see also [CLV20]). More recently, Chen, Feng, Yin, and Zhang [CFYX21] obtained results for arbitrary graphs that apply when , where is the maximum degree of (see also [JPV21] for related results when grows like ).

Footnote 1. Recall, for a graph $G=(V,E)$, the Glauber dynamics for the hard-core model iteratively maintains a random independent set $(I_t)_{t\geq 0}$, where at each step $t$ a vertex $v$ is chosen u.a.r. and, if $I_t\cup\{v\}$ is independent, it sets $I_{t+1}=I_t\cup\{v\}$ with probability $\frac{\lambda}{1+\lambda}$, otherwise $I_{t+1}=I_t\setminus\{v\}$. The mixing time is the maximum number (over the starting $I_0$) of steps needed to get within total variation distance 1/4 of $\mu_{G,\lambda}$, see Section 4.1 for the precise definitions.
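The single-site Glauber dynamics recalled above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the adjacency-dictionary input format, the function name `glauber_hardcore`, and the `seed` argument are assumptions made for the example.

```python
import random

def glauber_hardcore(adj, lam, steps, seed=0):
    """Run single-site Glauber dynamics for the hard-core model.

    adj   : dict mapping each vertex to the set of its neighbours
            (hypothetical input format chosen for this sketch).
    lam   : the hard-core parameter, assumed to be positive.
    steps : number of single-vertex updates to perform.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    ind_set = set()  # start from the empty independent set
    for _ in range(steps):
        v = rng.choice(vertices)          # vertex chosen uniformly at random
        ind_set.discard(v)                # forget the current spin of v
        blocked = any(u in ind_set for u in adj[v])
        # v may be occupied only if no neighbour is occupied,
        # and then it is occupied with probability lam / (1 + lam).
        if not blocked and rng.random() < lam / (1 + lam):
            ind_set.add(v)
    return ind_set
```

For example, `glauber_hardcore({0: {1}, 1: {0, 2}, 2: {1}}, lam=1.0, steps=100)` returns an independent set of the path on three vertices; how many steps are needed for the output to be close to the target distribution is exactly the mixing-time question studied in the paper.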

The maximum degree is frequently a bad measure of the density of the graph, especially for graphs of unbounded degree. One of the most canonical examples is the random graph $G(n,d/n)$, where the maximum degree grows with $n$ but the average degree is $d$, and therefore one would hope to be able to sample from $\mu_{G,\lambda}$ for $\lambda$ up to some constant, instead of the $\lambda=o(1)$ that the previous results yield. In this direction, [SSY13, SSŠY17] obtained an algorithm based on correlation decay that applies to all $\lambda<\lambda_c(\Delta)$ for all graphs with “connective constant” bounded by $\Delta$ (meaning, roughly, that for all $\ell$ the number of length-$\ell$ paths starting from any vertex is bounded by $\Delta^{\ell}$). The result of [SSŠY17] applies to $G(n,d/n)$ for all $\lambda<\lambda_c(d)$. In terms of Glauber dynamics on $G(n,d/n)$, [MS13] showed an $n^{1+\Omega(1/\log\log n)}$ lower bound on the mixing time in the case of the Ising model; this lower bound actually applies to most well-known models, and in particular rules out $O(n\log n)$ mixing time results for the hard-core model when . The mixing-time lower bound on $G(n,d/n)$ has only been matched by complementary fast mixing results in models with strong monotonicity properties, see [MS13] for the ferromagnetic Ising model and [BG21] for the random-cluster model. Such monotonicity properties unfortunately do not hold for the hard-core model, and the best known results [Eft14, EHŠV18] for Glauber dynamics on $G(n,d/n)$ give an algorithm for and sufficiently large (where is a constant depending on ).

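Our main contribution is to obtain $n^{1+o(1)}$ algorithms on $G(n,d/n)$ for all of the models considered in [SSŠY17], i.e., the hard-core model, the matchings model, and the antiferromagnetic Ising model. Key to our results are new spectral independence bounds for any real $d>1$ in the regime $\lambda<\lambda_c(d)$ for arbitrary graphs in terms of their “$d$-branching value” (which resembles the connective-constant notion of [SSŠY17]). To state our main theorem for the hard-core model on $G(n,d/n)$, we first extend the definition of $\lambda_c(d)$ to all reals $d>0$ by setting $\lambda_c(d)=\frac{d^d}{(d-1)^{d+1}}$ for $d>1$, and $\lambda_c(d)=\infty$ for $d\in(0,1]$. We use the term “whp over the choice of $G\sim G(n,d/n)$” as a shorthand for “as $n$ grows large, with probability $1-o(1)$ over the choice of $G$”. An $\varepsilon$-sample from a distribution $\mu$ supported on a finite set $\Omega$ is a random $\sigma\in\Omega$ whose distribution $\nu$ satisfies $\|\nu-\mu\|_{\mathrm{TV}}\leq\varepsilon$, where $\|\nu-\mu\|_{\mathrm{TV}}=\frac{1}{2}\sum_{\sigma\in\Omega}|\nu(\sigma)-\mu(\sigma)|$.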
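The two quantities used to state the main theorem can be made concrete with a short Python sketch. It assumes the standard tree uniqueness threshold $d^d/(d-1)^{d+1}$ for the hard-core model, extended by the value infinity when $d\leq 1$, and the usual total variation distance between distributions on a finite set; the function names and the dictionary representation of distributions are hypothetical choices for this example.

```python
import math

def lambda_c(d):
    """Hard-core uniqueness threshold d^d / (d-1)^(d+1) for d > 1.

    For d <= 1 the value is taken to be infinity (assumed convention
    for the extension to all positive reals).
    """
    if d <= 1:
        return math.inf
    return d ** d / (d - 1) ** (d + 1)

def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set.

    p, q : dicts mapping outcomes to probabilities; missing outcomes
           are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

With these helpers, a random output with distribution `nu` is an epsilon-sample from `mu` precisely when `tv_distance(nu, mu) <= epsilon`.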

Theorem 1.

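Let $d,\lambda>0$ be such that $\lambda<\lambda_c(d)$. For any arbitrarily small constant $\theta>0$, there is an algorithm such that, whp over the choice of $G\sim G(n,d/n)$, when the algorithm is given as input the graph $G$ and an arbitrary rational $\varepsilon>0$, it outputs an $\varepsilon$-sample from $\mu_{G,\lambda}$ in time $n^{1+\theta}\log\frac{1}{\varepsilon}$.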

We remark here that the algorithm of Theorem 1 (as well as Theorems 2 and 3 below) can also recognise in time whether the graph is a “good” graph, i.e., we can formulate graph properties that guarantee the success of the algorithm, are satisfied whp, and are also efficiently verifiable, see Section C.4 in the appendix for details.

The key to obtaining Theorem 1 is to bound the spectral independence of $G(n,d/n)$. The main strategy that has been applied so far to bound spectral independence is to suitably adapt correlation decay arguments and, therefore, it is tempting to use the correlation decay analysis of [SSŠY17]. This poses new challenges in our setting since [SSŠY17] uses an $L^p$-norm analysis of correlation decay on trees, and the non-linearity of $L^p$-norms is an obstacle to converting their analysis into spectral independence bounds (in contrast, for bounded-degree graphs, the $L^\infty$-norm is used, which can be converted to spectral independence bounds using a purely analytic approach, see [CLV20]). Our workaround is to “linearise” the $L^p$-analysis by taking into account the structural properties of subtrees. This allows us to amortise over the tree-recurrence using appropriate combinatorial information (the $d$-branching values) and subsequently to bound spectral independence; details are given in Section 3, see Lemmas 10 and 12 (and equation (2), which is at the heart of the argument). Once the spectral independence bound is in place, further care is needed to obtain the fast running time of $n^{1+o(1)}$, paying special attention to the distribution of high-degree vertices inside $G(n,d/n)$, and to blend this with the entropy-decay tools developed in [CLV21]; see Section 4.2 for this part of the argument.

In addition to our result for the hard-core model, we also obtain similar results for the Ising and the matchings models. The configurations of the Ising model on a graph $G=(V,E)$ are assignments $\sigma\in\{0,1\}^V$ which assign the spins 0 and 1 to the vertices of $G$. The Ising model with parameter $\beta>0$ corresponds to a distribution $\mu_{G,\beta}$ on $\{0,1\}^V$, where for an assignment $\sigma\in\{0,1\}^V$, it holds that $\mu_{G,\beta}(\sigma)=\beta^{m(\sigma)}/Z_{G,\beta}$, where $m(\sigma)$ is the number of edges whose endpoints have the same spin assignment under $\sigma$, and $Z_{G,\beta}$ is the partition function of the model. The model is antiferromagnetic when $\beta\in(0,1)$, and ferromagnetic otherwise. For $d\geq 1$, let $\beta_c(d)=\frac{d-1}{d+1}$; for $d\in(0,1)$, let $\beta_c(d)=0$. It is known that on bounded-degree graphs of maximum degree $d+1$ the sampling/counting problem for the antiferromagnetic Ising model undergoes a phase transition at $\beta=\beta_c(d)$, analogous to that for the hard-core model [SST12, LLY13, SS12, GŠV16].

Theorem 2.

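Let $d,\beta>0$ be such that $\beta\in(\beta_c(d),1)$. For any constant $\theta>0$, there is an algorithm such that, whp over the choice of $G\sim G(n,d/n)$, when the algorithm is given as input the graph $G$ and an arbitrary rational $\varepsilon>0$, it outputs an $\varepsilon$-sample from $\mu_{G,\beta}$ in time $n^{1+\theta}\log\frac{1}{\varepsilon}$.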

For a graph $G=(V,E)$, the matchings model with parameter $\gamma>0$, also known as the monomer-dimer model, corresponds to a distribution $\mu_{G,\gamma}$ on the set of matchings of $G$, where for a matching $M$, it holds that $\mu_{G,\gamma}(M)=\gamma^{|M|}/Z_{G,\gamma}$, where $Z_{G,\gamma}$ is the partition function. For general graphs $G$, [Jer03, JS89] gave an algorithm (where ), which was improved for bounded-degree graphs to $O(n\log n)$ in [CLV21] using spectral independence. For $G(n,d/n)$, [SSŠY17] gave an deterministic algorithm using correlation decay, and [JPV21] showed that Glauber dynamics mixes in steps in the case that .

Theorem 3.

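Let $d,\gamma>0$. For any constant $\theta>0$, there is an algorithm such that, whp over the choice of $G\sim G(n,d/n)$, when the algorithm is given as input the graph $G$ and an arbitrary rational $\varepsilon>0$, it outputs an $\varepsilon$-sample from $\mu_{G,\gamma}$ in time $n^{1+\theta}\log\frac{1}{\varepsilon}$.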

In the next section, we give the main ingredients of our algorithm for the hard-core model and we give the proof of Theorem 1. The proofs of Theorems 2 and 3 build on similar ideas, though there are some modifications needed to obtain the required spectral independence bounds. We give their proofs in Section B.3.

2 Proof outline for Theorem 1

Our algorithm for sampling from the hard-core model on a graph $G=(V,E)$ is an adaptation of Glauber dynamics on an appropriate set of “small-degree” vertices $U$; the details of the algorithm are given in Figure 1. Henceforth, analogously to the Ising model, it will be convenient to view the hard-core model as a 2-spin model supported on $\Omega\subseteq\{0,1\}^V$, where $\Omega$ corresponds to the set of independent sets of $G$ (for an independent set $I$, we obtain $\sigma\in\Omega$ by setting $\sigma_v=1$ iff $v\in I$).

Algorithm Sample$(G,T)$

Parameters: $D>0$ (threshold for small/high degree vertices).

Input:    Graph $G=(V,E)$, integer $T\geq 1$ (number of iterations).

1. Initialisation:

Let $U$ be the set of all vertices with degree $\leq D$.

Let $X_0$ be the empty independent set on $U$.

2. Main loop:

For $t=1,\dots,T$,

  • Pick a vertex $u$ uniformly at random from $U$.

  • For every vertex $v\in U\setminus\{u\}$, set $X_t(v)=X_{t-1}(v)$.

  • Sample the spin $X_t(u)$ according to $\mu_{G,\lambda}\big(\sigma_u\mid\sigma_{U\setminus\{u\}}=X_t(U\setminus\{u\})\big)$, i.e., update $u$ according to the hard-core distribution on the whole graph $G$, conditioned on the spins of $U\setminus\{u\}$.

3. Finalisation:

Sample $\sigma\sim\mu_{G,\lambda}(\cdot\mid\sigma_U=X_T)$, i.e., extend $X_T$ to the whole vertex set of $G$ by sampling from $\mu_{G,\lambda}$ conditioned on the configuration on $U$.

Figure 1: The Sample subroutine for sampling from the hard-core distribution $\mu_{G,\lambda}$. We use the analogue of this algorithm for the Ising model with parameter $\beta$ (replacing $\mu_{G,\lambda}$ by $\mu_{G,\beta}$). For the monomer-dimer model, the only difference is that the algorithm needs to update (single) edges in $F$, where $F$ is the set of edges whose both endpoints lie in $U$ (i.e., have degree $\leq D$).
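The three steps of the Sample subroutine can be illustrated with the following Python sketch. It is a brute-force illustration that is only feasible on very small graphs: conditional marginals are computed by enumerating all configurations, whereas the paper exploits the sparse structure of the random graph to do this efficiently. The names `restricted_partition`, `occupied_prob`, `sample`, and the parameter `max_deg` (playing the role of the degree threshold) are hypothetical.

```python
import itertools
import random

def restricted_partition(adj, lam, fixed):
    """Sum of lam^|I| over independent sets I that agree with `fixed`.

    fixed : dict vertex -> 0/1 giving the spins that are already set.
    Brute force over all remaining vertices (exponential time).
    """
    free = [v for v in adj if v not in fixed]
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(free)):
        sigma = dict(fixed)
        sigma.update(zip(free, bits))
        if all(not (sigma[u] and sigma[w]) for u in adj for w in adj[u]):
            total += lam ** sum(sigma.values())
    return total

def occupied_prob(adj, lam, fixed, v):
    """Probability that v is occupied, conditioned on the spins in `fixed`."""
    z1 = restricted_partition(adj, lam, {**fixed, v: 1})
    z0 = restricted_partition(adj, lam, {**fixed, v: 0})
    return z1 / (z0 + z1)

def sample(adj, lam, max_deg, T, seed=0):
    rng = random.Random(seed)
    # 1. Initialisation: small-degree vertices, all unoccupied.
    U = [v for v in adj if len(adj[v]) <= max_deg]
    X = {v: 0 for v in U}
    # 2. Main loop: resample one small-degree vertex at a time, conditioned
    #    only on the other small-degree vertices (high-degree vertices are
    #    summed out inside occupied_prob).
    for _ in range(T if U else 0):
        v = rng.choice(U)
        others = {u: s for u, s in X.items() if u != v}
        X[v] = 1 if rng.random() < occupied_prob(adj, lam, others, v) else 0
    # 3. Finalisation: extend to the remaining vertices one by one,
    #    each conditioned on everything sampled so far.
    sigma = dict(X)
    for v in adj:
        if v not in sigma:
            sigma[v] = 1 if rng.random() < occupied_prob(adj, lam, sigma, v) else 0
    return sigma
```

For instance, on the star `{0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}` with `max_deg=1`, the main loop only updates the three leaves, and the centre is filled in during finalisation.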

Note that for general graphs $G$, implementing Steps 2 and 3 of the algorithm might be difficult. The following lemma exploits the sparse structure of $G(n,d/n)$ and in particular the fact that high-degree vertices are sparsely scattered. We will use this in the proof of our main theorems to show that the algorithm Sample can be implemented very efficiently for appropriate , paying only per loop operation in Step 2 and only in Step 3. The tree-excess of a connected graph $G=(V,E)$ is defined as $|E|-|V|+1$.

Lemma 4.

Let be an arbitrary real. There exist constants such that the following holds whp over the choice of . Each of the connected components of , where is the set of vertices of degree , has size and tree-excess at most .

Lemma 4 follows using relatively standard techniques from random graphs and is proved in Section C of the appendix. Later, we will establish a more refined version of this property that will allow us to bound the mixing time of the single-site dynamics that we consider (the main loop of Sample).
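The two quantities controlled by Lemma 4, component size and tree-excess, are easy to compute for a given graph. The sketch below assumes the usual definition of the tree-excess of a connected graph as the number of edges minus the number of vertices plus one; the function name and the adjacency-dictionary format are illustrative choices.

```python
def components_with_excess(adj, vertices):
    """Connected components of the subgraph induced on `vertices`.

    Returns a list of (component, tree_excess) pairs, where the
    tree-excess of a component is |E| - |V| + 1 (0 for a tree).
    """
    vertices = set(vertices)
    seen, result = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:  # depth-first search restricted to `vertices`
            v = stack.pop()
            for u in adj[v]:
                if u in vertices and u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.append(u)
        # every induced edge is counted once from each endpoint
        edges = sum(1 for v in comp for u in adj[v] if u in comp) // 2
        result.append((comp, edges - len(comp) + 1))
    return result
```

To inspect a concrete graph against the lemma, one would call it on a vertex set selected by degree, for example `components_with_excess(adj, [v for v in adj if len(adj[v]) > D])` for a threshold `D` of one's choosing.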

The key ingredient needed to prove our main result is to show that the main loop of our sampling algorithm returns a good sample on the induced hard-core distribution on the set . More precisely, for a graph and , we let denote the induced distribution on the spins of , i.e., the marginal distribution .

Lemma 5.

Let be constants such that . For any arbitrarily small constant , there is such that the following holds whp over the choice of .

Let be the set of vertices in of degree . Then, for any , for , the main loop of Sample returns a sample from a distribution which is -close to .

We will prove Lemma 5 in Section 4.2. With these two lemmas we are ready to prove Theorem 1.

Proof of Theorem 1.

We give first the details for the more interesting case . Consider arbitrarily small and as in Lemmas 4 and 5, so that whp satisfies the properties therein. Let be the desired accuracy for sampling from ; it is sufficient to consider . Let be the set of vertices with degree , and set .

By Lemma 5, whp over the choice of , the main loop of Sample returns a configuration that is -close to . Note that each iteration of the main loop of Sample can be implemented in time since has components of size and tree excess at most . In particular, any vertex can be adjacent to at most of these components, and therefore the component of in has size and tree excess at most . We can therefore sample the spin of under conditioned on the spins of in time (footnote 2). Therefore, the main loop of Sample runs in time . Analogously, the finalisation step of Sample, i.e., extending the configuration on to a configuration on the whole vertex set , can be implemented in time by iterating over the vertices in and using the fact that the components of have excess at most . Therefore, the overall running time of the algorithm is bounded by , which is less than for all sufficiently large . It remains to note that, since is -close to the marginal distribution of on , and the finalisation step is done perfectly conditioned on the configuration on , the final configuration is -close to the distribution .

Footnote 2. One “naive” way to do this is by considering a spanning tree and then brute-forcing over all possibilities for the endpoints of the excess edges (the spins on each edge can be set in at most 4 ways). For each of these, the marginal probability at and the corresponding partition function can be computed using dynamic programming on the left-over tree.
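The dynamic programme on trees mentioned in the footnote is the standard two-state recursion for the hard-core model. A minimal Python sketch follows, assuming the tree is given as a hypothetical dictionary `children` mapping each vertex to a list of its children; handling the excess edges by brute force, as the footnote describes, is not shown.

```python
def tree_hardcore(children, root, lam):
    """Return (z_out, z_in) for the tree rooted at `root`.

    z_out : total weight of independent sets with the root unoccupied.
    z_in  : total weight of independent sets with the root occupied.
    The partition function is z_out + z_in, and the probability that
    the root is occupied is z_in / (z_out + z_in).
    """
    # Pre-order traversal; processing it in reverse visits children first.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, ()))
    z = {}
    for v in reversed(order):
        z_out, z_in = 1.0, lam
        for c in children.get(v, ()):
            c_out, c_in = z[c]
            z_out *= c_out + c_in  # child is free when v is unoccupied
            z_in *= c_out          # child must be unoccupied when v is occupied
        z[v] = (z_out, z_in)
    return z[root]
```

On the path with three vertices rooted at its middle vertex and `lam = 1`, the recursion gives `z_out = 4` and `z_in = 1`, matching the five independent sets of that path.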

For , whp consists of tree-like components of size , and therefore we can obtain a perfect sample from in time by going through the vertices one by one and, for each vertex, taking time to compute its marginal, conditioned on the spins already sampled. ∎

3 Spectral independence via branching values

We first introduce the notions of spectral independence and pairwise vertex influences, which we will later use to bound the mixing time of the main loop of Sample, i.e., to prove Lemma 5. We will define the terminology in a general way that will be useful both for our analysis of the hard-core model, and for our later analysis of other models.

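Let $q\geq 2$ be an integer indicating the number of spins and let $V$ be a set of size $n$. We will consider distributions $\mu$ supported on a set $\Omega\subseteq[q]^V$ (footnote 3). For $S\subseteq V$, let $\Omega_S=\{\tau\in[q]^S : \mu(\sigma_S=\tau)>0\}$ be the set of all partial configurations on $S$ that have non-zero marginal under $\mu$. For $\tau\in\Omega_S$, let $\mu_\tau$ be the conditional distribution on $\Omega$ induced by $\tau$, i.e., $\mu_\tau(\cdot)=\mu(\cdot\mid\sigma_S=\tau)$. Let .

Footnote 3. For an integer , we denote by the set .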

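For $S\subset V$ and $\tau\in\Omega_S$, the influence matrix conditioned on $\tau$ is the matrix $\Psi_\tau$ whose rows and columns are indexed by the pairs $(v,i)$ with $v\in V\setminus S$ and $\mu_\tau(\sigma_v=i)>0$, where the entry indexed by $(v,i),(w,k)$ equals $\mu_\tau(\sigma_w=k\mid\sigma_v=i)-\mu_\tau(\sigma_w=k)$ if $v\neq w$, and 0 otherwise. It is a standard fact that the eigenvalues of the matrix $\Psi_\tau$ are all real ([ALG20]), and we denote by $\lambda_1(\Psi_\tau)$ its largest eigenvalue.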

Definition 6.

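Let $q\geq 2$ be an integer and $V$ be a set of size $n$. Let $\mu$ be a distribution supported over $\Omega\subseteq[q]^V$. Let $\eta,b>0$. We say that $\mu$ is $\eta$-spectrally independent if for all $S\subset V$ and $\tau\in\Omega_S$, the largest eigenvalue $\lambda_1(\Psi_\tau)$ of the influence matrix $\Psi_\tau$ conditioned on $\tau$ satisfies $\lambda_1(\Psi_\tau)\leq\eta$. We say that $\mu$ is $b$-marginally bounded if for all $S\subset V$, $v\in V\setminus S$, $\tau\in\Omega_S$, and $i\in[q]$, it either holds that $\mu_\tau(\sigma_v=i)=0$ or else $\mu_\tau(\sigma_v=i)\geq b$.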

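Following [ALG20, CLV20], for distributions induced by 2-spin systems, we work with the following notion of pairwise vertex-influence, which can be used to bound the spectral independence. For a graph $G=(V,E)$ and $\tau\in\Omega_S$ for some $S\subset V$, for vertices $u,v$ with $u\in V\setminus S$ and $0<\mu_\tau(\sigma_u=1)<1$, we define the influence of $u$ on $v$ (under $\mu_\tau$) as

$$\mathcal{I}^{\tau}_{G}(u\to v)=\mu_\tau(\sigma_v=1\mid\sigma_u=1)-\mu_\tau(\sigma_v=1\mid\sigma_u=0).$$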

For matchings, we will work with an analogous notion from the perspective of edges (see Section B.2). For all these models, spectral independence will be bounded by summing the absolute value of the influences of an arbitrary vertex to the rest of the graph.
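As a concrete illustration of summing absolute influences, the following Python sketch computes pairwise influences for the hard-core model by brute force on a small graph with no boundary condition. It assumes the usual 2-spin definition of the influence of u on v, namely the probability that v is occupied given u occupied minus the probability that v is occupied given u unoccupied; all function names are hypothetical.

```python
import itertools

def hardcore_weights(adj, lam):
    """Map each independent set (as a frozenset) to its weight lam^|I|."""
    verts = list(adj)
    out = {}
    for r in range(len(verts) + 1):
        for sub in itertools.combinations(verts, r):
            s = frozenset(sub)
            if all(w not in s for u in s for w in adj[u]):
                out[s] = lam ** r
    return out

def influence(adj, lam, u, v):
    """P(v occupied | u occupied) - P(v occupied | u unoccupied)."""
    weights = hardcore_weights(adj, lam)

    def prob_v_given(u_occupied):
        num = sum(x for s, x in weights.items()
                  if (u in s) == u_occupied and v in s)
        den = sum(x for s, x in weights.items() if (u in s) == u_occupied)
        return num / den

    return prob_v_given(True) - prob_v_given(False)

def total_influence(adj, lam, u):
    """Sum of absolute influences of u on all other vertices."""
    return sum(abs(influence(adj, lam, u, v)) for v in adj if v != u)
```

On a single edge with `lam = 1`, occupying one endpoint forces the other to be unoccupied, while leaving it unoccupied makes the other endpoint occupied with probability 1/2, so the influence is -1/2.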

In turn, it has been shown in [CLV20] that summing the influences of a vertex $v$ in a graph $G$ reduces to summing the influences on the self-avoiding walk tree emanating from $v$, see Lemma 22 in the appendix. Therefore, we only need to focus on trees arising as self-avoiding walk trees.

3.1 The branching value

We will need the following notion to capture the growth of the self-avoiding walk tree from a vertex.

Definition 7.

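Let $d>0$ be a real number and $G=(V,E)$ be a graph. For a vertex $v$ in $G$, the $d$-branching value $S_v$ equals $\sum_{\ell\geq 0}N_{v,\ell}/d^{\ell}$, where $N_{v,\ell}$ is the number of (simple) paths with a total of $\ell+1$ vertices starting from $v$ (for convenience, we set $N_{v,0}=1$).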
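For a small graph, the branching value can be computed directly by enumerating simple paths. The Python sketch below assumes that the $d$-branching value of a vertex is the sum, over all simple paths starting from it (including the trivial path), of $d$ raised to minus the number of edges of the path; the function name and input format are illustrative, and the enumeration takes exponential time in general.

```python
def branching_value(adj, v, d):
    """Sum over simple paths P starting at v of d^(-number of edges of P)."""
    total = 0.0
    # Each stack entry: (current endpoint, vertices on the path, edge count).
    stack = [(v, frozenset([v]), 0)]
    while stack:
        u, visited, length = stack.pop()
        total += d ** (-length)  # contribution of the path ending at u
        for w in adj[u]:
            if w not in visited:  # extend only to unvisited vertices
                stack.append((w, visited | {w}, length + 1))
    return total
```

On the path 0, 1, 2 with `d = 2`, the value at the endpoint 0 is 1 + 1/2 + 1/4 = 1.75, while the value at the middle vertex is 1 + 2 · (1/2) = 2.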

We will show the following lemma in Section C.1 which bounds the -branching value of for any .

Lemma 8.

Let . Then, for every and , whp over the choice of , the -branching value of every vertex in is at most .

3.2 Spectral independence for the hard-core model

In this section, we bound the spectral independence of in the hard-core model when . We will need the following technical lemma that can be derived from [SSŠY17]. The derivation details are similar to an analogous lemma for matchings (cf. Lemma 26 below), which can be found in [BGGŠ21, Lemma 15].

Lemma 9 ([SSŠY17]).

Let and be constants such that . Let be given from and set . Consider also the function for . Then, there is a constant such that the following holds for any integer .

Let be real numbers and . Then .

We will show the following.

Lemma 10.

Let and be constants such that . Then, there is a constant such that the following holds.

Let be a tree rooted at , whose -branching value is and which has children. Then, for the hard-core distribution on with parameter , any and with , it holds that

where is a real depending only on the degree of the root (and the constants ).

Proof.

Let and be the constants from Lemma 9, and be also as in Lemma 9.

We may assume without loss of generality that is empty (and is trivial) by truncating the tree using the following procedure: just remove vertices with , and for with remove and all of its neighbours. Note that for all the removed vertices it holds that , so the removal procedure does not decrease the sum of the absolute influences, while at the same time decreasing the -branching value of the tree . Henceforth, we will drop and from notation.

To prove the lemma, we will work inductively on the depth of the tree. To this end, we first define for each vertex in the following values and ; the ’s capture a rooted analogue of the branching value of internal vertices within , while the ’s the marginals of the vertices in the corresponding subtrees. More precisely, if is a leaf, set and ; otherwise set and , where are the children of . Note that for the root we have that , where is the -branching value of in the tree . Moreover, if we denote by the subtree of rooted at and by the parent of in , then it holds that

(1)

The first equality is fairly standard and can be proved using induction on the height of the tree, while the second one is [CLV20, Lemma 15] (it also follows directly from the definition of influence and the first equality).

For an integer , let be the nodes at distance from the root . Let , where recall that is the degree of the root . We will show that

(2)

Since for , and , (2) yields for all integer , and therefore summing over , we obtain that

which proves the result with . So it only remains to prove (2).

We will work inductively. The base case is equivalent to , which is true since from the recursion for we have that . For the induction step, consider and suppose it has children, denoted by for . Then, for each , since is on the unique path joining to , it holds that (see [ALG20, Lemma B.2])

so we can write

(3)

Consider an arbitrary . Then, since , by Hölder’s inequality we have that

(4)

Note that for and , , we have from (1) that and , so by Lemma 9 we have that

By definition of the -branching value we also have , so plugging these back into (4) yields

In turn, plugging this into (3) and using the induction hypothesis yields (2), finishing the proof. ∎

Remark 11.

For simplicity, and since it is not important for our arguments, the constant in the proof depends exponentially on the degree of the root. With a more careful inductive proof (cf. [CLV20, Proof of Lemma 14]), the dependence on can be made linear. In either case, because of the high-degree vertices in , both bounds do not yield sufficiently strong bounds on the spectral independence of the whole distribution , and this is one of the reasons that we have to consider the spectral independence on the induced distribution on low-degree vertices.

Recall that for a graph and , we let denote the marginal distribution on the spins of , i.e., the distribution .

Lemma 12.

Let and be constants such that . Then, for any constants , whp over the choice of , the marginal hard-core distribution , where is the set of vertices in with degree , is -spectrally independent.

Proof.

Let be arbitrary constants, and let be such that ; such exists because the function is continuous in the interval and for . Let and where and the ’s are as in Lemma 10 (corresponding to the constants ). By Lemma 8, whp all of the vertices of the graph have -branching value less than . We will show that the result holds for all such graphs .

Let be the set of vertices in with degree , and let for convenience . Consider arbitrary and . It suffices to bound the largest eigenvalue of the influence matrix by . Analogously to [ALG20, CLV20], we do this by bounding the absolute-value row sums of . Recall that the rows and columns of are indexed by , where the entry indexed by equals if , and 0 otherwise. Consider arbitrary ; our goal is to show

(5)

Henceforth, we will also assume that (in addition to ), otherwise the sum on the l.h.s. is equal to 0. Then, by the law of total probability, for any we have

where the last equality follows from the fact that is the marginal distribution of on . Therefore, we can bound

By Lemma 22 of the appendix, for the self-avoiding walk tree from , there is a subset and a configuration such that

where denotes the influence of on the vertices of (in the hard-core distribution conditioned on ). Since the -branching value of (and any other vertex of ) is bounded by and the degree of is , by Lemma 10 applied to , we have that

Since , for all sufficiently large we have that , which proves (5). ∎

We also record the following corollary of the arguments in Lemma 10.

Corollary 13.

Let and be real numbers. For a graph , let be the set of vertices in with degree and suppose that . Then, the distribution is -marginally bounded for .

Proof.

By Lemma 22 in the appendix, for any vertex and any boundary condition on (a subset of) , there is a corresponding tree and a boundary condition on such that . Since has degree , from the proof of Lemma 10, see in particular equation (1), we have that , where is as in the lemma statement. ∎

4 Entropy factorisation for bounded-degree vertices

In this section, we show how to convert the spectral independence results of the previous section into fast mixing results for Glauber dynamics on the set of small-degree vertices on . Our strategy here follows the technique of [CLV21], though to obtain results we have to pay attention to the connected components induced by high-degree vertices and how these can connect up small-degree vertices.

4.1 Preliminaries

Entropy factorisation for probability distributions.

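For a real function $f$ on $\Omega$, we use $\mu(f)$ for the expectation of $f$ with respect to $\mu$ and, for $f:\Omega\to\mathbb{R}_{\geq 0}$, $\mathrm{Ent}_\mu(f)=\mu(f\log f)-\mu(f)\log\mu(f)$, with the convention that $0\log 0=0$. Finally, for $S\subset V$, let $\mathrm{Ent}^S_\mu(f)=\mathbb{E}_{\tau\sim\mu_{V\setminus S}}\big[\mathrm{Ent}_{\mu_\tau}(f)\big]$, i.e., $\mathrm{Ent}^S_\mu(f)$ is the expected value of the conditional entropy of $f$ when the assignment outside of $S$ is chosen according to the marginal distribution $\mu_{V\setminus S}$ (the induced distribution of $\mu$ on $V\setminus S$). For convenience, when $S=V$, we define $\mathrm{Ent}^V_\mu(f)=\mathrm{Ent}_\mu(f)$. The following inequality of entropy under tensor product is a special case of Shearer’s inequalities.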

Fact 14.

Let be integers and suppose that, for , is a distribution supported over , where are pairwise disjoint sets. Let be the product distribution on . Then, for any , it holds that .

To bound the mixing time of Markov chains such as the Glauber dynamics, we will be interested in establishing inequalities for factorisation of entropy, defined as follows.

Definition 15.

Let , be integers and be a set of size . Let be a distribution supported over . We say that satisfies the -uniform-block factorisation of entropy with multiplier (footnote 4) if for all it holds that .

Footnote 4. We note that in related works is usually referred to as the “factorisation constant”; we deviate from this terminology since for us will depend on (cf. Corollary 19 and Lemma 21), and referring to it as a constant could cause confusion.

The following lemma will be useful to bound the (-uniform-block) factorisation multiplier for conditional distributions on sets with small cardinality.

Lemma 16 ([CLV21, Lemma 4.2]).

Let be an integer and be a set of size . Let be a distribution supported over which is -marginally bounded for some . Then, for any and , for , it holds that .

The -uniform-block Glauber dynamics and its mixing time.

For an integer , the -uniform-block Glauber dynamics for is a Markov chain where is an arbitrary configuration and, for , is obtained from by first picking a subset of size uniformly at random and updating the configuration on according to

For , the mixing time of the -uniform-block Glauber dynamics is defined as , where denotes the distribution of . Note, the case corresponds to the single-site dynamics, where at every step the spin of a single vertex, chosen u.a.r., is updated conditioned on the spins of the remaining vertices.

Lemma 17 (See, e.g., [CLV21, Lemma 2.6 & Fact 3.5(4)] or [Che21, Lemma 3.2.6 & Fact 3.4.2]).

Let , be integers and be a set of size . Let be a distribution supported over that satisfies the -uniform-block factorisation of entropy with multiplier . Then, for any , the mixing time of the -uniform-block Glauber dynamics on satisfies