There is a well-known bijection between matchings of a graph $G$ and independent sets in the line graph of $G$. We will show that we can approximate the number of independent sets in graphs for which all bipartite induced subgraphs are well structured, in a sense that we will define precisely. Our approach is to generalise the Markov chain analysis of Jerrum and Sinclair  for the corresponding problem of counting matchings.
The canonical path argument given by Jerrum and Sinclair in  relied on the fact that the symmetric difference of two matchings of a given graph is a bipartite subgraph of consisting of a union of paths and even-length cycles. We introduce a new graph parameter, which we call bipartite pathwidth, to enable us to give the strongest generalisation of the approach of , beyond the class of line graphs.
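The bijection between matchings and independent sets of the line graph can be checked by brute force on a small example. The sketch below is illustrative only: the helper names `line_graph`, `independent_sets` and `matchings`, and the 4-cycle used as input, are assumptions introduced here and do not come from the paper.

```python
from itertools import combinations


def line_graph(edges):
    """Adjacency dict of L(G): vertices are edges of G, adjacent if they share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            adj[e].add(f)
            adj[f].add(e)
    return adj


def independent_sets(adj):
    """All independent sets of a graph given as an adjacency dict (brute force)."""
    verts = sorted(adj)
    result = []
    for k in range(len(verts) + 1):
        for subset in combinations(verts, k):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                result.append(frozenset(subset))
    return result


def matchings(edges):
    """All matchings of G: sets of edges with pairwise disjoint endpoints (brute force)."""
    edges = [tuple(sorted(e)) for e in edges]
    result = []
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            endpoints = [v for e in subset for v in e]
            if len(endpoints) == len(set(endpoints)):
                result.append(frozenset(subset))
    return result


# Hypothetical example graph: the 4-cycle 0-1-2-3-0.
c4 = [(0, 1), (1, 2), (2, 3), (0, 3)]
same = set(matchings(c4)) == set(independent_sets(line_graph(c4)))
```

Both enumerations are exponential in the size of the graph, so this is only a sanity check of the correspondence, not a counting method.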
1.1 Independent set problems
For a given graph , the independence number is the size of the largest independent set in . The problem of finding is NP-hard in general, even in various restricted cases, such as degree-bounded graphs. However, polynomial time algorithms have been constructed for computing , and finding an independent set such that , for various graph classes. The most important case has been matchings, which are independent sets in the line graph of . This has been generalised to larger classes of graphs, for example claw-free graphs , which include line graphs , and fork-free graphs , which include claw-free graphs.
Counting independent sets in graphs, determining , is known to be #P-complete in general , and in various restricted cases [21, 38]. Exact counting is known only for some restricted graph classes. Even approximate counting is NP-hard in general, and is unlikely to be in polynomial time for bipartite graphs . The relevance here of the optimisation results above is that proving NP-hardness of approximate counting is usually based on the hardness of some optimisation problem.
However, for some classes of graphs, for example line graphs, approximate counting is known to be possible [26, 27]. The most successful approach to the problem has been the Markov chain approach, which relies on a close correspondence between approximate counting and sampling uniformly at random . The Markov chain method was applied to degree-bounded graphs in  and . In his PhD thesis , Matthews used the Markov chain approach to sample independent sets in claw-free graphs. His chain, and its analysis, directly generalise that of .
Several other approaches to approximate counting have been successfully applied to the independent set problem. Weitz  used the correlation decay approach on degree-bounded graphs, resulting in a deterministic polynomial time approximation algorithm (an FPTAS) for counting independent sets in graphs with degree at most 5. Sly  gave a matching NP-hardness result. The correlation decay method was also applied to matchings in , and was extended to complex values of in . Recently, Efthymiou et al.  proved that the Markov chain approach can (almost) produce the best results obtainable by other methods.
The independence polynomial of a graph is defined in (1.1) below. The Taylor series approach of Barvinok  was used by Patel and Regts  to give an FPTAS for in claw-free graphs. The success of the method depends on the location of the roots of the independence polynomial. Chudnovsky and Seymour  proved that all these roots are real, and hence they are all negative. Hence the algorithm of  is valid for all complex which are not real and negative.
In this paper, we return to the Markov chain approach, providing a broad generalisation of the methods of . In Section 3 we define a graph parameter which we call bipartite pathwidth, and the class of graphs with bipartite pathwidth at most . The Markov chain which we analyse is the well-known Glauber dynamics. We now state our main result, which gives a bound on the mixing time of the Glauber dynamics for graphs of bounded bipartite pathwidth.
Let and . Then the Glauber dynamics with fugacity on (and initial state ) has mixing time
When is constant, this upper bound is polynomial in and .
The plan of the paper is as follows. In Section 2, we define the necessary Markov chain background and define the Glauber dynamics. In Section 3, we develop the concept of bipartite pathwidth, and in Section 4 we use it to determine canonical paths for independent sets. In Section 5, we introduce some graph classes which have bounded bipartite pathwidth. These classes, like the class of claw-free graphs, are defined by excluded induced subgraphs.
We write $[m] = \{1, 2, \ldots, m\}$ for any positive integer $m$, and let $A \oplus B$ denote the symmetric difference of sets $A, B$.
For graph theoretic definitions not given here, see [9, 13]. Throughout this paper, all graphs are simple and undirected. The term “induced subgraph” will mean a vertex-induced subgraph, and the subgraph of induced by the set will be denoted by .
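Given a graph $G = (V, E)$ with $n = |V|$, we write $\mathcal{I}(G)$ for the set of independent sets in $G$, and $\mathcal{I}_k(G)$ for the set of independent sets of $G$ of size $k$. The independence polynomial of $G$ is the partition function
$$Z_G(\lambda) = \sum_{I \in \mathcal{I}(G)} \lambda^{|I|} = \sum_{k=0}^{\alpha(G)} N_k \lambda^k, \qquad (1.1)$$
where $N_k = |\mathcal{I}_k(G)|$ for $k = 0, 1, \ldots, \alpha(G)$. Here $\lambda$ is called the fugacity. In this paper, we consider only nonnegative real $\lambda$. We have $N_0 = 1$, $N_1 = n$, and $N_k \le \binom{n}{k}$ for $k = 2, \ldots, n$. Thus it follows that for any $\lambda \ge 0$,
$$1 + n\lambda \le Z_G(\lambda) \le \sum_{k=0}^{n} \binom{n}{k} \lambda^k = (1 + \lambda)^n. \qquad (1.2)$$
Note also that $Z_G(0) = 1$ and $Z_G(1) = |\mathcal{I}(G)|$.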
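As a small numerical illustration, the following sketch computes the coefficients of the independence polynomial by brute force and evaluates it at a chosen fugacity. It assumes the standard definition (the sum of $\lambda^{|I|}$ over all independent sets $I$) and the elementary bounds $1 + n\lambda \le Z_G(\lambda) \le (1+\lambda)^n$ for $\lambda \ge 0$. The function names and the 4-vertex path are hypothetical choices made here.

```python
from itertools import combinations


def independent_set_counts(n, edges):
    """Return [N_0, ..., N_n], where N_k counts independent sets of size k on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0] * (n + 1)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                counts[k] += 1
    return counts


def independence_polynomial(counts, lam):
    """Evaluate sum_k N_k * lam**k."""
    return sum(c * lam**k for k, c in enumerate(counts))


# Hypothetical example: the path 0-1-2-3.
n = 4
p4_counts = independent_set_counts(n, [(0, 1), (1, 2), (2, 3)])
lam = 2
z = independence_polynomial(p4_counts, lam)
lower, upper = 1 + n * lam, (1 + lam) ** n
```

Evaluating at `lam = 1` gives the total number of independent sets, and at `lam = 0` gives 1, matching the remarks above.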
An almost uniform sampler for a probability distribution on a state space is a randomised algorithm which takes as input a real number and outputs a sample from a distribution such that the total variation distance is at most . The sampler is a fully polynomial almost uniform sampler (FPAUS) if its running time is polynomial in the input size and . The word “uniform” here is historical, as it was first used in the case where is the uniform distribution. We use it in a more general setting.
If is a weight function, then the Gibbs distribution satisfies for all , where . If for all then is uniform. For independent sets with , the Gibbs distribution satisfies
and is often called the hardcore distribution. Jerrum, Valiant and Vazirani  showed that approximating is equivalent to the existence of an FPAUS for , provided the problem is self-reducible. Counting independent sets in a graph is a self-reducible problem.
2 Markov chains
For additional information on Markov chains and approximate counting, see for example . In this section we provide some necessary definitions and then define a simple Markov chain on the set of independent sets in a graph.
2.1 Mixing time
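Consider a Markov chain on state space $\Omega$ with stationary distribution $\pi$ and transition matrix $P$. Let $p_t$ be the distribution of the chain after $t$ steps. We will assume that $p_0$ is the distribution which assigns probability 1 to a fixed initial state $X_0 \in \Omega$. The mixing time of the Markov chain, with initial state $X_0$, is
$$\tau_{X_0}(\varepsilon) = \min\{\, t : d_{\mathrm{TV}}(p_t, \pi) \le \varepsilon \,\},$$
where $d_{\mathrm{TV}}(p_t, \pi)$ is the total variation distance between $p_t$ and $\pi$.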
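For a chain small enough to write down explicitly, the mixing time can be computed directly from its definition. The sketch below assumes the usual conventions: total variation distance is half the $\ell_1$ distance, and the mixing time is the first step at which that distance from the stationary distribution is at most $\varepsilon$. The two-state chain and all function names are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two distributions given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, pi, start, eps, max_steps=10_000):
    """Smallest t such that the distribution after t steps from `start` is within eps of pi."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for t in range(max_steps + 1):
        if total_variation(dist, pi) <= eps:
            return t
        dist = step(dist, P)
    raise RuntimeError("chain did not get within eps of pi in max_steps steps")


# Hypothetical lazy two-state chain with uniform stationary distribution.
P = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
t_mix = mixing_time(P, pi, start=0, eps=0.1)
```

Here the distance from stationarity halves at every step, starting from 1/2, so it first drops below 0.1 after three steps.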
2.2 Canonical paths method
To bound the mixing time of our Markov chain we will apply the canonical paths method of Jerrum and Sinclair . This may be summarised as follows.
Let the problem size be (in our setting, is the number of vertices in the graph and ). For each pair of states we must define a path from to ,
such that successive pairs along the path are given by a transition of the Markov chain. Write for the length of the path , and let . We require to be at most polynomial in . This is usually easy to achieve, but the paths must also have the following, more demanding property.
For any transition of the chain there must exist an encoding , such that, given and , there are at most distinct possibilities for and such that . That is, each transition of the chain can lie on at most canonical paths, where is some set which contains all possible encodings. We usually require to be polynomial in . It is common to refer to the additional information provided by as “guesses”, and we will do so here. In our situation, all encodings will be independent sets, so we may assume that . Furthermore, independent sets are weighted by , so we will need to perform a weighted sum over our “guesses”. See the proof of Theorem 1.1 in Section 4.
The congestion of the chosen set of paths is given by
where the maximum is taken over all pairs with and (that is, over all transitions of the chain), and the sum is over all paths containing the transition .
A bound on the relaxation time will follow from a bound on congestion, using Sinclair’s result [35, Cor. 6]:
2.3 Glauber dynamics
The Markov chain we employ will be the Glauber dynamics on state space . In fact, we will consider a weighted version of this chain, for a given value of the fugacity (also called activity) . Define for all , where is the independence polynomial defined in (1.1). A transition from to will be as follows. Choose a vertex of uniformly at random.
If then with probability .
If and then with probability .
This Markov chain is irreducible and aperiodic, and satisfies the detailed balance equations
for all . Therefore, the Gibbs distribution is the stationary distribution of the chain. Indeed, if is obtained from by deleting a vertex then
The unweighted version is given by setting , and has uniform stationary distribution. Since the analysis for general is hardly any more complicated than that for , we will work with the weighted case.
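The detailed balance condition can be verified exactly on a small graph. The following sketch builds the full transition matrix with rational arithmetic. It assumes the usual update probabilities for the hardcore model, namely deleting the chosen vertex with probability $1/(1+\lambda)$ and inserting it (when the result is independent) with probability $\lambda/(1+\lambda)$; these values, the function names and the 3-vertex path are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def independent_sets(n, edges):
    """All independent sets of the graph on vertices 0..n-1, as frozensets."""
    edge_set = {frozenset(e) for e in edges}
    sets = []
    for k in range(n + 1):
        for s in combinations(range(n), k):
            if all(frozenset(p) not in edge_set for p in combinations(s, 2)):
                sets.append(frozenset(s))
    return sets


def glauber_matrix(n, edges, lam):
    """Exact transition matrix of the (assumed) Glauber dynamics with fugacity lam."""
    lam = Fraction(lam)
    states = independent_sets(n, edges)
    index = {s: i for i, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        i = index[s]
        for v in range(n):  # vertex v is chosen with probability 1/n
            if v in s:
                target, prob = s - {v}, 1 / (1 + lam)  # deletion
            elif (s | {v}) in index:
                target, prob = s | {v}, lam / (1 + lam)  # insertion
            else:
                continue  # insertion blocked by a neighbour: stay put
            P[i][index[target]] += prob / n
        P[i][i] = 1 - sum(P[i])  # remaining mass is the self-loop
    return states, P


def gibbs(states, lam):
    """Gibbs (hardcore) distribution: probability proportional to lam**|S|."""
    weights = [Fraction(lam) ** len(s) for s in states]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: the path 0-1-2 with fugacity 2.
states, P = glauber_matrix(3, [(0, 1), (1, 2)], 2)
pi = gibbs(states, 2)
balanced = all(
    pi[i] * P[i][j] == pi[j] * P[j][i]
    for i in range(len(states))
    for j in range(len(states))
)
```

Using `Fraction` avoids floating-point error, so the equality test in `balanced` is exact.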
It follows from the transition procedure that for all states . That is, every state has a self-loop probability of at least this value. Using a result of Diaconis and Saloff-Coste [11, p.702], we conclude that the smallest eigenvalue of satisfies
which is constant for a given . We will always use the initial state , since for any graph .
In order to bound the relaxation time we will use the canonical path method. A key observation is that for any , the induced subgraph of is bipartite. This can easily be seen by colouring vertices in black and vertices in white, and observing that no edge in can connect vertices of the same colour. To exploit this observation, we introduce the bipartite pathwidth of a graph in Section 3. In Section 4 we show how to use the bipartite pathwidth to construct canonical paths for independent sets, and analyse the congestion of this set of paths to prove our main result, Theorem 1.1.
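The colouring argument can be turned into a short check. The sketch below colours the two sides of the symmetric difference and confirms that every induced edge joins the two colours. The function names and the 5-cycle example (a non-bipartite host graph) are hypothetical.

```python
def is_independent(edges, vertex_set):
    """True if no edge has both endpoints in vertex_set."""
    return not any(u in vertex_set and v in vertex_set for u, v in edges)


def two_colouring(edges, first, second):
    """Colour first-minus-second black and second-minus-first white.

    Returns (colour, induced_edges, proper), where `proper` says whether every
    edge induced on the symmetric difference joins vertices of different colours.
    """
    first, second = set(first), set(second)
    if not (is_independent(edges, first) and is_independent(edges, second)):
        raise ValueError("both arguments must be independent sets")
    colour = {v: "black" for v in first - second}
    colour.update({v: "white" for v in second - first})
    induced_edges = [(u, v) for u, v in edges if u in colour and v in colour]
    proper = all(colour[u] != colour[v] for u, v in induced_edges)
    return colour, induced_edges, proper


# Hypothetical example: the 5-cycle 0-1-2-3-4-0, which is not itself bipartite.
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
colour, induced_edges, proper = two_colouring(c5, {0, 2}, {1, 3})
```

In this example the symmetric difference induces the path 0-1-2-3, even though the host graph contains an odd cycle.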
3 Pathwidth and bipartite pathwidth
The pathwidth of a graph was defined by Robertson and Seymour , and has proved a very useful notion in graph theory. See, for example, [7, 13]. A path decomposition of a graph is a sequence of subsets of such that
for every there is some such that ,
for every there is some such that , and
for every the set forms an interval in .
The width and length of this path decomposition are
and the pathwidth of a given graph is
where the minimum is taken over all path decompositions of .
Condition 3 is equivalent to for all , and with . If we refer to a bag with index then by default .
For example, the bipartite graph in Fig. 1 has a path decomposition with the following bags:
This path decomposition has width 3 and length 7.
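The three conditions for a path decomposition are easy to check mechanically. The sketch below assumes the standard convention that the width is the maximum bag size minus one; the function names and the 4-vertex path used as input are hypothetical and are unrelated to the graph of Fig. 1.

```python
def is_path_decomposition(vertices, edges, bags):
    """Check the three path decomposition conditions for a sequence of bags."""
    bags = [set(b) for b in bags]
    # 1. every vertex lies in some bag
    covers_vertices = all(any(v in b for b in bags) for v in vertices)
    # 2. every edge has both endpoints together in some bag
    covers_edges = all(any(u in b and v in b for b in bags) for u, v in edges)

    # 3. the bags containing a given vertex form an interval of indices
    def forms_interval(v):
        idx = [i for i, b in enumerate(bags) if v in b]
        return bool(idx) and idx == list(range(idx[0], idx[-1] + 1))

    return covers_vertices and covers_edges and all(forms_interval(v) for v in vertices)


def width(bags):
    """Width of a decomposition: maximum bag size minus one (standard convention)."""
    return max(len(b) for b in bags) - 1


# Hypothetical example: the path a-b-c-d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
good_bags = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
bad_bags = [{"a", "b"}, {"c", "d"}, {"b", "c"}]  # bags containing b are not consecutive
```

Reordering the same bags, as in `bad_bags`, breaks the interval condition even though the first two conditions still hold.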
The following result is useful for bounding the pathwidth.
Let be a subgraph of a graph (not necessarily an induced subgraph). Then . Furthermore, if then .
Let and . Since is a subgraph of we have and . Let be a path decomposition of width for . Then is a path decomposition of , and its width is at most .
Now given , let and consider the induced subgraph . We show that , as follows. Let be a path decomposition of width for . Then is a path decomposition of , and its width is . This concludes the proof, as . ∎
In particular, if is a path, is a cycle and is a complete bipartite graph then it is easy to show that
3.1 Bipartite pathwidth
We now define the bipartite pathwidth of a graph to be the maximum pathwidth of an induced subgraph of that is bipartite. For any positive integer , let be the class of graphs of bipartite pathwidth at most . Lemma 5.1 below implies that claw-free graphs are contained in , for example. Note that is a hereditary class, by Lemma 3.1.
Clearly , but the bipartite pathwidth of may be much smaller than its pathwidth. For example, consider the complete graph . Now , since it is easy to see that maximum clique size is a lower bound on pathwidth. On the other hand, the largest bipartite induced subgraphs of are its edges, which are all isomorphic to . Thus the bipartite pathwidth of is .
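The gap between pathwidth and bipartite pathwidth can be confirmed by exhaustive search on very small graphs. The sketch below relies on the known fact that pathwidth equals the vertex separation number, minimised over all vertex orderings, and then maximises over induced bipartite subgraphs. All function names are hypothetical, and the search is exponential, so it is only suitable for a handful of vertices.

```python
from itertools import combinations, permutations


def induced_edges(vertices, edges):
    vs = set(vertices)
    return [(u, v) for u, v in edges if u in vs and v in vs]


def adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_bipartite(vertices, edges):
    """2-colour each component by depth-first search."""
    adj = adjacency(vertices, edges)
    colour = {}
    for root in vertices:
        if root in colour:
            continue
        colour[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    stack.append(w)
                elif colour[w] == colour[u]:
                    return False
    return True


def pathwidth(vertices, edges):
    """Pathwidth via vertex separation number (brute force over orderings)."""
    adj = adjacency(vertices, edges)
    best = len(vertices)
    for order in permutations(vertices):
        pos = {v: i for i, v in enumerate(order)}
        worst = 0
        for i in range(len(order)):
            # vertices already placed that still have a neighbour placed later
            boundary = sum(1 for u in order[: i + 1] if any(pos[w] > i for w in adj[u]))
            worst = max(worst, boundary)
        best = min(best, worst)
    return best


def bipartite_pathwidth(vertices, edges):
    """Maximum pathwidth over all nonempty induced bipartite subgraphs."""
    best = 0
    for k in range(1, len(vertices) + 1):
        for sub in combinations(vertices, k):
            sub_edges = induced_edges(sub, edges)
            if is_bipartite(sub, sub_edges):
                best = max(best, pathwidth(sub, sub_edges))
    return best


# Hypothetical examples: the complete graph K_4 and the 4-cycle C_4.
k4_vertices = [0, 1, 2, 3]
k4_edges = list(combinations(k4_vertices, 2))
c4_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
```

For the complete graph on four vertices this gives pathwidth 3 but bipartite pathwidth 1, whereas the 4-cycle, being bipartite itself, has both equal to 2.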
A more general example is the class of unit interval graphs. These may have cliques of arbitrary size, and hence arbitrary pathwidth. However they are claw-free, so can have bipartite pathwidth at most 2 from Lemma 5.2.
We also note the following.
Let be a positive integer.
Every graph with at most vertices belongs to .
No element of can contain as an induced subgraph.
3.2 Some preliminary remarks on path decompositions
We say that a path decomposition is good if, for all , neither nor holds. Every path decomposition of can be transformed into a good one by leaving out any bag which is contained in another.
Every graph has a good path decomposition such that every bag is either a singleton or contains an edge in .
Let be a good path decomposition of . Since is good, none of its bags is empty, but we will define and .
For a contradiction, suppose that bag is an independent set with , for some . If contains a vertex then we can remove from and the result is still a path decomposition of . If contains a vertex then we can transform by removing from and adding a new bag to the path decomposition. (In this case, is an isolated vertex in .)
In the remaining case we have . If then this implies that , contradicting the fact that is good. If then this implies that , which also contradicts the fact that is good. Hence all independent bags of are singletons. ∎
It will be useful to define a partial order on path decompositions. Given a fixed linear order on the vertex set of a graph , we may extend to subsets of as follows: if then if and only if (a) ; or (b) and the smallest element of belongs to .
Next, given two path decompositions and of , we say that if and only if (a) ; or (b) and , where .
4 Canonical paths for independent sets
We now construct canonical paths for the Glauber dynamics on independent sets of graphs with bounded bipartite pathwidth.
Suppose that , so that . Take and let be the connected components of , ordered in lexicographical order. As already observed, the graph is bipartite, so every component is connected and bipartite. We will define a canonical path from to by processing the components in order.
Let be the component of which we are currently processing, and suppose that after processing , we have a partial canonical path
If then .
The encoding for is defined by
In particular, when we have . We remark that (4.1) will not hold during the processing of a component, but always holds immediately after the processing of a component is complete. Because we process components one-by-one, in order, and due to the definition of the encoding , we have
We now describe how to extend this partial canonical path by processing the component . Let . We will define a sequence
of independent sets, and a corresponding sequence
of encodings, such that
for . Define the set of “remembered vertices”
for . By definition, the triple satisfies
This immediately implies that for .
We use a path decomposition of to guide our construction of the canonical path. Let be the lexicographically-least good path decomposition of . Here we use the ordering on path decompositions defined in Section 3.2. Since , the maximum bag size in is . As usual, we assume that .
We process by processing the bags in order. Initially , by (4.1). Because we process the bags one-by-one, in order, if bag is currently being processed and the current independent set is and the current encoding is , then
It remains to describe how to process the bag , for . Let , , denote the current independent set, encoding and set of remembered vertices, immediately after the processing of bag . When we have and in particular, .
Preprocessing: We “forget” the vertices of and add them to .
This does not change the current independent set or add to the canonical path.
For each , in lexicographical order, do
if then , ,
otherwise , .
For each , in lexicographical order, do
Observe that both and are independent sets at every step. This is true initially (when ) and remains true by construction. Indeed, the preprocessing phase removes all vertices of from , which makes more room for other vertices to be inserted into the encoding later. The deletion steps shrink the current independent set and add each removed vertex into or . A deleted vertex is only added to if it belongs to . Finally, in the insertion steps we add vertices from to , and after we have made room. Here is the last bag which contains the vertex being inserted into the independent set, so any neighbour of this vertex in has already been deleted from the current independent set. This phase can only shrink the encoding .
Also observe that (4.4) holds for at every point. Finally, by construction we have at all times.
To give an example of the canonical path construction, we return to the bipartite graph shown in Figure 1, which we now treat as the symmetric difference of two independent sets. Let be the set of vertices which are coloured blue in Figure 1 and let be the remaining vertices (coloured red in Figure 1). Table 1 illustrates the 10 steps of the canonical path (3 steps to process bag , none to process bag , 2 steps to process bag , and so on). In Table 1, blue vertices belong to the current independent set and red vertices belong to the current encoding . We only show the vertices of the bag which is currently being processed, as we can use (4.5) for all other vertices. The white vertices are precisely those which belong to . The column headed “before processing ” shows the situation directly after the preprocessing step, where elements of have been removed from the current encoding and added to , to be remembered. This does not count as a step of the canonical path as the current independent set does not change. At the end of processing, all vertices of are red (belong to the final encoding ) and all vertices of are blue (belong to the final independent set ), as expected.
Table 1: The steps of the canonical path for the example of Figure 1, bag by bag. Column headings: before processing; after 1st step; after 2nd step; after 3rd step.
4.1 Analysis of the canonical paths
Each step of the canonical path changes the current independent set by inserting or deleting exactly one element of . Every vertex of is removed from the current independent set at some point, and is never re-inserted, while every vertex of is inserted into the current independent set once, and is never removed. Vertices in (respectively ) are never altered, and belong to all (respectively, none) of the independent sets in the canonical path. Therefore
Next we provide an upper bound for the number of vertices we need to remember at any particular step.
At any transition which occurs during the processing of bag , the set of remembered vertices satisfies , with unless . In this case , which gives , and for some .
By construction, the set of remembered vertices satisfies throughout the processing of bag . Hence . Now is a good path decomposition, and so , which implies that . Therefore, whenever we have .
Next suppose that . By definition, this means that , so the transition is an insertion step which inserts some vertex of . ∎
Now we establish the unique reconstruction property of the canonical paths, given the encoding and set of remembered vertices.
Given a transition , the encoding of and the set of remembered vertices, we can uniquely reconstruct with .
By construction, (4.4) holds. This identifies all vertices in and uniquely. It also identifies the connected components of , and it remains to decide, for all vertices in , whether they belong to or .
Next, the transition either inserts or deletes some vertex . This uniquely determines the connected component of which contains . We can use (4.2) to identify and for all . It remains to decide which vertices of belong to and which belong to .
Let be the lexicographically-least good path decomposition of , which is well-defined. If (insertion) then and we are processing the last bag which contains . If then and we are processing the first bag which contains . Hence we can uniquely identify the bag which is currently being processed. We know that bags have already been processed, and bags have not yet been processed. So (4.5) holds, which uniquely determines and outside .
Finally, for every vertex , there is a path in from to (the vertex which was inserted or deleted in the given transition). Since is bipartite and connected, and we have decided for all vertices outside whether they belong to or , it follows that we can uniquely reconstruct all of and . This completes the proof. ∎
We are now able to prove our main theorem, which is restated below.
For a given set , let denote the set of all subsets of with at most elements. Let be a given transition of the Glauber dynamics. To bound the congestion of the transition we must sum over all possible encodings and all possible sets of remembered vertices. Here is disjoint from and in almost all cases , by Lemma 4.1. In the exceptional case we have but we also know the identity of a vertex , since is the vertex inserted in the transition . Therefore in all cases, we only need to “guess” (choose) at most vertices for , from a subset of at most vertices.
To see this, note that
so long as . Then (2.1) gives
4.2 Graphs with large complete bipartite subgraphs
In Lemma 3.2 we observed that if a graph contains as an induced subgraph then its pathwidth is at least . Thus our argument does not guarantee rapid mixing for any graph which contains a large induced complete bipartite subgraph. In this section we show that the absence of large induced complete bipartite subgraphs appears to be a necessary condition for rapid mixing.
Suppose that the graph consists of disjoint induced copies of . So . The state space of independent sets in has . The mixing time for the Glauber dynamics on is clearly at least times the mixing time on .
Now consider the with vertex bipartition . The state space of independent sets of comprises two sets and . Now , , and . It follows that the conductance of the Glauber dynamics on is , and so . (See  for the definition of conductance.) If , where as , then , so the Glauber dynamics is not an FPAUS.
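The bottleneck structure is visible by enumeration. In a complete bipartite graph with $d$ vertices on each side, every independent set lies entirely within one side, and the two families of one-sided sets meet only in the empty set, giving $2^{d+1} - 1$ independent sets in total. The sketch below checks this for a small $d$; the vertex labels and function names are hypothetical.

```python
from itertools import combinations


def kdd_independent_sets(d):
    """All independent sets of the complete bipartite graph with d vertices per side."""
    left = [("L", i) for i in range(d)]
    right = [("R", i) for i in range(d)]
    edges = {frozenset((u, v)) for u in left for v in right}
    sets = []
    for k in range(2 * d + 1):
        for s in combinations(left + right, k):
            if all(frozenset(p) not in edges for p in combinations(s, 2)):
                sets.append(frozenset(s))
    return sets


def one_sided(s):
    """True if all vertices of s lie on the same side (the empty set counts)."""
    return len({side for side, _ in s}) <= 1


d = 3
sets = kdd_independent_sets(d)
left_only = [s for s in sets if all(side == "L" for side, _ in s)]
right_only = [s for s in sets if all(side == "R" for side, _ in s)]
shared = set(left_only) & set(right_only)
```

Since a single step of the Glauber dynamics changes one vertex, any trajectory from a nonempty left-sided set to a nonempty right-sided set must pass through the empty set, which is the only element of `shared`.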
Note that, if then the Glauber dynamics has quasipolynomial mixing time, from Theorem 1.1, whereas our lower bound remains polynomial. Our techniques are insufficient to distinguish between polynomial and quasipolynomial mixing times.
5 Recognisable subclasses of
Theorem 1.1 shows that the Glauber dynamics for independent sets is rapidly mixing for any graph in the class , where is a fixed positive integer. However, it is not clear a priori which graphs belong to , and the complexity of recognising membership in the class is unknown. Therefore, we consider here three (hereditary) classes of graphs which are determined by small excluded subgraphs. These classes clearly have polynomial time recognition, though we will not be concerned with the efficiency of this. Note that, in view of Section 4.2, we must always explicitly exclude large complete bipartite subgraphs, where this is not already implied by the other excluded subgraphs.
The three classes we will consider are nested. The third includes the second, which includes the first. However, we will obtain better bounds for pathwidth in the smaller classes, and hence better mixing time bounds in Theorem 1.1. Therefore we consider them separately. The first of these classes, claw-free graphs, was considered by Matthews  and forms the motivation for this work.
5.1 Claw-free graphs
Claw-free graphs exclude the following induced subgraph, the claw.
Claw-free graphs are important because they are a more simply characterised superclass of line graphs , and independent sets in line graphs are matchings.
For claw-free graphs, the key observation is as follows.
Let be a claw-free graph with independent sets . Then is a disjoint union of paths and cycles.
We know that is an induced bipartite subgraph of . Since is claw-free, any three neighbours of a given vertex must span at least one edge, which together with that vertex forms a triangle (3-cycle). But this is impossible, since is bipartite. Hence every vertex in has degree at most 2, completing the proof. ∎
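The degree bound in this proof can be tested exhaustively on a small claw-free graph. The sketch below checks claw-freeness directly from the definition and then computes the maximum degree in the subgraph induced by the symmetric difference of every pair of independent sets. The octahedron (the line graph of $K_4$) and the function names are choices made here for illustration.

```python
from itertools import combinations


def adjacency(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_claw_free(n, edges):
    """No vertex has three pairwise non-adjacent neighbours."""
    adj = adjacency(n, edges)
    for v in range(n):
        for a, b, c in combinations(sorted(adj[v]), 3):
            if b not in adj[a] and c not in adj[a] and c not in adj[b]:
                return False
    return True


def independent_sets(n, edges):
    adj = adjacency(n, edges)
    return [
        frozenset(s)
        for k in range(n + 1)
        for s in combinations(range(n), k)
        if all(v not in adj[u] for u, v in combinations(s, 2))
    ]


def max_degree_in_symmetric_difference(n, edges, first, second):
    """Maximum degree of the subgraph induced by the symmetric difference."""
    adj = adjacency(n, edges)
    diff = set(first) ^ set(second)
    return max((len(adj[v] & diff) for v in diff), default=0)


# Hypothetical claw-free example: the octahedron, with non-edges (0,1), (2,3), (4,5).
non_edges = {(0, 1), (2, 3), (4, 5)}
octahedron = [e for e in combinations(range(6), 2) if e not in non_edges]
worst = max(
    max_degree_in_symmetric_difference(6, octahedron, a, b)
    for a in independent_sets(6, octahedron)
    for b in independent_sets(6, octahedron)
)

# The claw itself: centre 0 with leaves 1, 2, 3.
claw = [(0, 1), (0, 2), (0, 3)]
claw_degree = max_degree_in_symmetric_difference(4, claw, {0}, {1, 2, 3})
```

For the claw, taking the centre as one independent set and the three leaves as the other gives a vertex of degree 3, which shows why the claw-free hypothesis is needed.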
Claw-free graphs are a proper subclass of .
Since is a union of paths and even cycles, the Jerrum–Sinclair  canonical paths for perfect matchings can be used, and the Markov chain has polynomial mixing time. This was the idea employed by Matthews . Theorem 1.1 generalises his result to .
However, claw-free graphs have more structure than an arbitrary graph in , and this structure was exploited for matchings in . Note that when is claw-free, we can compute the size of the largest independent set in in polynomial time , just as we can compute the size of the largest matching .
Here we will strengthen and extend the results of  to all claw-free graphs. Our main extension is that we show how to more directly sample almost uniformly from for arbitrary . Jerrum and Sinclair’s procedure is to estimate successively for , which is extremely cumbersome. However, we should add that their main objective is to estimate , rather than to sample.
Hamidoune  proved that in a claw-free graph , the numbers of independent sets of size in form a log-concave sequence. Chudnovsky and Seymour  showed that has real roots. If , let (), so . Clearly the polynomial also has real roots, as does the polynomial . Thus we can equivalently use the sequences , .
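Log-concavity of the independent set counts is easy to observe on a concrete claw-free graph. The sketch below counts independent sets of each size by brute force and tests the inequality $N_k^2 \ge N_{k-1} N_{k+1}$ for all interior indices. The 6-cycle and the function names are hypothetical choices.

```python
from itertools import combinations


def independent_set_counts(n, edges):
    """Return [N_0, ..., N_n], where N_k counts independent sets of size k."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0] * (n + 1)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                counts[k] += 1
    return counts


def is_log_concave(seq):
    """True if seq[k]**2 >= seq[k-1] * seq[k+1] for every interior index k."""
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1] for k in range(1, len(seq) - 1))


# Hypothetical claw-free example: the 6-cycle (every vertex has only two neighbours).
c6 = [(i, (i + 1) % 6) for i in range(6)]
c6_counts = independent_set_counts(6, c6)
c6_log_concave = is_log_concave(c6_counts)
```

A single example of course proves nothing; it only illustrates the shape of the sequence that the cited results guarantee.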
It follows from real-rootedness (see [8, Lemma 7.1.1]) that is a log-concave sequence. From this we have
We use this to strengthen an inequality deduced in  for log-concave functions.
For any and ,
Let . For , the inequality is an equality. Then
To complete the proof,