Counting small subgraphs in multi-layer networks

10/24/2017
by   Jessica Enright, et al.

Motivated by the prevalence of multi-layer network structures in biological and social systems, we investigate the problem of counting the number of occurrences of (small) subgraphs or motifs in multi-layer graphs in which each layer of the graph has useful structural properties. Making use of existing meta-theorems, we focus on the parameterised complexity of motif-counting problems, giving conditions on the layers of a graph that yield fixed-parameter tractable algorithms for motif-counting in the overall graph. We give a dichotomy showing that, under some restricting assumptions, either the problem of counting the number of motifs is fixed-parameter tractable, or the corresponding decision problem is already W[1]-hard.

1 Introduction

A multi-layer (or multiplex) network includes edges that may be qualitatively different, and describe different types of interaction: for example, different varieties of social interaction, or physical as compared to electronic contact [30]. The capacity of multi-layer networks to represent physical and social systems has made their study one of the leading areas of research in network science [30, 36, 13]. However, as yet only a few algorithmic results concerning well-studied graph problems have been adapted to exploit multi-layer inputs. Very recently, Bredereck et al. [4] obtained a detailed classification of the complexity of questions of the form “Is there a large set of vertices that induces a subgraph with some desired property in every layer?” in terms of the number of layers involved and the order of the desired subgraph. In this paper we aim to initiate a systematic investigation of the complexity of classic graph problems in a specific multi-layer setting, when restrictions are placed on the structural properties of each layer.

We have chosen to begin by addressing the problem of counting small subgraphs or motifs in a large graph, due to the large number of applications for special cases of this problem that have previously been reported in the literature, in settings ranging from network security tools [21, 38, 39] to livestock epidemiology [3] to the analysis of biological networks [34]. Typically, the goal is to compare two networks or to monitor the evolution of a network over time by considering the numbers of specific motifs in the network. In this work, our primary focus is on a natural generalisation of the basic motif counting problem, in which the input may include restrictions on the permitted locations for certain vertices of the motif.

There are two ways in which the presence of multi-layer inputs might impact on algorithmic questions. First of all, we may seek to solve problems in which the layers appear explicitly in the statement of the problem, as in [4]. In the motif-counting setting we might, for example, require certain edges of the motif to belong to specific layers; the algorithms we describe in this paper can easily be adapted to solve layered subgraph counting problems of this kind. However, our focus is the more subtle question of if, and how, we can exploit knowledge about the structural properties of individual layers to solve problems in the “flattened” graph obtained by combining all the layers into one. Specifically, we address the following question.

Given graph classes $\mathcal{C}_1, \ldots, \mathcal{C}_\ell$ which are monotone (closed under taking subgraphs) and on which the generalised motif counting problem (parameterised by the number of vertices in the motif) belongs to FPT, under what conditions on $\mathcal{C}_1, \ldots, \mathcal{C}_\ell$ is the problem in FPT on multi-layer graphs composed of one layer from each of $\mathcal{C}_1, \ldots, \mathcal{C}_\ell$?

We provide a complete answer to this question; moreover, we show that, in the cases in which the motif counting problem is intractable, even the decision version of the problem (“Does the graph contain at least one copy of the motif?”) is already W[1]-hard.

The rest of the paper is organised as follows. We begin in Sec. 1.1 by describing a motivating example, before summarising the key notation and definitions we will use in Sec. 1.2 and giving an overview of related work in Sec. 1.3. In Sec. 2 we prove our positive results, and discuss their applicability to a selection of real-world multi-layer inputs, before proving our hardness results in Sec. 3. We explore the relationship between the basic and generalised versions of the motif counting problem in Sec. 4.

1.1 A motivating example

Alongside the broad appeal of the topic described above, our initial exploration of the multi-layer approach was inspired by a specific example from the agriculture industry. Agricultural systems provide a data-rich example of a multi-layered graph: for example, in a graph in which livestock-raising farms are the vertices, edges may be formed by the fundamentally planar layer derived from physical geographic farm adjacency, by long-distance trades of livestock, by shared machinery and personnel, etc. These graphs may be important for understanding the resilience of an industry to changes in the physical world, or to disease incursions [5].

In Great Britain (as elsewhere in the EU), cattle and sheep trading between farms and markets is recorded and reported to a central repository, as are geographic locations and adjacencies of farms [28]. There is significant evidence that both the long-distance animal trades and local geographic spread contribute to livestock disease in Britain, including the serious and economically-damaging 2001 outbreak of foot-and-mouth disease [25, 29]. Modelling these two types of contacts separately is a key feature of many successful models of livestock disease, including models used to understand and control foot-and-mouth disease [25, 22, 27, 29], bovine tuberculosis [6], blue-tongue virus [41], and the emerging Schmallenberg virus [40].

When considering these two main layers of the livestock contact system in Britain, it becomes immediately clear that they are very different, but that both have potentially useful characteristics. The geographically-local contact graph will necessarily be planar, and will have limited degree due to physical constraints: realistically-shaped pastures and farms can only neighbour a limited number of other farms, and cannot physically neighbour farms that are geographically far away. The long-range trading network depends on a relatively small number of markets that intermediate the majority of trades [28]. If we consider both agricultural holdings and markets as vertices in the trading graph with animal movements as edges, then we would expect the trading graph to have a small number of high-degree vertices: this is common for trading or contact networks, which often have power-law degree distributions [35].

1.2 Notation and definitions

We begin by giving formal descriptions of the subgraph counting problems we consider. An embedding of a graph $H$ into a graph $G$ is a mapping $\theta$ from $V(H)$ to $V(G)$ such that, whenever $uv$ is an edge in $H$, we have that $\theta(u)\theta(v)$ is an edge in $G$. The basic motif counting problem is formalised as follows.

-#Emb
Input: Two graphs $G$ and $H$.
Parameter: $k = |V(H)|$.
Question: How many embeddings are there of $H$ into $G$?

We refer to $G$ as the host graph and $H$ as the pattern graph. Our main focus is in fact on the following generalisation of -#Emb, in which each vertex of $H$ must map to a specific subset of $V(G)$ (where these subsets are not necessarily disjoint for distinct vertices of $H$).

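-#List-Emb
Input: Graphs $G$ and $H$, where $V(H) = \{v_1, \ldots, v_k\}$, and subsets $S_1, \ldots, S_k \subseteq V(G)$.
Parameter: $k$.
Question: How many embeddings $\theta$ of $H$ into $G$ have the property that $\theta(v_i) \in S_i$ for each $1 \leq i \leq k$?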

Note that -#Emb can be regarded as a special case of -#List-Emb in which every vertex of the pattern may map to any vertex of the host graph, that is, each of the given subsets is equal to $V(G)$; we investigate the relationship between the two problems in more detail in Section 4. We will also consider the corresponding decision problems, -Emb and -List-Emb, which involve determining whether the answer to -#Emb (respectively -#List-Emb) is non-zero.
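To make the two counting problems concrete, here is a minimal brute-force sketch in Python. It assumes, as is standard for embeddings, that the mapping is injective; the function name, the edge-list representation and the numbering of pattern vertices from 0 are illustrative choices, not notation from the paper. It runs in time roughly $n^k$ and is meant only as a reference definition, not as an efficient algorithm.

```python
from itertools import permutations

def count_list_embeddings(host_edges, pattern_edges, lists):
    """Count injective, edge-preserving maps from the pattern into the host.

    host_edges    -- iterable of 2-tuples of host vertices
    pattern_edges -- iterable of 2-tuples over pattern vertices 0..k-1
    lists         -- lists[i] is the set of host vertices allowed as the
                     image of pattern vertex i
    """
    adj = {frozenset(e) for e in host_edges}
    k = len(lists)
    # Only host vertices that appear in some list can ever be used.
    candidates = sorted(set().union(*lists)) if lists else []
    total = 0
    for image in permutations(candidates, k):  # injective by construction
        if any(image[i] not in lists[i] for i in range(k)):
            continue
        if all(frozenset((image[u], image[v])) in adj for u, v in pattern_edges):
            total += 1
    return total
```

Setting every list to the full vertex set of the host recovers the basic counting problem; for example, a single-edge pattern has six embeddings into a triangle, one for each ordered pair of adjacent vertices.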

We are interested in determining the circumstances under which any of these problems admits an FPT algorithm, that is, an algorithm running in time $f(k) \cdot n^{c}$ where $n$ is the total input size, $k$ is the parameter, $f$ is any (computable) function, and $c$ is a fixed constant that does not depend on $k$. In order to demonstrate that a problem is unlikely to admit an FPT algorithm, it suffices to demonstrate that it is complete for the complexity class W[1]. For further background on the theory of parameterised complexity we refer the reader to [14, 18].

Note that the “hardest” of the problems introduced above is -#List-Emb, and the “easiest” is -Emb: the existence of an FPT algorithm for -#List-Emb, when restricted to host graphs from the class $\mathcal{C}$, implies the existence of such an algorithm for the other three problems in the same setting, whereas if any of the four problems defined above admits an FPT algorithm when restricted to host graphs from $\mathcal{C}$ then there must be an FPT algorithm for -Emb under the same restriction. Thus, when proving tractability, the strongest result is to demonstrate the existence of an FPT algorithm for -#List-Emb, whereas the strongest hardness result is one for -Emb.

We now introduce the key graph theoretic terminology we will use. Given any graph $G$, and a vertex $v \in V(G)$, we write $d_G(v)$ for the degree of $v$ in $G$. A graph $G$ is $r$-regular if every vertex in $G$ has degree exactly $r$. Given a subset $U \subseteq V(G)$, we write $G[U]$ for the subgraph of $G$ induced by $U$, and $G \setminus U$ for the subgraph obtained from $G$ by deleting all elements of $U$. If $u, v \in V(G)$, the distance between $u$ and $v$ in $G$ is the number of edges on a shortest path between $u$ and $v$ in $G$. A star is a graph isomorphic to the complete bipartite graph $K_{1,t}$ for some $t$. A star forest is an acyclic graph in which every connected component is a star.

A graph class is said to be monotone if it is closed under the deletion of both vertices and edges.

A set $U \subseteq V(G)$ is a vertex cover for $G$ if $V(G) \setminus U$ is an independent set; the vertex cover number of $G$ is the cardinality of the smallest vertex cover for $G$. We say that a class $\mathcal{C}$ of graphs has bounded vertex cover number if there exists a constant $c$ such that every graph in $\mathcal{C}$ has vertex cover number at most $c$.

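We say that a class $\mathcal{C}$ of graphs has almost bounded degree if there exists some constant $c$ such that every element $G$ of $\mathcal{C}$ contains a set $U \subseteq V(G)$, with $|U| \leq c$, such that $G \setminus U$ has maximum degree at most $c$. Equivalently, $\mathcal{C}$ has almost bounded degree if there is a constant $c'$ such that every element of $\mathcal{C}$ has at most $c'$ vertices of degree greater than $c'$.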
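The second formulation of almost bounded degree is easy to test on a single graph for a given constant. The sketch below is illustrative only: the function names are hypothetical, the graph is assumed to be simple and given as an edge list, and the property itself is a statement about a whole class of graphs, so a check on one graph for one value of the constant is merely a practical diagnostic.

```python
def high_degree_vertices(edges, c):
    """Return the set of vertices whose degree is greater than c.

    Assumes a simple graph: each edge appears once in `edges`.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {v for v, d in degree.items() if d > c}

def has_few_high_degree_vertices(edges, c):
    """True if at most c vertices have degree greater than c."""
    return len(high_degree_vertices(edges, c)) <= c
```

A star with five leaves passes the check with the constant set to 1 (only the centre has degree above 1), whereas a path on four vertices does not, because it has two vertices of degree 2.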

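Given two graph classes $\mathcal{C}_1$ and $\mathcal{C}_2$, we write $\mathcal{C}_1 \oplus \mathcal{C}_2$ to denote the class of graphs of the form $(V, E_1 \cup E_2)$ where $(V, E_1) \in \mathcal{C}_1$ and $(V, E_2) \in \mathcal{C}_2$ (we will assume that an explicit partition of the edges is given for graphs belonging to $\mathcal{C}_1 \oplus \mathcal{C}_2$). For $\ell \geq 2$, we write $\mathcal{C}_1 \oplus \cdots \oplus \mathcal{C}_\ell$ for the class of graphs of the form $(V, E_1 \cup \cdots \cup E_\ell)$, where $(V, E_i) \in \mathcal{C}_i$ for each $1 \leq i \leq \ell$.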

Note that if $\mathcal{C}_1, \ldots, \mathcal{C}_\ell$ have bounded degree (respectively bounded vertex cover number), then so does $\mathcal{C}_1 \oplus \cdots \oplus \mathcal{C}_\ell$. However, the same cannot be said for some more complex graph parameters: for example, if $\mathcal{C}$ is the class of acyclic graphs (which have treewidth at most $1$), $\mathcal{C} \oplus \mathcal{C}$ contains all grids (as a grid can be obtained by combining two paths) and hence has unbounded treewidth.
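The grid example can be checked directly on small cases. The sketch below uses a slightly different decomposition from the one mentioned in the text: it splits the edges of a grid into its horizontal edges and its vertical edges, each of which forms a disjoint union of paths and is therefore acyclic. The function names and the coordinate-pair vertex labels are illustrative assumptions.

```python
def grid_layers(rows, cols):
    """Split the edges of a rows x cols grid into two acyclic layers."""
    horizontal = [((r, c), (r, c + 1)) for r in range(rows) for c in range(cols - 1)]
    vertical = [((r, c), (r + 1, c)) for r in range(rows - 1) for c in range(cols)]
    return horizontal, vertical

def is_forest(vertices, edges):
    """Union-find cycle test: True if the edges contain no cycle."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True
```

For a 3 x 3 grid both layers pass the forest test, while their union (the grid itself) does not.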

1.3 Related work

There is a rich literature concerning the (parameterised) complexity of finding and counting specific small pattern graphs in a large host graph. Several of the problems introduced in the seminal paper by Flum and Grohe on parameterised counting complexity [17] are of this form, and very recently Curticapean, Dell and Marx [10] gave a dichotomy for the parameterised complexity of counting so-called network motif parameters, based on the structure of the motifs under consideration.

In this paper we focus on structural restrictions on the large host graph, while allowing arbitrary (small) motifs; the idea is to exploit the structure that is often present in real-world networks or the layers thereof. The most general results of this kind are corollaries to two celebrated meta-theorems on the complexity of counting problems in restricted classes of graphs. Note that -#List-Emb  can easily be expressed in first-order logic (and hence also in monadic second-order logic). We can therefore deduce the following results.

Theorem 1.1 (Follows from [19]).

-#List-Emb is in FPT when restricted to any class of graphs of bounded local treewidth.

Theorem 1.2 (Follows from [9]).

-#List-Emb is in FPT when restricted to any class of graphs of bounded cliquewidth.

The class of graphs of bounded local treewidth includes, among others, the classes of graphs of bounded treewidth, bounded genus, and bounded degree. We refer the reader to [20] for the formal definition of local treewidth. Our hardness results in Sec. 3 focus on monotone graph classes: this family of classes includes the class of graphs of bounded local treewidth (and the specific sub-classes mentioned above), but not the class of graphs of bounded cliquewidth (which is closed under the deletion of vertices but not edges).

2 Tractable cases for counting

In this section we identify some situations in which it is straightforward to demonstrate that -#List-Emb belongs to FPT; we then go on to discuss the applicability of these positive results to the motif-counting problem in some real-world networks. We begin by showing that, if -#List-Emb is in FPT when restricted to graphs from some class $\mathcal{C}$, we can still solve the problem efficiently on any graph obtained from an element of $\mathcal{C}$ by adding a constant number of layers each of which has bounded vertex cover number.

Theorem 2.1.

Suppose that, when the host graph belongs to the class , -#List-Emb  can be solved in time for some fixed constant and a computable function , where and are the numbers of vertices in the pattern and host graphs respectively. For some fixed constant , let be classes of graphs of bounded vertex cover number. Then, when restricted to host graphs from , -#List-Emb  can be solved in time for an explicit computable function .

Proof.

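Note first that if $c$ is the maximum vertex cover number of any graph in $\mathcal{C}_1 \cup \cdots \cup \mathcal{C}_\ell$, then the vertex cover number of any element of $\mathcal{C}_1 \oplus \cdots \oplus \mathcal{C}_\ell$ is at most $\ell c$ (the union of vertex covers for the individual layers is a vertex cover for the combined graph), and hence is bounded by a constant. Thus it suffices to prove the result in the case that $\ell = 1$.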

Suppose that the input to -#List-Emb  is , where with and . We will assume that has a vertex cover , where . Then there are at most possibilities for which vertices of map to elements of and the mapping restricted to this subset; we will consider each such possibility in turn. Note that for each of these possible partial mappings we can determine in time whether it does indeed define a partial embedding of a subgraph of into such that each vertex in the domain maps to an element of .

Suppose we have fixed a set and an embedding of into . Assume without loss of generality that are the elements of . For each , define to be the set of vertices in whose neighbourhood contains the set ; we can compute each in time . It is then clear that the number of ways to extend to an embedding of into such that for all is precisely equal to the number of embeddings of into such that for each . Note that such embeddings, as they do not use vertices of , cannot use any edges of , so we can equivalently consider the number of embeddings into ; moreover, as none of the sets intersects , this quantity is the same whether we consider embeddings into or into .

Thus it suffices to solve at most instances of -#List-Emb  in which the host graph belongs to ; as we are assuming that we can solve instances of -#List-Emb  where the host graph comes from and the pattern graph has order , it follows that we can solve -#List-Emb  on host graphs from in time , as required. ∎
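The strategy of the proof can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: embeddings are taken to be injective, pattern vertices are numbered from 0, all function and parameter names are hypothetical, and `brute_count` is a brute-force stand-in for the assumed efficient algorithm on the base class. The outer loops guess which pattern vertices map into the vertex cover and where; the lists of the remaining pattern vertices are then shrunk to host vertices outside the cover that are adjacent to the images of their already-mapped neighbours.

```python
from itertools import combinations, permutations

def brute_count(adj, pattern_edges, lists):
    """Stand-in oracle: count injective edge-preserving maps by brute force."""
    k = len(lists)
    pool = sorted(set().union(*lists)) if lists else []
    total = 0
    for image in permutations(pool, k):
        if all(image[i] in lists[i] for i in range(k)) and all(
            frozenset((image[u], image[v])) in adj for u, v in pattern_edges
        ):
            total += 1
    return total

def count_with_cover(base_edges, cover_layer_edges, cover, pattern_edges, lists):
    """Count list embeddings into the union of a base layer and a layer
    whose edges are all covered by the vertex set `cover`."""
    base_adj = {frozenset(e) for e in base_edges}
    all_adj = base_adj | {frozenset(e) for e in cover_layer_edges}
    cover = set(cover)
    k = len(lists)
    total = 0
    for size in range(min(k, len(cover)) + 1):
        for chosen in combinations(range(k), size):        # pattern vertices sent into the cover
            for image in permutations(sorted(cover), size):  # and where they are sent
                phi = dict(zip(chosen, image))
                if any(phi[i] not in lists[i] for i in chosen):
                    continue
                if any(u in phi and v in phi
                       and frozenset((phi[u], phi[v])) not in all_adj
                       for u, v in pattern_edges):
                    continue
                rest = [i for i in range(k) if i not in phi]
                index = {i: j for j, i in enumerate(rest)}
                new_lists = []
                for i in rest:
                    allowed = set(lists[i]) - cover
                    for u, v in pattern_edges:
                        if u == i and v in phi:
                            other = phi[v]
                        elif v == i and u in phi:
                            other = phi[u]
                        else:
                            continue
                        allowed = {w for w in allowed if frozenset((w, other)) in all_adj}
                    new_lists.append(allowed)
                rest_edges = [(index[u], index[v]) for u, v in pattern_edges
                              if u in index and v in index]
                # No list meets the cover, so only base-layer edges can be used.
                total += brute_count(base_adj, rest_edges, new_lists)
    return total
```

Each embedding is counted once, because it determines both the set of pattern vertices mapped into the cover and their images. The oracle is always called on the base layer alone, mirroring the observation in the proof that the restricted lists avoid the cover.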

We obtain the following immediate corollary, by observing that any graph of almost bounded degree can be decomposed into a graph of bounded vertex cover number and a graph of bounded degree (and recalling that -#List-Emb belongs to FPT when restricted to the class of graphs of bounded degree, by Theorem 1.1).

Corollary 2.2.

Let $\mathcal{C}$ be a class of graphs of almost bounded degree. Then, when restricted to host graphs from $\mathcal{C}$, -#List-Emb is in FPT.

Finally, we observe that if each layer has almost bounded degree then the resulting graph also has almost bounded degree, and hence -#List-Emb is in FPT whenever each layer has almost bounded degree.

Corollary 2.3.

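Let $\ell$ be a fixed constant, and suppose that $\mathcal{C}_1, \ldots, \mathcal{C}_\ell$ are classes of almost bounded degree. Then, when restricted to host graphs from $\mathcal{C}_1 \oplus \cdots \oplus \mathcal{C}_\ell$, -#List-Emb is in FPT.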

2.1 Application to real datasets

We begin by investigating the applicability of these positive results to the agricultural application described in Sec. 1.1, using data from the cattle-trading industry of Scotland. Including both beef and dairy farms, the industry is composed of approximately 12,000 active farms, and includes approximately 1.8 million animals [1]. As noted above, the contacts between farms in Scotland can be categorised into fundamentally different layers, inducing graphs of different classes. Building graphs with farms as vertices and contacts as edges, we focus on two types of contacts: geographic farm adjacency, and animal trading. In order to be able to apply Corollary 2.3 to this two-layer network, we need both layers to have almost bounded degree.

We compute a series of cattle trading graphs from trades of cattle within Scotland. For each month of 2013, we use a graph in which active farms, markets, showgrounds, etc. are vertices, and two vertices are adjacent if there has been a trade of at least one animal between them in the month. From each of these graphs, we compute an iterated series of graphs of decreasing maximum degree by greedily removing the highest degree vertex at each step (with ties broken arbitrarily), record the maximum degree in the resulting graph at each step, and plot the resulting curves in Fig. 1.
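The greedy procedure described above can be sketched as follows. The function name and the edge-list input are illustrative assumptions, ties are broken by whichever maximum-degree vertex Python's `max` encounters first, and vertices that are isolated from the start are ignored because they never affect the maximum degree.

```python
def greedy_max_degree_curve(edges, steps):
    """Return [d_0, d_1, ...], where d_i is the maximum degree after the
    i highest-degree vertices have been greedily removed (at most `steps`)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    curve = []
    for _ in range(steps + 1):
        if not adj:
            curve.append(0)  # nothing left: maximum degree is zero
            break
        top = max(adj, key=lambda v: len(adj[v]))
        curve.append(len(adj[top]))
        # Remove the chosen vertex and detach it from its neighbours.
        for w in adj.pop(top):
            adj[w].discard(top)
    return curve
```

Plotting the returned list against its index gives a curve of the kind shown in Fig. 1; a sharp early drop indicates that deleting a few vertices leaves a graph of small maximum degree.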

Figure 1: Plots of the maximum degree in graphs derived from the Scottish cattle trading data by greedily removing high-degree vertices in the overall industry (left), the beef industry (middle), and the dairy industry (right). Each line is for the iterated graphs derived from one month of cattle movements in 2013. The horizontal dotted black lines are at maximum degree 10.

The removal of a relatively modest number of high-degree vertices decreases the maximum degree in the cattle graphs dramatically, suggesting the feasibility of an approach based on Corollary 2.3 on graphs derived from Scottish cattle trading data. This effect is visible when considering the industry as a whole, or when restricting our attention to the beef or dairy industries. As expected, the majority of the high-degree vertices removed to produce the plots in Figure 1 are markets or showgrounds. This observation, combined with the fact that the proportion of movements via markets has increased and the absolute number of markets has decreased over time [37], suggests that additional farms (vertices, here) would not significantly increase the number of vertices that must be deleted to achieve small degree.

The geometric layer of edges derived from physical farm adjacency could reasonably be expected to be planar and low-degree. This is not quite true in the graph derived from available data on farm adjacency: there are a small number of high-degree vertices and occasional non-planarity. Upon inspection, these anomalies are largely due to unusual farm records: occasionally large numbers of non-contiguous fields are registered to the same farm. One could reasonably excise, split, or otherwise adjust these vertices: while the maximum degree is 73, removing only 23 of over 12,000 vertices decreases the maximum degree to 15. The mean degree is well below the mean degree of the edge contact graph of a Voronoi cell diagram derived from the centroids of the farms. When counting embeddings in a geometric layer of almost-bounded degree we can exploit the geometric embedding to give a more efficient algorithm, and we give an example of such an approach in Section 2.2.

Figure 2: A plot of the degree distribution (left) of a graph derived from a Facebook dataset [32], and the maximum degree in graphs derived by greedily removing high-degree vertices (right).

Concerning the applicability of our results more generally, we might expect, due to the preferential-attachment properties displayed by many data-derived graphs and networks, that it would be common for data-derived graphs to be of almost-bounded degree. To investigate this intuition, we plotted the degree distributions and iterative high-degree removal figures for several networks from the SNAP network repository [31] (a sample shown in Figure 2). We find that all of the networks we investigated show a relatively steep decline in maximum degree with greedy high-degree vertex removal. We suggest that figures of this sort might be useful in practice when determining how many high-degree vertices to include in the set that requires more intense calculation.

Exactly how steep the decline in maximum degree must be for our approach to be useful will depend on the precise implementation of the parts of the algorithm dealing with both the high degree vertices and the remaining bounded-degree graph; we leave a thorough investigation of these considerations for future work.

2.2 An algorithm to count motifs in geometrically embedded graphs

In this section we describe an algorithm to solve -#List-Emb on geometrically-embedded graphs with limited local vertex density and limited edge length. Note that such graphs will have maximum degree bounded by a function of the local vertex density and edge length, so inclusion in FPT follows from Theorem 1.1, but we are able to improve somewhat on the running time by exploiting the richer structure in this setting.

First, we define more precisely our notion of density. For technical reasons, we will be interested in the density of vertices in semi-circular areas in which the straight-line segment of the semi-circle is vertical: we call such a semi-circle a vertical semi-circle. Given an arrangement of a graph in the plane, the density of a vertical semi-circle is the number of vertices contained within it divided by the area of the vertical semi-circle. The maximum vertical semi-circular density of a graph arranged in the plane is the maximum such density over all vertical semi-circles.

Lemma 2.4.

If is a graph on vertices with an arrangement in the plane such that the longest edge is of geometric length , and maximum vertical semi-circular density , then there is an algorithm in time to count the embeddings of in .

Proof.

Let $G$ be a graph with an arrangement in the plane such that the longest edge is of length $\lambda$, and with maximum vertical semi-circular density $\rho$.

We will use an approach of scanning vertical semi-circular “windows” of the arrangement of $G$, looking for embeddings of the motif $H$. We first define a polynomial number of vertical semi-circles, each with its vertical line segment centered on a vertex of the graph, and then argue that we need only search within each semi-circle for embeddings of $H$.

If $d$ is the diameter of $H$ and $\lambda$ is the length of the longest edge in the arrangement, then for each vertex $v$ in $G$, consider the vertical semicircle of radius $d\lambda$ centered at $v$ and extending to its right. Let $W_v$ be the set of vertices that are contained in that vertical semicircle, including $v$ and those on the perimeter, except for those exactly vertically above $v$. We denote the set of all such vertex sets as $\mathcal{W}$.

We say that an embedding occurs in a vertex set $W_v$ if it includes a mapping to $v$, and all mappings are to vertices in $W_v$. We now argue that each embedding of $H$ occurs in exactly one vertex set in $\mathcal{W}$: the key idea is that every embedding of the motif in the arranged graph must have an uppermost leftmost vertex, and the embedding will occur in exactly the window anchored at that vertex.

Firstly, because the motif $H$ has diameter $d$, and each edge in the arrangement is of length at most $\lambda$, any embedding of $H$ with vertex $v$ as its uppermost leftmost vertex will fall entirely within the semi-circular window used to produce $W_v$: the geometrically farthest vertex will be at most $d\lambda$ away from $v$, and can be neither to the left of $v$ nor vertically above it. Because we produced a vertex set $W_v$ for every vertex $v$, and every copy of the motif must have exactly one uppermost leftmost vertex, each copy will fall in at least one of our vertex sets.

On the other hand, because every embedding of a motif in the arranged graph has a unique uppermost leftmost vertex $v$, it occurs in at most one of the anchored windows in $\mathcal{W}$, specifically in $W_v$.

Given a vertex set and a motif , we can exhaustively check for each vertex for all copies of in which is mapped to in time .

We can bound by the product of the maximum vertical semi-circular density of the graph arrangement and the size of the vertical semi-circles used to search for embeddings of . We know the area of the vertical semicircle producing is , therefore , and the running time is , or, as , we can express this as . Because we must perform this search for each of anchored vertical semi-circular windows, this approach gives an overall running time of . In some applications, a better bound might be obtained by considering densities appropriate for the size of semicircles in use: these densities will be upper bounded by , but might sometimes be significantly smaller. ∎
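The windowing idea can be sketched as follows for the basic counting problem (without lists). This is a simplified illustration, not the authors' algorithm: all names are hypothetical, embeddings are assumed to be injective, the pattern is assumed to be connected, host vertices are assumed to have distinct positions, and the search inside each window is plain brute force. Instead of fixing the image of a pattern vertex at the anchor, the sketch counts embeddings into the window with and without the anchor and takes the difference, which isolates exactly the embeddings that use the anchor.

```python
from itertools import permutations
from math import dist

def count_embeddings(adj, vertices, pattern_edges, k):
    """Brute-force count of injective edge-preserving maps into `vertices`."""
    return sum(
        all(frozenset((image[u], image[v])) in adj for u, v in pattern_edges)
        for image in permutations(vertices, k)
    )

def pattern_diameter(pattern_edges, k):
    """Diameter of a connected pattern on vertices 0..k-1, via BFS."""
    nbrs = {i: set() for i in range(k)}
    for u, v in pattern_edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    best = 0
    for source in range(k):
        seen = {source: 0}
        frontier = [source]
        while frontier:
            nxt = []
            for u in frontier:
                for w in nbrs[u]:
                    if w not in seen:
                        seen[w] = seen[u] + 1
                        nxt.append(w)
            frontier = nxt
        best = max(best, max(seen.values()))
    return best

def count_by_windows(pos, host_edges, pattern_edges, k):
    """Count embeddings by anchoring each one at its leftmost image
    (ties broken in favour of the uppermost vertex)."""
    adj = {frozenset(e) for e in host_edges}
    longest = max(dist(pos[u], pos[v]) for u, v in host_edges)
    radius = pattern_diameter(pattern_edges, k) * longest + 1e-9  # float tolerance

    def key(v):
        return (pos[v][0], -pos[v][1])  # leftmost first, then uppermost

    total = 0
    for a in pos:
        # Vertices to the right of (or directly below) the anchor, within range.
        window = [w for w in pos
                  if key(w) > key(a) and dist(pos[a], pos[w]) <= radius]
        with_anchor = count_embeddings(adj, window + [a], pattern_edges, k)
        without_anchor = count_embeddings(adj, window, pattern_edges, k)
        total += with_anchor - without_anchor
    return total
```

On a 3 x 3 unit grid, a path on three vertices has 44 embeddings, and the windowed count agrees with a brute-force count over the whole graph.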

3 Hard cases for decision

In contrast with the tractability results above we prove that, for many graph classes, if the conditions given above for the existence of an FPT algorithm for -#List-Emb  are not met, then in fact the corresponding decision problem is hard. Specifically, in this section we prove the following result.

Theorem 3.1.

Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be recursively enumerable monotone graph classes of unbounded vertex cover number, and suppose further that $\mathcal{C}_1$ does not have almost bounded degree. Then -Emb (and hence -List-Emb) is W[1]-complete when restricted to host graphs from $\mathcal{C}_1 \oplus \mathcal{C}_2$.

The proof of Theorem 3.1 relies heavily on the following result.

Theorem 3.2.

Let $\mathcal{C}_1$ be the class of star forests and $\mathcal{C}_2$ the class of 1-regular graphs. Then -Emb is W[1]-hard even if the host graph is restricted to $\mathcal{C}_1 \oplus \mathcal{C}_2$.

We give a reduction from the following problem, shown to be W[1]-complete in [16].

-Multicolour Clique
Input: A graph $G$ and a partition of $V(G)$ into sets $V_1, \ldots, V_k$.
Parameter: $k$.
Question: Does $G$ contain a clique with exactly one vertex in each set $V_i$?
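For reference, the problem can be decided by brute force as in the sketch below, which tries every choice of one vertex per colour class. The function name and input representation are illustrative assumptions, and the running time grows as the product of the class sizes, so this serves only to pin down the problem statement.

```python
from itertools import product

def has_multicolour_clique(edges, colour_classes):
    """True if some choice of one vertex per colour class induces a clique."""
    adj = {frozenset(e) for e in edges}
    for choice in product(*colour_classes):
        if all(
            frozenset((choice[i], choice[j])) in adj
            for i in range(len(choice))
            for j in range(i + 1, len(choice))
        ):
            return True
    return False
```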

Let be the input to an instance of -Multicolour Clique. We build on a strategy which has previously been used in several contexts [8, 7, 11, 12, 15, 24, 23, 33] to encode a -clique with a grid. We first construct two graphs and so that there is a restricted embedding of into if and only if contains a multicolour clique; we then show how to decorate and to obtain graphs and respectively so that there is an embedding of into if and only if there is an embedding of into which satisfies the restrictions. Finally, we will demonstrate that the edges of can be partitioned into two sets and so that is a star forest and has maximum degree one.

We begin by defining and . To help do so, we fix an ordered list of all unordered pairs of distinct elements of the set , and write for the element in this list.

is now defined as follows. consists of paths, each on vertices, with some additional edges: for each , if , there is an edge between the vertices on the and path. Notice that this means that, between any two of the paths, there is precisely one edge. Note that . The structure of is illustrated in Figure 3. It will be useful in the arguments that follow to refer to certain distinguished vertices of : we will refer to the vertex of the path as .

Figure 3: The construction of .

We now define . The vertices of can be partitioned into sets, which we denote for and . Each set contains two kinds of vertices. For each vertex , contains a vertex ; we call such vertices anchor vertices. Additionally, for each pair such that and is incident with , contains a path on five vertices (denoted ); we call the vertices of these paths path vertices.

For each path , we have an edge from to , and for we also have an edge from to . Finally, for each , if , we have an edge from to whenever , and . The construction of is illustrated in Figure 4. Note that

Figure 4: An example of the construction of (bottom) from (top). A subgraph of corresponding to the clique induced by , and is highlighted.

We now argue that and have the desired properties; note that Lemma 3.3 alone demonstrates that -List-Emb  is W[1]-hard when the host graph is restricted to .

Lemma 3.3.

There is an embedding of into such that is an anchor vertex in for all and if and only if contains a multicolour clique.

Proof.

Suppose first that contains a multicolour clique; suppose that this clique has vertices where has colour . For each , let be an arbitrarily chosen edge incident with (notice that such an edge must exist since the vertices induce a clique). It is straightforward to see that there is an embedding from to the vertex set

such that, for each and , , which is an anchor vertex in .

Conversely, suppose there is an embedding of into such that is an anchor vertex in for each and . We define a mapping by setting to be the unique vertex such that . We now set

We begin by arguing that contains precisely one vertex from each colour class . It is clear that contains at least one vertex from each colour class, as for each . We now claim that, for each , . Suppose, for a contradiction, that this is not true for some . Then there exists such that ; fix the smallest for which this is true. Note that the distance in from to is six; however, the only element of at distance six from is , so if then this contradicts the fact that is an embedding of into .

We now argue that induces a clique in ; it suffices to show that every pair of vertices in is adjacent. Fix , and suppose that . Set and . Let be the unique vertex of at distance three from both and , and the unique vertex of at distance three from both and . Note that must be of the form for some edge , and of the form for some . Since , it follows from the definition of that is incident with and is incident with .

Notice that and are adjacent in , so and must be adjacent in . By definition, this edge is only present if in fact . Hence this edge is incident with both and , as required. ∎

We now show how to decorate and so that we can omit the restrictions on where certain vertices are mapped in the embedding.

Observe that the host graph constructed above is bipartite, and hence does not contain any cycles of odd length. The idea is to attach odd-length cycles of suitably chosen lengths to certain vertices of both the motif graph and the host graph so that specific vertices of the new motif graph can only map to restricted subsets of the new host graph.

Specifically, we define and as follows. We obtain from by, for each and , adding a cycle of length which contains and new vertices. Similarly, we obtain from by, for each and , adding a cycle of length which contains and new vertices. The construction of and is illustrated in Figure 5.

Figure 5: The construction of (top) and (bottom).

Lemma 3.4.

There is an embedding of into if and only if contains a multicolour clique.

Proof.

By Lemma 3.3, it suffices to show that there is an embedding of into if and only if there is an embedding of into such that is an anchor vertex in for all and .

Suppose first that there is an embedding of into such that is an anchor vertex in for all and . We define an embedding of into by extending as follows: for any vertex which belongs to a cycle containing , we define to be the corresponding vertex on the cycle in that includes . It is immediate from the construction of that such a cycle, of the correct length, exists.

Conversely, suppose that there is an embedding of into . Observe that belongs to a cycle of length , and has degree at least four. Since is bipartite, the only odd length cycles in are those added in the construction of and in particular the only cycles in of length are those that contain an anchor vertex in . Moreover, the only vertices belonging to such cycles that have degree greater than two are precisely the anchor vertices in . Thus it must be that is an anchor vertex of for each and .

It remains to check that for all . This can only be false if includes a vertex from the cycle incident with some anchor vertex not in ; as is connected, we would also have to have the corresponding anchor vertex in . However, the distance between any two anchor vertices in , or between an anchor vertex of and one of , is at least six, but no vertex in is at distance more than three from some vertex , so no other vertex of can be mapped to an anchor vertex.

We therefore see that the restriction of to is an embedding of into such that is an anchor vertex of for each and , completing the proof. ∎

Finally, it remains to show that we can decompose the edge-set of into two sets with the required properties.

Lemma 3.5.

There exist two sets of edges and such that , is a forest and has maximum degree one.

Proof.

We begin by defining our edge partition. The set contains the following edges:

  • all edges with one endpoint in and the other in where ;

  • the edges and for each ;

  • for each cycle in , every edge with an even index when the edges of the cycle are numbered consecutively and an edge incident with the vertex belonging to is numbered one.

All remaining edges are assigned to . This partition of the edges is illustrated in Figure 6. It is straightforward to verify that and have the desired properties. ∎

Figure 6: The partition of the edge-set of into and : edges from (highlighted in the diagram) are disjoint, while the remaining edges induce a star forest.
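The cycle part of this partition can be checked mechanically. The following is a minimal sketch, not the paper's construction in full: it looks at a single attached odd cycle in isolation, labels the anchor vertex 0 (a labelling chosen here purely for illustration), numbers the edges so that edge one is incident with the anchor, and confirms that the even-indexed edges are pairwise disjoint and avoid the anchor, while the odd-indexed edges form a star forest.

```python
def split_cycle_edges(length):
    """Number the edges of a cycle 1..length, with vertex 0 as the anchor.

    Edge i joins vertex i-1 to vertex i (mod length), so edge 1 and edge
    `length` are the two cycle edges incident with the anchor.
    Returns (even_edges, odd_edges).
    """
    edges = [((i - 1) % length, i % length) for i in range(1, length + 1)]
    even_edges = [e for i, e in enumerate(edges, start=1) if i % 2 == 0]
    odd_edges = [e for i, e in enumerate(edges, start=1) if i % 2 == 1]
    return even_edges, odd_edges


def is_matching(edges):
    """True if no vertex is an endpoint of two of the given edges."""
    endpoints = [v for e in edges for v in e]
    return len(endpoints) == len(set(endpoints))


def is_star_forest(edges):
    """A simple graph is a star forest iff every edge has a degree-one endpoint."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return all(degree[u] == 1 or degree[v] == 1 for u, v in edges)
```

For an odd cycle the two odd-indexed edges at the anchor form a two-leaf star centred on the anchor, and every other odd-indexed edge is an isolated edge.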

Together, Lemmas 3.3, 3.4 and 3.5 complete the proof of Theorem 3.2. We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1.

We will argue that contains all finite star forests and that contains all finite 1-regular graphs; the result will then follow immediately from Theorem 3.2.

First, let be an arbitrary star forest; we will argue that . Let be the maximum degree of , and suppose that has exactly connected components. We will show that , the star forest consisting of identical connected components, each isomorphic to , belongs to . Since does not have almost bounded degree, there must be some graph which has at least vertices of degree at least . In we find a collection of vertex-disjoint copies of greedily as follows: pick any vertex of degree at least and delete it together with of its neighbours. The deleted vertex set induces a graph which contains as a subgraph, while the degree of any vertex in the rest of decreases by at most . Thus, we will be able to repeat this process times to obtain our disjoint copies of ; the fact that all subgraphs of , including , belong to follows from the closure of under deletion of vertices and edges.
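The greedy extraction described above can be sketched as follows. The function name and the parameters `num_stars` and `num_leaves` are hypothetical, and the sketch does not encode the paper's exact degree thresholds; it simply reports failure if no suitable centre remains. One sufficient condition for success, stated here as an assumption of the sketch, is that the graph has at least `num_stars * (num_leaves + 1)` vertices of degree at least `num_stars * (num_leaves + 1)`, since each round deletes `num_leaves + 1` vertices and so lowers every remaining degree by at most that amount.

```python
def greedy_star_packing(adj, num_stars, num_leaves):
    """Greedily pick `num_stars` vertex-disjoint stars with `num_leaves` leaves.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Returns a list of (centre, leaves) pairs, or None if the greedy gets stuck.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    stars = []
    for _ in range(num_stars):
        # Any remaining vertex of sufficiently high degree can serve as a centre.
        centre = next((v for v in adj if len(adj[v]) >= num_leaves), None)
        if centre is None:
            return None
        leaves = sorted(adj[centre])[:num_leaves]
        stars.append((centre, leaves))
        # Delete the centre and the chosen leaves from the working graph.
        for v in [centre, *leaves]:
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
    return stars
```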

To see that contains all finite graphs of maximum degree one, fix some 1-regular graph ; suppose that has exactly edges. As does not have bounded vertex cover number, it must contain graphs with arbitrarily large matchings and in particular some contains at least disjoint edges. Since is closed under the deletion of vertices and edges, it follows that the graph consisting of precisely disjoint edges belongs to . ∎
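The step from unbounded vertex cover number to arbitrarily large matchings rests on a standard fact: the endpoints of any maximal matching form a vertex cover, so a graph whose largest matching has m edges has a vertex cover of size at most 2m. A minimal sketch of that fact, with a hypothetical function name:

```python
def greedy_maximal_matching(edges):
    """Scan the edges once, keeping each edge whose endpoints are both unmatched.

    Returns (matching, matched_vertices). The matching is maximal, so every
    edge of the graph has at least one endpoint in matched_vertices.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched
```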

4 An aside: the relationship between -#Emb  and -#List-Emb

The existing meta-theorems which show that -#Emb  is in when restricted to host graphs from some class can also be used to show that -#List-Emb  belongs to in these cases, so we do not have any examples of graph classes where -#Emb  belongs to but its generalisation -#List-Emb  does not. In this section we investigate the relationship between the complexities of these two problems in more detail, showing that under certain assumptions the existence of an FPT algorithm for -#Emb  when restricted to host graphs from is sufficient to guarantee an FPT algorithm for -#List-Emb  on the same class; we further show that for all monotone graph classes , the existence of an FPT algorithm for -#Emb  gives rise to an efficient approximation scheme for -#List-Emb.

We begin by showing that the existence of an FPT algorithm for -#Emb  on host graphs restricted to any monotone class allows us to solve instances of -#List-Emb  efficiently on graphs from the same class if the allowed sets for each vertex in the pattern graph are pairwise disjoint.

Lemma 4.1.

Let be a monotone class of graphs on which -#Emb  belongs to , and let be the input to an instance of -#List-Emb, where and the sets are pairwise disjoint. Then we can solve the instance of -#List-Emb  in time for some computable function .

Proof.

Assume that , and our restriction on embeddings is that we must have for each ; we call an embedding that meets this condition a good embedding of into . For , we write for the unique such that . We define to be the subgraph of with vertex set and edge set . Observe that can easily be computed from the input to -#List-Emb  in polynomial time, and the number of good embeddings of into and into is the same; it therefore suffices to demonstrate that we can compute the number of good embeddings of into in time .
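The pruning step can be sketched in code. Because the defining formula is not legible in the text, the sketch rests on an assumed reading of the construction: keep only host vertices that lie in some allowed set, and keep a host edge only when the pattern vertices that own its two endpoints are adjacent in the pattern. The names `prune_host`, `allowed` and `owner` are hypothetical, and pattern vertices are taken to be the indices 0, 1, ..., k-1.

```python
def prune_host(pattern_edges, host_edges, allowed):
    """Build the pruned host graph for pairwise disjoint allowed sets.

    allowed[i] is the set of host vertices permitted for pattern vertex i.
    Returns (vertices, edges) of the pruned graph.
    """
    owner = {}
    for i, block in enumerate(allowed):
        for u in block:
            assert u not in owner, "allowed sets must be pairwise disjoint"
            owner[u] = i
    pattern = {frozenset(e) for e in pattern_edges}
    kept = [
        (u, w)
        for u, w in host_edges
        if u in owner
        and w in owner
        and frozenset((owner[u], owner[w])) in pattern
    ]
    return set(owner), kept
```

Edges inside a single allowed set are always dropped, since a simple pattern graph has no loops; only edge deletions and vertex deletions are used, which is why the pruned graph stays inside a monotone class.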

Now, suppose that is a subgraph of such that for each . By construction of , it must be that is isomorphic to a subgraph of . Thus we see that is an embedding of into such that for some permutation if and only if the mapping defines an automorphism on . Hence the number of embeddings of into is exactly (the number of automorphisms of ) times the number of good embeddings of into whose image includes exactly one vertex from each set . As we can compute in time depending only on , it remains only to show that we can, in the available time, compute the number of good embeddings of into whose image includes exactly one vertex from each set .

This final step can be achieved by making a number (depending only on ) of calls to -#Emb  which, by assumption, is solvable in time on graphs from ; note that is in as it is obtained from by deleting edges. We compute the number of (unrestricted) embeddings of into the subgraph of induced by , for each . Combining these using a standard inclusion-exclusion method (see e.g. [12, 26]) allows us to determine the number of good embeddings of into whose image includes exactly one vertex from each set . ∎
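The inclusion-exclusion step can be illustrated on small instances. The sketch below is an assumption-laden stand-in: the brute-force `count_embeddings` plays the role of the oracle for -#Emb, and `blocks` is a hypothetical name for the pairwise disjoint allowed sets. Since the pattern has as many vertices as there are blocks, an embedding uses exactly one vertex from each block precisely when its image meets every block, which is the quantity the alternating sum computes.

```python
from itertools import combinations, permutations


def count_embeddings(pattern_vertices, pattern_edges, host_vertices, host_edges):
    """Brute-force count of injective maps sending pattern edges to host edges."""
    host = {frozenset(e) for e in host_edges}
    pattern_vertices = list(pattern_vertices)
    total = 0
    for image in permutations(sorted(host_vertices), len(pattern_vertices)):
        theta = dict(zip(pattern_vertices, image))
        if all(frozenset((theta[u], theta[v])) in host for u, v in pattern_edges):
            total += 1
    return total


def count_hitting_every_block(pattern_vertices, pattern_edges, host_edges, blocks):
    """Count embeddings whose image meets every block, by inclusion-exclusion
    over the sub-collections of blocks that the image is confined to."""
    k = len(blocks)
    total = 0
    for size in range(k + 1):
        for chosen in combinations(range(k), size):
            vertices = set().union(*(blocks[i] for i in chosen))
            edges = [(u, w) for u, w in host_edges if u in vertices and w in vertices]
            sign = (-1) ** (k - size)
            total += sign * count_embeddings(
                pattern_vertices, pattern_edges, vertices, edges
            )
    return total
```

The number of oracle calls is 2^k, which depends only on the number of pattern vertices.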

The operation of blowing-up a vertex in a graph involves replacing with two non-adjacent vertices and , both adjacent to the vertex if and only if . We now argue that, if our monotone graph class is closed under this operation and -#Emb  is in when restricted to host graphs from , then -#List-Emb  is also in when restricted to host graphs from .
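As a concrete illustration of the blow-up operation, the following sketch works on an adjacency-set representation. Naming the two copies `(v, 0)` and `(v, 1)` is a choice made here for illustration only.

```python
def blow_up(adj, v):
    """Replace v by two non-adjacent copies, each with v's old neighbourhood.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Returns a new adjacency map; the input is left unchanged.
    """
    first, second = (v, 0), (v, 1)
    new_adj = {}
    for u, nbrs in adj.items():
        if u == v:
            continue
        updated = set(nbrs)
        if v in updated:
            # Every old neighbour of v becomes adjacent to both copies.
            updated.remove(v)
            updated.update((first, second))
        new_adj[u] = updated
    new_adj[first] = set(adj[v])
    new_adj[second] = set(adj[v])
    return new_adj
```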

Lemma 4.2.

Let be a monotone class of graphs which is closed under the operation of blowing up vertices. Then -#Emb  belongs to when restricted to host graphs from if and only if the same is true of -#List-Emb.

Proof.

As -#Emb  is a special case of -#List-Emb, the if direction is trivial. For the only if direction, we proceed by induction on ; we assume that -#List-Emb  can be solved in time whenever the host graph is drawn from and the pattern graph has strictly fewer than vertices. Let be the input to our instance of -#List-Emb. We obtain a graph by replacing each vertex with a set of vertices, each with neighbourhood . For each vertex , we write for the vertex of to which it corresponds, so . In this way we obtain a new instance of -#List-Emb  in which the permitted sets for each pattern vertex are pairwise disjoint. By Lemma 4.1 we can solve this instance of -#List-Emb in time for some computable function .

Each good embedding of into corresponds, in an obvious way, to an embedding of into which meets the restrictions of the new instance; however, the number of embeddings in the new instance is potentially larger than that in the original, as we could use two vertices and in which both correspond to the same vertex in . We will call such embeddings mirages.

We can however adjust for this overcounting. Each mirage corresponds to a unique pattern graph obtained from by identifying all pairs of vertices such that . We denote the vertices of by where corresponds to the set ; we call the signature of the mirage . Observe that is a good embedding in the instance

of -#List-Emb. Moreover, it is clear that there is a one-to-one correspondence between the good embeddings in this instance and the mirages with signature . Thus we can compute the number of mirages by considering all possible signatures for a mirage (the number of which depends only on ) and summing the solutions to the corresponding instances of -#List-Emb. Since each instance of -#List-Emb  we consider has a pattern graph with strictly fewer vertices than , we know by the inductive hypothesis that we can solve -#List-Emb  on each such instance in time . This allows us to compute the number of mirages in time bounded by for some computable function ; the result follows immediately. ∎

Finally, we observe that even without making any additional assumptions about the class , the existence of an efficient exact counting algorithm for -#Emb  on host graphs from is enough to guarantee the existence of an efficient approximation scheme for -#List-Emb  restricted to host graphs from . An FPTRAS (fixed parameter tractable randomised approximation scheme) is the analogue in the parameterised setting of an FPRAS (fully polynomial randomised approximation scheme), and is formally defined as follows.

Definition.

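An FPTRAS for a parameterised counting problem $\Pi$ with parameter $k$ is a randomised approximation scheme that takes an instance $I$ of $\Pi$ (with $|I| = n$), and rational numbers $\epsilon > 0$ and $0 < \delta < 1$, and in time $f(k) \cdot g(n, 1/\epsilon, \log(1/\delta))$ (where $f$ is any function, and $g$ is a polynomial in $n$, $1/\epsilon$ and $\log(1/\delta)$) outputs a rational number $z$ such that

$$\mathbb{P}\big[(1-\epsilon)\,\Pi(I) \le z \le (1+\epsilon)\,\Pi(I)\big] \ge 1 - \delta.$$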

We note that in fact we do not require the full flexibility of this definition in our next result; our algorithm will always return a solution with relative error at most .

Lemma 4.3.

Let be a monotone class of graphs such that -#Emb  belongs to when restricted to host graphs from . Then there is an FPTRAS to solve -#List-Emb  restricted to host graphs from .

Proof.

We begin by observing that the following problem admits an algorithm whenever the host graph comes from a class on which -#Emb  belongs to ; we say that a set is colourful with respect to some colouring if each of its elements receives a different colour under .

-#Multicolour Restricted-Emb

Input: Graphs and , subsets , and a function .

Parameter: .

Question: How many embeddings of into are such that is colourful?

If we fix a permutation , the number of embeddings of into such that for each is precisely the solution to -#List-Emb  on input

Note that the sets are pairwise disjoint, so by Lemma 4.1 we can solve each such instance of -#List-Emb  in time , for some computable function . Summing over all possibilities for , we can therefore calculate the solution to an instance of -#Multicolour Restricted-Emb in time for a computable function .
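The decomposition over permutations can be checked by brute force on small inputs. In the sketch below all names are hypothetical, pattern vertices are the indices 0, ..., k-1, and `colour` is a dictionary from host vertices to colours in 0, ..., k-1. An embedding has a colourful image exactly when the colours of its images, read in pattern order, form a permutation, so restricting each allowed set to one colour class per permutation and summing recovers the colourful count.

```python
from itertools import permutations


def list_embeddings(k, pattern_edges, host_edges, allowed):
    """Yield injective edge-preserving maps, as tuples, with image[i] in allowed[i]."""
    host = {frozenset(e) for e in host_edges}
    vertices = sorted(set().union(*allowed))
    for image in permutations(vertices, k):
        if all(image[i] in allowed[i] for i in range(k)) and all(
            frozenset((image[u], image[v])) in host for u, v in pattern_edges
        ):
            yield image


def colourful_direct(k, pattern_edges, host_edges, allowed, colour):
    """Count good embeddings whose image receives k distinct colours."""
    return sum(
        1
        for image in list_embeddings(k, pattern_edges, host_edges, allowed)
        if len({colour[x] for x in image}) == k
    )


def colourful_by_permutations(k, pattern_edges, host_edges, allowed, colour):
    """Same count, as a sum over permutations of list-embedding instances
    whose allowed sets are pairwise disjoint colour-restricted sets."""
    total = 0
    for pi in permutations(range(k)):
        restricted = [{x for x in allowed[i] if colour[x] == pi[i]} for i in range(k)]
        total += sum(
            1 for _ in list_embeddings(k, pattern_edges, host_edges, restricted)
        )
    return total
```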

We now use a colour-coding method to prove the result. A family of hash functions from to is said to be -balanced if there exists some constant so that, for any , the number of functions such that is colourful with respect to is between and . It was shown in [2] that, for every , a family of -balanced hash functions from to of cardinality can be computed in time . We will therefore assume we are equipped with such a family .
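To build intuition for balance, consider a toy family that is not the construction of [2]: the family of all colourings of an n-element ground set with k colours. It has exponential size, so it is useless algorithmically, but it is perfectly balanced, because every k-element subset is colourful under exactly k! · k^(n-k) colourings. The sketch below verifies this on small parameters and shows that summing colourful counts over the family and dividing by that common value recovers an exact count; the normalisation used here is an assumption made for this toy setting, and `targets` is a hypothetical stand-in for the images of the embeddings being counted.

```python
from itertools import combinations, product
from math import factorial


def colourful_counts(n, k):
    """For every k-subset of range(n), count the colourings range(n) -> range(k)
    under which the subset receives k distinct colours."""
    counts = {}
    for subset in combinations(range(n), k):
        counts[subset] = sum(
            1
            for colouring in product(range(k), repeat=n)
            if len({colouring[v] for v in subset}) == k
        )
    return counts


def recover_count(targets, n, k):
    """Sum, over all colourings, the number of colourful targets, then divide
    by the number of colourings under which any fixed k-subset is colourful."""
    per_subset = factorial(k) * k ** (n - k)
    total = sum(
        sum(1 for s in targets if len({colouring[v] for v in s}) == k)
        for colouring in product(range(k), repeat=n)
    )
    return total / per_subset
```

With an -balanced family of small cardinality, the same normalised sum is no longer exact, but each target is counted a number of times that deviates from the common value by a bounded relative factor.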

For each , we solve an instance of -#Multicolour Restricted-Emb to compute the number of good embeddings of into whose image is colourful with respect to ; we denote this number . We then return

as our answer to the original instance of -#List-Emb. It is clear from the reasoning above that we can compute within the permitted time, so it remains to show that the relative error in our answer is at most .

By the definition of an -balanced family of hash functions, we know that a fixed embedding of into will have a colourful image with respect to at least and at most