Constant Amortized Time Enumeration of Independent Sets for Graphs with Bounded Clique Number

06/24/2019
by Kazuhiro Kurita, et al. (Hokkaido University)

In this study, we address the independent set enumeration problem. Although several efficient enumeration algorithms and careful analyses have been proposed for maximal independent sets, no fine-grained analysis has been given for the non-maximal variant. As the main result, we propose an algorithm EIS for the non-maximal variant that runs in O(q) amortized time and linear space, where q is the clique number, i.e., the maximum size of a clique in the input graph. Note that EIS works correctly even if the exact value of q is unknown. Despite its simplicity, EIS is optimal for graphs with a bounded clique number, such as triangle-free graphs, planar graphs, bounded degenerate graphs, locally bounded expansion graphs, and F-free graphs for any fixed graph F, where an F-free graph is a graph that has no copy of F as a subgraph.


1 Introduction

A subgraph enumeration problem is defined as follows: given a graph G and a constraint, the task is to output, without duplication, all subgraphs of G that satisfy the constraint. We call an algorithm for an enumeration problem an enumeration algorithm. Enumeration problems have been widely studied, both in theory and in practice, since the 1950s. The independent set enumeration problem is one of the central problems in enumeration, and several enumeration algorithms have been proposed for the maximal or maximum independent set enumeration problem [22, 5, 2, 9]. In particular, theoretically efficient algorithms have been developed by restricting the class of input graphs, e.g., chordal graphs [20, 14], circular arc graphs [14], bipartite graphs [12], and claw-free graphs [18]. Another important object to be enumerated is a clique, that is, an independent set of the complement graph of a given graph. Several efficient algorithms exist for the maximal clique enumeration problem [21, 15, 6]. Since every non-maximal independent set is a subset of some maximal one, we can find all non-maximal solutions using the above results. However, it is difficult to avoid outputting duplicates efficiently. Moreover, even though there are many efficient algorithms for the maximal variant, no fine-grained analysis has been given for the non-maximal variant. Thus, this study aims to develop an efficient algorithm for the independent set enumeration problem, which also demands the output of non-maximal solutions.

Generally, the number of solutions to an enumeration problem is exponential in the size of the input. If the number of solutions is much smaller, then it is unsuitable to evaluate the efficiency of an enumeration algorithm by the size of the input alone, since this can sometimes yield a trivial bound, such as exponential time. In this paper, we evaluate the efficiency of an enumeration algorithm by both the size of the input I and the number of solutions N. We call this analysis output sensitive analysis [11]. Let A be an enumeration algorithm; A is an output polynomial time algorithm if it runs in time polynomial in |I| and N. A is a polynomial amortized time algorithm if its running time is bounded by N·poly(|I|), that is, A runs in poly(|I|) time per solution on average. A runs in polynomial delay if the interval between two consecutive solutions is bounded by poly(|I|) time and the preprocessing and postprocessing times are also bounded by a polynomial in |I|. Note that a polynomial amortized time algorithm does not guarantee that the maximum interval between two consecutive outputs, called the delay, is polynomial. So far, several frameworks and complexity analysis techniques have been proposed for developing efficient enumeration algorithms [1, 8, 4, 7, 24]. These frameworks have been used to develop several efficient output-sensitive enumeration algorithms, especially for sparse input graphs [25, 26, 13, 6, 16, 3]. However, as will be shown later, simply applying the above results does not yield an efficient algorithm for the independent set enumeration problem.
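For reference, writing |I| for the input size and N for the number of solutions, the three notions above can be restated compactly (a summary of the definitions, not an addition to them):

\[
\begin{aligned}
&\text{output polynomial:} && \text{total time} = \mathrm{poly}(|I|, N),\\
&\text{polynomial amortized time:} && \text{total time} = N \cdot \mathrm{poly}(|I|),\\
&\text{polynomial delay:} && \text{every gap between consecutive outputs is } \mathrm{poly}(|I|).
\end{aligned}
\]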


Figure 1: The inclusion relation between graph classes [10]. In this map, an arrow goes from a graph class to its superclass, i.e., graphs with bounded clique number are a superclass of bounded degeneracy graphs.

1.1 Main results

In this paper, we focus on K_q-free graphs, which are graphs that have no clique of size q as a subgraph. Note that every graph is K_q-free for some q (e.g., q = n + 1). In addition, it is known that if a graph does not have a clique of size q, then it belongs to some sparse graph class, such as triangle-free graphs (q = 3), planar graphs (q = 5, since they contain neither K_5 nor K_{3,3}), locally bounded expansion graphs, bounded degenerate graphs (q is at most the degeneracy plus one), F-free graphs for some fixed graph F (q is at most the number of vertices of F), etc. (Figure 1), where an F-free graph is a graph that has no copy of F as a subgraph. As the main result of this paper, we propose an algorithm called EIS for the independent set enumeration problem that runs in O(q) amortized time with linear space. EIS is optimal for these graph classes. Note that a complete graph with k vertices contains any graph with k vertices as a subgraph; thus, every F-free graph is K_{|V(F)|}-free, and EIS is optimal for F-free graphs. We emphasize that EIS works correctly even if the exact value of q is unknown.

EIS is a simple binary partition algorithm. The algorithm starts with the pair (G, ∅), where G is an n-vertex input graph. First, EIS outputs ∅ and computes a vertex sequence (v_1, …, v_n) of G, sorted by a smallest-last ordering [17]. Next, EIS generates n pairs, each made of a subsolution and its corresponding graph: ({v_1}, G_1), …, ({v_n}, G_n), where G_i is obtained from G by removing v_1, …, v_i and the neighbors of v_i. Then, for each pair, EIS makes a recursive call and repeats the above operations. We call this generation step an iteration. It can easily be shown that this algorithm runs in O(Δ) amortized time, where Δ is the maximum degree, since each iteration has at most n new child iterations and needs O(n + m) time to generate all the children. However, this naive analysis is too pessimistic since the number of vertices whose degree is Δ may be small. For example, if G is a star with n − 1 leaves, that is, it has one center vertex c with degree n − 1, then the number of solutions is 2^{n−1} + 1. The first iteration of the algorithm requires O(n) time to generate all the children, while every other iteration X requires only O(|ch(X)|) time since any subset of the remaining vertices makes a new solution, where ch(X) is the set of child iterations of X. Thus, EIS runs in O(1) time per solution on average in this case. As described above, if G has many vertices with small degree, then the simple analysis appears not to be tight. Conversely, if G has no vertices with small degree, then G has a large clique by Turán's theorem [23]. Based on this observation, we focus on the clique number of the input graph to give a tight complexity bound.
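For concreteness, the solution count for the star can be verified directly: an independent set either avoids the center c, and is then an arbitrary subset of the n − 1 leaves, or contains c, and is then exactly {c}:

\[
\underbrace{2^{\,n-1}}_{\text{subsets of the leaves}} + \underbrace{1}_{\{c\}} \;=\; 2^{\,n-1} + 1 .
\]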

The analysis is based on the push out amortization technique [24]; however, it is difficult to apply the technique directly to the proposed algorithm. To apply the technique, we focus on the size of the input graph of an iteration. If an input graph is sparse, then there are many iterations with small input graphs, e.g., of constant size; hence the sum of the computation time over these iterations dominates the total computation time of EIS. In particular, we regard a graph as a small graph if it has at most 2q vertices; otherwise, the graph is large. Surprisingly, the algorithm works correctly even if the location of this boundary, that is, the size of a maximum clique, is unknown. In addition, by using run-length encoding, we show that EIS uses only linear space in total. Due to space limitations, all proofs are given in the Appendix.

2 Preliminaries

Let G = (V, E) be a simple undirected graph, i.e., G has no self-loops or multiple edges, where V and E are the set of vertices and the set of edges of G, respectively. Let n denote the number of vertices in G and m the number of edges in G. Let u and v be vertices in G; u and v are adjacent if {u, v} ∈ E. We denote by N_G(v) the set of the adjacent vertices of v in G, and write N_G[v] = N_G(v) ∪ {v} for the closed neighborhood of v. We call u a neighbor of v in G if u ∈ N_G(v), and an edge e an incident edge of v if v ∈ e. d_G(v) = |N_G(v)| is the degree of v. The degree of G is the maximum degree of its vertices. If there is no confusion, we drop G from the notations. A set S of vertices of G is an independent set if G[S] has no edges, that is, for any pair of vertices u, v ∈ S, {u, v} is not included in E. Here, for a vertex subset S of G, G[S] = (S, E[S]) is the subgraph of G induced by S, where E[S] = {e ∈ E | e ⊆ S}. For simplicity, we write G ∖ S = G[V ∖ S] and G ∖ v = G ∖ {v}. A graph is a complete graph if any distinct pair of vertices u, v satisfies {u, v} ∈ E, and a set C of vertices of G is a clique if G[C] is a complete graph. K_k denotes a complete graph with k vertices. G is said to be K_q-free if G has no clique with q vertices. In this paper, we consider the following enumeration problem: given an undirected graph G, output all independent sets in G without duplication.
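As a quick illustration of these definitions, a set S is an independent set exactly when no vertex of S has a neighbor in S (a minimal sketch of ours, not part of the paper; the dictionary-of-neighbor-sets representation is an assumption):

    def is_independent(graph, S):
        # graph: {vertex: set of neighbors}; S: iterable of vertices of graph.
        S = set(S)
        return all(not (graph[u] & S) for u in S)  # no edge has both endpoints in S

    assert is_independent({"a": {"b"}, "b": {"a"}}, {"a"})
    assert not is_independent({"a": {"b"}, "b": {"a"}}, {"a", "b"})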

3 The proposed algorithm

1 Procedure EIS(G)                     // G: an input graph
2     RecEIS(G, ∅)
3
4 Procedure RecEIS(G, X)
5     Output X
6     for each vertex v in G do        // pick v with the minimum degree in G
7         RecEIS(G ∖ N[v], X ∪ {v})
8         G ← G ∖ {v}

Algorithm 1: An O(q) amortized time enumeration algorithm EIS for independent sets, where q is the clique number of an input graph G.

In this section, we present a recursive enumeration algorithm EIS based on binary partition, as shown in Algorithm 1. Binary partition is a framework for developing enumeration algorithms that repeatedly splits the solution space into two disjoint subspaces. We provide a high-level description of our proposed algorithm EIS.

Each recursive call X, called an iteration, is associated with a graph G_X and a solution S_X. EIS first outputs S_X at the beginning of the iteration. Next, it picks a vertex v from G_X and partitions the current solution space into two distinct subspaces: one consists of the solutions containing v and the other consists of the solutions that do not contain v. Then, EIS makes a new iteration Y that receives G_X ∖ N[v] and S_X ∪ {v}. We call Y a child iteration of X and X the parent iteration of Y. When backtracking from Y, EIS removes v from G_X, picks a new vertex, and makes a new child iteration. Each iteration repeats the above procedure for all vertices in its input graph. EIS builds a recursion tree 𝒯 = (𝒳, ℰ), where 𝒳 is the set of iterations and ℰ is given by the parent-child relation among 𝒳. An iteration X is referred to as a leaf iteration if X has no child iterations. Otherwise, it is called an internal iteration. Let G_X and S_X be the input graph and the input independent set of X, respectively. From the construction of EIS, we can obtain the following theorem.
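A minimal Python sketch of this binary partition scheme follows (ours, not from the paper): it reproduces the output behavior of Algorithm 1 with plain Python sets, so it does not attain the O(q) amortized time or the linear-space bound obtained with the compressed data structures of Section 5.

    # Enumerate all independent sets of a graph given as {vertex: set of neighbors}.
    def rec_eis(graph, solution, out):
        out.append(set(solution))  # every iteration outputs its solution first
        remaining = dict(graph)
        while remaining:
            # Smallest-last picking order: a vertex of minimum degree.
            v = min(remaining, key=lambda u: len(remaining[u]))
            # Child iteration: solutions containing v; remove the closed neighborhood N[v].
            closed = remaining[v] | {v}
            child = {u: nbrs - closed for u, nbrs in remaining.items() if u not in closed}
            rec_eis(child, solution | {v}, out)
            # Backtrack: solutions avoiding v; remove v only.
            del remaining[v]
            for nbrs in remaining.values():
                nbrs.discard(v)

    def eis(graph):
        out = []
        rec_eis({u: set(nbrs) for u, nbrs in graph.items()}, set(), out)
        return out

    # The path a-b-c has five independent sets: {}, {a}, {b}, {c}, {a, c}.
    print(sorted(map(sorted, eis({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))))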

Theorem 1.

Let G be a graph. EIS enumerates all independent sets in G without duplication.

Proof.

We first show that EIS outputs all solutions, by induction on the size of a solution. We assume that all the solutions whose size is at most k are output. Let S be a solution of size k + 1 and S′ be the vertex set S ∖ {v}, where v ∈ S. Note that any subset of an independent set is also an independent set, and thus S′ is an independent set. From the assumption, there is an iteration X which outputs S′. If G_X contains v, then EIS outputs S in a descendant of X. Otherwise, there is a lowest ancestor iteration Y of X such that Y removes v from its graph in Line 8. Let Z be the child iteration of Y with S_Z = S_Y ∪ {v}, generated in Line 7 before this removal. Since S ∖ S_Z is an independent set in G_Z, there is a descendant iteration of Z which outputs S, i.e., all solutions are output.

Next, we show that EIS does not output duplicate solutions. Let X and Y be two distinct iterations, and assume that both output the same solution. Let Z be the lowest common ancestor of X and Y. We assume that X ≠ Z and Y ≠ Z; otherwise, the output of X differs from the one of Y from the construction of Algorithm 1, since the solution of a proper descendant is a proper superset of the solution of its ancestor, and this contradicts the assumption. Let v be a vertex picked in Z such that v ∈ S_X. Again, from the construction of the algorithm, S_Y does not contain v. Hence, this contradicts the assumption. ∎

Note that Theorem 1 holds for any picking order of the vertices in an iteration. Hence, as in Line 6, we employ the following simple picking order: pick a vertex with the minimum degree. This order is known as a smallest-last ordering or degeneracy ordering [17]. Note that a smallest-last ordering is not unique since there may be several vertices with the minimum degree. Thus, hereafter, we fix some deterministic procedure to uniquely determine the smallest-last ordering of a graph.
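A straightforward realization of this picking order is sketched below (ours; the classical bucket-queue implementation achieves O(n + m) time [17], which this simple version does not):

    def smallest_last_ordering(graph):
        # Repeatedly extract a minimum-degree vertex from the remaining graph.
        remaining = {u: set(nbrs) for u, nbrs in graph.items()}
        order = []
        while remaining:
            v = min(remaining, key=lambda u: len(remaining[u]))
            order.append(v)
            for u in remaining.pop(v):
                remaining[u].discard(v)
        return order

    # A star with center 'c': leaves have degree 1, so leaves are picked until
    # the center's degree drops; ties are broken by dictionary order.
    print(smallest_last_ordering({"c": {"x", "y", "z"},
                                  "x": {"c"}, "y": {"c"}, "z": {"c"}}))
    # One valid ordering: ['x', 'y', 'c', 'z']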

4 Time complexity

In this section, we analyze the time complexity of EIS. In the following, we restrict the input graph to be K_q-free. Note that every graph is K_q-free for some q (for example, q = n + 1); hence, the result holds for all graphs. We first give a brief overview of our time complexity analysis, as shown in Figure 2. The left part of the figure is the recursion tree made by EIS. In our analysis, we push a part of the computational cost of an iteration to its child iterations. The remaining cost is charged to the iteration itself. The key point is to use different distribution rules for the gray area and the white area. The boundary between these areas is defined by the size of the input graph of an iteration. More precisely, the gray area contains iterations whose input graphs have at most 2q vertices (Sect. 4.1), and the white area contains iterations whose input graphs have more than 2q vertices (Sect. 4.2). This boundary gives a sophisticated time complexity analysis. A simple amortization analysis can be applied in the gray area. In the white area, the push out amortization technique [24] is used to design the cost distribution rule.

The right part shows how to push the computational cost from an iteration X of the white area toward the gray area. Gray rectangles represent costs pushed to the child iterations, while white rectangles represent costs kept by the iteration itself. By using the push out amortization technique, we can show that an iteration X receives only O(T(X)) computational time from its parent, where T(X) is the computation time of X. That is, the delivered cost does not worsen the time complexity of X.


Figure 2: High-level overview of our time complexity analysis. In the figure on the right, the gray rectangles and the white rectangle represent the computational cost of X, and the sum of all rectangles equals T(X). X keeps the white rectangle, and the gray rectangles are pushed out to its child iterations.

In the following, we provide a detailed description of our analysis. Assume that we use an adjacency matrix for storing the input graph. For simplicity, we write n = n_X = |V(G_X)| when the iteration X is clear from the context. Let T(X) be the computation time of X, ch(X) be the set of children of X, and 𝒯(X) be the recursive subtree of 𝒯 rooted at X. The next lemma is easy but plays a key role in this section:

Lemma 2.

The total computation time of 𝒯(X) can be bounded by O(∑_{Y ∈ 𝒯(X)} n_Y²).

Proof.

Let 𝒳 be the set of iterations on 𝒯(X). For each iteration Y in 𝒳, since each picked vertex on Y generates a new child iteration and constructing the input of each child is charged to that child, Y needs O(n_Y²) time, and thus, the total time of EIS for 𝒯(X) is O(∑_{Y ∈ 𝒳} n_Y²). Note that |ch(Y)| = n_Y. In addition, each iteration has the corresponding solution, and S_Y ≠ S_Z for any descendant iteration Z of Y. Hence, the total time complexity is O(∑_{Y ∈ 𝒳} n_Y²). ∎

4.1 Case: n_X ≤ 2q

From Lemma 2, if X satisfies n_X ≤ 2q, then the total time complexity of 𝒯(X) is O(q) time per solution on average. Note that n_Y ≤ n_X for any descendant iteration Y of X, since G_Y is an induced subgraph of G_X; hence, if n_X ≤ 2q, then n_Y ≤ 2q. Thus, the time complexity of 𝒯(Y) is also O(q) time per solution on average if X satisfies the condition.
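The calculation behind this case is short: every iteration Y in 𝒯(X) outputs exactly one solution and has |ch(Y)| = n_Y children, so, under the bound n_Y ≤ 2q of this case,

\[
\sum_{Y \in \mathcal{T}(X)} n_Y^2 \;\le\; 2q \sum_{Y \in \mathcal{T}(X)} n_Y \;=\; 2q \sum_{Y \in \mathcal{T}(X)} |ch(Y)| \;\le\; 2q \cdot |\mathcal{T}(X)|,
\]

that is, O(q) time per solution.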

4.2 Case: n_X > 2q

In this subsection, we use the push out amortization [24] to analyze the case n_X > 2q. This is one of the general techniques for analyzing the time complexity of enumeration algorithms. Intuitively, if an enumeration algorithm satisfies the PO condition, the total time complexity of the algorithm can be bounded by the sum of the time complexities of the leaf iterations, each of which is small. The PO condition is defined as follows: for any internal iteration X, ∑_{Y ∈ ch(X)} T(Y) ≥ α·T(X) − β·(|ch(X)| + 1)·T*. Here, T* represents the maximum time complexity among the leaf iterations, α > 1 and β ≥ 0 denote constants, T(X) represents the time complexity of X, and ∑_{Y ∈ ch(X)} T(Y) is the total computation time of the child iterations of X. Note that if α is large, then each internal iteration pushes its computation time out to its child iterations more readily.

We first explain the outline of the push out amortization. In [24], Uno gives a concrete computation-time distribution rule for this amortization. Let ρ(X) be the computation time that is pushed out from the parent of X to X; hence, X now has T(X) + ρ(X) as its total computation time. To achieve O(T*) time per solution on average, the computation time of X is delivered as follows: (D1) X keeps O(T*) for itself, and (D2) each child iteration of X receives a share of the remaining computation time of X. Since the sum of the numbers of child iterations over all iterations in 𝒯 does not exceed the number of solutions, the charge under (D1) amounts to O(T*) per solution. In addition, if the algorithm satisfies the PO condition, then ρ(X) = O(T(X)). The reader should refer to [24] for more details. In the following, we show that EIS satisfies the PO condition. For this discussion, we introduce some notation: let v_1, …, v_n be the order in which EIS picks the vertices of G_X, where v_i has the minimum degree in the graph remaining at step i, and let G ∖ S denote the subgraph of G induced by V(G) ∖ S.

We first consider the number of vertices of the input graph of a child iteration. After picking a vertex v_i with minimum degree in the current graph, EIS removes N[v_i] from the graph. Thus, for each i, the size of the input graph of the i-th child Y_i of X is at least n − i − d(v_i). Now, assume that the time complexity of each iteration is at least q²; clearly, this padding assumption does not improve the time complexity of EIS. Thus, the following inequality holds for the total computation time of all the child iterations: ∑_{Y ∈ ch(X)} T(Y) ≥ ∑_{i=1}^{n} max((n − i − d(v_i))², q²).

Next, we consider the lower bound of T(Y_i) for each i. Let 𝒯′ be the subgraph of 𝒯 induced by the iterations whose input graphs have more than 2q vertices. Since the size of the input graph of an iteration is smaller than the size of the input graph of any of its ancestors, 𝒯′ forms a tree. Thus, we can use the push out amortization technique to analyze this case. Since G_X has no large clique, the upper bound of d(v_i) is obtained from Lemma 4. This lemma can easily be derived from Theorem 3, which was shown by Turán.

Theorem 3 (Turán’s Theorem [23]).

For any integers n and q ≥ 2, a graph with n vertices that does not contain K_{q+1} as a subgraph has at most (1 − 1/q)·n²/2 edges.

Lemma 4.

Let G be a graph and v be a vertex with the minimum degree in G. If the size of a maximum clique in G is at most q, then d(v) ≤ (1 − 1/q)·n, where n is the number of vertices in G.

Proof.

If the minimum degree of G is more than (1 − 1/q)·n, then G has more than (1 − 1/q)·n²/2 edges. This contradicts Theorem 3 and the statement holds. ∎
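As a sanity check, instantiate the bound for triangle-free graphs, i.e., graphs whose maximum clique size is q = 2:

\[
d(v) \;\le\; \left(1 - \tfrac{1}{2}\right) n \;=\; \tfrac{n}{2},
\]

which is tight: the complete bipartite graph K_{n/2,n/2} is triangle-free, every vertex has degree n/2, and it has n²/4 edges, matching Theorem 3.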

Using this upper bound, we show the following lemma, which implies that if the size of the input graph of X is large enough, that is, n_X > 2q, then the total computation time of all the child iterations of X exceeds that of X itself.

Lemma 5.

Let X be an internal iteration in 𝒯′ and n = n_X. There exists a constant c > 0 such that ∑_{Y ∈ ch(X)} T(Y) ≥ c · max(n·q², n³/q²).

Proof.

Let n be the number of vertices in G_X and i be an integer with 1 ≤ i ≤ n. When EIS picks v_i, the remaining graph has n − i + 1 vertices, and hence d(v_i) ≤ (1 − 1/q)(n − i + 1) from Lemma 4. Thus, the i-th child receives a graph with at least (n − i + 1)/q − 1 vertices, and T(Y_i) ≥ max(((n − i + 1)/q − 1)², q²), where the square is non-negative for any value of its argument. Therefore,

∑_{Y ∈ ch(X)} T(Y) ≥ ∑_{i=1}^{n} max(((n − i + 1)/q − 1)², q²) ≥ c · max(n·q², n³/q²)

holds for some constant c > 0. Thus, the statement holds. ∎

Using Lemma 5, we can show that, by choosing appropriate values for α, β, and T*, any internal iteration of 𝒯′ satisfies the PO condition.

Lemma 6.

Suppose that α and β are chosen appropriately and T* = c′·q² for some positive constant c′. Then, any internal iteration X in 𝒯 with n_X > 2q satisfies the PO condition, that is, ∑_{Y ∈ ch(X)} T(Y) ≥ α·T(X) − β·(|ch(X)| + 1)·T*.

Proof.

From Lemma 5, there exists c > 0 such that ∑_{Y ∈ ch(X)} T(Y) ≥ c · max(n·q², n³/q²) holds. Hence, since the maximum of two values is at least their geometric mean,

∑_{Y ∈ ch(X)} T(Y) ≥ c · √(n·q² · n³/q²) = c·n².    (1)

The right-hand side of Eq. (1) is monotonically increasing in n, and n > 2q for every iteration in 𝒯′. Hence, recalling that T(X) = O(n²) and T* = c′·q², by choosing β sufficiently large we obtain

∑_{Y ∈ ch(X)} T(Y) ≥ α·T(X) − β·(|ch(X)| + 1)·T*.

Therefore, any internal iteration satisfies the PO condition and the statement holds. ∎

Recall that any leaf iteration receives O(T*) computation time from its parent under the distribution rule. Hence, the following lemma holds.

Lemma 7.

Let X be a leaf iteration of 𝒯′ and Y be an internal iteration of 𝒯′. Then, X receives O(q²) computational time and Y needs O(T(Y)) time.

Proof.

From Lemma 6, any internal iteration in 𝒯′ satisfies the PO condition. Recall that any leaf iteration receives at most O(T*) = O(q²) time since β and c′ are positive constants. In addition, each internal iteration Y of 𝒯′ receives at most O(T(Y)) time under the distribution rule. Thus, the statement holds. ∎

From Lemma 6 and the distribution rule, any internal iteration in 𝒯′ keeps at most O(T*) computation time for itself. In addition, any leaf iteration in 𝒯′ is charged at most O(T*) computation time. Note that, from the definition, T* = O(q²), and every iteration in 𝒯′ has more than 2q child iterations; hence these charges amount to O(q) time per solution. From Lemma 2 and Lemma 7, we can show the following main theorem.

Theorem 8.

EIS enumerates all independent sets in O(q) amortized time even if the exact value of q is unknown.

Proof.

From Lemma 2, the time complexity of each iteration in the gray area is O(q) per solution on average. From Lemma 7, each iteration X receives at most O(q²) time from its parent. Hence, if n_X ≤ 2q, then from Lemma 2, any descendant iteration of X spends O(q) computation time per solution. If n_X > 2q, then from Lemma 6 and the distribution rule of the computational cost, any such X spends O(q) time per solution on average. Note that the difference between S_X and S_Y is exactly one vertex for any iteration X and its child iteration Y. Thus, the total size of what EIS outputs is bounded by the number of iterations of EIS. Therefore, by outputting only the difference between the i-th solution and the (i+1)-th solution instead of the whole (i+1)-th solution, the amortized time complexity of EIS is O(q) and the statement holds. ∎
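The difference-based output at the end of the proof can be sketched as follows (an illustrative encoding of ours, not code from the paper): EIS emits "+v" when it descends into a child with v and "-v" when it backtracks, so the stream length is proportional to the number of iterations, and each solution is announced by a single "+" symbol.

    # Replay a delta stream: '+' adds a vertex to the current solution (and announces
    # a new solution), '-' removes the most recently added vertex while backtracking.
    def replay(deltas):
        current, solutions = [], [[]]   # the root iteration outputs the empty set
        for op, v in deltas:
            if op == "+":
                current.append(v)
                solutions.append(list(current))
            else:
                assert current and current[-1] == v
                current.pop()
        return solutions

    print(replay([("+", "a"), ("+", "c"), ("-", "c"), ("-", "a"), ("+", "b")]))
    # [[], ['a'], ['a', 'c'], ['b']]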

5 A linear space implementation of EIS

In this section, we show that we can implement EIS in linear space. The main space bottleneck of EIS relates to the following two points. One is the representation of input graphs. If we naively employ an adjacency matrix to represent an input graph, EIS uses O(n²) space. If we instead employ an adjacency list, then linear space is achieved, but it becomes difficult to obtain the input graph G_Y of a child iteration Y of the current iteration X from G_X efficiently; note that G_Y can easily be obtained in O(n + m) time. The other bottleneck is related to the smallest-last ordering. If each iteration of EIS stores its own smallest-last ordering, then, since the number of iterations between the root iteration and a leaf iteration can be up to n, EIS needs O(n²) space. To overcome these difficulties, and in particular to achieve O(n + m) space, we use run-length encoding for compressing the adjacency matrix, and partial smallest-last orderings that store only the differences between consecutive smallest-last orderings.

We summarize the data structures that are stored during the execution of an iteration X as follows; precise definitions are given in the remainder of this section. Let Z be an ancestor iteration of X and P be the parent of X, and suppose that a vertex v is picked on P. Roughly speaking, RL(A^G) is the run-length encoded adjacency matrix of a graph G, σ_G is the smallest-last ordering of G, and D_u represents the neighbors of u removed from the graph.

  1. RL(A^{G_X}) and the smallest-last ordering σ_{G_X} of G_X,

  2. the vertices removed from σ_{G_Z} and the position of each such vertex on σ_{G_Z}, for each ancestor iteration Z,

  3. the shift sequence for restoring σ_{G_Z}, for each ancestor iteration Z,

  4. RL(D_u) for each vertex u of G_X, for the picked vertex v, and

  5. RL(A^{G_Z}_w) for each vertex w removed at an ancestor iteration Z.


Figure 3: An example of an input graph G, its adjacency matrix A^G, and its compressed representation. A cell (u, w) in A^G is gray if u and w are adjacent. Otherwise, the cell is white. (C) shows the run-length encoded adjacency matrix RL(A^G).

First, we introduce the compression of input graphs by run-length encoding, which is a lossless data compression. We define run-length encoding and the run-length encoded adjacency matrix. Let S be a sequence consisting of 0s and 1s. We define the run-length encoded sequence RL(S) of S as follows: RL(S) records the symbol of the first run of S, followed by the lengths of the maximal runs of consecutive equal symbols, which alternate between runs of 0s and runs of 1s. For example, if S = (0, 0, 1, 1, 1, 0, 1), then RL(S) = (2, 3, 1, 1), where the first run consists of 0s. The number of elements of RL(S) is denoted by |RL(S)|; let us call this the length of RL(S). The following lemma holds for the length of RL(S):
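A minimal Python sketch of this encoding (helper names are ours; storing the first symbol followed by the run lengths is one of several equivalent conventions):

    def rl_encode(bits):
        # Return (first symbol, list of maximal run lengths).
        if not bits:
            return (0, [])
        runs, count = [], 1
        for prev, cur in zip(bits, bits[1:]):
            if cur == prev:
                count += 1
            else:
                runs.append(count)
                count = 1
        runs.append(count)
        return (bits[0], runs)

    def rl_decode(first, runs):
        out, bit = [], first
        for r in runs:
            out.extend([bit] * r)
            bit ^= 1                   # runs alternate between 0s and 1s
        return out

    row = [0, 0, 1, 1, 1, 0, 1]
    assert rl_encode(row) == (0, [2, 3, 1, 1])
    assert rl_decode(*rl_encode(row)) == row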

Lemma 9.

Let S be a 0-1 sequence and RL(S) be the run-length encoded sequence of S. Then, the length of RL(S) is at most 2·min(n₀, n₁) + 1, where n₀ and n₁ are the number of 0s and the number of 1s in S, respectively.

Proof.

Runs of 0s and runs of 1s alternate in S. Since each run has length at least one, the number of runs of 0s is at most n₀ and the number of runs of 1s is at most n₁; by the alternation, the number of runs of one symbol exceeds that of the other by at most one. Hence, the length of RL(S) is at most 2·min(n₀, n₁) + 1 and the statement holds. ∎

Let A^G be an adjacency matrix of G, A^G_i be the i-th row of A^G, and RL(A^G) = (RL(A^G_1), …, RL(A^G_n)). We call RL(A^G) the run-length encoded adjacency matrix of G. An example of a run-length encoded adjacency matrix is shown in Figure 3. The next lemma shows that the size of RL(A^G) is linear in the size of G. Since A^G is a 0-1 matrix, A^G_i is a 0-1 sequence with length n for each i.

Lemma 10.

Let A^G be an adjacency matrix of a graph G. Then, RL(A^G) needs O(n + m) space, where n and m are the number of vertices and the number of edges, respectively.

Proof.

From Lemma 9, since the i-th row contains exactly d(v_i) 1s, |RL(A^G_i)| ≤ 2·d(v_i) + 1 for each i, and ∑_{i=1}^{n} d(v_i) = 2m. Thus, RL(A^G) uses O(n + m) space and the statement holds. ∎

5.1 Generating the input graphs of child iterations

In this subsection, we explain how to generate child iterations from the compressed inputs. This can be done in O(T(X)) total time if EIS uses RL(A^{G_X}). To execute Line 6, we first consider how to obtain the smallest-last ordering. If the graph is stored in an adjacency list representation, the ordering can be obtained in O(n + m) time. Now, an adjacency list can be obtained from RL(A^{G_X}) in O(n + m) time. Hence, we can compute the smallest-last ordering of G_X in O(n + m) time.
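For illustration, the neighbors of a vertex can be read off its compressed row with a single scan over the runs (a sketch with our naming, following the encoding convention sketched in Sect. 5):

    def neighbors_from_rle(first, runs, vertices):
        # Report the columns covered by 1-runs; time O(|RL(row)| + degree),
        # so the whole adjacency list costs O(n + m) over all rows.
        nbrs, bit, pos = [], first, 0
        for r in runs:
            if bit == 1:
                nbrs.extend(vertices[pos:pos + r])
            pos += r
            bit ^= 1
        return nbrs

    # The row (0, [2, 3, 1, 1]) over columns v1..v7 encodes neighbors v3, v4, v5, v7.
    print(neighbors_from_rle(0, [2, 3, 1, 1],
                             ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]))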

Generating the adjacency list can be done in O(n + m) time since the run-length encoded sequences are ordered consistently with the ordering of the vertices. Next, we consider how to generate RL(A^{G_Y}) from RL(A^{G_X}), where Y is the child iteration of X made in Line 7, whose input graph G_Y is obtained by removing N[v] from G_X. Our goal is O(n·(d(v) + 1) + m) time for obtaining RL(A^{G_Y}). If we can achieve this computation time, then the time complexity of X does not worsen, by distributing a part of the computation time from X to its child Y. From the following lemma, we can compute RL(A^{G_Y}) within this time.

Lemma 11.

Let G be a graph and v be the vertex with minimum degree in G. Then, we can obtain RL(A^{G ∖ N[v]}) from RL(A^G) in O(n·(d(v) + 1) + m) time.

Proof.

By simply scanning RL(A^G_v) from the first element to the last element, we can obtain N[v] in O(|RL(A^G_v)|) time. Note that N[v] is sorted in the order of the vertex indices.

Next, for each remaining vertex u, we compute RL(A^{G ∖ N[v]}_u) from RL(A^G_u). Let w be the current vertex of N[v], r be the current element of RL(A^G_u), and ℓ be the last element of the output sequence built so far. Now we can check whether w falls inside the run represented by r by comparing the position of w with the interval of columns covered by r. (1) If w lies inside that interval, then (1.a) if the symbol of the run of r is the same as the symbol of the run of ℓ, we add the surviving length of r to ℓ; (1.b) if the symbols differ, we append the surviving length as a new last element. (2) If w lies beyond that interval, then we move r to the next element of RL(A^G_u) and check the above two conditions again. After the run is processed, we continue with the next vertex of N[v]. Now both N[v] and RL(A^G_u) are sorted; hence, we can compute RL(A^{G ∖ N[v]}_u) by scanning each of them once from the first element. If we apply the above procedure naively, then we need O(n) time per row. However, the procedure can be done in O(|RL(A^G_u)| + d(v)) time by processing consecutive 0s or 1s in one operation. Summing over all rows, the statement holds. ∎
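For clarity, the effect of the row update can be specified by the following naive sketch (ours), which decodes the row, deletes the removed columns, and re-encodes in O(n) time; the merge-based scan in the proof above computes the same result in compressed time.

    def rle_remove_columns(first, runs, removed_positions):
        # Decode the row, drop the removed column positions, and re-encode.
        bits, bit = [], first
        for r in runs:
            bits.extend([bit] * r)
            bit ^= 1
        kept = [x for i, x in enumerate(bits) if i not in removed_positions]
        if not kept:
            return (0, [])
        out, count = [], 1
        for prev, cur in zip(kept, kept[1:]):
            if cur == prev:
                count += 1
            else:
                out.append(count)
                count = 1
        out.append(count)
        return (kept[0], out)

    # Removing columns 2 and 5 from (0, [2, 3, 1, 1]) = 0011101 leaves 00111:
    print(rle_remove_columns(0, [2, 3, 1, 1], {2, 5}))   # (0, [2, 3])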

Since this cost is O(T(X)), we can push this computation time to the child iterations. In addition, Y can also receive the computation time for building RL(A^{G_Y}) without worsening the time complexity of Y. Next, we analyze the size of the compressed rows more precisely.

Lemma 12.

Let G be a graph and v be a vertex with the minimum degree in G. Then, the length of RL(A^{G ∖ N[v]}_u) is at most min(|RL(A^G_u)|, 2·d_{G ∖ N[v]}(u) + 1) for any vertex u of G ∖ N[v].

Proof.

First, compare the row of u before and after the removal. Removing a set of columns from a row never splits a run; it can only merge consecutive runs of the same symbol. Thus, the length of RL(A^{G ∖ N[v]}_u) is at most |RL(A^G_u)|. Next, note that the number of 1s in A^{G ∖ N[v]}_u is equal to d_{G ∖ N[v]}(u). Hence, from Lemma 9, the length of RL(A^{G ∖ N[v]}_u) is at most 2·d_{G ∖ N[v]}(u) + 1. Therefore, the statement holds. ∎

From Lemma 12, summing the bound over all vertices of G ∖ N[v], the total length of RL(A^{G ∖ N[v]}) is O(n + m). Thus, the compressed matrices stay within linear space, and we can push the corresponding computation time to the child. From the above discussion, we can compute RL(A^{G_Y}) from RL(A^{G_X}) without worsening the time complexity.

5.2 Restoring the input graph of the parent iteration

In this subsection, we consider backtracking from a child iteration Y to its parent X. The goal here is to show that the restoration of G_X from G_Y can be done in O(n) time with O(n + m) total space. To restore G_X, we need to restore the smallest-last ordering in advance. Since we only consider the backtracking from Y to X, if no confusion arises, we identify G with G_X, G′ with G_Y, and v with the vertex picked in X.

5.2.1 Smallest-last orderings

If EIS stores the smallest-last ordering of G_Y separately from that of G_X when making a recursive call and discards it when backtracking, then the total space is O(n²) since the depth of the search tree and the number of vertices of the input graphs can both be up to n. Hence, to achieve O(n + m) space, EIS does not store the orderings entirely, but rather stores them partially. Let σ and σ′ respectively be the smallest-last orderings of G and G′, and v be the vertex such that G′ = G ∖ N[v]. σ|_{G′} denotes the partial smallest-last ordering obtained by removing the vertices in N[v] from σ. For an ordering τ, let τ(i) be the i-th vertex of τ. We say that a sequence is obtained by shifting τ at position i to position j if the i-th vertex is moved to position j and the vertices between the two positions are shifted by one accordingly; this is referred to as a shift operation. Let Shift(σ|_{G′}, σ′) be the sequence of pairs of a vertex and a shift value such that applying the corresponding shift operations to σ|_{G′} yields σ′. It can easily be shown that Shift(σ|_{G′}, σ′) can be obtained in O(n) time since both orderings can be scanned simultaneously from the first element.
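A list-based sketch of the shift operation (names are ours; EIS stores only the (vertex, shift value) pairs rather than whole orderings, and undoes them in reverse order when backtracking):

    def shift(order, i, j):
        # Move the element at position i to position j, sliding the elements
        # in between by one place.
        order = list(order)
        v = order.pop(i)
        order.insert(j, v)
        return order

    def unshift(order, i, j):
        return shift(order, j, i)      # a shift is undone by the reverse shift

    sigma = ["a", "b", "c", "d"]
    tau = shift(sigma, 0, 2)           # 'a' moves back two positions
    assert tau == ["b", "c", "a", "d"]
    assert unshift(tau, 0, 2) == sigma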

Lemma 13.

Let X_1 be the root iteration and X_k be a leaf iteration. Suppose that (X_1, …, X_k) is the path of iterations on 𝒯 such that, for each i < k, X_i is the parent of X_{i+1}. Then, the total length of the shift sequences on the path, ∑_{i=1}^{k−1} |Shift(σ_{X_i}|_{G_{X_{i+1}}}, σ_{X_{i+1}})|, is O(m).

Proof.

If a vertex is shifted, this implies that one of its incident edges has been removed from the graph, since each smallest-last ordering is obtained by some fixed deterministic procedure. Thus, the number of shift operations applied between X_i and X_{i+1} is at most the number of edges removed at X_i. Hence, the total number of shift operations applied on the path is at most the number of edges in the input graph. ∎

In addition, σ_X is obtained by adding the removed vertices back to σ_Y in O(|N[v]|) time if, for each removed vertex, EIS stores its position in σ_X. This needs O(n) space in total since each vertex is removed from a graph at most once on the path from a current iteration to the root iteration. Hence, from the above discussion, we can obtain the following lemma:

Lemma 14.

We can compute σ_X from σ_Y in O(n) time with O(n + m) space in total when backtracking from Y to X.

From Lemma 14, EIS demands O(n + m) space and O(n) time per iteration on average for restoring the smallest-last orderings.

5.2.2 Run-length encoded adjacency matrices

In this subsection, we demonstrate how to restore each row of RL(A^{G_X}). Let u be a vertex in G_X. Recall that v is picked from G_X and added to the solution. If u ∈ N(v), then by just adding the stored row RL(A^{G_X}_u) back, we can restore it, since EIS keeps RL(A^{G_X}_u) until backtracking. In addition, once u is removed from the graph, it will never appear in the input graph of a descendant iteration of X. Thus, EIS requires linear space in total for storing the removed rows.

Suppose u ∉ N[v]. To restore RL(A^{G_X}_u) by adding some vertices back to RL(A^{G_Y}_u), we use the data structures D_u and RL(D_u). D_u represents the neighborhood of u that is removed from G_X to obtain G_Y, and RL(D_u) is its run-length encoded representation. In the following, we fix X, Y, and u, and we abuse notation using D and RL(D) to denote D_u and RL(D_u), respectively.

Their precise definitions are as follows: D is a 0-1 sequence with length d(v) + 1. The i-th element of D is 1 if the i-th vertex of N[v] is adjacent to both u and v in G_X. Otherwise, the i-th element is 0. RL(D) is the run-length encoded sequence of D.