Faster and Enhanced Inclusion-Minimal Cograph Completion

01/21/2020 ∙ by Christophe Crespelle, et al. ∙ University of Bergen

We design two incremental algorithms for computing an inclusion-minimal completion of an arbitrary graph into a cograph. The first one is able to do so while providing an additional property which is crucial in practice to obtain inclusion-minimal completions using as few edges as possible: it is able to compute a minimum-cardinality completion of the neighbourhood of the new vertex introduced at each incremental step. It runs in $O(n+m')$ time, where $m'$ is the number of edges in the completed graph. This matches the complexity of the algorithm in [Lokshtanov, Mancini and Papadopoulos 2010] and positively answers one of their open questions. Our second algorithm improves the complexity of inclusion-minimal completion to $O(n+m\log^2 n)$ when the additional property above is not required. Moreover, we prove that many very sparse graphs, having only $O(n)$ edges, require $\Omega(n^2)$ edges in any of their cograph completions. For these graphs, which include many of those encountered in applications, the improvement we obtain on the complexity scales as $O(n/\log^2 n)$.

1 Introduction

We consider the problem of completion of an arbitrary graph into a cograph, i.e. a graph with no induced path on four vertices. This is a particular case of graph modification problem, in which one wants to perform elementary modifications to an input graph, typically adding and removing edges and vertices, in order to obtain a graph belonging to a given target class of graphs, which satisfies some additional property compared to the input. Ideally, one would like to do so by performing a minimum number of elementary modifications. This is a fundamental problem in graph algorithms, which corresponds to the notion of projection in geometry: given an element $a$ of a ground set $\Omega$ equipped with a distance $d$ and a subset $\mathcal{C} \subseteq \Omega$, find an element of $\mathcal{C}$ that is closest to $a$ for $d$ (here, the distance is the number of elementary modifications performed on the graph). This is also the meaning of modification problems in algorithmic graph theory: they answer the question of how far a given graph is from satisfying a target property.

Here, we consider the modification problem called completion, where only one operation is allowed: adding an edge. In this case, the quantity to be minimised, called the cost of the completion, is the number of edges added, which are called fill edges. Completion problems have proved very useful in algorithmic graph theory and in several other contexts. These problems are closely related to some important graph parameters, such as treewidth [2], and can help to efficiently solve problems that otherwise are hard on the input graph [6]. They are also useful for other algorithmic problems arising in computer science, such as sparse matrix multiplication [50], and in other disciplines such as archaeology [37], molecular biology [7] and genomics, where they played a key role in the mapping of the human genome [26, 36].

Unfortunately, finding the minimum number of edges to be added in a completion problem is NP-hard for most of the target classes of interest (see, e.g., the thesis of Mancini [42] for further discussion and references). To deal with this computational hardness, a number of approaches have been developed in the field. These include approximation [45], restricted input [8, 9, 12, 38, 39, 44], parameterization [13, 22, 35, 43, 54] and inclusion-minimal completions. In the latter approach, one does not ask for a completion having the minimum number of fill edges but only for a set of fill edges which is minimal for inclusion, i.e. which does not contain any proper subset of fill edges whose addition also results in a graph in the target class. This is the approach we follow here. In addition to the case of cographs [41], it has been followed for many other graph classes, including chordal graphs [29], interval graphs [20, 46], proper interval graphs [49], split graphs [30], comparability graphs [28] and permutation graphs [19].

The rationale behind the inclusion-minimal approach is that minimum-cardinality completions are in particular inclusion-minimal. Therefore, if one is able to sample¹ efficiently the space of inclusion-minimal completions, one can compute several of them, pick the one of minimum cost and hope to get a value close to the optimal one. One of the reasons for the success of inclusion-minimal completion algorithms is that this heuristic approach was shown to perform quite well in practice [4, 5]. The second reason for this success, which is a key point for the approach, is that it is usually possible to design algorithms of low complexity for the inclusion-minimal relaxation of completion problems.

¹ Usually, minimal completion algorithms are not fully deterministic. There are some choices to be made arbitrarily along the algorithm and different choices lead to different minimal completions.

1.0.1 Related work.

Modification problems into the class of cographs have already received a great amount of attention [27, 31, 32, 40, 41], as have modification problems into some of its subclasses, such as quasi-threshold graphs [10] and threshold graphs [23]. One reason for this is that cographs are among the most widely studied graph classes. They have been discovered independently in many contexts [15] and they are known to admit very efficient algorithms for problems that are hard in general [11]. Moreover, very recently, cograph modification was shown to be a powerful approach to solve problems arising in complex network analysis, e.g. community detection [34], inference in phylogenomics [32] and modelling [18]. The modification problem into the class of quasi-threshold graphs has also been used, and it revealed that complex networks encountered in some contexts are actually very close to being quasi-threshold graphs [10], in the sense that only a few modifications are needed to transform them into quasi-threshold graphs. This growing need for treating real-world datasets, whose size is often huge, calls for more efficient algorithms, both with regard to the running time and with regard to the quality (number of modifications) of the solution returned by the algorithm.

1.0.2 Our results.

Our main contribution is to design two algorithms for inclusion-minimal cograph completion. The first one (Section 4) is an improvement of the incremental algorithm in [41]. It runs in the same $O(n + m')$ time complexity, where $m'$ is the number of edges in the completed graph, and is in addition able to select one minimum-cardinality completion of the neighbourhood of the new incoming vertex at each incremental step of the algorithm; this positively answers an open question of [41] (Question 3 in the conclusion). It must be clear that this does not guarantee that the completion computed at the end of the algorithm has minimum cardinality, but this feature is highly desirable in practice to obtain completions using as few fill edges as possible.

When this additional feature is not required, our second algorithm (Section 5) solves the inclusion-minimal problem in $O(n + m \log^2 n)$ time, which only depends on the size of the input. Furthermore, we prove that many sparse graphs, namely those having mean degree fixed to a constant, require $\Omega(n^2)$ edges in any of their cograph completions. This result is of interest in itself and implies that, for such graphs, which have only $O(n)$ edges, the improvement of the complexity we obtain with our second algorithm is quite significant: a factor of order $n / \log^2 n$.

2 Preliminaries

All graphs considered here are finite, undirected, simple and loopless. In the following, $G$ is a graph, $V(G)$ (or $V$) is its vertex set and $E(G)$ (or $E$) is its edge set. We use the notation $G = (V, E)$; $n$ stands for the cardinality $|V|$ of $V$ and $m$ for the cardinality $|E|$ of $E$. An edge between vertices $u$ and $v$ will be arbitrarily denoted by $uv$ or $vu$. The neighbourhood of $v$ is denoted by $N_G(v)$ (or $N(v)$) and, for a subset $X \subseteq V$, we define $N_G(X) = \bigcup_{v \in X} N_G(v)$. The subgraph of $G$ induced by some $X \subseteq V$ is denoted by $G[X]$.

For a rooted tree $T$ and a node $u \in T$, we denote by $p(u)$, $C(u)$, $A(u)$ and $D(u)$ the parent and the set of children, ancestors and descendants of $u$ respectively, using the usual terminology and with $u$ belonging to $A(u)$ and $D(u)$. The lowest common ancestor of two nodes $u$ and $v$, denoted $\mathrm{lca}(u, v)$, is the lowest node in $T$ which is an ancestor of both $u$ and $v$. The subtree of $T$ rooted at $u$, denoted by $T_u$, is the tree induced by node $u$ and all its descendants in $T$. We use two other notions of subtree, which we call upper tree and extracted tree. The upper tree of a subset $X$ of nodes of $T$ is the tree, denoted $T^{\uparrow}(X)$, induced by the set of all the ancestors of the nodes of $X$, i.e. $\bigcup_{u \in X} A(u)$. The tree extracted from $X$ in $T$, denoted $T|_X$, is defined as the tree whose set of nodes is $X$ and whose parent relationship is the transitive reduction of the ancestor relationship in $T$. More explicitly, for $u, v \in X$, $v$ is the parent of $u$ in $T|_X$ iff $v$ is a strict ancestor of $u$ in $T$ and there exists no node $w \in X$ such that $w$ is a strict ancestor of $u$ and a strict descendant of $v$ in $T$.
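A minimal Python sketch of the upper tree and of the extracted tree, assuming (our choice, not the paper's) that a rooted tree is stored as a dictionary `parent` mapping every non-root node to its parent; `upper_tree` and `extracted_parents` are hypothetical helper names. The extracted parent of a node is found by climbing until the first strict ancestor that belongs to $X$, which is exactly the transitive reduction described above.

```python
def upper_tree(parent, X):
    """Node set of the upper tree of X: every ancestor (inclusive) of a node of X."""
    nodes = set()
    for y in X:
        # Climb to the root, or stop at a node whose ancestors are already collected.
        while y is not None and y not in nodes:
            nodes.add(y)
            y = parent.get(y)
    return nodes


def extracted_parents(parent, X):
    """Parent map of the tree extracted from X: the parent of y is its closest
    strict ancestor in T that also belongs to X (None if there is none)."""
    X = set(X)
    result = {}
    for y in X:
        a = parent.get(y)
        while a is not None and a not in X:
            a = parent.get(a)
        result[y] = a
    return result


# Toy rooted tree with root r: r -> {b, c}, b -> {d, e}, c -> {f}.
parent = {"b": "r", "c": "r", "d": "b", "e": "b", "f": "c"}
print(sorted(upper_tree(parent, {"d", "f"})))                 # ['b', 'c', 'd', 'f', 'r']
print(extracted_parents(parent, {"r", "d", "e", "f"})["d"])   # r
```

In the toy tree, `c` has a single child, which is fine for a general rooted tree (a cotree would not allow it).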

Figure 1: Example of a labelled construction tree (left), the cograph it represents (centre), and the associated cotree (right). Some vertices are decorated in order to ease the reading.

2.0.1 Cographs.

One of their simplest definitions is that cographs are the graphs that do not admit $P_4$ (the path on four vertices) as an induced subgraph. This shows that the class is hereditary, i.e., an induced subgraph of a cograph is also a cograph. Equivalently, they are the graphs obtained from a single vertex under the closure of the parallel composition and the series composition. The parallel composition of two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ is their disjoint union, i.e., the graph $(V_1 \cup V_2, E_1 \cup E_2)$. The series composition of $G_1$ and $G_2$ is their disjoint union plus all possible edges between vertices of $G_1$ and vertices of $G_2$, i.e., the graph $(V_1 \cup V_2, E_1 \cup E_2 \cup \{v_1 v_2 \mid v_1 \in V_1, v_2 \in V_2\})$. These operations can naturally be extended to an arbitrary finite number of graphs.
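To make the two equivalent definitions concrete, here is a small illustrative Python sketch (ours, with hypothetical function names). It represents a graph as a pair (vertex set, set of `frozenset` edges), implements the parallel and series compositions of vertex-disjoint graphs, and checks $P_4$-freeness by brute force over all ordered 4-tuples of vertices, which is only practical on tiny graphs.

```python
from itertools import permutations


def single(v):
    """The one-vertex graph."""
    return {v}, set()


def parallel(g1, g2):
    """Parallel composition: disjoint union (vertex sets assumed disjoint)."""
    (v1, e1), (v2, e2) = g1, g2
    return v1 | v2, e1 | e2


def series(g1, g2):
    """Series composition: disjoint union plus every edge between V1 and V2."""
    (v1, e1), (v2, e2) = g1, g2
    join = {frozenset((a, b)) for a in v1 for b in v2}
    return v1 | v2, e1 | e2 | join


def has_induced_p4(graph):
    """Brute force, O(n^4): is there an induced path a-b-c-d?"""
    vertices, edges = graph

    def adj(p, q):
        return frozenset((p, q)) in edges

    for a, b, c, d in permutations(vertices, 4):
        if adj(a, b) and adj(b, c) and adj(c, d) and not (
            adj(a, c) or adj(b, d) or adj(a, d)
        ):
            return True
    return False


g = series(parallel(single("a"), single("b")),
           parallel(single("c"), series(single("d"), single("e"))))
print(has_induced_p4(g))   # False: g is built by compositions, hence a cograph
p4 = ({"a", "b", "c", "d"}, {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d")]})
print(has_induced_p4(p4))  # True
```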

This gives a nice representation of a cograph $G$ by a tree whose leaves are the vertices of $G$ and whose internal nodes (non-leaf nodes) are labelled $\mathsf{P}$, for parallel, or $\mathsf{S}$, for series, corresponding to the operations used in the construction of $G$. It is always possible to find such a labelled tree $T$ representing $G$ such that every internal node has at least two children, no two parallel nodes are adjacent in $T$ and no two series nodes are adjacent. This tree is unique [15] and is called the cotree of $G$, see the example in Fig. 1. Note that the subtree $T_u$ rooted at some node $u$ of the cotree $T$ also defines a cograph, namely $G[V(u)]$, where $V(u)$ denotes, in the following, the set of leaves of $T_u$. The adjacencies between vertices of a cograph can easily be read on its cotree, in the following way.

Remark 1

Two vertices $u$ and $v$ of a cograph $G$ having cotree $T$ are adjacent iff the lowest common ancestor $\mathrm{lca}(u, v)$ of leaves $u$ and $v$ in $T$ is a series node. Otherwise, if $\mathrm{lca}(u, v)$ is a parallel node, $u$ and $v$ are not adjacent.
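Remark 1 translates into a few lines of code. The Python sketch below assumes a cotree encoded as nested tuples, a hypothetical format chosen for illustration only: a leaf is a vertex name (a `str`), and an internal node is a tuple whose first entry is `'P'` or `'S'`, followed by its children. The example cotree `T` encodes the series composition of the edgeless graph on $\{a, b\}$ with the parallel composition of $c$ and the edge $de$.

```python
def root_paths(tree, path=()):
    """Map each vertex to the tuple of internal nodes on its root-to-leaf path."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for child in tree[1:]:
        paths.update(root_paths(child, path + (tree,)))
    return paths


def adjacent(tree, u, v):
    """Remark 1: distinct vertices u, v are adjacent iff lca(u, v) is a series node."""
    paths = root_paths(tree)
    lca = None
    for a, b in zip(paths[u], paths[v]):
        if a is not b:      # the two root paths diverge here
            break
        lca = a             # deepest common internal node so far
    return lca[0] == "S"


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(adjacent(T, "a", "b"), adjacent(T, "a", "d"))  # False True
```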

2.0.2 The incremental approach.

Our approach for computing a minimal cograph completion of an arbitrary graph $G$ is incremental, in the sense that we consider the vertices of $G$ one by one, in an arbitrary order $v_1, \dots, v_n$, and at step $i$ we compute a minimal cograph completion $H_i$ of $G_i = G[\{v_1, \dots, v_i\}]$ from a minimal cograph completion $H_{i-1}$ of $G_{i-1}$, by adding only edges incident to $v_i$. This is possible thanks to the following observation, which is general to all hereditary graph classes that are also stable under addition of a universal vertex, a property that holds in particular for cographs.

Lemma 1 (see e.g. [46])

Let $G$ be an arbitrary graph and let $H$ be a minimal cograph completion of $G$. Consider a new vertex $x$ adjacent to an arbitrary subset $S$ of vertices and denote by $G'$ and $H'$ the graphs obtained by adding $x$ to $G$ and $H$ respectively. Then, there exists a subset $F \subseteq V(H) \setminus S$ of vertices such that $H' + \{xv \mid v \in F\}$ is a cograph. Moreover, for any such set $F$ which is minimal for inclusion, $H' + \{xv \mid v \in F\}$ is an inclusion-minimal cograph completion of $G'$. We call such completions (minimal) constrained completions of $H'$.

For any subset $X$ of vertices, we say that we fill $X$ if we make all the vertices of $X$ adjacent to $x$ in the completion of $G'$. The edges added in a completion are called fill edges and the cost of the completion is its number of fill edges.

2.0.3 The new problem.

From now on, we consider the following problem, with slightly modified notation. $G$ is a cograph, and $G'$ is the graph obtained by adding to $G$ a new vertex $x$ adjacent to some arbitrary subset $S$ of vertices of $G$. Both our algorithms take as input the cotree $T$ of $G$ and the neighbourhood $S$ of the new vertex $x$. They compute the set $S'$ of neighbours of $x$ in some minimal constrained cograph completion $H'$ of $G'$, i.e. one obtained by adding only edges incident to $x$ (cf. Lemma 1). Then, the cotree $T$ of $G$ is updated under the insertion of $x$ with neighbourhood $S'$, in order to obtain the cotree of $H'$, which will serve as input in the next incremental step.

We now introduce some definitions and characterisations we use in the following.

Definition 1 (Full, hollow, mixed)

Let $G$ be a cograph and let $x$ be a vertex to be inserted in $G$ with neighbourhood $S$. A subset $X \subseteq V(G)$ is full if $X \subseteq S$, hollow if $X \cap S = \emptyset$ and mixed if $X$ is neither full nor hollow. When $X$ is full or hollow, we say that $X$ is uniform.

We use these notions for nodes $u$ of the cotree as well, referring to their associated set of vertices $V(u)$. We denote by $C^{+}(u)$ the subset of non-hollow children of a node $u$.

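Theorem 2.1 below gives a characterisation of the neighbourhood $S$ of a new vertex $x$ so that $G + x$ is a cograph, where $G + x$ denotes the graph obtained by inserting $x$ in $G$ with neighbourhood $S$.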

Theorem 2.1 ([16, 17])

(Cf. Fig. 2) Let $G$ be a cograph with cotree $T$ and let $x$ be a vertex to be inserted in $G$ with neighbourhood $S$. If the root of $T$ is mixed, then $G + x$ is a cograph iff there exists a mixed node $u$ of $T$ such that:

  1. all children of $u$ are uniform and

  2. for all vertices $v \notin V(u)$, $v \in S$ iff $\mathrm{lca}(u, v)$ is a series node.

Moreover, when such a node exists, it is unique and it is called the insertion node.
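Theorem 2.1 suggests a simple top-down test, sketched below in Python on a hypothetical nested-tuple cotree encoding (leaves are vertex names, internal nodes are `('P', ...)` or `('S', ...)` tuples). Since every mixed node must be an ancestor of the insertion node (the subtrees hanging off its root path are uniform by condition 2), the sketch follows the unique mixed child from the root and checks condition 2 on the siblings met along the way. Leaf sets are recomputed naively, so this illustrates correctness only, not the efficient procedure of [16, 17].

```python
def leaves(t):
    """V(t): the set of vertices (leaves) below cotree node t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t[1:]))


def status(t, S):
    """'full', 'hollow' or 'mixed' with respect to the neighbourhood S."""
    V = leaves(t)
    if V <= S:
        return "full"
    return "hollow" if not (V & S) else "mixed"


def insertion_node(tree, S):
    """Return the insertion node of Theorem 2.1, or None if G + x is not a
    cograph. Assumes that the root is mixed with respect to S (Remark 2)."""
    u = tree
    while True:
        kids = u[1:]
        mixed = [c for c in kids if status(c, S) == "mixed"]
        if not mixed:
            return u        # condition 1: every child of u is uniform
        if len(mixed) > 1:
            return None     # mixed nodes must all lie on a single root path
        # Vertices below the other children have u as lca with any node below
        # the mixed child, so condition 2 requires them to be in S iff u is series.
        want = "full" if u[0] == "S" else "hollow"
        if any(status(c, S) != want for c in kids if c is not mixed[0]):
            return None
        u = mixed[0]


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(insertion_node(T, {"a", "b", "d"}))  # ('S', 'd', 'e')
print(insertion_node(T, {"a"}))            # None
```

With $S = \{a\}$ there is indeed an induced $P_4$, namely $x, a, c, b$, so no insertion node exists.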

Remark 2

In all the rest of the article, we do not consider the case where the new vertex $x$ is adjacent to none of the vertices of $G$ or to all of them. Therefore, the root of the cotree $T$ of $G$ is always mixed wrt. $S$.

Figure 2: Illustration of Theorem 2.1: characterisation of the neighbourhood $S$ of a new vertex $x$ so that $G + x$ is a cograph. The nodes and triangles in black (resp. white) correspond to the parts of the tree that are full wrt. $S$ (resp. hollow wrt. $S$). The insertion node $u$, which is mixed, appears in grey.

The reason for this is that the case where the root is uniform is straightforward: the only minimal completion of $G'$ adds an empty set of edges and the update of the cotree $T$ is very simple. By definition, inserting $x$ in $G$ with its neighbourhood $S'$ in some constrained cograph completion $H'$ of $G'$ results in a cograph, namely $H'$. Therefore, to any such completion $H'$ we can associate one insertion node, which is uniquely defined by Theorem 2.1 and the restriction stated in Remark 2.

Definition 2

Let $G$ be a cograph with cotree $T$ and let $x$ be a vertex to be inserted in $G$. A node $u$ of $T$ is called a completion-minimal insertion node iff there exists a minimal constrained completion $H'$ of $G'$ such that $u$ is the insertion node associated to $H'$.

From now and until the end of the article, $G$ is a cograph, $T$ is its cotree, $x$ is a vertex to be inserted in $G$ and we consider only constrained cograph completions of $G'$. We therefore do not mention this systematically.

3 Characterisation of minimal constrained completions

The goal of this section is to give necessary and sufficient conditions for a node $u$ of $T$ to be a completion-minimal insertion node. From Theorem 2.1, the subtrees hanging from the parallel strict ancestors of the insertion node, apart from those containing the insertion node, must be hollow. As we can modify the neighbourhood of $x$ only by adding edges, it follows that if $u$ is the insertion node of some completion, then $u$ is eligible, as defined below.

Definition 3 (eligible)

A node $u$ of $T$ is eligible iff for all the strict ancestors $v$ of $u$ that are parallel nodes, all the children of $v$ distinct from its unique child that is an ancestor of $u$ are hollow.

When a node $u$ is eligible, there is a natural way to obtain a completion of the neighbourhood of $x$, which we call the completion anchored at $u$.

Definition 4 (Completion anchored at $u$)

Let $u$ be an eligible node of $T$. The completion anchored at $u$, denoted $H_u$, is the one obtained by making $x$ adjacent to all the vertices $v \notin V(u)$ whose lowest common ancestor with $u$ is a series node and by filling all the children of $u$ that are non-hollow. We denote by $S_u$ the resulting neighbourhood of $x$.
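Definitions 3 and 4 can be checked together by walking the root path of $u$, as in the Python sketch below. It uses a hypothetical nested-tuple cotree encoding (leaves are vertex names, internal nodes are `('P', ...)` or `('S', ...)` tuples), and `anchored_completion` and `root_path` are our own names. Parallel strict ancestors must have hollow siblings, series strict ancestors contribute all the vertices hanging off them, and the non-hollow children of $u$ are filled.

```python
def leaves(t):
    """V(t): the set of vertices below cotree node t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t[1:]))


def root_path(tree, u):
    """Nodes from the root down to u (inclusive), or None if u is absent.
    Subtrees are compared by value, which is safe since their leaf sets differ."""
    if tree == u:
        return [tree]
    if isinstance(tree, str):
        return None
    for child in tree[1:]:
        rest = root_path(child, u)
        if rest is not None:
            return [tree] + rest
    return None


def anchored_completion(tree, S, u):
    """Neighbourhood S_u of x in the completion anchored at u (Definition 4),
    or None if u is not eligible (Definition 3)."""
    path = root_path(tree, u)
    new_S = set(S)
    for a, below in zip(path, path[1:]):     # a is a strict ancestor of u
        for sib in a[1:]:
            if sib == below:
                continue
            if a[0] == "P":
                if leaves(sib) & S:
                    return None               # non-hollow sibling: u is not eligible
            else:
                new_S |= leaves(sib)          # lca(u, v) = a is a series node
    if not isinstance(u, str):
        for w in u[1:]:
            if leaves(w) & S:
                new_S |= leaves(w)            # fill every non-hollow child of u
    return new_S


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(sorted(anchored_completion(T, {"a"}, T)))                # ['a', 'b']
print(sorted(anchored_completion(T, {"a"}, ("P", "a", "b"))))  # ['a', 'c', 'd', 'e']
```

With $S = \{a\}$, anchoring at the root costs one fill edge ($xb$), while anchoring at the node $(\mathsf{P}, a, b)$ costs three; neither neighbourhood contains the other, so several minimal completions with different costs can coexist.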

The completion anchored at some eligible node $u$ may not be minimal but, on the other hand, every minimal completion $H'$ is the completion anchored at some eligible node, namely the insertion node of $H'$.

Lemma 2

For any completion-minimal insertion node $u$ of $T$, there exists a unique minimal completion $H'$ of $G'$ such that $u$ is the insertion node associated to $H'$, and this unique completion is the completion anchored at $u$.

Proof

First, note that the modified neighbourhood of $x$ in $V(G) \setminus V(u)$ is given by Theorem 2.1 and is the same for every completion having $u$ as insertion node. Moreover, as the children of $u$ in $T$ are uniform in any such completion, any non-hollow child of $u$ must be filled. Then, the completion $H_u$ anchored at $u$, defined by the modified neighbourhood $S_u$ of $x$, is included in every completion having $u$ as insertion node. As there exists some minimal completion $H'$ having $u$ as insertion node, then, from Theorem 2.1, $u$ is left mixed after completion and so has some hollow child with regard to the neighbourhood $S'$ of $x$ in $H'$. Consequently, $u$ is also mixed with regard to $S_u$. Finally, since the insertion of $x$ with neighbourhood $S_u$ satisfies conditions 1 and 2 of Theorem 2.1, the completion $H_u$ has $u$ as insertion node. And since $H_u$ is included in all such completions, it follows that $H_u$ is the unique minimal completion having $u$ as insertion node.

To characterise completion-minimal insertion nodes, we will use the notion of forced nodes. Their main property (see Lemma 4 below) is that they are full in any completion of $G'$.

Definition 5 (Completion-forced)

Let $G$ be a cograph with cotree $T$ and let $x$ be a vertex to be inserted in $G$. A completion-forced (or simply forced) node $u$ is inductively defined as a node satisfying at least one of the three following conditions:

  1. $u$ is full, or

  2. $u$ is a parallel node with all its children non-hollow, or

  3. $u$ is a series node with all its children completion-forced.
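A direct recursive transcription of Definition 5 is sketched below in Python, on a hypothetical nested-tuple cotree encoding chosen for illustration (leaves are vertex names, internal nodes are `('P', ...)` or `('S', ...)` tuples); `is_forced` is our own name.

```python
def leaves(t):
    """V(t): the set of vertices below cotree node t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t[1:]))


def is_forced(t, S):
    """Definition 5, evaluated recursively."""
    if isinstance(t, str):
        return t in S                              # a leaf is forced iff it is full
    if leaves(t) <= S:
        return True                                # Condition 1: t is full
    if t[0] == "P":
        return all(leaves(c) & S for c in t[1:])   # Condition 2: no hollow child
    return all(is_forced(c, S) for c in t[1:])     # Condition 3: all children forced


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(is_forced(T, {"a", "b", "c", "d"}))  # True
```

For this input the root is forced, so by Lemma 3 the only cograph completion makes $x$ adjacent to all five vertices.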

Lemma 3

Let $G$ be a cograph with cotree $T$ and let $x$ be a vertex to be inserted in $G$. A node $u$ of $T$ is completion-forced iff there exists a unique cograph completion of $G[V(u)] + x$, which is the one where all missing edges between $x$ and $V(u)$ are added.

Proof

Let us show the result by induction on $|V(u)|$. First, consider a completion-forced node $u$ of $T$ and a completion $H'$ of $G[V(u)] + x$. If $u$ satisfies Condition 3 of Definition 5, then, by induction hypothesis, all its children are full in $H'$ (as $H'[V(w) \cup \{x\}]$ is also a cograph completion of $G[V(w)] + x$, for any child $w$ of $u$) and so is $u$. If $u$ satisfies Condition 1, then since $u$ is full before completion, it is also full after. Consider now the case where $u$ is completion-forced because it satisfies Condition 2 of Definition 5, i.e. $u$ is parallel and all its children are non-hollow.

Assume for contradiction that $H'$ does not fill $u$. Then, denote by $v$ the insertion node associated to $H'$ in $T_u$. Theorem 2.1 implies that $v$ is eligible, and since all the children of $u$ are non-hollow, it follows that $v$ is not a strict descendant of $u$. Consequently, $v = u$ and, since all the children of $u$ are non-hollow, Lemma 2 implies that $H'$ fills all of them, and so fills $u$ as well: contradiction. Thus, $u$ is filled in any completion of $G[V(u)] + x$ and therefore, there exists a unique such completion.

Conversely, consider a non-completion-forced node $u$ of $T$. If $u$ is a series node, then $u$ has at least one non-completion-forced child $w$. By induction hypothesis, there exists a completion $H_w$ of $G[V(w)] + x$ that does not fill $w$. Then, the completion of $G[V(u)] + x$ that coincides with $H_w$ on $V(w)$ and that fills all the other children of $u$ is a cograph completion of $G[V(u)] + x$ that does not fill $u$. Now, if $u$ is a parallel node, then $u$ has at least one hollow child $w$. As $u$ is clearly eligible in $T_u$, the cograph completion $H_u$ anchored at $u$ is properly defined. Since $H_u$ leaves $w$ hollow, $H_u$ does not fill $u$, which completes the proof.

Lemma 4

Any completion-forced node $u$ of $T$ is filled in all the completions of $G'$.

Proof

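This is a direct consequence of Lemma 3. Indeed, since the class of cographs is hereditary, any cograph completion of $G'$ restricted to $V(u) \cup \{x\}$ is a cograph completion of $G[V(u)] + x$. Moreover, from Lemma 3, there exists a unique cograph completion of $G[V(u)] + x$ and this completion makes $u$ full.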

The next remark directly follows from Theorem 2.1 and Lemma 2.

Remark 3

The insertion node $u$ of any minimal completion of $G'$ has at least one hollow child and at least one non-hollow child. Therefore, $u$ is non-hollow and non-completion-forced.

We now characterise the nodes $u$ that contain some completion-minimal insertion node in their subtree $T_u$ (including $u$ itself). In our algorithms, we will use this characterisation to decide whether we have to explore the subtree of a given node.

Lemma 5

For any node $u$ of $T$, $T_u$ contains some completion-minimal insertion node iff $u$ is eligible, non-hollow and non-completion-forced.

Proof

If $u$ is eligible, non-hollow and non-completion-forced, consider a node $v$ of $T_u$ with these three properties which is as low as possible in $T_u$. If $v$ is a series node, as $v$ is eligible, so are all its children. By the choice of $v$, it follows that all the children of $v$ are either completion-forced or hollow. Since $v$ is non-completion-forced, at least one of its children is hollow, and since $v$ is non-hollow, at least one of its children is non-hollow. The same holds if $v$ is a parallel node: since $v$ is non-completion-forced, at least one of its children is hollow, and since $v$ is non-hollow, at least one of its children is non-hollow. Then, in both cases, in the completion $H_v$ anchored at $v$, $v$ is mixed and so is $u$. Consequently, there exists a minimal completion $H'$ included in $H_v$, and necessarily $u$ is mixed in $H'$ as well (as $S \subseteq S' \subseteq S_v$, where $S'$ and $S_v$ are the neighbourhoods of $x$ in $H'$ and $H_v$). From Theorem 2.1, it is straightforward to see that all minimal completions having an insertion node out of $T_u$ leave $u$ full or hollow. It follows that the insertion node associated to $H'$ belongs to $T_u$.

Now, conversely, if there exists $v \in T_u$ which is a completion-minimal insertion node, let us denote by $H_v$ the minimal completion anchored at $v$. From Remark 3, $v$ is non-hollow in $G'$, and so is $u$. Moreover, from Theorem 2.1, it is straightforward to see that $v$ is eligible, and so is $u$. From Theorem 2.1 again, $v$ is mixed in $H_v$ and so is $u$. Then, Lemma 4 implies that $u$ is non-completion-forced, which completes the proof of the lemma.

Lemma 6 below gives additional conditions for $u$ itself to be an insertion node.

Lemma 6

A node $u$ of $T$ is a completion-minimal insertion node iff $u$ is eligible, non-hollow and non-completion-forced and satisfies in addition one of the two following conditions:

  1. $u$ is a series node and has at least one hollow child, or

  2. $u$ is a parallel node and has no eligible non-completion-forced child.

Proof

We first show that if the conditions of the lemma are satisfied, then $u$ is a completion-minimal insertion node. By Lemma 2, the only candidate for a minimal completion having $u$ as insertion node is the completion $H_u$ anchored at $u$, which is properly defined here as $u$ is eligible (see Definition 4). We will now show that $H_u$ is minimal.

If $u$ is a parallel node, as $u$ is non-completion-forced, $u$ has at least one hollow child $w$, and the same holds if $u$ is a series node because of Condition 1. From Definition 4, $w$ is hollow in $H_u$. Let $H'$ be a minimal completion of $G'$ and let $v$ be its insertion node. We will show that $H'$ is not strictly included in $H_u$. From Lemma 2, if $v = u$, then $H' = H_u$; therefore, from now on, we consider only the case where $v \neq u$. Note that, from Theorem 2.1, the only nodes of $T$ that remain mixed after completion into $H'$ are the ancestors of $v$, and all the non-hollow nodes of $T$ that are not ancestors of $v$ are filled in $H'$. Then, if $v$ is not a descendant of $u$, node $u$ is filled in $H'$ and so is node $w$. It follows that, if $v$ is not a descendant of $u$, $H'$ is not included in $H_u$.

Now, consider the case where $v$ is a strict descendant of $u$ (remember that $v \neq u$) and suppose for contradiction that $u$ is a parallel node. Let $u'$ be the child of $u$ that is an ancestor of $v$. Since $T_{u'}$ contains $v$, Lemma 5 implies that $u'$ is eligible. Since $v$ is eligible and $u$ is a parallel strict ancestor of $v$, all the children of $u$ except $u'$ are hollow. Then, from Condition 2 of the present lemma, it follows that $u'$ must be completion-forced. Lemma 4 implies that $u'$, and so $v$, is filled in $H'$. This contradicts the fact that $v$ is the insertion node, as, from Theorem 2.1, this node remains mixed after completion. Thus, $u$ is not a parallel node, but a series node. From Remark 3, $v$ is non-hollow in $G'$ and consequently, $v$ is not a descendant of $w$ (the hollow child of $u$). Since $u$ is a series node, it follows that $w$ is filled in $H'$, which is therefore not included in $H_u$. This completes the proof that the conditions of the lemma are sufficient.

Let us now show that they are necessary. Consider a completion-minimal insertion node $u$ and let us show that it satisfies the conditions of the lemma. Firstly, because $T_u$ contains some completion-minimal insertion node, namely $u$ itself, Lemma 5 implies that $u$ is non-hollow, eligible and non-completion-forced. Let $H_u$ be the completion anchored at $u$. From Theorem 2.1, $u$ is mixed in $H_u$. Then, from Lemma 2, it follows that $u$ has at least one hollow child. Hence, if $u$ is a series node, Condition 1 is satisfied.

We now show that if $u$ is parallel and does not satisfy Condition 2, then the completion $H_u$ anchored at $u$ is not minimal, which implies that $u$ is not a completion-minimal insertion node. Since $u$ does not satisfy Condition 2, it has an eligible non-completion-forced child $w$. As $u$ is parallel, the eligibility of $w$ implies that all the other children of $u$ are hollow and, since $u$ is non-hollow, $w$ is its unique non-hollow child. As $w$ is eligible, non-hollow and non-completion-forced, it follows from Lemma 5 that $T_w$ contains some completion-minimal insertion node. The corresponding minimal completion $H'$ is included in $H_u$, and even strictly included, as $H'$ leaves $w$ mixed while $H_u$ fills it (since $w$ is not hollow). Thus, $H_u$ is not minimal. By contraposition, if $u$ is a completion-minimal insertion node, Condition 2 is satisfied. This completes the proof of the lemma.
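Lemmas 5 and 6 combine into a pruned top-down enumeration of all completion-minimal insertion nodes. The Python sketch below uses a hypothetical nested-tuple cotree encoding (leaves are vertex names, internal nodes are `('P', ...)` or `('S', ...)` tuples) and recomputes leaf sets naively, so it only illustrates the logic, not the linear-time bookkeeping of Section 4. Eligibility is propagated from parent to child: a child of a series node is eligible iff its parent is, and a child of a parallel node is eligible iff its parent is and all its siblings are hollow.

```python
def leaves(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t[1:]))


def hollow(t, S):
    return not (leaves(t) & S)


def forced(t, S):
    """Definition 5. A full internal node passes Condition 2 or 3 recursively,
    so Condition 1 only needs an explicit check on leaves."""
    if isinstance(t, str):
        return t in S
    if t[0] == "P":
        return all(not hollow(c, S) for c in t[1:])
    return all(forced(c, S) for c in t[1:])


def minimal_insertion_nodes(tree, S):
    """All completion-minimal insertion nodes (root assumed mixed)."""
    found = []

    def visit(u, eligible):
        # Lemma 5: only eligible, non-hollow, non-forced subtrees can contain one.
        if isinstance(u, str) or not eligible or hollow(u, S) or forced(u, S):
            return
        kids = u[1:]
        if u[0] == "S":
            kid_ok = [True] * len(kids)
            minimal = any(hollow(c, S) for c in kids)               # Condition 1
        else:
            kid_ok = [all(hollow(d, S) for j, d in enumerate(kids) if j != i)
                      for i in range(len(kids))]
            minimal = not any(ok and not forced(c, S)
                              for c, ok in zip(kids, kid_ok))        # Condition 2
        if minimal:
            found.append(u)
        for c, ok in zip(kids, kid_ok):
            visit(c, ok)

    visit(tree, True)
    return found


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(minimal_insertion_nodes(T, {"a", "b", "d"}))  # [('S', 'd', 'e')]
```

For $S = \{a\}$ the same call returns two nodes, the root and $(\mathsf{P}, a, b)$, matching the two incomparable minimal completions of this example.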

4 An algorithm with incremental minimum

In this section, we design an incremental algorithm whose overall time complexity is $O(n + m')$, where $m'$ is the number of edges in the output completed cograph. We concentrate on one incremental step, whose input is the cotree $T$ of some cograph $G$ (the completion computed so far) and a new vertex $x$ together with the list of its neighbours $S$. Each node $u$ stores its number of children $|C(u)|$ and the number $|V(u)|$ of leaves in $T_u$. One incremental step takes $O(|S'|)$ time, where $|S'|$ is the degree of $x$ in the completion of $G'$ computed by the algorithm. Within this complexity, our algorithm scans all the minimal completions of the neighbourhood of $x$ and selects one of minimum cardinality. Our description is in two steps.

4.0.1 First step: collecting information on nodes of .

In this step, for each non-hollow node $u$ of $T$ we determine the following information: i) the list $C^{+}(u)$ of its non-hollow children, ii) the number $s(u) = |S \cap V(u)|$ of neighbours of $x$ in $V(u)$ and iii) whether it is completion-forced or not. To this purpose, we perform two bottom-up searches of $T$ from the leaves of $T$ that are in $S$ up to the root of $T$. Consequently, each of these searches discovers exactly the set of non-hollow nodes of $T$ (whose number, as we show later, is linear in the degree of $x$ in the completion).

In the first search, we label each node encountered as non-hollow, and we build the list of its non-hollow children and count them. The nodes that are not visited, and therefore not labelled, are exactly the hollow nodes of $T$.

In the second search, for each non-hollow node $u$ we determine the rest of its information, that is ii) the number $s(u)$ of neighbours of $x$ in $V(u)$ and iii) whether it is completion-forced or not.

It is straightforward to get this information for the leaves $u$ of $T$ that belong to $S$: there is exactly one neighbour of $x$ in $V(u)$, and $u$ is forced. Then, all the leaves in $S$ forward their information to their parents in an asynchronous way. Along this process, each non-hollow node $u$ of $T$ is able to know whether it has received the information from all its non-hollow children, as we determined their number in the first search. When $u$ has received the information from all its non-hollow children, $u$ is able to determine its own information: $u$ computes $s(u)$ as the sum of $s(w)$ over all its non-hollow children $w$, and determines whether it is completion-forced as follows. If $u$ is parallel, then $u$ is completion-forced iff all its children are non-hollow, and if $u$ is series, then $u$ is completion-forced iff all its children are completion-forced. Then, $u$ forwards its information to its parent and the process goes on until the root of the tree has itself determined its information. At that time, the process ends, as all the non-hollow nodes of $T$ have already determined their information.
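The first step can be sketched sequentially in Python as follows. This is an illustration under our own assumptions: `Node` is a hypothetical cotree class with parent pointers, `leaf_of` maps each vertex to its leaf, and the asynchronous forwarding of the second search is emulated by a FIFO queue that receives a node once all its non-hollow children have been processed. Only nodes reachable upwards from leaves of $S$ are ever touched.

```python
from collections import deque


class Node:
    """Cotree node with a parent pointer; label is 'P', 'S' or 'leaf'."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self


def first_step(leaf_of, S):
    """Return (nh, s, forced) for every non-hollow node u: its non-hollow
    children, the number s[u] of vertices of S below u, and the forced flag."""
    # First search: climb from each leaf of S until an already labelled node.
    nh = {}
    for v in S:
        node = leaf_of[v]
        nh[node] = []
        while node.parent is not None:
            p = node.parent
            if p in nh:
                nh[p].append(node)
                break
            nh[p] = [node]
            node = p
    # Second search: process a node once all its non-hollow children are done.
    pending = {u: len(kids) for u, kids in nh.items()}
    s, forced = {}, {}
    queue = deque(leaf_of[v] for v in S)
    while queue:
        u = queue.popleft()
        if u.label == "leaf":
            s[u], forced[u] = 1, True
        else:
            s[u] = sum(s[w] for w in nh[u])
            all_non_hollow = len(nh[u]) == len(u.children)
            if u.label == "P":
                forced[u] = all_non_hollow
            else:   # hollow children are never forced
                forced[u] = all_non_hollow and all(forced[w] for w in nh[u])
        p = u.parent
        if p is not None:
            pending[p] -= 1
            if pending[p] == 0:
                queue.append(p)
    return nh, s, forced


a, b, c, d, e = (Node("leaf") for _ in range(5))
P1, Sde = Node("P", [a, b]), Node("S", [d, e])
P2 = Node("P", [c, Sde])
root = Node("S", [P1, P2])
nh, s, forced = first_step({"a": a, "b": b, "c": c, "d": d, "e": e}, {"a", "b", "d"})
print(s[root], forced[P1], forced[root])  # 3 True False
```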

4.0.2 Second step: finding all completion-minimal insertion nodes of .

We search the set of all non-hollow, eligible and non-completion-forced nodes of $T$. For each of them, we determine whether it is a completion-minimal insertion node and, if so, we compute the number of edges to be added in its associated minimal completion. Then, at the end of the search, we select the completion of minimum cardinality.

Since all the ancestors of a non-hollow, eligible, non-completion-forced node also satisfy these three properties, the part of $T$ we have to search is a connected subset of nodes containing the root. Then, our search starts by determining whether the root is non-completion-forced. If the root is completion-forced, we are done: there exists one unique minimal completion of $G'$, which is obtained by adding all missing edges between $x$ and the vertices of $G$.

Otherwise, if the root is non-completion-forced (it is always eligible, by definition, and non-hollow, from Remark 2), we start our search. For all the non-hollow children of the current node (we built their list in the first step), we check whether they are eligible and non-completion-forced, and we search, in a depth-first manner, the subtrees of those for which the test is positive (cf. Lemma 5).

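During this depth-first search, we compute for each node $u$ encountered the number of edges, denoted $\mathrm{out}(u)$, to be added between $x$ and the vertices of $V(G) \setminus V(u)$ in the completion anchored at $u$, starting from $\mathrm{out}(r) = 0$ for the root $r$. Writing $p = p(u)$ for the parent of $u$ and $s(w) = |S \cap V(w)|$, this can be computed during the search as follows:

  • if $p$ is a parallel node (necessarily eligible, since we parse only eligible nodes), then $\mathrm{out}(u) = \mathrm{out}(p)$; and

  • if $p$ is a series node, then $\mathrm{out}(u) = \mathrm{out}(p) + \sum_{w \in C(p) \setminus \{u\}} \bigl(|V(w)| - s(w)\bigr)$.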

We also determine whether $u$ is a completion-minimal insertion node by testing whether it satisfies Condition 1 or 2 of Lemma 6. This can be done thanks to the information collected in the first step. Importantly for the complexity, note that Condition 2 of Lemma 6 can be tested by scanning only the non-hollow children of $u$. If $u$ is a completion-minimal insertion node, then we determine the number of edges, denoted $\mathrm{cost}(u)$, to be added in the completion anchored at $u$, as $\mathrm{cost}(u) = \mathrm{out}(u) + \sum_{w \in C^{+}(u)} \bigl(|V(w)| - s(w)\bigr)$, where $C^{+}(u)$ is the set of non-hollow children of $u$.

From Lemma 6, completion-minimal insertion nodes are non-hollow, eligible and non-completion-forced. Therefore, our search discovers all the completion-minimal insertion nodes and computes the cost of their associated minimal completions. We keep track of the minimum-cost completion encountered during the search and output the corresponding insertion node at the end. Finally, we need to update the cotree for the next incremental step of the algorithm (as depicted in Figure 3). To this purpose, we use the algorithm of [16], as explained below.
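Putting the second step together, here is a Python sketch on a hypothetical nested-tuple cotree encoding (leaves are vertex names, internal nodes are `('P', ...)` or `('S', ...)` tuples); `best_minimal_completion` is our own name. For brevity, $s(u)$, $|V(u)|$ and the forced flag are collected by a plain recursion over the whole tree rather than by the bottom-up searches restricted to non-hollow nodes, so this sketch does not achieve the stated complexity; the search itself follows Lemmas 5 and 6, and uses the closed form of $\mathrm{out}(u)$ for series parents given in the complexity analysis.

```python
def best_minimal_completion(tree, S):
    """Return (cost, insertion node) of a minimum-cost minimal completion, or
    (None, None) when the root is completion-forced (x then becomes universal)."""
    info = {}   # node -> (s(u), |V(u)|, forced?)

    def collect(t):
        if isinstance(t, str):
            info[t] = (int(t in S), 1, t in S)
        else:
            res = [collect(c) for c in t[1:]]
            s = sum(r[0] for r in res)
            size = sum(r[1] for r in res)
            if t[0] == "P":
                forced = all(r[0] > 0 for r in res)   # all children non-hollow
            else:
                forced = all(r[2] for r in res)       # all children forced
            info[t] = (s, size, forced)
        return info[t]

    collect(tree)

    def s(t):
        return info[t][0]

    def size(t):
        return info[t][1]

    def forced(t):
        return info[t][2]

    if forced(tree):
        return None, None
    best = (None, None)
    stack = [(tree, 0)]   # eligible, non-hollow, non-forced nodes with out(u)
    while stack:
        u, out = stack.pop()
        kids = u[1:]
        non_hollow = [w for w in kids if s(w) > 0]
        if u[0] == "S":
            eligible_kids = non_hollow                       # hollow ones are pruned
            minimal = len(non_hollow) < len(kids)            # Lemma 6, Condition 1
        else:
            # A child of a parallel node is eligible iff all its siblings are hollow.
            eligible_kids = non_hollow if len(non_hollow) == 1 else []
            minimal = all(forced(w) for w in eligible_kids)  # Lemma 6, Condition 2
        if minimal:
            cost = out + sum(size(w) - s(w) for w in non_hollow)
            if best[0] is None or cost < best[0]:
                best = (cost, u)
        for w in eligible_kids:
            if not forced(w):        # non-hollow leaves are full, hence forced
                if u[0] == "P":
                    w_out = out
                else:                # closed form of the sum over the siblings of w
                    w_out = out + (size(u) - size(w)) - (s(u) - s(w))
                stack.append((w, w_out))
    return best


T = ("S", ("P", "a", "b"), ("P", "c", ("S", "d", "e")))
print(best_minimal_completion(T, {"a"})[0])  # 1
```

On $S = \{a\}$ the sketch selects the root as insertion node (one fill edge, $xb$), whereas the other minimal completion, anchored at $(\mathsf{P}, a, b)$, costs three.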

4.0.3 Complexity.

The key to the time complexity is that we search and manipulate only the set of non-hollow nodes of $T$. For each of them, $u$, we need to scan the list of its non-hollow children and to perform a constant number of tests and operations that can all be done in $O(1)$ time (thanks to the information collected in the first step). For example, when we need to test the number of hollow children of $u$, we avoid counting them by computing their number as $|C(u)| - |C^{+}(u)|$. The computation of $\mathrm{out}(u)$ can also be done in $O(1)$ time by noting that the sum $\sum_{w \in C(p) \setminus \{u\}} (|V(w)| - s(w))$, where $p$ is the parent of $u$, can rather be computed as $(|V(p)| - |V(u)|) - (s(p) - s(u))$. Therefore, treating a node $u$ takes $O(1 + |C^{+}(u)|)$ time, and the execution of the two steps of the algorithm takes time linear in the number of non-hollow nodes of $T$.
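For completeness, here is the one-line derivation of the closed form used for $\mathrm{out}(u)$, with $s(w) = |S \cap V(w)|$. It only uses the fact that the sets $V(w)$, $w \in C(p)$, partition $V(p)$, so that both $|V(\cdot)|$ and $s(\cdot)$ add up over the children of $p$; since $|V(p)|$ is stored and $s(p)$ is computed in the first step, the right-hand side is available in constant time.

```latex
\sum_{w \in C(p) \setminus \{u\}} \bigl(|V(w)| - s(w)\bigr)
  = \Bigl(\sum_{w \in C(p)} |V(w)| - |V(u)|\Bigr) - \Bigl(\sum_{w \in C(p)} s(w) - s(u)\Bigr)
  = \bigl(|V(p)| - |V(u)|\bigr) - \bigl(s(p) - s(u)\bigr).
```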

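Furthermore, as shown in [41], the number of non-hollow nodes of $T$ is $O(|S'|)$, where $|S'|$ is the cardinality of the completed neighbourhood $S'$ of $x$. Indeed, from Theorem 2.1, all non-hollow nodes are filled except the ancestors of the insertion node $u$. Let $w$ be a non-hollow child of one ancestor of $u$ that is not itself an ancestor of $u$; then $w$ is filled, and it follows that the sum of the sizes of $T_w$ over all such $w$ is $O(|S'|)$, since every internal node of a cotree has at least two children. The number of ancestors of $u$ is also linearly bounded by $|S'|$: since labels alternate along a root-to-leaf path, half of these ancestors (up to one) are series nodes, and each of them has a child which is filled.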

Once the insertion node has been determined, the completed neighbourhood of can be computed explicitly by searching the part of that is filled, which takes time. Then, the cotree of the completion of is obtained from the cotree of (as depicted in Figure 3) with the same time complexity thanks to the algorithm of [16]. Overall, one incremental step takes time, and the whole running time of the algorithm is O(n + m'), where m' is the number of edges in the output cograph.

5 An O(n + m log^2 n) algorithm

Even though it is linear in the number of edges of the output cograph, the complexity achieved by the algorithm in [41] and by the one we presented in Section 4 is not necessarily optimal, as the output cograph can actually be represented in O(n) space using its cotree. We therefore design a refined version of the inclusion-minimal completion algorithm that runs in O(n + m log^2 n) time, when no additional condition is required on the completion output at each incremental step. This improvement is further motivated by the fact that, as we show below, there exist graphs having only O(n) edges which require Ω(n^2) edges in any of their cograph completions. For such graphs, the new complexity we achieve can also be written as O(n log^2 n) (since m = O(n)) and constitutes a significant improvement over the O(n + m') complexity of the previous algorithm (since m' = Ω(n^2)).

5.1 Worst-case minimum-cardinality completion of very sparse graphs

In this section, we show that there exist graphs that have only O(n) edges and that require Ω(n^2) edges in any of their cograph completions. In fact, we show that this holds even in the more general case where the target graph class has bounded rank-width (see [47] for a definition), which includes the class of cographs as well as the class of distance hereditary graphs (see [52] for a definition). Furthermore, although it is not necessary for the purpose of this article, we also show that the same behaviour occurs for chordal completion, as we believe this fact is interesting in itself. Our proofs are based on the notion of vertex expander graphs (see [33] for a survey on the topic). We first show that these graphs require Ω(n^2) edges in any of their cograph completions, as stated by Theorem 5.1 below, and we conclude by pointing out that there exist constructions of vertex expander graphs with only O(n) edges.

Definition 6 (Vertex expander)

A graph is a -expander if, for every vertex subset with we have .
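For small examples, the expansion property can be checked by brute force. The sketch below assumes the usual reading of Definition 6, consistent with how it is used in the proof of Theorem 5.1: every vertex subset S with |S| ≤ |V|/2 has at least ε|S| vertices outside S adjacent to some vertex of S (the parameter name `eps` is ours). It enumerates all such subsets, so it is exponential and only meant for toy graphs.

```python
from itertools import combinations


def is_vertex_expander(adj, eps):
    """Brute-force check of vertex expansion (exponential time: toy graphs only).

    Assumed reading of the definition: for every vertex subset S with
    |S| <= |V|/2, at least eps*|S| vertices outside S have a neighbour in S.
    `adj` maps each vertex to the set of its neighbours.
    """
    vertices = list(adj)
    for size in range(1, len(vertices) // 2 + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            outside_neighbours = {w for v in s for w in adj[v]} - s
            if len(outside_neighbours) < eps * len(s):
                return False
    return True
```

For instance, on the 6-cycle the worst subsets are three consecutive vertices, which have only two outside neighbours, so the cycle passes the test for `eps = 0.6` but fails it for `eps = 0.7`.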

In our proof of Theorem 5.1, we will use the fact that cographs are graphs of bounded rank-width, for which we have Proposition 1 below. Roughly speaking, it states that if a graph has rank-width at most , then there exists a cut of of rank at most such that both parts of the cut are large.

Proposition 1 ([48])

Let be an integer and let be a graph whose rank-width is at most . Then there exists a subset of vertices, such that and .

We remark that Proposition 1 is stated by Oum and Seymour [48] in terms of symmetric submodular functions. See also [47] for the definitions of rank-width and cut-rank. We will need the following proposition, which shows that if a cut of a graph has a small rank, say , then there can be only a small number (more explicitly, a number bounded by a quantity depending only on ) of equivalence classes of vertices in according to their neighbourhood in .

Proposition 2 ([53])

Let be a graph and be a vertex set such that . Then there exists a partition , such that for every and pair of vertices in , .
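Proposition 2 can be checked numerically on small graphs. The cut-rank of a vertex set X is the rank over GF(2) of the adjacency matrix between X and its complement; vertices of X with identical rows have the same neighbourhood across the cut, and since all rows lie in a row space of dimension equal to the rank, there are at most 2^rank distinct rows. The sketch below (function names are our own, hypothetical) computes both quantities, encoding each row as an integer bitmask.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of rows, each encoded as an integer bitmask."""
    pivots = {}  # leading bit -> basis vector having that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]  # clears the leading bit, so r strictly decreases
    return len(pivots)


def cut_rank_and_classes(adj, X):
    """Return (cut-rank of X, number of distinct neighbourhoods of X-vertices in V \\ X)."""
    X = set(X)
    rest = [v for v in adj if v not in X]
    index = {v: i for i, v in enumerate(rest)}
    rows = []
    for v in X:
        mask = 0
        for w in adj[v]:
            if w in index:  # only neighbours on the other side of the cut
                mask |= 1 << index[w]
        rows.append(mask)
    return gf2_rank(rows), len(set(rows))
```

For a cut of rank 1, as in cographs, the row space is {0, r}, so at most two classes remain: the vertices with no neighbour across the cut, and those whose neighbourhood across the cut is exactly r.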

We are now ready to state and prove Theorem 5.1, regarding completions in graph classes of bounded rank-width.

Theorem 5.1

Let be a positive real number and be a positive integer. Let also be a -expander and be a class of graphs whose rank-width is at most . Then, there exists a positive real number , depending only on and , such that any completion of into a graph in has at least edges.

Proof

Let be a completion of into a graph in . Since is a supergraph of , it follows immediately from the definition that is a -expander. Moreover, since has rank-width at most , from Propositions 1 and 2, there exists a subset of vertices, such that and there exists a partition , with , such that for every and any pair of vertices in , . Assume, without loss of generality, that the ’s are ordered by increasing cardinality. We denote .

If , then we have and so , which gives . And since the ’s are ordered by increasing size, we conclude that the inequality holds for all indices: for all , we have .

In the complementary case, i.e. if , consider the largest index such that . Note that we necessarily have . We now prove that , where the hidden factor depends only on and . By definition of , we have . This gives . On the other hand, because the ’s are ordered by increasing cardinality, we have . Substituting this inequality into the one above, we obtain and so .

As a partial conclusion, we have either (i) for all , , or (ii) there exists such that and for all , we have (because the ’s are ordered by increasing cardinality). Besides, because of the expansion property of , we have , meaning that there are at least vertices outside of that are adjacent to at least one vertex of . Moreover, from the definition of the ’s, if a vertex is adjacent to some vertex , for some , then is adjacent to all the vertices of . In case (i) of the alternative above, where for all , we obtain that there must be at least edges between and in graph . Thus, in this case, because , the conclusion of the theorem holds.

In the other case, i.e. case (ii) of the alternative above, we have for some and for all , . The expansion property applied to gives . Since , we have . Observe that because , we have and consequently . Moreover, each of the vertices in is adjacent to all the vertices of for some . Since , we obtain that there are at least edges between and in graph (because ), which completes the proof of the theorem.

Remark 1

The result of Theorem 5.1 holds in particular for cographs and distance hereditary graphs, which both have rank-width at most 1.

It is also worth noting that, in the particular cases of cographs and distance hereditary graphs, the proof above can be greatly simplified as follows. For a cut of rank at most 1, all the vertices of having some neighbour in have exactly the same neighbours in . This corresponds to the fact that there are at most 2 equivalence classes in Proposition 2 for a cut of rank 1: the vertices of that have some neighbour in and those that do not have any. Moreover, the expansion property for and for (recall that from Proposition 1 we have ) implies that the numbers of vertices in and in that have some neighbour on the other side of the cut are both Ω(n). Since, by the above, each such vertex on one side is adjacent to each such vertex on the other side, the completion has Ω(n^2) edges across the cut, which proves the statement of Theorem 5.1.

The results above hold for any input graph that is a -expander. Nevertheless, in order to achieve our goal, we still need very sparse -expanders to exist. This has already been established, as there exist deterministic constructions of very sparse graphs that are -expanders; see for example the construction of -regular -expanders by Alon and Boppana [1], for some fixed . Such graphs have only O(n) edges but, from Theorem 5.1, require Ω(n^2) edges in any of their cograph completions (as well as in any of their completions in a graph class of bounded rank-width). More generally, it is part of the folklore that, for any constant , there exist and such that, for any sufficiently large, the proportion of graphs on vertices and edges that are -expanders is at least . This means that many graphs of fixed mean degree have the vertex expansion property and therefore require Ω(n^2) edges in any of their cograph completions. Motivated by this frequent worst case for the complexity, we now design an O(n + m log^2 n)-time algorithm for inclusion-minimal cograph completion of arbitrary graphs.

5.1.1 A similar behaviour for chordal completion.

The fact that some very sparse graphs, having O(n) edges, may require Ω(n^2) edges in any of their completions also occurs for other target classes, whose rank-width is unbounded. In particular, we now show that the very popular chordal completion problem also exhibits such a behaviour, which we believe is of interest in itself, though unnecessary for the strict purpose of this article. As before, our proof is based on vertex expander graphs, for which we have the following result.

Proposition 3 ([24])

If is a -expander for a constant independent of , then the treewidth of is Θ(n).

In addition, it is well known (see [2]) that the treewidth of a graph is the minimum, over all chordal completions of , of the size of a maximum clique, minus 1. Consequently, Proposition 3 immediately gives an Ω(n^2) lower bound on the number of edges in any chordal completion of a -expander , since must have a clique of size Ω(n). To conclude, recall that, as mentioned above, there exist constructions, both deterministic and random, of -expanders having only O(n) edges.
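The counting step behind this lower bound can be spelled out as follows. Here H is any chordal completion of the n-vertex expander G, and c is a hypothetical positive constant standing for the one implicit in Proposition 3 (valid for all sufficiently large n).

```latex
\begin{aligned}
\omega(H) &\ge \operatorname{tw}(G) + 1
  && \text{(treewidth = min over chordal completions of max clique size, minus 1)}\\
\operatorname{tw}(G) &\ge \lceil c\,n \rceil
  && \text{(Proposition 3, treewidth is an integer)}\\
|E(H)| &\ge \binom{\omega(H)}{2} \ge \binom{\lceil c\,n \rceil + 1}{2}
  \ge \frac{(c\,n + 1)\,c\,n}{2} \ge \frac{c^2 n^2}{2} = \Omega(n^2).
\end{aligned}
```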

We now turn to the description of our O(n + m log^2 n)-time algorithm for inclusion-minimal cograph completion.

5.2 Data structure

Our data structure is composed of two copies of the cotree: one stored in a basic data structure and one using the advanced dynamic data structure of [51] named dynamic trees. We note that we could use only the advanced data structure of [51], as it can be patched to contain the additional information that we store in the basic data structure. But to avoid questions about the compatibility of such a patch with the performances of the data structure of [51], we prefer to store the additional information we need, and to perform the related operations, independently in another structure. This is the reason why we describe our algorithm using two structures.

In the first copy of (the basic data structure), each node stores its parent, the list of the children of and their number , as well as a pair of pointers linking it to and from the corresponding node of in the second copy of , so that we can move from one element in one copy of the cotree to the same element in the other copy in O(1) time. In addition, we enhance this basic data structure with one additional feature: given a node and two of its children , this feature allows us to determine which of appears first in the list of children of in O(1) time. For this purpose, the set of children of a node is not only stored in a doubly linked list, as in the classical version of the tree, but a copy of this list is also stored using the order data structure of [3, 21]. This data structure answers order queries, i.e. which of two given elements of the list precedes the other one, and supports two update operations, insert and delete. The delete operation removes a given element from the order data structure, while the insert operation inserts a new element into the order data structure just after a specified element. The order query and the two update operations all take O(1) worst-case time.
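To illustrate the interface (order query, insert after, delete), here is a deliberately simplified labelling scheme. It is our own stand-in, not the structure of [3, 21]: each element carries an integer label, an order query compares labels in O(1), and an insertion takes the midpoint of the neighbouring labels, but when no gap is left it relabels the whole list, which costs linear time in the worst case rather than O(1).

```python
class OrderList:
    """Simplified order-maintenance list (hypothetical stand-in for [3, 21])."""

    GAP = 1 << 20  # spacing used when (re)labelling the list

    def __init__(self, first):
        self.head = first
        self.next = {first: None}
        self.prev = {first: None}
        self.label = {first: 0}

    def precedes(self, a, b):
        """Order query: does a appear before b in the list?"""
        return self.label[a] < self.label[b]

    def insert_after(self, x, new):
        nxt = self.next[x]
        # Splice `new` into the doubly linked list between x and nxt.
        self.next[x], self.prev[new], self.next[new] = new, x, nxt
        if nxt is not None:
            self.prev[nxt] = new
        low = self.label[x]
        high = self.label[nxt] if nxt is not None else low + 2 * self.GAP
        if high - low >= 2:
            self.label[new] = (low + high) // 2
        else:
            self._relabel()  # no free label left: respace everything

    def delete(self, x):
        p, n = self.prev.pop(x), self.next.pop(x)
        del self.label[x]
        if p is not None:
            self.next[p] = n
        else:
            self.head = n
        if n is not None:
            self.prev[n] = p

    def _relabel(self):
        node, i = self.head, 0
        while node is not None:
            self.label[node] = i * self.GAP
            node, i = self.next[node], i + 1
```

In the cotree, one such list per internal node would hold its children, so that the relative order of two siblings is a single `precedes` call.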

Dynamic trees [51]

In addition to the classical data structure described above, we also use the data structure developed in [51] to store a copy of the cotree and maintain it at each incremental step. This data structure maintains a dynamic forest rather than a single tree. This is useful for us, as we will cut off a part of the cotree and attach it to another node when updating the cotree upon the insertion of a new vertex. The dynamic trees of [51] allow us to answer the following two kinds of query:

lowest-common-ancestor?

Given two nodes and of , provide the lowest common ancestor of and .

next-step-to-descendant?

Given a node of and one of its strict descendants , provide the (unique) child of which is an ancestor of .

These two kinds of query are handled in O(log n) worst-case time in the data structure of Sleator and Tarjan [51]. To be precise, the second operation is not described in [51], but it can be obtained as a combination of other operations they provide. Indeed, their data structure also supports, with the same complexity:

  • an update operation called evert which, given a vertex of , makes become the root of , and

  • a query operation named root? that provides the root of the tree to which node belongs.

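Then, the query next-step-to-descendant? can be answered by a sequence of two queries and two updates: root? (to record the current root of the tree), evert (to make the given descendant the root), parent? (applied to the given ancestor, whose parent is now its child on the path towards the descendant), and evert again (to restore the original root). This takes O(log n) time.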
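The root?/evert/parent?/evert trick can be checked on a naive parent-pointer forest in which evert simply reverses the parent pointers on the path to the root. This `NaiveForest` class is a hypothetical stand-in that takes time proportional to the depth, not the O(log n) structure of [51]; it only shows why the sequence returns the right child and leaves the tree unchanged.

```python
class NaiveForest:
    """Parent-pointer forest: a hypothetical O(depth) stand-in for the dynamic trees of [51]."""

    def __init__(self, parent):
        self.par = dict(parent)  # node -> parent (None for a root)

    def parent(self, v):
        return self.par[v]

    def root(self, v):
        while self.par[v] is not None:
            v = self.par[v]
        return v

    def evert(self, v):
        """Make v the root of its tree by reversing the pointers on the path v -> root."""
        prev, cur = None, v
        while cur is not None:
            nxt = self.par[cur]
            self.par[cur] = prev
            prev, cur = cur, nxt


def next_step_to_descendant(forest, u, v):
    """Child of u that is an ancestor of v, assuming v is a strict descendant of u."""
    r = forest.root(u)         # root?: remember the current root
    forest.evert(v)            # evert: v becomes the root, the path v..r is reversed
    child = forest.parent(u)   # parent?: u now points towards v
    forest.evert(r)            # evert: restore r as the root
    return child
```

After the second evert the parent pointers are exactly the original ones, since only the path from v to r was reversed and it is reversed back.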

Throughout our incremental algorithm, we need to maintain the dynamic data structure of [51], which can be done with the following update operations:

cut.

Given a node in a tree of the forest such that is not the root of , remove the edge between and . Then, becomes the root of its new tree in forest .

link.

Given a node in a tree of the forest such that is not the root of and given the root of a tree , make the parent of .

Note that operations cut and link are inverses of each other. As for queries, all update operations take O(log n) worst-case time.
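The following sketch shows cut and link on a naive parent-pointer forest (again a hypothetical stand-in for [51], with root queries taking time proportional to the depth), and how moving a subtree below another node, as happens when the cotree is updated, is a cut followed by a link. The helper `move_subtree` is our own illustration.

```python
class NaiveForest:
    """Parent-pointer forest: a hypothetical O(depth) stand-in for the dynamic trees of [51]."""

    def __init__(self, parent):
        self.par = dict(parent)  # node -> parent (None for a root)

    def root(self, v):
        while self.par[v] is not None:
            v = self.par[v]
        return v

    def cut(self, v):
        """Remove the edge between v and its parent; v becomes the root of its own tree."""
        if self.par[v] is None:
            raise ValueError("cut expects a non-root node")
        self.par[v] = None

    def link(self, u, r):
        """Make u the parent of r, where r is the root of a tree not containing u."""
        if self.par[r] is not None or self.root(u) == r:
            raise ValueError("link expects the root of another tree")
        self.par[r] = u


def move_subtree(forest, v, new_parent):
    """Detach the subtree rooted at v and hang it below new_parent (cut, then link)."""
    forest.cut(v)
    forest.link(new_parent, v)
```

Because `link` refuses a target lying in the tree of `r`, `move_subtree` also rejects an attempt to hang a subtree below one of its own descendants.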

5.3 Algorithm

Our algorithm determines the set of the nodes that are simultaneously eligible, non-hollow and non-completion-forced and that are minimal for the ancestor relationship among nodes having these three properties (i.e. none of their strict descendants has all three properties). Then, it picks any of them to be the insertion node of the minimal completion returned at this incremental step. Indeed, since the nodes of satisfy the conditions of Lemma 5 and none of their children does (because the nodes of are minimal for the ancestor relationship), it follows that the nodes of are completion-minimal insertion nodes. In order to get the improved complexity, we avoid searching the whole upper tree to determine . Instead, we use a limited number of lowest-common-ancestor? queries.

Clearly, if a parallel node of is the lowest common ancestor of two leaves in , then contains no eligible node. Let be the set of parallel common ancestors of vertices of that are maximal for the ancestor relationship, and let us denote , where is the set of vertices of that are not descendants of any node in , i.e. . Note that all the nodes are eligible, and so are their ancestors. It follows that the set that we want to compute is the set of the non-completion-forced nodes in the upper tree that are minimal for the ancestor relationship (i.e. all of their strict descendants in are completion-forced).

5.3.1 Finding an inclusion-minimal insertion node.

In order to compute , we start by computing the tree extracted from (see Section 2) by the leaves that belong to and the set of their lowest common ancestors, i.e. the nodes such that for some leaves . Then, we search to find its parallel nodes that are maximal for the ancestor relationship, and we remove their strict descendants. The leaves of the resulting tree are exactly the nodes of . Finally, for each node we determine its lowest non-completion-forced ancestor in , and we keep only the ’s that are minimal for the ancestor relationship: this is the set . Note from the outset that, since has exactly leaves and all its internal nodes have degree at least , the size of is .
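The pruning step (keeping the parallel nodes that are maximal for the ancestor relationship and removing their strict descendants) amounts to a top-down search that stops at the first parallel node met on each path. A minimal sketch, assuming a hypothetical representation of the extracted tree as a `children` dict and a `kind` dict marking parallel nodes:

```python
def prune_below_maximal_parallel(children, kind, root):
    """Delete the strict descendants of every parallel node with no parallel strict ancestor.

    Hypothetical representation: `children` maps a node to its list of children
    in the extracted tree, and `kind[v]` is "parallel" for parallel nodes.
    Returns the kept nodes and the leaves of the pruned tree.
    """
    kept, leaves = [], []
    stack = [root]
    while stack:
        v = stack.pop()
        kept.append(v)
        if kind.get(v) == "parallel" or not children.get(v):
            # Either a maximal parallel node (its subtree is cut off) or an original leaf.
            leaves.append(v)
        else:
            stack.extend(children[v])
    return kept, leaves
```

The returned leaves are the maximal parallel nodes together with the original leaves not lying below any of them, which matches the description of the leaves of the resulting tree.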

Let us now show how to compute in time. For this purpose, we sort the neighbours of according to a special order of the vertices of the cograph called a factorising permutation [14]. A factorising permutation is the order in which the vertices of (which are the leaves of the cotree) are encountered when performing a depth-first search of the cotree . There are as many different factorising permutations as there are different depth-first searches of . Here, we use the factorising permutation obtained by visiting the children of each node of in the order in which they are stored in the list of children of used in the implementation of the cotree. To determine whether a vertex is before or after a vertex in the factorising permutation , we can proceed as follows: 1) find and the two children and of that are respectively the ancestors of and , and 2) determine whether is before or after in the list of children of . Operation 1) can be executed in O(log n) time thanks to the data structure of [51], by performing one lowest-common-ancestor? query and two next-step-to-descendant? queries. Operation 2) can be executed in O(1) time using the order data structure of [3, 21]. Thus, comparing the order of occurrence of two vertices and in takes O(log n) time, and, overall, sorting all the neighbours of with respect to order takes time.
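To make the comparison concrete, here is a small sketch of steps 1) and 2) on a static tree. Assumptions: the cotree is a hypothetical `children` dict whose lists give the stored child order; the lowest-common-ancestor? and next-step-to-descendant? queries are replaced by a naive climb using depths, and the order query of [3, 21] by the index of a child in its parent's list. It only illustrates the comparison logic, not the O(log n) bound.

```python
from functools import cmp_to_key


def factorising_comparator(children, root):
    """Comparator for the order in which a DFS visiting children in stored order meets the leaves."""
    parent, depth, pos = {root: None}, {root: 0}, {}
    stack = [root]
    while stack:
        v = stack.pop()
        for i, c in enumerate(children.get(v, [])):
            parent[c], depth[c], pos[c] = v, depth[v] + 1, i
            stack.append(c)

    def split_at_lca(a, b):
        """Step 1): children of lca(a, b) that are ancestors of a and of b (distinct leaves)."""
        ca = cb = None
        while depth[a] > depth[b]:
            a, ca = parent[a], a
        while depth[b] > depth[a]:
            b, cb = parent[b], b
        while a != b:
            a, ca = parent[a], a
            b, cb = parent[b], b
        return ca, cb

    def compare(x, y):
        if x == y:
            return 0
        cx, cy = split_at_lca(x, y)
        # Step 2): order query between two siblings (here, their stored index).
        return -1 if pos[cx] < pos[cy] else 1

    return compare
```

Usage: with `children = {'r': ['a', 'b'], 'a': [3, 1], 'b': [2, 'c'], 'c': [4, 0]}`, sorting the leaves with `key=cmp_to_key(factorising_comparator(children, 'r'))` yields the depth-first order `[3, 1, 2, 4, 0]`.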

The benefit of doing so is that, once the neighbours of are sorted in the order in which they appear in (we say from left to right), we can build efficiently. We consider the neighbours of one by one in this order and at each step we compute the tree extracted from and their lowest common ancestors. Then, at the end of the computation, when , we obtain . For each between and , we obtain from as follows: we compute and we insert it at its correct position in the tree built so far.

Note that, since we consider the ’s from left to right in the order of the factorising permutation , the newly computed common ancestor is the only node that may be in but not in . Moreover, for the same reason, if is not yet a node of then has to be inserted on the rightmost branch of the tree , and if is already a node of then already belongs to this branch, and so we discover it when we try to insert it on this branch. In order to do so, we climb up the rightmost branch of , starting from the father of , and for each node encountered on this branch we determine whether is higher or lower than in the tree (or possibly equal) by computing . The total number of comparisons (each answered by a query) made during the computation of is . Indeed, as explained in [25], every time we pass above a node on the rightmost branch, leaves the rightmost branch forever and never takes part in any comparison again. Hence, the total number of queries we need to build (including the queries made on the pairs of neighbours of appearing consecutively in the order of the factorising permutation) is proportional to its size, that is . Since each of these queries takes O(log n) time thanks to the data structure of [51], the complexity of building from the sorted list of neighbours of is .
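The rightmost-branch construction just described is essentially the classical stack-based way of building the tree induced by a set of leaves, given in depth-first order, together with their pairwise lowest common ancestors. The sketch below is our own illustration under simplifying assumptions: the cotree is a static, hypothetical `children` dict, and `lca` is a naive climb rather than the O(log n) query of [51]. The list `branch` holds the current rightmost branch from top to bottom; each node popped from it is finalised and never compared again.

```python
def build_extracted_tree(children, root, sorted_leaves):
    """Parent map of the tree extracted by `sorted_leaves` (given in DFS order) and their LCAs."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            parent[c], depth[c] = v, depth[v] + 1
            stack.append(c)

    def lca(a, b):  # naive O(depth) climb, standing in for an O(log n) query
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    ext_parent = {}
    branch = [sorted_leaves[0]]  # rightmost branch, top to bottom
    for v in sorted_leaves[1:]:
        w = lca(branch[-1], v)
        # Pop the nodes strictly below w: they leave the rightmost branch forever.
        while len(branch) >= 2 and depth[branch[-2]] >= depth[w]:
            ext_parent[branch[-1]] = branch[-2]
            branch.pop()
        if branch[-1] != w:  # w is a new node: splice it above the current bottom node
            ext_parent[branch[-1]] = w
            branch[-1] = w
        branch.append(v)
    while len(branch) >= 2:  # finalise what remains of the rightmost branch
        ext_parent[branch[-1]] = branch[-2]
        branch.pop()
    ext_parent[branch[0]] = None
    return ext_parent
```

Each leaf is pushed once and each node is popped at most once, so the number of `lca` calls and branch comparisons is linear in the number of given leaves, in line with the counting argument above.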

Once