One of the most widely used operations in graph algorithms is the neighbourhood query: given a vertex $x$ of a graph $G$, one wants to obtain the list of neighbours of $x$ in $G$. The classical data structure that supports this query is the adjacency list. It stores a graph $G$ in $O(n+m)$ space, where $n$ is the number of vertices of $G$ and $m$ its number of edges, and answers a neighbourhood query on any vertex $x$ in $O(d)$ time, where $d$ is the degree of vertex $x$. This time complexity is optimal, as long as one wants to produce the list of neighbours of $x$. On the other hand, in the last decades, huge amounts of data organized in the form of graphs or networks have appeared in many contexts such as genomics, biology, physics, linguistics, computer science, transportation and industry. At the same time, the need, in both industry and academia, to process this data algorithmically in order to extract relevant information has grown in the same proportions. For these applications dealing with very large graphs, a space complexity of $O(n+m)$ is often very limiting. Therefore, as pointed out by Turán (1984), finding compact representations of a graph that provide optimal-time neighbourhood queries is a crucial issue in practice. Such representations make it possible to store the graph entirely in memory while preserving the complexity of algorithms using neighbourhood queries. The conjunction of these two advantages has great impact on the running time of algorithms managing large amounts of data.
One possible way to store a graph $G$ in a very compact way and preserve the complexity of neighbourhood queries is to find an order $\sigma$ on the vertices of $G$ such that the neighbourhood of each vertex $x$ of $G$ is an interval in $\sigma$. In this way, one can store the order $\sigma$ on the vertices of $G$ and assign two pointers to each vertex: one toward its first neighbour in $\sigma$ and one toward its last neighbour in $\sigma$. Therefore, one can answer neighbourhood queries on vertex $x$ simply by listing the vertices appearing in $\sigma$ between its first and last pointers. Clearly, such an order on the vertices of $G$ does not exist for all graphs $G$. Nevertheless, this idea turns out to be quite efficient in practice and some compression techniques are precisely based on it (Apostolico and Drovandi, 2009; Boldi and Vigna, 2004, 2005; Boldi et al., 2009; Maserrat and Pei, 2010): they try to find orders on the vertices that group the neighbourhoods together, as much as possible.
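The single-order encoding described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `encode_intervals` and `neighbours` are hypothetical, the graph is given as a dictionary of neighbour sets, and the closed neighbourhood (the vertex together with its neighbours) is the one required to be an interval.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Vertex = Hashable


def encode_intervals(
    order: List[Vertex], adj: Dict[Vertex, Set[Vertex]]
) -> Optional[Dict[Vertex, Tuple[int, int]]]:
    """Return {v: (first, last)} if every closed neighbourhood is an
    interval of `order`, and None otherwise (assumed convention)."""
    pos = {v: i for i, v in enumerate(order)}
    bounds = {}
    for v in order:
        idx = sorted(pos[w] for w in adj[v] | {v})
        # The positions form an interval iff they are consecutive.
        if idx[-1] - idx[0] + 1 != len(idx):
            return None
        bounds[v] = (idx[0], idx[-1])
    return bounds


def neighbours(
    order: List[Vertex], bounds: Dict[Vertex, Tuple[int, int]], v: Vertex
) -> List[Vertex]:
    """Answer a neighbourhood query by scanning between the two pointers."""
    first, last = bounds[v]
    return [w for w in order[first : last + 1] if w != v]


order = ["a", "b", "c", "d"]
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
path_bounds = encode_intervals(order, path)
cycle_bounds = encode_intervals(order, cycle)
```

On the path, every closed neighbourhood is an interval of the order, so the encoding exists; on the 4-cycle with the same order, the neighbourhood of `a` is split in two pieces and the function returns `None`.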
Then, a natural way to relax the constraints of the problem so that it admits a solution for a larger class of graphs is to allow the neighbourhood of each vertex to be split in at most $k$ intervals in order $\sigma$. The minimum value of $k$ which makes it possible to encode the graph $G$ in this way is a parameter called contiguity (Goldberg et al., 1995) and denoted by $cont(G)$. Another natural generalization is to use at most $k$ orders $\sigma_1, \ldots, \sigma_k$ on the vertices of $G$ such that the neighbourhood of each vertex is the union of exactly one interval taken in each of the $k$ orders. This defines a parameter called the linearity of $G$ (Crespelle and Gambette, 2009), denoted $lin(G)$. The additional flexibility offered by linearity (using $k$ orders instead of just $1$) results in a greater power of encoding, in the sense that if a graph $G$ admits an encoding by contiguity $k$, using one linear order $\sigma$ and at most $k$ intervals for each vertex, it is straightforward to obtain an encoding of $G$ by linearity $k$: take $k$ copies of $\sigma$ and assign to each vertex one of its intervals in each of the $k$ copies of $\sigma$.
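The conversion from a contiguity encoding to a linearity encoding mentioned above can be sketched as follows. The function name `contiguity_to_linearity` and the data layout (intervals as pairs of positions) are assumptions made for this illustration; a vertex with fewer than $k$ intervals simply reuses one of them, which is harmless since intervals may overlap.

```python
from typing import Dict, Hashable, List, Tuple

Interval = Tuple[int, int]


def contiguity_to_linearity(
    order: List[Hashable],
    intervals: Dict[Hashable, List[Interval]],
    k: int,
) -> Tuple[List[List[Hashable]], Dict[Hashable, List[Interval]]]:
    """Turn one order with at most k intervals per vertex into k orders
    with exactly one interval per vertex in each order."""
    orders = [list(order) for _ in range(k)]  # k copies of the same order
    assignment = {}
    for v, ivs in intervals.items():
        if not 1 <= len(ivs) <= k:
            raise ValueError("each vertex needs between 1 and k intervals")
        # Pad with a repeated interval so that every order receives one.
        assignment[v] = list(ivs) + [ivs[-1]] * (k - len(ivs))
    return orders, assignment


# Closed neighbourhoods of the 4-cycle a-b-c-d-a in the order a, b, c, d.
order = ["a", "b", "c", "d"]
intervals = {
    "a": [(0, 1), (3, 3)],
    "b": [(0, 2)],
    "c": [(1, 3)],
    "d": [(0, 0), (2, 3)],
}
orders, assignment = contiguity_to_linearity(order, intervals, 2)
```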
As one can expect, this greater power of encoding comes at an extra cost: the size of an encoding by linearity $k$, which uses $k$ orders, is greater than the size of an encoding by contiguity $k$, which uses only $1$ order. Nevertheless, very interestingly, the sizes of these two encodings are equivalent up to a multiplicative constant. Indeed, storing an encoding by contiguity $k$ requires storing a linear ordering of the $n$ vertices of $G$, i.e. a list of $n$ integers, and the bounds of each of the $k$ intervals for each vertex, i.e. $2kn$ integers, the total size of the encoding being $(2k+1)n$ integers. On the other hand, the linearity encoding also requires storing $2kn$ integers for the bounds of the $k$ intervals of each vertex, but it needs $k$ linear orderings of the vertices instead of just one, that is $kn$ integers. Thus, the total size of an encoding by linearity $k$ is $3kn$ integers instead of $(2k+1)n$ for contiguity $k$, and therefore the two encodings have equivalent sizes.
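As a quick check of this size comparison, assume (as the paragraph describes) that a contiguity-$k$ encoding stores one order of $n$ integers plus $2kn$ interval bounds, and that a linearity-$k$ encoding stores $k$ orders plus $2kn$ interval bounds. The ratio of the two sizes is then bounded as follows:

```latex
\[
\frac{3kn}{(2k+1)n} \;=\; \frac{3k}{2k+1} \;=\; \frac{3}{2} - \frac{3}{2(2k+1)} \;<\; \frac{3}{2},
\qquad\text{and}\qquad
\frac{3k}{2k+1} \;\geq\; 1 \quad\text{for } k \geq 1 .
\]
```

Under these assumptions, for equal $k$ the linearity encoding is never more than $3/2$ times larger than the contiguity encoding.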
The question then naturally arises whether there are graphs for which the linearity is significantly less than the contiguity. More formally, does there exist a graph family for which the linearity is asymptotically negligible compared to the contiguity? Or are these two parameters equivalent up to a multiplicative constant? This is the question we address here. Our results show that linearity is strictly more powerful than contiguity.
Related work. Little is known about the contiguity and linearity of graphs. In the context of matrices, Goldberg et al. (1995) and Wang et al. (2007) studied closed contiguity and showed that deciding whether an arbitrary graph has closed contiguity at most $k$ is NP-complete for any fixed $k \geq 2$. For arbitrary graphs again, Gavoille and Peleg (1999) (Corollary 3.4) gave an upper bound on the value of closed contiguity. Regarding graphs with bounded contiguity or linearity, only the class of graphs having contiguity $1$, or equivalently linearity $1$, has been characterized, as being the class of proper (or unit) interval graphs (Roberts, 1968). For interval graphs and permutation graphs, Crespelle and Gambette (2009) showed that both contiguity and linearity can be up to $\Omega(\log n / \log\log n)$. For cographs, a subclass of permutation graphs, Crespelle and Gambette (2014) showed that the contiguity can even be up to $\Omega(\log n)$ and is always $O(\log n)$, implying that both bounds are tight. The $O(\log n)$ upper bound consequently applies to the linearity (of cographs) as well, but Crespelle and Gambette (2014) only provide an $\Omega(\log n / \log\log n)$ lower bound. Finally, let us mention for the sake of completeness that Crespelle and Gambette (2013) gave an algorithm that computes a constant-ratio approximation of the contiguity of a cograph, as well as a corresponding encoding, in linear time.
Our results. Our main result (Theorem 2) is to exhibit a family of graphs whose linearity is asymptotically negligible compared to their contiguity. In order to do so, we prove (Theorem 1) that the linearity of a cograph on $n$ vertices is always $O(\log n / \log\log n)$. It turns out that this bound is tight, as it matches the previously known lower bound on the worst-case linearity of a cograph (Crespelle and Gambette, 2014).
Outline of the paper. Section 2 gives the necessary background on the notions used throughout the article. Section 3 proves the key technical statement of our work, showing that the linearity of a cograph is bounded by the maximal height of a certain type of tree, called a double factorial tree, included in its cotree. From there, Section 4 derives our main results: the tight upper bound on the linearity of cographs and the construction of a subfamily of cographs for which the linearity is negligible compared to the contiguity.
All graphs considered here are finite, undirected, simple and loopless. In the following, $G$ is a graph, $V$ (or $V(G)$) is its vertex set and $E$ (or $E(G)$) is its edge set. We use the notation $G = (V, E)$ and $|V|$ stands for the cardinality of $V$. An edge between vertices $x$ and $y$ will be arbitrarily denoted by $xy$ or $yx$. The (open) neighbourhood of $x$ is denoted by $N(x)$ (or $N_G(x)$) and its closed neighbourhood by $N[x] = N(x) \cup \{x\}$. The subgraph of $G$ induced by the set of vertices $X \subseteq V$ is denoted by $G[X]$.
For a rooted tree $T$ and a node $u$ of $T$, the depth of $u$ in $T$ is the number of edges in the path from the root of $T$ to $u$ (the root has depth $0$). The height of $T$, denoted by $h(T)$, is the greatest depth of its leaves. We employ the usual terminology for children, father, ancestors and descendants of a node $u$ in $T$ (the two latter notions including $u$ itself), and denote by $C(u)$ the set of children of $u$. The subtree of $T$ rooted at $u$, denoted by $T_u$, is the tree induced by node $u$ and all its descendants in $T$. A monotonic path $C$ of a rooted tree $T$ is a path such that there exists some node $u$ of $C$ such that all nodes of $C$ are ancestors of $u$. The unique node of $C$ which has no parent in $C$ is called the root of the monotonic path.
In the following, the notion of minors of rooted trees is central. This is a special case of minors of graphs (see e.g. Lovász (2006)), for which we give a simplified definition in the context of rooted trees. The contraction of edge $uv$ in a rooted tree $T$, where $u$ is the parent of $v$, consists in removing $v$ from $T$ and assigning its children (if any) to node $u$.
Definition 1 (Minor)
A rooted tree $T'$ is a minor of a rooted tree $T$ if it can be obtained from $T$ by a sequence of edge contractions.
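The edge contraction used in this definition can be sketched as follows. The representation (a dictionary mapping each node to the ordered list of its children) and the function name `contract` are assumptions made for this illustration only.

```python
from typing import Dict, Hashable, List

Tree = Dict[Hashable, List[Hashable]]


def contract(children: Tree, u: Hashable, v: Hashable) -> Tree:
    """Contract edge uv, where u is the parent of v: node v is removed
    and its children (if any) become children of u."""
    if v not in children[u]:
        raise ValueError("u must be the parent of v")
    new = {x: list(c) for x, c in children.items() if x != v}
    i = new[u].index(v)
    new[u][i : i + 1] = children.get(v, [])  # splice v's children in place of v
    return new


tree = {"r": ["a", "b"], "a": ["x", "y"], "b": [], "x": [], "y": []}
minor = contract(tree, "r", "a")       # r now has the three leaves x, y, b
smaller = contract(minor, "r", "b")    # contracting a leaf edge deletes the leaf
```

Note that contracting an edge towards a leaf simply deletes that leaf, so minors obtained this way can also have fewer leaves than the original tree.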
2.1 Linearity of graphs
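There are actually two notions of linearity (as well as for contiguity, see Crespelle and Gambette (2014) for definitions) depending on whether one uses the open neighbourhood $N(x)$ or the closed neighbourhood $N[x]$.
Definition 2 ($k$-line-model)
A closed $k$-line-model (resp. open $k$-line-model) of a graph $G = (V, E)$ is a tuple $(\sigma_1, \ldots, \sigma_k)$ of linear orders on $V$ such that $\forall x \in V$, $\exists (I_1, \ldots, I_k)$ such that $\forall i \in [1, k]$, $I_i$ is an interval of $\sigma_i$ and $N[x] = \bigcup_{1 \leq i \leq k} I_i$ (resp. $N(x) = \bigcup_{1 \leq i \leq k} I_i$).
The closed linearity (resp. open linearity) of $G$, denoted by $cl(G)$ (resp. $ol(G)$), is the minimum integer $k$ such that there exists a closed $k$-line-model (resp. open $k$-line-model) of $G$.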
In the definition of a $k$-line-model, the intervals assigned to a vertex $x$ are not necessarily disjoint. They are only required to cover the neighbourhood of $x$ while being included in it.
In the rest of the paper, we consider only closed linearity and closed contiguity. But, from Crespelle and Gambette (2014) and from the inequalities below, for both parameters, the closed notion and the open notion are equivalent. Therefore, the bounds we obtain here (which hold up to multiplicative constants) hold indifferently for open notions and closed notions.
For an arbitrary graph $G$, we have the following inequalities: $cl(G) - 1 \leq ol(G) \leq 2\, cl(G)$.
The first inequality comes from the fact that an open model can always be turned into a closed model having one additional order $\sigma$, in which each vertex $x$ of $G$ is assigned the singleton interval of $\sigma$ equal to $\{x\}$. Conversely, one can transform a closed model into an open model by duplicating every order $\sigma_i$ of the closed model into two copies $\sigma_i'$ and $\sigma_i''$ in the open model. Then, for each vertex $x$, the interval assigned to $x$ in $\sigma_i'$ is the left part of the interval (i.e. the vertices of the interval which are before $x$) assigned to $x$ in $\sigma_i$, and the interval assigned to $x$ in $\sigma_i''$ is the right part of its interval in $\sigma_i$.
Finally, we give two basic properties of linearity that we use in the following.
Remark 2. The linearity of an induced subgraph $G[X]$ of a graph $G$ is at most equal to the linearity of $G$ itself.
Indeed, restricting a $k$-line-model of a graph $G$ to a subset $X$ of its vertices results in a $k$-line-model of $G[X]$.
Remark 3. The linearity of the disjoint union of a (finite) collection $\mathcal{C}$ of graphs is the maximum of the linearities of the graphs in $\mathcal{C}$.
This comes from the fact that a model of the disjoint union can be built simply by appending the orders used for the models of the graphs in $\mathcal{C}$.
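The construction behind this remark can be sketched as follows. The data layout is an assumption for illustration: a model is a pair made of a list of orders and a dictionary giving, for each vertex, one interval (as a pair of positions) per order. The name `disjoint_union_model` is hypothetical, and components with fewer orders than the maximum reuse their last order, which is one possible way to pad.

```python
from typing import Dict, Hashable, List, Tuple

Interval = Tuple[int, int]
Model = Tuple[List[List[Hashable]], Dict[Hashable, List[Interval]]]


def disjoint_union_model(models: List[Model]) -> Model:
    """Concatenate, order by order, the models of the components."""
    k = max(len(orders) for orders, _ in models)
    union_orders: List[List[Hashable]] = [[] for _ in range(k)]
    union_intervals: Dict[Hashable, List[Interval]] = {}
    for orders, intervals in models:
        for i in range(k):
            j = min(i, len(orders) - 1)    # reuse the last order if needed
            offset = len(union_orders[i])  # positions shift by what precedes
            union_orders[i].extend(orders[j])
            for v, ivs in intervals.items():
                first, last = ivs[j]
                union_intervals.setdefault(v, []).append(
                    (first + offset, last + offset)
                )
    return union_orders, union_intervals


edge = ([["p", "q"], ["q", "p"]], {"p": [(0, 1), (0, 1)], "q": [(0, 1), (0, 1)]})
single = ([["z"]], {"z": [(0, 0)]})
orders, intervals = disjoint_union_model([edge, single])
```

Since no interval crosses a component boundary, each vertex keeps exactly the neighbourhood it had in its own component.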
There are several characterizations of the class of cographs. They are often defined as the graphs that do not admit the $P_4$ (path on $4$ vertices) as induced subgraph. Equivalently, they are the graphs obtained from a single vertex under the closure of the parallel composition and the series composition. The parallel composition of two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ is the disjoint union of $G_1$ and $G_2$, i.e., the graph $(V_1 \cup V_2, E_1 \cup E_2)$. The series composition of two graphs $G_1$ and $G_2$ is the disjoint union of $G_1$ and $G_2$ plus all possible edges from a vertex of $G_1$ to one of $G_2$, i.e., the graph $(V_1 \cup V_2, E_1 \cup E_2 \cup \{xy \mid x \in V_1, y \in V_2\})$. These operations can naturally be extended to a finite number of graphs.
This gives a very nice representation of a cograph $G$ by a tree whose leaves are the vertices of the graph and whose internal nodes (non-leaf nodes) are labelled $P$, for parallel, or $S$, for series, corresponding to the operations used in the construction of $G$. It is always possible to find such a labelled tree $T$ representing $G$ such that every internal node has at least two children, no two parallel nodes are adjacent in $T$ and no two series nodes are adjacent. This tree $T$ is unique (Corneil et al., 1981) and is called the cotree of $G$. See the example on Figure 1. Note that the subtree $T_u$ rooted at some node $u$ of cotree $T$ also defines a cograph, denoted $G_u$, and then $V(G_u)$ is the set of leaves of $T_u$. The adjacencies between vertices of a cograph can easily be read on its cotree, in the following way.
Two vertices $x$ and $y$ of a cograph $G$ having cotree $T$ are adjacent iff the least common ancestor $u$ of leaves $x$ and $y$ in $T$ is a series node. Otherwise, if $u$ is a parallel node, $x$ and $y$ are not adjacent.
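The adjacency rule above can be sketched as follows. The cotree representation is an assumption made for this illustration: a leaf is a string, and an internal node is a pair `(label, children)` with label `"series"` or `"parallel"`. The function name `cograph_edges` is hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple, Union

Cotree = Union[str, Tuple[str, list]]


def cograph_edges(cotree: Cotree) -> Tuple[List[str], Set[FrozenSet[str]]]:
    """Return (vertices, edges) of the cograph represented by `cotree`."""
    if isinstance(cotree, str):  # a leaf is a single vertex
        return [cotree], set()
    label, children = cotree
    leaves: List[str] = []
    edges: Set[FrozenSet[str]] = set()
    for child in children:
        sub_leaves, sub_edges = cograph_edges(child)
        if label == "series":
            # Leaves under different children of a series node have this
            # node as least common ancestor, hence they are adjacent.
            edges |= {frozenset((x, y)) for x in leaves for y in sub_leaves}
        edges |= sub_edges
        leaves += sub_leaves
    return leaves, edges


tree = ("series", [("parallel", ["a", "b"]), "c"])
vertices, edges = cograph_edges(tree)
```

In this small example, `c` is adjacent to both `a` and `b` (least common ancestor: the series root), while `a` and `b` are not adjacent (least common ancestor: a parallel node).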
Note that throughout the paper, we abusively extend the notion of linearity to cotrees, referring to the linearity of their associated cograph.
2.3 Comparing power of encodings
For a graph encoding scheme $Enc$ and a graph $G$, we denote by $|Enc(G)|$ the minimum size of an encoding of $G$ based on $Enc$ (in general, as is the case here, there are different encodings based on the same encoding scheme, and they do not necessarily have the same size, some being more efficient than others). We now give a formal definition for an encoding scheme to be strictly more powerful than another one.
Definition 3 (Strictly more powerful encoding)
Let $Enc_1$ and $Enc_2$ be two graph encoding schemes. We say that $Enc_1$ is at least as powerful as $Enc_2$ iff there exists $\alpha > 0$ such that for all graphs $G$, $|Enc_1(G)| \leq \alpha\, |Enc_2(G)|$. Moreover, we say that $Enc_1$ is strictly more powerful than $Enc_2$ iff $Enc_1$ is at least as powerful as $Enc_2$ and the converse is not true.
Note that $Enc_2$ is not at least as powerful as $Enc_1$ iff there exists a sequence of graphs $G_i$, $i \geq 0$, such that $|Enc_2(G_i)| / |Enc_1(G_i)|$ tends to infinity when $i$ tends to infinity. In the introduction, we showed that the encoding schemes $Enc_{lin}$ and $Enc_{cont}$ based on linearity and contiguity respectively are such that, for any graph $G$ on $n$ vertices, we have $|Enc_{lin}(G)| = 3n \cdot lin(G)$ and $|Enc_{cont}(G)| = (2\, cont(G) + 1)\, n$. Since $lin(G) \leq cont(G)$, this gives $|Enc_{lin}(G)| \leq \frac{3}{2} |Enc_{cont}(G)|$, showing that linearity is an encoding at least as powerful as contiguity according to Definition 3. In addition, the previous equalities also imply that $|Enc_{cont}(G)| / |Enc_{lin}(G)| = \Theta(cont(G) / lin(G))$. Altogether, we obtain the following remark.
Linearity is an encoding at least as powerful as contiguity. Moreover, it is strictly more powerful iff there exists a sequence of graphs $G_i$, $i \geq 0$, such that $cont(G_i) / lin(G_i)$ tends to infinity when $i$ tends to infinity.
3 Linearity of a cograph and factorial rank of its cotree
In this section, we show that the linearity of a cograph is upper bounded by the size of some maximal structure contained in its cotree, more precisely by the height of a maximal double factorial tree (defined below), which we call the factorial rank of a cotree. This result is interesting in itself, as it provides a structural explanation of the difficulty of encoding a cograph by linearity. For our purposes, the interesting point is that the number of leaves of a double factorial tree of height $h$ is $(2h+1)!! = 1 \cdot 3 \cdot 5 \cdots (2h+1)$. Combined with this fact, the result presented in this section (Lemma 2) will allow us to derive in the next section the desired upper bound on the linearity of cographs. We start with some necessary definitions.
Definition 4 (Double factorial tree)
The double factorial tree $F^h$ of height $h \geq 0$ is defined inductively as follows:
$F^0$ is the (unique) tree of height $0$, i.e., the tree made of one single leaf node, and
for $h \geq 1$, $F^h$ is the tree whose root has $2h+1$ children $u_1, \ldots, u_{2h+1}$, whose subtrees $T_{u_i}$ are precisely $F^{h-1}$.
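The inductive definition can be made concrete with a small sketch. It rests on an assumption that should be stated loudly: the number of children of the root of the tree of height $h$ is taken to be $2h+1$, which is what the name "double factorial" and the base case of the proof below (a tree of height $1$ with three leaves) suggest. The representation (a tree is the tuple of its subtrees, a leaf is the empty tuple) and the function names are hypothetical.

```python
from typing import Tuple

Tree = Tuple["Tree", ...]  # a tree is the tuple of its subtrees; () is a leaf


def double_factorial_tree(h: int) -> Tree:
    """Tree of height h whose root has 2h+1 children (assumed), each of
    which is the root of a copy of the tree of height h-1."""
    if h == 0:
        return ()
    return tuple(double_factorial_tree(h - 1) for _ in range(2 * h + 1))


def leaf_count(tree: Tree) -> int:
    return 1 if not tree else sum(leaf_count(child) for child in tree)


def height(tree: Tree) -> int:
    return 0 if not tree else 1 + max(height(child) for child in tree)


counts = [leaf_count(double_factorial_tree(h)) for h in range(4)]
```

Under this assumption the leaf counts for heights 0 to 3 are 1, 3, 15 and 105, i.e. the products of the first odd integers, so the height grows much more slowly than the logarithm of the number of leaves.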
Definition 5 (Factorial rank)
The factorial rank of a rooted tree $T$ (see example on Figure 2), denoted $fact(T)$, is the maximum height of a double factorial tree being a minor of $T$, that is: $fact(T) = \max \{ h \mid F^h \text{ is a minor of } T \}$.
We extend the notion of factorial rank to a node $u$ in a tree $T$, referring to the factorial rank of its subtree $T_u$. The case where the children of node $u$ all have factorial rank strictly less than the one of $u$ will play a key role.
Definition 6 (Minimally of factorial rank $k$)
Let $u$ be a node of a tree $T$. If $u$ has factorial rank $k$ and if all the children of $u$ have factorial rank at most $k-1$, we say that $u$ is minimally of factorial rank $k$.
We are now ready to state the result of this section, which claims that the linearity of a cograph is linearly bounded by the factorial rank of its cotree.
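Lemma 2. Let $T$ be a cotree and let $u$ be a node of $T$ of factorial rank $k$. Then, the linearity of $G_u$ is at most $2k+2$. Moreover, if $k \geq 1$ and $u$ is minimally of factorial rank $k$, then the linearity of $G_u$ is at most $2k+1$.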
We prove the result by induction. For $k \geq 1$, the induction hypothesis $P(k)$ is formulated as follows: “all nodes of factorial rank $k-1$ have linearity at most $2k$; and all nodes which are minimally of factorial rank $k$ (i.e., whose children have factorial rank at most $k-1$) have linearity at most $2k+1$”.
For the initialisation of our induction, i.e. for $k = 1$, we must show that if $u$ has factorial rank $0$, then the linearity of $G_u$ is at most $2$, and that if $u$ is minimally of factorial rank $1$, then the linearity of $G_u$ is at most $3$.
Firstly, since every internal node of a cotree has at least two children, if $u$ has factorial rank $0$, then $u$ is a leaf of $T$ or $u$ is an internal node having exactly two leaf children (in all other cases, we can find $F^1$ as a minor of $T_u$). Then, it is straightforward that the linearity of $G_u$ is at most $2$.
Now consider a node $u$ which is minimally of factorial rank $1$, that is, $u$ has factorial rank $1$ and all its children have factorial rank at most $0$. If $u$ is a parallel node, then, from Remark 3, its linearity is the maximum of the linearities of its children, which is at most $2$ in this case according to what precedes. Thus, the linearity of $G_u$ is at most $3$. If $u$ is a series node, denote its children by $u_1, \ldots, u_l$. Since all the children of $u$ have factorial rank $0$, as mentioned previously, they are either leaves of $T$ or internal nodes having exactly two leaf children. We consider the case where all of them are internal nodes having two leaf children and we denote by $a_i, b_i$ the two leaf children of $u_i$, for $1 \leq i \leq l$. We show that in this case, the linearity of $G_u$ is at most $2$ (and so at most $3$) by exhibiting a $2$-line-model for $G_u$. As, in the other cases, the graph $G_u$ is an induced subgraph of the graph we consider here, it follows from Remark 2 that its linearity is also at most $2$ (and so at most $3$).
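Arguments of this paragraph are illustrated on Figure 3. For $\sigma_1$ and $\sigma_2$, we use the same order $\sigma$ on the vertices of $G_u$, defined as $\sigma = a_1, b_1, a_2, b_2, \ldots, a_l, b_l$. For any $i \in [1, l]$, the interval associated to $a_i$ in $\sigma_1$ is the set of vertices less than or equal to $a_i$ in $\sigma$ and the interval associated to $b_i$ in $\sigma_1$ is the set of vertices greater than or equal to $b_i$ in $\sigma$. In $\sigma_2$, the interval associated to $a_i$ is the set of vertices strictly greater than $b_i$ in $\sigma$ and the interval associated to $b_i$ is the set of vertices strictly less than $a_i$ in $\sigma$. Since $a_i$ and $b_i$ are consecutive in $\sigma$ and non-adjacent (their common parent $u_i$ is a parallel node), the two intervals of $a_i$ cover all vertices except $b_i$, and symmetrically for $b_i$.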
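The base-case model can be checked mechanically on small instances. The sketch below assumes that the common order interleaves the two leaves of each child (first leaf of child 1, second leaf of child 1, first leaf of child 2, and so on), which is the arrangement for which the four intervals described above cover exactly the closed neighbourhoods; the vertex names `a1, b1, ...` and the function names are hypothetical. Empty intervals are tolerated in this sketch.

```python
from typing import Dict, List, Set, Tuple


def base_case_model(l: int) -> Tuple[List[str], Dict[str, Tuple[List[str], List[str]]]]:
    """Common order and the two intervals of every vertex, for a series
    node with l parallel children having two leaf children each."""
    order = [f"{side}{i}" for i in range(1, l + 1) for side in ("a", "b")]
    pos = {v: p for p, v in enumerate(order)}
    intervals = {}
    for i in range(1, l + 1):
        a, b = f"a{i}", f"b{i}"
        # a_i: everything up to a_i, then everything strictly after b_i.
        intervals[a] = (order[: pos[a] + 1], order[pos[b] + 1 :])
        # b_i: everything from b_i on, then everything strictly before a_i.
        intervals[b] = (order[pos[b] :], order[: pos[a]])
    return order, intervals


def closed_neighbourhood(order: List[str], v: str) -> Set[str]:
    """Every vertex is adjacent to all others except its sibling leaf."""
    sibling = ("b" if v[0] == "a" else "a") + v[1:]
    return set(order) - {sibling}


def is_valid_model(l: int) -> bool:
    order, intervals = base_case_model(l)
    return all(
        set(intervals[v][0]) | set(intervals[v][1]) == closed_neighbourhood(order, v)
        for v in order
    )
```

Each interval is a slice of the order, hence contiguous by construction, and the check confirms that two orders suffice in this case.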
We consider an integer $k \geq 1$ such that $P(k)$ is true, which means in particular that all nodes minimally of rank $k$ can be encoded using $2k+1$ orders. We then show $P(k+1)$ in two steps: first, we prove that any node of factorial rank $k$ (not necessarily minimally) can be encoded using one more order (i.e. $2k+2$ orders instead of $2k+1$ for nodes minimally of rank $k$), then we prove that, adding again one more order (i.e. using $2k+3$ orders), we can also encode any node which is minimally of factorial rank $k+1$.
First step: node $u$ of factorial rank $k$.
In order to describe a -line-model of we need to distinguish different parts of (see illustration on Figure 4). Let be the subset of nodes of that have factorial rank . If is reduced to , then is minimally of factorial rank and the induction hypothesis allows to conclude without proving anything else. Otherwise, denote , where , the subset of nodes of that are minimal for the ancestor relationship (i.e., lowest in the cotree). By definition, these elements do not contain node and are incomparable for the ancestor relationship. Then, one can build a minor of , by a sequence of edge contractions, where the set of children of is exactly . It follows that , as otherwise would be of factorial rank . By definition again, all the children of the nodes of have factorial rank at most , and then the nodes of are minimally of rank . By induction hypothesis, it follows that for all , admits a -line-model for which we denote , with , its orders.
We denote the subtree of induced by the set of nodes (by definition, ). We also denote the set of nodes of whose parent is in . Nodes of have, by definition, rank at most and it follows from the induction hypothesis that they admit a -line-model. Then, for a node , we again denote , with , the orders of such a model. In addition , we use an arbitrary partition of the nodes of into monotonic paths such that for all , (see Figure 5). Partition naturally induces a generalised partition (some parts may be empty) of whose parts are denoted , with : is the subset of nodes of whose parent belongs to .
We can now describe the orders of the model we build for . Importantly, note that , , is a partition of . In our construction, will always be an interval of for all and all . Then, the description of is in two steps: we first give the order, denoted , in which the intervals of nodes appear in and then, for each , we give the order, denoted , in which the vertices of appear in this interval. The description of orders will be done by choosing a local order on the children of each node of . Then is defined as the unique order on respecting all the chosen local orders, i.e. such that for any , if and have the same parent and if comes before in the order chosen on children of , then all descendants of come before all descendants of in .
To fully describe the -line-model of , we must also assign to each vertex one interval of its neighbours in each of the orders of the model, in such a way that these intervals entirely cover the neighbourhood of . In order to help our analysis, we distinguish between the external neighbourhood of vertex , which is defined as (or equivalently , as ), where is the unique node of being an ancestor of leaf in , and its internal neighbourhood which is defined as . Our construction starts with the description of the first orders of the model, which we use to encode the majority of adjacencies of , and finishes with the description of order which is used to encode the remaining adjacencies.
For , the purpose of order is to satisfy the external neighbourhoods of vertices of for . It entirely succeeds to do so for and encodes only one part (out of the two parts that we distinguish in the following) of the external neighbourhoods of for nodes , the remaining part being encoded in . Then, for each , the internal neighbourhoods of vertices of are encoded in the remaining orders of . It is enough for , since they admit a -line-model by recursion hypothesis, but one order is missing for which is minimally of rank and is then only guaranteed to admit a -line-model by recursion hypothesis. Again, the missing order will be found in .
External neighbourhoods and choice of ’s. Let , in this paragraph, we define the order in which the intervals of vertices of appear in , for . If , the order we choose does not matter, any arbitrary order is suitable. However, if , the purpose of order is to satisfy the external adjacencies of the vertices of for any node (see Figure 5). In this case, as explained above, we define by choosing an order for the children of for each node of . If is an ancestor of and if is a parallel node, we choose an order for the children of such that the (unique) child of which is an ancestor of is the last child in the order (any such order being suitable). If is an ancestor of and is a series node, we choose an order such that the child of which is an ancestor of is the first child of the order (any such order being suitable). And finally, if is not an ancestor of , then any order on its children is suitable for . This way, the external neighbourhood of any vertex of is exactly the interval of formed by the vertices on the right of the interval of (containing the last vertex of ), and this is the interval assigned to in . Indeed, the vertices on the right of the interval of have a series least common ancestor with node and are therefore adjacent to all the vertices of , while the vertices on the left of the interval of have a parallel least common ancestor with node and are then non-adjacent to the vertices of (see example on Figure 6). As a conclusion of this paragraph, thanks to this choice of ’s, the external neighbourhood of all the vertices of , for all , is entirely encoded in order . Also note that the interval associated to the vertices of in , which is the same for all vertices of , contains the last vertex of order . We use this property in the step of the induction.
For a node , the situation is slightly more complicated and we consider two cases.
If the father of , denoted , is a parallel node, then, as previously, the external neighbourhood of vertices of is an interval of . Indeed, this external neighbourhood is exactly the set of leaves contained in the subtrees of the children of the series ancestors of (which are all strict ancestors of ) such that is not itself an ancestor of . But, as is an ancestor of , thanks to the order chosen above, this set of leaves is an interval containing the last element of . This interval is the one we associate, in , to all the vertices of .
If the father of , denoted , is a series node, then the external neighbourhood of vertices of is not an interval of but almost: it is the union of two intervals of . Let us distinguish three parts in the external neighbourhood of the vertices of . The first part, denoted , is the set of leaves descending from the children of the series nodes being strict ancestors of such that is not itself an ancestor of . As in the parallel case above, thanks to the choice we made for order , is an interval containing the last element of . The second part, denoted , is the set of leaves descending from the children of that come after in the order chosen for . Clearly, is an interval of and from the definition of , is exactly the interval of vertices, denoted , that are on the right of the interval of in . This interval is the one we associate to vertices of in . Note that it contains the last element of . The last part of the external neighbourhood of the vertices of is denoted and is made of the set of leaves descending from the children of that precede in the order chosen for . As , is an interval of , but this part of the external neighbourhood of the vertices of is not covered in . This will be done in the additional order .
Before we describe order , for the purpose of the step of the induction, note that again, the interval of external neighbours associated to any node , for any , contains the last vertex of order .
We now define the order used to build order , using the partition of into paths introduced earlier. To define , for any node , we use the same order on the children of as the one used for , with such that . This ensures that for any node whose parent is a series node of , the interval of external neighbours which was not covered in order (note that since then ) will also be an interval of . This is precisely the interval we assign to vertices of in , which is possible as their internal neighbourhood will be entirely satisfied in the first orders, as described below.
Internal neighbourhoods and choice of ’s. The orders used for the vertices of , with , in order , with , are chosen as follows.
For any node , with , and all ,
if , then we can take any arbitrary order for the vertices of . Indeed, in , the vertices of have already been assigned an interval made only of their external neighbours (see above), meaning that this interval does not contain any vertex of .
If , the order on the vertices of is and the interval associated to vertices of in is the same as the one associated to them in .
For a node and all , the order we choose for the vertices of depends on the path , with , of the partition to which belongs the father of .
If then we use any arbitrary order for the vertices of . Again, in , the vertices of have already been assigned an interval made only of their external neighbours (see above) and therefore it does not contain any vertex of .
If (resp. if ) then we use the order (resp. ), and the interval of associated to the vertices of is the same as the one associated to them in (resp. ).
In this way, for any and for any , since needs only orders to be encoded, all the internal adjacencies of vertices of have been covered by the intervals associated to them in orders , for and . For nodes , , the situation is the same: only orders, namely the orders for and , have been used to encode the internal neighbourhoods of . But unfortunately, since is minimally of factorial rank , the recursion hypothesis only guarantees that . Then, one more interval is needed to fully cover the internal neighbourhood of vertices of . For this, we use one additional order .
Actually, we already used order in what precedes, in order to cover the external neighbourhood of some vertices. To this purpose, we fixed the order in which the intervals of vertices of , for , appear in . But we still have the liberty of choosing the orders on the vertices of , for all . We use this possibility for each node : we choose the order on the vertices of in as being , the one which has not been used until now, and the interval associated to vertices of in is the same as the one associated to them in . This is possible as the external neighbourhood of vertices of has already been entirely satisfied before, in order .
Thus, using the orders described above, both the internal and the external neighbourhoods of the vertices of , for all , have been covered. Since is a partition of the vertices of , this proves that and this achieves the step of the induction. Also remember, as we use it in the step of the induction described below, that in the model we built for , for any vertex there exists an index such that the interval associated to in contains the last vertex of .
Second step: node $u$ minimally of factorial rank $k+1$.
In order to finish the induction step and then the proof of Lemma 2, we now show that for a node $u$ minimally of factorial rank $k+1$ (i.e., whose children have factorial rank at most $k$), the linearity of $G_u$ is at most $2k+3$.
First consider the case where $u$ is a parallel node. In this case, from Remark 3, the linearity of $u$ is the maximum of the linearities of its children. Since the children of $u$ all have factorial rank at most $k$, it follows from the first step of our induction that their linearity is at most $2k+2$. Consequently, the linearity of $G_u$ is at most $2k+2$, and then in particular at most $2k+3$.
Let us now consider the case where is a series node and let us denote , with , the children of . From what precedes, all of them have linearity at most and for each we have a -line-model of denoted . A remarkable property of this -line-model, which we have constructed above, is that for any vertex of , there exists an index such that the interval associated to in contains the last vertex of . For each vertex , we denote such an index . We now use this property in order to construct a -line-model of , which we denote .
For any , the order used for is simply the concatenation (denoted ) of the orders of the -line-models of its children, from left to right in increasing value of the index. More explicitly, for all , we define as . For any and for any vertex of , if , the interval associated to in is the same as the one associated to in . On the other hand, if , as the interval associated to in contains the last vertex of , in the order of the model of , we extend this interval on the right by including the vertices of for all . As is a series node, all these vertices are indeed adjacent to (see Figure 7).
In this way, for any and for any vertex of , the internal neighbourhood of is entirely covered in the orders . Regarding the external neighbourhood of , note that it can be expressed as . The part is already covered in order . Then, only the part of the external neighbourhood of remains to be covered. This is the purpose of order which we define as follows. For , we take any arbitrary order on the vertices of and we build as . Then, for any and for any vertex of , we associate to the interval of made of the vertices of (see Figure 7). Doing so, the entire external neighbourhood of all the vertices of are covered in the orders we defined. Thus, is a -line-model of which is then of linearity at most .
This completes the induction step and the proof of Lemma 2.
4 Main results
The first result we derive from Lemma 2 is a tight upper bound on the worst-case linearity of cographs on $n$ vertices. Until now, the best known upper bound (Crespelle and Gambette, 2014) was $O(\log n)$, and Crespelle and Gambette (2014) also exhibit some cograph families having a linearity up to $\Omega(\log n / \log\log n)$. Here, we show a new upper bound of $O(\log n / \log\log n)$ that matches the lower bound of Crespelle and Gambette (2014). This is a direct consequence of Lemma 2 and of the fact that a double factorial tree of height $h$ has $(2h+1)!!$ leaves.
Theorem 1. For any cograph $G$ on $n$ vertices, the linearity of $G$ is $O(\log n / \log\log n)$, and this upper bound is tight.
Let $T$ denote the cotree of $G$ and $k = fact(T)$. From Lemma 2, the linearity of $G$ is in $O(k)$. Let us now show that