Colour Refinement, which is also known as Naïve Vertex Classification or the 1-dimensional Weisfeiler-Leman algorithm (1-WL), is an important combinatorial algorithm in theoretical and practical approaches to the graph isomorphism problem. In an iterative fashion, it refines an isomorphism-invariant partition of the vertex set of the input graph. This process stabilises at some point and the final partition can often be used to distinguish non-isomorphic graphs [BabErdSelSta80]. Colour Refinement can be implemented to run in time $O((m+n)\log n)$, where $n$ is the order of the input graph and $m$ is its number of edges [carcro82, mck81]. Most notably, its efficient implementations are used in all competitive graph isomorphism solvers (such as Nauty and Traces [mckaypip14], Bliss [JunttilaK07] and saucy [DargaLSM04]).
Colour Refinement has been rediscovered many times, one of its first occurrences being in a paper on chemical information systems from the 1960s [mor65]. The procedure is applied in plenty of other fields; for example, it can be modified to reduce the dimension of linear programs significantly [GroheKMS14]. Other applications are in the context of graph kernels [ShervashidzeSLMB11] or static program analysis [LiSSSS16, morritfey+19].
As described above, Colour Refinement computes a stable colouring of its input graph. It is known that two given graphs result in equal colourings, i.e. are not distinguished by Colour Refinement, if and only if there is a fractional isomorphism between them [god97, ramscheiull94, tin91]. Moreover, the graphs which Colour Refinement identifies up to isomorphism (i.e. distinguishes from all non-isomorphic ones) have been completely characterised [DBLP:journals/cc/ArvindKRV17, kieschweisel15].
To obtain its final colouring, the algorithm proceeds in iterations. In this paper, we investigate how many iterations it takes for the algorithm to terminate. More specifically, for $n \in \mathbb{N}$, we are interested in $\mathrm{WL}_1(n)$, the maximum number of iterations required to reach stabilisation of Colour Refinement among all graphs of order $n$.
While not directly linked to the running time on a sequential machine, the iteration number corresponds to the parallel running time of Colour Refinement (on a standard PRAM model) [groverb06, KoblerV08]. Furthermore, via a connection to counting logics, a bound on the iteration number for graphs of a fixed size directly translates into a bound on the descriptive complexity of the difference between the two graphs, namely into a bound on the quantifier depth of a distinguishing formula in the extension of the 2-variable fragment of first-order logic by counting quantifiers [caifurimm92, immlan90]. Moreover, the iteration number of 1-WL equals the depth of a graph neural network that outputs the stable vertex colouring of the underlying graph with respect to Colour Refinement [morritfey+19].
Considering paths, one quickly determines that the maximum iteration number on graphs of order $n$ is at least $\frac{n}{2} - 1$ for every $n \in \mathbb{N}$. By contrast, on random graphs, the iteration number is asymptotically almost surely 2 [BabErdSelSta80]. The best published lower bound on the iteration number of Colour Refinement on $n$-vertex graphs is $n - O(\sqrt{n})$ [krever15]. Concerning the upper bound, the trivial bound of $n - 1$ iterations holds for every repeated partitioning of a set of size $n$ and it does not take into account any further properties of the input graph or of the algorithm used to execute the partitioning. Still, no improvement over this upper bound has been established.
Our first main result reads as follows.
For every with or , it holds that .
Thus, there are infinitely many with . We can even determine the iteration number up to an additive constant of 1 for all (where the precise numbers for can easily be determined computationally), as stated in our second main result.
For every , it holds that .
We obtain our bounds via an empirical approach. More precisely, we have designed a procedure that enables us to systematically generate, for a given order $n$, all graphs of order $n$ that obey certain constraints (to render the procedure tractable) and on which Colour Refinement takes $n-1$ iterations to stabilise. Analysing the graphs, we determined the connections between colour classes during the execution of the algorithm in detail. If the vertex degrees that are present in the graph are low, then the connections between colour classes of size 2 are restricted. This allows us to develop an elegant graphical visualisation and a compact string representation of the graphs with low vertex degrees that take $n-1$ iterations to stabilise. Using these encodings, we are able to provide infinite families with $n-1$ Colour Refinement iterations until stabilisation.
Our analysis enables a deep understanding of the families that we present. Via slight modifications of the graph families, we can then cover a large portion of graph sizes and, passing from connected graphs to general graphs, we can construct the graphs that yield Theorem 1.
Colour Refinement is the 1-dimensional version of the so-called Weisfeiler-Leman algorithm. For every $k \in \mathbb{N}$, there exists a generalisation of it ($k$-WL), which colours vertex $k$-tuples in the input graph instead of single vertices only. See [kiefer] for an in-depth study of the main parameters of Colour Refinement and $k$-WL.
As for Colour Refinement, one can consider the number of iterations of $k$-WL on graphs of order $n$. Notably, contrasting our results for Colour Refinement, in [kieschwe16], it was first proved that the trivial upper bound of $n^2 - 1$ on the iteration number of 2-WL is not even asymptotically tight (see also the journal version [kieschw19]). This foundation fostered further work, leading to an astonishingly good new upper bound of $O(n \log n)$ for the iteration number of 2-WL [lichponschwei19].
For fixed $k$, it is already non-trivial to show linear lower bounds on the iteration number of $k$-WL. Modifying a construction of Cai, Fürer, and Immerman [caifurimm92], this was achieved by Fürer [Furer01], who showed a lower bound of $\Omega(n)$, which remains to date the best known lower bound when the input is a graph. Only when considering structures with relations of higher arity than 2 as input have better lower bounds on the iteration number of $k$-WL been proved [BerkholzN16].
For $k > 2$, regarding upper bounds on the iteration number of $k$-WL, without further knowledge about the input graph, no significant improvements over the trivial upper bound $n^k - 1$ are known. (Note that the bound is not tight, since the initial partition of the $k$-tuples already has multiple classes, for example, one consisting of all tuples of the form $(v, v, \dots, v)$.) Still, when the input graph has bounded treewidth or is a 3-connected planar graph, polylogarithmic upper bounds on the iteration number of $k$-WL needed to identify the graph are known [groverb06, verb07].
Although for every natural number $k$, there are non-isomorphic graphs that are not distinguished by $k$-WL [caifurimm92], it is known that for every graph class with a forbidden minor, a sufficiently high-dimensional Weisfeiler-Leman algorithm correctly decides isomorphism [Grohe12]. Recent results give new upper bounds on the dimension needed for certain interesting graph classes [grokie19, groneu19]. A closely related direction of research investigates what properties the Weisfeiler-Leman algorithm can detect in graphs [ArvindFKV2018, fuhlkoebverb20, fur17].
By , we denote the set of natural numbers, i.e. . We set and, for , we define and . For a set , a partition of is a set of non-empty sets such that and for all with , it holds that . For two partitions and of the same set , we say that is finer than (or refines ) if every element of is a (not necessarily proper) subset of an element of . We write (and equivalently ) to express that is finer than . Concurrently, we say that is coarser than . If both and hold, we denote this by .
For , the partition is the unit partition of . The partition is called the discrete partition of . A set of cardinality 1 is a singleton.
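The refinement relation between partitions can be made concrete with a minimal Python sketch. The names `is_partition` and `is_finer` are illustrative and do not come from the paper; partitions are modelled as lists of sets.

```python
def is_partition(blocks, ground_set):
    """Check that `blocks` consists of non-empty, pairwise disjoint sets
    whose union is `ground_set`."""
    blocks = [frozenset(b) for b in blocks]
    if any(len(b) == 0 for b in blocks):
        return False
    union = frozenset().union(*blocks)
    # The blocks are pairwise disjoint exactly when no element is counted twice.
    return union == frozenset(ground_set) and sum(len(b) for b in blocks) == len(union)


def is_finer(pi, rho):
    """Return True if every block of `pi` is a (not necessarily proper)
    subset of some block of `rho`."""
    rho = [frozenset(b) for b in rho]
    return all(any(frozenset(p) <= r for r in rho) for p in pi)


ground = {1, 2, 3, 4}
unit = [{1, 2, 3, 4}]            # a single block containing every element
discrete = [{1}, {2}, {3}, {4}]  # singletons only
middle = [{1, 2}, {3, 4}]

print(is_finer(discrete, middle), is_finer(middle, unit), is_finer(middle, discrete))
```

In this sketch, every partition is finer than itself, the discrete partition is finer than every partition, and every partition is finer than the unit partition.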
All graphs that we consider in this paper are finite and simple, i.e. undirected without self-loops at vertices. For a graph with vertex set and edge set , its order is . For a vertex , we denote by the neighbourhood of in , i.e. the set . Similarly, for a vertex set , we set . The degree of a vertex is (since the graph will be clear from the context, we do not need to include it in our notation). We also set . If there is a such that , the graph is -regular. A regular graph is a graph that is -regular for some . By a matching, we mean a 1-regular graph.
Let be a graph with at least two vertices. If there are sets such that and and , then is bipartite (on bipartition ). If, additionally, , the graph is complete bipartite.
For , a -biregular graph (on bipartition ) is a bipartite graph on bipartition such that for every , it holds that , and for every , it holds that . A biregular graph is a graph for which there are and such that is -biregular on bipartition .
For a graph and a set , we let be the induced subgraph of on , i.e. the subgraph of with vertex set and edge set . We define . Furthermore, for vertex sets , we denote by the graph with vertex set and edge set .
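The regularity notions above can be checked mechanically. The following Python sketch assumes that a graph is given as a dictionary mapping each vertex to its neighbours; the function names are hypothetical. It tests whether the subgraph induced on a vertex set is regular and whether the graph between two disjoint vertex sets is biregular, where the empty graph between two sets counts as biregular with both degrees 0.

```python
def neighbours_in(adj, v, target):
    """Number of neighbours of v that lie in the vertex set `target`."""
    return sum(1 for w in adj[v] if w in target)


def is_regular_on(adj, part):
    """True if the subgraph induced on `part` is regular."""
    part = set(part)
    return len({neighbours_in(adj, v, part) for v in part}) <= 1


def biregular_degrees(adj, part_a, part_b):
    """Return (k, l) if the graph between the disjoint, non-empty sets
    part_a and part_b is (k, l)-biregular, and None otherwise."""
    a, b = set(part_a), set(part_b)
    degrees_a = {neighbours_in(adj, v, b) for v in a}
    degrees_b = {neighbours_in(adj, v, a) for v in b}
    if len(degrees_a) == 1 and len(degrees_b) == 1:
        return degrees_a.pop(), degrees_b.pop()
    return None


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(biregular_degrees(cycle4, {0, 2}, {1, 3}))  # complete bipartite case
print(biregular_degrees(path4, {0, 1}, {2, 3}))   # vertex 0 has no neighbour in {2, 3}
```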
A coloured graph is a tuple , where is a graph and is a function that assigns colours (i.e. elements from a particular set ) to the vertices. We interpret all graphs treated in this paper as coloured graphs and just write instead of when is clear from the context. If the colouring is not specified, we assume a monochromatic colouring, i.e. all vertices have the same colour.
For a coloured graph with colouring , a (vertex) colour class of is a maximal set of vertices that all have the same -colour. Every graph colouring induces a partition of into the vertex colour classes with respect to .
3 Colour Refinement
Colour Refinement proceeds by iteratively refining a partition of the vertices of its input graph until the partition is stable with respect to the refinement criterion.
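[Colour Refinement] Let $\chi \colon V(G) \to C$ be a colouring of the vertices of a graph $G$, where $C$ is some set of colours. The colouring computed by Colour Refinement on input $(G, \chi)$ is defined recursively: we set $\chi_G^0 := \chi$, i.e. the initial colouring is $\chi$. For $i \in \mathbb{N}$, the colouring $\chi_G^i$ computed by Colour Refinement after $i$ iterations on $G$ is defined as $\chi_G^i(v) := \bigl(\chi_G^{i-1}(v), \{\!\!\{\chi_G^{i-1}(w) \mid w \in N(v)\}\!\!\}\bigr)$.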
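That is, $\chi_G^i(v)$ consists of the colour of $v$ from the previous iteration as well as the multiset of colours of neighbours of $v$ from the previous iteration. It is not difficult to see that the partition induced by $\chi_G^{i+1}$ refines the partition induced by $\chi_G^{i}$ for every graph $G$ and every $i \in \mathbb{N}_0$. Therefore, there is a unique minimal integer $j$ such that $\chi_G^{j}$ and $\chi_G^{j+1}$ induce the same partition. For this value $j$, we define the output of Colour Refinement on input $G$ to be $\chi_G^{j}$ and call $\chi_G^{j}$ and the partition it induces the stable colouring and the stable partition, respectively, of $G$. Accordingly, executing $i$ Colour Refinement iterations on $G$ means computing the colouring $\chi_G^{i}$. We call a graph $G$ with colouring $\chi$ and the induced partition stable if $\chi$ and $\chi_G^{1}$ induce the same partition. Note that for all classes $P, Q$ of a stable partition with $P \neq Q$, the graph $G[P]$ is regular and $G[P, Q]$ is biregular.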
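The recursive definition can be sketched in a few lines of Python. This is a naive illustration rather than one of the efficient implementations cited in the introduction. It assumes that the graph is given as a dictionary from vertices to neighbour lists and that initial colours, if supplied, are integers; the function name `colour_refinement` is hypothetical.

```python
def colour_refinement(adj, colouring=None):
    """Return (stable colouring, number of iterations until stabilisation).

    adj: dict mapping each vertex to an iterable of its neighbours.
    colouring: optional dict vertex -> integer colour (monochromatic if omitted).
    """
    colours = dict(colouring) if colouring is not None else {v: 0 for v in adj}
    iterations = 0
    while True:
        # New colour = (old colour, multiset of the neighbours' old colours);
        # the multiset is stored as a sorted tuple.
        signatures = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
        # Rename the signatures to small integers to keep colours compact.
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        # Each round refines the previous partition, so an unchanged number
        # of classes means that the partition itself is unchanged.
        if len(palette) == len(set(colours.values())):
            return colours, iterations
        colours = {v: palette[signatures[v]] for v in adj}
        iterations += 1


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
stable, rounds = colour_refinement(path5)
print(stable, rounds)
```

On the path with five vertices, the sketch separates the end vertices first and the centre vertex second, after which the partition no longer changes.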
Colour Refinement can be used to check whether two given graphs $G$ and $H$ are non-isomorphic by computing the stable colouring on the disjoint union of the two. If there is a colour $c$ such that, in the stable colouring, the numbers of vertices of colour $c$ differ in $G$ and $H$, they are non-isomorphic. However, even if they agree in every colour class size in the stable colouring, the graphs might not be isomorphic. It is not trivial to describe for which graphs this isomorphism test is always successful (see [DBLP:journals/cc/ArvindKRV17, kieschweisel15]).
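The test on the disjoint union can be sketched as follows. The helper names are hypothetical, and the refinement routine is the naive one rather than an optimised implementation. The example uses a standard pair that Colour Refinement does not distinguish: a cycle on six vertices and the disjoint union of two triangles, both of which are 2-regular.

```python
from collections import Counter


def stable_colouring(adj):
    """Naive Colour Refinement from a monochromatic start."""
    colours = {v: 0 for v in adj}
    while True:
        signatures = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == len(set(colours.values())):
            return colours
        colours = {v: palette[signatures[v]] for v in adj}


def distinguishes(adj_g, adj_h):
    """True if some colour occurs a different number of times in the two
    graphs in the stable colouring of their disjoint union."""
    # Tag the vertices so that the two vertex sets are disjoint.
    union = {("G", v): [("G", w) for w in nbrs] for v, nbrs in adj_g.items()}
    union.update({("H", v): [("H", w) for w in nbrs] for v, nbrs in adj_h.items()})
    colours = stable_colouring(union)
    count_g = Counter(c for (side, _), c in colours.items() if side == "G")
    count_h = Counter(c for (side, _), c in colours.items() if side == "H")
    return count_g != count_h


cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(distinguishes(cycle6, two_triangles), distinguishes(path3, triangle))
```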
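We write $\mathrm{WL}_1(G)$ for the number of iterations of Colour Refinement on input $G$, that is, $\mathrm{WL}_1(G) = j$, where $j$ is the minimal integer for which $\chi_G^{j}$ and $\chi_G^{j+1}$ induce the same partition. Similarly, for $n \in \mathbb{N}$, we write $\mathrm{WL}_1(n)$ to denote the maximum number of iterations that Colour Refinement needs to reach stabilisation on an $n$-vertex graph.
We call every graph $G$ of order $n$ with $\mathrm{WL}_1(G) = n - 1$ a long-refinement graph.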
Let $G$ be an uncoloured path with $n$ vertices. Then Colour Refinement takes $\lfloor (n-1)/2 \rfloor$ iterations to stabilise on $G$.
In the first iteration, the two end vertices are distinguished from all others because they are the only ones with degree 1. Then in each iteration, the information of being adjacent to a “special” vertex, i.e. the information about the distance to a vertex of degree 1, is propagated one step closer to the vertices in the centre of the path. This procedure takes $\lfloor (n-1)/2 \rfloor$ iterations. ∎
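The propagation argument can be checked experimentally on small paths. The sketch below uses hypothetical helper names and a naive refinement loop. It counts the iterations on paths with up to 12 vertices and compares them with $\lfloor (n-1)/2 \rfloor$, the largest distance from a vertex to its nearest end vertex.

```python
def iteration_number(adj):
    """Number of Colour Refinement iterations until the partition is stable,
    starting from a monochromatic colouring."""
    colours = {v: 0 for v in adj}
    rounds = 0
    while True:
        signatures = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == len(set(colours.values())):
            return rounds
        colours = {v: palette[signatures[v]] for v in adj}
        rounds += 1


def path_graph(n):
    """Path on the vertices 0, ..., n-1."""
    return {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}


for n in range(1, 13):
    print(n, iteration_number(path_graph(n)), (n - 1) // 2)
```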
In 2015, Krebs and Verbitsky improved on the explicit linear lower bound for graphs of order given by Fact 3 by constructing a family of pairs of graphs whose members of order can only be distinguished after Colour Refinement iterations (see [krever15, Theorem 4.6]). Hence, since for a set and partitions of that satisfy
it holds that , we obtain the following corollary.
For every , it holds that .
It has remained open whether either of the two bounds is tight. In preliminary research conducted together with Gödicke and Schweitzer, towards improving the lower bound, the first author took up an approach to reverse-engineer the splitting of colour classes. Gödicke’s implementation of those split procedures led to the following result.
[[Goedicke]] For every , it holds that . For , it holds that .
Unfortunately, due to the cost of the exhaustive computation, it was not possible to test larger graph sizes. Also, the obtained graphs do not exhibit any structural properties that would lend themselves to a generalisation in order to obtain larger graphs.
Using a fast implementation of Colour Refinement, we could verify that there are exactly 16 long-refinement graphs of order 10, 24 long-refinement graphs of order 11, 32 of order 12, and 36 of order 13. However, again, with simple brute-force approaches, we could not go beyond those numbers exhaustively.
4 Compact Representations of Long-Refinement Graphs
In the light of the previous section, the question whether the lower bound obtained by Krebs and Verbitsky is asymptotically tight has remained open. With the brute-force approach, it becomes infeasible to test all graphs of orders much larger than 10 exhaustively for their number of Colour Refinement iterations until stabilisation. Still, knowing that there exist long-refinement graphs, it is natural to ask whether the ones presented in [Goedicke] are exceptions or whether there are infinitely many such graphs. In this section, we show that the latter is the case.
When the input is a coloured graph with at least two vertex colours, the initial partition already has two elements. Hence, all long-refinement graphs are monochromatic. Therefore, in the following, all initial input graphs are considered to be monochromatic.
Let be a graph and let . If there exists an such that holds, then is not a long-refinement graph.
Every pair of partitions with satisfies . Thus, every sequence of partitions of the form
must satisfy for all . ∎
The proposition implies that in order to find long-refinement graphs, we have to look for graphs in which, in every Colour Refinement iteration, only one additional colour class appears. That is, in each iteration, only one colour class is split and the splitting creates exactly two new colour classes.
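The condition that exactly one new colour class appears per iteration can be observed by recording the number of classes after each round. This is a minimal sketch with hypothetical function names that assumes a non-empty graph given as a dictionary from vertices to neighbour lists.

```python
def class_counts(adj):
    """Numbers of colour classes after 0, 1, 2, ... iterations, up to and
    including the stable partition (monochromatic start)."""
    colours = {v: 0 for v in adj}
    counts = [1]
    while True:
        signatures = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == counts[-1]:
            return counts
        counts.append(len(palette))
        colours = {v: palette[signatures[v]] for v in adj}


def is_long_refinement(adj):
    """True if stabilisation takes n - 1 iterations, i.e. the class counts
    run through 1, 2, ..., n."""
    return class_counts(adj) == list(range(1, len(adj) + 1))


# Three distinct vertex degrees: the first iteration creates two new classes
# at once, so this graph cannot be a long-refinement graph.
three_degrees = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
path6 = {v: [w for w in (v - 1, v + 1) if 0 <= w < 6] for v in range(6)}

print(class_counts(three_degrees), is_long_refinement(three_degrees))
print(class_counts(path6), is_long_refinement(path6))
```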
Let be a long-refinement graph with at least two vertices. Then there exist with and such that .
This is a direct consequence of Proposition 4: every (monochromatic) regular graph satisfies and if there were more than two vertex degrees present in , we would have . ∎
We can thus restrict ourselves to graphs with exactly two vertex degrees.
For a graph and , we let denote the partition induced by on , i.e. after Colour Refinement iterations on . If is clear from the context, we omit it in the expression.
As a result of the regularity conditions that must hold for the graph , we make the following observation. It implies that, in a long-refinement graph, to determine the class that is split in iteration , it suffices to consider the neighbourhood of an arbitrary class obtained in the preceding iteration.
Let be a graph. Suppose there are and with and . Then there are vertices such that .
Note that there must be a with . Since and , there is a such that for every , it holds that . Since and , there are vertices such that or . In the first case, we are done. In the second case, we obtain . ∎
Note that the validity of the lemma depends on the assumption , which by Proposition 4 is always fulfilled in long-refinement graphs as long as .
No graph with more than one connected component is a long-refinement graph.
Since the refinement process takes place in parallel in each connected component, is the maximum of all for the connected components of . ∎
We can therefore restrict ourselves to connected graphs. The only connected graphs with are paths and, by Fact 3, they are not long-refinement graphs. Thus, the smallest degree pairs for a search for candidates are and .
Let be a long-refinement graph. Then .
Suppose the lemma does not hold. Let be a long-refinement graph with at least three vertices of degree 1. Consider the execution of Colour Refinement on input and let . In , there are two vertex colour classes, namely a class containing the vertices of degree 1 and a class containing the vertices of the second vertex degree .
Suppose that . The class is not split before has been split. Thus, consider the iteration after which has been subdivided into two classes and . This induces the splitting of into and , which by Proposition 4 implies in particular that for all pairs of partition classes with , the graph induced between the two classes is biregular. Therefore, however, now for every pair of classes , the graph is biregular and thus, the partition is equitable. Hence, , i.e. the splitting of must happen in the -st iteration. In particular, and must be singletons, i.e. . ∎
Table 1 displays the adjacency lists of two long-refinement graphs on 12 and 14 vertices, respectively, which each have exactly one vertex of degree 1.
The lemma allows us to reduce the question of whether there are infinitely many long-refinement graphs with degrees in to the question whether there are such families with degrees in .
If there is a long-refinement graph with , then there is also a long-refinement graph with and .
Let be a long-refinement graph with . Then , where and . By Lemma 4, it holds that .
First suppose . Consider the graph with and , i.e. obtained from by inserting an edge between the two vertices in . In the following, we identify the vertices of with their counterparts in . For , let be the partition of induced by . Let . Then, for , it holds that
This follows from and , the regularity of and that there is only one way to split , which results in two singletons. In particular, it holds that .
Now suppose . In , there are only the two partition classes and . In , the set is subdivided into the singleton and . Define and again, for , let be the partition of induced by . Then and, more generally, for , we obtain . This can be deduced from the equality . Thus, . ∎
With the help of the tool Nauty [mck81], our quest for long-refinement graphs was successful. We tested exhaustively up to order 13. To render the search for larger long-refinement graphs tractable, we imposed further conditions. Restricting the degrees to , it was possible to test for graphs up to order 64. Altogether, we found graphs with Colour Refinement iterations, where , for all even
and for all odd. (We exclude the case $n = 10$ in the following analysis since, as our computational results have shown, although long-refinement graphs of order 10 do exist, none of them has vertex degrees 2 and 3.)
In the following, in order to generalise the results to bigger graph sizes, we analyse the obtained graphs. Among our computational results, the even-size graphs with vertex degrees 2 and 3 have the following property in common: there is an iteration such that for every , it holds that . That is, with respect to their assigned colours, the vertices remain in pairs until there are no larger colour classes left. Then the first such pair is split into singletons, which must induce a splitting of another pair, and so on, until the discrete partition is obtained. (Similar statements hold for the odd-size graphs, but are more technical.) In the following, a pair is a set of two vertices which occurs as a colour class during the execution of Colour Refinement. That is, vertices form a pair if and only if is an element of for some .
As just argued, there is a splitting order on the pairs, i.e. a linear order induced by the order in which pairs are split into singletons. We now examine the possible connections between pairs.
From now on, we make the following assumption.
is a long-refinement graph with and such that there is an for which contains only pairs. Let be the splitting order of these pairs.
We call pairs successive if is the successor of with respect to . Note that for successive pairs , , in the graph , every must have the same number of neighbours in , otherwise it would hold that . By a simple case analysis, together with an application of Lemma 4, this rules out all connections but matchings for successive pairs.
Let and be successive pairs. Then is a matching.
Towards a compact representation of the graphs, we further examine the connections between pairs and with , where is the successor of with respect to .
Let be a pair. Then exactly one of the following holds.
and for every pair with , it holds that .
and there are exactly two choices for a pair with such that . Furthermore, there is a vertex such that and are complete bipartite and .
Suppose . If , the statement trivially holds. Otherwise, by Corollary 4, every vertex has exactly one neighbour in and exactly one neighbour in the predecessor of , i.e. in the unique pair such that . Thus, due to the degree restrictions, can have at most one additional neighbour in a pair with and . However, if had a neighbour in such a , the graph would not be biregular, implying that , a contradiction. Therefore, and thus, . In particular, for every pair with and , it holds that .
Now suppose that . Since the splitting of must be induced by a splitting of a union of two pairs and is biregular and is regular, we cannot have . Thus, there is a pair with and such that . Let be a vertex with . Then , otherwise . Thus, is complete bipartite. Therefore and due to the degree restrictions, has exactly three neighbours: one in and two in . In particular, for every pair with , it holds that .
Let be the second vertex in . Since the splitting of induces the splitting of , by Proposition 4, for every pair with , the graph must be biregular, i.e. either empty or complete bipartite.
Moreover, since , also . By Corollary 4, it holds that . Therefore, there is exactly one pair such that is complete bipartite and for all other pairs with , the graph is empty.
Suppose . Choose such that . Then the unique element in is a union of two pairs, whose splitting induces the splitting of . However, and both graphs and are biregular.
Thus, , which concludes the proof. ∎
Corollary 4 and Lemma 4 characterise for all pairs . Thus, all additional edges must be between vertices from the same pair. Hence, we can use the following compact graphical representation to fully describe the graphs of order at least 12 that we found. As the set of nodes, we take the pairs. We order them according to and connect successive pairs with an edge representing the matching. If the two vertices of a pair are adjacent, we indicate this with a loop at the corresponding node. The only other type of connection between pairs is constituted by the edges from to two other pairs which form the last colour class of size 4, i.e. a colour class of size 4 in the partition for which . We indicate this type of edge with a dotted curve.
An example graph as well as the evolution of the colour classes computed by Colour Refinement on the graph is depicted in Figure 1.
Since is a linear order, we can also use a string representation to fully describe the graphs. For this, we introduce the following notation, letting and be the predecessor and successor of , respectively, with respect to .
0 represents a pair of vertices of degree 2.
1 represents a pair of vertices of degree 3 that is not the minimum of and for which . (This implies that .)
X represents a pair of vertices of degree 3 that is not the minimum of and for which .
S represents the minimum of .
Thus, by Lemma 4, there are exactly two pairs of type X, namely and from the lemma. Now we can use the alphabet and the order to encode the graphs as strings. The -th letter of a string is the -th element of . Note that S is always a pair of non-adjacent vertices of degree 3 due to the degree restrictions. For example, the string representation for the graph in Figure 1 is S11100111X1X1110.
Formally, for every and every string with and for some with , we define the corresponding graph with and
We use this encoding in the next section, which contains our main results.
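To illustrate the encoding, the following Python sketch builds a graph from a string and counts the Colour Refinement iterations on it. It is a reconstruction from the verbal description of the letters, not the formal definition. The vertex names `(i, j)`, the function names, the choice of matching `(i, j)` with `(i + 1, j)`, the choice of which S-vertex is joined to which X-pair, and the edge inside a final 0-pair (added so that its vertices reach degree 2) are all assumptions.

```python
def graph_from_string(word):
    """Adjacency sets of the graph encoded by `word` over {S, 0, 1, X}.
    Pair i consists of the vertices (i, 0) and (i, 1)."""
    x_positions = [i for i, ch in enumerate(word) if ch == "X"]
    if word[0] != "S" or word.count("S") != 1 or len(x_positions) != 2:
        raise ValueError("expected a leading S and exactly two letters X")
    last = len(word) - 1
    adj = {(i, j): set() for i in range(len(word)) for j in (0, 1)}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i, ch in enumerate(word):
        if i < last:
            # Successive pairs are connected by a matching.
            add_edge((i, 0), (i + 1, 0))
            add_edge((i, 1), (i + 1, 1))
        # Edge inside the pair: for every letter 1 and (assumption) for a
        # final 0, whose vertices would otherwise have degree 1.
        if ch == "1" or (ch == "0" and i == last):
            add_edge((i, 0), (i, 1))
    # Each vertex of the S-pair is adjacent to both vertices of one X-pair.
    for s_vertex, x in zip(((0, 0), (0, 1)), x_positions):
        add_edge(s_vertex, (x, 0))
        add_edge(s_vertex, (x, 1))
    return adj


def iteration_number(adj):
    """Naive Colour Refinement iteration count from a monochromatic start."""
    colours = {v: 0 for v in adj}
    rounds = 0
    while True:
        signatures = {
            v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
            for v in adj
        }
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == len(set(colours.values())):
            return rounds
        colours = {v: palette[signatures[v]] for v in adj}
        rounds += 1


word = "S11100111X1X1110"
graph = graph_from_string(word)
print(len(graph), iteration_number(graph))
```

If the reconstruction agrees with the formal definition, the printed iteration count for this string should be one less than the printed number of vertices; the sketch prints both values rather than asserting the equality.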
5 Infinite Families of Long-Refinement Graphs
In this section, we present infinite families of long-refinement graphs. We adapt them further to deduce that holds for all .
For , the notation abbreviates the -fold concatenation of . We let .
For every string contained in the following sets, the graph is a long-refinement graph.
Let (cf. Figure 2). The vertices and are the only ones of degree 2. Thus,
since the vertices in the S-pair have no neighbours in . Similarly,
Now the splitting of the last colour class of size 4 into two X-pairs induces the splitting of the S-pair into singletons, which is propagated linearly according to , adding 6 further iterations, thus summing up to 11 iterations.
We now consider the various infinite families of graphs. The proofs for them work similarly by induction over . Therefore, we only present the full detailed proof for the family , which includes the graph from Figure 1.
For , the graph has 14 vertices. It is easy to verify that it indeed takes 13 Colour Refinement iterations to stabilise. We sketch how Colour Refinement processes the graph: for this, for , we let denote the partition of induced by , i.e. after iterations of Colour Refinement on . First, vertices are assigned colours indicating their degrees. That is,
since the vertices contained in the -pair are not adjacent to vertices from -pairs. Since no vertex contained in the S-pair is adjacent to any vertex from the -pair, we obtain
i.e. with respect to the order induced by the string representation, the first -pair, the second -pair and the first X-pair are separated from the others. Once the two X-pairs form separate colour classes, this induces the splitting of S into two singletons, which is propagated linearly through the entire string, adding 7 further iterations, thus summing up to 13 iterations.
For general , let . To count the iterations of Colour Refinement, we introduce some vocabulary for the pairs in (see also Figure 1). We let . Note that is the set of vertices contained in the subgraphs corresponding to the substrings in the string representation. Furthermore, for all , we call the set