An old result due to Lovász states that a graph $G$ is characterized by counting homomorphisms from all graphs into $G$. That is, two graphs $G$ and $H$ are isomorphic if and only if, for every graph $F$, the number of homomorphisms from $F$ to $G$ equals the number of homomorphisms from $F$ to $H$. This simple result has far-reaching consequences, because mapping graphs $G$ to their homomorphism vectors $\mathrm{HOM}(G) := (\hom(F,G))_{F}$, indexed by all graphs $F$
(or suitably scaled versions of these infinite vectors), allows us to apply tools from functional analysis in graph theory. This is the foundation of the beautiful theory of graph limits, developed by Lovász and others over the last 15 years.
However, from a computational perspective, representing graphs by their homomorphism vectors has the disadvantage that computing the entries of these vectors is NP-complete in general. To avoid this difficulty, we may restrict the homomorphism vectors to entries from a class of graphs for which counting homomorphisms is tractable. That is, instead of the full homomorphism vector $\mathrm{HOM}(G)$ we consider the vector $\mathrm{HOM}_{\mathcal F}(G) := (\hom(F,G))_{F \in \mathcal F}$ for a class $\mathcal F$ of graphs such that computing $\hom(F,G)$ for given graphs $F \in \mathcal F$ and $G$ is in polynomial time. Arguably the most natural example of such a class is the class $\mathcal T$ of all trees. More generally, computing $\hom(F,G)$ for given $F \in \mathcal F$ and $G$ is in polynomial time for all classes $\mathcal F$ of bounded tree width, and under a natural assumption from parameterized complexity theory, it is not in polynomial time for any class of unbounded tree width. This immediately raises the question of what the vector $\mathrm{HOM}_{\mathcal F}(G)$, for a class $\mathcal F$ of bounded tree width, tells us about the graph $G$.
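As an aside, the tractability for trees is easy to make concrete: $\hom(T,G)$ can be computed by a bottom-up dynamic program over $T$. The following sketch is our own illustration, not code from the paper; the encodings `children`, `root`, and `adj` are assumptions of this sketch.

```python
from functools import lru_cache

def hom_tree(children, root, adj):
    """Count homomorphisms from a rooted tree T into a graph G.

    children: dict mapping each node of T to the list of its children
    root:     the root of T
    adj:      dict mapping each vertex of G to its set of neighbors

    cnt(t, v) is the number of homomorphisms of the subtree rooted at t
    that map t to v; each child of t may be mapped to any neighbor of v.
    """
    @lru_cache(maxsize=None)
    def cnt(t, v):
        result = 1
        for c in children[t]:
            result *= sum(cnt(c, u) for u in adj[v])
        return result

    return sum(cnt(root, v) for v in adj)

# Example: the 3-vertex path a-b-c, rooted at a, mapped into the triangle.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {"a": ["b"], "b": ["c"], "c": []}
print(hom_tree(path3, "a", triangle))  # 3 choices for a, then 2 and 2: prints 12
```

With memoization, each pair $(t,v)$ is evaluated once, so the running time is polynomial in the sizes of $T$ and $G$.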
A first nice example (Proposition 4) is that the vector $\mathrm{HOM}_{\mathcal C}(G)$ for the class $\mathcal C$ of all cycles characterizes the spectrum of a graph, that is, for graphs $G$ and $H$ we have $\mathrm{HOM}_{\mathcal C}(G) = \mathrm{HOM}_{\mathcal C}(H)$ if and only if the adjacency matrices of $G$ and $H$
have the same eigenvalues with the same multiplicities. This equivalence is a basic observation in spectral graph theory (see [25, Lemma 1]). Before we state deeper results along these lines, let us describe a different (though related) motivation for this research.
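To make the cycle case concrete: $\hom(C_k, G) = \operatorname{tr}(A^k)$ for $k \ge 3$, and the classic co-spectral (but non-isomorphic) pair, the star $K_{1,4}$ and the disjoint union $C_4 \cup K_1$, therefore agrees on all of these counts. A plain-Python sanity check (our own illustration, not code from the paper):

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_walks(A, k):
    """tr(A^k): the number of closed walks of length k, i.e. hom(C_k, G) for k >= 3."""
    M = A
    for _ in range(k - 1):
        M = mat_mul(M, A)
    return sum(M[i][i] for i in range(len(A)))

# K_{1,4}: center 0 joined to the leaves 1..4.
star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
# C_4 on vertices 0..3, plus the isolated vertex 4.
c4_k1 = [[0, 1, 0, 1, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 1, 0, 0], [0, 0, 0, 0, 0]]

# Different degree sequences, hence non-isomorphic ...
assert sorted(map(sum, star)) != sorted(map(sum, c4_k1))
# ... yet all closed-walk counts (hence all cycle homomorphism counts) agree.
assert all(closed_walks(star, k) == closed_walks(c4_k1, k) for k in range(2, 9))
print("K_{1,4} and C4 + K1 agree on hom(C_k, .) for all tested k")
```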
Determining the similarity between two graphs is an important problem with many applications, mainly in machine learning, where it is known as "graph matching". But how can the similarity between graphs be measured? An obvious idea is to use the edit distance, which simply counts how many edges and vertices have to be deleted from or added to one graph to obtain the other. However, two graphs with small edit distance can nevertheless be structurally quite dissimilar (e.g. [19, Section 1.5.1]). The edit distance is also very hard to compute, as it is closely related to the notoriously difficult quadratic assignment problem (e.g. [3, 21]).
Homomorphism vectors offer an alternative, more structurally oriented approach to measuring graph similarity. After suitably scaling the vectors, we can compare them using standard vector norms. This idea is reminiscent of the "graph kernels" used in machine learning. Like the homomorphism vectors, many graph kernels are based on counting certain patterns in graphs, such as paths, walks, cycles, or subtrees, and in fact any inner product on the homomorphism vectors yields a graph kernel.
A slightly different type of graph kernel is the so-called Weisfeiler-Leman (subtree) kernel. This kernel is derived from the color refinement algorithm (a.k.a. the 1-dimensional Weisfeiler-Leman algorithm), a simple and efficient heuristic for testing whether two graphs are isomorphic. The algorithm computes a coloring of the vertices of a graph based on iterated degree sequences; we give the details in Section 3. To use it as an isomorphism test, we compare the color patterns of two graphs. If they are different, we say that color refinement distinguishes the graphs. If the color patterns of the two graphs turn out to be the same, the graphs may still be non-isomorphic, but the algorithm fails to detect this.
Whether color refinement is able to distinguish two graphs $G$ and $H$ has a very nice linear-algebraic characterization due to Tinhofer [23, 24]. Let $V$ and $W$ be the vertex sets and let $A$ and $B$ be the adjacency matrices of $G$ and $H$, respectively. Now consider the following system of linear equations:
$$A X = X B, \quad (F1) \qquad X \mathbf{1}_W = \mathbf{1}_V, \quad (F2) \qquad \mathbf{1}_V^{\top} X = \mathbf{1}_W^{\top}. \quad (F3)$$
In these equations, $X$ denotes a $(V \times W)$-matrix of variables and $\mathbf{1}_U$ denotes the all-1 vector over the index set $U$. Equations (F2) and (F3) simply state that all row and column sums of $X$ are supposed to be $1$. Thus the nonnegative integer solutions of (F1)-(F3) are permutation matrices, which due to (F1) describe isomorphisms between $G$ and $H$. The nonnegative real solutions of (F1)-(F3), which in fact are always rational, are called fractional isomorphisms between $G$ and $H$. Tinhofer proved that two graphs are fractionally isomorphic if and only if color refinement does not distinguish them.
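For a first feel for these equations (oriented here as $AX = XB$, as above): between any two $d$-regular graphs on the same number $n$ of vertices, the uniform matrix $X = J/n$ is always a fractional isomorphism. A quick check, in our own illustrative code, for the 6-cycle and the disjoint union of two triangles:

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

n = 6
# adjacency matrix A of the 6-cycle
A = [[1 if j in ((i - 1) % n, (i + 1) % n) else 0 for j in range(n)] for i in range(n)]
# adjacency matrix B of two disjoint triangles {0,1,2} and {3,4,5}
B = [[1 if i != j and i // 3 == j // 3 else 0 for j in range(n)] for i in range(n)]

X = [[Fraction(1, n)] * n for _ in range(n)]  # the uniform doubly stochastic matrix J/n

assert mat_mul(A, X) == mat_mul(X, B)         # (F1): AX = XB, both sides equal (2/6)*J
assert all(sum(row) == 1 for row in X)        # (F2): all row sums are 1
assert all(sum(col) == 1 for col in zip(*X))  # (F3): all column sums are 1
print("J/6 is a fractional isomorphism between C6 and two triangles")
```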
For every $k \ge 1$, color refinement has a generalization, known as the $k$-dimensional Weisfeiler-Leman algorithm ($k$-WL), which colors not the vertices of the given graph but $k$-tuples of vertices. Atserias and Maneva generalized Tinhofer's theorem by establishing a close correspondence between $k$-WL and the level-$k$ Sherali-Adams relaxation of the system (F1)-(F3).
How expressive are the homomorphism vectors $\mathrm{HOM}_{\mathcal F}(G)$ for restricted graph classes $\mathcal F$? We consider the class $\mathcal T$ of trees first, where the answer is surprisingly clean.
Theorem 1. For all graphs $G$ and $H$, the following are equivalent:
(i) $\mathrm{HOM}_{\mathcal T}(G) = \mathrm{HOM}_{\mathcal T}(H)$ for the class $\mathcal T$ of all trees.
(ii) Color refinement does not distinguish $G$ and $H$.
(iii) $G$ and $H$ are fractionally isomorphic, that is, the system of linear equations (F1)-(F3) has a nonnegative real solution.
As mentioned before, the equivalence between (ii) and (iii) is due to Tinhofer [23, 24]. An unexpected consequence of our theorem is that we can decide in time $O((n+m)\log n)$ whether $\mathrm{HOM}_{\mathcal T}(G) = \mathrm{HOM}_{\mathcal T}(H)$ holds for two given graphs $G$ and $H$ with $n$ vertices and $m$ edges. (If two graphs have different numbers of vertices or edges, then their homomorphism counts already differ on the 1-vertex or 2-vertex trees.) This is remarkable, because every known algorithm for computing a single entry $\hom(T,G)$ of the vector requires quadratic time when $T$ and $G$ are given as input.
It is a consequence of the proof of Theorem 1 that, in order to characterize an $n$-vertex graph up to fractional isomorphism, it suffices to restrict the homomorphism vector to trees of height at most $n$. What happens if we restrict the structure of the trees even further? In particular, let us restrict the homomorphism vector to its path entries, that is, consider $\mathrm{HOM}_{\mathcal P}(G)$ for the class $\mathcal P$ of all paths. Figure 1 shows an example of two graphs $G$ and $H$ with $\mathrm{HOM}_{\mathcal P}(G) = \mathrm{HOM}_{\mathcal P}(H)$ and $\mathrm{HOM}_{\mathcal T}(G) \neq \mathrm{HOM}_{\mathcal T}(H)$.
Despite their weaker distinguishing capabilities, the vectors $\mathrm{HOM}_{\mathcal P}(G)$ are quite interesting. They are related to graph kernels based on counting walks, and they have a clean algebraic description: it is easy to see that $\hom(P_k, G)$, the number of homomorphisms from the path $P_k$ of length $k$ to $G$, is equal to the number of length-$k$ walks in $G$, which in turn is equal to $\mathbf{1}^{\top} A^k \mathbf{1}$, where $A$ is the adjacency matrix of $G$ and $\mathbf{1}$ is the all-1 vector of appropriate length.
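The identity $\hom(P_k, G) = \mathbf{1}^{\top} A^k \mathbf{1}$ (with $P_k$ the path with $k$ edges) can be checked directly against brute-force enumeration on a small graph. A sketch, as our own illustration:

```python
from itertools import product

def walks(A, k):
    """1^T A^k 1: the number of walks with k steps, computed by iterating x -> A x."""
    x = [1] * len(A)
    for _ in range(k):
        x = [sum(A[i][j] * x[j] for j in range(len(A))) for i in range(len(A))]
    return sum(x)

def hom_path(A, k):
    """Brute force: maps of the path v_0, ..., v_k that send all k edges to edges."""
    n = len(A)
    return sum(all(A[w[i]][w[i + 1]] for i in range(k))
               for w in product(range(n), repeat=k + 1))

# The 4-cycle as test graph; it is 2-regular, so 1^T A^k 1 = 4 * 2^k.
A = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
for k in range(5):
    assert walks(A, k) == hom_path(A, k) == 4 * 2 ** k
print("hom(P_k, C4) matches the walk count 1^T A^k 1")
```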
Theorem 2. For all graphs $G$ and $H$, the following are equivalent:
(i) $\mathrm{HOM}_{\mathcal P}(G) = \mathrm{HOM}_{\mathcal P}(H)$ for the class $\mathcal P$ of all paths.
(ii) The system of linear equations (F1)-(F3) has a real solution.
While the proof of Theorem 1 is mainly graph-theoretic (we establish the equivalence between assertions (i) and (ii) by expressing the "colors" of color refinement in terms of specific tree homomorphisms), the proof of Theorem 2 is purely algebraic. We use spectral techniques, but with a twist, because neither does the spectrum of a graph determine the vector $\mathrm{HOM}_{\mathcal P}(G)$, nor does the vector determine the spectrum. This is in contrast with $\mathrm{HOM}_{\mathcal C}(G)$ for the class $\mathcal C$ of all cycles, which, as we already mentioned, fails to distinguish two graphs if and only if they have the same spectrum.
Let us now turn to homomorphism vectors for the class $\mathcal{TW}_k$ of all graphs of tree width at most $k$. We relate these to $k$-WL, the $k$-dimensional generalization of color refinement, and we also obtain a corresponding system of linear equations. Let $G$ and $H$ be graphs with vertex sets $V$ and $W$, respectively. Instead of variables $X_{vw}$ for vertex pairs $(v,w) \in V \times W$, as in the system (F1)-(F3), the new system has variables $X_\pi$ for sets $\pi \subseteq V \times W$ of size $|\pi| \le k+1$. We call $\pi$ a partial bijection if $v = v' \Leftrightarrow w = w'$ holds for all $(v,w), (v',w') \in \pi$, and we call it a partial isomorphism if in addition $vv' \in E(G) \Leftrightarrow ww' \in E(H)$ holds for all $(v,w), (v',w') \in \pi$. Now consider the following system $\textsf{Fiso}_k(G,H)$ of linear equations:
$$X_{\emptyset} = 1,$$
$$X_{\pi} = 0 \quad\text{for all } \pi \text{ that are not partial isomorphisms},$$
$$\sum_{w \in W} X_{\pi \cup \{(v,w)\}} = X_{\pi} \quad\text{for all } \pi \text{ with } |\pi| \le k \text{ and all } v \in V,$$
$$\sum_{v \in V} X_{\pi \cup \{(v,w)\}} = X_{\pi} \quad\text{for all } \pi \text{ with } |\pi| \le k \text{ and all } w \in W.$$
This system is closely related to the Sherali-Adams relaxations of (F1)-(F3): every solution for the level-$(k+1)$ Sherali-Adams relaxation yields a solution for $\textsf{Fiso}_k(G,H)$, and every solution for $\textsf{Fiso}_k(G,H)$ yields a solution for the level-$k$ Sherali-Adams relaxation [4, 14]. Our result is this:
Theorem 3. For all $k \ge 1$ and for all graphs $G$ and $H$, the following are equivalent:
(i) $\mathrm{HOM}_{\mathcal{TW}_k}(G) = \mathrm{HOM}_{\mathcal{TW}_k}(H)$.
(ii) $k$-WL does not distinguish $G$ and $H$.
(iii) $\textsf{Fiso}_k(G,H)$ has a nonnegative real solution.
The equivalence between (ii) and (iii) is implicit in previous work [16, 4, 14]. The system $\textsf{Fiso}_k(G,H)$ has another nice interpretation related to the proof complexity of graph isomorphism: it is shown that $\textsf{Fiso}_k(G,H)$ has a real solution if and only if a natural system of polynomial equations encoding the isomorphisms between $G$ and $H$ has a degree-$k$ solution in the Hilbert Nullstellensatz proof system [6, 8]. In view of Theorem 2, it is tempting to conjecture that the solvability of $\textsf{Fiso}_k(G,H)$ over the reals characterizes the expressiveness of the homomorphism vectors $\mathrm{HOM}_{\mathcal{PW}_k}$ for the class $\mathcal{PW}_k$ of all graphs of path width at most $k$. Unfortunately, we only prove one direction of this conjecture.
Theorem 4. Let $k$ be an integer with $k \ge 1$ and let $G$, $H$ be graphs. If $\textsf{Fiso}_k(G,H)$ has a real solution, then $\mathrm{HOM}_{\mathcal{PW}_k}(G) = \mathrm{HOM}_{\mathcal{PW}_k}(H)$.
Combining this theorem with a recent result separating the nonnegative from the arbitrary real solutions of our systems of equations, we obtain the following corollary.
Corollary. For every $k \ge 1$, there are graphs $G$ and $H$ with $\mathrm{HOM}_{\mathcal{PW}_k}(G) = \mathrm{HOM}_{\mathcal{PW}_k}(H)$ and $\mathrm{HOM}_{\mathcal{TW}_k}(G) \neq \mathrm{HOM}_{\mathcal{TW}_k}(H)$.
Graphs in this paper are simple, undirected, and finite (even though our results transfer to directed graphs and even to weighted graphs). For a graph $G$, we write $V(G)$ for its vertex set and $E(G)$ for its edge set. For $v \in V(G)$, the set of neighbors of $v$ is denoted by $N(v)$. For $X \subseteq V(G)$, we denote by $G[X]$ the subgraph of $G$ induced by the vertices of $X$. A rooted graph is a graph $G$ together with a designated root vertex $r \in V(G)$. We write multisets using the notation $\{\!\!\{ \cdot \}\!\!\}$.
An LU-decomposition of a matrix $M$ consists of a lower triangular matrix $L$ and an upper triangular matrix $U$ such that $M = LU$ holds. We also use infinite matrices over $\mathbb{R}$: functions $M : A \times B \to \mathbb{R}$, where $A$ and $B$ are locally finite posets and countable. The matrix product $MN$ is defined in the natural way via $(MN)(a,c) := \sum_{b} M(a,b) \cdot N(b,c)$ if all of these inner products are finite sums, and otherwise we leave it undefined. A
real symmetric matrix $M$ has real eigenvalues and a corresponding set of orthogonal eigenspaces. The spectral decomposition of a real symmetric matrix $M$ is of the form $M = \sum_{i=1}^{p} \lambda_i P_i$, where $\lambda_1, \dots, \lambda_p$ are the distinct eigenvalues of $M$ with corresponding eigenspaces $S_1, \dots, S_p$. Moreover, each $P_i$ is the projection matrix corresponding to the projection onto the eigenspace $S_i$. Usually, $P_i$ is expressed as $P_i = U_i U_i^{\top}$ for a matrix $U_i$ whose columns form an orthonormal basis of $S_i$.
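As a minimal concrete instance of $M = \sum_i \lambda_i P_i$: the adjacency matrix of a single edge has eigenvalues $1$ and $-1$, with eigenspaces spanned by $(1,1)$ and $(1,-1)$. The projections and the decomposition can be checked exactly (our own illustration):

```python
from fractions import Fraction

h = Fraction(1, 2)
M = [[0, 1], [1, 0]]      # adjacency matrix of a single edge
P1 = [[h, h], [h, h]]     # projection onto the eigenspace of +1, spanned by (1, 1)
P2 = [[h, -h], [-h, h]]   # projection onto the eigenspace of -1, spanned by (1, -1)

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# M = (+1) * P1 + (-1) * P2  (the spectral decomposition)
assert [[P1[i][j] - P2[i][j] for j in range(2)] for i in range(2)] == M
# P1 and P2 are idempotent projections onto orthogonal spaces
assert mat_mul(P1, P1) == P1 and mat_mul(P2, P2) == P2
assert mat_mul(P1, P2) == [[0, 0], [0, 0]]
print("spectral decomposition of a single edge verified")
```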
Recall that a mapping $h : V(F) \to V(G)$ is a homomorphism if $h(u)h(v) \in E(G)$ holds for all $uv \in E(F)$, and that $\hom(F,G)$ is the number of homomorphisms from $F$ to $G$. Let $\operatorname{surj}(F,G)$ be the number of homomorphisms from $F$ to $G$ that are surjective on both the vertices and edges of $G$. Let $\operatorname{inj}(F,G)$ be the number of injective homomorphisms from $F$ to $G$. Let $\operatorname{sub}(F,G) := \operatorname{inj}(F,G)/\operatorname{aut}(F)$, where $\operatorname{aut}(F)$ is the number of automorphisms of $F$. Observe that $\operatorname{sub}(F,G)$ is the number of subgraphs of $G$ that are isomorphic to $F$. Where convenient, we view the objects $\hom$, $\operatorname{surj}$, and $\operatorname{sub}$ as infinite matrices; the matrix indices are all unlabeled graphs, sorted by their size. However, we only use one representative of each isomorphism class, called the isomorphism type of the graphs in the class, as an index in the matrix. Then $\operatorname{surj}$ is lower triangular and $\operatorname{sub}$ is upper triangular, so $\hom = \operatorname{surj} \cdot \operatorname{sub}$ is an LU-decomposition of $\hom$. Finally, $\operatorname{ind}(F,G)$ is the number of times $F$ occurs as an induced subgraph in $G$. Similarly to the homomorphism vectors, we define the vectors $\mathrm{INJ}_{\mathcal F}(G)$ and $\mathrm{SUB}_{\mathcal F}(G)$. Finally, let $F$ and $G$ be rooted graphs. A homomorphism from $F$ to $G$ is a graph homomorphism that maps the root of $F$ to the root of $G$. Moreover, two rooted graphs are isomorphic if there is an isomorphism mapping the root to the root.
3 Homomorphisms from trees
3.1 Color refinement and tree unfolding
Color refinement iteratively colors the vertices of a graph in a sequence of refinement rounds. Initially, all vertices get the same color. In each refinement round, any two vertices $u$ and $v$ that still have the same color get different colors if there is some color $c$ such that $u$ and $v$ have a different number of neighbors of color $c$; otherwise they keep the same color. We stop the refinement process when the vertex partition induced by the colors no longer changes, that is, when all pairs of vertices that have the same color before the refinement round still have the same color after the round. More formally, we define the sequence of colorings $C_0, C_1, C_2, \dots$ as follows. We let $C_0(v) := 0$ for all $v$, and for $i \ge 0$ we let $C_{i+1}(v) := \big(C_i(v), \{\!\!\{ C_i(u) : u \in N(v) \}\!\!\}\big)$. We say that color refinement distinguishes two graphs $G$ and $H$ if there is an $i \ge 0$ with $\{\!\!\{ C_i(v) : v \in V(G) \}\!\!\} \neq \{\!\!\{ C_i(w) : w \in V(H) \}\!\!\}$.
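These refinement rounds can be implemented in a few lines. The sketch below is our own illustration (not the paper's code); it runs the refinement on the disjoint union of the two graphs so that color names are comparable, and stops once the number of color classes no longer grows:

```python
def refine(adj):
    """Run color refinement on a graph given as dict v -> set of neighbors."""
    colors = {v: 0 for v in adj}
    while True:
        # new color = (old color, sorted multiset of neighbor colors), then renaming
        raw = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        names = {c: i for i, c in enumerate(sorted(set(raw.values())))}
        new = {v: names[raw[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):  # partition is stable
            return new
        colors = new

def cr_distinguishes(adj_g, adj_h):
    """True iff color refinement tells the two graphs apart."""
    union = {("g", v): {("g", u) for u in adj_g[v]} for v in adj_g}
    union.update({("h", v): {("h", u) for u in adj_h[v]} for v in adj_h})
    stable = refine(union)
    return (sorted(stable[("g", v)] for v in adj_g)
            != sorted(stable[("h", w)] for w in adj_h))

c6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {i: {j for j in range(6) if i != j and i // 3 == j // 3} for i in range(6)}
path6 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}

assert not cr_distinguishes(c6, two_triangles)  # both 2-regular: not distinguished
assert cr_distinguishes(c6, path6)              # degree-1 endpoints split off
print("color refinement behaves as expected")
```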
We argue now that the color refinement algorithm implicitly constructs, at each vertex $v$, a tree obtained by simultaneously taking all possible walks starting at $v$ (and not remembering nodes visited in the past). For a rooted tree $T$ with root $r$, a graph $G$, and a vertex $v \in V(G)$, we say that $T$ is a tree at $v$ if there is a homomorphism $h$ from $T$ to $G$ such that $h(r) = v$ and, for all non-leaves $t \in V(T)$, the function $h$ induces a bijection between the set of children of $t$ in $T$ and the set of neighbors of $h(t)$ in $G$. In other words, $h$ is a homomorphism from $T$ to $G$ that is locally bijective. If $T$ is an infinite tree at $v$ and does not have any leaves, then $T$ is uniquely determined up to isomorphism, and we call it the infinite tree at $v$ (or the tree unfolding of $G$ at $v$). For an infinite rooted tree $T$ and $i \ge 0$, the depth-$i$ truncation of $T$ is the finite rooted subtree of $T$ where all leaves are at depth exactly $i$. For all finite rooted trees $T'$ of depth $i$, define $d_{T'}(G)$ to be the number of vertices $v \in V(G)$ for which the depth-$i$ truncation of the tree unfolding at $v$ is isomorphic to $T'$. Note that this number is zero if not all leaves of $T'$ are at the same depth or if some node of $T'$ has more children than the maximum degree of $G$. The $d$-vector of $G$ is the vector $(d_{T'}(G))_{T'}$, where $T'$ ranges over the family of all rooted trees. The following connection between the color refinement algorithm and the $d$-vector is known.

Lemma 3.1. Color refinement does not distinguish the graphs $G$ and $H$ if and only if $d_T(G) = d_T(H)$ holds for all rooted trees $T$.
3.2 Proof of Theorem 1
Throughout this section, we work with rooted trees. For a rooted tree $T$ and an (unrooted) graph $G$, we simply let $\hom(T,G)$ be the number of homomorphisms of the plain tree underlying $T$ to $G$, ignoring the root.
Let $S$ and $T$ be rooted trees. A homomorphism $h$ from $S$ to $T$ is depth-preserving if, for all vertices $s \in V(S)$, the depth of $s$ in $S$ is equal to the depth of $h(s)$ in $T$. Moreover, a homomorphism $h$ from $S$ to $T$ is depth-surjective if the image of $V(S)$ under $h$ contains vertices at every depth present in $T$. We define $\hom_{\mathrm{DP}}(S,T)$ as the number of homomorphisms from $S$ to $T$ that are both depth-preserving and depth-surjective. Note that $\hom_{\mathrm{DP}}(S,T) = 0$ holds if and only if $S$ and $T$ have different depths.
Lemma 3.2. Let $T$ be a rooted tree and let $G$ be a graph. We have
$$\hom(T, G) \;=\; \sum_{T'} \hom_{\mathrm{DP}}(T, T') \cdot d_{T'}(G), \tag{2}$$
where the sum is over all unlabeled rooted trees $T'$. In other words, the matrix identity $\hom = \hom_{\mathrm{DP}} \cdot d$ holds.
Let $i$ be the depth of $T$ and let $r$ be the root of $T$. Every $T'$ with $\hom_{\mathrm{DP}}(T,T') \neq 0$ has depth $i$ too, and there are at most $|V(G)|$ non-isomorphic rooted trees $T'$ of depth $i$ with $d_{T'}(G) \neq 0$. Thus the sum in (2) has only finitely many non-zero terms and is well-defined.
For a rooted tree $T'$ and a vertex $v \in V(G)$, let $H_{T',v}$ be the set of all homomorphisms $h$ from $T$ to $G$ such that $h(r) = v$ holds and the depth-$i$ truncation of the tree unfolding of $G$ at $v$ is isomorphic to $T'$. If the truncation at $v$ is isomorphic to $T'$, observe that $|H_{T',v}| = \hom_{\mathrm{DP}}(T,T')$: by lifting along the locally bijective homomorphism from the tree unfolding to $G$, the homomorphisms $h$ with $h(r) = v$ correspond exactly to the depth-preserving (and, as $T$ and $T'$ have the same depth, automatically depth-surjective) homomorphisms from $T$ to $T'$. Since $d_{T'}(G)$ is the number of vertices $v$ whose truncated unfolding is isomorphic to $T'$, we thus have $\sum_{v \in V(G)} |H_{T',v}| = \hom_{\mathrm{DP}}(T,T') \cdot d_{T'}(G)$. Since each homomorphism from $T$ to $G$ is contained in exactly one set $H_{T',v}$, we obtain the desired equality (2). ∎
For rooted trees $S$ and $T$, let $\operatorname{surj}_{\mathrm{DP}}(S,T)$ be the number of depth-preserving and surjective homomorphisms from $S$ to $T$. In particular, not only do these homomorphisms have to be depth-surjective, but they have to hit every vertex of $T$. For rooted trees $S$ and $T$ of the same depth, let $\operatorname{sub}_{\mathrm{DP}}(S,T)$ be the number of subgraphs of $T$ that are isomorphic to $S$ (under an isomorphism that maps the root to the root); if $S$ and $T$ have different depths, we set $\operatorname{sub}_{\mathrm{DP}}(S,T) := 0$.
Lemma 3.3. $\hom_{\mathrm{DP}} = \operatorname{surj}_{\mathrm{DP}} \cdot \operatorname{sub}_{\mathrm{DP}}$ is an LU-decomposition of $\hom_{\mathrm{DP}}$, and $\operatorname{surj}_{\mathrm{DP}}$ and $\operatorname{sub}_{\mathrm{DP}}$ are invertible.
As is the case for finite matrices, the inverse of a lower (upper) triangular matrix is lower (upper) triangular. As the matrix $\operatorname{surj}_{\mathrm{DP}}$ is lower triangular and the matrix $\operatorname{sub}_{\mathrm{DP}}$ is upper triangular, their inverses are as well. We are now ready to prove our first main theorem.
Proof of Theorem 1.
We only need to prove the equivalence between assertions (i) and (ii). For every graph $G$, let $D(G) := (d_T(G))_T$ denote its $d$-vector. By our convention that for a rooted tree $T$ and an unrooted graph $G$ we let $\hom(T,G)$ be the number of homomorphisms of the plain tree underlying $T$ to $G$, all rootings $T$ of the same underlying tree $T_0$ satisfy $\hom(T,G) = \hom(T_0,G)$. By Lemma 3.1, it suffices to prove for all graphs $G$ and $H$ that
$$D(G) = D(H) \iff \mathrm{HOM}_{\mathcal T}(G) = \mathrm{HOM}_{\mathcal T}(H). \tag{3}$$
We view the vectors $\mathrm{HOM}_{\mathcal T}(G)$ and $D(G)$ as infinite column vectors. By Lemma 3.2, we have
$$\mathrm{HOM}_{\mathcal T}(G) = \hom_{\mathrm{DP}} \cdot D(G) \qquad\text{and}\qquad \mathrm{HOM}_{\mathcal T}(H) = \hom_{\mathrm{DP}} \cdot D(H).$$
The forward direction of (3) now follows immediately.
It remains to prove the backward direction. Since $\hom_{\mathrm{DP}} = \operatorname{surj}_{\mathrm{DP}} \cdot \operatorname{sub}_{\mathrm{DP}}$ holds for the two invertible matrices $\operatorname{surj}_{\mathrm{DP}}$ and $\operatorname{sub}_{\mathrm{DP}}$, we can first left-multiply with $\operatorname{surj}_{\mathrm{DP}}^{-1}$ to obtain the equivalent identities
$$\operatorname{surj}_{\mathrm{DP}}^{-1} \cdot \mathrm{HOM}_{\mathcal T}(G) = \operatorname{sub}_{\mathrm{DP}} \cdot D(G) \qquad\text{and}\qquad \operatorname{surj}_{\mathrm{DP}}^{-1} \cdot \mathrm{HOM}_{\mathcal T}(H) = \operatorname{sub}_{\mathrm{DP}} \cdot D(H).$$
Now suppose $\mathrm{HOM}_{\mathcal T}(G) = \mathrm{HOM}_{\mathcal T}(H)$ holds, and set $y := \operatorname{surj}_{\mathrm{DP}}^{-1} \cdot \mathrm{HOM}_{\mathcal T}(G)$. Then $y$ is well-defined, because $\operatorname{surj}_{\mathrm{DP}}$ and its inverse are lower triangular, so every entry of $y$ is a finite sum. Thus we obtain $\operatorname{sub}_{\mathrm{DP}} \cdot D(G) = y = \operatorname{sub}_{\mathrm{DP}} \cdot D(H)$, and we would like to set $D(G) = \operatorname{sub}_{\mathrm{DP}}^{-1} \cdot y$. Unfortunately, $\operatorname{sub}_{\mathrm{DP}}^{-1} \cdot y$ may be undefined, since $\operatorname{sub}_{\mathrm{DP}}^{-1}$ is upper triangular and the corresponding sums may be infinite. While we can still use a matrix inverse, the argument becomes a bit subtle. The crucial observation is that $d_T(G)$ is non-zero for at most $|V(G)|$ different trees $T$, and all such trees have maximum degree at most the maximum degree of $G$. Thus we do not need to look at all trees but only at those with maximum degree at most $d$, where $d$ is the maximum of the degrees of $G$ and $H$. Let $\mathcal{T}_d$ be the set of all unlabeled rooted trees of maximum degree at most $d$. Let $\widehat{\operatorname{sub}}_{\mathrm{DP}}$ be the restriction of $\operatorname{sub}_{\mathrm{DP}}$ to the index set $\mathcal{T}_d \times \mathcal{T}_d$, let $\widehat{D}(G)$ be the restriction of $D(G)$ to $\mathcal{T}_d$, and let $\widehat{y}$ be the restriction of $y$ to $\mathcal{T}_d$. Then we still have the following for all $T \in \mathcal{T}_d$:
$$\widehat{y}_T = \sum_{T' \in \mathcal{T}_d} (\widehat{\operatorname{sub}}_{\mathrm{DP}})_{T,T'} \cdot \widehat{D}(G)_{T'}.$$
The new matrix $\widehat{\operatorname{sub}}_{\mathrm{DP}}$ is a principal submatrix of $\operatorname{sub}_{\mathrm{DP}}$ and thus remains upper triangular and invertible. Moreover, $\widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1} \cdot \widehat{y}$ is well-defined, since
$$\big(\widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1} \cdot \widehat{y}\big)_T = \sum_{T' \in \mathcal{T}_d} \big(\widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1}\big)_{T,T'} \cdot \widehat{y}_{T'}$$
is a finite sum for each $T \in \mathcal{T}_d$: the entry $\big(\widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1}\big)_{T,T'}$ can only be non-zero if $T'$ has the same depth as $T$, and the number of (unlabeled) trees in $\mathcal{T}_d$ that have the same depth as $T$ is bounded by a function of $d$ and this depth. Thus $\widehat{D}(G) = \widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1} \cdot \widehat{y}$. By a similar argument, we obtain $\widehat{D}(H) = \widehat{\operatorname{sub}}_{\mathrm{DP}}^{-1} \cdot \widehat{y}$. This implies $\widehat{D}(G) = \widehat{D}(H)$ and thus $D(G) = D(H)$. ∎
4 Homomorphisms from cycles and paths
While the arguments we saw in the proof of Theorem 1 are mainly graph-theoretic, the proof of Theorem 2 uses spectral techniques. To introduce the techniques, we first prove a simple, known result already mentioned in the introduction. We call two square matrices co-spectral if they have the same eigenvalues with the same multiplicities, and we call two graphs co-spectral if their adjacency matrices are co-spectral.
Proposition 4 (e.g. [25, Lemma 1]).
Let $\mathcal C$ be the class of all cycles (including the degenerate cycle of length $0$, which is just a single vertex). For all graphs $G$ and $H$, we have $\mathrm{HOM}_{\mathcal C}(G) = \mathrm{HOM}_{\mathcal C}(H)$ if and only if $G$ and $H$ are co-spectral.
For the proof, we review a few simple facts from linear algebra. The trace $\operatorname{tr}(M)$ of a square matrix $M$ is the sum of the diagonal entries. If the eigenvalues of $M$ are $\lambda_1, \dots, \lambda_n$ (with multiplicities), then $\operatorname{tr}(M) = \sum_{i} \lambda_i$. Moreover, for each $k \ge 1$ the eigenvalues of the matrix $M^k$ are $\lambda_1^k, \dots, \lambda_n^k$, and thus $\operatorname{tr}(M^k) = \sum_i \lambda_i^k$. The following technical lemma encapsulates the fact that the information $\operatorname{tr}(M^k)$ for all $k$ suffices to reconstruct the spectrum of $M$ with multiplicities. We use the same lemma to prove Theorem 2, but for Proposition 4 a less general version would suffice.

Lemma 4. Let $I$ and $J$ be two finite sets and let $(c_i, \lambda_i)_{i \in I}$ and $(d_j, \mu_j)_{j \in J}$ be two families of reals such that the $c_i$ and $d_j$ are positive, the $\lambda_i$ are non-zero and pairwise distinct, and the $\mu_j$ are non-zero and pairwise distinct. If the equation
$$\sum_{i \in I} c_i \lambda_i^k = \sum_{j \in J} d_j \mu_j^k \tag{8}$$
holds for all $k \ge 1$, then $|I| = |J|$ and $\{(c_i, \lambda_i) : i \in I\} = \{(d_j, \mu_j) : j \in J\}$.
We prove the claim by induction on $|I| + |J|$. For $|I| + |J| = 0$, the claim is trivially true since both sums in (8) are equal to zero by convention.
Let $\lambda := \max_{i \in I} |\lambda_i|$ and let $\mu := \max_{j \in J} |\mu_j|$, where $\lambda \ge \mu$ holds without loss of generality (with the convention that a maximum over an empty set is $0$). If $I = \emptyset$, then $\lambda = 0$ and we claim that $J = \emptyset$ holds. Clearly (8) for $k = 2$ yields $0 = \sum_{j \in J} d_j \mu_j^2$. In particular, every term of this sum vanishes. Since the $d_j$ are positive and $\mu$ is the maximum of the $\mu_j$ in absolute value, we have $\mu = 0$ and thus also $J = \emptyset$.
Now suppose that $I \neq \emptyset$ holds. After replacing every $\lambda_i$ by $-\lambda_i$ and every $\mu_j$ by $-\mu_j$ if necessary, which changes both sides of (8) by the same factor $(-1)^k$, we may assume that $\lambda$ itself occurs among the $\lambda_i$. Let $c_+$ be the coefficient $c_i$ with $\lambda_i = \lambda$, and let $c_-$ be the coefficient with $\lambda_i = -\lambda$ if $-\lambda$ occurs among the $\lambda_i$ and $c_- := 0$ otherwise; define $d_+$ and $d_-$ analogously for the $\mu_j$. We consider the sequences $(s_k)_{k \ge 1}$ and $(t_k)_{k \ge 1}$ with
$$s_k := \sum_{i \in I} c_i \Big(\frac{\lambda_i}{\lambda}\Big)^k \qquad\text{and}\qquad t_k := \sum_{j \in J} d_j \Big(\frac{\mu_j}{\lambda}\Big)^k.$$
Note that $s_k = t_k$ holds for all $k \ge 1$ by assumption. Observe the following simple facts:
1) If $-\lambda$ does not occur among the $\lambda_i$, then $\lim_{k \to \infty} s_k = c_+ > 0$.
2) If $-\lambda$ occurs among the $\lambda_i$, then $\lim_{k \to \infty} s_{2k} = c_+ + c_-$ and $\lim_{k \to \infty} s_{2k+1} = c_+ - c_-$.
As well as the following exhaustive case distinction for $(t_k)_{k \ge 1}$:
a) If $\mu < \lambda$, then $\lim_{k \to \infty} t_k = 0$.
b) If $\mu = \lambda$ and $-\lambda$ does not occur among the $\mu_j$, then $\lim_{k \to \infty} t_k = d_+ > 0$.
c) If $\mu = \lambda$ and $\lambda$ does not occur among the $\mu_j$, then $\lim_{k \to \infty} t_{2k} = d_-$ and $\lim_{k \to \infty} t_{2k+1} = -d_-$.
d) If $\mu = \lambda$ and both $\lambda$ and $-\lambda$ occur among the $\mu_j$, then $\lim_{k \to \infty} t_{2k} = d_+ + d_-$ and $\lim_{k \to \infty} t_{2k+1} = d_+ - d_-$.
If $-\lambda \notin \{\lambda_i : i \in I\}$ holds, we see from 1) that $(s_k)_k$ converges to the non-zero value $c_+$. Since the two sequences are equal, the sequence $(t_k)_k$ also converges to a non-zero value. The only case for $(t_k)_k$ where this happens is b), and we get $\mu = \lambda$, $d_+ = c_+$, and $c_- = d_- = 0$. On the other hand, if $-\lambda \in \{\lambda_i : i \in I\}$, we see from 2) that $(s_k)_k$
does not converge, but the even and odd subsequences do. The only cases for $(t_k)_k$ where this happens too are c) and d). We cannot be in case c), since the two accumulation points of $(t_k)_k$ would just differ in their sign, while the two accumulation points $c_+ + c_-$ and $c_+ - c_-$ of $(s_k)_k$ do not have the same absolute value (as $c_+$ and $c_-$ are both positive). Thus we must be in case d) and obtain $\mu = \lambda$ as well as
$$c_+ + c_- = d_+ + d_-, \qquad c_+ - c_- = d_+ - d_-.$$
This linear system has full rank and implies $c_+ = d_+$ and $c_- = d_-$.
Either way, we can remove the matched pairs $(c_+, \lambda)$ and $(c_-, -\lambda)$ from $(c_i, \lambda_i)_{i \in I}$ and the equal pairs from $(d_j, \mu_j)_{j \in J}$; equation (8) continues to hold for the resulting smaller instance, so the induction hypothesis applies. Then the claim follows. ∎
Proof of Proposition 4.
For all $k \ge 3$, the number of homomorphisms from the cycle $C_k$ of length $k$ to a graph $G$ with adjacency matrix $A$ is equal to the number of closed length-$k$ walks in $G$, which in turn is equal to the trace of $A^k$. Thus for graphs $G$ and $H$ with adjacency matrices $A$ and $B$, we have $\mathrm{HOM}_{\mathcal C}(G) = \mathrm{HOM}_{\mathcal C}(H)$ if and only if $\operatorname{tr}(A^k) = \operatorname{tr}(B^k)$ holds for all $k$.
If $G$ and $H$ have the same spectrum, then $\operatorname{tr}(A^k) = \operatorname{tr}(B^k)$ holds for all $k$. For the reverse direction, suppose $\operatorname{tr}(A^k) = \operatorname{tr}(B^k)$ for all $k$. Let $\{\lambda_i : i \in I\}$ be the set of distinct non-zero eigenvalues of $A$ and, for each $i \in I$, let $c_i$ be the multiplicity of the eigenvalue $\lambda_i$. Let $\{\mu_j : j \in J\}$ and $d_j$ for $j \in J$ be the corresponding eigenvalues and multiplicities for $B$. Then for all $k \ge 1$, we have
$$\sum_{i \in I} c_i \lambda_i^k = \operatorname{tr}(A^k) = \operatorname{tr}(B^k) = \sum_{j \in J} d_j \mu_j^k.$$
By Lemma 4, this implies $\{(c_i, \lambda_i) : i \in I\} = \{(d_j, \mu_j) : j \in J\}$; since $G$ and $H$ also have the same number of vertices, the multiplicities of the eigenvalue $0$ agree as well, that is, the spectra of $A$ and $B$ are identical. ∎
In the following examples, we show that the vectors $\mathrm{HOM}_{\mathcal C}(G)$ for the class $\mathcal C$ of cycles and $\mathrm{HOM}_{\mathcal T}(G)$ for the class $\mathcal T$ of trees are incomparable in their expressiveness.
The graphs $G$ and $H$ shown in Figure 2 are co-spectral and thus $\mathrm{HOM}_{\mathcal C}(G) = \mathrm{HOM}_{\mathcal C}(H)$, but it is easy to see that $\mathrm{HOM}_{\mathcal P}(G) \neq \mathrm{HOM}_{\mathcal P}(H)$ for the class $\mathcal P$ of all paths.
Let $G$ be the cycle of length $6$ and $H$ the disjoint union of two triangles. Then obviously, $\mathrm{HOM}_{\mathcal C}(G) \neq \mathrm{HOM}_{\mathcal C}(H)$. However, color refinement does not distinguish $G$ and $H$, and thus $\mathrm{HOM}_{\mathcal T}(G) = \mathrm{HOM}_{\mathcal T}(H)$.
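This example can be verified numerically: the triangle count $\hom(C_3, \cdot) = \operatorname{tr}(A^3)$ separates the two graphs, while all walk counts $\mathbf{1}^{\top} A^k \mathbf{1}$ (and hence all path entries) agree. A sketch in our own illustrative code:

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 6
A = [[1 if j in ((i - 1) % n, (i + 1) % n) else 0 for j in range(n)] for i in range(n)]  # C6
B = [[1 if i != j and i // 3 == j // 3 else 0 for j in range(n)] for i in range(n)]      # K3 + K3

A3 = mat_mul(mat_mul(A, A), A)
B3 = mat_mul(mat_mul(B, B), B)
# hom(C3, .) = tr(A^3): the 6-cycle is triangle-free, the two triangles are not
assert sum(A3[i][i] for i in range(n)) == 0
assert sum(B3[i][i] for i in range(n)) == 12   # 2 triangles * 3 vertices * 2 directions

# ... but all walk counts agree: both graphs are 2-regular, so 1^T A^k 1 = 6 * 2^k
Ak = [row[:] for row in A]
Bk = [row[:] for row in B]
for k in range(1, 8):
    assert sum(map(sum, Ak)) == sum(map(sum, Bk)) == 6 * 2 ** k
    Ak, Bk = mat_mul(Ak, A), mat_mul(Bk, B)
print("cycle counts differ, path/walk counts agree")
```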
Let us now turn to the proof of Theorem 2.
Proof of Theorem 2.
Let $A$ and $B$ be the adjacency matrices of $G$ and $H$, respectively. Since $A$ is a symmetric real matrix, its eigenvalues are real and the corresponding eigenspaces are orthogonal and span the whole space. Let $\mathbf{1}$ be the $|V|$-dimensional all-1 vector, and let $\{\lambda_1, \dots, \lambda_p\}$ be the set of all eigenvalues of $A$ whose corresponding eigenspaces are not orthogonal to $\mathbf{1}$. We call these eigenvalues the useful eigenvalues of $A$ and without loss of generality assume $\lambda_1 > \dots > \lambda_p$. The $|V|$-dimensional all-1 vector
can be expressed as a sum of eigenvectors of $A$ corresponding to useful eigenvalues. In particular, there is a unique decomposition $\mathbf{1} = e_1 + \dots + e_p$ such that each $e_i$ is a non-zero eigenvector in the eigenspace of $\lambda_i$. Moreover, the vectors $e_1, \dots, e_p$ are orthogonal. For the matrix $B$, we analogously define its set of useful eigenvalues $\{\mu_1, \dots, \mu_q\}$ with $\mu_1 > \dots > \mu_q$ and the decomposition $\mathbf{1} = f_1 + \dots + f_q$.
We prove the equivalence of the following three assertions (of which the first and third appear in the statement of Theorem 2).
1. $\mathrm{HOM}_{\mathcal P}(G) = \mathrm{HOM}_{\mathcal P}(H)$.
2. $A$ and $B$ have the same set of useful eigenvalues, and $\|e_i\| = \|f_i\|$ holds for all $i$. Here, $\|\cdot\|$ denotes the Euclidean norm with $\|x\|^2 = \sum_{v} x_v^2$.
3. The system of linear equations (F1)-(F3) has a real solution.
Note that in assertion 2, we do not require that the useful eigenvalues occur with the same multiplicities in $A$ and $B$. We show the implications (1 ⇒ 2), (2 ⇒ 3), and (3 ⇒ 1).
(1 ⇒ 2): Suppose that $\hom(P, G) = \hom(P, H)$ holds for all paths $P$. Equivalently, this can be stated in terms of the adjacency matrices $A$ and $B$: for all $k \ge 0$, we have $\mathbf{1}^{\top} A^k \mathbf{1} = \mathbf{1}^{\top} B^k \mathbf{1}$. We claim that $A$ and $B$ have the same useful eigenvalues, and that the projections of $\mathbf{1}$ onto the corresponding eigenspaces have the same lengths.
Note that $A^k e_i = \lambda_i^k e_i$ holds. Thus we have
$$\mathbf{1}^{\top} A^k \mathbf{1} = \Big(\sum_{i=1}^{p} e_i\Big)^{\top} A^k \Big(\sum_{i=1}^{p} e_i\Big) = \sum_{i=1}^{p} \lambda_i^k \|e_i\|^2,$$
where the last step uses the orthogonality of the $e_i$. The term $\mathbf{1}^{\top} B^k \mathbf{1}$ can be expanded analogously, which together yields
$$\sum_{i=1}^{p} \|e_i\|^2 \lambda_i^k = \sum_{j=1}^{q} \|f_j\|^2 \mu_j^k \qquad\text{for all } k \ge 1.$$
Since all coefficients $\|e_i\|^2$ and $\|f_j\|^2$ are non-zero, we are in the situation of Lemma 4. We obtain $p = q$ and, for all $i$, we obtain $\lambda_i = \mu_i$ and $\|e_i\| = \|f_i\|$. This is exactly the claim that we want to show.
(2 ⇒ 3): We claim that the $(V \times W)$-matrix $X$ defined via
$$X := \sum_{i=1}^{p} \frac{1}{\|e_i\|^2} \, e_i f_i^{\top}$$
satisfies the equations $AX = XB$ and $X\mathbf{1} = \mathbf{1}$, $\mathbf{1}^{\top} X = \mathbf{1}^{\top}$. Indeed, we have
$$AX = \sum_{i=1}^{p} \frac{\lambda_i}{\|e_i\|^2} \, e_i f_i^{\top} = \sum_{i=1}^{p} \frac{\mu_i}{\|e_i\|^2} \, e_i f_i^{\top} = XB.$$
This follows since $A e_i = \lambda_i e_i$, $B f_i = \mu_i f_i$, $\lambda_i = \mu_i$, and $B$ is symmetric. Moreover, we have
$$X \mathbf{1} = \sum_{i=1}^{p} \frac{1}{\|e_i\|^2} \, e_i f_i^{\top} \Big(\sum_{j=1}^{q} f_j\Big) = \sum_{i=1}^{p} \frac{\|f_i\|^2}{\|e_i\|^2} \, e_i = \sum_{i=1}^{p} e_i = \mathbf{1}.$$
This holds by the definition of $X$, the orthogonality of the $f_j$, and $\|e_i\| = \|f_i\|$. The claim $\mathbf{1}^{\top} X = \mathbf{1}^{\top}$ follows analogously.
(3 ⇒ 1): Suppose there is a matrix $X$ with $AX = XB$, $X\mathbf{1} = \mathbf{1}$, and $\mathbf{1}^{\top} X = \mathbf{1}^{\top}$. We obtain $A^k X = X B^k$ by induction for all $k \ge 1$. For $k = 0$, this also holds since $A^0$ and $B^0$ are identity matrices by convention. As a result, we have
$$\mathbf{1}^{\top} A^k \mathbf{1} = \mathbf{1}^{\top} A^k X \mathbf{1} = \mathbf{1}^{\top} X B^k \mathbf{1} = \mathbf{1}^{\top} B^k \mathbf{1}$$
for all $k \ge 0$. Since these scalars count the length-$k$ walks in $G$ and $H$, respectively, we obtain $\hom(P, G) = \hom(P, H)$ for all paths $P$ as claimed. ∎
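The induction in (3 ⇒ 1) can be watched concretely: for the 6-cycle and two disjoint triangles, the uniform matrix $X = J/6$ satisfies the equations, and indeed $A^k X = X B^k$ and $\mathbf{1}^{\top} A^k \mathbf{1} = \mathbf{1}^{\top} B^k \mathbf{1}$ hold for every $k$ (our own illustration):

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

n = 6
A = [[1 if j in ((i - 1) % n, (i + 1) % n) else 0 for j in range(n)] for i in range(n)]  # C6
B = [[1 if i != j and i // 3 == j // 3 else 0 for j in range(n)] for i in range(n)]      # K3 + K3
X = [[Fraction(1, n)] * n for _ in range(n)]                                             # J/6

Ak = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
Bk = [[int(i == j) for j in range(n)] for i in range(n)]  # B^0 = identity
for k in range(9):
    assert mat_mul(Ak, X) == mat_mul(X, Bk)          # A^k X = X B^k
    assert sum(map(sum, Ak)) == sum(map(sum, Bk))    # 1^T A^k 1 = 1^T B^k 1
    Ak, Bk = mat_mul(Ak, A), mat_mul(Bk, B)
print("equal walk counts for all tested k")
```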
5 Homomorphisms from bounded tree width and path width
We briefly outline the main ideas of the proofs of Theorems 3 and 4; the technical details are deferred to the appendix. In Theorem 3, the equivalence between (ii) and (iii) is essentially known, so we focus on the equivalence between (i) and (ii). The proof is similar to the proof of Theorem 1 in Section 3.
Let us fix $k \ge 1$. The idea of the $k$-WL algorithm is to iteratively color $k$-tuples of vertices. Initially, each $k$-tuple $\bar v = (v_1, \dots, v_k)$ is colored by its atomic type, that is, the isomorphism type of the labeled graph $\big(G[\{v_1, \dots, v_k\}], v_1, \dots, v_k\big)$. Then, in the refinement step, to define the new color of a $k$-tuple $\bar v$ we look at the current colors of all $k$-tuples that can be reached from $\bar v$ by adding one vertex and then removing one vertex.
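A compact sketch of this refinement in the "oblivious" formulation, where the new color of a $k$-tuple records, for each vertex $w$, the colors of the tuples obtained by substituting $w$ into each coordinate. This is our own illustration of one of several equivalent formulations of $k$-WL, not the paper's code; both graphs are refined in a single run so that the canonical color names are comparable:

```python
from itertools import product

def k_wl_distinguishes(adj_g, adj_h, k):
    graphs = {"g": adj_g, "h": adj_h}
    items = [(tag, t) for tag, adj in graphs.items()
             for t in product(sorted(adj), repeat=k)]

    def atomic(tag, t):
        # atomic type: equality pattern and adjacency pattern of the tuple
        eq = tuple(t[i] == t[j] for i in range(k) for j in range(k))
        ed = tuple(t[j] in graphs[tag][t[i]] for i in range(k) for j in range(k))
        return (eq, ed)

    col = {(tag, t): atomic(tag, t) for tag, t in items}
    names = {c: i for i, c in enumerate(sorted(set(col.values())))}
    col = {x: names[col[x]] for x in items}
    while True:
        raw = {}
        for tag, t in items:
            # for every vertex w of the same graph, substitute w into each coordinate
            sig = sorted(tuple(col[(tag, t[:i] + (w,) + t[i + 1:])] for i in range(k))
                         for w in graphs[tag])
            raw[(tag, t)] = (col[(tag, t)], tuple(sig))
        names = {c: i for i, c in enumerate(sorted(set(raw.values())))}
        new = {x: names[raw[x]] for x in items}
        if len(set(new.values())) == len(set(col.values())):  # partition is stable
            break
        col = new
    return (sorted(c for (tag, _), c in col.items() if tag == "g")
            != sorted(c for (tag, _), c in col.items() if tag == "h"))

c6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {i: {j for j in range(6) if i != j and i // 3 == j // 3} for i in range(6)}

# color refinement (1-WL) fails on this pair, but 2-WL detects the triangles
assert k_wl_distinguishes(c6, two_triangles, 2)
print("2-WL distinguishes C6 from two triangles")
```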
Similarly to the tree unfolding of a graph at a vertex $v$, we define the Weisfeiler-Leman tree unfolding at a $k$-tuple of vertices. These objects bear some resemblance to the pebbling comonad, which was defined by Abramsky, Dawar, and Wang in the language of category theory. The WL-tree unfolding describes the color of