Enumerating Unique Computational Graphs via an Iterative Graph Invariant

02/17/2019
by   Chris Ying, et al.
Google

In this report, we describe a novel graph invariant for computational graphs (colored directed acyclic graphs) and how we used it to generate all distinct computational graphs up to isomorphism for small graphs. The algorithm iteratively applies isomorphism-invariant operations, which take into account the graph structure and coloring, and outputs a fixed-length hash that is identical for all isomorphic computational graphs. While the algorithm cannot perfectly distinguish all pairs of non-isomorphic computational graphs, we suggest that it may be useful as a heuristic for comparing graphs.


1 Introduction

As part of a research project studying neural architectures, we needed an algorithm that could identify isomorphic computational graphs, the building blocks of neural networks. In our context, computational graphs represent operations performed on arbitrary tensors where the vertices are operations and the edges are tensors. To generalize this problem, we represent a computational graph as a colored directed acyclic graph where the colors represent unique operations. Depending on the graph representation (e.g., adjacency matrix plus a color per vertex), multiple representations may encode the same computational graph (Figure 1).

In this report, we describe a novel graph invariant for computational graphs and how we used it to generate all distinct computational graphs up to isomorphism for small graphs. While this invariant cannot perfectly distinguish all pairs of non-isomorphic computational graphs, we suggest that it may be useful as a heuristic for comparing graphs.

Figure 1: Three computational graphs whose adjacency matrices and color lists differ under their vertex orderings but which are isomorphic in the sense of Definition 2 (best viewed in color).

2 Definitions

Definition 1.

A computational graph on $n$ vertices and $k$ colors is a directed acyclic graph where each vertex is assigned an arbitrary color, and every vertex lies on a path between two designated vertices. Formally, it is a tuple $(n, k, E, c)$, where:

  1. $n$ is the number of vertices;

  2. $k$ is the number of colors;

  3. $E \subseteq \{1, \dots, n\}^2$ is a set of directed edges $(u, v)$ such that $u < v$;

  4. $c : \{1, \dots, n\} \to \{1, \dots, k\}$ is a function assigning a color to each vertex; and

  5. for each vertex $v$, there is a directed path from vertex $1$ to vertex $n$ that passes through vertex $v$. (This condition is not required for the graph invariant of Algorithm 1 but is used by the enumeration procedure of Algorithm 2 to greatly reduce the number of possible graphs.)

Note that there is no restriction that adjacent vertices have different colors.
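The structural conditions of Definition 1 can be checked mechanically. The following is a minimal sketch, not the authors' code: the function name `is_computational_graph` is hypothetical, and it assumes 0-based vertex indices, so the designated vertices are 0 and n-1 rather than 1 and n. It checks the edge-ordering condition and the path condition; colors play no role in either.

```python
def is_computational_graph(n, edges):
    """Check the edge-ordering and path conditions of Definition 1 (0-based)."""
    if n < 1:
        return False
    edges = set(edges)
    # Every edge must go from a lower to a higher vertex index.
    if any(not (0 <= u < v < n) for u, v in edges):
        return False

    succ = {v: [] for v in range(n)}
    pred = {v: [] for v in range(n)}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def reachable(start, neighbors):
        # Depth-first search over the given neighbor map.
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in neighbors[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    # A vertex lies on a path from 0 to n-1 exactly when it is reachable
    # from 0 and can itself reach n-1.
    everything = set(range(n))
    return (reachable(0, succ) == everything
            and reachable(n - 1, pred) == everything)
```

For example, the path 0 → 1 → 2 passes the check, while the graph with the single edge 0 → 2 on three vertices does not, because vertex 1 is on no path from input to output.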

The goal is to count the number of distinct computational graphs up to isomorphism, where isomorphism is defined as follows.

Definition 2.

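Computational graphs $G = (n, k, E, c)$ and $G' = (n, k, E', c')$ on $n$ vertices and $k$ colors are isomorphic if there exists a bijection $\sigma : \{1, \dots, n\} \to \{1, \dots, n\}$ such that:

  1. adjacency is preserved: $(u, v) \in E \iff (\sigma(u), \sigma(v)) \in E'$; and

  2. coloring is preserved: $c(v) = c'(\sigma(v))$ for every vertex $v$.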
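Definition 2 can be tested directly by trying every bijection, which is only practical for very small graphs. The sketch below is illustrative only: the name `are_isomorphic` and the edge-set and color-list representation are assumptions, and vertices are indexed from 0.

```python
from itertools import permutations


def are_isomorphic(n, edges_a, colors_a, edges_b, colors_b):
    """Brute-force test of Definition 2 for graphs on vertices 0..n-1."""
    edges_a, edges_b = set(edges_a), set(edges_b)
    if len(edges_a) != len(edges_b):
        return False
    for sigma in permutations(range(n)):
        # Coloring is preserved: each vertex keeps its color under sigma.
        if any(colors_a[v] != colors_b[sigma[v]] for v in range(n)):
            continue
        # Adjacency is preserved. Since sigma is a bijection and both edge
        # sets have the same size, mapping every edge of the first graph
        # onto an edge of the second is sufficient.
        if all((sigma[u], sigma[v]) in edges_b for u, v in edges_a):
            return True
    return False
```

The search visits up to n! permutations, so the cost grows factorially in the number of vertices.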

3 Related Approaches

In the context of general graph isomorphism, a graph invariant is a property of a graph such that if two graphs are isomorphic, they have the same value of that property (the converse is not necessarily true). The Weisfeiler–Lehman algorithm [5] uses an iterative coloring approach to come up with a canonical coloring on a graph, though subsequent work [1] shows that the algorithm can fail for some graphs. It is unknown whether graph isomorphism can be solved in polynomial time [2].

This paper deals with a more constrained problem than general graph isomorphism, where the graphs are directed acyclic graphs and there are colorings assigned to the vertices. Nonetheless, the algorithm is partially inspired by the iterative nature of the Weisfeiler–Lehman algorithm.

OEIS [4] provides a few sequences which are close to what we are looking for:

  • A000088: Number of graphs on $n$ unlabeled nodes. This sequence does not consider coloring and counts all graphs rather than just directed acyclic graphs.

  • A003024: Number of acyclic digraphs with $n$ labeled nodes. This sequence treats all vertices as uniquely labeled rather than individually colored and also counts disconnected graphs.

  • A057500: Number of connected labeled graphs with $n$ edges and $n$ nodes. This sequence does not use directed edges and does not consider coloring.

  • A240955: Number of -colored labeled digraphs with vertices. In this sequence, a coloring is one in which no vertex shares a color with any of its neighbors, which is different from the notion of coloring used here.

Pólya–Redfield counting can be used to count the number of colorings on undirected graphs [3] but it does not provide a way to quickly identify if two directed colored graphs are isomorphic.

4 Iterative Graph Hashing Algorithm

In this section, we describe an algorithm for generating a novel graph invariant for computational graphs. At a high level, the key idea is to iteratively apply isomorphism-invariant operations to the graph in a way that takes into account the graph structure as well as the coloring. Algorithm 1 provides the pseudo-code for the graph hashing algorithm:

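Input: Computational graph G = (n, k, E, c)
Output: Fixed-length hash of the graph and coloring
1 let H = [];   // List such that H[v] is hash for vertex v
2 forall vertices v do
3       let in = in-degree(v);
4       let out = out-degree(v);
5       H[v] = hash(in, out, c(v));   // Initialize hashes
6 end forall
7 for i = 1 to n do
8       let H' = [];   // Next iteration of hashes
9       forall vertices v do
10            let I = sort([H[u] for each in-neighbor u of v]);   // List of in-neighbor hashes
11            let O = sort([H[w] for each out-neighbor w of v]);   // List of out-neighbor hashes
12            H'[v] = hash(I, O, H[v]);
13      end forall
14      H = H';   // Update hashes
15 end for
16 return hash(sort(H))
Algorithm 1 Iterative graph hashing algorithm for colored DAGs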

Specifically, hash returns a fixed-length hash of its arguments. Our implementation uses the 128-bit MD5 hash algorithm, which we found sufficient for our use case. The sort function performs a lexicographical sort of the hash outputs. We repeat the update for as many iterations as there are vertices (line 7), but we suspect that it may be sufficient to iterate up to the diameter of the graph. In our implementation, we represent the graph as an adjacency matrix along with a list of colors of length equal to the number of vertices. The in-degree and out-degree functions are implemented as summations across the columns or rows of the matrix.
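The description above can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation: the name `hash_graph`, the nested-list adjacency matrix, the order of the initial triple, and the way lists are serialized to strings before hashing (`str(...)` and a `|` separator) are all assumptions.

```python
import hashlib


def _md5(text):
    # 128-bit MD5 digest of a string, as hexadecimal text.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_graph(matrix, colors):
    """Hash a colored DAG given as a 0/1 adjacency matrix and a color list."""
    n = len(matrix)
    # In-degree is a column sum, out-degree is a row sum.
    in_deg = [sum(matrix[u][v] for u in range(n)) for v in range(n)]
    out_deg = [sum(matrix[v]) for v in range(n)]

    # Initialize each vertex hash from its degrees and color.
    hashes = [_md5(str((in_deg[v], out_deg[v], colors[v]))) for v in range(n)]

    for _ in range(n):
        new_hashes = []
        for v in range(n):
            # Sorting makes the neighbor lists independent of vertex order.
            in_nbrs = sorted(hashes[u] for u in range(n) if matrix[u][v])
            out_nbrs = sorted(hashes[w] for w in range(n) if matrix[v][w])
            new_hashes.append(
                _md5("".join(in_nbrs) + "|" + "".join(out_nbrs) + "|" + hashes[v]))
        hashes = new_hashes

    # The graph hash is the hash of the sorted per-vertex hashes.
    return _md5(str(sorted(hashes)))
```

Two relabelings of the same colored graph should produce the same digest, while graphs with different color multisets should produce different digests, barring an MD5 collision.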

4.1 Proof of graph invariance

Proof.

To show that this algorithm computes a graph invariant, we must show that any two isomorphic computational graphs output the same hash. Consider two graphs $G$ and $G'$ that are isomorphic with isomorphism $\sigma$. Suppose that we run Algorithm 1 on $G$ and $G'$. We will use $H_G$ and $H_{G'}$ to refer to the values of the hashes in the executions of the algorithm on $G$ and $G'$, respectively. We will say that the hashes are consistent if they respect the isomorphism $\sigma$: that is, if $H_G[v] = H_{G'}[\sigma(v)]$ for each vertex $v$.

Choose any vertex $v$ of $G$, and let $v' = \sigma(v)$. Then the initial hash $H_G[v]$ of $v$ in $G$ is equal to the initial hash $H_{G'}[v']$ of $v'$ in $G'$ (after line 6), because the two vertices have the same adjacency and coloring, by Definition 2. Thus:

Lemma 4.1.

The initial hashes $H_G$ and $H_{G'}$ are consistent (after line 6).

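In each iteration of the outer for-loop (line 7), the hashes are updated. Suppose, at the start of an iteration, that the hashes are consistent. Again, choose any vertex $v$, and let $v' = \sigma(v)$. If $(u, v) \in E$, then $(\sigma(u), v') \in E'$ by the isomorphism condition, and because $\sigma$ is a bijection that preserves adjacency in both directions, it maps the in-neighbors of $v$ one-to-one onto the in-neighbors of $v'$. By the consistency assumption, $H_G[u] = H_{G'}[\sigma(u)]$. Since $I_G$ collects the hashes $H_G[u]$ of the in-neighbors of $v$ and $I_{G'}$ collects the hashes $H_{G'}[u']$ of the in-neighbors of $v'$ (line 10), the multisets (ignoring order) represented by $I_G$ and $I_{G'}$ are the same. Likewise, $O_G$ is the same as $O_{G'}$ (line 11) ignoring order. Because sorting ensures that the list orderings are identical, $I_G = I_{G'}$ and $O_G = O_{G'}$. We also have $H_G[v] = H_{G'}[v']$ (because $H_G$ and $H_{G'}$ are consistent), so the updated hashes again satisfy $H_G[v] = H_{G'}[v']$ (line 14), since the hash function is operating on identical triplets. Thus: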

Lemma 4.2.

If the hashes are consistent at the start of an iteration (line 7), then they are also consistent at the end of that iteration (after line 14).

By induction, Lemma 4.1 and Lemma 4.2 show that the hashes are consistent throughout the full loop. Because the final hashes $H_G$ and $H_{G'}$ are consistent, they are permutations of each other, so their sorted forms are the same, and thus the final hashed results are identical. Therefore, Algorithm 1 computes a graph invariant. ∎

5 Graph Enumeration Procedure

Given the graph invariant, we can proceed to generate all computational graphs, up to isomorphism. Using the canonical ordering, we treat the first vertex (no in-neighbors) as the “input” vertex and the last vertex (no out-neighbors) as the “output” vertex.

We observe that vertices not on a directed path from the input vertex to the output vertex in a colored directed acyclic graph can be pruned to yield a valid computational graph. If we generate the directed acyclic graphs in order of increasing number of vertices, then any graph that needs to be pruned has already been generated at a previous iteration and can be immediately skipped.

Input: Maximum vertices N, maximum edges M, and colors k
Output: Yields all unique computational graphs up to constraints
1 for numbers of vertices n = 2 to N do
2       forall 2^(n(n-1)/2) bit vectors of length n(n-1)/2 do
3             convert bit vector to upper-triangular adjacency matrix A;
4             if A has more than M edges or
5              A contains vertex not on directed path from input to output then
6                   discard A;
7                   continue to next matrix;
8             else
9                   forall potential colorings c do
10                        hash (A, c) using Algorithm 1;
11                        if hash has not been observed before then
12                              yield (A, c);
13                        end if
14                  end forall
15            end if
16      end forall
17 end for
Algorithm 2 Enumerating computational graphs

This algorithm also provides a canonical computational graph (i.e., the first one that is observed) for each unique hash, which represents the equivalence class of computational graphs induced by Algorithm 1.
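The enumeration procedure can be sketched as a Python generator. This is a minimal illustration under several assumptions rather than the authors' code: the names `enumerate_graphs` and `on_input_output_path` are hypothetical, vertices are indexed from 0, every vertex (including input and output) may take any of the colors, and `hash_graph` is a compact restatement of Algorithm 1 with an assumed string serialization.

```python
import hashlib
from itertools import product


def _md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_graph(matrix, colors):
    # Compact restatement of Algorithm 1 (serialization is an assumption).
    n = len(matrix)
    hashes = [_md5(str((sum(matrix[u][v] for u in range(n)),
                        sum(matrix[v]), colors[v]))) for v in range(n)]
    for _ in range(n):
        hashes = [
            _md5("".join(sorted(hashes[u] for u in range(n) if matrix[u][v]))
                 + "|"
                 + "".join(sorted(hashes[w] for w in range(n) if matrix[v][w]))
                 + "|" + hashes[v])
            for v in range(n)
        ]
    return _md5(str(sorted(hashes)))


def on_input_output_path(matrix):
    # For an upper-triangular matrix, every vertex lies on a path from
    # vertex 0 to vertex n-1 exactly when each vertex other than the input
    # has an in-edge and each vertex other than the output has an out-edge.
    n = len(matrix)
    has_in = all(any(matrix[u][v] for u in range(v)) for v in range(1, n))
    has_out = all(any(matrix[v][w] for w in range(v + 1, n))
                  for v in range(n - 1))
    return has_in and has_out


def enumerate_graphs(max_vertices, max_edges, num_colors):
    """Yield one (matrix, colors) pair per distinct hash."""
    seen = set()
    for n in range(2, max_vertices + 1):
        pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
        for bits in product((0, 1), repeat=len(pairs)):
            if sum(bits) > max_edges:
                continue
            matrix = [[0] * n for _ in range(n)]
            for (u, v), bit in zip(pairs, bits):
                matrix[u][v] = bit
            if not on_input_output_path(matrix):
                continue
            for colors in product(range(num_colors), repeat=n):
                h = hash_graph(matrix, colors)
                if h not in seen:
                    seen.add(h)
                    yield matrix, colors
```

With one color and at most three vertices, this yields three graphs: the single edge, the two-edge path, and the path with an added skip edge. The loop over bit vectors is exponential in the number of vertex pairs, so the sketch is only suitable for small limits.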

6 Verification

For our neural network use-case, we needed to generate all graphs up to 7 vertices, 9 edges, and 3 colors. Furthermore, the first vertex and the last vertex are specially colored and distinct from each other and the other 3 colors (they represent the input and output tensors of the network).

We verified that the graphs yielded by Algorithm 2 were unique up to isomorphism by running an expensive procedure that enumerates all possible permutations to confirm that any graph with a duplicate hash (line 11) is isomorphic to the canonical computational graph for that hash. The definition of graph invariant implies that graphs with different hashes are non-isomorphic.

Thus for our constrained use-case, Algorithm 1 can exactly identify if two computational graphs are isomorphic or not.

7 Adversarial graphs

An adversarial example to the identifiability of the algorithm consists of two non-isomorphic computational graphs which hash to the same value. One such example can be seen in Figure 2 using 10 vertices and 16 edges. The counterexample holds as long as vertices 2, 3, 4, 5 are the same color and likewise for vertices 6, 7, 8, 9. The two graphs are non-isomorphic by inspection but Algorithm 1 fails to distinguish between the two. This is because vertices 2, 3, 4, 5 all start with the same initial hash due to having the same degree and each iteration maintains this equivalence because the in- and out-neighbors share the same colors (and likewise for vertices 6, 7, 8, 9).

An infinite number of similar adversarial graphs can be constructed from pairs of directed non-isomorphic bipartite graphs where all edges point from one partition to the other and the degree of all vertices within each partition is the same.

Figure 2: A counterexample using 10 vertices and 16 edges. Vertices 2, 3, 4, 5 must be the same color and likewise for 6, 7, 8, 9 (the two sets of vertices can be the same color). Vertices 1 and 10 can be colored with any color.
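A collision of this kind can be reproduced with a small script. The pair below is one construction that fits the description (10 vertices, 16 edges, two regular bipartite middle layers); whether it matches the graphs drawn in Figure 2 is an assumption. Vertices are indexed from 0, so the paper's vertices 2 to 5 are 1 to 4 here and vertices 6 to 9 are 5 to 8. `hash_graph` is a compact sketch of Algorithm 1 with an assumed serialization, and `middle_components` is a hypothetical helper used to show that the two graphs are not isomorphic.

```python
import hashlib


def _md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_graph(matrix, colors):
    # Compact restatement of Algorithm 1 (serialization is an assumption).
    n = len(matrix)
    hashes = [_md5(str((sum(matrix[u][v] for u in range(n)),
                        sum(matrix[v]), colors[v]))) for v in range(n)]
    for _ in range(n):
        hashes = [
            _md5("".join(sorted(hashes[u] for u in range(n) if matrix[u][v]))
                 + "|"
                 + "".join(sorted(hashes[w] for w in range(n) if matrix[v][w]))
                 + "|" + hashes[v])
            for v in range(n)
        ]
    return _md5(str(sorted(hashes)))


def build(middle_edges):
    """10-vertex DAG: 0 -> {1..4} -> {5..8} -> 9 with the given middle edges."""
    n = 10
    matrix = [[0] * n for _ in range(n)]
    for a in range(1, 5):
        matrix[0][a] = 1      # the input feeds every first-layer vertex
    for b in range(5, 9):
        matrix[b][9] = 1      # every second-layer vertex feeds the output
    for a, b in middle_edges:
        matrix[a][b] = 1
    return matrix


def middle_components(matrix):
    """Connected components among vertices 1..8, ignoring edge direction."""
    middle = range(1, 9)
    seen, count = set(), 0
    for start in middle:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in middle:
                if y not in seen and (matrix[x][y] or matrix[y][x]):
                    seen.add(y)
                    stack.append(y)
    return count


# Middle layer forming one 8-cycle (ignoring direction).
graph_a = build([(1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (3, 8), (4, 8), (4, 5)])
# Middle layer forming two disjoint 4-cycles.
graph_b = build([(1, 5), (1, 6), (2, 5), (2, 6), (3, 7), (3, 8), (4, 7), (4, 8)])
colors = [0, 1, 1, 1, 1, 2, 2, 2, 2, 3]

same_hash = hash_graph(graph_a, colors) == hash_graph(graph_b, colors)
different_structure = middle_components(graph_a) != middle_components(graph_b)
```

Any isomorphism must map the unique source to the unique source and the unique sink to the unique sink, so the number of connected components left after removing them is itself an invariant. It differs between the two graphs (one versus two) even though the hashes agree.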

8 Future Work

Modifying the algorithm to deal with cases like the counterexample above is the first direction for future work.

In addition to the counterexample discussed above, another possible problem is hash collision. A possible solution is to replace the hash function with string concatenation, which would cause the iterative “hashes” to grow exponentially in length at each iteration. This eliminates the possibility of hash collision, and the proof of graph invariance still holds.
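The concatenation variant can be sketched as follows. This is an assumed illustration, not something taken from the authors' code: the name `concat_signature`, the optional `iterations` parameter, and the bracket and separator characters used to delimit the concatenated parts are all hypothetical choices.

```python
def concat_signature(matrix, colors, iterations=None):
    """Variant of Algorithm 1 that concatenates strings instead of hashing."""
    n = len(matrix)
    rounds = n if iterations is None else iterations
    in_deg = [sum(matrix[u][v] for u in range(n)) for v in range(n)]
    out_deg = [sum(matrix[v]) for v in range(n)]

    # Initial per-vertex signature: degrees and color written out in full.
    sigs = [f"({in_deg[v]},{out_deg[v]},{colors[v]})" for v in range(n)]

    for _ in range(rounds):
        # Each new signature embeds the sorted neighbor signatures and the
        # vertex's own previous signature, so its length keeps growing.
        sigs = [
            "["
            + ",".join(sorted(sigs[u] for u in range(n) if matrix[u][v]))
            + "|"
            + ",".join(sorted(sigs[w] for w in range(n) if matrix[v][w]))
            + "|"
            + sigs[v]
            + "]"
            for v in range(n)
        ]
    return ",".join(sorted(sigs))
```

Comparing `len(concat_signature(m, c, iterations=i))` for increasing `i` on a small graph shows the growth in length directly, which is the cost paid for avoiding a fixed-length digest.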

Acknowledgements

We would like to thank Chris Jones for suggesting Weisfeiler–Lehman color refinement and finding a counterexample, Esteban Real for reviewing the code implementation of the algorithm, and William Chargin for reviewing the notation and proof.

References

  • [1] L. Babai and L. Kucera. Canonical labelling of graphs in linear average time. In 20th Annual Symposium on Foundations of Computer Science (sfcs 1979), pages 39–46, Oct 1979.
  • [2] László Babai. Graph isomorphism in quasipolynomial time. CoRR, abs/1512.03547, 2015.
  • [3] Robert W Robinson. Enumeration of colored graphs. Journal of Combinatorial Theory, 4(2):181–190, 1968.
  • [4] N. J. A. Sloane. The on-line encyclopedia of integer sequences. https://oeis.org.
  • [5] Boris Weisfeiler and Andrei A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya, 2(9):12–16, 1968. (in Russian).