# A polynomial time parallel algorithm for graph isomorphism using a quasipolynomial number of processors

The Graph Isomorphism (GI) problem is theoretically interesting because it has been proven neither to be in P nor to be NP-complete. Babai made a breakthrough in 2015 when he announced a quasipolynomial time algorithm for the GI problem. Babai's work gives the most theoretically efficient algorithm for GI, as well as strong evidence favoring the idea that class GI ≠ NP and thus P ≠ NP. Based on Babai's algorithm, we prove that GI can further be solved by a parallel algorithm that runs in polynomial time using a quasipolynomial number of processors. We achieve that result by identifying the bottlenecks in Babai's algorithm and parallelizing them. In particular, we prove that color refinement can be computed in parallel logarithmic time using a polynomial number of processors, and that the k-dimensional WL refinement can be computed in parallel polynomial time using a quasipolynomial number of processors. Our work suggests that Graph Isomorphism and GI-complete problems can be computed efficiently on a parallel computer, and provides insights on speeding up parallel GI programs in practice.

## 1 Introduction

### 1.1 Overview

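An isomorphism between two graphs $G = (V_G, E_G)$ and $H = (V_H, E_H)$ is a bijection $\varphi: V_G \to V_H$ such that for every $u, v \in V_G$, $(u, v)$ is in $E_G$ if and only if $(\varphi(u), \varphi(v))$ is in $E_H$. The Graph Isomorphism (GI) problem is the problem of determining, given two graphs $G$ and $H$, whether there exists an isomorphism between them.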
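The definition can be checked directly on small inputs. The following is a minimal sketch, not part of the original paper: the function names `is_isomorphism` and `find_isomorphism` are hypothetical, graphs are assumed to be simple and undirected, and the search is a plain exponential-time enumeration of bijections, meant only to make the definition concrete.

```python
from itertools import permutations


def is_isomorphism(edges_g, edges_h, phi):
    """Return True if the bijection phi maps the edge set of G onto that of H."""
    mapped = {frozenset((phi[u], phi[v])) for u, v in edges_g}
    return mapped == {frozenset(e) for e in edges_h}


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Brute-force search over all bijections (illustration only, exponential time)."""
    if len(vertices_g) != len(vertices_h):
        return None
    for image in permutations(vertices_h):
        phi = dict(zip(vertices_g, image))
        if is_isomorphism(edges_g, edges_h, phi):
            return phi
    return None
```

For example, a path on vertices 0, 1, 2 is isomorphic to a path whose middle vertex is labeled `'a'`, and any isomorphism must send vertex 1 to `'a'`.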

The Graph Isomorphism problem is of particular interest in complexity theory. Even though it is in NP, researchers have not succeeded in proving that it is in P or that it is NP-complete. In fact, it is widely believed to lie in the NP-intermediate class (NPI), the class of problems that lie in NP but are neither in P nor NP-complete, which is non-empty in the case that P ≠ NP [Ladner1975].

The Graph Isomorphism problem has witnessed more advancement in practice than in theory. In practice, notable programs include Nauty [McKay1] and Traces [MCKAY2] by McKay, saucy by Darga [Darga], conauto by Presa [Presa], and bliss by Junttila [Junttila]. Practical GI solvers can quickly solve GI for random graphs of about 10000 vertices [McKay1]. In theory, however, the best known general bound for the GI problem before Babai’s work was the sub-exponential time $\exp(O(\sqrt{n \log n}))$. The algorithm was developed by Luks and Babai in [BabaiLuks] in 1983. It took more than three decades for the next best bound, quasipolynomial, to be developed. Intuitively, Babai’s work pushes GI closer to the class P, providing more evidence suggesting that GI is not NP-complete. While the algorithm’s quasipolynomial runtime is still superpolynomial, it is now remarkably closer to polynomial than to exponential time.

At the same time, demand for parallel algorithms is increasing. Many computational tasks today involve big data or large inputs. Moreover, more and more computers are equipped with multiple cores, and supercomputers have millions of cores. While memory access is still a major issue in parallel computing, there is undeniably a growing demand for parallel algorithms.

### 1.2 Result

In this work, we prove that the Graph Isomorphism problem can be solved in parallel polynomial time using a quasipolynomial number of processors. Using Babai’s algorithm as the basis of parallelization, we demonstrate that Babai’s algorithm, as well as GI algorithms in general, are highly parallel and can be sped up superpolynomially on a parallel computer.

Later in this paper, we will show that Babai’s work, as well as GI algorithms in general, uses a backtracking tree structure. The tree structure enables a simple parallelization scheme. However, in Babai’s algorithm the computation at each node of the tree is superpolynomial, and therein lies the challenge of parallelizing the tree nodes. Several operations in the computation of each node are bottlenecks, and we will develop a parallelization scheme for each of them. One particularly nontrivial case is the parallelization of the $k$-dimensional WL refinement procedure, which is a highly sequential procedure where each iteration depends on the result of the previous iteration.

The structure of the paper is as follows. Section 2 presents preliminary techniques and a short overview of Babai’s algorithm. Section 3 identifies the bottlenecks in Babai’s algorithm and presents the parallelization schemes designed to handle them. Section 4 presents our conclusion and the impact of this result.

## 2 Preliminaries

### 2.1 Combinatorial techniques: Color refinement, k-dimensional WL refinement and individualization

Color refinement has been used in most of the existing practical and theoretical GI algorithms. The idea behind color refinement is to use colors to represent similarities. Let us use $C(v)$ to denote the color of a vertex $v$. A vertex $u$ from graph $G$ with color $C(u)$ can only be mapped to a vertex $v$ in $H$ with color $C(v)$ if $C(u) = C(v)$. Similarity can be evaluated using multiple criteria, for example the degree of a vertex. A common color refinement scheme is to evaluate the similarity of two vertices based on the colors of their neighbors, i.e. $u$ and $v$ keep the same color only if $\{\{C(w) : w \in N(u)\}\} = \{\{C(w) : w \in N(v)\}\}$, where $N(u)$ denotes the set of neighbors of $u$ and $\{\{\cdot\}\}$ denotes a multiset. After comparing the vertices and updating the colors, the process repeats until equilibrium is achieved, i.e. no additional color class is created. Two graphs are isomorphic only if they have the same coloring after refinement. Note that we call a coloring, as well as any partition scheme, canonical if it must be preserved by isomorphisms and therefore must agree between the two graphs.
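The refinement loop described above can be sketched as follows. This is an illustrative sequential version under stated assumptions, not the paper's implementation: the name `color_refinement`, the adjacency-dictionary input, and the use of sortable integer colors are all choices made for this example.

```python
def color_refinement(adj, initial=None):
    """Refine vertex colors until no new color class appears.

    adj maps each vertex to an iterable of its neighbors; initial optionally
    maps each vertex to a sortable starting color (default: all equal).
    Returns (colors, rounds), where rounds counts the refining iterations.
    """
    colors = {v: (initial[v] if initial else 0) for v in adj}
    rounds = 0
    while True:
        # Signature of v: its own color plus the sorted multiset of neighbor colors.
        signature = {
            v: (colors[v], tuple(sorted(colors[w] for w in adj[v])))
            for v in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        refined = {v: palette[signature[v]] for v in adj}
        if len(palette) == len(set(colors.values())):
            return colors, rounds  # equilibrium: no additional color class
        colors, rounds = refined, rounds + 1
```

On a path with four vertices the end points and the inner vertices separate after one round. On a 6-cycle every vertex keeps the same color, which illustrates why regular graphs are hard for this technique. To compare two graphs with this sketch, one would run it on their disjoint union so that both share one palette.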

Color refinement is a powerful tool, as the GI problem for random graphs can usually be solved using one iteration of color refinement [Babai2]. However, color refinement fails to distinguish graphs in hard cases, a notable example being the case of regular graphs. Therefore, an extension of color refinement was developed by Weisfeiler and Lehman [WL]: WL refinement, and later $k$-dimensional WL refinement. Note that color refinement is usually referred to as 1-dimensional WL refinement.

The classic WL refinement is an extension of color refinement. For a graph $G = (V, E)$ and two elements $x, y \in V$, the new color $C'(x, y)$ encodes the old color $C(x, y)$ and, for all pairs of colors $(c_1, c_2)$, the number of elements $z \in V$ such that $C(x, z) = c_1$ and $C(z, y) = c_2$. Thus, the classic WL refinement refines the colors of not only the vertices but also the edges. By construction, the classic WL refinement is also called the 2-dimensional WL refinement.

The $k$-dimensional WL refinement further extends the classic technique to work on tuples of $k$ vertices. For a $k$-tuple $\vec{x} = (x_1, \dots, x_k)$ of vertices, the new color $C'(\vec{x})$ encodes the old color $C(\vec{x})$ and, for all tuples of colors $(c_1, \dots, c_k)$, the number of elements $z$ such that $C(\vec{x}^{\,i}(z)) = c_i$ for all $i$, where $\vec{x}^{\,i}(z)$ denotes $\vec{x}$ with its $i$-th entry replaced by $z$. It is worth noting that the classic WL refinement and the $k$-dimensional WL refinement can be extended to work with relational structures, a generalization of graphs, as in Babai’s paper [Babai1].
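A small sequential sketch of the $k$-dimensional refinement may help fix the definition. It is an illustration under assumptions: the function name `k_wl`, the vertex set `range(n)`, and the initial coloring by equality and adjacency pattern of the tuple entries are choices made here, and the code is intended for $k \ge 2$ on tiny graphs only.

```python
from itertools import product


def k_wl(n, k, edges):
    """k-dimensional WL refinement on the graph with vertices 0..n-1 (k >= 2).

    Returns a dict mapping each k-tuple of vertices to its stable color index.
    """
    edge_set = {frozenset(e) for e in edges}
    tuples = list(product(range(n), repeat=k))

    def atomic_type(t):
        # Initial color: equality and adjacency pattern among the entries of t.
        return tuple(
            (t[i] == t[j], frozenset((t[i], t[j])) in edge_set)
            for i in range(k) for j in range(i + 1, k)
        )

    raw = {t: atomic_type(t) for t in tuples}
    palette = {c: i for i, c in enumerate(sorted(set(raw.values())))}
    colors = {t: palette[raw[t]] for t in tuples}
    while True:
        signature = {}
        for t in tuples:
            # For every z, record the colors of t with entry i replaced by z.
            profile = sorted(
                tuple(colors[t[:i] + (z,) + t[i + 1:]] for i in range(k))
                for z in range(n)
            )
            signature[t] = (colors[t], tuple(profile))
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        if len(palette) == len(set(colors.values())):
            return colors  # no new color class was created
        colors = {t: palette[signature[t]] for t in tuples}
```

With $k = 2$ this separates a 6-cycle from two disjoint triangles by the number of stable color classes (four distance classes versus three), although both graphs are 2-regular and look identical to color refinement.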

On the other hand, individualization is a technique that can complement color refinement. Given two graphs $G$ and $H$, individualization maps a fixed vertex $u$ from $G$ to a vertex $v$ in $H$, usually by giving them a unique color. Since multiple vertices in $H$ could potentially be the image of $u$ (i.e. have the same color as $u$), a GI algorithm that uses individualization has to consider all possible mappings. Therefore, individualization incurs a multiplicative cost to the runtime of the program. Combining individualization and color refinement produces a program with a backtracking search tree structure, see McKay [McKay1].

### 2.2 Group theoretic techniques: Group procedures

Group theory appears naturally in the course of solving Graph Isomorphism, because the set of automorphisms of a graph naturally forms a group, and the set of isomorphisms between two graphs $G$ and $H$ must be a coset of that group.

Here we present some basic permutation group concepts and notations. A permutation group is a group whose elements are permutations of a set of objects. The set of objects is called the permutation domain, or domain set, and we say that the permutation group acts on that domain. The symmetric group of a domain $\Omega$ is the group containing all permutations of that domain, denoted $\mathrm{Sym}(\Omega)$, or $S_n$ if $|\Omega| = n$. Given a permutation group $G$ that is a subgroup of $\mathrm{Sym}(\Omega)$ (denoted $G \le \mathrm{Sym}(\Omega)$), the orbit of an element $x \in \Omega$ is the set of elements to which $x$ can be mapped through the group action. Orbits are therefore equivalence classes of objects that can be mapped to each other. If a group has only one orbit, then the group is called transitive, i.e. every pair of objects can be mapped to each other. If a group acts transitively on a domain set, a block is a subset that every group element either maps to itself or maps to a subset disjoint from it. The images of a block form a block system, which is a partition of the domain set, and the minimal block system is the non-trivial block system with the minimum number of blocks (so the block size is maximum).

In GI computation, a group is usually represented using its set of generators. We list some useful computational group operations that appear in the literature. Given a group $G \le \mathrm{Sym}(\Omega)$ with generator set $S$, we can

• find the orbits of $G$ and the transitivity of $G$ by iterating through the images of the domain set elements.

• find $G$’s minimal block system by following [Helfgott], who quotes [Luks:1980] and [Sims1]: fix $a \in \Omega$; for $b \in \Omega$, $b \neq a$, construct the graph with $\Omega$ as its set of vertices and the orbit $\{\{a^g, b^g\} : g \in G\}$ as edges; then the connected component containing $a$ and $b$ is the smallest block containing $a$ and $b$. The action of $G$ is imprimitive if and only if the constructed graph is not connected for an arbitrary $a$ and at least one $b$; in that case we obtain a proper block containing $a$ and $b$, and thus a non-trivial block system.

• generate the "whole" group given a set of generators by using the Furst-Hopcroft-Luks (FHL) algorithm [Luks:1980]. From FHL, we can also determine whether an element belongs to a group, and determine the subgroup of a group given a polynomial time membership testing of the subgroup.

It is important to note that all these operations run in time polynomial in $|S|$ and $|\Omega|$.
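The first two operations in the list above can be sketched in a few lines. This is an illustration under assumptions: permutations are represented as tuples `g` with `g[i]` the image of point `i`, the domain is `range(n)`, and the function names `orbits` and `smallest_block` are hypothetical. The block routine follows the connected-component construction quoted from [Helfgott] and assumes the group is transitive.

```python
def orbits(n, gens):
    """Orbits of the group generated by gens acting on 0..n-1."""
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for g in gens:
                if g[x] not in orbit:
                    orbit.add(g[x])
                    frontier.append(g[x])
        seen |= orbit
        result.append(sorted(orbit))
    return result


def smallest_block(n, gens, a, b):
    """Smallest block containing a and b, for a transitive group."""
    # Orbit of the pair (a, b): these pairs are the edges of the graph.
    pairs, frontier = {(a, b)}, [(a, b)]
    while frontier:
        x, y = frontier.pop()
        for g in gens:
            p = (g[x], g[y])
            if p not in pairs:
                pairs.add(p)
                frontier.append(p)
    neighbors = {v: set() for v in range(n)}
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)
    # The connected component of a is the block.
    block, frontier = {a}, [a]
    while frontier:
        x = frontier.pop()
        for y in neighbors[x] - block:
            block.add(y)
            frontier.append(y)
    return sorted(block)
```

For the cyclic group generated by the rotation of six points, the smallest block containing 0 and 3 is `[0, 3]`, the one containing 0 and 2 is `[0, 2, 4]`, and the one containing 0 and 1 is the whole domain.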

### 2.3 Parallel computing and the PRAM model

In this work, we build our parallel algorithm using the Parallel RAM (PRAM) computational model. Unlike the typical Random Access Memory (RAM) model where there is one processor and instructions are executed sequentially, the PRAM model features multiple processors, local memory of each processor and global shared memory, and free read and write accesses to the global shared memory.

### 2.4 Babai’s quasipolynomial time Graph Isomorphism algorithm

In 2015, Babai showed that the String Isomorphism problem, a generalization of GI, can be solved in quasipolynomial time. The String Isomorphism (SI) problem, given two string inputs $x$ and $y$ as functions from an index set $\Omega$ to a set of symbols $\Sigma$ and a subgroup $G \le \mathrm{Sym}(\Omega)$, asks whether or not there exists at least one element $\sigma \in G$ such that $\sigma$ is an isomorphism between $x$ and $y$, i.e. $x^{\sigma} = y$. The Graph Isomorphism problem can be reduced to the String Isomorphism problem by flattening the input graphs’ adjacency matrices into the strings $x$ and $y$ and constructing the group $G$ corresponding to this transformation.
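The reduction from GI to SI can be illustrated on tiny graphs. The sketch below is an assumption-laden example rather than Babai's procedure: the function names are hypothetical, the group is listed explicitly as the action of all vertex permutations on ordered pairs (which is only feasible for very small $n$), and the convention that $\sigma$ moves the symbol at position $i$ to position $\sigma(i)$ is chosen here for simplicity.

```python
from itertools import permutations


def flatten(n, edges):
    """Row-major 0/1 adjacency string of an undirected graph on 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return tuple(int(frozenset((u, v)) in edge_set)
                 for u in range(n) for v in range(n))


def induced_on_pairs(n, pi):
    """Permutation of the string positions u*n+v induced by vertex permutation pi."""
    return tuple(pi[u] * n + pi[v] for u in range(n) for v in range(n))


def act(x, sigma):
    """Apply sigma to string x: the symbol at position i moves to sigma[i]."""
    y = [None] * len(x)
    for i, symbol in enumerate(x):
        y[sigma[i]] = symbol
    return tuple(y)


def string_isomorphic(x, y, group):
    """Brute-force SI test over an explicitly listed group (illustration only)."""
    return any(act(x, sigma) == y for sigma in group)
```

Here `group` would be built as `[induced_on_pairs(n, pi) for pi in permutations(range(n))]`. Two graphs are isomorphic exactly when their flattened strings are isomorphic under this induced group.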

[Babai] The String Isomorphism problem can be solved in quasipolynomial time. As a result, the Graph Isomorphism problem can be solved in quasipolynomial time.

In the interest of brevity, we will go through the novel concepts that make the algorithm succeed and will give a high level description of the algorithm. We refer the readers to the work of Babai [Babai1] and Helfgott’s explanatory document [Helfgott] to know more about the details of the algorithm.

Babai’s algorithm includes several key features that combine and complement each other beautifully to make the algorithm succeed:

• Divide and conquer, and recursion: at almost any point in the algorithm, the high-level goal is to either recursively reduce the underlying group into a combination of its subgroup and cosets, or the domain set into smaller subsets, and consider smaller problems having those smaller subgroups or subsets as input. This continues until the input sizes become sufficiently small such that each problem instance can be solved in polynomial time using brute force. The algorithm runs in quasipolynomial time if at any recursive step the problem can be broken down into a quasipolynomial number of smaller instances, and the depth of recursion is at most polylogarithmic.

• Group theoretic and combinatorial techniques: Both combinatorial techniques and group theoretic techniques are used in Babai’s algorithm. Intuitively, combinatorial techniques, namely partitioning techniques, work in a top-down fashion - those techniques try to partition the graph based on high level asymmetry. In contrast, group theoretic techniques approach the problem bottom up and try to construct the automorphism group element by element. Generally, combinatorial techniques work well when the partitioning structure is highly asymmetric, whereas group theoretic techniques work well when the partitioning structure is highly symmetric.

• Symmetry vs asymmetry: Both symmetry and asymmetry benefit the algorithm. Asymmetry enables combinatorial techniques to reduce the problem to smaller instances as stated above. On the other hand, symmetry enables canonical splitting of the domain set into the symmetric part and the asymmetric part, where the automorphism group of the symmetric part is easy to construct due to its structure.
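To make the counting behind the divide-and-conquer item concrete, here is a short worked bound. The symbols $q$, $d$, $a$ and $b$ are introduced for this illustration only and are not taken from the original analysis. Suppose each recursive step creates at most $q(n) = 2^{O(\log^{a} n)}$ smaller instances and the recursion depth is at most $d(n) = O(\log^{b} n)$. Then the number of leaves of the recursion tree is at most

$$
q(n)^{d(n)} = 2^{O(\log^{a} n) \cdot O(\log^{b} n)} = 2^{O(\log^{a+b} n)},
$$

which is quasipolynomial. Since each leaf is solved in polynomial time, the total running time stays of the form $2^{O(\log^{c} n)}$ for a constant $c$.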

There are five main procedures in Babai’s algorithm: Luks’ reduction, Local Certificate, Aggregate Certificates, Design Lemma and Split-or-Johnson. Luks’ reduction is the first step in each recursion, where Babai employs the framework developed by Luks [Luks:1980] and reduces the bottleneck of GI to a certain type of group, whose structure enables the transformation of the domain $\Omega$ to an auxiliary domain $\Gamma$. Local Certificate examines a logarithmic-size subset of $\Gamma$, called a test set, to determine whether it is highly symmetric with respect to the current underlying group. Aggregate Certificates combines the results of Local Certificate on all those subsets to infer properties about the global symmetry or asymmetry. If there are a lot of symmetric test sets, then the global symmetries allow efficient recursion. Otherwise, if there are a lot of asymmetric test sets, the graph must be highly asymmetric. Design Lemma takes a highly asymmetric relational structure as the input and, in the worst case, produces a uniprimitive coherent configuration, which is an important algebraic structure closely related to the 2-dimensional WL refinement. Finally, Split-or-Johnson takes a uniprimitive coherent configuration and produces either a "split", i.e. a canonical partition of the domain set, or a large canonically embedded Johnson graph, whose structure is well-known and thus enables efficient recursion.

## 3 Parallel Graph Isomorphism algorithm

### 3.1 Identifying the bottlenecks

In this section, we identify all the superpolynomial bottlenecks in Babai’s algorithm. We refer the reader to the technical details of Babai’s paper [Babai1] or to Helfgott’s detailed time complexity analysis of Babai’s algorithm [Helfgott] to verify that the bottlenecks mentioned are indeed all superpolynomial bottlenecks in the algorithm. We will also identify the bottlenecks that are easy to solve, i.e. embarrassingly parallel ones.

In Babai’s algorithm, there are two main types of bottleneck: multiplicative bottlenecks and large computation bottlenecks.

#### 3.1.1 Multiplicative bottlenecks

Multiplicative bottlenecks are bottlenecks that arise from either individualization steps or recursion steps. These steps all make the computation branch out like a tree, and therefore introduce a multiplicative cost to the program.

Individualization is a step that occurs in the Design Lemma procedure, the Split-or-Johnson procedure and the Aggregate Certificates procedure. In fact, individualization is the main action of the Design Lemma procedure. As stated in subsection 2.1, individualization induces a multiplicative cost to the program’s runtime.

Recursion, or reduction of a problem instance to multiple smaller instances, also causes a multiplicative cost to the program’s runtime. The reduction steps create smaller problem instances with no guarantee about runtime (we can only reason about the maximum depth of recursion). These steps occur in the reduction of the working group to its subgroup and the cosets, in the reduction to a Johnson group in Luks’ framework, and in the recursive call to a quasipolynomial number of instances of SI on smaller inputs in the Local Certificate procedure.

Due to their tree-like structure, the computations of the multiplicative bottlenecks are embarrassingly parallel.

Multiplicative bottlenecks can be computed in parallel in polynomial time, such that any multiplicative cost to the runtime of the programs becomes multiplicative cost to the number of processors. Furthermore, the parallelization is work-preserving.

This case is straightforward. For the recursion/reduction steps, we assign a processor for each problem instance created. Similarly for the individualization steps, we assign a processor for each of the vertex choice possibilities. Since the recursion/reduction problem instances as well as the possible cases of individualization are independent, they can be solved in parallel independently. Therefore, their multiplicative cost to the runtime of the program becomes multiplicative cost to the number of processors needed.
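The branching scheme above can be illustrated with a toy search in which every possible image of one individualized vertex becomes an independent task. This is only a structural sketch under assumptions: the names `solve_branch` and `parallel_search` are hypothetical, each branch is finished by brute force rather than by Babai's procedures, and Python threads stand in for PRAM processors (they show the independence of the branches, not an actual speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def solve_branch(task):
    """Solve one branch: vertex 0 of G is individualized to a fixed image in H."""
    n, edges_g, edges_h, image_of_0 = task
    target = {frozenset(e) for e in edges_h}
    rest = [v for v in range(n) if v != image_of_0]
    for tail in permutations(rest):
        phi = (image_of_0,) + tail  # phi[u] is the image of vertex u
        if {frozenset((phi[u], phi[v])) for u, v in edges_g} == target:
            return phi
    return None


def parallel_search(n, edges_g, edges_h):
    """One worker per individualization choice; branches never communicate."""
    tasks = [(n, edges_g, edges_h, w) for w in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(solve_branch, tasks))
    return [phi for phi in results if phi is not None]
```

The multiplicative factor $n$ that individualization would add to the sequential runtime shows up here as the number of workers instead.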

#### 3.1.2 Large computation bottlenecks

Large computation bottlenecks are the superpolynomial tasks which appear throughout Babai’s algorithm. They are: 1) the large repetitive tasks, 2) the tasks of aggregating results and running procedures on large groups, and 3) the $k$-dimensional WL refinement task.

Large repetitive tasks are computational tasks that consist of a process that repeats itself multiple times independently. There are two such tasks in Babai’s algorithm: one is the repetition of the Local Certificate procedure for all test sets (the logarithmic-size subsets described in subsection 2.4) at the start of Aggregate Certificates, and the other is the search through possibly all tuples of elements to see if individualizing those elements yields the desired outcome. Again, since each iteration is independent, those tasks are embarrassingly parallel.

Large repetitive tasks can be computed in parallel in polynomial time. Furthermore, the parallelization is work-preserving.

Aggregating results tasks arise when the algorithm needs to combine results from multiple smaller problem instances or subprocedures. This type of task occurs in multiple places in Babai’s algorithm. For example, almost all reduction steps, such as the reduction step in Luks’ framework or the reduction step in the Local Certificate procedure, require combining the resulting groups or cosets (represented by their generating sets) into one group by merging their generating sets. The Aggregate Certificates procedure also combines the certificates into one group in a similar fashion. This is not problematic memory-wise, since we can avoid memory complications between processors by using the PRAM model of computation. However, this creates large groups (in terms of the size of their generating sets). As we mentioned above, many group operations have runtime polynomial in the size of the generating set. Therefore, a superpolynomial generating set causes a superpolynomial bottleneck. In the next part of the paper we develop a generating set refinement procedure to deal exclusively with this type of bottleneck.

The last type of bottleneck is the $k$-dimensional WL refinement. This operation deserves a separate subsection, since it is a highly sequential and highly non-trivial obstacle to parallelization. Our approach to parallelizing this operation will be discussed in detail in subsection 3.3.3.

### 3.2 Group generators refinement

As stated previously, group operations usually run in time polynomial in the size of the generating set. Therefore, operations on a group with a large (superpolynomial) generating set are problematic.

In this subsection, we develop a subroutine that refines the generating set of any permutation group.

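Given a permutation group $G \le \mathrm{Sym}(\Omega)$ with generating set $S$, we can create a new generating set $S'$ of $G$ whose size is polynomial in $n = |\Omega|$ (at most $\log_2(n!) = O(n \log n)$). The process can be done in parallel in a polynomial number of steps using $|S|$ processors.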

###### Proof.

The subroutine is based on the FHL algorithm (also called the Schreier-Sims algorithm), described by Luks in [Luks:1980]. Given a permutation group $G$ with domain set $\Omega = \{x_1, \dots, x_n\}$, the FHL algorithm works with a tower of stabilizers $G = G_0 \ge G_1 \ge \dots \ge G_{n-1} = \{e\}$, where $G_i$ is the group of permutations in $G$ that keep all of the elements $x_1, \dots, x_i$ fixed. The algorithm builds a set $C_i$ of representatives of the cosets of $G_{i+1}$ in $G_i$ for each $i$, and thus generates $G_i$ for all $i$.

We note that the FHL algorithm itself runs in polynomial time of the size of the group’s generating set. Our subroutine is a derivation of the FHL algorithm, but is parallel so that it can handle input groups with large generating sets.

Proof of correctness: Denote by $G'$ the group generated by the generating set $S'$. The membership testing method is derived from the FHL algorithm: for each element $g \in S$, for increasing $i$ the algorithm searches the set $C_i$ for a representative of the coset of $g$ modulo the stabilizer $G'_{i+1}$. If such a representative is found for every $i$, then $g$ is already contained in the group generated by $S'$. The RefineGeneratingSet procedure builds $S'$ by gradually adding elements from $S$ to it. In each iteration, the elements of $S$ are tested to see whether they are members of $G'$. The procedure chooses one element that is not a member and updates $S'$ and the $C_i$s. The algorithm stops when no more elements of $S$ can be added to $S'$, in other words when all elements of $S$ are contained in $G'$.

Proof of time complexity: First, it is easy to see that the membership testing function runs in time polynomial in $|S'|$ and $n$ (note that $n = |\Omega|$). Thus, the parfor loop runs in polynomial time using $|S|$ processors. Therefore, the runtime of the algorithm is polynomial if the number of iterations of the main while loop (line 3), which is $|S'|$, is polynomial. In each of its iterations, the main while loop finds one element that is not contained in $G'$ and adds it to $S'$. Doing this results in a new group $G'_{new}$ that contains $G'$ as a proper subgroup, meaning $G'_{new}$ contains $G'$ and at least one further coset of $G'$. Therefore, $|G'_{new}| \ge 2|G'|$ and thus $|G'| \ge 2^{t}$ where $t = |S'|$. We note that $G'$ is always a subgroup of the symmetric group $\mathrm{Sym}(\Omega)$, thus $|G'| \le n!$. Therefore, $2^{t} \le n!$, which means $t \le \log_2(n!) = O(n \log n)$.

Therefore, the newly created generating set has polynomial size and the subroutine can be done in parallel polynomial time using $|S|$ processors. ∎
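The greedy structure of the refinement subroutine can be sketched as follows. This is a simplified illustration and not the paper's RefineGeneratingSet: the membership test here is replaced by an explicit enumeration of the generated group, which is exponential in general and only usable on tiny domains, whereas the paper relies on FHL-style sifting. The function names and the tuple representation of permutations are assumptions of this example.

```python
def compose(p, q):
    """Permutation that applies p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def generated_group(n, gens):
    """All elements of the group generated by gens (brute force, tiny n only)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def refine_generating_set(n, gens):
    """Greedily pick generators that enlarge the group generated so far."""
    refined = []
    group = generated_group(n, refined)
    while True:
        # Each membership test is independent, so this scan is the part that
        # would be distributed over one processor per element of gens.
        outside = [g for g in gens if g not in group]
        if not outside:
            return refined
        refined.append(outside[0])  # the group at least doubles in size
        group = generated_group(n, refined)
```

Because every added element at least doubles the order of the generated group, the returned list has at most $\log_2(n!)$ elements. For instance, starting from all 24 permutations of four points, the sketch returns at most four generators that still generate the whole symmetric group.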

### 3.3 Parallelization of k-dimensional WL refinement

We here introduce one of the main contributions of this paper: the parallelization of the -dimensional WL refinement.

In Babai’s algorithm, the $k$-dimensional WL refinement procedure appears in the Split-or-Johnson procedure. Recall the definition of $k$-dimensional WL refinement in subsection 2.1. Similar to color refinement, $k$-dimensional WL refinement is an iterative procedure where the next iteration depends on the results of previous iterations. No proven upper bound on the number of iterations of $k$-dimensional WL refinement exists other than the trivial one (each iteration produces at least one new color and there are at most $n^k$ colors). Therefore, the $k$-dimensional WL refinement procedure is a highly non-trivial obstacle for parallelization.

In this subsection, we will derive a parallelization scheme for the $k$-dimensional WL refinement. We will first develop a parallelization scheme for the 1-dimensional WL refinement by lifting it to 2-dimensional WL refinement, and then develop a transformation between performing $k$-dimensional WL refinement and performing 1-dimensional WL refinement.

#### 3.3.1 Color refinement and 2-dimensional WL refinement

Let us first consider 1-dimensional WL refinement, i.e. color refinement. The number of iterations of color refinement has a tight upper bound of $O(n)$, as there are graphs, such as the path graph, where color refinement requires a linear number of iterations.

We will now take a different perspective by looking at a pair of vertices distinguished in the $i$-th iteration of color refinement. Consider a graph $G = (V, E)$ with an initial coloring of the vertices. We can expand the initial coloring to 2-tuples by assigning the color of $(u, v)$ from the colors of $u$ and $v$ and based on whether or not $(u, v) \in E$. Then, we employ the idea of walk refinement by Lichter [Lichter]. Given a graph $G$ with an initial edge coloring $C$, the walk refinement is a refinement scheme that for a walk length $k$ and vertices $u, v \in V$ determines a new coloring $C^W_{[k]}$ defined by:

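$$C^W_{[k]}(u,v) = \{\{\, C(u, w_{i_1}, w_{i_2}, \dots, w_{i_{k-1}}, v) : w_{i_j} \in V \,\}\}$$

where $C(u, w_{i_1}, \dots, w_{i_{k-1}}, v)$ denotes the sequence of colors $(C(u, w_{i_1}), C(w_{i_1}, w_{i_2}), \dots, C(w_{i_{k-1}}, v))$ along the walk.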

Intuitively, a $k$-walk refinement refines the color of a pair $(u, v)$ by comparing the colors of all the possible walks of length $k$ from $u$ to $v$. Note that a $k$-walk refinement also implicitly contains $j$-walks for $j < k$ by having the first $k - j$ steps stationary.
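A direct, exponential-time sketch of one application of walk refinement is given below, to make the multiset of color sequences concrete. It assumes the reading in which the color of a walk is the sequence of pair colors along its steps; the function name `walk_refinement` and the dictionary representation of the pair coloring are choices made for this illustration.

```python
from collections import Counter
from itertools import product


def walk_refinement(vertices, color, k):
    """One application of k-walk refinement to a coloring of ordered pairs.

    color maps every ordered pair (u, v) to a sortable color. The new color of
    (u, v) is the multiset of color sequences over all walks with k steps.
    """
    refined = {}
    for u, v in product(vertices, repeat=2):
        walks = Counter()
        for middle in product(vertices, repeat=k - 1):
            walk = (u,) + middle + (v,)
            walks[tuple(color[walk[i], walk[i + 1]] for i in range(k))] += 1
        refined[u, v] = tuple(sorted(walks.items()))
    return refined
```

On the path 0-1-2-3 with pair colors 0 (equal), 1 (adjacent) and 2 (otherwise), a 2-walk refinement already gives the pair (0, 0) a different color from (1, 1), mirroring the first round of color refinement, in which end points and inner vertices separate.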

Given a graph $G$ with two vertices $u$ and $v$ in $G$ that are distinguished in the $i$-th iteration of color refinement, $u$ and $v$ can be distinguished by an $i$-walk refinement.

###### Proof.

Proof by induction on $i$. The statement is trivially true for $i = 0$, where the algorithm simply returns the initial colors.

For a pair of vertices $u, v$ distinguished in the $i$-th iteration, the following conditions must hold:

the multiset of colors of the neighbors of $u$ and the multiset of colors of the neighbors of $v$ (assuming they have the same number of neighbors) must be different in the $(i-1)$-th iteration.

In every iteration $j < i - 1$, $u$ and $v$ must have exactly the same neighbor color multiset.

Consider a pair of vertices $u'$ and $v'$ among the neighbors of $u$ and $v$ respectively that are distinguished by the $(i-1)$-th iteration of color refinement. By the induction hypothesis, $u'$ and $v'$ are also distinguished by the $(i-1)$-walk refinement.

Now, consider and . These colors are distinguished using and . We will only consider a subset of all the walks which is and for and are the neighbors of , and similar for . We will prove that these two subsets of walks result in a different set of colors, and that these two subsets having different colors result in .

First, since and are distinguished in the iteration of color refinement, their set of neighbor colors must be different after the color refinement iteration. From the induction hypothesis, this means that where the are neighbors of respectively. This means that . Therefore, , thus those two subsets of walks result in a different set of colors.

Second, we will prove that these subsets of walks have a unique role in all the walks considered. This is indeed true, because the initial color of an edge is unique from the definition.

Thus, if $u$ and $v$ are distinguished in the $i$-th iteration of color refinement, then they can be distinguished by an $i$-walk refinement. ∎

Moreover, Lichter [Lichter] proved that a $k$-walk refinement can be simulated by $O(\log k)$ iterations of 2-dimensional WL refinement.

[Lichter] $k$-walk refinement can be simulated with $O(\log k)$ iterations of Weisfeiler-Leman refinements.

From the above two theorems, we immediately come to our conclusion. Given a graph $G$, if color refinement stabilizes in $i$ iterations, then we can use 2-dimensional WL refinement to simulate the result using $O(\log i)$ iterations.

We note that this result has significant implications beyond Babai’s algorithm.

There is a parallel algorithm to simulate the result of color refinement in logarithmic time.

###### Proof.

We can use 2-dimensional WL refinement to simulate color refinement as stated above. Note that each step of the 2-dimensional WL refinement is completely parallelizable (i.e. can be done in constant time by a work-preserving parallel algorithm), therefore the runtime of the parallel algorithm is the number of iterations, which is $O(\log n)$ by the above two theorems. ∎

Achieving the result of color refinement in logarithmic time is remarkably useful for practical GI solvers, which mainly use color refinement instead of the complicated $k$-dimensional version. Furthermore, this result implies that any decision problem which can be NC-reduced to color refinement is in NC, the class of problems that can be solved efficiently on a parallel computer.

#### 3.3.2 k-dimensional WL refinement and color refinement

Now we will present a way to simulate $k$-dimensional WL refinement using color refinement.

Given a graph $G$, there is a graph $G'$ such that performing $k$-dimensional WL refinement on $G$ can be simulated by performing color refinement on $G'$.

###### Proof.

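Recall that the $k$-dimensional WL refinement determines the new color of a tuple $\vec{x}$ based on, for all tuples of colors $(c_1, \dots, c_k)$, the number of elements $z$ such that $C(\vec{x}^{\,i}(z)) = c_i$ for all $i$, where $\vec{x}^{\,i}(z)$ denotes $\vec{x}$ with its $i$-th entry replaced by $z$.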

Denote $n = |V(G)|$. We construct $G'$ as follows: For each $k$-tuple $\vec{x}$ of vertices in $G$, create a corresponding vertex $v_{\vec{x}}$ in $G'$. Let us write $\vec{x} = (x_1, \dots, x_k)$, and call all the $v_{\vec{x}}$s the base layer. For each vertex $v_{\vec{x}}$ of $G'$ that corresponds to the tuple $\vec{x}$, construct two layers of auxiliary nodes. The first layer consists of $n$ nodes $a_{\vec{x},z}$, one for each $z \in V(G)$. Connect each of the $a_{\vec{x},z}$ with $v_{\vec{x}}$. The second layer consists of $kn$ nodes $b_{\vec{x},z,i}$, one for each $z \in V(G)$ and each $i \in \{1, \dots, k\}$. Connect each $b_{\vec{x},z,i}$ with $a_{\vec{x},z}$ and with the base node that corresponds to the $k$-tuple $(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_k)$. We say that these nodes in the auxiliary $a$ and $b$ layers come out from $v_{\vec{x}}$, in order to distinguish them from the auxiliary nodes that expand from other vertices and come in to $v_{\vec{x}}$, and we call the layers of nodes that expand from $v_{\vec{x}}$ as constructed above the out-layer and the layer of nodes that connect to $v_{\vec{x}}$ the in-layer. The initial color of each $v_{\vec{x}}$ in $G'$ corresponds to the initial color of the corresponding $k$-tuple in $G$, all the auxiliary $a$ nodes are given the same new color, and for each $i$ all the auxiliary $b$ nodes with index $i$ are given the same new color. The construction scheme of the out-layers of a node in $G'$ corresponding to a $k$-tuple is as illustrated in the following figure.

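For a vertex $v_{\vec{x}}$ corresponding to the tuple $\vec{x}$ of vertices in $G$, denote by $\vec{x}^{\,i}(z)$ the tuple $(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_k)$ and by $v_{\vec{x}^{\,i}(z)}$ the corresponding vertex. Intuitively, for each $z$ the auxiliary node $a_{\vec{x},z}$ is used to capture the tuple of colors of $\vec{x}^{\,i}(z)$ for all $i$, and the auxiliary nodes $b_{\vec{x},z,i}$, with distinct initial colors for different $i$, are used to make sure that the exact ordering of the sequence of colors of the $\vec{x}^{\,i}(z)$ affects the color of $a_{\vec{x},z}$. Note that the different layers are always distinguished since their initial colors are different.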
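The construction can be written out explicitly for small parameters. The sketch below follows one reading of the description above and should be taken as an assumption: per base vertex it creates one first-layer node for each $z$ and one second-layer node for each pair $(z, i)$. The node labels `'base'`, `'a'` and `'b'`, the function name and the `tuple_color` callback are hypothetical.

```python
from itertools import product


def build_simulation_graph(n, k, tuple_color):
    """Build the auxiliary graph for simulating k-WL by color refinement.

    Vertices of the input graph are 0..n-1; tuple_color(x) gives the initial
    color of the k-tuple x. Returns (colors, edges) of the auxiliary graph.
    """
    colors, edges = {}, set()
    for x in product(range(n), repeat=k):
        base = ('base', x)
        colors[base] = ('base', tuple_color(x))
        for z in range(n):
            a = ('a', x, z)
            colors[a] = ('a',)  # every first-layer node shares one color
            edges.add(frozenset((base, a)))
            for i in range(k):
                b = ('b', x, z, i)
                colors[b] = ('b', i)  # one color per coordinate i
                target = ('base', x[:i] + (z,) + x[i + 1:])
                edges.add(frozenset((a, b)))
                edges.add(frozenset((b, target)))
    return colors, edges
```

Under this reading the auxiliary graph has $n^k (1 + n + kn)$ vertices, which is $n^{O(k)}$. For $n = 3$ and $k = 2$ that gives 90 vertices and 135 edges.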

We will prove that color refinement on $G'$ results in the same coloring of the base layer nodes as performing $k$-dimensional WL refinement on $G$.

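Consider two $k$-tuples $\vec{x}$ and $\vec{y}$ of vertices in $G$ that are not distinguished at the end of $k$-dimensional WL refinement. In this case, the tuples’ corresponding vertices $v_{\vec{x}}$ and $v_{\vec{y}}$ in $G'$ must have the same out-layer. Indeed, each of the first-layer auxiliary nodes captures the color determined by the sequence of colors of the tuples obtained by replacing one entry. Since $\vec{x}$ and $\vec{y}$ are not distinguished by the $k$-dimensional WL refinement, the multisets of these color sequences for $\vec{x}$ and for $\vec{y}$ must be symmetric. Therefore, the first-layer auxiliary nodes of $v_{\vec{x}}$ and $v_{\vec{y}}$ must be identical up to permutation. On the other hand, $v_{\vec{x}}$ and $v_{\vec{y}}$ must also have the same in-layer, because the color of each of the second-layer auxiliary nodes coming in from another base vertex depends on the colors of the base vertices it connects, which must be symmetric between $\vec{x}$ and $\vec{y}$.

Now, if $\vec{x}$ and $\vec{y}$ are distinguished at the end of $k$-dimensional WL refinement, then it is straightforward to see that $v_{\vec{x}}$ and $v_{\vec{y}}$ are distinguished by the end of color refinement on $G'$. This is because the out-layer of each base node captures the sequence of colors of the tuples obtained by replacing one entry, and this sequence is not preserved under permutation due to the distinct initial coloring of the second-layer auxiliary nodes. Therefore, asymmetries between the colors seen from $\vec{x}$ and from $\vec{y}$ (which are detected at some point because $\vec{x}$ and $\vec{y}$ are distinguished) lead to asymmetries in the colors of the auxiliary nodes of $v_{\vec{x}}$ and $v_{\vec{y}}$, which in turn lead to different colors of $v_{\vec{x}}$ and $v_{\vec{y}}$.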

#### 3.3.3 Parallelizing k-dimensional WL refinement

We arrive at the main goal of this subsection. Given a graph , we can perform k-dimensional WL refinement, where , in parallel in polynomial time using a quasipolynomial number of processors.

###### Proof.

First, we use the simulation in Theorem 3.3.2 to convert performing k-dimensional WL refinement on into performing color refinement on . By construction, has vertices, so color refinement on runs for at most iterations.

Next, we use 2-dimensional WL refinement on to simulate color refinement on as in Corollary 3.3.1. The result of color refinement can be simulated in , which is iterations of 2-dimensional WL refinement for .

Finally, we can execute this 2-dimensional WL refinement in parallel polynomial time. We first show that each iteration of 2-dimensional WL refinement can be computed efficiently in parallel. Recall that in 2-dimensional WL refinement, the new color of a pair of vertices is determined by the number of elements such that and for all . For each pair of vertices , looking up the colors and for all vertices in is entirely parallelizable and can be done in constant time. Comparing the resulting sets of colors between two pairs of vertices takes constant time, and the comparisons among all pairs of pairs can be performed in parallel. Therefore, each iteration of 2-dimensional WL refinement of can be done in parallel polynomial time, so the overall parallel running time is polynomial provided that the total number of iterations is, and that number is polynomial in . Hence simulating color refinement on using 2-dimensional WL refinement in parallel takes polynomial time, with a quasipolynomial total amount of work (because our parallelization of the 2-dimensional WL refinement iterations is work-preserving, and there is a quasipolynomial number of vertices in ).

In conclusion, we can perform k-dimensional WL refinement where in parallel polynomial time with a quasipolynomial amount of total work, and therefore with a quasipolynomial number of processors. ∎
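The per-pair independence used in the proof can be illustrated with a small sketch of 2-dimensional WL refinement in its standard textbook form. The names `initial_pair_colors`, `wl2_round` and `wl2_refinement`, the three-valued initial coloring, and the use of a thread pool are assumptions made for illustration. The thread pool only demonstrates that the pair signatures can be computed independently; it does not model the processor counts of the theorem.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def initial_pair_colors(vertices, edges):
    """Color ordered pairs: 0 = diagonal (u == v), 1 = edge, 2 = non-edge."""
    edge_set = {frozenset(e) for e in edges}
    return {
        (u, v): 0 if u == v else (1 if frozenset((u, v)) in edge_set else 2)
        for u, v in product(vertices, repeat=2)
    }


def wl2_round(vertices, colors):
    """One refinement round; every pair's signature is independent."""
    def signature(pair):
        u, v = pair
        # Count, for each color pair (a, b), the vertices w with
        # colors[(u, w)] == a and colors[(w, v)] == b.
        through = Counter((colors[(u, w)], colors[(w, v)]) for w in vertices)
        return (colors[pair], tuple(sorted(through.items())))

    pairs = list(colors)
    with ThreadPoolExecutor() as pool:
        sigs = list(pool.map(signature, pairs))
    # Rename signatures to small integers in a canonical order.
    palette = {s: i for i, s in enumerate(sorted(set(sigs)))}
    return {p: palette[s] for p, s in zip(pairs, sigs)}


def wl2_refinement(vertices, edges):
    """Iterate rounds until the number of color classes stops growing."""
    colors = initial_pair_colors(vertices, edges)
    while True:
        new_colors = wl2_round(vertices, colors)
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
```

Because each signature reads only the previous round's colors, the work inside `wl2_round` has no dependencies between pairs, and the only sequential part of the procedure is the loop over rounds.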

### 3.4 The combined algorithm

We have examined all the bottlenecks in Babai’s algorithm and parallelized them. In this subsection, we combine the parallelization schemes from the previous parts to create the master algorithm.

The graph isomorphism problem can be solved by a parallel algorithm in polynomial time using a quasipolynomial number of processors.

###### Proof.

By Lemmas 3.1.1 and 3.1.2, the multiplicative bottleneck and the large repetitive tasks bottleneck of Babai’s algorithm can be handled in parallel polynomial time in a work-preserving manner, so that any multiplicative cost to the runtime of the program becomes a multiplicative cost to the number of processors.

For group operations on large groups, which are formed by tasks that aggregate results, we apply our RefineGeneratingSet subroutine, developed in subsection 3.2, after every such aggregation. By doing this, we guarantee that all the groups in the algorithm have a generating set of polynomial size, and therefore the runtime of group operations on them is not superpolynomial. The parallel subroutine runs in polynomial time and introduces an additional worst-case quasipolynomial cost to the total amount of work.

For k-dimensional WL refinement, we follow the procedure described in Theorem 3.3.2 to run it in parallel polynomial time. The parallelization introduces a multiplicative cost to the total amount of work. However, we note that this is a one-time cost, and thus the total amount of work done by the algorithm after parallelization is still quasipolynomial.

Combining the above arguments, we arrive at our parallel version of Babai’s algorithm. The algorithm runs in polynomial time in each iteration, any multiplicative cost to the runtime of the algorithm after each iteration is converted into a multiplicative cost to the number of parallel processors, and there is a polynomial (in fact logarithmic) number of iterations in total. Thus, the whole parallel algorithm runs in polynomial time. The total amount of work is increased by at most a quasipolynomial multiplicative factor, and thus is still quasipolynomial. Therefore, the number of processors needed is quasipolynomial.
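The step from total work to processor count can be made explicit with Brent's scheduling bound. The notation here is ours and is not taken from the paper: $W$ is the total work, $D$ is the parallel running time with unboundedly many processors, $p$ is the number of processors, $T_p$ is the running time on $p$ processors, and $c$ is an unspecified constant.

$$
T_p \le \frac{W}{p} + D, \qquad W = 2^{O((\log n)^{c})}, \qquad D = n^{O(1)}.
$$

Choosing $p = \lceil W / D \rceil$ gives $W/p \le D$, hence

$$
T_p \le 2D = n^{O(1)}, \qquad p \le W = 2^{O((\log n)^{c})},
$$

so under these assumptions a quasipolynomial number of processors suffices for a polynomial running time.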

## 4 Conclusion and remarks

In our work, we have proven that the GI problem can be solved using a parallel algorithm that runs in polynomial time using a quasipolynomial number of processors. Therefore, problems in the GI complexity class, which are problems that can be polynomially reduced to Graph Isomorphism, can be solved in parallel polynomial time using a quasipolynomial number of processors.

This result implies that, in theory, solving the worst cases of the GI problem is tractable on a parallel computer. It also implies that Babai’s algorithm is highly parallelizable and can be sped up superpolynomially on a parallel computer. We note that, despite being quite complicated, Babai’s algorithm shares multiple similarities with other GI algorithms. Many of those shared components are bottlenecks for parallelization, such as the backtracking search tree structure, color refinement and k-dimensional operations, individualization, and so on. Therefore, the parallelization techniques used in this paper can be generalized to other GI algorithms as well, indicating that algorithms for GI in general are highly parallelizable.

In addition, our finding that color refinement can be computed in logarithmic time in parallel can potentially have a notable impact on practical GI solvers. On the one hand, almost all practical GI solvers use color refinement as the main tool to solve the problem. On the other hand, given the growing demand for big data processing and parallel algorithms, there have been surprisingly few practical parallel algorithms for the GI problem; it is hard to find one such program besides [Son15]. Therefore, our techniques for parallelizing color refinement can potentially be applied to create new practical parallel GI algorithms, or to improve existing ones.