Distributed Testing of Graph Isomorphism in the CONGEST model

03/01/2020 · Reut Levi et al. · Ben-Gurion University of the Negev, IDC Herzliya

In this paper we study the problem of testing graph isomorphism (GI) in the CONGEST distributed model. In this setting we test whether the distributed network, G_U, is isomorphic to G_K, which is given as an input to all the nodes in the network, or alternatively, only to a single node. We first consider the decision variant of the problem, in which the algorithm distinguishes the case that G_U and G_K are isomorphic from the case that they are not isomorphic. We provide a randomized algorithm with O(n) rounds for the setting in which G_K is given only to a single node. We prove that for this setting the number of rounds of any deterministic algorithm is Ω̃(n^2), where n denotes the number of nodes, which implies a separation between the randomized and the deterministic complexities of deciding GI. We then consider the property testing variant of the problem, where the algorithm is only required to distinguish the case that G_U and G_K are isomorphic from the case that G_U and G_K are far from being isomorphic (according to some predetermined distance measure). We show that every algorithm requires Ω(D) rounds, where D denotes the diameter of the network. This lower bound holds even if all the nodes are given G_K as an input, and even if the message size is unbounded. We provide a randomized algorithm with an almost matching round complexity of O(D+(ε^-1 log n)^2) rounds that is suitable for dense graphs. We also show that with the same number of rounds it is possible that each node outputs its mapping according to a bijection which is an approximated isomorphism. We conclude with simple simulation arguments that allow us to obtain essentially tight algorithms with round complexity Õ(D) for special families of sparse graphs.


1 Introduction

Testing graph isomorphism is one of the most fundamental computational problems in graph theory. A pair of graphs G_1 and G_2 are isomorphic if there is a bijection that maps the nodes of G_1 to the nodes of G_2 such that every edge of G_1 is mapped to an edge of G_2 and likewise for non-edges. Currently, it is not known whether there exists an efficient algorithm for this problem and in fact it is one of the few natural problems which is a candidate for being NP-intermediate, that is, neither in P nor NP-complete. In order to obtain efficient algorithms for this problem, various restrictions and relaxations were considered (e.g. [31, 27]). This problem has been extensively studied also in other computational models such as parallel computation models [26, 38, 8, 29, 42, 32, 22, 7] and in the realm of property testing, in which the main complexity measure is the query complexity [15, 40, 23, 39, 35, 3].

In the context of distributed models such as the CONGEST [41] and the LOCAL [37] models, the main complexity measure is the round complexity, and the computational power of the nodes is usually considered to be unbounded. Therefore, in these models the complexity of the problem may change dramatically. While there seem to be many sensible settings, one of the simplest settings of the problem for distributed models is to test for isomorphism between the distributed network, G_U, and a known graph, G_K, which is given as an input to all the nodes in the network, or alternatively, only to a subset of the nodes (this formulation, in which G_K is a parameter, falls into the category of massively parameterized problems and is also considered in the setting of property testing [15, 23]). The requirement from the algorithm is that if G_U and G_K are isomorphic, then with high probability (we say that an algorithm succeeds with high probability if it succeeds with probability at least 1 − 1/n^c for any constant c, without changing the round complexity asymptotically) all nodes should output accept, and that otherwise at least one node should output reject.

Since the property of being isomorphic to a specific graph is inherently global, intuitively we expect the round complexity to be Ω(D), where D denotes the diameter of the network (even for the case in which G_K is given as an input to all the nodes in the network). As we show, this intuition is correct even for the LOCAL model, in which there is no bound on the message size. Therefore, in the LOCAL model, it is not possible to improve over the trivial algorithm that collects the entire information about the network at a single node in O(D) rounds and tests for graph isomorphism in a centralized manner. In the CONGEST model, in which the message size is bounded by O(log n), where n denotes the number of nodes in the network, implementing this trivial solution may require Ω̃(n^2) rounds. This leads to the obvious question of whether it is possible to obtain a round complexity which is better than quadratic in the CONGEST model.

Another interesting question is whether we can obtain better bounds if we relax the decision problem (as considered in the realm of property testing) such that the algorithm is only required to distinguish between pairs of graphs which are isomorphic and pairs of graphs which are far from being isomorphic (according to some predetermined distance measure).

In this setting we define the problem as follows. Let G_U be the distributed network and let m denote the number of edges in the network (or an upper bound on this number). We say that a pair of graphs are ε-far from being isomorphic if more than εm edges need to be deleted/inserted in order to make the graphs isomorphic. An adversarially chosen node receives as an input the graph G_K and a proximity parameter ε. The requirement from the algorithm is as follows. If G_U and G_K are isomorphic, then w.h.p. all nodes should output accept. If G_U and G_K are ε-far from being isomorphic, then w.h.p. at least one node should output reject.
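
Phrased formally (using the Δ-notation introduced in Section 2 for the symmetric difference; this is a paraphrase of the definition above rather than the paper's exact notation), the distance measure is

dist(G_U, G_K) = min over all bijections π : V(G_U) → V(G_K) of |E(π(G_U)) Δ E(G_K)|,

and G_U and G_K are ε-far from being isomorphic if dist(G_U, G_K) > εm.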

1.1 Our Results

In this section we outline our results. We further elaborate on our results in the subsequent sections. In all that follows, unless explicitly stated otherwise, when we refer to distributed algorithms, we mean algorithms in the CONGEST model.

A Decision Algorithm and a Lower Bound for the Decision Problem.

For the (exact) decision problem we provide a randomized one-sided error algorithm that runs in O(n) rounds and succeeds with high probability (see Theorem 3). The algorithm works even for the setting in which G_K is given only to a single node (that may be chosen adversarially). For this setting we prove that any deterministic algorithm requires Ω̃(n^2) rounds, which implies a separation between the randomized and the deterministic complexity of the decision problem (see Theorem 5). We note that our algorithm can be adapted to the semi-streaming model [13], in which it uses Õ(n) bits of space and performs only one pass (see Theorem 4).

A Lower Bound for the Property Testing Variant.

For property testing algorithms we show that, even under this relaxation, Ω(D) rounds are necessary, even for constant ε and constant error probability. This lower bound holds even in the LOCAL model, for two-sided error algorithms, and even if all the nodes receive G_K as an input (see Theorem 6). It also holds for dense graphs, namely, when m = Θ(n^2), and for sparse graphs, that is, when m = Θ(n).

A Property Testing Algorithm and Computation of Approximated Isomorphism.

We provide a distributed two-sided error property testing algorithm that runs in O(D + (ε^-1 log n)^2) rounds and succeeds with high probability for the case that m = Θ(n^2), implying that our result is tight up to an additive term of (ε^-1 log n)^2 (see Theorem 1). This algorithm works even in the setting in which G_K is given only to a single node. We note that the graphs that are constructed for the lower bound for the exact variant are dense and have a constant diameter. Therefore, for these graphs, the property testing algorithm runs in only O((ε^-1 log n)^2) rounds (while the decision algorithm runs in O(n) rounds).

If G_K is given to all the nodes and the graphs are indeed isomorphic, then we show that we can also approximately recover the isomorphism with the same round complexity as that of testing. Specifically, each node v outputs σ(v), where σ is a bijection such that the graph σ(G_U), namely the graph in which we rename the nodes according to σ, is ε-close to G_K.

Simulation Arguments and Their Application to Special Families of Sparse Graphs.

Finally, we show, by simple simulation arguments, that it is possible to obtain essentially tight algorithms with round complexity Õ(D) for special families of sparse graphs by adapting centralized property testing algorithms. In particular, these algorithms apply to bounded-degree minor-free graphs and to general outerplanar graphs.

1.2 The Decision Algorithm

As described above, a naive approach for testing isomorphism to G_K is to gather the entire information about the network at a single node and then to test for isomorphism in a centralized manner. By the brute-force approach, we may go over all possible bijections between the nodes of the graphs and test for equality between the corresponding graphs. Our algorithm follows this approach, with the difference that it only gathers a compressed version of the network, as in the algorithm of Abboud et al. [2] for the Identical Subgraph Detection problem. The idea of their algorithm is to reduce the problem of testing whether two graphs are equal to the problem of testing equality between a pair of binary strings. From the fact that the test for equality has one-sided error, namely it never rejects identical graphs, it follows that our algorithm never rejects isomorphic graphs. To ensure that our algorithm is sound we amplify the success probability of the equality test and, as a result, obtain a total round complexity of O(n).

1.3 A Lower Bound for the Decision Problem

We reduce Set-Equality to the problem of deciding isomorphism in the setting in which only a single node receives G_K as an input (as is the case for our upper bound). The idea is to construct a graph G_{x,y} over Θ(n) nodes for every pair of strings x and y of length Θ(n^2), such that G_{x,y} is isomorphic to G_K if and only if x = y. Let x and y denote the inputs of Alice and Bob, respectively. In the reduction, G_K is determined by x alone, and is therefore known to Alice, on whose side the node receiving G_K resides. Alice and Bob simulate the distributed algorithm on the graph G_{x,y}, which by construction is isomorphic to G_K if and only if x = y, as desired. This reduction yields a lower bound of Ω̃(n^2) rounds for any deterministic algorithm.

1.4 A High-Level Description of the Property Testing Algorithm

Our algorithm closely follows the approach taken by Fischer and Matsliah [15] for testing graph isomorphism in the dense-graph model [24] with two-sided error. However, in order to obtain a round complexity which depends only poly-logarithmically on n (rather than the Õ(√n) dependence of the query complexity in [15]), we need to diverge from their approach, as described next.

1.4.1 The Algorithm of Fischer-Matsliah

The algorithm of Fischer-Matsliah begins with picking, u.a.r., a sequence of nodes from the unknown graph. The selection of these nodes induces labels for the nodes of the graph as follows. The label of each node v is a string of bits where the i-th bit indicates whether v is a neighbor of the i-th node in the sequence. This labeling scheme guarantees that, with high probability, only “similar” nodes, that is, nodes with similar sets of neighbors, might have identical labels. It is not hard to see that if the graphs are isomorphic, then given that we managed to map the nodes in the sequence according to the isomorphism, both graphs should have the same frequency over labels. More surprisingly, it is shown by Fischer and Matsliah that if the nodes in the sequence are mapped according to the isomorphism, then it is possible to extend this mapping on-the-fly and obtain, roughly speaking, an approximate isomorphism. In particular, they showed that as long as each node in the graph is mapped to a node with the same label in the other graph (with respect to the mapped sequence), then the obtained function is close to being an isomorphism. This is due to the fact that nodes which are too “different” are likely to have different labels and the fact that similar nodes are exchangeable. Given a candidate for the approximate isomorphism, the problem is then reduced to testing closeness of graphs. Therefore, if the graphs are isomorphic, then by going over all possible mappings of the selected sequence (there are only quasi-polynomially many ways to map these nodes) and extending this partial mapping as described above, one should be able to obtain a function, σ, which is close to being an isomorphism. On the other hand, if the graphs are far from being isomorphic, then by definition any bijection gives two graphs which are far from each other. Therefore, these two cases can be distinguished by approximating the Hamming distance of the corresponding adjacency matrices. In turn, this can be done by selecting random locations (that is, potential edges) and checking the values of both matrices in these locations. Since constructing σ entirely would be too costly, one needs to be able to generate σ on-the-fly. A crucial point is that its generation cannot depend on the selection of the random locations. In other words, its generation should be query-order-oblivious. To this end, in the algorithm of Fischer-Matsliah they first test whether the distributions over the labels are close. If so, they can safely generate σ on-the-fly while ensuring that there is only little dependency between σ and the queries the algorithm makes to it. This is done by simply mapping a node v to a random node in the other graph that has the same label as v (the little dependency between σ and the queries that the algorithm makes to it comes from the fact that the frequencies of labels of the two graphs are not necessarily identical, as they are only guaranteed to be close w.h.p.). The query complexity of testing closeness of distributions, which is Õ(√n) in this setting, dominates the query complexity of the algorithm. As shown in [15], in the centralized setting this algorithm is essentially tight.
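
To make the labeling scheme concrete, the following centralized Python sketch (an illustration only, not the paper's distributed implementation; the graph encoding and the sequence length are assumptions) computes the label of every node with respect to a sampled sequence and the induced label histogram:

import random
from collections import Counter

def sample_sequence(nodes, length, rng=random):
    # Pick a sequence of distinct nodes uniformly at random.
    return rng.sample(list(nodes), length)

def label(v, seq, adj):
    # The i-th bit is 1 iff v is a neighbor of the i-th node of the sequence.
    return tuple(1 if q in adj[v] else 0 for q in seq)

def label_histogram(nodes, seq, adj):
    # Frequency of each label; "similar" nodes are likely to share a label,
    # while very different nodes are likely to get distinct labels.
    return Counter(label(v, seq, adj) for v in nodes)

# Tiny usage example on a graph given as adjacency sets.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
seq = sample_sequence(adj.keys(), 2)
hist = label_histogram(adj.keys(), seq, adj)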

1.4.2 Our Algorithm

In the CONGEST model, by straightforward simulation arguments it follows that one can simulate the algorithm of Fischer-Matsliah in Õ(D + √n) rounds by collecting the answers to the queries of the algorithm at a single node and simulating the centralized algorithm (see Claim 12). A crucial observation for improving this bound is that nodes that have the same label also have at least one neighbor in common (with the only exception of the all-zero label); therefore they can be coordinated by one of their common neighbors. Moreover, it is possible to obtain both samples and access to the frequencies of the labels via these coordinators. Our algorithm proceeds as follows. As in [15], a sequence Q of random nodes is selected and is sent to the entire network. Each node figures out its label and broadcasts this label to its neighbors. The node that received G_K as an input selects a random set of potential edges, S. It then broadcasts this set to the entire network. For each potential edge, the information of whether it is an actual edge in the network is sent back to that node. For each label of a node appearing in S, the corresponding coordinator sends the frequency of this label to that node as well. From this point, the rest of the computation is done centrally at that node. We say that a sequence, P, of nodes in G_K is good with respect to Q and S if it induces the same frequency of labels as Q when restricted to labels of nodes appearing in S. The node holding G_K goes over all possible mappings of Q to the known graph and looks for good sequences (with respect to Q and S). For every good sequence, P, it generates a function σ on-the-fly: on query v, it maps v to a random node in G_K which is still unmatched and has the same label as v (w.r.t. P). As we show, from the fact that the sequence is good it follows that σ is query-order-oblivious. Let σ(G_U) denote the graph obtained from G_U after applying σ to it. As in the algorithm of Fischer-Matsliah, if the graphs are isomorphic and the sequence P is the mapping of Q according to the isomorphism, then σ(G_U) is guaranteed (w.h.p.) to be close to G_K. The set of potential edges S is then used to approximate the distance between σ(G_U) and G_K. This allows us to obtain a significant improvement in the round complexity (over the straightforward simulation), in terms of the dependence on n, from Õ(√n) to O((ε^-1 log n)^2).
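
The two ingredients described above can be illustrated by the following centralized sketch (a simplification under stated assumptions: labels are tuples of bits, labels_U and labels_K are hypothetical dictionaries from nodes to labels, and relevant stands for the labels of the endpoints of the sampled potential edges; this is not the CONGEST implementation):

import random
from collections import Counter

def is_good(labels_U, labels_K, relevant):
    # A candidate sequence is "good" if, restricted to the relevant labels,
    # both graphs have the same number of nodes per label.
    hist_U, hist_K = Counter(labels_U.values()), Counter(labels_K.values())
    return all(hist_U[l] == hist_K[l] for l in relevant)

class OnTheFlyMapping:
    # Maps each queried node of G_U to a random, still-unmatched node of G_K
    # carrying the same label; goodness is what makes the resulting function
    # insensitive to the order in which nodes are queried.
    def __init__(self, labels_K, rng=random):
        self.pool = {}
        for w, lab in labels_K.items():
            self.pool.setdefault(lab, []).append(w)
        self.assigned = {}
        self.rng = rng

    def __call__(self, v, label_v):
        if v not in self.assigned:
            candidates = self.pool[label_v]
            w = candidates.pop(self.rng.randrange(len(candidates)))
            self.assigned[v] = w
        return self.assigned[v]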

1.5 High-level Approach for Computing an Approximated Isomorphism

As described in the previous section, if the graphs are isomorphic then w.h.p. the algorithm finds a sequence P that corresponds to a bijection σ such that σ(G_U) is guaranteed (w.h.p.) to be close to G_K. The algorithm accesses σ only on a small set of random locations. It is tempting to try to output σ(v) for every node v in the network. Assume now that every node in the network knows G_K. If the sequence P is indeed the mapping of Q according to an isomorphism, then the following naive approach should work: each coordinator can independently map the nodes that are assigned to it according to their labels. However, it might be the case that P is not the mapping of Q according to any isomorphism (although it passed the test). In particular, it might be that it is not good with respect to Q and the entire vertex set (recall that it is only guaranteed to be good w.r.t. Q and S). In this case we may want the coordinators of the nodes to be coordinated, such that they exchange the mapping of nodes with “underflow” and “overflow” labels. Since there might be up to n distinct labels, such coordination might cause too much congestion. To this end, we cluster the labels according to their most significant bit and assign a single coordinator to each cluster. Since the number of clusters is only the length of the label (one cluster per possible position of the most significant bit), these coordinators can coordinate without causing too much congestion. The main technicality that needs to be addressed is showing that the resulting mapping, σ′, is close enough to σ and hence is an approximated isomorphism. We prove this by coupling σ and σ′ and showing that they agree on the mapping of most nodes.
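
A minimal sketch of the clustering step (illustrative only): labels are grouped by the position of their most significant 1-bit, so the number of groups is at most the label length plus one (for the all-zero label), regardless of how many distinct labels actually occur.

def cluster_of(label):
    # label is a tuple of bits; the cluster index is the position of the
    # highest-indexed 1-bit, or -1 for the all-zero label.
    ones = [i for i, b in enumerate(label) if b == 1]
    return max(ones) if ones else -1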

1.6 A Lower Bound for the Property Testing Variant

We prove that for any D there exists a family of graphs with diameter Θ(D) such that any distributed two-sided error property testing algorithm for testing isomorphism on this family of graphs requires Ω(D) rounds. In the construction we start with a pair of graphs, G_1 and G_2, each of diameter much smaller than D, which are far from being isomorphic. The graph G_U is then defined to be composed of G_1 and G_2 and a path of length Θ(D) that connects the two graphs. Roughly speaking, the idea is to argue that for round complexity which is at most cD, where c is some absolute constant, the nodes of G_U which belong to the side of G_1 cannot distinguish this case from the case in which the network is composed of two graphs which are isomorphic to G_1 (connected by a path). Likewise for the nodes that belong to the side of G_2 (which cannot distinguish this case from the case in which the network is composed of two graphs which are isomorphic to G_2). It then follows that the algorithm must err. In the detailed proof, which appears in the appendix, there are some technicalities that need to be addressed in order to prove that the above argument still holds when the nodes may use randomness, port numbers and IDs.

1.7 Related Work

In this section we overview results in distributed decision and property testing in the CONGEST model. We also overview related results in centralized property testing.

Distributed Decision.

There is a large body of algorithms and lower bounds for the subgraph detection problem: given a fixed graph H and an input graph G, the task is to decide whether G contains a subgraph which is isomorphic to H. The subgraphs considered include: paths [33], cycles [33, 17, 10], triangles [28, 1, 6], and cliques [9, 10, 4]. Abboud et al. [2, Sec. 6.2] considered the identical subgraph detection problem. In this problem the graph’s nodes are partitioned into two equal sets. The task is to decide whether the induced graphs on these two sets are identical w.r.t. a fixed mapping between the nodes of these two sets. They showed a near-quadratic lower bound on the number of rounds of any deterministic algorithm, as well as a randomized algorithm that succeeds w.h.p. within a substantially smaller number of rounds.

Distributed Property Testing for Graph Problems.

Distributed property testing was initiated by Censor-Hillel et al. [5]. In particular, they designed and analyzed distributed property testing algorithms for triangle-freeness, cycle-freeness, and bipartiteness. They also proved a logarithmic lower bound for the latter two properties. While they mainly focus on the bounded-degree model and the general model, they also studied the dense model. In this model they showed that, for a certain class of problems, any centralized property testing algorithm can be emulated in the distributed model such that the number of rounds depends only on the number of queries made by the centralized tester. Fraigniaud et al. [21] studied distributed property testing of excluded subgraphs of size 4 and 5. Since the appearance of the above papers, there has been a fruitful line of research on distributed property testing for various properties, mainly focusing on testing whether a graph excludes a fixed subgraph [20, 18, 19, 12, 11, 16]. Other problems on graphs, such as testing planarity and testing conductance, were studied in [36] and [14], respectively.

Centralized Property Testing.

Fischer and Matsliah [15] studied the graph isomorphism problem in the dense-graph model [24] (in the dense-graph model, a graph is considered to be ε-far from a property P if the symmetric difference between its edge set and the edge set of any graph in P is greater than εn^2). They considered four variations of the graph isomorphism testing problem: (1) one-sided error, where one of the graphs is known and there is query access to the graph which is tested, i.e., the tested graph is “unknown”; (2) one-sided error, where there is query access to both graphs, i.e., both graphs are unknown; (3) two-sided error, where one graph is known; (4) two-sided error, where both graphs are unknown. For the first three variants Fischer and Matsliah [15] showed (almost) matching lower and upper bounds on the query complexity, where n is the number of vertices of each input (known or unknown) graph. For the fourth variant they showed upper and lower bounds that leave a gap; Onak and Sun [40] improved the upper bound of the fourth case by bypassing the distribution-testing reduction that was used by [15]. Property testing of graph isomorphism was also considered in the bounded-degree model [25] (in the bounded-degree model, a graph with maximum degree d is considered to be ε-far from a property P if the symmetric difference between its edge set and the edge set of any graph in P is greater than εdn). Goldreich [23] proved lower bounds on the query complexity of any property testing algorithm, both for the variant in which one graph is known and for the variant in which both graphs are unknown. Newman and Sohler [39] provide an algorithm for minor-free graphs of bounded degree (this class includes, for example, bounded-degree planar graphs) whose query complexity is independent of the size of the graph. Moreover, they showed that any property is testable in this class of graphs with the same query complexity. Kusumoto and Yoshida [35], and Babu, Khoury, and Newman [3], considered testing of isomorphism between graphs which are forests and outerplanar graphs, respectively, in the general-graph model [30] (in the general-graph model, a graph is considered to be ε-far from a property P if the symmetric difference between its edge set and the edge set of any graph in P is greater than εm, where m denotes the number of edges). Both works proved an upper bound which is polylogarithmic in n, and a lower bound was shown in [35]. Moreover, they proved that any graph property is testable on these families of graphs with a polylogarithmic number of queries.

2 The Algorithm for Testing Isomorphism in Dense Graphs

In this section, we describe and analyze the distributed algorithm for testing graph isomorphism in dense graphs. We begin with several useful definitions and observations, followed by the listing of Algorithm 1 and the proof of its correctness (which follows from Lemma 9 and Lemma 10). Finally, we discuss in more detail how the algorithm is implemented in the CONGEST model.

We establish the following theorem.

Theorem 1

There exists a distributed two-sided error property testing algorithm for testing isomorphism (of dense graphs) that runs in O(D + (ε^-1 log n)^2) rounds in the CONGEST model. The algorithm succeeds with high probability.

2.1 Definitions and Notation

We shall use the following definitions in our algorithm and in its analysis.

Let G be a graph and let Q be a sequence of nodes from V(G).

Definition 1 ([15])

For every node v ∈ V(G), the Q-label of v in G, denoted by L_{G,Q}(v), is a string of |Q| bits defined as follows:

(L_{G,Q}(v))_i = 1 if Q[i] ∈ Γ(v), and (L_{G,Q}(v))_i = 0 otherwise, for every i ∈ {1, …, |Q|},

where Γ(v) denotes the set of neighbors of v in G.

We use the Δ-operator to denote both the symmetric difference between two sets and, when applied to graphs, the Hamming distance between the corresponding adjacency matrices.

Definition 2 ([15])

For a graph G, a parameter ε, and a sequence Q, we say that Q is ε-separating if for every pair of nodes u, v ∈ V(G) whose neighborhoods differ on sufficiently many nodes (at least a constant fraction of εn, as in [15]) it holds that u and v have different Q-labels in G.

Definition 3 (inverse of L_{G,Q})

For a label ℓ, define L^{-1}_{G,Q}(ℓ) = {v ∈ V(G) : L_{G,Q}(v) = ℓ}. Namely, L^{-1}_{G,Q}(ℓ) is the set of nodes in G for which the Q-label is ℓ.

Let G_1 and G_2 be a pair of graphs such that |V(G_1)| = |V(G_2)|. The following definitions are with respect to a pair of sequences of nodes, Q_1 and Q_2, from G_1 and G_2, respectively.

We next define what we mean by saying that the mapping of a function σ : V(G_1) → V(G_2) is consistent w.r.t. the labels induced by Q_1 and Q_2.

Definition 4

For σ : V(G_1) → V(G_2) which is a bijection, we say that σ is (Q_1, Q_2)-label-consistent if the following holds:

  1. σ maps Q_1 to Q_2: σ(Q_1[i]) = Q_2[i] for every i ∈ {1, …, |Q_1|}.

  2. The label of a node and the label of its image are the same: L_{G_1,Q_1}(v) = L_{G_2,Q_2}(σ(v)) for every v ∈ V(G_1).

For σ and a sequence Q_1, we define σ(Q_1) to denote the sequence (σ(Q_1[1]), …, σ(Q_1[|Q_1|])), and σ(G_1) to denote the graph whose node set is V(G_2) and whose edge set is {(σ(u), σ(v)) : (u, v) ∈ E(G_1)}.

We next observe that if G_1 and G_2 are isomorphic, then for any sequence Q_1 and any function σ which is an isomorphism between G_1 and G_2, σ is consistent w.r.t. Q_1 and σ(Q_1).

Observation 1

If G_1 and G_2 are isomorphic and σ is an isomorphism from G_1 to G_2, then for every sequence of nodes, Q_1, from G_1, σ is (Q_1, σ(Q_1))-label-consistent. In particular, L_{G_1,Q_1}(v) = L_{G_2,σ(Q_1)}(σ(v)) for every v ∈ V(G_1).

If σ is not an isomorphism, then it might be the case that it is not consistent w.r.t. Q_1 and Q_2. We next define a weaker notion of consistency, that of being maximally-label-consistent.

Definition 5

We say that a function σ : V(G_1) → V(G_2) is maximally (Q_1, Q_2)-label-consistent if the following holds:

  1. σ is a bijection.

  2. σ maps Q_1 to Q_2: σ(Q_1[i]) = Q_2[i] for every i ∈ {1, …, |Q_1|}.

  3. For every label ℓ such that |L^{-1}_{G_1,Q_1}(ℓ)| = |L^{-1}_{G_2,Q_2}(ℓ)|, σ maps the elements of L^{-1}_{G_1,Q_1}(ℓ) to the elements of L^{-1}_{G_2,Q_2}(ℓ).

See Figure 1 for an illustration of the definitions in this section.

Figure 1: An illustration of the definitions in this section. The unknown and the known graphs are depicted in Sub-figures (a) and (b), respectively. The sequence consists of two nodes; hence, the label of a node is a binary string of two bits, where the i-th bit indicates whether the node is a neighbor of the i-th node of the sequence (superscripts and subscripts are omitted in the figure as they are clear from the context). The induced labels are depicted next to the node numbers. Note that all the nodes have a neighbor in the sequence; hence the all-zero label class is empty. In Sub-figure (c) two tables are depicted: (1) the label assignment to the nodes of the unknown graph, and (2) the label assignment to the nodes of the known graph. The order in which the nodes are presented is according to the (single) isomorphism between the two graphs. One can observe that the number of nodes in each pair of corresponding label classes (buckets) is the same, as is the number of nodes in buckets that share the same most significant bit (a cluster of labels). A random bijection that preserves the buckets of labels is depicted by the arrows. We show in Thm. 2 that such a bijection yields graphs which are close to being isomorphic.

2.2 Distributed Algorithm Description

The listing of the distributed algorithm appears in Algorithm 1. The detailed description of the distributed implementation of Algorithm 1 appears in Section 2.4.

Input: A “known” graph, G_K, that is given as an input to a single node (which may be chosen adversarially); denote this node by r.
Output: With high probability, all nodes output yes if G_U is isomorphic to G_K and no otherwise.

1. Compute a BFS tree, T, in G_U rooted at r.

2. Pick, u.a.r., a sequence of nodes in G_U; let Q denote this sequence. Each node computes its Q-label according to Q and its neighbors in G_U (see Definition 1) and sends this label to its neighbors.

3. The node r picks a set of pairs of nodes, S, u.a.r. from V(G_U) × V(G_U) (the potential edges). For every (u, v) ∈ S, r sends (u, v) down the BFS tree and learns whether (u, v) ∈ E(G_U) or not. Similarly, for every node u appearing in S or in Q, it learns the Q-label of u in G_U.

4. For each sequence, P, of nodes from V(G_K), r proceeds as follows:

  (a) For every i, verify that the Q-label of Q[i] in G_U equals the P-label of P[i] in G_K. If not, then reject P as a candidate and proceed to the next sequence.

  (b) For every label ℓ of a node appearing in S, let n_ℓ denote |L^{-1}_{G_U,Q}(ℓ)|. Check whether n_ℓ = |L^{-1}_{G_K,P}(ℓ)|. If not, then reject P as a candidate and proceed to the next sequence.

  (c) Pick uniformly at random a function σ from the set of all functions that are maximally (Q, P)-label-consistent (see Definition 5).

  (d) Compute the number of pairs in S on which σ(G_U) and G_K disagree, that is, the number of pairs (u, v) ∈ S such that (u, v) ∈ E(G_U) and (σ(u), σ(v)) ∉ E(G_K), or (u, v) ∉ E(G_U) and (σ(u), σ(v)) ∈ E(G_K). If this number is at most an appropriate constant fraction of ε|S|, then return yes.

5. If all sequences, P, failed to pass the previous step, then return no.

Algorithm 1: Testing isomorphism. The distributed network is G_U.

2.3 Correctness of the Distributed Testing Algorithm

In this subsection we prove the correctness of our algorithm. We begin with a couple of claims and lemmas that we use in our proof. Missing proofs are deferred to Appendix C.

The proof of the following claim appears in [15]. The proof of Lemma 7 can be derived from the proof of Lemma 4.11 in [15] (for the sake of completeness we provide both proofs in the appendix).

Claim 6 ([15])

For every ε, a sequence, Q, of nodes chosen uniformly at random (of the length used by the algorithm) is ε-separating with high probability.

Lemma 7 ([15])

Let G_1 and G_2 be isomorphic graphs and let φ be an isomorphism between them. For any Q that is an ε-separating sequence of nodes of G_1 and for any σ that is (Q, φ(Q))-label-consistent, it holds that Δ(σ(G_1), G_2) ≤ εn^2/2.

The following claim follows directly from the multiplicative Chernoff bound (see Theorem 9 in Section A).

Claim 8

Let G_1 and G_2 be two graphs over the same set of n nodes. Then, by querying the adjacency matrices of G_1 and G_2 in s random entries, it is possible to distinguish the case that Δ(G_1, G_2) ≤ εn^2/2 from the case that Δ(G_1, G_2) ≥ εn^2 with probability at least 1 − exp(−Ω(εs)).
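
A centralized sketch of the sampling argument behind Claim 8 (the acceptance threshold 3ε/4 and the sample size are illustrative choices for which a multiplicative Chernoff bound separates the two cases; they are not taken from the paper):

import random

def looks_close(A, B, eps, samples, rng=random):
    # A and B are n x n 0/1 adjacency matrices (lists of lists).
    # Sample `samples` uniformly random entries and compare; accepting when the
    # empirical fraction of disagreements is below 3*eps/4 separates, w.h.p.,
    # Delta(A, B) <= eps*n^2/2 from Delta(A, B) >= eps*n^2.
    n = len(A)
    disagreements = 0
    for _ in range(samples):
        i, j = rng.randrange(n), rng.randrange(n)
        disagreements += int(A[i][j] != B[i][j])
    return disagreements <= (3 * eps / 4) * samples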

Lemma 9

If G_U is isomorphic to G_K, then Algorithm 1 accepts with high probability.

Proof:   Assume G_U is isomorphic to G_K and let φ denote an isomorphism from G_U to G_K. Since the algorithm goes over every sequence of nodes from V(G_K), it also checks φ(Q). By Observation 1, the probability that φ(Q) passes Steps 4(a)–4(b) is 1. Moreover, since the graphs are isomorphic, every label class has the same size in both graphs, and hence the function σ picked in Step 4(c), which is maximally (Q, φ(Q))-label-consistent, is in fact (Q, φ(Q))-label-consistent. By Claim 6, with high probability, Q is ε-separating (see Definition 2). By Lemma 7, if Q is ε-separating, then Δ(σ(G_U), G_K) ≤ εn^2/2. If this is the case, then by Claim 8, with high probability σ passes Step 4(d). Therefore, by the union bound, the algorithm accepts with high probability.

Lemma 10

If G_U is ε-far from being isomorphic to G_K, then Algorithm 1 rejects with high probability.

Proof:   Assume G_U is ε-far from being isomorphic to G_K. We claim that, with high probability, every sequence P fails to pass Step 4(d). We show this by bounding the probability that a fixed P passes Step 4(d) and then applying the union bound over all possible sequences. Fix a sequence P and assume that it passes Steps 4(a)–4(b) (otherwise we are done). Let σ be the corresponding function from Step 4(c) (which is chosen at random). Since G_U is ε-far from being isomorphic to G_K, by definition, Δ(σ(G_U), G_K) > εm. Recall that σ is chosen uniformly at random from the set of functions that are maximally (Q, P)-label-consistent. It is not hard to verify that σ and S are independent random variables. Therefore we can apply Claim 8 to Step 4(d), as S is a set of potential edges chosen uniformly at random and, in particular, independently of σ. By Claim 8 (and the choice of |S|), σ passes Step 4(d) with probability at most n^{-c|Q|}, for any absolute constant c. Thus, the lemma follows by a union bound over all possible sequences, as their number is bounded by n^{|Q|}.

2.4 A Detailed Description of the Distributed Implementation of Algorithm 1

In this section we provide a detailed description of the distributed implementation of our algorithm. We focus on the steps for which the implementation is not straightforward and analyze their round complexity. In particular, we focus on Step 2 (selecting the nodes of Q uniformly at random), Step 4(b) (computing the label-class sizes), and Steps 4(c)–4(d) (accessing σ).

Step 2: Selecting the Nodes of Q Uniformly at Random.

We propose the following simple procedure to select |Q| nodes uniformly at random (which is a kind of folklore). Each node selects a random number in [n^c], where c is an absolute constant. For a fixed pair of nodes, the probability that both nodes pick the same number is at most n^{-c}. Therefore, by a union bound over all pairs, with probability at least 1 − n^{2-c}, all selected numbers are distinct. Conditioned on this event, the nodes with the |Q| highest numbers are distributed uniformly at random. Each node sends its ID and its selected number up the BFS tree, and the messages are forwarded up the tree in a manner that prioritizes messages whose number is higher. Therefore, the root receives the |Q| highest numbers (along with the IDs of the corresponding nodes) in O(D + |Q|) rounds. To see this, observe that the message with the highest number is never delayed and, in general, the message with the i-th highest number may be delayed for at most i − 1 rounds.
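
The selection step can be summarized by the following centralized simulation (an illustration of the argument rather than the message-passing code; the constant c = 4 is an arbitrary assumption):

import random

def select_uniform_nodes(nodes, k, c=4, rng=random):
    # Each node draws a number in [n^c]; conditioned on all draws being
    # distinct (probability at least 1 - n^(2-c)), the owners of the k
    # largest draws form a uniformly random k-subset of the nodes.
    n = len(nodes)
    draws = {v: rng.randrange(n ** c) for v in nodes}
    if len(set(draws.values())) < n:
        return None  # low-probability collision event; retry
    return sorted(nodes, key=lambda v: draws[v], reverse=True)[:k]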

Step 4(b): Computing the Label-Class Sizes.

Clearly, the root can compute |L^{-1}_{G_K,P}(ℓ)| for any label ℓ, as it knows G_K and P. Therefore, in order to describe the implementation of Step 4(b), it suffices to explain how the root can obtain |L^{-1}_{G_U,Q}(ℓ)|.

We begin with the special case of the all-zero label. The nodes in G_U that have this label are the nodes that are not adjacent to any of the nodes in Q. Their number can be computed in O(D) rounds by summing it up the BFS tree as follows. Assume w.l.o.g. that every node knows its layer in the BFS tree. In the first round, every node that is in the last layer (which is also a leaf) sends 1 or 0 to its parent, according to whether or not its Q-label is the all-zero label. In the next round, all nodes in the next layer sum up the received numbers and add 1 if their own Q-label is the all-zero label. They send this number up to their parents, and so on, until we reach the root.

Consider a label ℓ for which at least one bit is 1. Let i denote the maximum index such that the i-th bit of ℓ is 1. Since the node Q[i] is connected to all nodes whose Q-label is ℓ, it can compute their total number (recall that in Step 2 every node sends its Q-label to all its neighbors) and send it to the root. Therefore, by a pipelining argument, the root can obtain |L^{-1}_{G_U,Q}(ℓ)| for every ℓ which is a Q-label of a node appearing in S in O(D + |S|) rounds.
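
The counting step can be summarized as follows (a centralized sketch; in the network each count is forwarded to the root along the BFS tree). The coordinator of a label is the sequence node indexed by the label's most significant 1-bit, and it can count its class because all class members are its neighbors:

from collections import Counter

def class_sizes_via_coordinators(labels, seq):
    # labels: node -> label (a tuple of bits w.r.t. seq).
    # Returns, per coordinator seq[i], the sizes of the label classes whose
    # most significant 1-bit is i; the all-zero class is counted separately
    # (in the network, by summing up the BFS tree).
    per_coord = {q: Counter() for q in seq}
    zero_count = 0
    for v, lab in labels.items():
        ones = [i for i, b in enumerate(lab) if b == 1]
        if not ones:
            zero_count += 1
        else:
            per_coord[seq[max(ones)]][lab] += 1
    return per_coord, zero_count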

Steps 4(c)–4(d): Accessing σ.

Recall that we require σ to be chosen u.a.r. from the set of all functions that are maximally (Q, P)-label-consistent (see Definition 5). Recall also that σ is evaluated only on nodes appearing in S, but at the same time its selection has to be independent of the choice of S. To this end, the root verifies the following:

1. In Step 4(a) it verifies that, for the selected sequences Q and P, corresponding nodes have matching labels; namely, the Q-label of Q[i] in G_U equals the P-label of P[i] in G_K for every i.

2. In Step 4(b) it verifies that |L^{-1}_{G_U,Q}(ℓ)| = |L^{-1}_{G_K,P}(ℓ)| for every ℓ which is a Q-label of a vertex appearing in S.

If both conditions hold, then it follows that by mapping the nodes in L^{-1}_{G_U,Q}(ℓ) to the nodes in L^{-1}_{G_K,P}(ℓ) u.a.r., independently of the mapping of all other nodes (except for the mapping of Q to P, which is already determined), for every ℓ which is a Q-label of a vertex in S, we are in fact accessing a function σ which is drawn according to the desired distribution. Therefore, the root simply maps every queried node u to a uniformly random node w such that: (1) the label of w (w.r.t. P) equals the label of u (w.r.t. Q), and (2) w is still unmapped (such a node always exists). Since the root knows G_K, P, and the labels of the endpoints of the pairs in S, it is able to determine, for every (u, v) ∈ S, whether (σ(u), σ(v)) ∈ E(G_K), as desired.

3 Computing an Approximated Isomorphism

In this section we prove the following theorem.

Theorem 2

Let G_U denote the input graph and let G_K be a graph which is isomorphic to G_U and is given as an input to all nodes in the network. There exists a randomized algorithm such that each node v in G_U outputs σ′(v), where σ′ is a bijection such that σ′(G_U) is ε-close to G_K. The round complexity of the algorithm is O(D + (ε^-1 log n)^2). The algorithm succeeds with high probability.

Proof:   The first step of the algorithm is to run Algorithm 1, with the only difference that the root performs an additional verification on the label-class sizes in Step 4(b). Since G_U and G_K are isomorphic, by Lemma 9, w.h.p. the algorithm accepts and hence finds a sequence P and a corresponding σ that pass all the checks. Recall that w.h.p. σ(G_U) is (ε/2)-close to G_K. If every node v could output σ(v), then we would be done. However, we cannot compute σ(v) for every node, because for a constant fraction of the nodes its computation might require global information on G_U. Instead, our goal is to output a bijection σ′ which agrees with σ on most nodes and can be computed for every node without causing too much congestion. We next describe σ′ and its computation.

We begin with some notation. For every index i, let Λ_i denote the set of labels whose most significant 1-bit is the i-th bit. For a label ℓ, let U_ℓ denote the set of nodes in G_U whose Q-label is ℓ, and let K_ℓ denote the set of nodes in G_K whose P-label is ℓ. For a graph and a sequence, the i-th cluster of the graph is the set of all nodes whose label belongs to Λ_i. For every i, let d_i denote the difference between the sizes of the i-th clusters of G_U and of G_K (w.r.t. Q and P, respectively).

We next define the set of reserved nodes of G_K, denoted by R. For each i such that d_i < 0 (i.e., the i-th cluster of G_K is larger), |d_i| nodes from the i-th cluster of G_K belong to R. Specifically, these are the nodes of the cluster whose order is the least (we assume that there is a total order on V(G_K) which is known to all the nodes in G_U). We take this order to be the usual order on V(G_K), except that the elements of P have the highest order (this is to ensure that none of the elements of P belongs to R).

We are now ready to describe σ′. Let v ∈ V(G_U) and let i be such that v belongs to the i-th cluster of G_U. We assume here that v has at least one neighbor in Q (i.e., its label is not the all-zero label); the mapping of the nodes with the all-zero label is explained separately below. We consider the following cases.

The first case is when d_i = 0. We have the following sub-cases.

1. For every label ℓ ∈ Λ_i such that |U_ℓ| = |K_ℓ|, σ′ matches u.a.r. the elements of U_ℓ to the elements of K_ℓ.

2. The rest of the elements in the i-th cluster of G_U are matched u.a.r. to the unmatched elements in the i-th cluster of G_K.

Therefore, in this case the elements of the i-th cluster of G_U are matched only to the elements of the i-th cluster of G_K, and vice versa.

The second case is when d_i < 0, namely, the i-th cluster of G_K is larger. We have the following sub-cases.

1. For every label ℓ ∈ Λ_i such that |U_ℓ| ≤ |K_ℓ \ R|, σ′ matches the elements of U_ℓ u.a.r. to elements of K_ℓ \ R.

2. For every label ℓ ∈ Λ_i such that |U_ℓ| > |K_ℓ \ R|, σ′ matches u.a.r. a random set of |K_ℓ \ R| elements from U_ℓ to the elements of K_ℓ \ R.

3. The rest of the unmatched elements in the i-th cluster of G_U are mapped u.a.r. to the unmatched, non-reserved elements in the i-th cluster of G_K.

Observe that all the elements of the i-th cluster of G_U are matched to elements of the i-th cluster of G_K, and that the elements that belong to R are still unmatched.

The third case is when d_i > 0, namely, the i-th cluster of G_U is larger. We have the following sub-cases.

1. For every label ℓ ∈ Λ_i such that |U_ℓ| ≤ |K_ℓ|, σ′ matches the elements of U_ℓ u.a.r. to elements of K_ℓ.

2. The rest of the elements in the i-th cluster of G_U, up to the size of the i-th cluster of G_K, are matched u.a.r. to the unmatched elements in the i-th cluster of G_K.

3. The remaining d_i elements of the i-th cluster of G_U are matched to consecutive nodes of R according to the total order on R, where the range of positions assigned to the i-th cluster is determined by the overflows of the preceding clusters (namely, positions Σ_{j<i, d_j>0} d_j + 1 through Σ_{j≤i, d_j>0} d_j).

This concludes the description of σ′ for nodes that have at least one neighbor in Q. Before we explain how the nodes with the all-zero label are matched, we first describe how σ′ can be computed distributively for nodes that have at least one neighbor in Q. Each node Q[i] is responsible for computing the restriction of σ′ to the i-th cluster of G_U and for sending to each node v whose Q-label is in Λ_i the value σ′(v) (note that Q[i] and v are necessarily neighbors). As a preliminary step, each node Q[i] computes the size of the i-th cluster of G_U (it knows the labels of all its neighbors) and sends it up the BFS tree. The root can then compute d_i for every i, as it can compute the sizes of the clusters of G_K from G_K and P. The root sends the set of differences {d_i} down the BFS tree. By knowing G_K, P and this set, every node Q[i] can easily compute R and the information needed for the matching of its cluster. It is not hard to see that this suffices in order to match the elements of the clusters of G_U to the clusters of G_K (and to R) as described above.

We next describe the matching of the nodes with the all-zero label, whose set we denote by B, and explain how it is computed distributively. We aim to assign to each node in B a unique index in {1, …, |B|}. This way, each node in B can match itself to a node in the all-zero label class of G_K (recall that all nodes know G_K and the total order on V(G_K)). To this end, we use the BFS tree as follows. Each node in the BFS tree computes how many nodes in its subtree are in B. This can be done in O(D) rounds as follows. We assume w.l.o.g. that each node knows its layer in the BFS tree, and let L denote the number of layers. We proceed in L rounds. In the first round, every node which is in the L-th layer sends to its parent the message 1 if it belongs to B and 0 otherwise. In the next round, all the nodes in layer L − 1 sum up the messages they received and add 1 if they belong to B. Then they send the result to their parents, and so on, until we reach the root. Now the root partitions the interval {1, …, |B|} into consecutive sub-intervals and assigns these sub-intervals to its children. Each child receives an interval whose size equals the number of nodes in its subtree that are in B. In a similar manner, these sub-intervals are partitioned recursively down the tree until each node in B is assigned a unique number in {1, …, |B|}, as desired.
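
The ranking of the nodes of B can be illustrated by the following recursion on a rooted tree (a centralized sketch with hypothetical inputs: tree maps each node to its children and in_B marks membership in B; in the network the counts travel up the BFS tree and the intervals travel down it):

def subtree_counts(tree, root, in_B, counts=None):
    # counts[v] = number of nodes of B in the subtree rooted at v.
    if counts is None:
        counts = {}
    counts[root] = int(in_B[root]) + sum(
        subtree_counts(tree, c, in_B, counts)[c] for c in tree.get(root, []))
    return counts

def assign_indices(tree, root, in_B, counts, start, out=None):
    # Give each node of B in this subtree a unique index in
    # {start, ..., start + counts[root] - 1}: the subtree root (if in B) takes
    # the first index, and the children receive consecutive sub-intervals
    # whose sizes equal their subtree counts.
    if out is None:
        out = {}
    nxt = start
    if in_B[root]:
        out[root] = nxt
        nxt += 1
    for c in tree.get(root, []):
        assign_indices(tree, c, in_B, counts, nxt, out)
        nxt += counts[c]
    return out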

By construction it follows that σ′ is a bijection. The bound on the round complexity follows from the bound on the round complexity of Algorithm 1 and the fact that there are only |Q| clusters. It remains to prove the following claim.

Claim 11

With high probability, σ′(G_U) is ε-close to G_K.

Proof:   We observe that both σ and σ′ are random variables. To prove the claim about σ′ we couple σ′ to σ. From the fact that w.h.p. σ(G_U) is (ε/2)-close to G_K, in combination with the coupling, it will follow that w.h.p. σ′(G_U) is ε-close to G_K. Therefore, by setting the proximity parameter of Algorithm 1 appropriately, the claim will follow.

Consider the following description of σ′ in terms of σ. Let v ∈ V(G_U). For every such v, we set σ′(v) = σ(v), unless σ(v) does not have the same label as v (w.r.t. Q and P) or σ(v) belongs to R. In these cases we match v to a node in G_K as described above. Observe that under this formulation the distribution of σ′ remains the same. The only difference is that now, for the sake of the analysis, it is coupled to the distribution of σ.

It follows that the number of nodes for which σ′(v) ≠ σ(v) is bounded by the total discrepancy between the sizes of corresponding label classes, plus the number of reserved nodes. By the verification performed in Step 4(b) of Algorithm 1, with high probability this discrepancy is small (since otherwise the algorithm would reject w.h.p.). On the other hand, each node on which σ′ and σ disagree affects at most 2n entries of the corresponding adjacency matrices. Therefore, w.h.p., the total contribution of these nodes to Δ(σ′(G_U), σ(G_U)) is at most εn^2/2. Thus, w.h.p., Δ(σ′(G_U), σ(G_U)) ≤ εn^2/2, as desired. Since w.h.p. Δ(σ(G_U), G_K) ≤ εn^2/2 as well, we obtain by the union bound that w.h.p. Δ(σ′(G_U), G_K) ≤ εn^2, namely σ′(G_U) is ε-close to G_K. By setting the proximity parameter appropriately we obtain the desired result.

This concludes the proof of the Theorem.

4 Distributed Algorithm for Deciding Isomorphism

In this section we prove the following theorem.

Theorem 3

There exists a randomized distributed algorithm that decides whether G_U and G_K are isomorphic with high probability. The round complexity of the algorithm is O(n).

Proof:   The idea of the algorithm is to go over all possible one-to-one mappings between the nodes of G_K and the nodes of G_U and to test for equality of the corresponding adjacency matrices. The test for equality is performed with a very high confidence level in order to ensure that the total error probability is bounded by a small constant. We note that a similar reduction to testing equality also appears in the algorithm of [2, Sec. 6.2] for the Identical Subgraph Detection problem (ISDP).

The first step of our algorithm is to construct a BFS tree and to assign to each node in the network a unique label in [n], where n denotes the number of nodes. This step requires O(D) rounds, where D denotes the diameter of G_U (see details on the implementation of this step in the proof of Theorem 2). Consider the adjacency matrix of G_U, denoted A_U, in which the rows are sorted according to the labels assigned to the nodes. We consider the natural total order on the set of pairs of nodes {(i, j) : 1 ≤ i < j ≤ n}, in which the pairs are sorted according to the first element and ties are broken according to the second element. Let t(i, j) denote the order of the pair (i, j). Each pair corresponds to the potential edge between the pair of nodes with labels i and j, respectively. The matrix A_U can be represented as an integer x_U, where for each pair (i, j) the t(i, j)-th lsb (least significant bit) of the binary representation of x_U indicates whether (i, j) is an edge in the graph. Observe that we could calculate x_U in O(D) rounds if the message size were unbounded, by starting the calculation at the lowest layer of the BFS tree and summing up the outcomes as we go up the tree, layer by layer. The calculation is performed such that each power of two, 2^{t(i,j)}, is added (once) if and only if the corresponding edge is present in the graph (i.e., 2^{t(i,j)} is added if and only if the edge (i, j) is present in the graph).

Since the representation of x_U requires Θ(n^2) bits, our goal is to calculate x_U mod p instead, for suitably chosen primes p. To this end we proceed in the same manner as mentioned above, only that before the nodes send the outcomes of the intermediate sums up the tree, they reduce the outcome modulo p. Let P be a multiset of Θ(n) prime numbers, each chosen independently and uniformly at random from the first Θ(n^3) primes (it is well known that for a sufficiently large number N, the number of prime numbers that are at most N is Θ(N / ln N)). Let A_K denote the adjacency matrix of G_K and let x_K denote the corresponding integer. If A_U ≠ A_K, then x_U − x_K is a non-zero integer of absolute value smaller than 2^{n^2}, and hence has at most n^2 prime divisors; therefore the probability that x_U ≡ x_K (mod p) for a random prime number p in this range is at most 1/n [34]. Therefore the probability that x_U ≡ x_K (mod p) for every p ∈ P is at most n^{-Θ(n)}. Thus we can test, with one-sided error, whether A_U and A_K are equal; the soundness error of the equality test is n^{-Θ(n)}. We can apply the equality test for every mapping between the nodes of G_K and the nodes of G_U and return yes if and only if there exists a mapping for which the test accepts. Namely, we go over all possible permutations over the nodes of G_K and, for each permutation π, we perform the equality test between A_U and A_{π(K)}, where A_{π(K)} denotes the adjacency matrix of G_K after applying the permutation π on the nodes. By the soundness of the equality test and the union bound over the at most n! permutations, the probability that this test returns no when G_U and G_K are not isomorphic is at least 1 − 1/n (for an appropriate adjustment of the parameters).
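
The fingerprinting can be illustrated by the following centralized sketch (the number of primes and the prime range are assumptions consistent with the calculation above, not necessarily the paper's exact parameters; in the network, the partial sums are aggregated modulo p along the BFS tree rather than computed at one machine):

import random

def primes_up_to(limit):
    # Simple sieve of Eratosthenes.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, is_p in enumerate(sieve) if is_p]

def fingerprint(adj_matrix, p):
    # Encode the upper triangle of the adjacency matrix as the huge integer
    # x = sum of 2^t(i,j) over present edges, and return x mod p, computed
    # without ever materializing x.
    n = len(adj_matrix)
    fp, power = 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            if adj_matrix[i][j]:
                fp = (fp + power) % p
            power = (power * 2) % p
    return fp

def equality_test(A, B, num_primes, prime_pool, rng=random):
    # One-sided test: if A == B it always accepts; if A != B, each random
    # prime misses the difference only with small probability.
    primes = [rng.choice(prime_pool) for _ in range(num_primes)]
    return all(fingerprint(A, p) == fingerprint(B, p) for p in primes)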

By standard pipelining, it is possible to calculate x_U mod p for every p ∈ P in O(D + |P|) = O(n) rounds; therefore, the round complexity of the above test is O(n). To verify this, observe that in order to execute the above test, the only information the root needs is G_K and the value of x_U mod p for every p ∈ P. This concludes the proof.

We observe that the above algorithm can be adapted to the semi-streaming model in a straightforward way, as follows.

Theorem 4

There exists an algorithm in the semi-streaming model that receives a graph G_K over n nodes as an input, where the space for storing G_K is read-only memory, and a stream of the edges of another graph G_U (in an arbitrary order), and decides, with one-sided error, whether G_K and G_U are isomorphic or not. The algorithm performs one pass and uses Õ(n) bits of space.

Proof:   Assume w.l.o.g. that the labels of the nodes in G_U and G_K are taken from [n]. Otherwise we can rename them by using a table of O(n log n) bits. Let A_U denote the adjacency matrix of G_U. We first compute x_U mod p for every p ∈ P, as in the proof of Theorem 3, in one pass, using Õ(n) bits of space. We then go over all permutations of the nodes of G_K and perform the same computation for the corresponding adjacency matrix. We accept if and only if there exists a permutation π for which x_U ≡ x_{π(K)} (mod p) for every p ∈ P, where A_{π(K)} denotes the adjacency matrix of G_K after we permute the nodes according to π. Observe that we can go over the permutations one by one according to the lexicographic order by using O(n log n) bits of space. Therefore, the total space the algorithm uses is Õ(n) bits, as desired.

5 Lower Bounds

In this section we establish two lower bounds. The first is for the decision variant of GI, and the second is for the testing variant.

For the decision variant, we prove a near-quadratic lower bound for any deterministic distributed algorithm in the CONGEST model. The second lower bound states that any distributed isomorphism testing algorithm requires Ω(D) rounds. This lower bound holds also for randomized algorithms, and in fact holds in the LOCAL model (in which there is no limitation on the message size per round per edge; obviously, a lower bound in the LOCAL model also applies to the CONGEST model), even if all vertices are given the graph G_K as an input.

5.1 An Ω̃(n^2) Lower Bound for Deciding Isomorphism Deterministically

The decision variant of our isomorphism testing problem is as follows: if G_U and G_K are isomorphic, then all nodes should output yes, while if the graphs are not isomorphic, at least one node should output no.

Theorem 5

Any deterministic distributed algorithm in the CONGEST model for deciding whether G_U is isomorphic to G_K requires Ω̃(n^2) rounds.

Proof:   We reduce the problem of Set-Equality to the problem of deciding isomorphism. The reduction is as follows. Alice and Bob each receive as an input a subset of elements, and