1 Introduction
Finding a circuit of minimum size that computes a given Boolean function constitutes the overarching goal in nonuniform complexity theory. It defines an interesting computational problem in its own right, the complexity of which depends on the way the Boolean function is specified. A generic and natural, albeit verbose, way to specify a Boolean function is via its truth-table. The corresponding decision problem is known as the Minimum Circuit Size Problem (MCSP): Given a truth-table and a threshold $\theta$, does there exist a Boolean circuit of size at most $\theta$ that computes the Boolean function specified by the truth-table? The interest in MCSP dates back to the dawn of theoretical computer science [Tra84]. It continues today partly due to the fundamental nature of the problem, and partly because of the work on natural proofs and the connections between pseudorandomness and computational hardness.
A closely related problem from Kolmogorov complexity theory is the Minimum KT Problem (MKTP), which deals with compression in the form of efficient programs instead of circuits. Rather than asking if the input has a small circuit when interpreted as the truth-table of a Boolean function, MKTP asks if the input has a small program that produces each individual bit of the input quickly. To be more specific, let us fix a universal Turing machine $U$. We consider descriptions of the input string $x$ in the form of a program $d$ such that, for every bit position $i$, $U$ on input $d$ and $i$ outputs the $i$-th bit of $x$ in $T$ steps. The KT cost of such a description is defined as $|d| + T$, i.e., the bit-length of the program plus the running time. The KT complexity of $x$, denoted $\mathrm{KT}(x)$, is the minimum KT cost of a description of $x$. $\mathrm{KT}(x)$ is polynomially related to the circuit complexity of $x$ when viewed as a truth-table (see Section 2.1 for a more formal treatment). On input a string $x$ and an integer $\theta$, MKTP asks whether $\mathrm{KT}(x) \le \theta$.

Both MCSP and MKTP are in NP but are not known to be in P or NP-complete. As such, they are two prominent candidates for NP-intermediate status. Others include factoring integers, discrete log over prime fields, graph isomorphism (GI), and a number of similar isomorphism problems.
Whereas NP-complete problems all reduce one to another, even under fairly simple reductions, less is known about the relative difficulty of presumed NP-intermediate problems. Regarding MKTP and MCSP, factoring integers and discrete log over prime fields are known to reduce to both under randomized reductions with zero-sided error [ABK06, Rud17]. Recently, Allender and Das [AD14] showed that GI and all of SZK (Statistical Zero Knowledge) reduce to both under randomized reductions with bounded error.
Those reductions and, in fact, all prior reductions of supposedly-intractable problems to MKTP / MCSP proceed along the same well-trodden path. Namely, MKTP / MCSP is used as an efficient statistical test to distinguish random distributions from pseudorandom distributions, where the pseudorandom distribution arises from a hardness-based pseudorandom generator construction. In particular, [KC00] employs the construction based on the hardness of factoring Blum integers, [ABK06, AD14, AKRR10, Rud17] use the construction from [HILL99] based on the existence of one-way functions, and [ABK06, CIKK16] make use of the Nisan-Wigderson construction [NW94]. The property that MKTP / MCSP breaks the construction implies that the underlying hardness assumption fails relative to MKTP / MCSP, and thus that the supposedly hard problem reduces to MKTP / MCSP.
Contributions.
The main conceptual contribution of our paper is a fundamentally different way of constructing reductions to MKTP based on a novel use of known interactive proof systems. Our approach applies to GI and a broad class of isomorphism problems. A common framework for those isomorphism problems is another conceptual contribution. In terms of results, our new approach allows us to eliminate the errors in the recent reductions from GI to MKTP, and more generally to establish zero-sided error randomized reductions to MKTP from many isomorphism problems within our framework. These include Linear Code Equivalence, Matrix Subspace Conjugacy, and Permutation Group Conjugacy (see Section 6 for the definitions). The technical contributions mainly consist of encodings of isomorphism classes that are efficiently decodable and achieve compression that is at or near the information-theoretic optimum.
Before describing the underlying ideas, we note that our techniques remain of interest even in light of the recent quasi-polynomial-time algorithm for GI [Bab16]. For one, GI is still not known to be in P, and Group Isomorphism stands as a significant obstacle to this (as stated at the end of [Bab16]). Moreover, our techniques also apply to the other isomorphism problems mentioned above, for which the current best algorithms are still exponential.
Let us also provide some evidence that our approach for constructing reductions to MKTP differs in an important way from the existing ones. We claim that the existing approach can only yield zero-sided error reductions to MKTP from problems that are in NP ∩ coNP, a class which neither GI nor, a fortiori, any of the other isomorphism problems mentioned above is known to reside in. The reason for the claim is that the underlying hardness assumptions are fundamentally average-case (in some settings worst-case to average-case reductions are known, but these reductions are themselves randomized with two-sided error), which implies that the reduction can have both false positives and false negatives. For example, in the papers employing the construction from [HILL99], MKTP is used in a subroutine to invert a polynomial-time-computable function (see Lemma 2.1 in Section 2.1), and the subroutine may fail to find an inverse. Given a reliable but imperfect subroutine, the traditional way to eliminate false positives is to use the subroutine for constructing an efficiently verifiable membership witness, and only accept after verifying its validity. As such, the existence of a traditional reduction without false positives from a language $L$ to MKTP implies that $L \in$ NP. Similarly, a traditional reduction from $L$ to MKTP without false negatives is only possible if $L \in$ coNP, and zero-sided error is only possible if $L \in$ NP ∩ coNP.
Main Idea.
Instead of using the oracle for MKTP in the construction of a candidate witness and then verifying the validity of the candidate without the oracle, we use the power of the oracle in the verification process. This obviates the need for the language $L$ to be in NP ∩ coNP in the case of reductions with zero-sided error.
Let us explain how to implement this idea for $L =$ GI. Recall that an instance of GI consists of a pair $(G_0, G_1)$ of graphs on the vertex set $[n]$, and the question is whether $G_0 \equiv G_1$, i.e., whether there exists a permutation $\pi \in S_n$ such that $G_1 = \pi(G_0)$, where $\pi(G_0)$ denotes the result of applying the permutation $\pi$ to the vertices of $G_0$. In order to develop a zero-sided error algorithm for GI, it suffices to develop one without false negatives. This is because the false positives can subsequently be eliminated using the known search-to-decision reduction for GI [KST93].
The crux for obtaining a reduction without false negatives from GI to MKTP is a witness system for the complement $\overline{\mathrm{GI}}$ inspired by the well-known two-round interactive proof system for $\overline{\mathrm{GI}}$ [GMW91]. Consider the distribution $R_G(\pi) \doteq \pi(G)$ where $\pi \in S_n$ is chosen uniformly at random. By the Orbit–Stabilizer Theorem, for any fixed $G$, $R_G$ is uniform over a set of size $N \doteq n!/|\mathrm{Aut}(G)|$ and thus has entropy $s = \log(N)$, where $\mathrm{Aut}(G) \doteq \{\pi \in S_n : \pi(G) = G\}$ denotes the set of automorphisms of $G$. For ease of exposition, let us assume that $|\mathrm{Aut}(G_0)| = |\mathrm{Aut}(G_1)|$ (which is actually the hardest case for GI), so both $R_{G_0}$ and $R_{G_1}$ have the same entropy $s$. Consider picking $r \in \{0,1\}$ uniformly at random, and setting $G = G_r$. If $(G_0, G_1) \in$ GI, the distributions $R_{G_0}$, $R_{G_1}$, and $R_G$ are all identical, and therefore $R_G$ also has entropy $s$. On the other hand, if $(G_0, G_1) \notin$ GI, the entropy of $R_G$ equals $s + 1$. The extra bit of information corresponds to the fact that in the nonisomorphic case each sample of $R_G$ reveals the value of $r$ that was used, whereas that bit gets lost in the reduction in the isomorphic case.
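The one-bit entropy gap can be checked by brute force on tiny graphs. The following is a minimal sketch, not part of the original construction: the helper names (`apply_perm`, `orbit_entropy`, `mixture_entropy`) and the example graphs are assumptions chosen for illustration, and the enumeration over all of $S_n$ is only feasible for very small $n$.

```python
from collections import Counter
from itertools import permutations
from math import log2


def apply_perm(perm, edges):
    """Relabel vertex v as perm[v]; return the graph as a frozenset of edges."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def orbit_entropy(n, edges):
    """Entropy of pi(G) for a uniformly random pi in S_n (brute force)."""
    return entropy(Counter(apply_perm(p, edges) for p in permutations(range(n))))


def mixture_entropy(n, g0, g1):
    """Entropy of pi(G_r) for uniform r in {0,1} and uniform pi in S_n."""
    counts = Counter()
    for g in (g0, g1):
        for p in permutations(range(n)):
            counts[apply_perm(p, g)] += 1
    return entropy(counts)


# Hypothetical 4-vertex examples, each with exactly two automorphisms.
path = [(0, 1), (1, 2), (2, 3)]           # path 0-1-2-3
path_relabeled = [(1, 0), (0, 2), (2, 3)]  # the same path, relabeled
paw = [(0, 1), (1, 2), (0, 2), (2, 3)]     # triangle with a pendant vertex
```

With these graphs, `orbit_entropy(4, path)` equals $\log 12$, the mixture of `path` and `path_relabeled` has the same entropy, and the mixture of `path` and `paw` has exactly one more bit.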
The difference in entropy suggests that a typical sample of $R_G$ can be compressed more in the isomorphic case than in the nonisomorphic case. If we can compute some threshold such that $\mathrm{KT}(R_G)$ never exceeds the threshold in the isomorphic case, and exceeds it with nonnegligible probability in the nonisomorphic case, we have the witness system for $\overline{\mathrm{GI}}$ that we aimed for: Take a sample from $R_G$, and use the oracle for MKTP to check that it cannot be compressed at or below the threshold. The entropy difference of 1 may be too small to discern, but we can amplify the difference by taking multiple samples and concatenating them. Thus, we end up with a randomized mapping reduction of the following form, where $t$ denotes the number of samples and $\hat\theta$ the threshold:

Pick $r \doteq r_1 \cdots r_t \in \{0,1\}^t$ and $\pi_i \in S_n$ for $i \in [t]$, uniformly at random.
Output $(y, \hat\theta)$, where $y \doteq y_1 \cdots y_t$ and $y_i \doteq \pi_i(G_{r_i})$.   (1)
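The sampling step of the reduction can be sketched as follows. This is an illustrative sketch only: the function names, the adjacency-matrix representation as nested lists, and the `rng` parameter are assumptions, and the threshold is passed in rather than computed, since choosing it is the subject of the subsequent discussion.

```python
import random


def permute_adjacency(adj, perm):
    """Return the adjacency matrix of the graph with vertex v relabeled perm[v]."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[perm[u]][perm[v]] = adj[u][v]
    return out


def sample_instance(g0, g1, t, threshold, rng=random):
    """Concatenate t randomly permuted copies of randomly chosen input graphs.

    Returns the pair (y, threshold) that would be handed to the oracle.
    """
    n = len(g0)
    pieces = []
    for _ in range(t):
        r = rng.randrange(2)          # which input graph to use
        perm = list(range(n))
        rng.shuffle(perm)             # uniformly random permutation of [n]
        h = permute_adjacency((g0, g1)[r], perm)
        pieces.append("".join(str(bit) for row in h for bit in row))
    return "".join(pieces), threshold
```

Each block of $n^2$ bits in the output is the adjacency matrix of one permuted graph; an oracle for the compression problem would then be asked whether the whole string can be described at cost at most the threshold.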
We need to analyze how to set the threshold $\hat\theta$ and argue correctness for a value of $t$ that is polynomially bounded. In order to do so, let us first consider the case where the graphs $G_0$ and $G_1$ are rigid, i.e., they have no nontrivial automorphisms, or equivalently, $s = \log(n!)$.

• If $G_0 \not\equiv G_1$, the string $y$ contains all of the information about the random string $r$ and the $t$ random permutations $\pi_1, \ldots, \pi_t$, which amounts to $ts + t = t(s+1)$ bits of information. This implies that $y$ has KT-complexity close to $t(s+1)$ with high probability.

• If $G_0 \equiv G_1$, then we can efficiently produce each bit of $y$ from the adjacency matrix representation of $G_0$ ($n^2$ bits) and the function table of permutations $\tau_i \in S_n$ (for $i \in [t]$) such that $y_i \doteq \pi_i(G_{r_i}) = \tau_i(G_0)$. Moreover, the set of all permutations $S_n$ allows an efficiently decodable indexing, i.e., there exists an efficient algorithm that takes an index $k \in [n!]$ and outputs the function table of the $k$-th permutation in $S_n$ according to some ordering. An example of such an indexing is the Lehmer code (see, e.g., [Knu73, pp. 12-13] for specifics). This shows that

$$\mathrm{KT}(y) \le t\lceil s \rceil + (n + \log t)^c \qquad (2)$$

for some constant $c$, where the first term represents the cost of the $t$ indices of $\lceil s \rceil$ bits each, and the second term represents the cost of the $n^2$ bits for the adjacency matrix of $G_0$ and the polynomial running time of the decoding process.
If we ignore the difference between $s$ and $\lceil s \rceil$, the right-hand side of (2) becomes $ts + (n + \log t)^c$, which is closer to $ts$ than to $t(s+1)$ for $t$ any sufficiently large polynomial in $n$, say $t = n^{c+1}$. Thus, setting $\hat\theta$ halfway between $ts$ and $t(s+1)$, i.e., $\hat\theta \doteq t(s + \frac{1}{2})$, ensures that $\mathrm{KT}(y) > \hat\theta$ holds with high probability if $G_0 \not\equiv G_1$, and never holds if $G_0 \equiv G_1$. This yields the desired randomized mapping reduction without false negatives, modulo the rounding issue of $s$ to $\lceil s \rceil$. The latter can be handled by aggregating the permutations into blocks so as to make the amortized cost of rounding negligible. The details are captured in the Blocking Lemma of Section 3.1.
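The choice of a halfway threshold can be spelled out as a short calculation. The symbols below are assumptions introduced for this sketch: $t$ is the number of samples, $s$ the per-sample entropy in the isomorphic case, $\hat\theta = t(s+\tfrac12)$ the threshold, and $E$ an overhead term that is polynomial in $n$ and $\log t$ (rounding is ignored, as in the surrounding discussion).

```latex
\begin{align*}
\text{isomorphic case:}\quad
  & ts + E \le \hat\theta = t\bigl(s + \tfrac12\bigr)
    \iff E \le \tfrac{t}{2}, \\
\text{nonisomorphic case:}\quad
  & t(s+1) - \hat\theta = \tfrac{t}{2}.
\end{align*}
% If E \le (n + \log t)^c and t = n^{c+1}, then for all sufficiently large n
% we have \log t = (c+1)\log n \le n, hence
%   E \le (2n)^c = 2^c n^c \le \tfrac12 n^{c+1} = \tfrac{t}{2}
% as soon as n \ge 2^{c+1}.
```

So in the isomorphic case the upper bound stays at or below the threshold once $t$ is a large enough polynomial, while in the nonisomorphic case the entropy exceeds the threshold by $t/2$ bits, a margin that grows with $t$.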
What changes in the case of nonrigid graphs? For ease of exposition, let us again assume that $|\mathrm{Aut}(G_0)| = |\mathrm{Aut}(G_1)|$. There are two complications:

(i) We no longer know how to efficiently compute the threshold $\hat\theta$ because $\hat\theta = t(s + \frac{1}{2})$ and $s = \log(N) = \log(n!/|\mathrm{Aut}(G_0)|)$ involves the size of the automorphism group.

(ii) The Lehmer code no longer provides sufficient compression in the isomorphic case as it requires $\log(n!)$ bits per permutation whereas we only have $s$ to spend, which could be considerably less than $\log(n!)$.
In order to resolve (ii) we develop an efficiently decodable indexing of cosets for any subgroup of $S_n$ given by a list of generators (see Lemma 3.2 in Section 3.2). In fact, our scheme even works for cosets of a subgroup within another subgroup of $S_n$, a generalization that may be of independent interest (see the corresponding lemma in the Appendix). Applying our scheme to $\mathrm{Aut}(G)$ and including a minimal list of generators for $\mathrm{Aut}(G)$ in the description of the program $d$ allows us to maintain (2).
Regarding (i), we can deduce a good approximation to the threshold with high probability by taking, for both choices of $r \in \{0,1\}$, a polynomial number of samples of $R_{G_r}$ and using the oracle for MKTP to compute the exact KT-complexity of their concatenation. This leads to a randomized reduction from GI to MKTP with bounded error (from which one without false positives follows as mentioned before), reproving the earlier result of [AD14] using our new approach (see Remark 3.2 in Section 3.2 for more details).
In order to avoid false negatives, we need to improve the above approximation algorithm such that it never produces a value that is too small, while maintaining efficiency and the property that it outputs a good approximation with high probability. In order to do so, it suffices to develop a probably-correct overestimator for the quantity $n!/|\mathrm{Aut}(G)|$, i.e., a randomized algorithm that takes as input an $n$-vertex graph $G$, produces the correct quantity with high probability, and never produces a value that is too small; the algorithm should run in polynomial time with access to an oracle for MKTP. Equivalently, it suffices to develop a probably-correct underestimator of similar complexity for $|\mathrm{Aut}(G)|$.
The latter can be obtained from the known search-to-decision procedures for GI, and answering the oracle calls to GI using the above two-sided error reduction from GI to MKTP. There are a number of ways to implement this strategy; here is one that generalizes to a number of other isomorphism problems including Linear Code Equivalence.

1. Find a list of permutations that generates a subgroup of $\mathrm{Aut}(G)$ such that the subgroup equals $\mathrm{Aut}(G)$ with high probability.
Finding a list of generators for $\mathrm{Aut}(G)$ deterministically reduces to GI. Substituting the oracle for GI by a two-sided error algorithm yields a list of permutations that generates $\mathrm{Aut}(G)$ with high probability, and always produces a subgroup of $\mathrm{Aut}(G)$. The latter property follows from the inner workings of the reduction, or can be imposed explicitly by checking every permutation produced and dropping it if it does not map $G$ to itself. We use the above randomized reduction from GI to MKTP as the two-sided error algorithm for GI.

2. Compute the order of the subgroup generated by those permutations.
There is a known deterministic polynomial-time algorithm to do this [Ser03].
The resulting probably-correct underestimator for $|\mathrm{Aut}(G)|$ runs in polynomial time with access to an oracle for MKTP. Plugging it into our approach, we obtain a randomized reduction from GI to MKTP without false negatives. A reduction with zero-sided error follows as discussed earlier.
Before applying our approach to other isomorphism problems, let us point out the important role that the Orbit–Stabilizer Theorem plays. A randomized algorithm for finding generators for a graph's automorphism group yields a probably-correct underestimator for the size of the automorphism group, as well as a randomized algorithm for GI without false positives. The Orbit–Stabilizer Theorem allows us to turn a probably-correct underestimator for $|\mathrm{Aut}(G)|$ into a probably-correct overestimator for the size of the support of $R_G$, thereby switching the error from one side to the other, and allowing us to avoid false negatives instead of false positives.
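The two steps can be illustrated on small inputs: filter candidate permutations so that only genuine automorphisms remain, then compute the order of the subgroup they generate. The sketch below is an assumption-laden illustration: the function names are hypothetical, the candidate list stands in for the output of the randomized generator-finding step, and the order is computed by brute-force closure rather than by a polynomial-time algorithm such as those referenced via [Ser03].

```python
def is_automorphism(perm, edges):
    """Check that relabeling vertex v as perm[v] maps the edge set to itself."""
    edge_set = {frozenset(e) for e in edges}
    image = {frozenset(perm[v] for v in e) for e in edge_set}
    return image == edge_set


def compose(p, q):
    """Return the permutation v -> p[q[v]]."""
    return tuple(p[q[v]] for v in range(len(p)))


def subgroup_order(generators, n):
    """Order of the subgroup of S_n generated by `generators` (brute-force closure)."""
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(h, g)
            if gh not in seen:
                seen.add(gh)
                frontier.append(gh)
    return len(seen)


def underestimate_aut_order(candidates, edges, n):
    """Drop non-automorphisms, then return the order of what remains.

    The result never exceeds |Aut(G)| because every retained generator
    is a verified automorphism.
    """
    gens = [tuple(p) for p in candidates if is_automorphism(p, edges)]
    return subgroup_order(gens, n)
```

For the 4-cycle, passing a rotation, a reflection, and one bogus permutation yields 8 (the full automorphism group), while passing only the rotation yields 4, an underestimate but never an overestimate.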
General Framework.
Our approach extends to several other isomorphism problems. They can be cast in the following common framework, parameterized by an underlying family of group actions $(\Omega, H)$ where $H$ is a group that acts on the universe $\Omega$. We typically think of the elements of $\Omega$ as abstract objects, which need to be described in string format in order to be input to a computer; we let $\omega(x)$ denote the abstract object represented by the string $x$.
Definition (Isomorphism Problem).
An instance of an Isomorphism Problem consists of a pair $x = (x_0, x_1)$ that determines a universe $\Omega_x$ and a group $H_x$ that acts on $\Omega_x$ such that $\omega_0(x) \doteq \omega(x_0)$ and $\omega_1(x) \doteq \omega(x_1)$ belong to $\Omega_x$. Each $h \in H_x$ is identified with the permutation $h : \Omega_x \to \Omega_x$ induced by the action. The goal is to determine whether there exists $h \in H_x$ such that $h(\omega_0(x)) = \omega_1(x)$.
When it causes no confusion, we drop the argument $x$ and simply write $H$, $\Omega$, $\omega_0$, and $\omega_1$. We often blur the (sometimes pedantic) distinction between $x_i$ and $\omega_i$. For example, in GI, each $x_i$ is an $n \times n$ binary matrix (a string of length $n^2$), and represents the abstract object $\omega_i$ of a graph with $n$ labeled vertices; thus, in this case the correspondence between $x_i$ and $\omega_i$ is a bijection. The group $H$ is the symmetric group $S_n$, and the action is by permuting the labels.
Table 1 summarizes how the problems we mentioned earlier can be cast in the framework (see Section 6 for details about the last three).
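Problem                        Group $H$                Universe $\Omega$
Graph Isomorphism              $S_n$                    graphs with $n$ labeled vertices
Linear Code Equivalence        $S_n$                    subspaces of dimension $d$ in $\mathbb{F}_q^n$
Permutation Group Conjugacy    $S_n$                    subgroups of $S_n$
Matrix Subspace Conjugacy      $\mathrm{GL}_n(\mathbb{F}_q)$    subspaces of dimension $d$ in $\mathbb{F}_q^{n \times n}$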
We generalize our construction for GI to any Isomorphism Problem by replacing $R_G(\pi) \doteq \pi(G)$, where $\pi \in S_n$ is chosen uniformly at random, by $R_\omega(h) \doteq h(\omega)$, where $h \in H$ is chosen uniformly at random. The analysis that the construction yields a randomized reduction without false negatives from the Isomorphism Problem to MKTP carries over, provided that the Isomorphism Problem satisfies the following properties.

1. The underlying group $H$ is efficiently samplable, and the action $(\omega, h) \mapsto h(\omega)$ is efficiently computable. We need this property in order to make sure the reduction is efficient.

2. There is an efficiently computable normal form for representing elements of $\Omega$ as strings. This property trivially holds in the setting of GI as there is a unique adjacency matrix that represents any given graph on the vertex set $[n]$. However, uniqueness of representation need not hold in general. Consider, for example, Permutation Group Conjugacy. An instance of this problem abstractly consists of two permutation groups $\Gamma_0, \Gamma_1 \le S_n$, represented (as usual) by a sequence of elements of $S_n$ generating each group. In that case there are many strings representing the same abstract object, i.e., a subgroup has many different sets of generators.
For the correctness analysis in the isomorphic case it is important that $H$ acts on the abstract objects, and not on the binary strings that represent them. In particular, the output of the reduction should only depend on the abstract object $h(\omega_r)$, and not on the way $\omega_r$ was provided as input. This is because the latter may leak information about the value of the bit $r$ that was picked. The desired independence can be guaranteed by applying a normal form to the representation before outputting the result. In the case of Permutation Group Conjugacy, this means transforming a set of permutations that generate a subgroup $\Gamma$ into a canonical set of generators for $\Gamma$.
In fact, it suffices to have an efficiently computable complete invariant for $\Omega$, i.e., a mapping from representations of objects from $\Omega$ to strings such that the image only depends on the abstract object, and is different for different abstract objects.

3. There exists a probably-correct overestimator for $N \doteq |H|/|\mathrm{Aut}(\omega)|$ that is computable efficiently with access to an oracle for MKTP. We need this property to set the threshold $\hat\theta \doteq t(s + \frac{1}{2})$ with $s \doteq \log(N)$ correctly.

4. There exists an encoding for cosets of $\mathrm{Aut}(\omega)$ in $H$ that achieves KT-complexity close to the information-theoretic optimum (see Section 2.2 for the definition of an encoding). This property ensures that in the isomorphic case the KT-complexity is never much larger than the entropy.
Properties 1 and 2 are fairly basic. Property 4 may seem to require an instantiation-dependent approach. However, in Section 4 we develop a generic hashing-based encoding scheme that meets the requirements. In fact, we give a nearly-optimal encoding scheme for any samplable distribution that is almost flat, without reference to isomorphism. Unlike the indexings from the Appendix lemma for the special case where $H$ is the symmetric group, the generic construction does not achieve the information-theoretic optimum, but it comes sufficiently close for our purposes.
The notion of a probably-correct overestimator in Property 3 can be further relaxed to that of a probably-approximately-correct overestimator, or pac overestimator for short. This is a randomized algorithm that with high probability outputs a value within an absolute deviation bound of $\Delta$ from the correct value, and never produces a value that is more than $\Delta$ below the correct value. More precisely, it suffices to efficiently compute with access to an oracle for MKTP a pac overestimator for $s \doteq \log(|H|/|\mathrm{Aut}(\omega)|)$ with deviation $\Delta = 1/4$. The relaxation suffices because of the difference of about 1/2 between the threshold $\hat\theta$ and the actual KT-values in both the isomorphic and the nonisomorphic case. As $s = \log|H| - \log|\mathrm{Aut}(\omega)|$, it suffices to have a pac overestimator for $\log|H|$ and a pac underestimator for $\log|\mathrm{Aut}(\omega)|$, both to within deviation $\Delta/2 = 1/8$ and of the required efficiency.
Generalizing our approach for GI, one way to obtain the desired underestimator for $\log|\mathrm{Aut}(\omega)|$ is by showing how to efficiently compute with access to an oracle for MKTP:

(a) a list $L$ of elements of $H$ that generates a subgroup $\langle L \rangle$ of $\mathrm{Aut}(\omega)$ such that $\langle L \rangle = \mathrm{Aut}(\omega)$ with high probability, and

(b) a pac underestimator for $\log|\langle L \rangle|$, the logarithm of the order of the subgroup generated by a given list $L$ of elements of $H$.
Further mimicking our approach for GI, we know how to achieve (a) when the Isomorphism Problem allows a search-to-decision reduction. Such a reduction is known for Linear Code Equivalence, but remains open for problems like Matrix Subspace Conjugacy and Permutation Group Conjugacy. However, we show that (a) holds for a generic isomorphism problem provided that products and inverses in $H$ can be computed efficiently (see Lemma 5.2 in Section 5.2). The proof hinges on the ability of MKTP to break the pseudorandom generator construction of [HILL99] based on a purported one-way function (Lemma 2.1 in Section 2.1).
As for (b), we know how to efficiently compute the order of the subgroup exactly in the case of permutation groups ($H = S_n$), even without an oracle for MKTP, and in the case of many matrix groups over finite fields ($H = \mathrm{GL}_n(\mathbb{F}_q)$) with oracle access to MKTP, but some cases remain open (see footnote 8 in Section 5.2 for more details). Instead, we show how to generically construct a pac underestimator with small deviation given access to MKTP as long as products and inverses in $H$ can be computed efficiently, and $H$ allows an efficient complete invariant (see Lemma 5.2 in Section 5.2). The first two conditions are sufficient to efficiently generate a distribution $\tilde{p}$ on $\langle L \rangle$ that is uniform to within a small relative deviation [Bab91]. The entropy $\tilde{s}$ of that distribution equals $\log|\langle L \rangle|$ to within a small additive deviation. As $\tilde{p}$ is almost flat, our encoding scheme from Section 4 shows that $\tilde{p}$ has an encoding whose length does not exceed $\tilde{s}$ by much, and that can be decoded by small circuits. Given an efficient complete invariant for $H$, we can use an approach similar to the one we used to approximate the threshold to construct a pac underestimator for $\tilde{s}$ with small additive deviation, namely the amortized KT-complexity of the concatenation of a polynomial number of samples from $\tilde{p}$. With access to an oracle for MKTP we can efficiently evaluate KT. As a result, we obtain a pac underestimator for $\log|\langle L \rangle|$ with a small additive deviation that is efficiently computable with oracle access to MKTP.
The above ingredients allow us to conclude that all of the isomorphism problems in Table 1 reduce to MKTP under randomized reductions without false negatives. Moreover, we argue that Properties 1 and 2 are sufficient to generalize the construction of Allender and Das [AD14], which yields randomized reductions of the isomorphism problem to MKTP without false positives (irrespective of whether a search-to-decision reduction is known). By combining both reductions, we conclude that all of the isomorphism problems in Table 1 reduce to MKTP under randomized reductions with zero-sided error. See Sections 5 and 6 for more details.
Open Problems.
The difference in compressibility between the isomorphic and nonisomorphic case is relatively small. As such, our approach is fairly delicate. Although we believe it yields zero-sided error reductions to MCSP as well, we currently do not know whether that is the case. An open problem in the other direction is to develop zero-error reductions from all of SZK to MKTP. We refer to Section 7 for further discussion and other future research directions.
Relationship with arXiv 1511.08189.
This report subsumes and significantly strengthens the earlier report [AGM15].

• Whereas [AGM15] only proves the main result for GI on rigid graphs, and for Graph Automorphism (GA) on arbitrary graphs, this report proves it for GI on arbitrary graphs (which subsumes the result for GA on arbitrary graphs).

• Whereas [AGM15] only contains the main result for GI, this report presents a framework for a generic isomorphism problem, and generalizes the main result for GI to any problem within the framework that satisfies some elementary conditions. In particular, this report shows that the generalization applies to Linear Code Equivalence, Permutation Group Conjugacy, and Matrix Subspace Conjugacy. The generalization involves the development of a generic efficient encoding scheme for samplable almost-flat distributions that is close to the information-theoretic optimum, and reductions to MKTP for the following two tasks: computing a generating set for the automorphism group, and approximating the size of the subgroup generated by a given list of elements.

• The main technical contribution in [AGM15] (efficiently indexing the cosets of the automorphism group) was hard to follow. This report contains a clean proof using a different strategy, which also generalizes to indexing cosets of subgroups of any permutation group, answering a question that was raised during presentations of [AGM15].

• The exposition is drastically revised.
2 Preliminaries
We assume familiarity with standard complexity theory, including the bounded-error randomized polynomial-time complexity classes BPP (two-sided error), RP (one-sided error, i.e., no false positives), and ZPP (zero-sided error, i.e., no false positives, no false negatives, and bounded probability of no output). In the remainder of this section we provide more details about KT complexity, formally define the related notions of indexing and encoding, and review some background on graph isomorphism.
2.1 KT Complexity
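The measure KT that we informally described in Section 1 was introduced and formally defined as follows in [ABK06]. We refer to that paper for more background and motivation for the particular definition.
Definition (KT).
Let $U$ be a universal Turing machine. For each string $x$, define $\mathrm{KT}_U(x)$ to be
$$\min\{\, |d| + T : (\forall \sigma \in \{0,1,*\})\,(\forall i \le |x|+1)\ \ U^d(i,\sigma) \text{ accepts in } T \text{ steps iff } x_i = \sigma \,\}.$$
We define $x_i = *$ if $i > |x|$; thus, for $i = |x|+1$ the machine accepts iff $\sigma = *$. The notation $U^d$ indicates that the machine $U$ has random access to the description $d$.
$\mathrm{KT}(x)$ is defined to be equal to $\mathrm{KT}_U(x)$ for a fixed choice of universal machine $U$ with logarithmic simulation time overhead [ABK06, Proposition 5]. In particular, if $d$ consists of the description of a Turing machine $M$ that runs in time $T_M(n)$ and some auxiliary information $a$ such that $M^a(i) = x_i$ for $i \in [n]$, then $\mathrm{KT}(x) \le |a| + c_M T_M(\log n) \log(T_M(\log n))$, where $n \doteq |x|$ and $c_M$ is a constant depending on $M$. It follows that $(\mu/\log n)^{\Omega(1)} \le \mathrm{KT}(x) \le (\mu \cdot \log n)^{O(1)}$ where $\mu$ represents the circuit complexity of the mapping $i \mapsto x_i$ [ABK06, Theorem 11].
The Minimum KT Problem is defined as $\mathrm{MKTP} \doteq \{(x, \theta) \mid \mathrm{KT}(x) \le \theta\}$. [ABK06] showed that an oracle for MKTP is sufficient to invert on average any function that can be computed efficiently. We use the following formulation: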
Lemma (follows from Theorem 45 in [ABK06]).
There exists a polynomial-time probabilistic Turing machine $M$ using oracle access to MKTP so that the following holds. For any circuit $C$ on $n$ input bits,
$$\Pr[\, C(\tau) = C(\sigma) \,] \ge 1/\mathrm{poly}(n), \quad \text{where } \tau \doteq M(C, C(\sigma)),$$
where the probability is over the uniform distribution of $\sigma \in \{0,1\}^n$ and the internal coin flips of $M$.
2.2 Random Variables, Samplers, Indexings and Encodings
A finite probability space consists of a finite sample space $S$, and a probability distribution $p$ on $S$. Typical sample spaces include finite groups and finite sets of strings. The probability distributions underlying our probability spaces are always uniform.

A random variable $R$ is a mapping from the sample space $S$ to a set $T$, which typically is the universe $\Omega$ of a group action or a set of strings. The random variable $R$ with the uniform distribution on $S$ induces a distribution $p$ on $T$. We sometimes use $R$ to denote the induced distribution $p$ as well.

The support of a distribution $p$ on a set $T$ is the set of elements $\tau \in T$ with positive probability $p(\tau)$. A distribution is flat if it is uniform on its support. The entropy of a distribution $p$ is the expected value of $\log(1/p(\tau))$. The min-entropy of $p$ is the largest real $s$ such that $p(\tau) \le 2^{-s}$ for every $\tau$ in the support of $p$. The max-entropy of $p$ is the least real $s$ such that $p(\tau) \ge 2^{-s}$ for every $\tau$ in the support of $p$. For a flat distribution, the min-, max-, and ordinary entropy coincide and equal the logarithm of the size of the support. For two distributions $p$ and $q$ on the same set $T$, we say that $q$ approximates $p$ within a factor $1+\delta$ if $q(\tau)/(1+\delta) \le p(\tau) \le (1+\delta) \cdot q(\tau)$ for all $\tau \in T$. In that case, $p$ and $q$ have the same support, and if $p$ has min-entropy $s$, then $q$ has min-entropy at least $s - \log(1+\delta)$, and if $p$ has max-entropy $s$, then $q$ has max-entropy at most $s + \log(1+\delta)$.
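The three entropy notions can be computed directly from a probability table. The following is a small sketch for illustration; the function names and the dictionary representation of a distribution are assumptions, not notation from the text.

```python
from math import log2


def entropy(p):
    """Expected value of log(1/p(x)) over the support, in bits."""
    return sum(q * log2(1 / q) for q in p.values() if q > 0)


def min_entropy(p):
    """Largest s with p(x) <= 2**(-s) on the support: -log2 of the largest mass."""
    return -log2(max(p.values()))


def max_entropy(p):
    """Least s with p(x) >= 2**(-s) on the support: -log2 of the smallest mass."""
    return -log2(min(q for q in p.values() if q > 0))


def is_flat(p, tol=1e-12):
    """A distribution is flat if it is uniform on its support."""
    masses = [q for q in p.values() if q > 0]
    return max(masses) - min(masses) <= tol
```

For a flat distribution on 8 outcomes all three quantities equal 3, whereas for the distribution with masses 1/2, 1/4, 1/4 the min-entropy is 1, the entropy is 1.5, and the max-entropy is 2.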
A sampler within a factor $1+\delta$ for a distribution $p$ on a set $T$ is a random variable $R : \{0,1\}^\ell \to T$ that induces a distribution that approximates $p$ within a factor $1+\delta$. We say that $R$ samples $T$ within a factor $1+\delta$ from length $\ell$. If $\delta = 0$ we call the sampler exact. The choice of $\{0,1\}^\ell$ reflects the fact that distributions need to be generated from a source of random bits. Factors $1+\delta$ with $\delta > 0$ are necessary in order to sample uniform distributions whose support is not a power of 2.
We consider ensembles of distributions $\{p_x\}$ where $x$ ranges over $\{0,1\}^*$. We call the ensemble samplable by polynomial-size circuits if there exists an ensemble of random variables $\{R_{x,\delta}\}$ where $\delta$ ranges over the positive rationals such that $R_{x,\delta}$ samples $p_x$ within a factor $1+\delta$ from length $\ell_{x,\delta}$ and $R_{x,\delta}$ can be computed by a circuit of size $\mathrm{poly}(|x|/\delta)$. We stress that the circuits can depend on the string $x$, not just on $|x|$. If in addition the mappings $(x,\delta) \mapsto \ell_{x,\delta}$ and $(x,\delta,\sigma) \mapsto R_{x,\delta}(\sigma)$ can be computed in time $\mathrm{poly}(|x|/\delta)$, we call the ensemble uniformly samplable in polynomial time.
One way to obtain strings with high KT-complexity is as samples from distributions with high min-entropy.
Proposition.
Let $y$ be sampled from a distribution with min-entropy $s$. For all $k$, we have $\mathrm{KT}(y) \ge \lfloor s - k \rfloor$ except with probability at most $2^{-k}$.
One way to establish upper bounds on KT-complexity is via efficiently decodable encodings into integers from a small range. Encodings with the minimum possible range are referred to as indexings. We use these notions in various settings. The following formal definition is for use with random variables and is general enough to capture all the settings we need. It defines an encoding via its decoder $D$; the range of the encoding corresponds to the domain of $D$.
Definition (encoding and indexing).
Let $R : S \to T$ be a random variable. An encoding of $R$ is a mapping $D : [N] \to S$ such that for every $\tau \in R(S)$ there exists $i \in [N]$ such that $R(D(i)) = \tau$. We refer to $\lceil \log(N) \rceil$ as the length of the encoding. An indexing is an encoding with $N = |R(S)|$.
Definition 2.2 applies to a set $S$ by identifying $S$ with the random variable that is the identity mapping on $S$. It applies to the cosets of a subgroup $\Gamma$ of a group $H$ by considering the random variable that maps $h \in H$ to the coset $h\Gamma$. It applies to a distribution induced by a random variable $R$ by considering the random variable $R$ itself.
We say that an ensemble of encodings $\{D_x\}$ is decodable by polynomial-size circuits if for each $x$ there is a circuit of size $\mathrm{poly}(|x|)$ that computes $D_x(i)$ for every $i \in [N_x]$. If in addition the mapping $(x, i) \mapsto D_x(i)$ is computable in time $\mathrm{poly}(|x|)$, we call the ensemble uniformly decodable in polynomial time.
2.3 Graph Isomorphism and the Orbit–Stabilizer Theorem
Graph Isomorphism (GI) is the computational problem of deciding whether two graphs, given as input, are isomorphic. A graph for us is a simple, undirected graph, that is, a vertex set $V(G)$, and a set $E(G)$ of unordered pairs of vertices. An isomorphism between two graphs $G_0, G_1$ is a bijection $\pi : V(G_0) \to V(G_1)$ that preserves both edges and non-edges: $(v, w) \in E(G_0)$ if and only if $(\pi(v), \pi(w)) \in E(G_1)$. An isomorphism from a graph to itself is an automorphism; the automorphisms of a given graph $G$ form a group under composition, denoted $\mathrm{Aut}(G)$. The Orbit–Stabilizer Theorem implies that the number of distinct graphs isomorphic to $G$ equals $n!/|\mathrm{Aut}(G)|$. A graph $G$ is rigid if $|\mathrm{Aut}(G)| = 1$, i.e., the only automorphism is the identity, or equivalently, all $n!$ permutations of $G$ yield distinct graphs.
More generally, let $H$ be a group acting on a universe $\Omega$. For $\omega_0, \omega_1 \in \Omega$, each $h \in H$ with $h(\omega_0) = \omega_1$ is an isomorphism from $\omega_0$ to $\omega_1$. $\mathrm{Aut}(\omega)$ is the set of isomorphisms from $\omega$ to itself. By the Orbit–Stabilizer Theorem the number of distinct isomorphic copies of $\omega$ equals $|H|/|\mathrm{Aut}(\omega)|$.
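The counting statement can be verified exhaustively on small graphs. The sketch below is illustrative only; the helper names and the two example graphs (a 4-cycle and a hand-picked 6-vertex graph with no nontrivial automorphism) are assumptions, and the enumeration of all permutations is feasible only for small vertex counts.

```python
from itertools import permutations
from math import factorial


def relabel(edges, perm):
    """Graph obtained by renaming vertex v to perm[v], as a frozenset of edges."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)


def automorphism_count(n, edges):
    """Number of permutations of range(n) that map the graph to itself."""
    target = relabel(edges, range(n))
    return sum(1 for p in permutations(range(n)) if relabel(edges, p) == target)


def orbit_size(n, edges):
    """Number of distinct labeled graphs isomorphic to the given one."""
    return len({relabel(edges, p) for p in permutations(range(n))})


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Hypothetical rigid example: a path 0-1-2-3-4 with an extra vertex 5
# attached to both 1 and 2; degree considerations pin down every vertex.
rigid6 = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 5)]
```

For `cycle4` the automorphism group has order 8 and the orbit has size 3, so their product is $4! = 24$; for `rigid6` the only automorphism is the identity and all $6! = 720$ relabelings are distinct.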
3 Graph Isomorphism
In this section we show:
Theorem.
$\mathrm{GI} \in \mathrm{ZPP}^{\mathrm{MKTP}}$.
The crux is the randomized mapping reduction from deciding whether a given pair of $n$-vertex graphs $(G_0, G_1)$ is in GI to deciding whether $(y, \hat\theta) \in \mathrm{MKTP}$, as prescribed by (1). Recall that (1) involves picking a string $r \doteq r_1 \cdots r_t \in \{0,1\}^t$ and permutations $\pi_i$ at random, and constructing the string $y = y_1 \cdots y_t$, where $y_i = \pi_i(G_{r_i})$. We show how to determine $\hat\theta$ such that a sufficiently large polynomial $t$ guarantees that the reduction has no false negatives. We follow the outline of Section 1, take the same four steps, and fill in the missing details.
3.1 Rigid Graphs
We first consider the simplest setting, in which both $G_0$ and $G_1$ are rigid. We argue that $\hat\theta \doteq t(s + \frac{1}{2})$ works, where $s = \log(n!)$.
Nonisomorphic Case. If $G_0 \not\equiv G_1$, then (by rigidity), each choice of $r$ and each distinct sequence of $t$ permutations results in a different string $y$, and thus the distribution on the strings $y$ has entropy $t(s+1)$ where $s = \log(n!)$. Thus, by Proposition 2.2, $\mathrm{KT}(y) > \hat\theta = t(s+1) - \frac{t}{2}$ with all but exponentially small probability in $t$. Thus with high probability the algorithm declares $G_0$ and $G_1$ nonisomorphic.
Isomorphic Case. If $G_0 \equiv G_1$, we need to show that $\mathrm{KT}(y) \le \hat\theta$ always holds. The key insight is that the information in $y$ is precisely captured by the $t$ permutations $\tau_1, \ldots, \tau_t$ such that $\tau_i(G_0) = y_i$. These permutations exist because $G_0 \equiv G_1$; they are unique by the rigidity assumption. Thus, $y$ contains at most $ts$ bits of information. We show that its KT-complexity is not much larger than this. We rely on the following encoding, due to Lehmer (see, e.g., [Knu73, pp. 12–33]):
Proposition (Lehmer code).
The symmetric groups $S_n$ have indexings that are uniformly decodable in time $\mathrm{poly}(n)$.
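A decoder of this kind can be sketched via the factorial number system. The function name `lehmer_decode` and the choice of indexing permutations of `range(n)` by integers in $[0, n!)$ are assumptions made for illustration; the list-based version below runs in time polynomial in $n$ but is not tuned for efficiency.

```python
from math import factorial

def lehmer_decode(k, n):
    """Map an index k in [0, n!) to a permutation of range(n)."""
    if not 0 <= k < factorial(n):
        raise ValueError("index out of range")
    remaining = list(range(n))
    perm = []
    for position in range(n, 0, -1):
        block = factorial(position - 1)
        digit, k = divmod(k, block)   # next digit in the factorial number system
        perm.append(remaining.pop(digit))
    return perm
```

Distinct indices decode to distinct permutations, so every permutation of `range(n)` is hit exactly once.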
To bound $\mathrm{KT}(y)$, we consider a program $d$ that has the following information hardwired into it: $n$, the adjacency matrix of $G_0$, and the $t$ integers $k_1, \dots, k_t \in [n!]$ encoding the $t$ permutations. We use the decoder from Proposition 3.1 to compute the $i$-th bit of $y$ on input $i$. This can be done in time $\mathrm{poly}(n, \log t)$ given the hardwired information.
As mentioned in Section 1, a naïve method for encoding the indices only gives the bound on , which may exceed and—a fortiori—the threshold , no matter how large a polynomial is. We remedy this by aggregating multiple indices into blocks, and amortizing the encoding overhead across multiple samples. The following technical lemma captures the technique. For a set of strings and , the statement uses the notation to denote the set of concatenations of strings from ; we refer to Section 2.2 for the other terminology.
Lemma (Blocking Lemma).
Let be an ensemble of sets of strings such that all strings in have the same length . Suppose that for each and , there is a random variable whose image contains , and such that is computable by a circuit of size and has an encoding of length decodable by a circuit of size . Then there are constants and so that, for every constant , every , every sufficiently large , and every
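We first show how to apply the Blocking Lemma and then prove it. For a given rigid graph $G$, we let $T_G$ be the image of the random variable $R_G$ that maps $\pi \in S_n$ to $\pi(G)$ (an adjacency matrix viewed as a string of $n^2$ bits). We let $R_{G,b}$ be the $b$-fold Cartesian product of $R_G$, i.e., $R_{G,b}$ takes in $b$ permutations $\tau_1, \dots, \tau_b \in S_n$, and maps them to $\tau_1(G)\tau_2(G)\cdots\tau_b(G)$. $R_{G,b}$ is computable by (uniform) circuits of size $\mathrm{poly}(n, b)$. To encode an outcome $\tau_1(G)\tau_2(G)\cdots\tau_b(G)$, we use as index the number whose base-$(n!)$ representation is written $k_1 k_2 \cdots k_b$, where $k_i$ is the index of $\tau_i$ from the Lehmer code. This indexing has length $\lceil \log((n!)^b) \rceil = \lceil bs \rceil$, where $s = \log(n!)$. Given an index, the list of permutations $\tau_1, \dots, \tau_b$ can be decoded by (uniform) circuits of size $\mathrm{poly}(n, b)$. By the Blocking Lemma, we have that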
(3) 
for some constants , every constant , and all sufficiently large , where we use the fact that . Setting , this becomes . Taking , we see that for all sufficiently large , .
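The saving from blocking can be seen in a small numeric sketch. It assumes that each Lehmer index lies in $[0, n!)$ and that a block of $b$ indices is packed as the digits of a single base-$(n!)$ number; the helper names `pack_block`, `unpack_block`, and `bits_needed` are hypothetical.

```python
from math import factorial

def pack_block(indices, base):
    """Pack indices k_1..k_b (each in [0, base)) into one base-`base` number."""
    value = 0
    for k in indices:
        if not 0 <= k < base:
            raise ValueError("index out of range")
        value = value * base + k
    return value

def unpack_block(value, base, b):
    """Invert pack_block: recover the b digits, most significant first."""
    digits = []
    for _ in range(b):
        value, k = divmod(value, base)
        digits.append(k)
    return digits[::-1]

def bits_needed(count):
    """Number of bits needed to index `count` distinct values."""
    return max(1, (count - 1).bit_length())

n, b = 4, 8
base = factorial(n)                    # number of Lehmer indices for n = 4
separate = b * bits_needed(base)       # each index rounded up on its own
blocked = bits_needed(base ** b)       # a single rounding for the whole block
```

Encoding the indices separately pays the rounding overhead once per index, whereas the blocked encoding pays it once per block; this is the amortization the lemma exploits.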
Proof (of Lemma 3.1).
Let and be the hypothesized random variables and corresponding decoders. Fix and , let denote the length of the strings in , and let be a parameter to be set later.
To bound , we first break into blocks where each . As the image of contains , is encoded by some index of length .
Consider a program that has , , , , the circuit for computing , the circuit for computing , and the indices hardwired, takes an input , and determines the th bit of as follows. It first computes so that points to the th bit position in . Then, using , , , and , it finds such that equals the . Finally, it computes and outputs the th bit, which is the th bit of .
The bitlength of is at most for the indices, plus for the rest. The time needed by is bounded by . Thus , where we used the fact that . The lemma follows by choosing .
3.2 Known Number of Automorphisms
We generalize the case of rigid graphs to graphs for which we know the size of their automorphism groups. Specifically, in addition to the two input graphs $G_0$ and $G_1$, we are also given numbers $N_0, N_1$ where $N_i = |\mathrm{Aut}(G_i)|$. Note that if $N_0 \neq N_1$, we can right away conclude that $G_0 \not\equiv G_1$. Nevertheless, we do not assume that $N_0 = N_1$ as the analysis of the case $N_0 \neq N_1$ will be useful in Section 3.3.
The reduction is the same as in Section 3.1 with the correct interpretation of $\theta$. The main difference lies in the analysis, where we need to account for the loss in entropy that comes from having multiple automorphisms.
Let $s_i := \log(n!/|\mathrm{Aut}(G_i)|)$ be the entropy in a random permutation of $G_i$. Set $s := \min(s_0, s_1)$, and $\theta := t(s + \frac{1}{2})$. In the nonisomorphic case the min-entropy of $y$ is at least $t(s+1)$, so $\mathrm{KT}(y) > \theta$ with high probability. In the isomorphic case we upper bound $\mathrm{KT}(y)$ by about $ts$. Unlike the rigid case, we can no longer afford to encode an entire permutation for each permuted copy of $G_0$; we need a replacement for the Lehmer code. The following encoding, applied to $\Gamma = \mathrm{Aut}(G_i)$, suffices to complete the argument from Section 3.1.
Lemma.
For every subgroup $\Gamma$ of $S_n$ there exists an indexing of the cosets of $\Gamma$ that is uniformly decodable in polynomial time when $\Gamma$ is given by a list of generators.
[Footnote 8: The choice of left ($\pi\Gamma$) vs right ($\Gamma\pi$) cosets is irrelevant for us; all our results hold for both, and one can usually switch from one statement to the other by taking inverses. Related to this, there is an ambiguity regarding the order of application in the composition $gh$ of two permutations: first apply $g$ and then $h$, or vice versa. Both interpretations are fine. For concreteness, we assume the former.]
We prove Lemma 3.2 in the Appendix as a corollary to a more general lemma that gives, for each , an efficiently computable indexing for the cosets of in .
Remark.
Before we continue towards Theorem 3, we point out that the above ideas yield an alternate proof that $\mathsf{GI} \in \mathsf{BPP}^{\mathsf{MKTP}}$ (and hence that $\mathsf{GI} \in \mathsf{RP}^{\mathsf{MKTP}}$). This weaker result was already obtained in [AD14] along the well-trodden path discussed in Section 1; this remark shows how to obtain it using our new approach.
The key observation is that in both the isomorphic and the nonisomorphic case, with high probability $\mathrm{KT}(y)$ stays away from the threshold $\theta$ by a growing margin. Moreover, the above analysis allows us to efficiently obtain high-confidence approximations of $\theta$ to within any constant using sampling and queries to the $\mathsf{MKTP}$ oracle.
More specifically, for , let denote the concatenation of independent samples from . Our analysis shows that always holds, and that holds with high probability. Thus, approximates with high confidence to within an additive deviation of . Similarly, approximates to within the same deviation margin, and approximates to within an additive deviation of . The latter bound can be made less than 1 by setting to a sufficiently large polynomial in and . Moreover, all these estimates can be computed in time with access to as enables us to evaluate efficiently.
3.3 Probably-Correct Underestimators for the Number of Automorphisms
The reason the algorithm in Remark 3.2 can have false negatives is that the approximation to may be too small. Knowing the quantities exactly allows us to compute exactly and thereby obviates the possibility of false negatives. In fact, it suffices to compute overestimates for the quantities which are correct with nonnegligible probability. We capture this notion formally as follows:
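Definition (probably-correct overestimator).
Let $g : \Omega \to \mathbb{R}$ be a function, and $M$ a randomized algorithm that, on input $\omega \in \Omega$, outputs a value $M(\omega) \in \mathbb{R}$. We say that $M$ is a probably-correct overestimator for $g$ if, for every $\omega \in \Omega$, $M(\omega) = g(\omega)$ holds with probability at least $1/\mathrm{poly}(|\omega|)$, and $M(\omega) > g(\omega)$ otherwise. A probably-correct underestimator for $g$ is defined similarly by reversing the inequality.
We point out that, for any probably-correct over-/underestimator, taking the min/max among $\mathrm{poly}(|\omega|)$ independent runs yields the correct value with probability $1 - 2^{-\mathrm{poly}(|\omega|)}$.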
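The amplification step can be illustrated with a toy simulation. Both functions below are hypothetical stand-ins: `make_toy_overestimator` fabricates an estimator that returns the true value with a fixed probability and a strictly larger value otherwise, and `amplify_overestimator` takes the minimum over independent runs. Nothing here models the actual estimator for the number of automorphisms.

```python
import random

def amplify_overestimator(estimator, omega, runs):
    """Take the minimum over independent runs of a probably-correct overestimator."""
    return min(estimator(omega) for _ in range(runs))

def make_toy_overestimator(g, success_probability, rng):
    """Hypothetical estimator: returns g(omega) with the given probability,
    and a strictly larger value otherwise."""
    def estimator(omega):
        if rng.random() < success_probability:
            return g(omega)
        return g(omega) + rng.randint(1, 10)
    return estimator
```

Since no run ever falls below the true value, the minimum is correct as soon as a single run is correct; with success probability $p$ per run, all $k$ runs fail with probability at most $(1-p)^k \le e^{-pk}$.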
We are interested in the case where $g(G) = n!/|\mathrm{Aut}(G)|$. Assuming this $g$ on a given class $\Omega$ of graphs has a probably-correct overestimator $M$ computable in randomized polynomial time with an $\mathsf{MKTP}$ oracle, we argue that $\mathsf{GI}$ on $\Omega$ reduces to $\mathsf{MKTP}$ in randomized polynomial time without false negatives.
To see this, consider the algorithm that, on input a pair of $n$-vertex graphs, computes as estimates of the true values , and then runs the algorithm from Section 3.2 using the estimates .

In the case where and are not isomorphic, if both estimates are correct, then the algorithm detects with high probability.

In the case where $G_0 \equiv G_1$, if both estimates are correct, we showed in Section 3.2 that the algorithm always declares $G_0$ and $G_1$ to be isomorphic. Moreover, increasing $\theta$ can only decrease the probability of a false negative. As the computed threshold $\theta$ increases as a function of the estimates, and each estimate is always at least as large as the true value, it follows that $G_0$ and $G_1$ are always declared isomorphic.
3.4 Arbitrary Graphs
A probably-correct overestimator for the function $G \mapsto n!/|\mathrm{Aut}(G)|$ on any graph $G$ can be computed in randomized polynomial time with access to $\mathsf{MKTP}$. The process is described in full detail in Section 1, based on a $\mathsf{BPP}^{\mathsf{MKTP}}$ algorithm for $\mathsf{GI}$ (taken from Remark 3.2 or from [AD14]). This means that the setting of Section 3.3 is actually the general one. The only difference is that we no longer obtain a mapping reduction from $\mathsf{GI}$ to $\mathsf{MKTP}$, but an oracle reduction: We still make use of (1), but we need more queries to $\mathsf{MKTP}$ in order to set the threshold $\theta$.
This shows that $\mathsf{GI} \in \mathsf{coRP}^{\mathsf{MKTP}}$. As $\mathsf{GI} \in \mathsf{RP}^{\mathsf{MKTP}}$ follows from the known search-to-decision reduction for $\mathsf{GI}$, this concludes the proof of Theorem 3 that $\mathsf{GI} \in \mathsf{ZPP}^{\mathsf{MKTP}}$.
4 Estimating the Entropy of Flat Samplable Distributions
In this section we develop a key ingredient in extending Theorem 3 from $\mathsf{GI}$ to other isomorphism problems that fall within the framework presented in Section 1, namely efficient near-optimal encodings of cosets of automorphism groups. More generally, our encoding scheme works well for any samplable distribution that is flat or almost flat. It allows us to probably-approximately-correctly underestimate the entropy of such distributions with the help of an oracle for $\mathsf{MKTP}$.
We first develop our encoding, which only requires the existence of a sampler from strings of polynomial length. The length of the encoding is roughly the max-entropy of the distribution, which is the information-theoretic optimum for flat distributions.
Lemma (Encoding Lemma).
Consider an ensemble of random variables that sample distributions with max-entropy from length . Each has an encoding of length that is decodable by polynomial-size circuits.
To see how Lemma 4 performs, let us apply it to the setting of . Consider the random variable mapping a permutation to . The induced distribution is flat and has entropy , and each can be sampled from strings of length . The Encoding Lemma thus yields an encoding of length that is efficiently decodable. The bound on the length is worse than Lemma 3.2’s bound of , but will still be sufficient for the generalization of Theorem 3 and yield the result for .
We prove the Encoding Lemma using hashing. Here is the idea. Consider a random hash function where denotes the length of the strings in the domain of for a given , and is set slightly below . For any fixed outcome of , there is a positive constant probability that no more than about of all samples have , and at least one of these also satisfies . Let us say that works for when both those conditions hold. In that case—ignoring efficiency considerations—about bits of information are sufficient to recover a sample satisfying from .
Now a standard probabilistic argument shows that there exists a sequence of hash functions such that for every possible outcome , there is at least one that works for . Given such a sequence, we can encode each outcome as the index of a hash function that works for , and enough bits of information that allow us to efficiently recover given . We show that bits suffice for the standard linear-algebraic family of hash functions. The resulting encoding has length and is decodable by circuits of polynomial size.
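The linear-algebraic hash family mentioned above can be sketched as affine maps over $\mathrm{GF}(2)$. The sketch assumes functions of the form $x \mapsto Ux + v$ with a random binary matrix $U$ and a random binary vector $v$; the function name and the parameter names `ell` (input length) and `m` (output length) are hypothetical.

```python
import random

def sample_linear_hash(ell, m, rng=random):
    """Sample h(x) = Ux + v over GF(2), with U an m x ell bit matrix and v in {0,1}^m."""
    u = [[rng.randrange(2) for _ in range(ell)] for _ in range(m)]
    v = [rng.randrange(2) for _ in range(m)]

    def h(x):
        # x is a sequence of ell bits; all arithmetic is modulo 2
        return tuple(
            (sum(u_ij * x_j for u_ij, x_j in zip(row, x)) + v_i) % 2
            for row, v_i in zip(u, v)
        )

    return h
```

Every function in this family is affine, so $h(x) \oplus h(y) \oplus h(x \oplus y) = h(0)$ for all inputs, which gives a quick sanity check on an implementation.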
Proof (of Lemma 4).
Recall that a family $\mathcal{H}$ of functions from $\{0,1\}^{\ell}$ to $\{0,1\}^{m}$ is universal if for any two distinct $x_0, x_1 \in \{0,1\}^{\ell}$, the distributions of $h(x_0)$ and $h(x_1)$ for a uniform choice of $h \in \mathcal{H}$ are independent and uniform over $\{0,1\}^{m}$. We make use of the specific universal family that consists of all functions of the form $x \mapsto Ux + v$, where $U$ is a binary $(m \times \ell)$-matrix, $v$ is a binary column vector of dimension $m$, and $x$ is also viewed as a binary column vector of dimension $\ell$ [CW79]. Uniformly sampling from this family means picking $U$ and $v$ uniformly at random.
Claim.
Let and .

For every universal family with , and for every