# Clifford Circuits can be Properly PAC Learned if and only if RP = NP

Given a dataset of input states, measurements, and probabilities, is it possible to efficiently predict the measurement probabilities associated with a quantum circuit? Recent work of Caro and Datta (2020) studied the problem of PAC learning quantum circuits in an information-theoretic sense, leaving open questions of computational efficiency. In particular, one candidate class of circuits for which an efficient learner might have been possible was that of Clifford circuits, since the corresponding set of states generated by such circuits, called stabilizer states, is known to be efficiently PAC learnable (Rocchetto 2018). Here we provide a negative result, showing that proper learning of CNOT circuits is hard for classical learners unless RP = NP. As the classical analogue and subset of Clifford circuits, this naturally leads to a hardness result for Clifford circuits as well. Additionally, we show that if RP = NP then there would exist efficient proper learning algorithms for CNOT and Clifford circuits. By similar arguments, we also find that an efficient proper quantum learner for such circuits exists if and only if NP ⊆ RQP.



## 1 Introduction

The goal of efficient learning of quantum states and the circuits that act on them is to be able to predict the outcome of various measurements with some degree of accuracy. For example, given a quantum state and a two-outcome measurement, can we predict the probability that the measurement accepts?

Naively, one can try to learn everything there is to know about the system, a technique known as tomography, with versions for quantum states [35, 36, 27] and quantum processes [21, 4]. However, this requires exponential time in the number of qubits for information-theoretic reasons related to the exponential dimension of the system. This exponential bound remains even when only trying to find a state close in trace distance, as shown by a combination of Flammia et al. [24] and Holevo's bound. To address this, one can choose to restrict the type of information one wants to learn, which led to the ideas of shadow tomography [2] and classical shadows [28]. By only needing to predict the value of observables, one is able to use only a number of measurements that is polynomial in the number of qubits and polylogarithmic in the number of observables. In a similar vein, Aaronson [1] proposed the idea of PAC learning quantum states, which is the idea of learning relative to some distribution over measurements, while only being given samples from that distribution as well (see Section 2.5 for details).

An alternative direction was to restrict the class of objects being learned, but allow one to choose what kind of measurements are taken. Montanaro [34] was able to learn stabilizer states and later Low [33] learned an unknown Clifford circuit. Lai and Cheng [32] built on these results in the case of actually recovering the circuit, as well as limited learning in the presence of a small number of non-Clifford gates. Stabilizer states and Clifford circuits are of particular interest to the quantum information community because many of our quantum communication protocols and well-known quantum query algorithms [12, 11, 10, 39, 14, 25] utilize these states and circuits. Gottesman and Knill [26] (with improvements by Aaronson and Gottesman [3]) were also able to give an efficient classical simulation of these objects, showing this class of objects to seemingly be much simpler than the set of all quantum states or circuits. Combined with the fact that stabilizer states are good approximations to Haar random states [41, 31], we get a set of circuits and states that are highly quantum with many interesting uses, but still have enough exploitable structure to be classically simulable, making them a prime candidate for learning.

Rocchetto [37] was able to combine the ideas of PAC learning with the structure provided by restricting to stabilizer states to give an efficient PAC algorithm for learning stabilizer states. In particular, the algorithm given was a proper learner, in that the hypothesis state was required to also be a stabilizer state, rather than something that is non-stabilizer but acts close to stabilizer and with low error. Caro and Datta [19] extended the ideas of PAC learning to quantum circuits, giving a generalization theorem analogous to that of Aaronson [1]. As such, an open problem was whether or not Clifford circuits could be efficiently PAC learned in a way analogous to stabilizer states. To that extent, we show in this paper that an efficient proper learner for Clifford circuits exists if and only if RP = NP. Here, we are given inputs of the form $(\rho, E, \mathrm{Tr}[EC\rho C^\dagger])$ for some stabilizer state $\rho$ and Pauli measurement $E$, with labels corresponding to an unknown Clifford circuit $C$, and asked to predict future labels. Furthermore, this is true even just for a learner of a subset of Clifford circuits called CNOT circuits. This subset essentially restricts to the set of Clifford circuits that map computational basis states to other computational basis states, and these circuits are highly related to the complexity class ⊕L (see Section 2.4). Note however that we slightly alter the definition of PAC learning a quantum circuit from that of Caro and Datta [19] to a setting we find more comparable to Aaronson [1]'s original PAC learning result for quantum states. In the setting introduced by Caro and Datta, the measurements were limited to being rank 1 projectors with product structure, rather than the higher-rank projectors we use in our proof.

One can also imagine that the learning algorithm has access to a quantum computer. Since there exist problems like factoring [38] for which we have an efficient quantum algorithm but not an efficient classical algorithm, this learner may be able to efficiently learn more expressive concept classes. We also give results for this setting by relating to RQP, the quantum analogue of RP. We now informally state our main theorems regarding CNOT and Clifford circuits.

###### Theorem 1.1.

There exists an efficient randomized proper PAC learner for CNOT circuits if and only if RP = NP. Furthermore, an efficient quantum proper PAC learner for CNOT circuits exists if and only if NP ⊆ RQP.

###### Corollary 1.2.

There exists an efficient randomized proper PAC learner for Clifford circuits if and only if RP = NP. Furthermore, an efficient quantum proper PAC learner for Clifford circuits exists if and only if NP ⊆ RQP.

We prove this by first realizing that finding a CNOT circuit with zero training error requires finding a full rank matrix in an affine subspace of matrices under matrix addition (so as to differentiate from a coset of a matrix group under matrix multiplication). This is known as the NonSingularity problem [17] and is NP-Complete. While this may seem like a backwards reduction, it turns out that the set of matrix affine subspaces used to show that NonSingularity can solve 3SAT is a subset of the ones needed to learn CNOT circuits with zero training error. Thus, there exists a set of samples such that a CNOT circuit with zero training error exists if and only if the 3SAT instance is satisfiable. Finding such a CNOT circuit is what is known as the search version of the consistency problem, and in turn the decision version of the consistency problem is also NP-Complete.

To show that an efficient proper learner for CNOT circuits implies RP = NP, let $S$ be some sample set from the decision version of the consistency problem for CNOT circuits. Using the uniform distribution over the elements of $S$, we will sample every element of $S$ with high probability given enough queries. Thus we are able to show that an efficient learner would necessarily also solve the consistency problem with high enough probability to create a solution in RP.

To show that RP = NP implies an efficient learner, we utilize search-to-decision reductions for NP-Complete problems to get an efficient algorithm for the search problem of minimizing training error. We can treat this search algorithm as our means of generating a hypothesis circuit with low training error. By the generalization theorem provided by Caro and Datta [19], assuming we have enough samples this will properly generalize and have low true error, thus completing the proof. The quantum forms of the proof essentially come for free by replacing RP with RQP everywhere and using learners capable of quantum computation.

### 1.1 Related Work

We note that we are dealing with the problem of classically PAC learning a classical function (i.e., classical labels) derived from a quantum system. This is as opposed to quantum PAC learning of a classical function as in Arunachalam and de Wolf [5, 6] and Arunachalam et al. [7, 9], where instead of a distribution over samples we receive access to copies of a quantum state. This state results in the same distribution classically when measured in the computational basis, but can be measured in other bases to get different results. There is also the attempt to directly learn a quantum process with quantum labels, as in Chung and Lin [22] and Caro [18]. Here, they do not choose to measure the output state, and instead have samples consisting of input states paired with the corresponding output states of the unknown quantum process. Other related quantum learning works, some of which are outside the PAC model, include Yoganathan [40], Low [33], and Cheng et al. [20].

## 2 Preliminaries

### 2.1 Quantum States and Circuits

A quantum state $\rho$ on $n$ qubits is a $2^n \times 2^n$ PSD matrix with trace $1$. If the matrix is rank $1$ then we refer to $\rho$ as a pure state, since it can be decomposed as $\rho = |\psi\rangle\langle\psi|$ where $|\psi\rangle$ is a $2^n$-dimensional column vector with norm 1 and $\langle\psi|$ is its conjugate transpose. A two-outcome measurement is then a projector $E$ such that the probability of a '1' outcome is $\mathrm{Tr}[E\rho]$ and the probability of a '0' outcome is $1-\mathrm{Tr}[E\rho]$, leaving the expectation value as simply $\mathrm{Tr}[E\rho]$.

A quantum process is how one evolves a quantum state, and therefore it must preserve the trace and the PSD condition. We will be primarily interested in quantum circuits, which are the subset of quantum processes that map pure states only to other pure states. These are constrained to be unitary operations, such that after acting on $\rho$ with the circuit $U$, the state that we are left with is $U\rho U^\dagger$, where $U^\dagger$ is the conjugate transpose of $U$.

### 2.2 Paulis and Stabilizer States/Groups

We will start by giving the following matrices, known as the Pauli matrices:

$$I=\begin{pmatrix}1&0\\0&1\end{pmatrix}\qquad X=\begin{pmatrix}0&1\\1&0\end{pmatrix}\qquad Y=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\qquad Z=\begin{pmatrix}1&0\\0&-1\end{pmatrix}$$

Noting that these are all unitaries that act on a single qubit, we can generalize to $n$ qubits.
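As a quick illustration (not from the paper), the $n$-qubit Paulis can be built as Kronecker products of the single-qubit matrices above; the helper name `pauli` is our own:

```python
import numpy as np

# Single-qubit Paulis as defined above.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli(word):
    """Tensor a string like 'XZ' into the corresponding 2^n x 2^n matrix."""
    table = {'I': I, 'X': X, 'Y': Y, 'Z': Z}
    out = np.array([[1.0 + 0j]])
    for ch in word:
        out = np.kron(out, table[ch])
    return out

P = pauli('XZ')  # X on qubit 1, Z on qubit 2
assert P.shape == (4, 4)
assert np.allclose(P @ P, np.eye(4))  # every such Pauli squares to the identity
assert np.allclose(X @ Z, -(Z @ X))   # X and Z anticommute on the same qubit
```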

###### Definition 2.1.

Let $\mathcal{P}_n$ be the matrix group consisting of all $n$-qubit Paulis with phase $\pm 1$ or $\pm i$.

We’ll also introduce some shorthand notation:

###### Definition 2.2.

Let $X_i$ and $Z_i$ be the Pauli acting only on the $i$-th qubit with $X$ or $Z$ respectively and the identity matrix on all other qubits.

###### Definition 2.3.

For $v \in \{0,1\}^n$, let $X_v = \prod_{i=1}^n X_i^{v_i}$ and $Z_v = \prod_{i=1}^n Z_i^{v_i}$.

Note that $Z_vZ_w = Z_{v\oplus w}$, assuming the dimensions of $v$ and $w$ match. It is easy to see that $v \neq w$ also implies that $Z_v \neq Z_w$.

A stabilizer state is any state that can be written as $\rho = \frac{1}{2^n}\sum_{P\in S}P$, where $S \le \mathcal{P}_n$ is an abelian subgroup without the negative identity. $S$ is known as the stabilizer group of $\rho$. As it turns out, if $S$ is of order $2^n$ then $\rho$ will be a pure state. This leads to the alternative (and more popular) definition where $|\psi\rangle$, with $\rho = |\psi\rangle\langle\psi|$, is the unique state that is stabilized by $S$. That is, for all $P \in S$, $P|\psi\rangle = |\psi\rangle$. This definition shows why $-I^{\otimes n}$ isn't allowed to be in $S$, since $-I^{\otimes n}$ stabilizes nothing. It also shows why one must restrict the entries of $S$ to only have real phase.
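As a concrete sanity check (our own example, not the paper's), we can build $\rho=\frac{1}{2^n}\sum_{P\in S}P$ for the Bell state's stabilizer group $\langle X\otimes X,\ Z\otimes Z\rangle$ and verify that it is a pure state stabilized by its generators:

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

n = 2
gens = [np.kron(X, X), np.kron(Z, Z)]  # generators of the Bell state's group

# Enumerate the span of the generators: all products of subsets.
span = []
for bits in itertools.product([0, 1], repeat=len(gens)):
    P = np.eye(2 ** n, dtype=complex)
    for b, g in zip(bits, gens):
        if b:
            P = P @ g
    span.append(P)

rho = sum(span) / 2 ** n

assert np.isclose(np.trace(rho).real, 1.0)  # unit trace
assert np.allclose(rho, rho @ rho)          # idempotent rank-1 projector: pure
for g in gens:
    assert np.allclose(g @ rho, rho)        # each generator stabilizes rho
```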

###### Proposition 2.4.

Any abelian subgroup of $\mathcal{P}_n$ that excludes the negative identity cannot contain any Paulis with an imaginary phase.

###### Proof.

Given a Pauli with an imaginary phase, its square would be equal to $-I^{\otimes n}$, making the group contain the negative identity. This is a contradiction. ∎

One of the reasons stabilizer states are so important is this bijection between the stabilizer group of a stabilizer state and the state itself; by simply knowing the generators of the group one can easily reconstruct the state. And since there are at most $n$ generators, if one can efficiently write down the generators themselves then there is a polynomial size representation of a stabilizer state. We now show how one can write down any member of a stabilizer group. Given $P \in S$ with real phase such that $P = \pm P_1 \otimes \dots \otimes P_n$, define a function for each qubit mapping $I \mapsto (0,0)$, $X \mapsto (1,0)$, $Z \mapsto (0,1)$, $Y \mapsto (1,1)$, and concatenate to make a $2n$-bit string. Additionally, have an extra bit for whether the sign is $+$ or $-$. This results in a $(2n+1)$-bit string for each generator, so a stabilizer state requires only $n(2n+1) = O(n^2)$ bits to write down classically.
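A minimal sketch of this encoding (the exact bit conventions here are one reasonable choice, not necessarily the paper's):

```python
# Encode a generator  sign * P_1 (x) ... (x) P_n  as a (2n+1)-bit string:
# per qubit, I -> (0,0), X -> (1,0), Z -> (0,1), Y -> (1,1), plus a sign bit.
def encode(sign, word):
    table = {'I': (0, 0), 'X': (1, 0), 'Z': (0, 1), 'Y': (1, 1)}
    bits = []
    for ch in word:
        bits.extend(table[ch])
    bits.append(0 if sign == +1 else 1)  # final bit records a minus sign
    return bits

s = encode(-1, 'XIZ')
assert len(s) == 2 * 3 + 1
assert s == [1, 0, 0, 0, 0, 1, 1]
```

With $n$ generators this gives the $n(2n+1)$-bit classical description mentioned above.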

### 2.3 Clifford Circuits

###### Definition 2.5.

A Clifford circuit is a unitary $U$ such that $UPU^\dagger \in \mathcal{P}_n$ for all $P \in \mathcal{P}_n$, while ignoring global phase on the unitary. More formally, consider the normalizer of $\mathcal{P}_n$ in the unitary group, and let the Clifford group $\mathcal{C}_n$ be this normalizer modulo global phase.

More informally, a Clifford circuit maps stabilizer states to other stabilizer states.

###### Lemma 2.6.

Aaronson and Gottesman [3] Any Clifford circuit has an equivalent Clifford circuit with $O(n^2)$ gates and parallel depth $O(\log n)$.

Like stabilizer states, generators are an important part of how we deal with Clifford circuits. If we consider how a Clifford circuit acts on the canonical generators of $\mathcal{P}_n$, we find that

$$UX_jU^\dagger=(-1)^{p_j}\prod_{i=1}^n X_i^{\alpha_{ij}}Z_i^{\beta_{ij}}\qquad UZ_jU^\dagger=(-1)^{q_j}\prod_{i=1}^n X_i^{\gamma_{ij}}Z_i^{\theta_{ij}}. \tag{1}$$

A Clifford circuit can then be encoded as a $2n\times 2n$ boolean matrix where column $j$ is given by the $\alpha_{ij}$ and $\beta_{ij}$ and column $n+j$ is given by the $\gamma_{ij}$ and $\theta_{ij}$. However, because commutation relations are preserved, not all possible values of $\alpha,\beta,\gamma,\theta$ are allowed (the $p_j$ and $q_j$ values can be arbitrary). This leads us to the idea of symplectic matrices.

###### Definition 2.7.

A symplectic matrix over $\mathbb{F}_2$ is a $2n\times 2n$ matrix $M$ with entries in $\mathbb{F}_2$ such that

$$M^T\Lambda M=\Lambda,\qquad \Lambda=\begin{pmatrix}0&I_n\\I_n&0\end{pmatrix}. \tag{2}$$

These matrices form the symplectic group $Sp(2n,\mathbb{F}_2)$.

The symplectic matrices preserve the symplectic inner product on $\mathbb{F}_2^{2n}$. It turns out that if we consider the submatrix defined by the first $2n$ rows of our potential encoding of a Clifford circuit, a necessary and sufficient condition for preserving the commutation relations of the generators is for this submatrix to be symplectic, as the images of the canonical generators form what is known as a symplectic basis. Formally, $\mathcal{C}_n/\mathcal{P}_n \cong Sp(2n,\mathbb{F}_2)$, where the Pauli in the divisor determines the $p_j$ and $q_j$ values.
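The symplectic condition is easy to check mechanically. A small sketch, assuming the block convention $\Lambda=\left(\begin{smallmatrix}0&I\\I&0\end{smallmatrix}\right)$ for the symplectic form (other references use an equivalent convention):

```python
import numpy as np

def is_symplectic(M, n):
    """Check M^T L M = L over F2, with L = [[0, I], [I, 0]]."""
    Zero, Id = np.zeros((n, n), dtype=int), np.eye(n, dtype=int)
    Lam = np.block([[Zero, Id], [Id, Zero]])
    return np.array_equal((M.T @ Lam @ M) % 2, Lam)

n = 2
assert is_symplectic(np.eye(2 * n, dtype=int), n)  # the identity is symplectic
# Swapping the X and Z blocks (a Hadamard on every qubit) is also symplectic:
Zero, Id = np.zeros((n, n), dtype=int), np.eye(n, dtype=int)
assert is_symplectic(np.block([[Zero, Id], [Id, Zero]]), n)
```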

### 2.4 CNOT circuits and ⊕L

It is a well known fact that every Clifford circuit can be generated using only $H$, $P$, and CNOT gates as defined below:

$$H=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}\qquad P=\begin{pmatrix}1&0\\0&i\end{pmatrix}\qquad \mathrm{CNOT}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix} \tag{3}$$

We note that $X = HP^2H$. If we restrict to the subset of circuits that are generated by only $X$ and CNOT, we get what are known as CNOT circuits [3], which are a clear subset of Clifford circuits.

###### Definition 2.8 (Aaronson and Gottesman [3]).

The complexity class $\oplus L$ is the class of problems that reduce to simulating a polynomial-size CNOT circuit.

A perhaps more familiar definition for complexity theorists is the class of problems that are solvable by a nondeterministic logarithmic-space Turing machine that accepts if and only if the total number of accepting paths is odd.

Let us now consider the set of all Clifford circuits that map computational basis states to other computational basis states, thereby stabilizing the subgroup $\langle Z_1,\dots,Z_n\rangle$ under conjugation. Very briefly, we will call these classical Clifford circuits, as we will now prove that they are largely equivalent to CNOT circuits.

###### Proposition 2.9.

Let $C$ be an arbitrary classical Clifford circuit. It can be efficiently generated using solely $X$, $P$, and CNOT gates. Moreover, its effect on the computational basis states can be entirely simulated using only $X$ and CNOT.

###### Proof.

Let us first consider what happens to a computational basis state when acted upon by $C$. Referencing Eq. 1, the $\gamma_{ij}$ must be $0$ for all $i$ and $j$, and we will essentially ignore $\alpha$, $\beta$, and $p$ for now, leaving us with $\theta$ and $q$. Let us now view the $\theta_{ij}$ as a matrix $\theta$ over $\mathbb{F}_2$. Since every member of $Sp(2n,\mathbb{F}_2)$ is full rank, $\theta$ must also be full rank. As an example, since the identity circuit is also a classical Clifford circuit, the resulting $\theta$ is the identity matrix. We note that a CNOT from qubit $i$ to qubit $j$ performs the rowsum operation of adding row $i$ to row $j$. Thus it is possible to efficiently construct a circuit with matching $\theta$ using rowsum operations via CNOT gates. To get a matching $q$, one can simply apply an $X$ gate at the beginning of each qubit that has $q_j=1$, since $XZX=-Z$, and the following CNOT gates will not themselves introduce any negative phases. From here, we have already proved the moreover statement.

To prove the full result, we return to the $\alpha$ and $\beta$. We will show that there exists a single unique solution for $\alpha$. Similar to $\theta$ we will define the corresponding matrices for the $\alpha_{ij}$ and $\beta_{ij}$ respectively. Based on Eq. 2, to form a symplectic basis we find that $\alpha^T\theta = I$, since $\gamma = 0$. Clearly $\alpha = (\theta^T)^{-1}$, which is guaranteed to exist, and since $\theta$ is full rank $\alpha$ will also be full rank. To match the $p_j$ values we simply place gates in front of the qubits where $p_j = 1$, similar to the $X$ gates for the $q_j$ values. ∎

Another way of viewing this is that CNOT circuits are the set of all Clifford circuits with $\beta_{ij}=\gamma_{ij}=0$ and $p_j=0$ for all $i$ and $j$. Using the moreover part, CNOT circuits can realize any valid full rank $\theta$.
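A small illustration of this viewpoint (function names are ours): a CNOT from qubit $i$ to qubit $j$ acts on the matrix as the rowsum described in the proof above, and the result is always full rank over $\mathbb{F}_2$ because each CNOT is invertible:

```python
import numpy as np

def cnot_matrix(n, circuit):
    """F2 matrix of a CNOT circuit: CNOT from qubit i to j adds row i to row j."""
    M = np.eye(n, dtype=int)
    for (i, j) in circuit:
        M[j] = (M[j] + M[i]) % 2
    return M

def rank_f2(M):
    """Row rank over F2 via Gaussian elimination."""
    M = M.copy() % 2
    r = 0
    for c in range(M.shape[1]):
        piv = next((k for k in range(r, M.shape[0]) if M[k, c]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]          # swap a pivot row into place
        for k in range(M.shape[0]):
            if k != r and M[k, c]:
                M[k] = (M[k] + M[r]) % 2   # clear the rest of the column
        r += 1
    return r

M = cnot_matrix(3, [(0, 1), (1, 2)])  # CNOT(0->1) then CNOT(1->2)
assert rank_f2(M) == 3                # full rank, as it must be
```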

### 2.5 PAC Learning

The goal of PAC learning is to learn a function relative to a certain distribution of inputs, rather than in an absolute sense. Let's say we want to learn an arbitrary $f$ from some concept class $\mathcal{C}$. If a hypothesis function $h$ matches the true function $f$ on most of the high-probability inputs, then we can say that we have approximately learned $f$. If we can do this with high probability for arbitrary $f\in\mathcal{C}$, then we have probably approximately correctly (PAC) learned $\mathcal{C}$.

###### Definition 2.10.

Let $\mathcal{C}$ be a set of functions $f:\mathcal{X}\to[0,1]$. We say that $\mathcal{C}$ is $(\epsilon,\delta)$-PAC-learnable if there exists a learner that, when given samples of the form $(x,f(x))$ with $x\sim D$ for arbitrary $f\in\mathcal{C}$ and unknown distribution $D$, outputs with probability at least $1-\delta$ a hypothesis $h$ with error satisfying

$$\mathbb{E}_{x\sim D}\left[(f(x)-h(x))^2\right]\le\epsilon.$$

The number of samples used is referred to as the sample complexity, and we refer to the learner as efficient if it can compute $h$ in polynomial time.

From here, one can define two types of learning, based on where $h$ comes from. If $h$ is allowed to be any function that meets the PAC constraints, we refer to this as improper learning. If instead $h\in\mathcal{C}$, we get what is known as proper learning, which will be the focus of this paper. With proper learning, we can then begin to talk about the consistency problem formally.

###### Definition 2.11.

Let $S=\{(x_i,y_i)\}_{i=1}^m$ be a set of labeled samples such that $y_i=f(x_i)$ for some $f\in\mathcal{C}$. Let $\mathrm{Consistency}(\mathcal{C})$ be the problem of finding a function $h\in\mathcal{C}$ that is consistent with all of $S$ (i.e., for all $i$, $h(x_i)=y_i$) if such an $h$ exists, otherwise reject.

Intuitively, given a set of samples the best one can really hope to do is find such an $h$ that gets zero training error and hope that the true error of $h$ is also low. This leads to the idea of generalization, which aims to show that $h(x_i)=f(x_i)$ for all $i$, with $|S|=m$ for suitable $m$, implies

$$\mathbb{E}_{x\sim D}\left[(f(x)-h(x))^2\right]\le\epsilon$$

with high probability. In terms of computational efficiency, this effectively reduces the problem of proper learning to the consistency problem, or an approximation of the consistency problem depending on the value of $\epsilon$. It also bounds the sample complexity to be at most $m$, since solving the consistency problem without computational constraints is always doable in this realizable setting. We will see a formal statement of a generalization theorem in Section 3.

One can also define the decision version of the consistency problem, which is deciding if there even exists an that is consistent with all of . We show that the existence of efficient learning algorithms can imply efficient one-sided error algorithms for the decision version of the consistency problem.

###### Definition 2.12.

Let $\mathrm{DecisionConsistency}(\mathcal{C})$ be the decision version of the consistency problem for $\mathcal{C}$ using at most $m$ samples.

###### Proposition 2.13.

An efficient randomized proper learning algorithm with sufficiently small $\epsilon$ and $\delta$¹ implies $\mathrm{DecisionConsistency}(\mathcal{C})\in\mathrm{RP}$, where $\gamma$ is the minimum non-zero error any hypothesis function can make on a single input.

¹ We abuse notation to signify that $\epsilon$ is a value small enough relative to $\gamma$ and $m$, and likewise for $\delta$.

###### Proof.

For every set of samples $S$ such that $|S|=m$, we can define the distribution $D$ to be the uniform distribution over all $x$ such that $(x,y)\in S$. By coupon collector, if we draw $O(m\log(m/\delta))$ samples then with probability at least $1-\delta$ we will have drawn every item from $S$. Now imagine that there exists some hypothesis $h$ that is not consistent with $S$. Then our error must be at least $\gamma^2/m$, since the inconsistent input has weight $1/m$ under $D$ and contributes squared error at least $\gamma^2$ by the definition of $\gamma$.

Now assume we have some efficient randomized proper learning algorithm for $\mathcal{C}$ with $\epsilon<\gamma^2/m$. When running the learner on an arbitrary $S$, it will see every sample in $S$ with probability at least $1-\delta$. To get error less than $\epsilon$, the learner must then solve the search version of the consistency problem with high enough probability that, on accepting instances, it returns a consistent hypothesis with probability bounded away from zero.

This gives rise to the following algorithm in RP for solving $\mathrm{DecisionConsistency}(\mathcal{C})$. Given samples $S$ with $|S|\le m$, we can run our learning algorithm and pretend that $D$ is what we sampled from to get hypothesis $h$. If $h$ is consistent with $S$ then accept, otherwise reject. On an accepting instance $h$ will be consistent with probability bounded away from zero (which can be amplified by repetition), while on rejecting instances it will never be consistent, so the algorithm will always reject. ∎
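The reduction in this proof can be sketched as follows; `toy_learner` and the two-bit parity concept class are hypothetical stand-ins for an assumed efficient proper learner, not anything the paper supplies:

```python
import random

def rp_consistency(samples, learner, trials=20):
    """One-sided-error wrapper: feed the learner draws from the uniform
    distribution over the samples; accept only if its hypothesis fits them all."""
    for _ in range(trials):
        draws = [random.choice(samples) for _ in range(10 * len(samples))]
        h = learner(draws)
        if all(h(x) == y for (x, y) in samples):
            return True   # a consistent hypothesis certifies a YES instance
    return False          # never accepts when no consistent hypothesis exists

def toy_learner(draws):
    """Proper learner for 2-bit parity functions: return a consistent mask."""
    for mask in range(4):
        f = lambda x, m=mask: bin(x & m).count('1') % 2
        if all(f(x) == y for (x, y) in draws):
            return f
    return lambda x: 0    # fallback when nothing in the class is consistent

S = [(0, 0), (1, 1), (2, 1), (3, 0)]  # labels of the parity of both bits
assert rp_consistency(S, toy_learner)
assert not rp_consistency([(0, 1)], toy_learner)  # no parity maps 0 to 1
```

Note the one-sided error: a rejecting instance can never be accepted, which is exactly the RP-style guarantee used above.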

Informally, it can be possible to go the other way and show that an efficient algorithm for $\mathrm{DecisionConsistency}(\mathcal{C})$ implies an efficient proper learner for $\mathcal{C}$. Namely, if a search-to-decision reduction exists for the consistency problem on $\mathcal{C}$ and a generalization theorem exists for the learning problem on $\mathcal{C}$, then we can also expect to show that an efficient algorithm for the decision problem would imply an efficient proper learner for $\mathcal{C}$. Of particular interest will be NP-Complete problems, which always admit search-to-decision reductions [29].

## 3 PAC Learning Quantum Circuits

Aaronson [1] was the first to try to apply the ideas of PAC learning to learning quantum states, by giving a generalization theorem for quantum states. Here the sample space is the set of all two-outcome measurements, the concept class is the set of quantum states, and samples are of the form $(E,\mathrm{Tr}[E\rho])$. We stress that with this result and the following result, the information is assumed to be classical in nature.² Caro and Datta [19] then gave a similar generalization theorem but for quantum circuits, where the goal is to learn the function

$$f_U(x,y)=\mathrm{Tr}\left[y\,UxU^\dagger\right] \tag{4}$$

such that $U$ belongs to the set of all unitaries on $n$ qudits formed by two-qudit gates of depth at most $\Delta$ and size at most $\Gamma$. In their proof the measurements were rank 1 projections with tensor product structure. We present a modified version of their result to allow arbitrary rank measurements, with an informal proof sketch explaining why our modification works:

² While the astute reader may note that representations of the measurements and states may be impossible to efficiently classically encode simply due to the dimension, for the case of Clifford/CNOT circuits later this will not be an issue.

###### Theorem 3.1.

Caro and Datta [19] Let $\mathcal{X}$ be the set of quantum states on $n$ qudits, and let $\mathcal{Y}$ be the set of all projectors on $n$ qudits. Let $U_*$ be a quantum circuit of 2-qudit quantum unitaries with size $\Gamma$ and depth $\Delta$. Let $D$ be a probability distribution on $\mathcal{X}\times\mathcal{Y}$ unknown to the learner. Let

$$S=\left\{\left((x^{(i)},y^{(i)}),\mathrm{Tr}\left[y^{(i)}U_*x^{(i)}U_*^\dagger\right]\right)\right\}_{i=1}^m$$

be corresponding training data drawn i.i.d. according to $D$. Let $0\le\alpha<\beta\le 1$. Then training data of size

$$m=O\left(\frac{1}{\epsilon}\left(\Delta d^4\Gamma^2\log\Delta\,\log^2\left(\frac{\Delta d^4\Gamma^2\log(\Gamma)}{(\beta-\alpha)\epsilon}\right)+\log\frac{1}{\delta}\right)\right)$$

³ A similar thing can be done with $n$-qudit quantum processes, by simply changing the $d^4$ to $d^8$ in both the product and the logarithm due to the increase in the free parameters of a quantum process on $n$ qudits versus a unitary.

suffice to guarantee that, with probability $1-\delta$ with regard to the choice of the training data, any quantum circuit $U$ of size $\Gamma$ and depth $\Delta$ that satisfies

$$\left|\mathrm{Tr}\left[y^{(i)}U_*x^{(i)}U_*^\dagger\right]-\mathrm{Tr}\left[y^{(i)}Ux^{(i)}U^\dagger\right]\right|\le\alpha\qquad\forall\,1\le i\le m$$

also satisfies

$$\mathbb{E}_{(x,y)\sim D}\left[\left(\mathrm{Tr}\left[y\,U_*xU_*^\dagger\right]-\mathrm{Tr}\left[y\,UxU^\dagger\right]\right)^2\right]\le(1-\epsilon)\beta^2+\epsilon.$$
###### Proof.

The Caro and Datta result shows that with regard to rank 1 projectors $y$, the trace can be described as a polynomial of bounded degree over the entries of $U$. Because every rank $r$ projector is simply the sum of $r$ rank 1 projectors, linearity of the trace tells us that the resulting trace is a linear combination of polynomials of bounded degree. Thus we are again left with a polynomial bounded by the same degree. ∎

Now that we have a formal statement of a generalization theorem, we can also formalize the ideas at the end of Section 2.5 and expand on them.

###### Lemma 3.2.

Let $\mathcal{U}$ be a subset of $n$-qudit quantum unitaries with size $\Gamma$ and depth $\Delta$ and let

$$m=O\left(\frac{1}{\epsilon}\left(\Delta d^4\Gamma^2\log\Delta\,\log^2\left(\frac{\Delta d^4\Gamma^2\log(\Gamma)}{\beta\epsilon}\right)+\log\frac{1}{\delta}\right)\right)$$

be the parameter from Theorem 3.1. If $\mathrm{DecisionConsistency}(\mathcal{U})$ with at most $m$ samples is NP-Complete, then an NP-oracle can be used to efficiently properly learn $\mathcal{U}$.

###### Proof.

Because search-to-decision reductions exist for all NP-Complete problems [29], an oracle for $\mathrm{DecisionConsistency}(\mathcal{U})$ can be used to efficiently solve $\mathrm{Consistency}(\mathcal{U})$. Let us run our algorithm for $\mathrm{Consistency}(\mathcal{U})$ on a sample $S$ such that $|S|=m$. We now have a $U$ such that

$$\left|\mathrm{Tr}\left[y^{(i)}U_*x^{(i)}U_*^\dagger\right]-\mathrm{Tr}\left[y^{(i)}Ux^{(i)}U^\dagger\right]\right|=0\qquad\forall\,1\le i\le m$$

and so by Theorem 3.1

$$\mathbb{E}_{(x,y)\sim D}\left[\left(\mathrm{Tr}\left[y\,U_*xU_*^\dagger\right]-\mathrm{Tr}\left[y\,UxU^\dagger\right]\right)^2\right]\le(1-\epsilon)\beta^2+\epsilon.$$

Naturally, any other oracle for an NP-Complete problem can also be used since they all reduce to one another, thus completing the proof. ∎

###### Corollary 3.3.

Let $\mathcal{U}$ be a subset of $n$-qudit quantum unitaries with size $\Gamma$ and depth $\Delta$ and let

$$m=O\left(\frac{1}{\epsilon}\left(\Delta d^4\Gamma^2\log\Delta\,\log^2\left(\frac{\Delta d^4\Gamma^2\log(\Gamma)}{\beta\epsilon}\right)+\log\frac{1}{\delta}\right)\right)$$

be the parameter from Theorem 3.1. If $\mathrm{DecisionConsistency}(\mathcal{U})$ with at most $m$ samples is NP-Complete and RP = NP, then an RP-oracle can be used to efficiently properly learn $\mathcal{U}$ for arbitrary $\epsilon$ and $\delta$.

###### Proof.

If RP = NP then there must exist an efficient one-sided error randomized algorithm for $\mathrm{DecisionConsistency}(\mathcal{U})$. With probability at least $1/2$, a query to it will output the same thing as an NP-oracle on YES instances. We can efficiently boost our success probability such that the probability that the algorithm fails to perfectly match the NP-oracle on the polynomially many queries to the oracle is less than $\delta/2$, resulting in algorithm $A$.

By Lemma 3.2, if we had an NP-oracle there would then exist an efficient proper learner for $\mathcal{U}$ when given at least

$$m=O\left(\frac{1}{\epsilon}n^4\log n\log\log n\,\log^2\left(\frac{n\log^2 n}{\epsilon}\right)+\frac{1}{\delta}\right)$$

samples. Naturally, this reduction can only use polynomially many calls to the NP-oracle, so with probability at least $1-\delta/2$ our queries to $A$ will be correct. The total probability that the learner outputs a bad hypothesis is at most $\delta/2+\delta/2=\delta$. Thus we have an efficient proper learner for $\mathcal{U}$. ∎

## 4 PAC Learning applied to Clifford Circuits

Because of the works of Rocchetto [37] and Lai and Cheng [32], Clifford circuits are a prime candidate for an efficiently PAC-learnable class of circuits. Additionally, due to Lemma 2.6 one could hope to use Lemma 3.2 with $d=2$, $\Gamma=O(n^2)$, and $\Delta=O(\log n)$ in the same way the main theorem from Aaronson [1] was invoked for learning stabilizer states.

Noting that each Pauli matrix is Hermitian, a very natural way to measure a stabilizer state is in a product basis where we measure each qubit with respect to a Pauli.

###### Definition 4.1.

If $P$ is a Pauli operator, then the two-outcome measurement associated with $P$ is $E_P=\frac{I^{\otimes n}+P}{2}$, and it is referred to as a Pauli measurement.

###### Definition 4.2.

Let the problem of PAC learning Clifford circuits with respect to Pauli measurements be defined as follows. Let $C$ be an unknown Clifford circuit and let $D$ be an unknown joint distribution over both stabilizer states and Pauli measurements. Finally, let samples be given as

$$\left(\rho,E,\mathrm{Tr}\left[EC\rho C^\dagger\right]\right)$$

where $\rho$ and $E$ are a stabilizer state and Pauli measurement jointly drawn from $D$ and represented as classical bit strings using the stabilizer formalism. The goal is to then learn the measurement probabilities up to error $\epsilon$ under the distribution $D$.

A critical part of Rocchetto [37] was noting that the measurement results with Pauli measurements could only have three distinct values:

###### Lemma 4.3.

Let $E_P$ be a Pauli measurement associated to a Pauli operator $P$ and let $\rho$ be an $n$-qubit stabilizer state. Then $\mathrm{Tr}[E_PC\rho C^\dagger]$ can only take on the values $0$, $1/2$, and $1$:

$$\begin{cases}\mathrm{Tr}[E_PC\rho C^\dagger]=1&\text{iff }P\text{ is a stabilizer of }C\rho C^\dagger;\\\mathrm{Tr}[E_PC\rho C^\dagger]=1/2&\text{iff neither }P\text{ nor }-P\text{ is a stabilizer of }C\rho C^\dagger;\\\mathrm{Tr}[E_PC\rho C^\dagger]=0&\text{iff }-P\text{ is a stabilizer of }C\rho C^\dagger.\end{cases}$$
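These three values are easy to verify numerically for a tiny example, writing $E_P=\frac{I+P}{2}$, taking $C$ to be the identity, and $\rho=|0\rangle\langle 0|$, whose stabilizer group is $\{I,Z\}$:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def meas(P):
    """Two-outcome Pauli measurement E_P = (I + P) / 2."""
    return (np.eye(P.shape[0]) + P) / 2

rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|, stabilized by Z

assert np.isclose(np.trace(meas(Z) @ rho).real, 1.0)   # Z stabilizes rho
assert np.isclose(np.trace(meas(-Z) @ rho).real, 0.0)  # -Z is a stabilizer of nothing here, so -Z in the group of -rho case gives 0
assert np.isclose(np.trace(meas(X) @ rho).real, 0.5)   # neither X nor -X stabilizes
```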

Now let $H$ be the stabilizer group of $\rho$. From this, we can gather that if $\mathrm{Tr}[E_PC\rho C^\dagger]=1$ then $C^\dagger PC\in H$, and if $\mathrm{Tr}[E_PC\rho C^\dagger]=0$ then $C^\dagger PC\in -H$, where $-H=\{-Q:Q\in H\}$. Finally, if $\mathrm{Tr}[E_PC\rho C^\dagger]=1/2$ then $C^\dagger PC$ is in the complement of $H\cup -H$. If the measurement appears multiple times, we can gather further information. For instance, let

$$S=\left\{\left(\rho_i,E_P,\mathrm{Tr}\left[E_PC\rho_iC^\dagger\right]\right)\right\}$$

be the set of all samples such that $E_P$ is the measurement taken, and let $G_i$ be the stabilizer group of each $\rho_i$. Based on each sample we know that $C^\dagger PC$ must lie in some set $H_i$ determined by $G_i$ and the label as above, and thus must lie in $\bigcap_i H_i$. To actually be a Clifford circuit, we must also add the constraint that $C^\dagger PC\neq I^{\otimes n}$, giving us

$$C^\dagger PC\in\left(\bigcap_iH_i\right)\setminus\{I^{\otimes n}\}.$$

The problem of finding a Clifford circuit with zero training error then reduces to the search problem of finding a set of $\alpha,\beta,\gamma,\theta,p,q$ from Eq. 1 representing a $C$ that is consistent with all of these constraints while remaining symplectic according to Eq. 2. Let $\mathcal{C}_n$ be the set of Clifford circuits. We will call this problem $\mathrm{Consistency}(\mathcal{C}_n)$. Due to Gottesman-Knill [26, 3] showing that Clifford circuits are classically simulable, the act of verifying that we have a circuit with zero training error is efficient, meaning that the decision version of the problem is in NP:

###### Proposition 4.4.

The decision problem, $\mathrm{DecisionConsistency}(\mathcal{C}_n)$, of deciding if there exists a Clifford circuit consistent with a polynomially sized sample $S$ is in NP.

###### Proof.

Given a set of $\alpha,\beta,\gamma,\theta,p,q$, it is easy to check that they form a symplectic matrix by checking with respect to Eq. 2. Checking that they are consistent with the samples in $S$ can be done by iterating through $S$, since the trace can be computed efficiently using Gottesman-Knill. ∎

Knowing this, we find that $\mathrm{DecisionConsistency}(\mathcal{C}_n)\in\mathrm{NP}$. This property extends to the analogous problems for CNOT circuits, since one can also efficiently verify that $\beta_{ij}=\gamma_{ij}=0$ and $p_j=0$ for all $i$ and $j$.

## 5 Generating Samples with Certain Constraints

We will now show how we can use samples from PAC learning to generate certain kinds of constraints. It will suffice to only consider CNOT circuits with computational basis states as inputs and measurements of the form $E_P=\frac{I^{\otimes n}+P}{2}$ for $Z$-type Paulis $P$. The net effect of this is that from a PAC learning standpoint, for an unknown CNOT circuit $C$ we only need to figure out a set of $\theta_{ij}$ and $q_j$ that is consistent with the samples as described in Section 4. Since we will never be tested on a measurement with some component of $X$ involved, this is equivalent to finding the $\theta$ and $q$ values from Eq. 1 of $C$. We will again choose to view the $\theta_{ij}$ as the matrix $\theta$, such that $\theta$ must be full rank.

###### Definition 5.1.

Given a set of abelian generators $P_1,P_2,\dots,P_n$, let

$$\rho(P_1,P_2,\dots,P_n)=\frac{1}{2^n}\sum_{P\in\mathrm{Span}(\{P_i\})}P$$

be the stabilizer state that is formed from that stabilizer group.

###### Lemma 5.2.

Let $C$ be a CNOT circuit on $n$ qubits and let $v+\mathrm{Span}(w)$ be a one-dimensional affine subspace of column vectors such that $w\neq 0$ and $v\notin\mathrm{Span}(w)$. Given an arbitrary Pauli $P$, there exists a set of samples that constrains $C^\dagger PC$ to only have consistent solutions lying in $\{Z_u:u\in v+\mathrm{Span}(w)\}$. Furthermore these samples can be efficiently generated.

###### Proof.

Let $\{v,w,v_3,v_4,\dots,v_n\}$ be an arbitrary basis for $\mathbb{F}_2^n$ containing $v$ and $w$; such a basis can be found efficiently, e.g. by completing with random vectors, which takes $O(n)$ tries in expectation. Recalling Definition 2.3, let us start by creating the sample

$$\left(\left(\rho(Z_v,Z_w,Z_{v_3},Z_{v_4},\dots,Z_{v_n}),\frac{I^{\otimes n}+P}{2}\right),1\right),$$

which limits $C^\dagger PC$ to be a $Z$-type Pauli with positive phase.

We can create the set of samples:

$$\begin{aligned}&\left(\left(\rho(Z_v,Z_w,-Z_{v_3},Z_{v_4},\dots,Z_{v_n}),\tfrac{I^{\otimes n}+P}{2}\right),1\right),\\&\left(\left(\rho(Z_v,Z_w,Z_{v_3},-Z_{v_4},\dots,Z_{v_n}),\tfrac{I^{\otimes n}+P}{2}\right),1\right),\\&\qquad\vdots\\&\left(\left(\rho(Z_v,Z_w,Z_{v_3},Z_{v_4},\dots,-Z_{v_n}),\tfrac{I^{\otimes n}+P}{2}\right),1\right).\end{aligned}$$

By construction $C^\dagger PC$ cannot have any component of $v_3$ because of the first sample of this set, nor any $v_i$ for $i>3$ due to the remaining samples. This leaves $C^\dagger PC$ to be one of $Z_v$, $Z_w$, or $Z_{v+w}$ (since it cannot be identity). To remedy this, we can introduce the final sample:

$$\left(\left(\rho(-Z_v,Z_w,Z_{v_3},Z_{v_4},\dots,Z_{v_n}),\frac{I^{\otimes n}+P}{2}\right),0\right),$$

which then eliminates $Z_w$ (and identity, due to the negative sign). The total number of samples is $n$ and the whole process takes polynomial time. ∎

We can easily extend this to the $k$-dimensional case by simply treating $\mathrm{Span}(w)$ as $\mathrm{Span}(w_1,\dots,w_k)$, using an extra sample to remove the last dimension. More importantly, let's say we've constrained $C^\dagger Z_uC$ to lie in $\{Z_x:x\in v+\mathrm{Span}(w)\}$. The effect of this on $\theta$ is that if we sum the columns $\theta_i$ where $u_i=1$, then the sum must lie in $v+\mathrm{Span}(w)$.

###### Corollary 5.3.

Let

$$V+\mathrm{Span}(W)=\begin{bmatrix}|&|&&|\\v_1&v_2&\dots&v_k\\|&|&&|\end{bmatrix}+\mathrm{Span}\left(\begin{bmatrix}|&|&&|\\w_1&w_2&\dots&w_k\\|&|&&|\end{bmatrix}\right)$$

be a one-dimensional affine subspace of matrices over $\mathbb{F}_2$ such that for all $i$, $w_i\neq 0$ and $v_i\notin\mathrm{Span}(w_i)$. Finally, let $\theta'$ be an arbitrary $n\times k$ submatrix of $\theta$. Then there exists a set of samples that constrains $\theta'$ to only have consistent solutions lying in $V+\mathrm{Span}(W)$ for CNOT circuit $C$. Furthermore these samples can be efficiently generated.

###### Proof.

WLOG, we will let the set of columns we choose for $\theta'$ be columns $1$ through $k$. We will use induction on $k$ to prove this corollary, with the base case covered by Lemma 5.2. Now let us assume that we have samples that constrain columns $2$ through $k$ to lie in

$$\begin{bmatrix}|&|&&|\\v_2&v_3&\dots&v_k\\|&|&&|\end{bmatrix}+\mathrm{Span}\left(\begin{bmatrix}|&|&&|\\w_2&w_3&\dots&w_k\\|&|&&|\end{bmatrix}\right).$$

The goal will be to generate constraints such that if column $2$ is $v_2$ then column $1$ must be $v_1$. Otherwise, if column $2$ is $v_2+w_2$ then column $1$ is constrained to be $v_1+w_1$. To start us off, Lemma 5.2 lets us constrain column $1$ to lie in $v_1+\mathrm{Span}(w_1)$. We can then use Lemma 5.2 again to constrain the sum of columns $1$ and $2$ to lie in $(v_1+v_2)+\mathrm{Span}(w_1+w_2)$. If we focus on columns $1$ and $2$, the solutions to this specific constraint lie in an affine subspace that starts with either

$$\begin{bmatrix}|&|\\v_1+v_2&\vec0\\|&|\end{bmatrix}\qquad\text{or}\qquad\begin{bmatrix}|&|\\v_1+w_1+v_2+w_2&\vec0\\|&|\end{bmatrix}$$

and then adds the same vector to both columns. To lie in the intersection of the solutions we already have, we will need to set column $2$ to either be $v_2$ or $v_2+w_2$ such that the whole set of columns from $2$ to $k$ lies in either

$$\begin{bmatrix}|&|&&|\\v_2&v_3&\dots&v_k\\|&|&&|\end{bmatrix}\qquad\text{or}\qquad\begin{bmatrix}|&|&&|\\v_2+w_2&v_3+w_3&\dots&v_k+w_k\\|&|&&|\end{bmatrix}$$

as described before.

as described before. The only way to get to the first value is to add

 ⎡⎢⎣||v2v2||⎤⎥⎦

to either starting point to get

 ⎡⎢⎣||v1v2||⎤⎥⎦and⎡⎢⎣||v1+w1+w2v2||⎤⎥⎦

respectively. Examining the constraints on the first column, since so if the second column is then the first column must be as desired.

To get $v_2+w_2$ as the value in the second column we instead need to add

$$\begin{bmatrix}|&|\\v_2+w_2&v_2+w_2\\|&|\end{bmatrix}$$

to either starting point to get

$$\begin{bmatrix}|&|\\v_1+w_2&v_2+w_2\\|&|\end{bmatrix}\qquad\text{and}\qquad\begin{bmatrix}|&|\\v_1+w_1&v_2+w_2\\|&|\end{bmatrix}$$

respectively.

respectively. Again, because . Thus if the second column is then the first column must be .

Collectively, we achieve our goal of constraining the entire solution to lie in $V+\mathrm{Span}(W)$. We used $n$ samples at the first step and $2n$ for every inductive step after, giving us a total of $O(nk)$ samples. Since each step was efficient, the whole process takes polynomial time to generate all of the samples. ∎

## 6 On the NP-Completeness of NonSingularity

###### Definition 6.1.

Given matrices $M_1,\dots,M_k$ over some field $\mathbb{F}$, NonSingularity is the problem of deciding if there exists $\lambda\in\mathbb{F}^k$ such that $\sum_i\lambda_iM_i$ results in a non-singular matrix.
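For intuition, here is a brute-force decider over $\mathbb{F}_2$, where trying every $\lambda$ means trying every subset (exponential in $k$; Theorem 6.2 below says this is essentially unavoidable). The function names are our own:

```python
import itertools
import numpy as np

def rank_f2(M):
    """Row rank of a 0/1 matrix over F2 via Gaussian elimination."""
    M = M.copy() % 2
    r = 0
    for c in range(M.shape[1]):
        piv = next((k for k in range(r, M.shape[0]) if M[k, c]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]
        for k in range(M.shape[0]):
            if k != r and M[k, c]:
                M[k] = (M[k] + M[r]) % 2
        r += 1
    return r

def nonsingularity_bruteforce(mats):
    """Try all 2^k choices of lambda over F2; accept if some sum is full rank."""
    n = mats[0].shape[0]
    for bits in itertools.product([0, 1], repeat=len(mats)):
        A = sum(int(b) * M for b, M in zip(bits, mats)) % 2
        if rank_f2(A) == n:
            return True
    return False

M1 = np.array([[1, 0], [0, 0]])
M2 = np.array([[0, 0], [0, 1]])
assert nonsingularity_bruteforce([M1, M2])   # M1 + M2 = I is non-singular
assert not nonsingularity_bruteforce([M1])   # 0 and M1 are both singular
```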

###### Theorem 6.2 (Buss et al. [17]).

NonSingularity over $\mathbb{F}_2$ is NP-Complete.

The high level idea of the proof is to first reduce a 3SAT instance over $n$ variables to solving an arithmetic formula $\phi$. The formula is then turned into a weighted directed graph whose adjacency matrix $A$ has a determinant that is equal to the formula $\phi$, where $A$ has entries from $\{0,1,x_1,\dots,x_n\}$, and can thus be viewed as an affine subspace over $\mathbb{F}_2$.

While we will not prove the correctness of this statement, we will want to ascertain exactly what kind of are formed through the reduction. We now describe the construction of the graph (see Fig. 1 and Fig. 2 for relevant illustrations):