1 Introduction
A large body of recent work has tried to identify concrete machine learning (ML) tasks for which quantum machine learning (QML) methods could demonstrate a well-defined and meaningful advantage over classical methods. In particular, it is known that if one allows finely tuned and highly structured datasets, as well as special-purpose quantum learning algorithms (i.e., learners designed specifically for the finely tuned task), then there do indeed exist problems for which quantum learners can obtain meaningful advantages arunachalam2017guest,liu2021rigorous, Sweke2021quantumversus. However, ideally one would like to demonstrate that one can obtain an advantage for practically relevant problems, using “generic” quantum learning algorithms, preferably those which can be executed on near-term devices in the hybrid quantum-classical framework bharti2021noisy,cerezo2020variational,benedetti2019parameterized. While a large proportion of recent QML research has focused on supervised learning, one area that has seemed particularly promising for demonstrating such quantum/classical separations is unsupervised generative modelling.
In an unsupervised generative modelling problem, one is given some type of oracle access to the unknown target distribution. The goal of the learning algorithm is to output, with high probability, an approximate generator for the target distribution, i.e., an algorithm for generating samples from some distribution which is sufficiently close to the target distribution Sweke2021quantumversus, Kearns:1994:LDD:195058.195155. Many highly relevant practical ML problems are of this type, and as such the development and application of classical methods for this problem, such as generative adversarial networks (GANs) goodfellow2014generative, variational autoencoders kingma2013auto and normalizing flows Kobyzev_2020, is a highly active research topic. Given this, the development of quantum models and algorithms for generative modelling is of natural interest, and a variety of approaches, such as quantum circuit Born machines (QCBMs) coyle2020born, Liu_2018, Benedetti_2019, Gaoeaat9004, quantum GANs Dallaire_Demers_2018,Hueaav2761, lloyd2018quantum, chakrabarti2019quantum and quantum Hamiltonian-based models verdon2019quantum, have been proposed and implemented rudolph2020generation.
QCBMs are a particularly promising class of models, which are based on the simple observation that measuring the output state vector $|\psi_U\rangle = U|0^{\otimes n}\rangle$ of a quantum circuit $U$, in the computational basis, provides a sample from the “Born distribution” $P_U$ defined by the circuit, i.e., the distribution over bit strings $x \in \{0,1\}^n$ for which
$$P_U(x) = |\langle x | U | 0^{\otimes n} \rangle|^2 . \qquad (1)$$
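As a concrete illustration of Eq. (1), the following minimal sketch computes the Born distribution of a small circuit by brute-force state-vector simulation and then samples from it, which mimics measuring in the computational basis. The gate matrices, the qubit ordering (qubit 0 as the leftmost bit) and the function names are illustrative choices. The approach takes time exponential in the number of qubits, so it is only meant for a handful of qubits.

```python
import numpy as np

# Gate matrices; qubit 0 is the leftmost (most significant) bit of x.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])  # control qubit 0, target qubit 1


def born_distribution(U):
    """Return P_U(x) = |<x|U|0...0>|^2 for all x, in lexicographic order."""
    zero_state = np.zeros(U.shape[0])
    zero_state[0] = 1.0
    psi = U @ zero_state
    return np.abs(psi) ** 2


def sample_born(U, num_samples, rng):
    """Mimic computational-basis measurements of U|0...0> by sampling from P_U."""
    probs = born_distribution(U)
    n = int(np.log2(len(probs)))
    outcomes = rng.choice(len(probs), size=num_samples, p=probs)
    return [format(int(i), f"0{n}b") for i in outcomes]


# Two-qubit example: Hadamard on qubit 0, then CNOT, prepares a Bell state.
U_bell = CNOT @ np.kron(H, I2)
print(born_distribution(U_bell))                          # approximately [0.5, 0, 0, 0.5]
print(sample_born(U_bell, 5, np.random.default_rng(0)))   # only '00' and '11' can occur
```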
Given this observation, QCBM based generative modelling algorithms typically work by iteratively updating the parameters $\theta$ of a parameterized quantum circuit $U(\theta)$, until the Born distribution $P_{U(\theta)}$ of the circuit matches as closely as possible, with respect to some loss function, the unknown target distribution coyle2020born,Liu_2018,Benedetti_2019,Gaoeaat9004.
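The iterative update loop can be illustrated with a deliberately tiny, hypothetical example. We assume a "circuit" consisting only of single-qubit rotations $R_Y(\theta_i)$ applied to $|0\rangle$, for which $P(x_i = 1) = \sin^2(\theta_i/2)$, and a simple loss that only matches single-bit marginals. This is not the entangling circuit or the loss function used in the cited works. It only shows the structure of the loop, and that the target distribution enters solely through sample means.

```python
import numpy as np


def model_marginals(theta):
    """P(x_i = 1) for the product state prod_i R_Y(theta_i)|0>, i.e. sin^2(theta_i / 2)."""
    return np.sin(theta / 2) ** 2


def train_product_qcbm(target_samples, steps=500, lr=0.5, seed=0):
    """Gradient descent on sum_i (P_theta(x_i = 1) - Q(x_i = 1))^2.

    The unknown target Q enters only through the sample means of the bits,
    i.e. through estimated expectation values.
    """
    rng = np.random.default_rng(seed)
    target_means = target_samples.mean(axis=0)
    theta = rng.uniform(0.0, np.pi, size=target_samples.shape[1])
    for _ in range(steps):
        p = model_marginals(theta)
        # d/dtheta sin^2(theta/2) = sin(theta)/2, so the loss gradient is
        # 2 (p - q) * sin(theta) / 2 = (p - q) * sin(theta).
        grad = (p - target_means) * np.sin(theta)
        theta = theta - lr * grad
    return theta


# Toy target: two independent bits with P(x_0 = 1) = 0.2 and P(x_1 = 1) = 0.8.
rng = np.random.default_rng(1)
target = (rng.random((2000, 2)) < np.array([0.2, 0.8])).astype(float)
theta = train_product_qcbm(target)
print(model_marginals(theta), target.mean(axis=0))  # the two should nearly agree
```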
In light of the known hardness of classically simulating certain classes of local quantum circuits Bremner_2010,terhal2002adaptive,aaronson2016complexitytheoretic, some recent works have conjectured or provided numerical evidence for the classical hardness of the generative modelling problem associated with QCBMs coyle2020born,niu2020learnability. More specifically, these works have suggested that learning a generator for the Born distributions of local quantum circuits, when given access to samples from such distributions, may be hard for classical learning algorithms. In contrast, it seems natural to conjecture that this particular generative modelling problem is computationally feasible for QCBM based learning algorithms. This is because such algorithms use parameterized local quantum circuits as generators by construction, and need only identify the correct circuit parameters. A separation between the power of quantum and classical generative modelling algorithms has in fact already been established, using a highly fine-tuned concept class and learning algorithm Sweke2021quantumversus. However, the hope has been that one could demonstrate a similar separation using a generic QCBM based learner by considering the learnability of QCBM distributions themselves. Moreover, as quantum circuit Born machines are known to be highly expressive glasser2019expressive, the hope has been that such a quantum/classical separation might translate to a quantum advantage for practical generative modelling problems. Indeed, when deciding to use a QCBM for a practical probabilistic modelling problem, one implicitly assumes that the target distribution can be well approximated by the Born distribution of some local quantum circuit. It is therefore well motivated to try to prove a separation between the power of QCBM based algorithms and classical algorithms for learning the output distributions of QCBMs themselves.
However, in order to demonstrate such a separation via QCBMs one requires two results: Firstly, a rigorous proof of the classical hardness of the generative modelling problem associated with the output distributions of local quantum circuits, and secondly, a rigorous proof of the efficiency of QCBM based algorithms for the same task.
1.1 Overview of this work
Motivated by these questions, we study in this work the learnability of the output distributions of local quantum circuits within the probably approximately correct (PAC) framework for probabilistic modelling Kearns:1994:LDD:195058.195155,Sweke2021quantumversus. Since its introduction, Valiant’s model of PAC learning valiant1984theory, along with a variety of natural extensions and modifications, has provided a fruitful framework for studying both the computational and statistical aspects of machine learning kearns1994introduction,shalev2014understanding, and for the rigorous comparison of quantum and classical learning algorithms arunachalam2017guest. In addition to providing results for generative modelling, we also study the related problem of density modelling. In this setting, the goal of the learner is not to generate new samples from the target distribution $P$, but to output, with high probability, a sufficiently accurate algorithm for evaluating the probabilities of events with respect to $P$, i.e., an algorithm which, when given an event $x$, outputs the associated probability $P(x)$. We refer to such an algorithm as an evaluator for the target distribution.
Moreover, we study both of these probabilistic modelling problems with respect to two different models of access to the unknown target distribution. The first model we call the sample model, as we assume in this model that the learner has access to a sample oracle which provides samples from the unknown target distribution. The second model is the statistical query (SQ) model, which was originally introduced by Kearns in Ref. kearns1998efficient as a natural restriction of the sample model, and which, in the context of supervised learning of Boolean functions, guarantees noise robustness of the associated learning algorithm. In the SQ model, learners do not have access to samples from the target distribution, but only to approximations of averaged statistical properties of the unknown target distribution. More specifically, learners have access to an SQ oracle which, when queried with a function, provides an approximation to the expectation value of the output of that function, with respect to inputs drawn from the unknown target distribution. Since the SQ model is a strict restriction of the sample model, hardness of learning in the SQ model does not imply hardness of learning in the sample model. Still, within the probabilistic modelling context, hardness results in the SQ model are of interest for two important reasons. Firstly, the SQ model provides a natural way to restrict one’s attention to learning algorithms which, if given access to a sample oracle, always use their samples from the target distribution to calculate approximate expectation values of functions via sample mean estimates. As we will show, many generic implicit generative modelling algorithms, i.e., those which are not designed to exploit a particular structure in the target distribution class, are of this type, including those for training quantum circuit Born machines Liu_2018,coyle2020born,mohamed2017learning. As such, hardness results in the SQ model apply to many implicit generative modelling algorithms of practical interest, and in particular to those which are often used for the concept class of interest in this work. Secondly, while it is often easier to obtain lower bounds on the query complexity of learning algorithms in the SQ model, via constructive quantities known as statistical dimensions, there are very few examples of learning problems which are known to be hard in the SQ model but easy in the sample model Feldman2016,feldman2017general. As such, hardness in the SQ model is often taken as strong evidence for hardness in the sample model.
In summary, we study in this work the following problems, which are stated more formally in Section 3:
Problems: PAC probabilistic modelling of quantum circuit Born machines (informal).
Let $\mathcal{C}$ be the set of output distributions corresponding to a class of local quantum circuits. Given either sample-oracle or SQ-oracle access to some unknown distribution $P \in \mathcal{C}$, output, with high probability, either
(i) generative modelling: an efficient generator, or
(ii) density modelling: an efficient evaluator,
for a distribution $P'$ which is sufficiently close to $P$.
If there exists either a sample-efficient or a computationally efficient algorithm which, with respect to either the sample oracle or the SQ oracle, solves the generative (density) modelling problem associated with a given set of distributions $\mathcal{C}$, then we say that $\mathcal{C}$ is sample-efficiently or computationally efficiently generator (evaluator) learnable within the relevant oracle model. We are particularly interested in this work in establishing the existence or non-existence of efficient quantum or classical learning algorithms for the output distributions of various classes of local quantum circuits, within both the sample and statistical query models.
1.2 Main results
Given this motivation and context, we provide two main results which, stated informally, are as follows:
Result 1 (Informal version of Corollary 1).
The concept class consisting of the output distributions of super-logarithmic depth nearest neighbour Clifford circuits is not sample-efficiently PAC generator-learnable or evaluator-learnable in the statistical query model.
Result 2 (Informal version of Theorem 2).
The concept class consisting of the output distributions of nearest neighbour Clifford circuits is both sample-efficiently and computationally efficiently classically PAC generator-learnable and evaluator-learnable in the sample model.
These results provide some first concrete insights into the learnability of the output distributions of local quantum circuits from a probabilistic modelling perspective, and are of interest for a variety of reasons. Firstly, we note that Result 1 applies not just to Clifford circuits: it implies the hardness of learning the output distributions of any nearest neighbour quantum circuit whose gates come from some gate set which includes the two-qubit Clifford group. However, we choose to stress the special case of local Clifford circuits in our statement of Result 1, as it allows us to highlight the fact that the generative modelling problem associated with a class of local quantum circuits can be hard, even when the class of circuits is efficiently classically simulatable! More specifically, local Clifford circuits are known to be efficiently classically simulatable, in the sense that given a description of the quantum circuit, there exist classically efficient algorithms both to evaluate the probabilities of events and to sample from the associated Born distribution gottesman1998heisenberg,Aaronson_2004. As such, while the probabilistic modelling problems we consider are naturally analogous to classical simulation problems, but with SQ access to the distribution as input rather than a circuit description, our first result establishes that learning both generators and evaluators for the output distribution of a local quantum circuit from statistical queries can be hard, even when outputting a generator or an evaluator from a circuit description can be done efficiently.
Secondly, we stress that as Result 1 provides a query complexity lower bound, it holds for both quantum and classical learners. As such, this result directly implies that, at least in the statistical query model, one cannot use the concept class of local quantum circuit output distributions to demonstrate a meaningful separation between the power of quantum and classical generative modelling algorithms. More specifically, as mentioned before, any such separation requires both a classical hardness result, i.e., a proof that a given concept class is not efficiently learnable via classical learning algorithms, and a quantum learnability result, i.e., an explicit efficient quantum learning algorithm for the given concept class. However, our work establishes that, at least in the SQ model, efficient quantum learnability of the output distributions of (super-logarithmically deep) local quantum circuits is not possible, even for classically simulatable circuit classes.
This result therefore provides a direct obstacle to the goal of proving an exponential quantum advantage for generative modelling via QCBMs, as (a) learning algorithms for QCBMs typically use statistical queries, and (b) the concept class of output distributions of local quantum circuits is certainly the most natural set of distributions with which to try to prove an advantage for QCBMs.
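To make point (a) concrete, consider the maximum mean discrepancy (MMD), which is one loss used for training QCBMs (e.g. Liu_2018). For a kernel $k$, the only term of $\mathrm{MMD}^2$ that depends on both the model parameters and the unknown target $Q$ is $\mathbb{E}_{x \sim P_\theta,\, y \sim Q}[k(x,y)]$. For each fixed model sample $x$, the function $y \mapsto k(x,y)$ is bounded, so its expectation under $Q$ is exactly a statistical query. The sketch below uses a hypothetical Gaussian kernel on Hamming distance and hypothetical function names, and computes this term using nothing but such queries, here answered by sample means.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on bit strings: exp(-hamming(x, y) / (2 sigma^2)), in (0, 1]."""
    hamming = np.sum(x != y)
    return float(np.exp(-hamming / (2 * sigma ** 2)))


def sq_oracle_from_samples(target_samples):
    """Answer a statistical query phi by the sample mean of phi over target samples."""
    def oracle(phi):
        return float(np.mean([phi(y) for y in target_samples]))
    return oracle


def mmd_cross_term(model_samples, sq_oracle, sigma=1.0):
    """Estimate E_{x ~ model, y ~ target}[k(x, y)] using only statistical queries:
    for each model sample x, query the bounded function y -> k(x, y)."""
    answers = [sq_oracle(lambda y, x=x: gaussian_kernel(x, y, sigma))
               for x in model_samples]
    return float(np.mean(answers))


target = np.array([[1, 1]] * 10)    # stand-in for samples from the target Q
model = np.array([[1, 1], [0, 0]])  # stand-in for samples from the QCBM
print(mmd_cross_term(model, sq_oracle_from_samples(target)))  # (1 + exp(-1)) / 2, about 0.684
```

The learner never inspects an individual target sample directly; swapping `sq_oracle_from_samples` for any oracle with tolerance $\tau$ leaves the training loop unchanged.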
Additionally, as mentioned before, hardness results in the SQ model are often taken as strong evidence for computational hardness in the sample model. However, as Result 2 covers all local Clifford circuits, and in particular those of super-logarithmic depth, we see that the distribution concept class of Result 1 provides an interesting example of a generative modelling problem which is hard in the SQ model, but computationally efficient in the sample model. It is important, however, to stress that, in order to exploit individual samples from the target distribution, the efficient learning algorithm implied by Result 2 relies heavily on knowledge of the algebraic structure of stabilizer states (the output states of Clifford circuits). As such, it remains an interesting open problem to understand whether the output distributions of more generic local quantum circuits are also learnable in the sample model, despite being hard to learn in the SQ model.
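To give a flavour of the algebraic structure mentioned above: assuming, as stated in Section 1.3, that the Born distribution of a Clifford circuit is uniform over an affine subspace $a \oplus V$ of $\mathbb{F}_2^n$, one straightforward way to recover a description of that subspace from samples is to take the first sample as the shift, XOR it onto the remaining samples, and run Gaussian elimination over $\mathbb{F}_2$. This is a minimal sketch under that assumption, not necessarily the algorithm used to prove Result 2. Bit strings are encoded as Python integers, and all names are hypothetical. If the span found so far is a proper subspace of $V$, each fresh difference lies outside it with probability at least $1/2$, so $O(n)$ samples suffice with high probability.

```python
from itertools import product


def gf2_basis(vectors):
    """Linearly independent GF(2) vectors (int bitmasks) spanning `vectors`,
    each with a distinct leading bit."""
    basis = []
    for v in vectors:
        # Reduce v by basis vectors in order of decreasing leading bit;
        # v ^ b < v exactly when v contains the leading bit of b.
        for b in sorted(basis, reverse=True):
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return basis


def learn_affine_subspace(samples):
    """Return (shift, basis) describing the affine subspace shift + span(basis)."""
    shift = samples[0]
    differences = [x ^ shift for x in samples[1:]]
    return shift, gf2_basis(differences)


def enumerate_affine(shift, basis):
    """All points of shift + span(basis); exponential in len(basis), for checking only."""
    points = set()
    for coeffs in product([0, 1], repeat=len(basis)):
        x = shift
        for c, b in zip(coeffs, basis):
            if c:
                x ^= b
        points.add(x)
    return points


# Samples from the uniform distribution over 0b1000 + span{0b0011, 0b0101} in F_2^4.
samples = [0b1000, 0b1011, 0b1101, 0b1110, 0b1011, 0b1000]
shift, basis = learn_affine_subspace(samples)
print(sorted(enumerate_affine(shift, basis)))  # [8, 11, 13, 14]
```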
Finally, we stress that while our work provides some first concrete insights into the learnability of the output distributions of local quantum circuits, there remain a variety of interesting open questions. In particular, there are many combinations of circuit depth, gate type, oracle model, and learner type which are not covered by our results. In light of this, we provide in Section 7 a detailed description of some of the open questions prompted by this work, along with multiple explicit conjectures.
1.3 Proof Techniques
Finally, before proceeding, we briefly mention some of the proof techniques involved in establishing our results. For Result 1, we exploit a fundamental conceptual observation from property testing, namely that testing properties of an object is typically no harder than learning the object, and as such one can often lower bound the query complexity of a learning problem by lower bounding the query complexity of a suitable property testing problem canonne2020survey,goldreich2017introduction. In our case, we observe that lower bounds on the query complexity of identity testing, with the additional promise that the unknown distribution is from the concept class to be learned, are sufficient to prove query complexity lower bounds for the probabilistic modelling problems we are interested in. By phrasing the problem of identity testing with a promise as a decision problem (as defined in Ref. feldman2017general), we are then able to exploit existing results from Feldman feldman2017general, who has shown that the query complexity of any decision problem in the SQ model can be completely characterized by the randomized statistical dimension
of the problem. As such, our main technical result is a lower bound for the randomized statistical dimension of a suitably constructed decision problem. Specifically, this decision problem encodes the problem of testing the identity of some distribution, which is promised to be the output distribution of a local quantum circuit. In order to obtain this lower bound, we rely on techniques for calculating moments over the unitary group low2010pseudorandomness,hunterjones2019unitary,barak2021spoofing,dalzell2020random. For Result
2, we exploit the known relationship between stabilizer states (the output states of Clifford circuits) and affine subspaces of $\mathbb{F}_2^n$, the $n$-dimensional vector space over the finite field of two elements. More specifically, we use the observation that the Born distribution of any Clifford circuit is the uniform distribution over some affine subspace of $\mathbb{F}_2^n$ Dehaene_2003,montanaro2017learning. We then show that one can efficiently recover a description of an affine subspace of $\mathbb{F}_2^n$ when given samples from the uniform distribution over that space.
1.4 Structure of this work
This work is structured as follows: We begin by discussing in Section 2 the relation of our work to existing work. In particular, we use this section to stress the distinctions between the problem we consider and a variety of related problems in quantum computing and computational learning theory, and to make clear why the results we obtain here do not follow immediately from known results in related areas. With this context in hand, we then formally introduce in Section 3 all the preliminaries necessary for this work. In particular, we define the PAC model for distribution learning, the distribution concept class of local quantum circuits, decision problems in the SQ model, and fundamental linear algebra over $\mathbb{F}_2$. These preliminaries allow us to present Result 1 in Section 4 and Result 2 in Section 6. Finally, we conclude in Section 7 with a discussion and an explicit list of open questions and conjectures.
2 Relation to existing work
The probabilistic modelling problems we study in this work – defined informally in Section 1.1 – are closely related to, but distinct from, a variety of different computational problems in quantum information and computational learning theory. In order to make these distinctions clear, and to clarify the extent of some potential reductions between learnability results for probabilistic modelling and known results in related areas, we provide here a brief and informal discussion of these related problems. These related problems are also illustrated in Figure 1.
Classical simulation of quantum circuits: Given a specific class of quantum circuits, it is of fundamental interest to understand whether, and in which sense, quantum circuits from the given class are efficiently classically simulatable. Typically one differentiates between the notions of weak classical simulation and strong classical simulation. As illustrated in Figure 1, in both instances one is given as input an efficient classical description of the quantum circuit. Using the language of probabilistic modelling, in a weak classical simulation the task is to output a generator for the Born distribution of the output state of the quantum circuit, while for strong classical simulation the task is to output an evaluator for the Born distribution of the circuit. For both strong and weak simulation, if there exists an efficient algorithm which can succeed for all circuits in the specific class, then one says the class of circuits is efficiently weakly or strongly classically simulatable. If one can prove that no such efficient classical algorithm exists, then one says that classically simulating the given circuit class is worst-case hard. Alternatively, if one can prove that, with high probability when drawing a circuit randomly from the class, the simulation task cannot be performed efficiently, then one says that simulating the given class of circuits is average-case hard. We stress that while the desired output of a strong/weak classical simulation is the same as the desired output of the associated density/generative modelling problem, the inputs differ significantly. More specifically, in the case of classical simulation one is given a description of the circuit as input, while in the probabilistic modelling setting we are concerned with here, one is given only some sort of oracle access to the Born distribution of the output state of the circuit.
Moreover, while there is currently a large interest in establishing the average-case hardness of weak classical simulation for certain classes of local quantum circuits bouland2019complexity,movassagh_cayley_2019,bouland_noise_2021, in the probabilistic modelling setting one is typically concerned with establishing worst-case hardness. We note that while multiple previous works have conjectured implications between the hardness of weak classical simulation of quantum circuits and the hardness of the associated generative modelling problem Liu_2018,coyle2020born, our results establish firmly that, at least in the statistical query model, the generative modelling problem associated with a class of local quantum circuits can be computationally hard, even when the weak classical simulation problem is easy. As such, despite what has been previously suggested, one cannot straightforwardly use the hardness of classical simulation for a given class of quantum circuits to prove the hardness of the associated generative modelling problem. Indeed, as stressed above, these are completely different computational problems.
Distribution testing and verification of quantum circuit sampling:
The field of distribution testing is concerned with the development of algorithms for testing whether or not an unknown probability distribution has a given property canonne2020survey. Given that
learning a complete description of a distribution often allows one to test properties of the distribution, lower bounds on the query complexity of testing algorithms often imply lower bounds on the query complexity of learning algorithms goldreich2017introduction. One particularly important distribution testing problem is that of identity testing: given a complete description of some known distribution $Q$, as well as some type of oracle access to an unknown distribution $P$, decide whether $P = Q$ or $P$ is at least a certain distance away from $Q$. Using optimality results for distribution identity testing valiant2017automatic, one can show that, for certain classes of local quantum circuits, there exists no sample-efficient classical algorithm for testing, from samples, whether or not the samples are from the Born distribution of a given quantum circuit hangleiter2019sample,hangleiter2020sampling. Using the standard intuition from property testing, namely that learning algorithms often imply testing algorithms, one might think that the existence of a sample-efficient algorithm for learning a generator for the Born distributions of local quantum circuits would imply the existence of a sample-efficient algorithm for testing whether or not samples, which come from some generator, are indeed coming from the Born distribution of a given local quantum circuit. Indeed, if this were the case, then one could use the known hardness results for testing the Born distributions of local quantum circuits to rule out the existence of efficient generator-learning algorithms for the same class of circuits. Unfortunately, and interestingly, however, this is not the case.
One can show that, in order to obtain hardness of learning results, one requires hardness results not for the standard distribution identity testing problem, but rather for the problem of distribution identity testing with the additional promise that the samples to be tested come from some distribution in the concept class of the learner. In this more restricted problem, the known reference distribution has to be distinguished from fewer distributions than in the unrestricted identity testing problem. It is this fact which motivates our reduction between generator-learning and a specific decision problem feldman2017general, which, as explained in Section 3.3, can be viewed precisely as distribution identity testing with an additional promise.
Learning quantum states: There exists a wide variety of different notions of what it means to “learn a quantum state”. Perhaps most intuitive is that of quantum state tomography, in which, given the ability to perform arbitrary efficient measurements on multiple identical copies of an unknown state, one would like to learn a full classical description of the state BenchmarkingReview. As obtaining a full classical description of a quantum state, in the general case, precludes efficient algorithms, multiple refinements of quantum state tomography have been introduced, in which the goal is only to predict some properties of the unknown quantum state, such as expectation values of particular observables. Examples of such refinements include Aaronson’s extension of the PAC framework to quantum states Aaronson_2007, Aaronson’s shadow tomography framework aaronson2019shadow and classical shadow learning Huang_2020.
In the case when the quantum state to be learned is the output state of a local quantum circuit, as for example in previous works on learning stabilizer states Rochhetto,gollakota2021hardness, the above-mentioned state-learning problems are similar in some respects to the probabilistic modelling problem we consider here, while differing in a few essential ways. Most importantly, in the state learning setting one typically has access to the outcomes of a variety of different types of measurements, whereas in the probabilistic modelling setting one only has oracle access to the Born distribution of the unknown state, i.e., to the outcomes of measurements in the computational basis. Similarly, in the probabilistic modelling setting we are only concerned with obtaining either a generator or an evaluator for the Born distribution of the state, as opposed to either a full classical description of the quantum state or an algorithm for predicting the expectation values of different observables.
Distribution learning: Given the fundamental importance of probabilistic modelling for a wide variety of applications, there is by now a large body of results on the learnability of different classes of probability distributions Kearns:1994:LDD:195058.195155, canonne2020short,diakonikolas2016learning,kamath2015learning,de2014learning. While the majority of such work has been in the sample oracle model, recent work has also started to explore such questions in the statistical query model diakonikolas2017statistical. Up until now, however, there has been no work on the learnability of Born distributions of local quantum circuits. As such, while we rely on similar techniques to previous works on probabilistic modelling in the PAC framework, namely reductions from property testing and lower bounds via statistical dimensions, our work is distinct by virtue of the class of distributions we consider, which is motivated by the desire to understand the potential advantages quantum probabilistic modelling algorithms may offer over classical approaches.
3 Preliminaries
We denote the set of all distributions over $\{0,1\}^n$ as $\mathcal{D}_n$. We denote the uniform distribution over $\{0,1\}^n$ as $\mathcal{U}_n$. If $n$ is implicit from the context, or not important for an argument, we will often omit the subscript. We will often consider subsets $\mathcal{C} \subseteq \mathcal{D}_n$, which we refer to as distribution concept classes. Given some distribution concept class $\mathcal{C} \subseteq \mathcal{D}_n$, a reference distribution $P \in \mathcal{C}$ and some $\epsilon \in (0,1)$, we use $\mathcal{B}_{\mathcal{C}}(P,\epsilon)$ to denote the epsilon ball, with respect to the total variation distance, around $P$ in $\mathcal{C}$, i.e.
$$\mathcal{B}_{\mathcal{C}}(P,\epsilon) := \{ Q \in \mathcal{C} \mid d_{\mathrm{TV}}(P,Q) < \epsilon \}. \qquad (2)$$
Once again, when $\mathcal{C}$ is clear from the context the subscript will be omitted. We denote the set of all probability measures over a set $Y$ by $\mathcal{M}(Y)$. We will use the notation $\mathcal{U}(Y)$ to denote the uniform measure over a set $Y$, but for convenience we will often use the shorthand $y \sim Y$ to denote $y$ sampled from $\mathcal{U}(Y)$. Given some oracle $\mathcal{O}$ and a randomized algorithm $\mathcal{A}$, we will use the notation $\mathcal{A}^{\mathcal{O}}$ to mean $\mathcal{A}$ with query access to $\mathcal{O}$. We denote the unitary group of degree $d$ by $\mathrm{U}(d)$. Finally, as expectation values of function outputs with respect to randomly drawn inputs are a central aspect of this work, we define the following shorthand notation, which is used frequently:
Definition 1 (Expectation values of function outputs).
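Given some function $\phi: \{0,1\}^n \to [-1,1]$, as well as some $P \in \mathcal{D}_n$, we use the notation $P[\phi]$ to denote the expectation value $\mathbb{E}_{x \sim P}[\phi(x)]$, i.e.
$$P[\phi] := \mathbb{E}_{x \sim P}[\phi(x)] = \sum_{x \in \{0,1\}^n} \phi(x) P(x). \qquad (3)$$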
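The notation of Eqs. (2) and (3) can be mirrored directly for small $n$. The sketch below stores a distribution as a Python dictionary from bit strings to probabilities (a hypothetical representation in which missing keys have probability zero) and computes the total variation distance $d_{\mathrm{TV}}(P,Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$, the $\epsilon$-ball of a finite concept class, and $P[\phi]$. Summing over all $2^n$ strings is only feasible for small $n$.

```python
from itertools import product


def all_bitstrings(n):
    """All 2^n bit strings of length n, e.g. '00', '01', '10', '11' for n = 2."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def tv_distance(P, Q, n):
    """Total variation distance d_TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in all_bitstrings(n))


def epsilon_ball(concept_class, P, eps, n):
    """Members Q of a finite concept class with d_TV(P, Q) < eps, cf. Eq. (2)."""
    return [Q for Q in concept_class if tv_distance(P, Q, n) < eps]


def expectation(P, phi):
    """P[phi] = sum_x phi(x) P(x), cf. Eq. (3)."""
    return sum(prob * phi(x) for x, prob in P.items())


uniform = {x: 0.25 for x in all_bitstrings(2)}
bell = {"00": 0.5, "11": 0.5}
print(tv_distance(uniform, bell, 2))                          # 0.5
print(expectation(bell, lambda x: 1 if x[0] == "1" else -1))  # 0.0
```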
3.1 PAC framework for probabilistic modelling
We formalize in this section the PAC framework for probabilistic modelling, building on and refining the definitions from Refs. Kearns:1994:LDD:195058.195155,coyle2020born,Sweke2021quantumversus. In order to build such a framework, the first thing we require is a meaningful notion of “access to a distribution”. We achieve this via the following oracles:
Definition 2 (Distribution oracles).
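Given $P \in \mathcal{D}_n$, we define the sample oracle $\mathrm{Sample}(P)$ as the oracle which, when queried, provides a sample from $P$. We denote this via
$$\mathrm{Sample}(P) \to x \sim P. \qquad (4)$$
Additionally, given some $\tau \in (0,1)$, we define the statistical query oracle $\mathrm{Stat}_\tau(P)$ as the oracle which, when queried via some efficiently computable function $\phi: \{0,1\}^n \to [-1,1]$, responds with some $v$ such that $|P[\phi] - v| \le \tau$. We denote this via
$$\mathrm{Stat}_\tau(P)(\phi) \to v. \qquad (5)$$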
We stress that for any distribution $P$, the oracle $\mathrm{Stat}_\tau(P)$ is specified via a tolerance parameter $\tau$, which determines the accuracy of the expectation values provided by $\mathrm{Stat}_\tau(P)$. In particular, we note that for any $\tau$ which decays at most inverse polynomially in $n$, i.e., $\tau = \Omega(1/\mathrm{poly}(n))$, one can straightforwardly use access to $\mathrm{Sample}(P)$ to efficiently simulate access to $\mathrm{Stat}_\tau(P)$. Specifically, given any appropriate efficiently computable function $\phi$, one simply outputs the sample mean of the output of $\phi$ on polynomially many samples drawn from $P$ Feldman2016,kearns1998efficient. In light of this, one typically considers statistical query oracles with at best inverse polynomial accuracy, as in this regime the statistical query model provides a natural framework for studying the complexity of algorithms which always use sample access to a distribution to calculate expectation values of efficiently computable functions.¹ We stress, however, that the converse is not true: one cannot simulate a sample oracle with a statistical query oracle, and therefore it is in principle possible that, for some computational problem, there exist sample-efficient algorithms in the sample model but not in the statistical query model.
¹ More specifically, in the regime of inverse polynomially accurate queries, i.e., $\tau = \Omega(1/\mathrm{poly}(n))$, any statistical query algorithm (no matter its query complexity) yields a sample-efficient algorithm in the sample model, as all queries to $\mathrm{Stat}_\tau(P)$ can be simulated using the same set of samples from $P$ diakonikolas2017statistical. This is why lower bounds on the query complexity of an SQ algorithm do not correspond to information-theoretic obstacles. They rather yield lower bounds on the computational complexity of “generic” algorithms in the sample model, i.e., those algorithms that simply simulate SQ access via access to the sample oracle.
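The simulation of an SQ oracle by a sample oracle can be sketched as follows. For $\phi$ with values in $[-1,1]$, Hoeffding's inequality guarantees that the mean of $m \ge 2\ln(2/\delta)/\tau^2$ independent samples is within $\tau$ of the true expectation with probability at least $1-\delta$, so $m$ is polynomial in $n$ whenever $1/\tau$ is. The sample oracle is modelled here as a zero-argument Python callable; this interface, the function names and the failure probability $\delta$ are illustrative assumptions. For simplicity, the sketch draws fresh samples for every query rather than reusing a single sample set.

```python
import math


def num_samples_for_sq(tau, delta):
    """Hoeffding bound for phi with values in [-1, 1]: this many samples give
    |sample mean - P[phi]| <= tau with probability at least 1 - delta."""
    return math.ceil(2 * math.log(2 / delta) / tau ** 2)


def simulate_stat_oracle(sample_oracle, tau, delta=0.01):
    """Build a tau-tolerant statistical query oracle from a sample oracle.

    `sample_oracle` is any zero-argument callable returning one sample from P.
    Each individual query is answered correctly with probability >= 1 - delta.
    """
    m = num_samples_for_sq(tau, delta)

    def stat_oracle(phi):
        return sum(phi(sample_oracle()) for _ in range(m)) / m

    return stat_oracle


print(num_samples_for_sq(0.1, 0.01))  # 1060
```

Answering $q$ queries this way costs $q \cdot m$ samples; a union bound over the $q$ queries (replacing $\delta$ by $\delta/q$) only increases $m$ logarithmically in $q$.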
Having fixed the different notions of access to a distribution that we will consider, we now define what it means to “learn a distribution”. As we have already mentioned, there are two distinct notions one could meaningfully consider. Informally, given some unknown target distribution $P$, we could ask that a learning algorithm, when given either SQ or sample oracle access to $P$, outputs an evaluator for $P$, i.e., some function $\mathrm{Eval}_P$ which on input $x \in \{0,1\}^n$ outputs an estimate for $P(x)$, and therefore provides an approximate description of the distribution. This is perhaps the most intuitive notion of what it means to learn a probability distribution, and one which is often referred to as density modelling, due to the fact that the evaluator allows one to model the probability density function of the unknown target distribution. However, in many practical settings one might not be interested in learning a full description of the probability distribution (an evaluator for the probability of events), but rather in being able to generate samples from the distribution. As such, instead of asking for an evaluator of the target distribution, we could ask that the learning algorithm outputs a generator for $P$, i.e., a probabilistic (quantum or classical) algorithm $\mathrm{Gen}_P$ which, when run, generates samples from $P$. This task is often referred to as generative modelling, due to the fact that the generator provides a model of the process via which samples from $P$ are generated. In order to formalize this, we start with the following definition of evaluators and generators.
Definition 3 (Generators and evaluators).
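Given some probability distribution $D$ over $\{0,1\}^n$, we say that a classical (or quantum) algorithm $\mathrm{GEN}_D$ is an efficient classical (quantum) generator for $D$ if $\mathrm{GEN}_D$ produces samples in $\{0,1\}^n$ according to $D$, using $O(\mathrm{poly}(n))$ computational resources. In the case of a classical generator, we allow the algorithm to receive as input $O(\mathrm{poly}(n))$ uniformly random input bits. An algorithm $\mathrm{EVAL}_D$ is an efficient evaluator for $D$ if for all $x\in\{0,1\}^n$ one has that $\mathrm{EVAL}_D(x)=D(x)$, and $\mathrm{EVAL}_D$ uses only $O(\mathrm{poly}(n))$ computational resources.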
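One way to see the relation between the two notions is that an exact evaluator can always be turned into a generator, though in general only with exponential overhead. The following Python sketch assumes the target is given as an explicit probability table (which only makes sense for small $n$); the helper names `make_evaluator` and `make_generator` are hypothetical. The generator uses inverse-CDF sampling over all $2^n$ strings.

```python
import itertools
import random
from typing import Callable, Dict, Tuple

Bits = Tuple[int, ...]


def make_evaluator(table: Dict[Bits, float]) -> Callable[[Bits], float]:
    """EVAL_D: return the exact probability D(x); strings not in the table get 0."""
    return lambda x: table.get(tuple(x), 0.0)


def make_generator(evaluate: Callable[[Bits], float], n: int,
                   rng: random.Random) -> Callable[[], Bits]:
    """GEN_D built from EVAL_D by inverse-CDF sampling over all 2^n strings.

    Building the table costs 2^n evaluator calls and each draw takes O(2^n)
    time, so this illustrates the two notions but is not an efficient
    generator in the sense of Definition 3 unless n is small."""
    strings = list(itertools.product((0, 1), repeat=n))
    weights = [evaluate(x) for x in strings]

    def generate() -> Bits:
        # random.choices draws via the cumulative weights, i.e. an inverse CDF
        return rng.choices(strings, weights=weights, k=1)[0]

    return generate


# Usage: a GHZ-like target distribution on 3 bits.
evaluator = make_evaluator({(0, 0, 0): 0.5, (1, 1, 1): 0.5})
generator = make_generator(evaluator, 3, random.Random(1))
draws = [generator() for _ in range(200)]
```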
Given the above definitions, we are now able to formally define the PAC framework for probabilistic modelling, which includes both generator and evaluator learning, and allows us to consider arbitrary models of oracle access to the unknown target distributions. Importantly, this framework also allows us to study the computational and statistical properties of probabilistic modelling algorithms, and to rigorously compare quantum and classical learning algorithms. We start with the following definition of PAC generator and evaluator learners, which at a high level are learning algorithms which, when given oracle access to an unknown distribution, output, with sufficiently high probability, a sufficiently accurate generator or evaluator.
Definition 4 (PAC generator and evaluator learners).
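An algorithm $\mathcal{A}$ is an $(\epsilon,\delta)$-PAC GEN-learner (EVAL-learner) for a distribution concept class $\mathcal{C}$ if for all $D\in\mathcal{C}$, when given access to oracle $\mathcal{O}(D)$, with probability at least $1-\delta$, $\mathcal{A}$ outputs a generator $\mathrm{GEN}_{D'}$ (evaluator $\mathrm{EVAL}_{D'}$) for some $D'$ satisfying $d_{\mathrm{TV}}(D,D')<\epsilon$.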
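Definition 4 measures accuracy in total variation distance, $d_{\mathrm{TV}}(D,D')=\frac{1}{2}\sum_x|D(x)-D'(x)|$. For distributions given as explicit probability tables (an assumption of this sketch, feasible only for small $n$), the accuracy condition can be checked as below; the helper names are hypothetical. The confidence parameter $\delta$ refers to the learner's randomness and oracle responses and is not modelled here.

```python
from typing import Dict, Tuple

Bits = Tuple[int, ...]


def tv_distance(p: Dict[Bits, float], q: Dict[Bits, float]) -> float:
    """d_TV(p, q) = (1/2) * sum_x |p(x) - q(x)| over the union of both supports."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def is_eps_accurate(target: Dict[Bits, float], hypothesis: Dict[Bits, float],
                    eps: float) -> bool:
    """The accuracy requirement of Definition 4: d_TV(D, D') < eps."""
    return tv_distance(target, hypothesis) < eps


target = {(0,): 0.5, (1,): 0.5}
hypothesis = {(0,): 0.75, (1,): 0.25}
d = tv_distance(target, hypothesis)   # 0.5 * (0.25 + 0.25) = 0.25
```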
Before proceeding, we reiterate a few important aspects of the above definition (which are also discussed in detail in Ref. Sweke2021quantumversus). Firstly, we note that the learning algorithm in the above definition could be either quantum or classical. Indeed, one of the primary motivations of this work is to understand the potential advantages quantum learners could offer over classical learners for probabilistic modelling problems. Additionally, in the case of generator-learning the output generator could be either a quantum or a classical generator, and we stress that one must be able to “unplug” this generator from the oracle used during training, i.e. the generator must be a completely independent algorithm for generating samples from the target distribution. For example, as mentioned in the introduction, QCBM-based learning algorithms are a class of generative modelling algorithms whose output generator is a quantum circuit, which allows one to sample from the corresponding Born distribution by measuring the output state in the computational basis. Finally, while it is perhaps not a priori clear why one would consider learning algorithms with access only to statistical queries, we note that many generic implicit generative modelling algorithms, both quantum and classical, use their access to a sample oracle to approximate expectation values of functions (see Appendix A). As such, at least in the case of generative modelling, the statistical query model provides a natural framework for studying the complexity of learning problems with respect to known algorithms and methods. With this in hand, we can now define a variety of notions of efficient PAC learnability of a distribution concept class.
Definition 5 (Efficiently learnable distribution concept classes).
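Given a distribution concept class $\mathcal{C}$ we define the randomized query complexity $q^{\mathcal{O}}_{\mathrm{GEN}}(\mathcal{C},\epsilon,\delta)$ ($q^{\mathcal{O}}_{\mathrm{EVAL}}(\mathcal{C},\epsilon,\delta)$) as the smallest number of queries to $\mathcal{O}$ required by any $(\epsilon,\delta)$-PAC GEN-learner (EVAL-learner) for $\mathcal{C}$. We say that $\mathcal{C}$ is sample-efficiently PAC GEN-learnable (EVAL-learnable) with respect to oracle $\mathcal{O}$ if for all $\epsilon,\delta\in(0,1)$
$$q^{\mathcal{O}}_{\mathrm{GEN}}(\mathcal{C},\epsilon,\delta)\in O\left(\mathrm{poly}(n,1/\epsilon,1/\delta)\right)\qquad\left(q^{\mathcal{O}}_{\mathrm{EVAL}}(\mathcal{C},\epsilon,\delta)\in O\left(\mathrm{poly}(n,1/\epsilon,1/\delta)\right)\right). \tag{6}$$
We say that a distribution concept class $\mathcal{C}$ is computationally-efficiently PAC GEN-learnable (EVAL-learnable) with respect to oracle $\mathcal{O}$ if it is sample-efficiently PAC GEN-learnable (EVAL-learnable) with respect to oracle $\mathcal{O}$, and in addition the sample-efficient learning algorithm also runs in time $O(\mathrm{poly}(n,1/\epsilon,1/\delta))$ for all $\epsilon,\delta\in(0,1)$.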
3.2 Local quantum circuit based distribution concept classes
In this work we will be primarily concerned with the PAC learnability of the distributions obtained by measuring, in the computational basis, the output states of a specific class of local quantum circuits. In order to define this distribution concept class in a rigorous way, we start with the following definition of “brickwork” quantum circuits, which is also illustrated in Figure 2.
Definition 6 (Brickwork quantum circuits with gate set $\mathcal{G}$).
Given some two-qubit gate set $\mathcal{G}$, we denote by $\mathrm{BW}_{\mathcal{G}}(n,d)$ the set of unitaries which can be realized by a depth-$d$ quantum circuit on $n$ qubits consisting only of nearest-neighbour gates from $\mathcal{G}$.
While the above definition allows for an arbitrary two-qubit gate set, we will predominantly be concerned with the two-qubit Clifford group, which we denote by $\mathrm{Cl}(2)$. In general, we denote the $n$-qubit Clifford group by $\mathrm{Cl}(n)$. Given this definition, we proceed to define the classical probability distribution obtained by measuring the output state of a local quantum circuit in the computational basis. As the probabilities of events are derived from the amplitudes of the measured quantum state via the Born rule, we refer to this distribution as the Born distribution of the unitary which prepares the state.
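Definition 6 leaves the precise arrangement of nearest-neighbour gates to the brickwork pattern of Figure 2. As a minimal sketch, assuming the common convention that layer $\ell$ (counting from 0) acts on the pairs $(i,i+1)$ with $i\equiv\ell \pmod 2$, the gate positions of a depth-$d$ circuit can be listed as follows. The function name `brickwork_layers` is hypothetical.

```python
from typing import List, Tuple


def brickwork_layers(n: int, d: int) -> List[List[Tuple[int, int]]]:
    """Nearest-neighbour qubit pairs acted on in each of the d layers.

    Assumed convention: layer l couples (i, i + 1) for every i with
    i % 2 == l % 2, so consecutive layers are offset by one qubit and the
    two-qubit gates tile like bricks in a wall."""
    return [[(i, i + 1) for i in range(l % 2, n - 1, 2)] for l in range(d)]


layers = brickwork_layers(5, 3)
# [[(0, 1), (2, 3)], [(1, 2), (3, 4)], [(0, 1), (2, 3)]]
```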
Definition 7 (Born distribution).
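Given some $n$-qubit unitary $U$, we define the “Born distribution” $P_U$ over $\{0,1\}^n$ via
$$P_U(x):=\left|\langle x|U|0^{\otimes n}\rangle\right|^2 \tag{7}$$
for all $x\in\{0,1\}^n$, i.e. $P_U(x)$ is the probability of obtaining $x$ when measuring $U|0^{\otimes n}\rangle$ in the computational basis.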
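For small $n$, the Born distribution of Definition 7 can be read off directly from the first column of $U$, since that column is $U|0^{\otimes n}\rangle$. The sketch below assumes that $x$ is read as a binary integer with qubit 0 as the most significant bit (a convention chosen here, not fixed by the text) and uses a Hadamard followed by a CNOT as an example.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
# CNOT with qubit 0 (the most significant bit of x) as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def born_distribution(U: np.ndarray) -> np.ndarray:
    """Return the vector (P_U(x))_x with P_U(x) = |<x|U|0^n>|^2.

    U|0^n> is the first column of U; entry x of the result corresponds to
    the bit string x read as a binary integer."""
    return np.abs(U[:, 0]) ** 2


U = CNOT @ np.kron(H, I2)    # prepares the Bell state (|00> + |11>)/sqrt(2)
p = born_distribution(U)      # probability 1/2 on x = 00 and on x = 11
```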
With these definitions in hand, we can finally define the concept class of central interest to this work, namely the set of distributions obtained by measuring the output states of brickwork quantum circuits in the computational basis. Additionally, we will define the set of Born distributions corresponding to global Clifford unitaries, as we will later have reason to make use of this class of distributions.
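Definition 8 (Concept classes $\mathcal{C}_{\mathcal{G}}(n,d)$ and $\mathcal{C}_{\mathrm{Cl}}(n)$).
Given some gate set $\mathcal{G}$, for all $n,d\in\mathbb{N}$ we define $\mathcal{C}_{\mathcal{G}}(n,d)$ via
$$\mathcal{C}_{\mathcal{G}}(n,d):=\left\{P_U\;\middle|\;U\in\mathrm{BW}_{\mathcal{G}}(n,d)\right\}. \tag{8}$$
Additionally, we define $\mathcal{C}_{\mathrm{Cl}}(n)$ via
$$\mathcal{C}_{\mathrm{Cl}}(n):=\left\{P_U\;\middle|\;U\in\mathrm{Cl}(n)\right\}. \tag{9}$$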
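We note that $\mathcal{G}\subseteq\mathcal{G}'$ implies $\mathcal{C}_{\mathcal{G}}(n,d)\subseteq\mathcal{C}_{\mathcal{G}'}(n,d)$, and that $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\subseteq\mathcal{C}_{\mathrm{Cl}}(n)$ for all $n$ and $d$. Finally, our proofs will often rely heavily on the fact that we can build a measure over the set of probability distributions $\mathcal{C}_{\mathcal{G}}(n,d)$ by drawing the gates of a circuit architecture uniformly at random, and then outputting the Born distribution of the global circuit unitary. In order to facilitate this, we define the following measure over $\mathcal{C}_{\mathcal{G}}(n,d)$, which is induced by drawing the gates in the circuit uniformly at random from the relevant gate set.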
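Definition 9 (Induced measure over $\mathcal{C}_{\mathcal{G}}(n,d)$).
We define $\mu_{\mathcal{G}}(n,d)$ as the measure over $\mathcal{C}_{\mathcal{G}}(n,d)$ which is induced by drawing each gate of a depth-$d$ brickwork circuit independently and uniformly from $\mathcal{G}$, and outputting the Born distribution of the resulting unitary.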
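To make Definition 9 concrete, the following sketch draws one sample from the induced measure: every gate of a depth-$d$ brickwork circuit is picked independently and uniformly from a finite gate set, and the Born distribution is returned. Only the state $U|0^{\otimes n}\rangle$ is simulated, since that is all $P_U$ depends on. The gate set `TOY_GATE_SET` is a hypothetical stand-in (four fixed Clifford gates, not the full group $\mathrm{Cl}(2)$), and the brickwork offset convention is assumed.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
CZ = np.diag([1, 1, 1, -1])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
# Hypothetical toy gate set: four fixed two-qubit gates, NOT the full group Cl(2).
TOY_GATE_SET = [CNOT, CZ, SWAP, np.kron(H, H)]


def sample_induced_measure(n: int, d: int, gate_set: list,
                           rng: np.random.Generator) -> np.ndarray:
    """Draw P_U from the measure induced on C_G(n, d): each gate of a depth-d
    brickwork circuit is chosen independently and uniformly from gate_set.
    Returns the vector (P_U(x))_x of length 2^n (qubit 0 = most significant bit)."""
    psi = np.zeros([2] * n, dtype=complex)
    psi[(0,) * n] = 1.0                          # the input state |0^n>
    for layer in range(d):
        for i in range(layer % 2, n - 1, 2):     # assumed brickwork offset convention
            g = gate_set[rng.integers(len(gate_set))].reshape(2, 2, 2, 2)
            # contract the gate's two input legs with the legs of qubits i and i+1
            psi = np.tensordot(g, psi, axes=([2, 3], [i, i + 1]))
            psi = np.moveaxis(psi, [0, 1], [i, i + 1])
    return np.abs(psi.reshape(-1)) ** 2


rng = np.random.default_rng(0)
p = sample_induced_measure(4, 3, TOY_GATE_SET, rng)
p_trivial = sample_induced_measure(3, 2, [np.eye(4)], rng)   # identity gates only
```

Repeated calls with the same generator yield i.i.d. draws from the induced measure.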
3.3 Decision problems in the statistical query model
As mentioned in the very brief sketch of proof techniques given in the introduction, in order to obtain our first result we will rely heavily on a reduction between probabilistic modelling and a specific type of decision problem, defined in Ref. feldman2017general, as follows:
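Definition 10 ($\mathcal{B}(\mathcal{D},D_0)$ distribution-decision problem feldman2017general).
Given a set of distributions $\mathcal{D}$, a reference distribution $D_0$, and some $\delta\in(0,1)$, we say that an algorithm $\mathcal{A}$ solves the distribution-decision problem $\mathcal{B}(\mathcal{D},D_0)$, with probability $1-\delta$, using oracle access to $\mathcal{O}$, if for all $D\in\mathcal{D}\cup\{D_0\}$, given access to $\mathcal{O}(D)$, the following holds with probability at least $1-\delta$:

when $D\in\mathcal{D}$, then $\mathcal{A}$ outputs “$D\in\mathcal{D}$”,

when $D=D_0$, then $\mathcal{A}$ outputs “$D=D_0$”.
We define the randomized query complexity $\mathrm{RQC}^{\mathcal{O}}_{\delta}(\mathcal{B}(\mathcal{D},D_0))$ as the smallest number of queries necessary for a randomized algorithm to solve the decision problem $\mathcal{B}(\mathcal{D},D_0)$, with probability $1-\delta$, using oracle access to $\mathcal{O}$.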
In order to gain some intuition for this type of decision problem we note that, given a concept class $\mathcal{C}$, a reference distribution $D_0$ and some $\epsilon>0$, when $\mathcal{D}=\{D\in\mathcal{C}\mid d_{\mathrm{TV}}(D,D_0)\geq\epsilon\}$, the decision problem $\mathcal{B}(\mathcal{D},D_0)$ is essentially equivalent to the problem of testing whether an unknown distribution is equal to the reference distribution $D_0$ or $\epsilon$-far from $D_0$, but with the additional promise that the unknown distribution is an element of the distribution concept class $\mathcal{C}$. This observation allows us to use the standard property testing insight that learning is generically harder than testing to build a reduction between probabilistic modelling and a specific decision problem of the form just introduced. In particular, it is straightforward to show that “learning implies deciding”, i.e. that one can lower bound the randomized query complexity of learning $\mathcal{C}$ via the randomized query complexity of the decision problem $\mathcal{B}(\mathcal{D},D_0)$.
Lemma 1 (Learning implies deciding).
Assume , , and . Then, for all the following two inequalities hold
(10)  
(11) 
Proof.
See Appendix B. ∎
Our motivation for such a reduction comes from the fact that it allows us to exploit existing results from Feldman feldman2017general, which show that, in the statistical query model, the query complexity of a given decision problem is completely determined by the randomized statistical dimension of the problem, which is defined as follows:
Definition 11 (Randomized statistical dimension feldman2017general).
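Given some $\tau>0$, we define the randomized statistical dimension of the distribution-decision problem $\mathcal{B}(\mathcal{D},D_0)$ via
$$\mathrm{RSD}_\tau\left(\mathcal{B}(\mathcal{D},D_0)\right):=\sup_{\mu}\left(\mathrm{frac}(\mu,D_0,\tau)\right)^{-1}, \tag{12}$$
where the supremum is over all probability measures $\mu$ over the set $\mathcal{D}$, and $\mathrm{frac}(\mu,D_0,\tau)$ is defined via
$$\mathrm{frac}(\mu,D_0,\tau):=\max_{\phi:\{0,1\}^n\to[-1,1]}\left\{\Pr_{D\sim\mu}\left[\,|D[\phi]-D_0[\phi]|>\tau\,\right]\right\}, \tag{13}$$
where we have again used the shorthand notation $D[\phi]:=\mathbb{E}_{x\sim D}[\phi(x)]$.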
We note that a statistical query via some function $\phi$ allows one to distinguish, from the reference distribution $D_0$, all those distributions $D\in\mathcal{D}$ that satisfy $|D[\phi]-D_0[\phi]|>\tau$. Hence, we can think of each distinguishing function $\phi$ as covering a certain fraction of the class $\mathcal{D}$. In Ref. feldman2017general, Feldman proved that the randomized statistical dimension as defined above equals precisely the size of a randomized cover of the whole class $\mathcal{D}$ by a measure over distinguishing functions $\phi$. The utility of the randomized statistical dimension, however, stems from the fact that Feldman was able to come up with a dual formulation in terms of a measure $\mu$ over the class of distributions $\mathcal{D}$, rather than over the distinguishing functions $\phi$. To illustrate this, note that the expression appearing in curly brackets in Eq. (13),
$$\Pr_{D\sim\mu}\left[\,|D[\phi]-D_0[\phi]|>\tau\,\right], \tag{14}$$
is the probability that a distribution $D$, drawn randomly according to the measure $\mu$ on $\mathcal{D}$, can be distinguished from the reference distribution $D_0$ via a query to some fixed $\phi$. For our purposes, it is helpful to point out the role of the measure $\mu$ when it comes to lower bounding the randomized statistical dimension: due to the supremum in Eq. (12), any particular choice of measure $\mu$ leads to a value for $\mathrm{frac}(\mu,D_0,\tau)$, which in turn yields a lower bound on the randomized statistical dimension. However, to obtain the best possible bound, intuitively, we should choose the measure $\mu$ such that it is concentrated on distributions that are “maximally hard” to distinguish from $D_0$. Such distributions will typically each require their own individual query in order to be distinguished from the reference distribution.
In light of this, we make particular use of the following lemma, which shows that the randomized query complexity of a decision problem can be lower bounded by the randomized statistical dimension.
Lemma 2 (Randomized statistical dimension lower bounds randomized query complexity feldman2017general).
(15) 
As such, by combining Lemmas 1 and 2 we see that in order to lower bound the query complexity of learning a given distribution concept class $\mathcal{C}$, it is sufficient to lower bound the randomized statistical dimension of the associated decision problem $\mathcal{B}(\mathcal{D},D_0)$. Additionally, we will also make use of the following observation that increasing the size of the set of distributions defining a decision problem can only increase the randomized statistical dimension, and therefore the randomized query complexity, of the associated decision problem.
Observation 1 (Randomized statistical dimension grows with the size of the concept class).
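Given two sets of distributions $\mathcal{D}\subseteq\mathcal{D}'$, a reference distribution $D_0$, and some $\tau>0$, we have that
$$\mathrm{RSD}_\tau\left(\mathcal{B}(\mathcal{D},D_0)\right)\leq\mathrm{RSD}_\tau\left(\mathcal{B}(\mathcal{D}',D_0)\right). \tag{16}$$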
3.4 Linear algebra over $\mathbb{F}_2$
In order to prove Result 2 we exploit fundamental connections between the Born distributions of local Clifford circuits and affine subspaces of the vector space $\mathbb{F}_2^n$. As such, we review the required preliminaries here. In particular, we denote by $\mathbb{F}_2$ the finite field of two elements. $\mathbb{F}_2^n$ is then the finite-dimensional vector space over the field $\mathbb{F}_2$, whose elements are bit strings in $\{0,1\}^n$, equipped with entry-wise addition modulo 2, which we denote by $\oplus$. We note that any $k$-dimensional subspace of $\mathbb{F}_2^n$ is isomorphic to $\mathbb{F}_2^k$, and can be described by a (non-unique) $n\times k$ binary matrix of full rank (whose columns are basis vectors for the subspace). Additionally, we recall the following definition of an affine subspace.
Definition 12 (Affine subspace).
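Let $V$ be a vector space over the field $\mathbb{F}$. A subset $A\subseteq V$ is called an affine subspace of $V$ if and only if there exists a vector $b\in V$ and a subspace $W\subseteq V$ such that
$$A=b+W:=\{b+w\mid w\in W\}. \tag{17}$$
We define the dimension of $A$ via $\dim(A):=\dim(W)$.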
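Given the above definition, we note that every $k$-dimensional affine subspace of $\mathbb{F}_2^n$ is fully specified by a (non-unique) tuple $(M,b)$, where $M$ is an $n\times k$ full-rank binary matrix specifying the $k$-dimensional subspace $W$, and $b\in\mathbb{F}_2^n$ is the (non-unique) offset vector. More specifically, we say that such a tuple describes an affine subspace $A$ if
$$A=\left\{My\oplus b\;\middle|\;y\in\mathbb{F}_2^k\right\}, \tag{18}$$
where the matrix-vector product $My$ is taken over $\mathbb{F}_2$.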
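Finally, given a $k$-dimensional affine subspace $A$ of $\mathbb{F}_2^n$, specified by the tuple $(M,b)$, we denote by $\mathcal{U}_A$ the uniform distribution over elements of $A$, i.e., the distribution for which for all $x\in\mathbb{F}_2^n$
$$\mathcal{U}_A(x)=2^{-k}\quad\text{if }x\in A, \tag{19}$$
$$\mathcal{U}_A(x)=0\quad\text{if }x\notin A. \tag{20}$$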
4 Hardness of PAC learning the output distributions of local quantum circuits in the SQ model
We present in this section our first main result, a formal version of Result 1, which is given below as Corollary 1. In order to establish this result, we begin with the following theorem (whose proof is given in Section 5), which provides a lower bound on the randomized query complexity, in the statistical query model, of both generator-learning and evaluator-learning the output distributions of Clifford brickwork quantum circuits.
Theorem 1 (Lower bound on the query complexity of learning local Clifford circuits in the SQ model).
For all large enough, all , and for all ,
(21)  
(22) 
We note that, perhaps surprisingly, the asymptotic query complexity lower bounds we obtain above are independent of the accuracy parameter $\epsilon$, provided it is suitably bounded. Additionally, we note that the lower bounds of Theorem 1 depend on both the circuit depth $d$ and the statistical query tolerance $\tau$. However, as mentioned and motivated in Section 3.1, we are most naturally interested in the complexity of learning algorithms with respect to SQ oracles which provide expectation values of at best inverse polynomial accuracy, i.e. in the setting where $\tau\in\Omega(1/\mathrm{poly}(n))$. In this setting, super-logarithmic circuit depth is sufficient to obtain a super-polynomial lower bound on the query complexity of both PAC generator and evaluator learners for the output distributions of brickwork Clifford circuits. This then immediately implies the hardness, in the inverse polynomially accurate SQ model, of PAC learning the output distributions of super-logarithmically deep brickwork Clifford circuits. Moreover, as increasing the size of a concept class can only increase the required query complexity, this also implies the hardness of PAC learning the output distributions of super-logarithmically deep brickwork circuits using any gate set which includes the two-qubit Clifford group. These observations are formalized in the following Corollary.
Corollary 1 (Hardness of PAC learning local quantum circuits with inverse polynomially accurate statistical queries).
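Let $\mathcal{G}$ be any two-qubit gate set satisfying $\mathrm{Cl}(2)\subseteq\mathcal{G}$. Then, for all $n$ large enough, for all $d\in\omega(\log n)$, and for all $\tau\in\Omega(1/\mathrm{poly}(n))$, the distribution concept class $\mathcal{C}_{\mathcal{G}}(n,d)$ is not sample-efficiently PAC GEN-learnable or EVAL-learnable with respect to the $\mathrm{SQ}_\tau$ oracle.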
Proof.
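Consider first the case $\mathcal{G}=\mathrm{Cl}(2)$. Using Theorem 1 and taking $d$ and $\tau$ as per the statement of the Corollary immediately gives a super-polynomial lower bound (asymptotically with respect to $n$) for both $q^{\mathrm{SQ}_\tau}_{\mathrm{GEN}}(\mathcal{C}_{\mathrm{Cl}(2)}(n,d),\epsilon,\delta)$ and $q^{\mathrm{SQ}_\tau}_{\mathrm{EVAL}}(\mathcal{C}_{\mathrm{Cl}(2)}(n,d),\epsilon,\delta)$, which implies the statement of the Corollary for $\mathcal{G}=\mathrm{Cl}(2)$. The case $\mathrm{Cl}(2)\subsetneq\mathcal{G}$ then follows from the observation that, for any two distribution concept classes $\mathcal{C}$ and $\mathcal{C}'$ satisfying $\mathcal{C}\subseteq\mathcal{C}'$, any GEN (EVAL) learner for $\mathcal{C}'$ is immediately a GEN (EVAL) learner for $\mathcal{C}$, and therefore
$$q^{\mathcal{O}}_{\mathrm{GEN}}(\mathcal{C},\epsilon,\delta)\leq q^{\mathcal{O}}_{\mathrm{GEN}}(\mathcal{C}',\epsilon,\delta), \tag{23}$$
$$q^{\mathcal{O}}_{\mathrm{EVAL}}(\mathcal{C},\epsilon,\delta)\leq q^{\mathcal{O}}_{\mathrm{EVAL}}(\mathcal{C}',\epsilon,\delta). \tag{24}$$
Applying this with $\mathcal{C}=\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\subseteq\mathcal{C}_{\mathcal{G}}(n,d)=\mathcal{C}'$ completes the proof.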
∎
Let us stress that, as Corollary 1 is concerned with query complexity, it applies to both classical and quantum learning algorithms which use statistical queries. As discussed in Appendix A, many generic generative modelling algorithms of practical interest can be efficiently simulated in the SQ model, and therefore fall within the scope of this result. As such, Corollary 1 strongly limits the potential for using the output distributions of local quantum circuits to provide a separation between the power of quantum and classical generative modelling algorithms. Additionally, we mention again that the super-logarithmically deep nearest-neighbour Clifford circuits defining this concept class are classically simulatable gottesman1998heisenberg. As such, Corollary 1 establishes that learning a generator or an evaluator from statistical queries can be hard, for both quantum and classical learning algorithms, even when outputting a classical generator from a circuit description can be done efficiently! Finally, as has been noted in the proof of Corollary 1, for inverse polynomially accurate SQ queries, super-logarithmic circuit depth is enough to ensure a super-polynomial lower bound on the randomized query complexity, which then implies the non-existence of efficient learning algorithms. Moreover, as illustrated in Fig. 3, the combination of inverse polynomially accurate SQ queries and polynomial-depth circuits would give rise to an exponential lower bound on the query complexity.
Remark 1 (Generalization to universal gate sets).
Corollary 1 establishes the hardness of SQ learning the Born distributions of local quantum circuits that use gates from any two-qubit gate set $\mathcal{G}$ that contains the two-qubit Clifford group $\mathrm{Cl}(2)$. A natural follow-up question is whether similar hardness results can be obtained for alternative gate sets not containing the Clifford group. Many interesting such gate sets exist. In fact, it is known that any entangling two-qubit gate, together with arbitrary single-qubit gates, is universal brylinski2002universal, bremner2002practical. Here, we remark that the proof techniques we use to establish the query complexity lower bounds given in Theorem 1 are indeed sufficiently general to be adapted to any universal gate set $\mathcal{G}$. This is because our proof of Theorem 1 does not rely on the algebraic properties of Clifford circuits and their Born distributions. Rather, it relies on the fact that the Clifford group is sufficiently evenly distributed over the unitary group. More specifically, we use that it forms a unitary 2-design (see Appendix C for a definition). Additionally, we use that any global Clifford unitary can be implemented via a nearest-neighbor Clifford circuit in linear depth Bravyi_2021. It is known that local random quantum circuits with gates drawn from any universal gate set also converge (at least approximately) to a unitary $t$-design in linear depth, more precisely at a depth that is linear in $n$ and polynomial in $t$ brandaoLocalRandomQuantum2016,harrow2018approximate,haferkampImprovedSpectralGaps2021. Given this, and using higher moments, our proof techniques can be adapted to local quantum circuits based on any universal gate set. Here, we choose to restrict the presentation to gate sets satisfying $\mathrm{Cl}(2)\subseteq\mathcal{G}$ for reasons of brevity and clarity.
5 Proof of Theorem 1
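We provide in this section a proof for the lower bounds on
$$q^{\mathrm{SQ}_\tau}_{\mathrm{GEN}}\left(\mathcal{C}_{\mathrm{Cl}(2)}(n,d),\epsilon,\delta\right), \tag{25}$$
$$q^{\mathrm{SQ}_\tau}_{\mathrm{EVAL}}\left(\mathcal{C}_{\mathrm{Cl}(2)}(n,d),\epsilon,\delta\right) \tag{26}$$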
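given in Theorem 1. As discussed in Section 3.3, these query complexity lower bounds can be obtained by proving a lower bound on the randomized statistical dimension of a suitably constructed decision problem
$$\mathcal{B}\left(\left\{P\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\;\middle|\;d_{\mathrm{TV}}(P,P_0)\geq\epsilon\right\},\,P_0\right),\qquad P_0\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d).$$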
The above decision problem singles out one Born distribution $P_0$ as a reference distribution. The task is then to decide, given SQ oracle access to an unknown distribution $D\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$, whether $D=P_0$ or $D$ is at least $\epsilon$-far from $P_0$ in total variation distance.² We note that, as a result of the reduction from deciding to learning given in Lemma 1 of Section 3.3, we are free to choose whichever reference distribution allows us to obtain the tightest lower bound. We hence aim to choose $P_0$ such that it is “maximally hard” to distinguish from the rest of the concept class via statistical queries. Intuitively, this will be the case whenever the concept class contains many distributions that cannot be distinguished from $P_0$ via the same statistical queries.
² It is important to note that this decision problem is close to, but not quite the same as, the verification problem studied in Ref. hangleiter2019sample. In particular, here we have a promise that the unknown distribution is an element of the concept class $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$. This promise is essential to the reduction between learning and deciding, but precludes the use of optimality results from identity testing valiant2017automatic, in which there is no such promise.
A natural candidate for a good reference distribution is the uniform distribution $\mathcal{U}_n$ over $\{0,1\}^n$. Intuitively, this is because our concept class contains a large number of “flat” distributions that will each require their own statistical query in order to be distinguished from the uniform distribution. Indeed, it is known that the Born distributions of local quantum circuits are typically exponentially flat at sufficient circuit depth hangleiter2019sample. Additionally, we note that as the uniform distribution can be straightforwardly generated by measuring the output state of a Clifford circuit of unit depth, we indeed have that $\mathcal{U}_n\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$ for all $d\geq 1$. Given this intuition, we will therefore proceed by lower bounding the randomized statistical dimension (RSD) of the decision problem $\mathcal{B}\left(\{P\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\mid d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\},\,\mathcal{U}_n\right)$, which we denote as $\mathrm{RSD}_\tau(n,d,\epsilon)$.
While one would naturally expect the RSD to depend on the accuracy $\tau$ of the statistical queries, the circuit depth $d$, the number of qubits $n$ and the accuracy $\epsilon$, we will find an asymptotic lower bound which, provided $\epsilon$ is suitably bounded, depends only on $\tau$ and $d$. As mentioned before, we are naturally most interested in setting $\tau=1/\mathrm{poly}(n)$, because this accuracy corresponds to the regime in which one can simulate statistical queries via polynomially many samples. With $\tau$ “fixed” to $1/\mathrm{poly}(n)$, our primary goal is then to determine the smallest depth $d$ giving rise to super-polynomial (in $n$) query complexity lower bounds, and hence to hardness of SQ learning. To establish the hardness for super-logarithmic depth circuits stated in Theorem 1, we proceed in two steps:

Linear depth: As a warm-up, in the first part, we consider output distributions of nearest-neighbor Clifford circuits of linear depth. In particular, we will choose $d$ linear in $n$ and large enough that, as shown in Ref. Bravyi_2021, this depth is sufficient to implement any Clifford unitary exactly in a nearest-neighbor circuit architecture. In our notation, this implies that, for all such $d$, every $U\in\mathrm{Cl}(n)$ is (up to an irrelevant global phase) an element of $\mathrm{BW}_{\mathrm{Cl}(2)}(n,d)$,
and therefore that $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)=\mathcal{C}_{\mathrm{Cl}}(n)$. Exploiting this completeness property of the concept class, along with properties of the Born distributions of global Clifford unitaries, we are able to prove a lower bound on the randomized statistical dimension which grows exponentially in $n$, whenever $\tau\in\Omega(1/\mathrm{poly}(n))$.

Extension to sublinear depth: The exponential lower bound from the linear-depth case suggests that one might still be able to achieve a super-polynomial lower bound at sublinear circuit depths. In the second part of the proof, we will show that this is indeed the case. In fact, we demonstrate that we are able to trade off circuit depth against query complexity. The technical difficulty arises from the fact that when we reduce the depth to $d\in o(n)$, some global Clifford unitaries can no longer be implemented, so that (up to global phases) $\mathrm{BW}_{\mathrm{Cl}(2)}(n,d)\subsetneq\mathrm{Cl}(n)$.
Consequently, the concept class $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$ containing the corresponding Born distributions of the circuits also gets smaller. Characterizing precisely which Clifford Born distributions drop out and which are still present at a certain depth seems difficult. Instead, building on the insights from the linear-depth case, we will show two different approaches for extending our bounds to sublinear depth that get around this difficulty. Both approaches let us establish a trade-off between the depth (and hence the size of the concept class) and the number of required statistical queries.
5.1 Warm up: Linear circuit depth
In this section, we will prove the following lemma which, as illustrated in Figure 4, gives rise, after applying the reductions from Lemmas 1 and 2, to a version of Theorem 1 restricted to the setting of linear-depth local quantum circuits.
Lemma 3 (Restriction of Theorem 1 for linear depth quantum circuits).
For all large enough, all and all it holds that
(27) 
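As stressed before, via Lemmas 1 and 2 this immediately implies statistical query complexity lower bounds for learning $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$ which are exponential in $n$, whenever $\tau\in\Omega(1/\mathrm{poly}(n))$. This choice of linear depth is deliberate, since it follows from Ref. Bravyi_2021 that by using a nearest-neighbour Clifford circuit of depth linear in $n$ one can implement any global Clifford unitary $U\in\mathrm{Cl}(n)$. Hence, we have that, for all such $d$,
$$\mathcal{C}_{\mathrm{Cl}(2)}(n,d)=\mathcal{C}_{\mathrm{Cl}}(n). \tag{28}$$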
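Thus, when proving Lemma 3, the decision problem of interest is with respect to the Born distributions of the global $n$-qubit Clifford group,
$$\mathcal{B}\left(\left\{P\in\mathcal{C}_{\mathrm{Cl}}(n)\;\middle|\;d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\right\},\,\mathcal{U}_n\right). \tag{29}$$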
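From Definition 11, it follows that the RSD of this decision problem is lower bounded by
$$\mathrm{RSD}_\tau\left(\mathcal{B}\left(\left\{P\in\mathcal{C}_{\mathrm{Cl}}(n)\;\middle|\;d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\right\},\mathcal{U}_n\right)\right)\geq\left(\max_{\phi:\{0,1\}^n\to[-1,1]}\Pr_{D\sim\mu}\left[\,|D[\phi]-\mathcal{U}_n[\phi]|>\tau\,\right]\right)^{-1} \tag{30}$$
for any measure $\mu$ over $\{P\in\mathcal{C}_{\mathrm{Cl}}(n)\mid d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\}$. This is because the RSD is defined as the supremum of the right-hand side of Eq. (30) over all possible such measures. When proving Lemma 3, we will take $\mu$ to be the measure induced by drawing a uniformly random global Clifford unitary $U\in\mathrm{Cl}(n)$ and post-selecting on its Born distribution $P_U$ being at least $\epsilon$-far from uniform in total variation distance. That is, $\mu$ can be defined via the following procedure for sampling from $\mu$: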

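Draw $U\sim\mathrm{Cl}(n)$ uniformly at random.

If $d_{\mathrm{TV}}(P_U,\mathcal{U}_n)\geq\epsilon$, output $P_U$.

Else, if $d_{\mathrm{TV}}(P_U,\mathcal{U}_n)<\epsilon$, resample $U$ from the uniform measure over $\mathrm{Cl}(n)$.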
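The post-selection procedure above is an instance of rejection sampling. The generic sketch below takes any sampler `draw` for the base measure (a hypothetical stand-in; for Lemma 3 it would return $P_U$ for a uniformly random $U\in\mathrm{Cl}(n)$) and repeats until the total variation distance to the reference is at least $\epsilon$. The expected number of draws is $1/\Pr_U[d_{\mathrm{TV}}(P_U,\mathcal{U}_n)\geq\epsilon]$, the same probability that must be bounded from below in the analysis.

```python
import random
from typing import Callable, Dict, Tuple

Bits = Tuple[int, ...]
Dist = Dict[Bits, float]


def tv_distance(p: Dist, q: Dist) -> float:
    """d_TV(p, q) = (1/2) * sum_x |p(x) - q(x)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def postselected_sample(draw: Callable[[], Dist], reference: Dist, eps: float,
                        max_tries: int = 100_000) -> Dist:
    """Sample from the base measure conditioned on d_TV(P, reference) >= eps,
    by drawing P repeatedly and rejecting it while d_TV(P, reference) < eps."""
    for _ in range(max_tries):
        P = draw()
        if tv_distance(P, reference) >= eps:
            return P
    raise RuntimeError("acceptance probability appears to be too small")


# Toy base measure on 1 bit: uniform over {uniform distribution, point mass at 0}.
rng = random.Random(0)
uniform1 = {(0,): 0.5, (1,): 0.5}
point = {(0,): 1.0}
draw = lambda: rng.choice([uniform1, point])
samples = [postselected_sample(draw, uniform1, eps=0.5) for _ in range(20)]
```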

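It follows from the definition of the conditional probability that, for all $\phi:\{0,1\}^n\to[-1,1]$,
$$\Pr_{D\sim\mu}\left[|D[\phi]-\mathcal{U}_n[\phi]|>\tau\right]=\frac{\Pr_{U\sim\mathrm{Cl}(n)}\left[|P_U[\phi]-\mathcal{U}_n[\phi]|>\tau\ \wedge\ d_{\mathrm{TV}}(P_U,\mathcal{U}_n)\geq\epsilon\right]}{\Pr_{U\sim\mathrm{Cl}(n)}\left[d_{\mathrm{TV}}(P_U,\mathcal{U}_n)\geq\epsilon\right]}\leq\frac{\Pr_{U\sim\mathrm{Cl}(n)}\left[|P_U[\phi]-\mathcal{U}_n[\phi]|>\tau\right]}{\Pr_{U\sim\mathrm{Cl}(n)}\left[d_{\mathrm{TV}}(P_U,\mathcal{U}_n)\geq\epsilon\right]}. \tag{31}$$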
Our goal is to upper bound the fraction appearing on the right-hand side of Eq. (31). We will do so by finding bounds for the denominator and the numerator separately. Note that, due to our choice of $\mu$, the fraction involves probabilities over unitaries drawn uniformly at random from the Clifford group $\mathrm{Cl}(n)$. It turns out that expectation values of the form $\mathbb{E}_{U\sim\mathrm{Cl}(n)}[f(U)]$ can be evaluated exactly analytically as long as $f$ is a polynomial of degree at most 2 in the entries of $U$ and of degree at most 2 in the entries of its complex conjugate $\bar{U}$, because the Clifford group forms a unitary 2-design. This allows us to greatly simplify our computations when bounding the numerator. More specifically, we will make use of the following expressions for the first and second moments of the output probabilities $P_U(x)$.
Lemma 4 (Clifford moments).
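$$\mathbb{E}_{U\sim\mathrm{Cl}(n)}\left[P_U(x)\right]=\frac{1}{2^n}, \tag{32}$$
$$\mathbb{E}_{U\sim\mathrm{Cl}(n)}\left[P_U(x)P_U(y)\right]=\frac{1+\delta_{x,y}}{2^n(2^n+1)}\quad\text{for all }x,y\in\{0,1\}^n. \tag{33}$$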
Proof.
See Appendix C. ∎
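Because these moments only use the 2-design property, they can be checked exhaustively for $n=2$. The sketch below enumerates the 11520 elements of $\mathrm{Cl}(2)$ modulo global phase by breadth-first closure under the generators $H\otimes\mathbb{1}$, $\mathbb{1}\otimes H$, $S\otimes\mathbb{1}$, $\mathbb{1}\otimes S$ and CNOT, and averages $P_U(x)$ and $P_U(x)P_U(y)$ over the group. The phase-canonicalization scheme and helper names are choices made for this illustration; exhaustive enumeration is only feasible for tiny $n$.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
GENERATORS = [np.kron(H, I2), np.kron(I2, H), np.kron(S, I2), np.kron(I2, S), CNOT]


def phase_free_key(U: np.ndarray) -> tuple:
    """Hashable key for U modulo global phase: divide out the phase of the first
    non-zero entry, then round real and imaginary parts to integers of 1e-6."""
    flat = U.ravel()
    first = flat[np.argmax(np.abs(flat) > 1e-9)]
    V = flat / (first / abs(first))
    return tuple(np.rint(np.concatenate([V.real, V.imag]) * 1e6).astype(int))


def two_qubit_clifford_group() -> list:
    """All elements of Cl(2) modulo global phase, via breadth-first closure."""
    identity = np.eye(4, dtype=complex)
    seen = {phase_free_key(identity): identity}
    frontier = [identity]
    while frontier:
        new = []
        for U in frontier:
            for G in GENERATORS:
                V = G @ U
                key = phase_free_key(V)
                if key not in seen:
                    seen[key] = V
                    new.append(V)
        frontier = new
    return list(seen.values())


cliffords = two_qubit_clifford_group()
probs = np.array([np.abs(U[:, 0]) ** 2 for U in cliffords])   # row U holds P_U(x), x = 0..3
first_moment = probs.mean(axis=0)                              # Eq. (32): 1/4 for every x
second_moment = probs.T @ probs / len(cliffords)               # Eq. (33): (1 + delta_xy)/20
```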
Given these expressions, we can now start to bound the terms appearing in Eq. (31). Let us start with the numerator. We prove the following upper bound.
Lemma 5 (Probability of distinguishing $P_U$ from $\mathcal{U}_n$: numerator of Eq. (31)).
Assume large enough and . Then for all one has that
(34) 
Proof.
For the denominator, we show the following lower bound:
Lemma 6 (Global random Clifford output distributions are far from uniform).
Assume . Then for any ,
(41) 
Proof.
See Appendix D. ∎
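A known fact about stabilizer states helps explain why a bound like Lemma 6 is plausible: the computational-basis Born distribution of any $n$-qubit stabilizer state, and hence every $P_U$ with $U\in\mathrm{Cl}(n)$, is the uniform distribution $\mathcal{U}_A$ over some affine subspace $A$ of $\mathbb{F}_2^n$. If $\dim A=k$, then $d_{\mathrm{TV}}(\mathcal{U}_A,\mathcal{U}_n)=1-2^{k-n}$, so such a distribution is at least $1/2$-far from uniform unless $k=n$. The sketch below checks this closed form by brute force for one hypothetical subspace; it does not reproduce the proof of Lemma 6, which also has to control how likely each dimension $k$ is.

```python
import itertools


def tv_to_uniform(support: set, n: int) -> float:
    """d_TV(U_A, U_n), computed by brute force over all 2^n strings."""
    p_in = 1.0 / len(support)
    q = 2.0 ** -n
    return 0.5 * sum(abs((p_in if x in support else 0.0) - q)
                     for x in itertools.product((0, 1), repeat=n))


def tv_closed_form(n: int, k: int) -> float:
    """1 - 2^{k-n}: distance from uniform of the uniform distribution on a
    k-dimensional affine subspace of F_2^n."""
    return 1.0 - 2.0 ** (k - n)


# Hypothetical 2-dimensional subspace of F_2^4: {x : x_0 = 0 and x_1 = x_2}.
n = 4
A = {x for x in itertools.product((0, 1), repeat=n) if x[0] == 0 and x[1] == x[2]}
```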
Finally, with all the pieces in place, the proof of Lemma 3 is straightforward. One simply substitutes the expressions from Lemma 5 and Lemma 6 into Eq. (31). As illustrated in Fig. 4, one then obtains a restricted version of Theorem 1 by first using the relationship between the randomized statistical dimension and the randomized query complexity of decision problems (Lemma 2), and then applying the reduction between learning and deciding (Lemma 1).
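The role of Lemma 4 in bounding the numerator can be seen from a short Chebyshev argument. The following derivation is a sketch of one way to obtain a bound of this type from Eqs. (32) and (33); it is not necessarily the argument used for Lemma 5, whose constants may differ. For a fixed $\phi:\{0,1\}^n\to[-1,1]$, write $X_U:=P_U[\phi]=\sum_x\phi(x)P_U(x)$.

```latex
\begin{aligned}
\mathbb{E}_{U\sim\mathrm{Cl}(n)}[X_U]
  &= \frac{1}{2^n}\sum_{x}\phi(x) = \mathcal{U}_n[\phi], \\
\mathrm{Var}_{U\sim\mathrm{Cl}(n)}[X_U]
  &= \sum_{x,y}\phi(x)\phi(y)\left(\frac{1+\delta_{x,y}}{2^n(2^n+1)}-\frac{1}{4^n}\right)
   = \frac{\sum_x\phi(x)^2}{2^n(2^n+1)}-\frac{\left(\sum_x\phi(x)\right)^2}{4^n(2^n+1)}
   \le \frac{1}{2^n+1}, \\
\Pr_{U\sim\mathrm{Cl}(n)}\left[\,|P_U[\phi]-\mathcal{U}_n[\phi]|>\tau\,\right]
  &\le \frac{\mathrm{Var}_{U\sim\mathrm{Cl}(n)}[X_U]}{\tau^2}
   \le \frac{1}{\tau^2\,(2^n+1)}.
\end{aligned}
```

If, in addition, the denominator of Eq. (31) is bounded below by some constant $c>0$ (a hypothetical constant standing in for the bound of Lemma 6), then Eq. (30) yields $\mathrm{RSD}_\tau\geq c\,\tau^2(2^n+1)$, which is exponential in $n$ whenever $\tau\in\Omega(1/\mathrm{poly}(n))$.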
Additionally, following up on Remark 1, we point out that the crucial ingredient to prove Lemma 5 is the 2-design property of the Clifford group, which leads to the moments given in Lemma 4. Hence, an analogous version of this Lemma could be derived for random circuits based on any universal gate set, since such circuits converge to approximate unitary $t$-designs in depth $O(n)$, for any constant $t$. Furthermore, an analogous version of Lemma 6 can be derived whenever the underlying circuit ensemble forms at least an approximate unitary 4-design.
5.2 Extension to sublinear circuit depth
In the previous section we established a lower bound for the randomized statistical dimension of a suitable decision problem, which leads to a restriction of Theorem 1 that holds only for linear-depth circuits. In this section we provide two different techniques for proving Theorem 1 by extending these lower bounds to the case of sublinear-depth circuits, at the cost of weaker query complexity lower bounds. As mentioned before, the primary difficulty we tackle here is the fact that for sublinear circuit depth the concept class $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$ is a strict subset of the set of global Clifford Born distributions, i.e.,
$\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\subsetneq\mathcal{C}_{\mathrm{Cl}}(n)$, and we therefore cannot rely straightforwardly on properties of global Clifford unitaries. While the first approach detailed below provides a slightly weaker (but still super-polynomial) query complexity lower bound than that stated in Theorem 1, we present both approaches because extensions of either technique may facilitate progress on the open questions and conjectures listed in Section 7.
5.2.1 First approach: sublinear circuit depth Clifford moments via random circuit techniques
The first approach follows essentially the same steps as for the linear-depth case, i.e. we aim to obtain a lower bound on the RSD of the decision problem
$\mathcal{B}\left(\{P\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\mid d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\},\,\mathcal{U}_n\right)$ which is applicable even when $d$ is sublinear in $n$. The only difference arises from the issue that, in this case, we need to come up with a different measure $\mu$ that is only supported on elements that are still present in the concept class $\mathcal{C}_{\mathrm{Cl}(2)}(n,d)$. A natural starting point for $\mu$ is the measure $\mu_{\mathrm{Cl}(2)}(n,d)$ introduced in Definition 9. Essentially, instead of drawing a global Clifford unitary as in the linear-depth case, we now draw a random Clifford circuit of depth $d$ by drawing the individual gates in the circuit architecture independently and uniformly from the two-qubit Clifford group $\mathrm{Cl}(2)$. To be precise, we will take $\mu$ to be the measure over the set $\{P\in\mathcal{C}_{\mathrm{Cl}(2)}(n,d)\mid d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\}$ defined by the following procedure for sampling from $\mu$:

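Draw $P\sim\mu_{\mathrm{Cl}(2)}(n,d)$.

If $d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon$, output $P$.

Else, if $d_{\mathrm{TV}}(P,\mathcal{U}_n)<\epsilon$, reject and resample from $\mu_{\mathrm{Cl}(2)}(n,d)$.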

As in the linear-depth case, we now have that, for all $\phi:\{0,1\}^n\to[-1,1]$,
$$\Pr_{D\sim\mu}\left[|D[\phi]-\mathcal{U}_n[\phi]|>\tau\right]\leq\frac{\Pr_{P\sim\mu_{\mathrm{Cl}(2)}(n,d)}\left[|P[\phi]-\mathcal{U}_n[\phi]|>\tau\right]}{\Pr_{P\sim\mu_{\mathrm{Cl}(2)}(n,d)}\left[d_{\mathrm{TV}}(P,\mathcal{U}_n)\geq\epsilon\right]}, \tag{42}$$
where, in contrast to Eq. (31), the probabilities in the fraction on the right-hand side of Eq. (42) are now with respect to a randomly drawn nearest-neighbor Clifford circuit rather than a randomly drawn global Clifford unitary. Fortunately, it turns out that moments with respect to random nearest-neighbor Clifford circuits, i.e. moments with respect to $\mu_{\mathrm{Cl}(2)}(n,d)$, can be bounded using techniques from the existing literature on random circuits. Specifically, slightly modifying results on the collision probability of random quantum circuits barak2021spoofing,dalzell2020random allows us to compute the following first and second moment bounds, from which bounds on the numerator and denominator follow as in the linear-depth case. In particular, the following lemma is adapted from Section 6.3 of Ref. barak2021spoofing.
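The second moments in Lemma 7 control the collision probability $\mathbb{E}[\sum_x P_U(x)^2]$ of random nearest-neighbour circuits. Because the gates are drawn independently and $\mathrm{Cl}(2)$ is a unitary 2-design, any such second-moment quantity takes the same value if each Clifford gate is replaced by a Haar-random two-qubit unitary. The Monte Carlo sketch below uses that substitution (Haar gates instead of sampled Cliffords, plus an assumed brickwork offset convention) to show how the collision probability decays with depth: at depth 1 on 4 qubits its expectation is $(2/5)^2$, while at large depth it approaches the global value $2/(2^n+1)$ obtained by summing Eq. (33) over $x=y$.

```python
import numpy as np


def haar_unitary(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-random unitary: QR decomposition of a complex Gaussian matrix, with
    the phases of R's diagonal absorbed so that the result is exactly Haar."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def collision_probability(n: int, depth: int, rng: np.random.Generator) -> float:
    """Z = sum_x P_U(x)^2 for one brickwork circuit of random two-qubit gates."""
    psi = np.zeros([2] * n, dtype=complex)
    psi[(0,) * n] = 1.0
    for layer in range(depth):
        for i in range(layer % 2, n - 1, 2):
            g = haar_unitary(4, rng).reshape(2, 2, 2, 2)
            psi = np.tensordot(g, psi, axes=([2, 3], [i, i + 1]))
            psi = np.moveaxis(psi, [0, 1], [i, i + 1])
    p = np.abs(psi.reshape(-1)) ** 2
    return float(np.sum(p ** 2))


def mean_collision_probability(n: int, depth: int, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[Z] over independent random circuits."""
    rng = np.random.default_rng(seed)
    return float(np.mean([collision_probability(n, depth, rng) for _ in range(samples)]))


n = 4
shallow = mean_collision_probability(n, depth=1, samples=400)   # expectation (2/5)^2 = 0.16
deep = mean_collision_probability(n, depth=40, samples=400)     # close to 2 / (2^n + 1)
```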
Lemma 7 (Restricted-depth random nearest-neighbor Clifford circuit moments, adapted from Ref. barak2021spoofing).
(43)  
(44) 
Proof.
See Appendix C. ∎
Using these expressions, bounding the randomized statistical dimension proceeds completely analogously to the linear-depth case. In particular, we can directly prove the following bound on the numerator of Eq. (42).
Lemma 8 (Probability of distinguishing $P_U$ from $\mathcal{U}_n$: numerator of Eq. (42)).
Assume large enough and . Then for all one has that
(45) 
Proof.
For the denominator we find the same expression as in the linear depth case.
Lemma 9 (Local random Clifford circuit output distributions are far from uniform).
Assume . Then for any ,
(46) 
Proof.
See Appendix D. ∎
Once again, Eq. (42) then allows us to obtain the following lower bound for the randomized statistical dimension of the decision problem of interest.
Lemma 10.
For all large enough, all and all it holds that
(47) 
5.2.2 Second approach: embedding strategy
The second approach we detail here is conceptually different. It is based on the observation that, even at sublinear circuit depths, one can implement global Clifford unitaries on sufficiently small subsets of qubits. More specifically, for any circuit depth $d$, as long as $d$ is at least the (linear in $n'$) nearest-neighbour depth that Ref. Bravyi_2021 requires to implement an arbitrary $n'$-qubit Clifford unitary, one can implement any $C\in\mathrm{Cl}(n')$ on the first $n'$ qubits of an $n$-qubit brickwork Clifford circuit. This observation allows us to consider a decision problem with respect to $n$-qubit quantum circuits, in which we have embedded a smaller $n'$-qubit version of the global-Clifford decision problem considered in Section 5.1. Due to the nature of the embedding, the randomized statistical dimension of this decision problem will be the same as that of the global-Clifford decision problem from Section 5.1, rescaled to $n'$ qubits.
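The embedding rests on the identity $P_{C\otimes\mathbb{1}}(y,z)=P_C(y)$ if $z=0^{n-n'}$ and $0$ otherwise, for $C$ acting on the first $n'$ qubits. Consequently, every statistical query $\phi$ on $n$ bits reduces to the query $y\mapsto\phi(y,0^{n-n'})$ on $n'$ bits, and total variation distances between embedded distributions are unchanged. The sketch below checks the identity numerically for a two-qubit Bell circuit embedded in four qubits; helper names and the bit-ordering convention (qubit 0 as most significant bit) are assumptions.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def born(U: np.ndarray) -> np.ndarray:
    """P_U(x) = |<x|U|0...0>|^2, with x read as an integer (qubit 0 = most significant bit)."""
    return np.abs(U[:, 0]) ** 2


def embed_on_first_qubits(U_small: np.ndarray, n: int) -> np.ndarray:
    """U_small acting on the first n' qubits, tensored with the identity on the other n - n'."""
    n_small = U_small.shape[0].bit_length() - 1
    return np.kron(U_small, np.eye(2 ** (n - n_small)))


U_small = CNOT @ np.kron(H, np.eye(2))           # n' = 2 qubits: Bell state
P_big = born(embed_on_first_qubits(U_small, n=4))
# P_big(y z) = P_small(y) if z = 00 and 0 otherwise, i.e. kron(P_small, e_0).
e0 = np.zeros(2 ** 2)
e0[0] = 1.0
```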
To make the argument concrete, we start by considering the subset of unitaries that arises from local nearest-neighbor Clifford circuits of depth $d$ in which, as illustrated in Figure 5, only the first $n'$ qubits are acted on non-trivially. More specifically, we define
(48) 
Further, we denote by