Learning DNFs under product distributions via μ-biased quantum Fourier sampling

02/15/2018, by Varun Kanade et al., University of Oxford and UCL

We show that DNF formulae can be quantum PAC-learned in polynomial time under product distributions using a quantum example oracle. The best classical algorithm (without access to membership queries) runs in superpolynomial time. Our result extends the work by Bshouty and Jackson (1998) that proved that DNF formulae are efficiently learnable under the uniform distribution using a quantum example oracle. Our proof is based on a new quantum algorithm that efficiently samples the coefficients of a μ-biased Fourier transform.




1 Introduction

Whether the class of Boolean functions that can be expressed as polynomial-size formulae in disjunctive normal form (DNF) is probably approximately correct (PAC) learnable in polynomial time is one of the central unresolved questions in the PAC learning framework introduced by Valiant [1984]. Currently, the best classical algorithm for this problem has running time 2^{Õ(n^{1/3})} [Klivans and Servedio, 2001]. A number of variants of this problem have been studied, either by relaxing the requirements, e.g. learning with respect to restricted classes of distributions, or by enhancing the power of the learning algorithm, e.g. providing access to a membership query, random walk, or quantum example oracle [Jackson, 1997, Awasthi et al., 2013, Bshouty et al., 2005, Bshouty and Jackson, 1998].

Two cases in which it is possible to show DNF learnability under specific assumptions are particularly relevant to our setting. First, when the distribution is uniform, or more generally, a product distribution, a quasi-polynomial algorithm is known [Verbeurgt, 1990]. Second, in the membership query (MQ) model, where the learner can query an oracle for a value of the unknown function at a given point in the domain, Jackson [1997] gave a polynomial time algorithm for DNFs that works over both the uniform and product distributions.

The question of PAC learning has been extended to learners with access to quantum resources by Bshouty and Jackson [1998]. The main results in quantum learning theory are reviewed by Arunachalam and de Wolf [2017b]. The two main requirements for a quantum PAC-learner are the ability to query an oracle that provides examples in quantum superposition and access to a quantum computer to run the learning algorithm. Two measures of interest in the PAC framework are the sample complexity and the time complexity. The sample complexity is the worst-case number of examples required to learn a class of functions. The time complexity is the worst-case running time of a learner for that function class; clearly the sample complexity is at most the time complexity, but may in principle be significantly smaller. It has been shown that the quantum PAC model gives only a constant-factor advantage in terms of sample complexity with respect to the classical analogue [Arunachalam and de Wolf, 2017a]. Certain results suggest that a classical/quantum separation exists when considering the time complexity of some learning problems. When learning with respect to the uniform distribution, the class of polynomial-size DNF formulae [Bshouty and Jackson, 1998] and the class of k-juntas [Atıcı and Servedio, 2007] are known to be efficiently quantum PAC-learnable (note that the learnability of k-juntas is implied by the result on DNFs). In the classical setting, in both these cases, the current best known algorithms run in quasi-polynomial time; while no formal hardness results are known in the standard PAC framework, it would be highly surprising if a polynomial time algorithm for these problems in the classical setting were discovered. Information-theoretic lower bounds are known in more restricted models, such as the statistical query model, which suggest these classes cannot be learnt in polynomial time [Blum et al., 1994].
In the context of learning in the presence of noise, Cross et al. [2015] proved that parity functions under the uniform distribution can be efficiently learned using a quantum example oracle. Classically, the problem is widely believed to require superpolynomial (though subexponential) time [Blum et al., 2003, Lyubashevsky, 2005]. The result of Cross et al. [2015] has been extended to linear functions and to more complex error models by Grilo and Kerenidis [2017].

1.1 Overview of our results

We show that DNF formulae under product distributions can be learned in polynomial time in the quantum PAC model. Our proof builds on the work by Feldman [2012] for learning DNFs under product distributions using membership queries. Feldman's proof is in turn based on a result by Kalai et al. [2009] showing that DNFs can be approximated by their heavy low-degree Fourier coefficients alone. Notably, Feldman's result also applies to learning settings where the examples are drawn from a product distribution, i.e. a distribution that factorises over the elements of the input vector.

The only part of Feldman's algorithm that makes use of MQ is the subroutine that approximates the Fourier spectrum of the target function. The approximation is obtained using the Kushilevitz-Mansour (KM) algorithm [Kushilevitz and Mansour, 1993] for the case of the uniform distribution, and the extended Kushilevitz-Mansour (EKM) algorithm [Kalai et al., 2009] for the case of product distributions. Bshouty and Jackson showed that it is possible to approximate the Fourier coefficients of the target using quantum Fourier sampling. This technique, introduced by Bernstein and Vazirani [1997], allows one to sample efficiently from the distribution defined by the squared Fourier coefficients using the Quantum Fourier Transform (QFT).

In order to extend the result by Bshouty and Jackson [1998] to learning under product distributions, it is necessary to find a quantum technique to sample according to the coefficients of a Fourier transform defined over an inner product in which each term is weighted according to the product distribution. Bahadur [1961] and Furst et al. [1991] showed that the Fourier transform can be extended to product distributions, thus defining the μ-biased Fourier transform.

In this work we introduce the μ-biased quantum Fourier transform. We show the validity of our construction in two steps. First, we explicitly construct a unitary that implements the single-qubit transform. Then we argue that this construction can be efficiently implemented on a quantum circuit with logarithmic overhead. By exploiting the factorisation of product distributions, we show how to build an n-qubit transform as a tensor product of n single-qubit transforms.

The main technical contribution of this paper is a quantum algorithm to approximate the heavy, low-degree, μ-biased Fourier spectrum of the target function without using membership queries. This can be interpreted as a quantum version of the EKM algorithm for approximating the low-degree Fourier coefficients. We provide rigorous upper bounds on the scaling of the algorithm using the Dvoretzky-Kiefer-Wolfowitz theorem, a concentration inequality that bounds the number of samples required to estimate a probability distribution in infinity norm. The learnability of DNFs under product distributions immediately follows from an application of the quantum EKM algorithm to Corollary 5.1 in Feldman [2012].

1.2 Related work

The learnability of DNF formulae under the uniform distribution using a quantum example oracle was first studied by Bshouty and Jackson [1998], who, in the same paper, also introduced the quantum PAC model. Their approach to learning DNF was built on the harmonic sieve algorithm previously developed by Jackson [1997]. At the core of Jackson's algorithm lies a useful property of DNFs which guarantees that, for every s-term DNF f and for every probability distribution D, there exists a parity χ_a such that |E_D[f·χ_a]| ≥ 1/(2s+1). This implies that for every f and D there exists a parity that weakly approximates f. In the harmonic sieve algorithm, the boosting algorithm of Freund [1995] is then used to turn the weak learner into a strong one. The only part of the harmonic sieve algorithm that requires membership queries is the KM algorithm used to find the weakly approximating parity function. Bshouty and Jackson consider the setting where the examples are given by a quantum example oracle and replace the KM algorithm with the quantum Fourier sampling technique due to Bernstein and Vazirani [1997].

Jackson et al. [2002] studied the learnability of DNFs in the quantum membership model (where the quantum example oracle is replaced by an oracle that returns the value of the function, in superposition, at chosen points). By using the quantum Goldreich-Levin algorithm developed by Adcock and Cleve [2002], they were able to obtain a better bound on the query complexity with respect to the best classical algorithm. We recall that the classical KM algorithm can be derived from the Goldreich-Levin theorem, an important result that reduces the computational problem of inverting a one-way function to the problem of predicting a hard-predicate associated with that function [Goldreich and Levin, 1989]. The result of Adcock and Cleve [2002] shows that this reduction can be obtained more efficiently when considering quantum functions and quantum hard-predicates. A different quantum implementation of the Goldreich-Levin algorithm was given by Montanaro and Osborne [2010].


We describe notation and important background concepts in Section 2. In Section 4 we define the μ-biased quantum Fourier transform and discuss some of its properties. In Section 5 we introduce an efficient quantum algorithm to sample from the Fourier coefficients of the μ-biased Fourier transform and show how this can be used to prove the PAC-learnability of DNF formulae under product distributions. We conclude in Section 6, where we discuss how to implement the quantum example oracle. In Appendix A we bound the error introduced by approximating the bias vector μ.

2 Preliminaries

2.1 Notation

We denote vectors with lower-case letters. For a vector x, x_i denotes the i-th element of x. A vector is sparse if most of its entries are 0. If x is sparse we can describe it using only its non-zero coefficients. We call this the succinct representation of x. For an integer k, [k] denotes the set {1, …, k}. We use the following standard norms: the ℓ1 norm ‖x‖₁ = Σ_i |x_i|, the ℓ2 norm ‖x‖₂ = (Σ_i x_i²)^{1/2}, and the ℓ∞ norm ‖x‖∞ = max_i |x_i|.

Let f and g be positive-valued functions. We use f = O(g) to indicate that the asymptotic scaling of f is upper-bounded, up to a constant factor, by g. Similarly, f = Ω(g) indicates that the asymptotic scaling of f is lower-bounded, up to a constant factor, by g. The notation f = Θ(g) indicates that f is bounded both above and below by g asymptotically. The notations Õ and Ω̃ hide logarithmic factors.

For a set X, we denote by D a probability distribution over X. The notation x ∼ D indicates that x is sampled according to D. The expected value of a random variable g(x) is denoted as E_{x∼D}[g(x)]. We often use E_D[g] to indicate E_{x∼D}[g(x)]. If D has a subscript, as in D_μ, we write E_μ[g] to indicate E_{x∼D_μ}[g(x)]. When D is the uniform distribution we omit the distribution in the subscript and use E[g]. The probability that an event E occurs is denoted by Pr[E].

2.2 Fourier analysis over the Boolean cube

Let n be a positive integer and let f and g be real-valued functions defined over the Boolean hypercube {-1, 1}ⁿ. The space of real functions over the Boolean hypercube is a vector space with inner product ⟨f, g⟩ = E[f(x) g(x)], where the expectation is taken uniformly over all x ∈ {-1, 1}ⁿ. A parity function labels a point x according to a characteristic vector a ∈ {0, 1}ⁿ and is defined as χ_a(x) = Π_{i : a_i = 1} x_i. The set of parity functions forms an orthonormal basis for the space of real-valued functions over the Boolean hypercube. This fact implies that we can uniquely represent every function f as a linear combination of parities, the Fourier transform of f. The linear coefficients, known as the Fourier coefficients, are given by the projections of the function onto the parity basis and are denoted by f̂(a) = ⟨f, χ_a⟩. We say that a Fourier coefficient is "heavy" if it has large magnitude |f̂(a)|. The set of Fourier coefficients is called the Fourier spectrum of f and is denoted by f̂, which can also be seen as a 2ⁿ-dimensional vector. For a set S of characteristic vectors, f̂_S denotes the vector of all Fourier coefficients with indices in S. The degree of a Fourier coefficient f̂(a) is the number of non-zero entries of a. We denote by f̂^{≤d} the vector of all coefficients of f of degree at most d. The squared Fourier coefficients are related by Parseval's identity Σ_a f̂(a)² = E[f(x)²]. This implies that for any f with range [-1, 1], Σ_a f̂(a)² ≤ 1 (the equality holds if f is Boolean-valued).
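These definitions are easy to check numerically. The following sketch (our own illustration, not code from the paper) computes the Fourier coefficients of the 3-bit majority function by brute force over {-1, 1}³ and verifies Parseval's identity.

```python
import itertools

import numpy as np

n = 3
points = list(itertools.product([-1, 1], repeat=n))

def f(x):
    # 3-bit majority, a Boolean-valued function on {-1, +1}^3.
    return 1 if sum(x) > 0 else -1

def chi(s, x):
    # Parity with characteristic set s: product of x_i over i in s.
    out = 1
    for i in s:
        out *= x[i]
    return out

subsets = [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]

# Fourier coefficient: E_x[f(x) chi_s(x)] under the uniform distribution.
fourier = {s: np.mean([f(x) * chi(s, x) for x in points]) for s in subsets}

# Parseval: the squared coefficients of a Boolean-valued f sum to 1.
parseval = sum(c ** 2 for c in fourier.values())
```

For majority on 3 bits the heavy coefficients sit on the three singletons and on the full set, illustrating the "heavy low-degree" structure exploited later in the paper.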

The Fourier spectrum of a function can be approximated using the KM algorithm. The KM algorithm, based upon a celebrated result by Goldreich and Levin [1989], requires membership query (MQ) access to f (i.e. it requires an oracle that for every x returns f(x)).

Theorem 1 (KM algorithm).

Let f be a real-valued function over the Boolean hypercube and let θ > 0, δ > 0. Then, there exists an algorithm with oracle access to f that, with probability at least 1 − δ, returns a succinctly represented vector v such that ‖v − f̂‖∞ ≤ θ and v has at most O(‖f‖₂²/θ²) non-zero entries. The algorithm runs in time polynomial in n, 1/θ and log(1/δ), and makes polynomially many queries to f.

2.3 μ-biased Fourier analysis

A product distribution D_μ over {-1, 1}ⁿ is characterised by a real vector μ ∈ [-1, 1]ⁿ. Such a distribution assigns values to each variable independently, so for x ∼ D_μ we have E[x_i] = μ_i and Pr[x_i = 1] = (1 + μ_i)/2. We say that the distribution is c-bounded if μ ∈ [-1 + c, 1 - c]ⁿ, where c ∈ (0, 1].

Bahadur [1961] and Furst et al. [1991] showed that the Fourier transform can be extended to product distributions, thus defining the μ-biased Fourier transform. The book by O'Donnell [2014] gives a brief introduction to μ-biased Fourier analysis and its applications. For the inner product ⟨f, g⟩_μ = E_{x∼D_μ}[f(x) g(x)], the set of functions φ_a, where φ_a(x) = Π_{i : a_i = 1} (x_i − μ_i)/√(1 − μ_i²), forms an orthonormal basis for the vector space of real-valued functions on {-1, 1}ⁿ. In this way every function f can be represented as f(x) = Σ_a f̂_μ(a) φ_a(x), where f̂_μ(a) = ⟨f, φ_a⟩_μ. For vectors of μ-biased Fourier coefficients we extend the same notation introduced for standard Fourier coefficients. Parseval's identity extends to product distributions: Σ_a f̂_μ(a)² = E_μ[f(x)²]. This implies that for any f with range [-1, 1], Σ_a f̂_μ(a)² ≤ 1.
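As a sanity check on the μ-biased basis (our own sketch; the bias values are arbitrary), the following verifies numerically that the functions φ_a are orthonormal with respect to the inner product weighted by the product distribution.

```python
import itertools

import numpy as np

n = 2
mu = np.array([0.3, -0.5])   # E[x_i] = mu_i for the product distribution
points = list(itertools.product([-1, 1], repeat=n))

def D(x):
    # Product distribution on {-1, +1}^n with mean vector mu.
    return np.prod([(1 + mu[i] * x[i]) / 2 for i in range(n)])

def phi(s, x):
    # mu-biased basis function: product over i in s of (x_i - mu_i)/sqrt(1 - mu_i^2).
    out = 1.0
    for i in s:
        out *= (x[i] - mu[i]) / np.sqrt(1 - mu[i] ** 2)
    return out

subsets = [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]

def inner(s, t):
    # <phi_s, phi_t>_mu = E_{x ~ D}[phi_s(x) phi_t(x)]
    return sum(D(x) * phi(s, x) * phi(t, x) for x in points)

gram = np.array([[inner(s, t) for t in subsets] for s in subsets])
```

The Gram matrix of the basis comes out as the identity, confirming orthonormality; at μ = 0 the φ_a reduce to the ordinary parities.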

The KM algorithm has been extended to product distributions by Bellare [1991], Jackson [1997] and Kalai et al. [2009]. We follow the presentation of Feldman and give the version of Kalai et al. [2009].

Theorem 2 (EKM algorithm).

Let f be a real-valued function over the Boolean hypercube, let D_μ be a c-bounded product distribution, and let θ > 0, δ > 0. Then, there exists an algorithm with oracle access to f that, with probability at least 1 − δ, returns a succinctly represented vector v such that ‖v − f̂_μ‖∞ ≤ θ and v has at most O(‖f‖₂²/θ²) non-zero entries. The algorithm runs in time polynomial in n, 1/θ, 1/c and log(1/δ).

2.4 Quantum computation and quantum Fourier transform

A generic n-qubit state |ψ⟩ is a complex unit vector, also known as the state vector, in a Hilbert space of dimension 2ⁿ equipped with a Hermitian scalar product ⟨·|·⟩. We use the Dirac notation and write |ψ⟩ = Σ_i α_i |i⟩ to denote the quantum state with amplitudes α_i. Given a basis, the elements of the state vector correspond to its projections over the basis elements. Each element of the basis corresponds to a different measurable outcome. The probability of measurement outcome i is |⟨i|ψ⟩|², the squared modulus of the projection of |ψ⟩ onto |i⟩. Let |ψ⟩ and |φ⟩ be two interacting quantum states; their joint description is given by the tensor product of the respective state vectors, |ψ⟩ ⊗ |φ⟩.

The evolution of quantum states is governed by quantum operators. Quantum operators acting on n-qubit states are 2ⁿ-dimensional unitary matrices and are denoted with capital letters. The QFT over ℤ₂ⁿ is the n-fold tensor product H^{⊗n}, where H is the Hadamard transform H = (1/√2) [[1, 1], [1, −1]].

We often work in the computational basis where, for an n-qubit system, each basis element corresponds to an n-bit string. A single-qubit system can take two values, |0⟩ and |1⟩. When working on the Boolean hypercube we identify |0⟩ with +1 and |1⟩ with −1. A quantum register is a collection of qubits. Given a Boolean-valued function f that can be efficiently computed by a classical circuit, a quantum membership oracle U_f is a unitary map that applied on n + 1 qubits acts as follows: U_f |x⟩|b⟩ = |x⟩|b ⊕ f(x)⟩. By combining a membership oracle with the Hadamard transform it is possible to produce a quantum phase oracle |x⟩|−⟩ ↦ (−1)^{f(x)} |x⟩|−⟩, where |−⟩ = (|0⟩ − |1⟩)/√2. This operation is also known as phase kickback. For ease of notation, in the following we will not write explicitly the ancilla register |−⟩.
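The membership-to-phase-oracle conversion can be checked on a small state vector. This is a hedged, self-contained simulation (the XOR function and the qubit ordering are our own choices): U_f is built as a permutation matrix, the ancilla is set to |−⟩, and the kicked-back phases (−1)^{f(x)} are read off the output amplitudes.

```python
import numpy as np

n = 2
N = 2 ** n
f = np.array([0, 1, 1, 0])  # example Boolean function on 2 bits (XOR)

# Membership oracle U_f on n + 1 qubits: |x>|b> -> |x>|b XOR f(x)>.
# Index convention: basis state (x, b) sits at position 2*x + b.
U = np.zeros((2 * N, 2 * N))
for x in range(N):
    for b in range(2):
        U[2 * x + (b ^ f[x]), 2 * x + b] = 1.0

# Uniform superposition on the input register, ancilla in |-> = H|1>.
minus = np.array([1.0, -1.0]) / np.sqrt(2)
state = np.kron(np.full(N, 1.0 / np.sqrt(N)), minus)

out = U @ state

# Phase kickback: the input register now carries the phases (-1)^{f(x)}.
phases = np.array([out[2 * x] * np.sqrt(2 * N) for x in range(N)])
```

The ancilla is untouched up to a global factor, while each computational basis state of the input register acquires the sign (−1)^{f(x)}.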

Given a probability distribution D whose density is efficiently integrable, there exists an efficient technique developed by Grover and Rudolph [2002] to generate a quantum superposition which approximates the distribution.

Lemma 1.

Let D be a probability distribution over {-1, 1}ⁿ whose density is efficiently integrable. Then, there exists an efficient quantum algorithm that returns the quantum state Σ_x √(D(x)) |x⟩.
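For a product distribution, the Grover-Rudolph preparation collapses to one single-qubit rotation per qubit, since the amplitudes √D(x) factorise. The sketch below (our own, with arbitrary example probabilities) builds the resulting state vector classically and checks that measuring it would reproduce D.

```python
from functools import reduce

import numpy as np

# Pr[bit i = 0] for each qubit of the product distribution (example values).
p = np.array([0.2, 0.7, 0.5])

# Each qubit is rotated as |0> -> sqrt(p_i)|0> + sqrt(1 - p_i)|1>.
qubits = [np.array([np.sqrt(pi), np.sqrt(1.0 - pi)]) for pi in p]

# Tensoring the single-qubit states gives amplitudes sqrt(D(x)).
state = reduce(np.kron, qubits)
probs = state ** 2
```

The measurement probabilities factor exactly as the product distribution, which is why state preparation stays efficient in this setting.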

2.5 PAC learning and quantum PAC learning

In the PAC model developed by Valiant [1984] a learner tries to approximate with high probability an unknown concept c from a training set of random labelled examples (x, c(x)). The examples are given by an example oracle EX(c, D) that returns an example (x, c(x)), where x is randomly sampled from a probability distribution D over the domain. A concept class C is a set of concepts. A learning algorithm gets as input the training set and outputs a hypothesis h that is a good approximation of c with high probability. We say that a concept class C is PAC-learnable if, for every c ∈ C, every distribution D, and every ε > 0, δ > 0, when running a learning algorithm A on examples generated by EX(c, D), we have that, with probability at least 1 − δ, Pr_{x∼D}[h(x) ≠ c(x)] ≤ ε. PAC theory introduces two parameters to classify the efficiency of a learner. The first one, the sample complexity, is information-theoretic and determines the minimum number of examples required to PAC-learn the class C. The second parameter, the time complexity, is computational and corresponds to the runtime of the best learner for the class C. We say that a concept class is efficiently PAC-learnable if the running time of A is polynomial in n, 1/ε and 1/δ. (We assume that the size of the concept is polynomially bounded in n and ignore this dependence.)

Two extensions of the PAC model are relevant for our purposes. In the MQ model the learner has access, in addition to the example oracle EX(c, D), to a membership oracle that for every x returns the value c(x). In the quantum PAC model, the examples are given by a quantum example oracle QEX(c, D) that returns the superposition Σ_x √(D(x)) |x, c(x)⟩. It has been proven [Bshouty and Jackson, 1998] that membership queries are strictly more powerful than a quantum example oracle (i.e. a quantum example oracle can be simulated by a membership oracle but the converse is not true). When D is the product distribution D_μ we write EX(c, D_μ) and QEX(c, D_μ).

A DNF formula is a disjunction of terms, where each term is a conjunction of Boolean literals and a literal is either a variable or its negation (e.g. (x₁ ∧ ¬x₂) ∨ (x₂ ∧ x₃)). The size of a DNF is the number of its terms.

3 Overview of Feldman’s algorithm

Our proof of the learnability of DNFs under product distributions builds on an algorithm by Feldman [2012] that greatly simplified the learnability of DNFs. At the core of Feldman's algorithm lies a result by Kalai et al. [2009] that shows that DNFs can be approximated by heavy low-degree Fourier coefficients alone. More formally, they proved that, for any s-term DNF f, it is possible to find a function g that is ε-close to f provided that g approximates the heavy low-degree Fourier coefficients of f sufficiently well. This fact gives a direct learnability condition and avoids an involved boosting procedure to turn a weak learner into a strong one (as in the harmonic sieve algorithm by Jackson [1997]). Feldman further refined this fact about DNFs.

Theorem 3 (Theorem 3.8 in [Feldman, 2012]).

Let c > 0 be a constant, let D_μ be a c-bounded product distribution and let ε > 0. For an integer s, let f be an s-term DNF. Then every bounded function g that approximates the heavy low-degree μ-biased Fourier coefficients of f sufficiently well is ε-close to f.

By this theorem, the learnability of DNF reduces to constructing a function g that approximates the heavy low-degree Fourier spectrum of f. This is exactly the approach followed by Feldman, which we now proceed to sketch.

The first step of the procedure is to run the EKM algorithm to estimate the heavy Fourier spectrum of f. The EKM algorithm returns a succinct representation of the spectrum and the learner keeps only the coefficients of low degree. This is the only step of the algorithm that requires membership queries and is the subroutine that will be replaced by the quantum EKM algorithm derived in Section 5.

Once the learner has estimated the Fourier spectrum of f, it proceeds with the construction of g. The procedure is simple and based on an iterative process. Note that by Parseval's identity the ℓ2 distance between f and g decomposes into the sum of the squared differences of their Fourier coefficients. Starting with a g whose coefficients still differ from the estimated ones, it is possible to construct a g′ that is closer than g to f in ℓ2 norm by adding the residual on one coefficient at a time. The problem with this procedure is that the intermediate function might take values outside [−1, 1], but Feldman showed that the function can be adjusted to the right range by performing a single projection after all the updates.
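A plausible classical sketch of this iterative construction follows (uniform case for simplicity; the specific update rule g ← g + (f̂(s) − ĝ(s))·χ_s and the majority target are our own illustrative choices, not taken verbatim from Feldman's paper). Each step adds the residual on the coefficient with the largest discrepancy, which decreases ‖f − g‖₂² by the square of that residual, and a single clipping to [−1, 1] is performed at the end.

```python
import itertools

import numpy as np

n = 3
points = np.array(list(itertools.product([-1, 1], repeat=n)))

def chi(s, X):
    # Parity chi_s evaluated on every row of X.
    return np.prod(X[:, list(s)], axis=1) if s else np.ones(len(X))

# Stand-in target: 3-bit majority (its spectrum has four non-zero coefficients).
f_vals = np.where(points.sum(axis=1) > 0, 1.0, -1.0)

subsets = [tuple(i for i in range(n) if (m >> i) & 1) for m in range(2 ** n)]
target = {s: float(np.mean(f_vals * chi(s, points))) for s in subsets}

g_vals = np.zeros(len(points))
for _ in range(10):
    # Residual on every coefficient; pick the largest discrepancy.
    resid = {s: target[s] - float(np.mean(g_vals * chi(s, points))) for s in subsets}
    s_best = max(resid, key=lambda s: abs(resid[s]))
    if abs(resid[s_best]) < 1e-12:
        break
    g_vals = g_vals + resid[s_best] * chi(s_best, points)

# Single projection onto the admissible range at the very end.
g_vals = np.clip(g_vals, -1.0, 1.0)
err = float(np.mean((f_vals - g_vals) ** 2))
```

On this toy target the iteration converges exactly, since each update zeroes one residual without disturbing the others (the parities are orthonormal).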

Once a precision has been reached such that an application of Theorem 3 guarantees that the constructed function is ε-close to f, the algorithm outputs it as hypothesis; this yields the desired learning guarantee.

The running time of all the above operations is polynomial in n and inverse polynomial in the error parameters, resulting in the following corollary.

Corollary 4 (Corollary 5.1 in [Feldman, 2012]).

Let f compute an s-term DNF. Let c > 0 be a constant and let D_μ be a c-bounded product distribution. Let EX(f, D_μ) be an example oracle and MQ(f) a membership oracle. Then, there exists an algorithm with EX(f, D_μ) and MQ(f) access that efficiently PAC learns f over D_μ.

Finally, we note that the requirement of c-bounded distributions is imposed in order to control the magnitude of the μ-biased Fourier basis functions, which would otherwise diverge as μ_i → ±1.

4 Quantum μ-biased Fourier transform

In this section we introduce the μ-biased quantum Fourier transform and show how this can be used to derive a quantum algorithm for sampling from the probability distribution defined by the Fourier coefficients of the μ-biased transform. We recall that the μ-biased Fourier transform is defined by f(x) = Σ_a f̂_μ(a) φ_a(x), where f̂_μ(a) = E_{x∼D_μ}[f(x) φ_a(x)], φ_a(x) = Π_{i : a_i = 1} (x_i − μ_i)/√(1 − μ_i²), and D_μ is a product distribution with bias vector μ. Our construction of the n-qubit μ-biased QFT exploits a fundamental property of product distributions, namely that the orthonormal basis it defines can be factorised over the individual bits. This fact allows us to give an explicit form of the n-qubit transform as a tensor product of single-qubit transforms. We begin by constructing the single-qubit transform. Later we will show how to construct efficiently an n-qubit transform out of single-qubit ones. In the following we assume that the function f is Boolean-valued. Our results can be extended to real-valued functions over the Boolean hypercube using a discretisation procedure. As shown in [Bshouty and Jackson, 1998], the error induced by the approximation can be controlled.

The action of the single-qubit μ-biased QFT can be constructed explicitly. We define H_μ as the single-qubit μ-biased QFT operator; its description in the computational basis is readily obtained by taking the functional forms of D_μ and the basis functions φ.

It is easy to verify that this matrix is unitary and Hermitian. We also note that, as a consequence of the Solovay-Kitaev theorem [Kitaev, 1997], it is possible to approximate H_μ from a fixed finite set of universal quantum gates with logarithmic overhead.
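The single-qubit operator can be written down explicitly under the definitions above (D(±1) = (1 ± μ)/2 and φ₁(x) = (x − μ)/√(1 − μ²); the matrix below is our reconstruction, with entries √D(x)·φ_s(x)). A quick check confirms unitarity and that μ = 0 recovers the ordinary Hadamard transform.

```python
import numpy as np

def biased_qft_1(mu):
    # Single-qubit mu-biased QFT: row s, column x entry is sqrt(D(x)) * phi_s(x),
    # with column 0 <-> x = +1 and column 1 <-> x = -1. The algebra collapses to:
    a = np.sqrt((1 + mu) / 2)
    b = np.sqrt((1 - mu) / 2)
    return np.array([[a, b],
                     [b, -a]])

H_mu = biased_qft_1(0.4)
unitary_check = H_mu @ H_mu.T     # should be the 2x2 identity
hadamard = biased_qft_1(0.0)      # mu = 0 reduces to the Hadamard transform
```

Because the matrix is real symmetric with orthonormal rows, it is both unitary and Hermitian, with eigenvalues ±1.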

We can construct the extension of the μ-biased QFT to the case of n qubits by taking the tensor product of single-qubit operators. Let x ∈ {-1, 1}ⁿ and a ∈ {0, 1}ⁿ. If we denote by x_i the i-th digit of x, by μ_i the bias associated to the i-th bit, and by |x_i⟩ its respective basis element, we can write:

By exploiting the product structure of D_μ and φ, that is, D_μ(x) = Π_i D_{μ_i}(x_i) and φ_a(x) = Π_i φ_{a_i}(x_i), we can write the n-qubit μ-biased QFT as:


We remark that it is possible to construct the n-qubit transform in this way only because the product distribution and the basis factorise. Without this factorisation we could still write Eq. 4, but we would not know how to implement the transformation efficiently on a quantum computer (the Solovay-Kitaev theorem guarantees only that single-qubit unitaries can be efficiently approximated by a universal set of gates).
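The tensor-product construction is a one-liner once the single-qubit operator is in hand. A sketch (assuming the single-qubit matrix reconstructed above):

```python
from functools import reduce

import numpy as np

def biased_qft_1(mu):
    # Single-qubit mu-biased QFT (mu = 0 gives the Hadamard transform).
    a = np.sqrt((1 + mu) / 2)
    b = np.sqrt((1 - mu) / 2)
    return np.array([[a, b], [b, -a]])

def biased_qft(mu_vec):
    # n-qubit transform as the tensor product of single-qubit transforms,
    # which is exactly what the factorisation of the product distribution allows.
    return reduce(np.kron, [biased_qft_1(m) for m in mu_vec])

H = biased_qft([0.3, -0.2, 0.5])
check = H @ H.T   # unitarity of the 8x8 transform
```

Since each factor is orthogonal, the Kronecker product is automatically orthogonal, so unitarity of the n-qubit transform comes for free.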

Finally, we note that the construction of the μ-biased transform assumes knowledge of the vector μ. It is possible to estimate each μ_i using random samples from D_μ. In Appendix A, we prove that the error introduced by this approximation can be controlled if D_μ is c-bounded.

As a simple application of the μ-biased QFT, we show how to sample from the probability distribution defined by the squared coefficients of the single-bit μ-biased Fourier transform (recall that Parseval's equality holds in the μ-biased setting).

Lemma 2 (μ-biased quantum Fourier sampling).

Let f be a Boolean-valued function. Then, there exists a quantum algorithm with quantum membership oracle access that returns the characteristic vector a with probability f̂_μ(a)². The algorithm requires exactly one oracle query.


Starting with the all-zero state, apply a Hadamard transform to obtain the uniform superposition. Then apply Lemma 1 to obtain the state Σ_x √(D_μ(x)) |x⟩. By querying the quantum membership oracle, one obtains Σ_x √(D_μ(x)) f(x) |x⟩ (up to the ancilla register used for phase kickback). Finally, applying the μ-biased QFT results in Σ_a f̂_μ(a) |a⟩.

Measuring the state, one obtains a with probability f̂_μ(a)². ∎
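The sampling procedure of Lemma 2 can be simulated classically on a small instance (our own sketch; f = x₁x₂ and the bias values are arbitrary): prepare the amplitudes √D(x)·f(x), apply the μ-biased QFT, and check that the outcome probabilities are exactly the squared μ-biased Fourier coefficients.

```python
import itertools
from functools import reduce

import numpy as np

n = 2
mu = np.array([0.3, -0.4])

def biased_qft_1(m):
    a, b = np.sqrt((1 + m) / 2), np.sqrt((1 - m) / 2)
    return np.array([[a, b], [b, -a]])

H = reduce(np.kron, [biased_qft_1(m) for m in mu])

# Basis states ordered (+1,+1), (+1,-1), (-1,+1), (-1,-1), matching the kron.
points = list(itertools.product([1, -1], repeat=n))
f_vals = np.array([x[0] * x[1] for x in points], dtype=float)   # f(x) = x1 * x2
D = np.array([np.prod([(1 + mu[i] * x[i]) / 2 for i in range(n)]) for x in points])

state = np.sqrt(D) * f_vals      # state after Lemma 1 and the phase oracle
out_probs = (H @ state) ** 2     # Pr[measuring a] = hat f_mu(a)^2
```

For f = x₁x₂ the μ-biased expansion is x₁x₂ = μ₁μ₂ + μ₁σ₂φ₂ + μ₂σ₁φ₁ + σ₁σ₂φ₁φ₂ with σ_i = √(1 − μ_i²), and the simulated outcome probabilities match these squared coefficients.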

In order to use this result in the context of quantum PAC learning we need to replace the membership oracle with a quantum example oracle. The following lemma, which extends a lemma of [Bshouty and Jackson, 1998] to the μ-biased case, serves this purpose. Unlike Lemma 2, we present directly the n-dimensional case.

Lemma 3.

Let f be a Boolean-valued function. Then, there exists a quantum algorithm with quantum example oracle access that returns the characteristic vector a with probability f̂_μ(a)². The algorithm requires exactly one oracle query.


Let T be the truth table representation of f. Given access to QEX(f, D_μ) it is always possible to construct an oracle for the function with relabelled outputs (this is equivalent to a relabelling of the qubits). Applying this oracle to a fresh register yields the example superposition Σ_x √(D_μ(x)) |x, f(x)⟩. Then apply the μ-biased QFT on the first register:

An application of the standard QFT on the second register gives:

where we used the orthonormality of the basis. Measuring the first register, we obtain a with probability f̂_μ(a)². ∎

5 Quantum computation of the μ-biased Fourier spectrum

In this section we give a quantum algorithm to approximate the μ-biased Fourier spectrum of a function. This can be interpreted as a quantum version of the EKM algorithm. As a simple application of the quantum EKM algorithm we obtain the learnability of DNFs under product distributions in the quantum PAC model.

We will make repeated use of the Dvoretzky-Kiefer-Wolfowitz (DKW) theorem, a concentration inequality that bounds the number of samples required to estimate a cumulative distribution in ℓ∞ norm. The DKW theorem was first proved by Dvoretzky et al. [1956] with an almost tight bound. Birnbaum and McCarty [1958] conjectured that the inequality was tight. This conjecture was proved by Massart [1990]. The DKW theorem is usually given for continuous probability distributions, but its validity extends also to discrete distributions (a detailed discussion can be found in [Kosorok, 2007]).

Let X₁, …, X_m be a sequence of i.i.d. random variables drawn from a distribution P with cumulative distribution function (CDF) F, and let x₁, …, x_m be their realizations. Given a set A, the indicator function 1_A(x) takes value 1 if x ∈ A and 0 if x ∉ A. We denote the empirical probability distribution associated to the sample as P̂_m and its empirical cumulative distribution as F̂_m. The DKW theorem states that:

Theorem 5 (Dvoretzky-Kiefer-Wolfowitz).

For any i.i.d. sample of size m with cumulative distribution F and empirical cumulative distribution F̂_m, and for every ε > 0, Pr[sup_t |F̂_m(t) − F(t)| > ε] ≤ 2e^{−2mε²}.

By using the DKW theorem we can prove a useful lemma that bounds the number of samples needed to estimate a probability distribution in ℓ∞ norm.

Lemma 4.

Let P be a probability distribution over a finite domain and let γ > 0, δ > 0. Then, there exists an algorithm that, with probability at least 1 − δ, outputs P̃ such that ‖P̃ − P‖∞ ≤ γ using O(log(1/δ)/γ²) samples.


Let x₍₁₎ < ⋯ < x₍₂ⁿ₎ be an ordering of the elements of the Boolean hypercube, so that each point probability is a difference of consecutive CDF values. We have that

An application of the triangle inequality gives

By Theorem 5 we have that, with probability at least 1 − δ, the empirical CDF deviates from the true CDF by at most γ/2 in ℓ∞ norm. Then

from which it is easy to see that ‖P̃ − P‖∞ ≤ γ. ∎
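An empirical illustration of Lemma 4 (a sketch with example numbers; the distribution, γ, and δ are arbitrary): draw the DKW-prescribed number of samples and check that the empirical point probabilities are within the ℓ∞ tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.25, 0.125, 0.125])   # true distribution over 4 outcomes
gamma, delta = 0.05, 0.01

# Massart's tight DKW bound: m >= ln(2/delta) / (2 * eps^2) samples suffice
# to pin the empirical CDF within eps; we use eps = gamma / 2 as in the proof.
m = int(np.ceil(np.log(2 / delta) / (2 * (gamma / 2) ** 2)))

samples = rng.choice(len(p), size=m, p=p)
p_hat = np.bincount(samples, minlength=len(p)) / m

linf_err = np.max(np.abs(p_hat - p))
```

Note the sample count depends on γ and δ but not on the support size, which is what makes the lemma useful over the exponentially large Fourier domain.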

The combined application of Lemma 3 and Lemma 4 allows us to prove the following result:

Theorem 6 (Quantum EKM algorithm).

Let f be a Boolean-valued function, let θ > 0, δ > 0, and let D_μ be a c-bounded product distribution. Then, there exists a quantum algorithm with QEX access that, with probability at least 1 − δ, returns a succinctly represented vector v such that ‖v − f̂_μ‖∞ ≤ θ. The algorithm requires a number of QEX queries and gates polynomial in n, 1/θ and log(1/δ).


We begin by estimating the squared Fourier coefficients corresponding to the heavy coefficients of f. Let P be the probability distribution defined by the squared μ-biased Fourier coefficients of f. Lemma 3 gives a procedure that, with one query, samples a with probability f̂_μ(a)². Applying Lemma 4 to the distribution P, we obtain the number of samples required to have a good ℓ∞ estimate with high probability. By selecting the characteristic vectors whose estimated probability clears the threshold (and discarding the empty characteristic vector), we can output a list of a's such that, with high probability, all the corresponding Fourier coefficients are heavy and no heavy coefficient is missed. By Parseval's equality the list can contain at most O(1/θ²) elements.

The final step requires the estimation of the Fourier coefficients themselves. For a given a, the Fourier coefficient f̂_μ(a) can be estimated by sampling, using the QEX oracle to simulate EX (to get a classical example it suffices to measure a state prepared with QEX); the number of examples required for the estimate is a standard application of the Chernoff bound.

Combining the bounds for estimating the squared coefficients with those for estimating the Fourier coefficients themselves, we obtain the stated number of QEX queries and gates (the estimation must be repeated for each element of the list). ∎
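The whole pipeline of Theorem 6 can be mimicked classically on a toy instance (entirely our own simulation: the oracles are replaced by exact state vectors and classical sampling, and the threshold is arbitrary): sample characteristic vectors with probability f̂_μ(a)², keep the empirically heavy ones, then estimate each kept coefficient from classical examples.

```python
import itertools
from functools import reduce

import numpy as np

rng = np.random.default_rng(1)

n = 2
mu = np.array([0.2, -0.3])

def biased_qft_1(m):
    a, b = np.sqrt((1 + m) / 2), np.sqrt((1 - m) / 2)
    return np.array([[a, b], [b, -a]])

H = reduce(np.kron, [biased_qft_1(m) for m in mu])
points = list(itertools.product([1, -1], repeat=n))
f_vals = np.array([x[0] for x in points], dtype=float)     # example f(x) = x_1
D = np.array([np.prod([(1 + mu[i] * x[i]) / 2 for i in range(n)]) for x in points])
coeffs = H @ (np.sqrt(D) * f_vals)                          # exact hat f_mu

# Step 1: quantum Fourier sampling, simulated by sampling from coeffs^2.
draws = rng.choice(len(coeffs), size=5000, p=coeffs ** 2)
freq = np.bincount(draws, minlength=len(coeffs)) / 5000

# Step 2: keep the indices whose empirical mass clears the threshold.
theta = 0.1
heavy = [s for s in range(len(coeffs)) if freq[s] >= theta ** 2 / 2]

# Step 3: estimate each kept coefficient from classical examples
# (measuring a QEX state yields an ordinary example (x, f(x))).
est = {}
for s in heavy:
    xs = rng.choice(len(points), size=5000, p=D)
    phi_s = H[s] / np.sqrt(D)        # phi_s(x) recovered from the transform rows
    est[s] = float(np.mean(f_vals[xs] * phi_s[xs]))
```

For f(x) = x₁ the heavy indices are the empty vector (coefficient μ₁) and the first singleton (coefficient √(1 − μ₁²)), and the empirical estimates land close to these values.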

Theorem 6 can be straightforwardly used in the method developed by Feldman [2012, Corollary 5.1] to obtain the learnability of DNF under product distributions.

Corollary 7.

Let f compute an s-term DNF. Let c > 0 be a constant, let D_μ be a c-bounded product distribution and let QEX(f, D_μ) be a quantum example oracle. Then, there exists a quantum algorithm with QEX(f, D_μ) access that efficiently PAC learns f over D_μ.

We recall that the collection of the heavy Fourier coefficients of the DNF is the only step of Feldman's algorithm that requires MQ. The remainder of the algorithm uses the coefficients to construct a function that approximates the target.

6 Construction of quantum example oracles

A large class of quantum algorithms for learning problems involving classical data or functions require quantum oracles that can efficiently load the training set in superposition. Here, we define efficient as logarithmic in the size of the training set or in the size of the support of the function. The QEX oracle is one such oracle and can be implemented using the quantum random access memory (QRAM) [Giovannetti et al., 2008]. The QRAM is a quantum procedure that allows one to encode N data points in superposition into O(log N) qubits in time O(log N). More specifically, let d_i be the content of the i-th cell of a memory structure with N elements. The action of the QRAM on a basis state is |i⟩|0⟩ ↦ |i⟩|d_i⟩.

Let f be a Boolean-valued function. The action of a QEX oracle for a target concept f and probability distribution D is |0⟩ ↦ Σ_x √(D(x)) |x, f(x)⟩.

In order to use the QRAM to load classical data we need a classical memory structure that stores the truth table of f. Provided this classical memory, we can construct the QEX oracle in two steps. Starting from the all-zero state, use Lemma 1 on the first register to obtain Σ_x √(D(x)) |x⟩|0⟩. Finally, an application of the QRAM returns the state Σ_x √(D(x)) |x, f(x)⟩. Note that because the QRAM can load data in logarithmic time, it is possible to build a superposition that encodes a Boolean function supported on 2ⁿ points in time polynomial in n. In this sense, the total runtime of the algorithm for learning DNFs remains polynomial.
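A state-vector sketch of this two-step construction (our own toy simulation; the bit-ordering and the XOR truth table are arbitrary choices): first lay down the amplitudes √D(x), then fill the label register from the stored truth table, producing the QEX state Σ_x √(D(x)) |x, f(x)⟩.

```python
import numpy as np

n = 2
p0 = np.array([0.6, 0.3])              # Pr[bit i = 0] for the product distribution
truth_table = np.array([0, 1, 1, 0])   # stored truth table of the concept (XOR)

# D(x) for x = 0..3, reading bit i of x from the most significant position.
probs = np.array([
    np.prod([p0[i] if ((x >> (n - 1 - i)) & 1) == 0 else 1 - p0[i]
             for i in range(n)])
    for x in range(2 ** n)
])

# QEX state on n + 1 qubits: amplitude sqrt(D(x)) at basis index (x, f(x)).
qex_state = np.zeros(2 ** (n + 1))
for x in range(2 ** n):
    qex_state[2 * x + truth_table[x]] = np.sqrt(probs[x])
```

Measuring this state yields a classical labelled example (x, f(x)) with probability D(x), which is exactly how the QEX oracle simulates EX in the proofs above.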

A possible drawback of a QRAM implementation is that it is not clear whether a QRAM can actually be built. A discussion of the challenges related to the construction of a QRAM can be found in the review article by Ciliberto et al. [2018].

The QRAM can be substituted with a standard quantum membership oracle in the procedure above. Although the use of the membership oracle would be limited to the construction of the QEX oracle (which Bshouty and Jackson proved cannot efficiently simulate a classical membership oracle), it is still unclear whether a QEX oracle can be built without using a membership oracle.


We thank Carlo Ciliberto for illuminating comments on the DKW inequality, Leonard Wossnig for helpful conversations on the implementation of the QEX oracle, and Matthias Caro for comments on an earlier draft. AR is supported by an EPSRC DTP Scholarship and by QinetiQ Ltd. SS is supported by the Royal Society, EPSRC, the National Natural Science Foundation of China, and the grant ARO-MURI W911NF-17-1-0304 (US DOD, UK MOD and UK EPSRC under the Multidisciplinary University Research Initiative).


  • Adcock and Cleve [2002] Mark Adcock and Richard Cleve. A quantum Goldreich-Levin theorem with cryptographic applications. In Annual Symposium on Theoretical Aspects of Computer Science, pages 323–334. Springer, 2002.
  • Arunachalam and de Wolf [2017a] Srinivasan Arunachalam and Ronald de Wolf. Optimal quantum sample complexity of learning algorithms. In 32nd Computational Complexity Conference, CCC 2017, July 6-9, 2017, Riga, Latvia, 2017a.
  • Arunachalam and de Wolf [2017b] Srinivasan Arunachalam and Ronald de Wolf. Guest column: A survey of quantum learning theory. SIGACT News, 48(2):41–67, 2017b.
  • Atıcı and Servedio [2007] Alp Atıcı and Rocco A Servedio. Quantum algorithms for learning and testing juntas. Quantum Information Processing, 6(5):323–348, 2007.
  • Awasthi et al. [2013] Pranjal Awasthi, Vitaly Feldman, and Varun Kanade. Learning using local membership queries. In Conference on Learning Theory, pages 398–431, 2013.
  • Bahadur [1961] Raghu Raj Bahadur. A representation of the joint distribution of responses to n dichotomous items. Studies in item analysis and prediction, 6:158–168, 1961.
  • Bellare [1991] Mihir Bellare. The spectral norm of finite functions. Technical report, Cambridge, MA, USA, 1991.
  • Bernstein and Vazirani [1997] Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM Journal on Computing, 26(5):1411–1473, 1997.
  • Birnbaum and McCarty [1958] ZW Birnbaum and RC McCarty. A distribution-free upper confidence bound for $\Pr\{Y < X\}$, based on independent samples of $X$ and $Y$. The Annals of Mathematical Statistics, pages 558–562, 1958.
  • Blum et al. [1994] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 253–262. ACM, 1994.
  • Blum et al. [2003] Avrim Blum, Adam Kalai, and Hal Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM (JACM), 50(4):506–519, 2003.
  • Bshouty and Jackson [1998] Nader H Bshouty and Jeffrey C Jackson. Learning DNF over the uniform distribution using a quantum example oracle. SIAM Journal on Computing, 28(3):1136–1153, 1998.
  • Bshouty et al. [2005] Nader H Bshouty, Elchanan Mossel, Ryan O’Donnell, and Rocco A Servedio. Learning DNF from random walks. Journal of Computer and System Sciences, 71(3):250–265, 2005.
  • Ciliberto et al. [2018] Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil, Andrea Rocchetto, Simone Severini, and Leonard Wossnig. Quantum machine learning: a classical perspective. In Proc. R. Soc. A, volume 474, page 20170551. The Royal Society, 2018.
  • Cross et al. [2015] Andrew W Cross, Graeme Smith, and John A Smolin. Quantum learning robust against noise. Physical Review A, 92(1):012327, 2015.
  • Dvoretzky et al. [1956] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669, 1956.
  • Feldman [2012] Vitaly Feldman. Learning DNF expressions from Fourier spectrum. In COLT, volume 8, pages 8–4, 2012.
  • Freund [1995] Yoav Freund. Boosting a weak learning algorithm by majority. Information and computation, 121(2):256–285, 1995.
  • Furst et al. [1991] Merrick L Furst, Jeffrey C Jackson, and Sean W Smith. Improved learning of $AC^0$ functions. In COLT, volume 91, pages 317–325, 1991.
  • Giovannetti et al. [2008] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Physical review letters, 100(16):160501, 2008.
  • Goldreich and Levin [1989] Oded Goldreich and Leonid A Levin. A hard-core predicate for all one-way functions. In Proceedings of the twenty-first annual ACM symposium on Theory of computing, pages 25–32. ACM, 1989.
  • Grilo and Kerenidis [2017] Alex B Grilo and Iordanis Kerenidis. Learning with errors is easy with quantum samples. arXiv preprint arXiv:1702.08255, 2017.
  • Grover and Rudolph [2002] Lov Grover and Terry Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions. arXiv preprint quant-ph/0208112, 2002.
  • Jackson [1997] Jeffrey C Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):414–440, 1997.
  • Jackson et al. [2002] Jeffrey C Jackson, Christino Tamon, and Tomoyuki Yamakami. Quantum DNF learnability revisited. Lecture notes in computer science, pages 595–604, 2002.
  • Kalai et al. [2009] Adam Tauman Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and smoothed analysis. In Foundations of Computer Science, 2009. FOCS’09. 50th Annual IEEE Symposium on, pages 395–404. IEEE, 2009.
  • Kitaev [1997] A Yu Kitaev. Quantum computations: algorithms and error correction. Russian Mathematical Surveys, 52(6):1191–1249, 1997.
  • Klivans and Servedio [2001] Adam R Klivans and Rocco Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 258–265. ACM, 2001.
  • Kosorok [2007] Michael R Kosorok. Introduction to empirical processes and semiparametric inference. Springer Science & Business Media, 2007.
  • Kushilevitz and Mansour [1993] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993.
  • Lyubashevsky [2005] Vadim Lyubashevsky. The parity problem in the presence of noise, decoding random linear codes, and the subset sum problem. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 378–389. Springer, 2005.
  • Massart et al. [1990] Pascal Massart et al. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3):1269–1283, 1990.
  • Montanaro and Osborne [2010] Ashley Montanaro and Tobias J. Osborne. Quantum boolean functions. Chicago Journal of Theoretical Computer Science, 2010(1), January 2010.
  • O’Donnell [2014] Ryan O’Donnell. Analysis of boolean functions. Cambridge University Press, 2014.
  • Valiant [1984] Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
  • Verbeurgt [1990] Karsten A Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time. In COLT, pages 314–326, 1990.

Appendix A Error analysis

In the main section we assumed that the vector $\mu$ parametrising the product distribution was given to the learner. Here we prove that, if $D_\mu$ is $c$-bounded, it is possible to estimate $\mu$ introducing an error that can be made small at a cost that scales polynomially in $n$. We recall that $\mu_i = \Pr_{x \sim D_\mu}[x_i = 1]$, that $D_\mu(x) = \prod_{i=1}^n \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$, and that $c$-boundedness means $c \le \mu_i \le 1 - c$ for all $i$. A simple application of the Chernoff bound gives that, with probability $1 - \delta$, $O(\log(n/\delta)/\lambda^2)$ samples give an estimate $\tilde{\mu}$ such that $|\mu_i - \tilde{\mu}_i| \le \lambda$ for all $i$.
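The estimation step can be illustrated numerically. In this sketch of ours the dimension, sample size, and seed are illustrative choices; the empirical mean of each coordinate serves as the Chernoff-bounded estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
mu = rng.uniform(0.2, 0.8, size=n)   # a c-bounded parameter vector, here c = 0.2

# draw m samples from the product distribution and take empirical means
m = 20000
samples = (rng.random((m, n)) < mu).astype(float)
mu_hat = samples.mean(axis=0)

# per-coordinate error concentrates as O(sqrt(log(n / delta) / m))
err = np.max(np.abs(mu_hat - mu))
```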

We want to estimate the error introduced by approximating $\mu$ with $\tilde{\mu}$