# On the Sample Complexity of PAC Learning Quantum Process

We generalize the PAC (probably approximately correct) learning model to the quantum world by generalizing the concepts from classical functions to quantum processes, defining the problem of PAC learning quantum process, and study its sample complexity. In the problem of PAC learning quantum process, we want to learn an ϵ-approximate of an unknown quantum process c^* from a known finite concept class C with probability 1-δ, using samples {(x_1,c^*(x_1)),(x_2,c^*(x_2)),...}, where {x_1,x_2,...} are computational basis states sampled from an unknown distribution D and {c^*(x_1),c^*(x_2),...} are the (possibly mixed) quantum states output by c^*. The special case of PAC learning quantum process under constant input reduces to a natural problem which we name approximate state discrimination, where we are given copies of an unknown quantum state c^* from a known finite set C, and we want to learn with probability 1-δ an ϵ-approximate of c^* with as few copies of c^* as possible. We show that the problem of PAC learning quantum process can be solved with O((log|C| + log(1/δ))/ϵ^2) samples when the outputs are pure states and O(log^3|C|(log|C| + log(1/δ))/ϵ^2) samples if the outputs can be mixed. Some implications of our results are that we can PAC-learn a polynomial-sized quantum circuit in a polynomial number of samples, and that approximate state discrimination can be solved in a polynomial number of samples even when the concept class size |C| is exponential in the number of qubits, an exponential improvement over full state tomography.


## 1 Introduction

Machine learning has gained huge popularity ever since Google's AI beat the world Go champion. In this paper, we generalize the PAC (probably approximately correct) learning model [Val84], a well studied learning model in classical computer science, to the quantum world. We do so by generalizing the concepts in a PAC learning model from classical functions to quantum processes, defining the problem of PAC learning quantum process. The problem of PAC learning quantum process is detailed as follows: Let the concept class C be a finite set of known quantum processes from d-dimensional inputs to d′-dimensional outputs. We are trying to learn an unknown quantum process c^* ∈ C, the target concept. In order to do this, we are given samples {(x_i, c^*(x_i))}_{i∈[T]}, where the x_i are inputs to the quantum process and the c^*(x_i) are the corresponding quantum states output by c^*. The inputs are drawn from an unknown distribution D. Because of the no-cloning theorem, it is hard to justify holding both the inputs and outputs as unknown quantum states, so we restrict the inputs to computational basis states and keep the output states as unknown quantum states, meaning that we hold a copy of each quantum state rather than its full classical description. A proper (ϵ, δ)-PAC learner for a concept class C of quantum processes (proper means that the hypothesis h must be inside the concept class C, unlike the improper case where h can be any density matrix; all learners in this paper are proper, and we may omit the term "proper") is a quantum algorithm that takes the description of C and samples {(x_i, c^*(x_i))} as input and outputs a hypothesis h ∈ C that is ϵ-close to the target concept c^* with probability at least 1−δ, for any target concept c^* ∈ C and input distribution D, where the distance between two concepts depends on the input distribution and is defined as Δ(c_1, c_2) = E_{x∼D}[Δtr(c_1(x), c_2(x))], the expected trace distance between the outputs averaged over D. We show that the problem of PAC learning quantum process can be solved with

 O((log|C| + log(1/δ)) / ϵ²)

samples when the outputs are pure states and

 O(log³|C| · (log|C| + log(1/δ)) / ϵ²)

samples if the outputs can be mixed.

Other than a generalization of the classical PAC learning model, PAC learning quantum process can be viewed as an efficient way to do quantum process tomography when we know that the target quantum process comes from a finite set. For example, if we try to PAC-learn a polynomial-sized quantum circuit on n qubits, since there are only exp(poly(n)) possible polynomial-sized circuits, our result shows that we can learn it in poly(n)/ϵ² samples, an exponential improvement over full process tomography.

Since our samples consist of unknown quantum states, a challenging part of the problem is how to extract information from those quantum states to distinguish the concepts. In fact, this is most of the challenge of the problem, and we can isolate the challenge by focusing on the special case of constant input. In this special case, the problem of PAC learning quantum process becomes an interesting hybrid of quantum state discrimination and quantum state tomography, and we call it the approximate state discrimination problem. The approximate state discrimination problem is detailed as follows: Let S be a finite set of d-dimensional density matrices. We want to learn a target state c^* ∈ S using as few identical copies of c^* as possible. A quantum algorithm is an (ϵ, δ)-approximate discriminator of S if it takes the description of S and copies of c^* as input and, with probability at least 1−δ, outputs a state h ∈ S with Δtr(h, c^*) ≤ ϵ, for any c^* ∈ S.

Since it is a special case of PAC learning quantum process, the approximate state discrimination problem can also be solved with

 O((log|S| + log(1/δ)) / ϵ²)

samples if S consists of pure states and

 O(log³|S| · (log|S| + log(1/δ)) / ϵ²)

samples if the states in S can be mixed.

To the knowledge of the authors, the approximate state discrimination problem has not been studied in the literature. It is illuminating to compare the approximate state discrimination problem to other well studied problems in the literature that try to learn or distinguish quantum states when given multiple copies of an unknown d-dimensional state c^*. In the following paragraphs we compare approximate state discrimination to quantum state tomography, quantum state discrimination, and quantum property testing.

In the problem of quantum state tomography, we want to get an ϵ-approximation of the unknown state c^*. Compared to quantum state tomography, approximate state discrimination has the same goal of finding an ϵ-close output, but we are given a promise that the unknown state comes from a known finite set S. As a result, the sample complexity of our algorithms depends only on log|S|, 1/ϵ, and log(1/δ), and is independent of d, the dimension of the target state. This means that we have a speedup if the concept class is not too large. For example, if the size of S is exponential in the number of qubits n, the approximate state discrimination problem can be solved in poly(n)/ϵ² samples, an exponential improvement over full state tomography, which uses a number of samples polynomial in the dimension d = 2^n.

In the quantum state discrimination problem [AM14, Mon08, Mon06, BK17, TADR18], which is also called the quantum detection problem [Mon06] or quantum hypothesis testing [TADR18], we are promised that the state comes from a known finite set S and we want to find out what c^* is exactly. Compared to the state discrimination problem, approximate state discrimination has the same promise of a finite input set, but allows an approximate output instead of requiring the exact answer. Therefore, the promise on the minimum distance between the input states can be removed; however, the simple state discrimination algorithm of taking several copies to amplify the minimum distance and then applying a PGM (pretty good measurement) does not work for approximate state discrimination, since the error probability of the PGM is not bounded when some of the states are close to each other.

In the quantum property testing problem [MdW16, HLM17], we are given copies of an unknown quantum state c^* and want to determine whether c^* ∈ S, where S is a known (possibly infinite) set of quantum states, or c^* is ϵ-far from everything in S. Harrow, Lin, and Montanaro [HLM17] give an upper bound for the quantum property testing problem in the special case where S is a finite set of pure states. Compared to [HLM17], our pure state algorithm has essentially the same sample complexity. Note that in quantum property testing, the unknown state does not always come from S, and we only want a decision answer instead of finding a state, so it is quite different from approximate state discrimination. Also note that the quantum property testing result of [HLM17] cannot be generalized to arbitrary mixed states, as [BOW17] shows that certifying a mixed state requires a number of samples linear in the dimension d. In the quantum state certification problem, we are given copies of an unknown quantum state c^* and ask whether c^* is equal to some known state or far from it, so it is obviously a special case of quantum property testing with |S| = 1, and the lower bound linear in d is much larger than the sample complexity of [HLM17] unless |S| is exponential or more in d.

There are also several works in the literature that study the sample complexity of PAC learning with different generalizations to quantum computation. Aaronson [Aar07] studies the problem of PAC learning an arbitrary unknown quantum state, where the inputs are binary-outcome measurements with full classical description and the outputs are the measurement outcomes. He shows that the sample complexity is linear in the number of qubits of the concepts. Cheng, Hsieh, and Yeh [CHY15] study the sample complexity of PAC learning arbitrary two-outcome measurements, where the inputs are quantum states of which the learner has a complete classical description. They show an upper bound on the sample complexity linear in the dimension of the Hilbert space. Note that one can trivially get a lower bound of similar order by noticing that Boolean functions are a subset of two-outcome measurements. Arunachalam and de Wolf [AdW17b] study the sample complexity of PAC learning classical functions with quantum samples and show that there is no quantum speedup. See [AdW17a] for a survey of quantum learning theory.

## 2 Preliminary

Throughout this paper, log is base 2.

We use ‖A‖tr to denote the trace norm ‖A‖tr = tr√(A†A). We use ‖A‖F or ‖A‖2 to denote the Frobenius norm ‖A‖F = √(tr(A†A)).

Denote the trace distance and fidelity between two distributions D1, D2 as Δtr(D1, D2) = (1/2)∑i |D1(i) − D2(i)| and F(D1, D2) = ∑i √(D1(i)·D2(i)). Denote the trace distance and fidelity between two quantum states ρ1, ρ2 as Δtr(ρ1, ρ2) = (1/2)‖ρ1 − ρ2‖tr and F(ρ1, ρ2) = tr√(√ρ1 ρ2 √ρ1). For a quantum state ρ and a quantum measurement M, denote M(ρ) as the output probability distribution when applying M on ρ.

Note that fidelity and trace distance are related by

 1 − F ≤ Δtr ≤ √(1 − F²).
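As a quick numerical sanity check of this relation, the following sketch (our own illustration, not from the paper; the helper names `rand_density_matrix`, `trace_distance`, and `fidelity` are hypothetical) verifies 1 − F ≤ Δtr ≤ √(1 − F²) on random mixed states:

```python
import numpy as np

def rand_density_matrix(d, rng):
    # Sample a random d-dimensional mixed state (Wishart-style ensemble).
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_distance(rho, sigma):
    # Delta_tr = (1/2) * sum of absolute eigenvalues of (rho - sigma).
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def fidelity(rho, sigma):
    # F = tr sqrt( sqrt(rho) sigma sqrt(rho) ), via eigendecomposition.
    w, v = np.linalg.eigh(rho)
    sr = (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T
    inner = sr @ sigma @ sr
    return np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0, None)))

rng = np.random.default_rng(0)
for _ in range(100):
    rho, sigma = rand_density_matrix(4, rng), rand_density_matrix(4, rng)
    F, D = fidelity(rho, sigma), trace_distance(rho, sigma)
    assert 1 - F <= D + 1e-9 and D <= np.sqrt(1 - F**2) + 1e-9
```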

For two quantum process concepts c1, c2, define the distance between them as

 Δ(c1, c2) = E_{x∼D}[Δtr(c1(x), c2(x))].

We say that c1, c2 are ϵ-close if Δ(c1, c2) ≤ ϵ and ϵ-far if Δ(c1, c2) > ϵ. For two sets of concepts C1 and C2, define the distance between them as Δ(C1, C2) = min_{c1∈C1, c2∈C2} Δ(c1, c2).

### 2.1 Pretty Good Measurement

The pretty good measurement (PGM) is defined as follows:

###### Definition 1 (pretty good measurement).

Let {σi} be a set of density matrices and {pi} the set of corresponding probabilities. Define

 Ai = pi σi,  A = ∑i Ai. (2.1)

The PGM associated with {(pi, σi)} is a POVM {Ei} with

 Ei = A^(−1/2) Ai A^(−1/2). (2.2)
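Definition 1 can be sketched in a few lines of numpy. This is an illustrative construction of ours (the ensemble and variable names are hypothetical), checking that the Ei form a valid POVM:

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_density_matrix(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

d, m = 4, 3
sigmas = [rand_density_matrix(d) for _ in range(m)]
probs = np.full(m, 1.0 / m)          # uniform prior over the ensemble

# A_i = p_i sigma_i,  A = sum_i A_i   (Eq. 2.1)
As = [p * s for p, s in zip(probs, sigmas)]
A = sum(As)

# A^{-1/2} via eigendecomposition (A is positive definite here).
w, v = np.linalg.eigh(A)
A_inv_sqrt = (v / np.sqrt(w)) @ v.conj().T

# E_i = A^{-1/2} A_i A^{-1/2}   (Eq. 2.2)
Es = [A_inv_sqrt @ Ai @ A_inv_sqrt for Ai in As]

# POVM completeness: the E_i sum exactly to the identity.
assert np.allclose(sum(Es), np.eye(d))

# Probability the PGM outputs label i on input sigma_i: tr(E_i sigma_i).
success = sum(p * np.trace(E @ s).real for p, E, s in zip(probs, Es, sigmas))
```

If A is rank-deficient, A^(−1/2) is taken on the support of A; the random ensemble above is full rank almost surely, so the plain inverse square root suffices.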

## 3 Definition of PAC Learning Quantum Process and Approximate State Discrimination

In this section we describe the model of PAC learning quantum process and approximate state discrimination.

### 3.1 PAC Learning Quantum Process

Let the concept class C be a finite set of known quantum processes from d-dimensional inputs to d′-dimensional outputs. A learner trying to learn the target concept c^* ∈ C is given samples {(xi, c^*(xi))}, where the xi are inputs to the quantum process and the c^*(xi) are the corresponding quantum density matrices output by c^*. The inputs are computational basis states drawn from an unknown distribution D. The output states are unknown quantum states, meaning that we hold a copy of each quantum state rather than its full classical description.

A proper (ϵ, δ)-learner for the concept class C is a quantum algorithm that takes the description of C and samples {(xi, c^*(xi))} as input and outputs a hypothesis h ∈ C such that Δ(h, c^*) ≤ ϵ with probability at least 1−δ, for any target concept c^* ∈ C and input distribution D. The sample complexity of a learner is the maximum number of samples it takes over c^* and D. The sample complexity of a concept class is the minimum sample complexity over all learners.

### 3.2 Approximate State Discrimination

Let S be a finite set of d-dimensional density matrices. We want to learn a target state c^* ∈ S using as few identical copies of c^* as possible. A quantum algorithm is an (ϵ, δ)-approximate discriminator of S if it takes the description of S and copies of c^* as input and, with probability at least 1−δ, outputs a state h ∈ S with Δtr(h, c^*) ≤ ϵ, for any c^* ∈ S.

Note that approximate state discrimination can be viewed as a special case of PAC learning quantum process with constant input, so the algorithms for PAC learning quantum process in Section 4 and Section 5 trivially work for approximate state discrimination.

## 4 PAC Learning Quantum Process with Pure State Output

The algorithm follows ideas of Sen [Sen05], who shows that a random orthonormal measurement preserves the trace distance between pure states. One can then apply random orthonormal measurements on each sampled output and take enough samples to amplify the distance between the measurement outcomes of ϵ-far concepts to 1 − 2^(−Ω(Tϵ²)), and show that the probability for the maximum likelihood estimate to select an ϵ-far concept over the target concept is at most δ/|C|. Taking a union bound, we have a bounded error probability.

###### Theorem 2.

Algorithm 3 is a proper (ϵ, δ)-PAC learner for any concept class C of quantum processes with pure state outputs, using

 T = O((log|C| + log(1/δ)) / ϵ²)

samples.

###### Algorithm 3 (algorithm for pure state output).

1. Take T samples {(xi, σi)}_{i∈[T]}, where σi is a copy of c^*(xi).

2. Do a random orthonormal measurement Mi on each output state σi.

3. Output the concept that is most likely to produce the measurement results of step 2:

 h = argmax_{c∈C} ∏_{i∈[T]} Pr[Mi(c(xi)) = Mi(σi)] (4.1)
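The steps of Algorithm 3 can be sketched as a small classical simulation. This is our own illustration under simplifying assumptions (a toy concept class of Haar-random pure outputs, a uniform input distribution), not the paper's algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def haar_basis(d):
    # A Haar-random orthonormal measurement basis via QR of a Gaussian matrix.
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(g)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # phase fix for true Haar

def rand_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

# Hypothetical concept class: each concept maps a basis input x to a pure state.
n_inputs, n_concepts = 3, 5
concepts = [[rand_state(d) for _ in range(n_inputs)] for _ in range(n_concepts)]
target = 2
T = 200

log_likelihood = np.zeros(n_concepts)
for _ in range(T):
    x = rng.integers(n_inputs)        # input sampled from D (uniform here)
    M = haar_basis(d)                 # fresh random orthonormal measurement
    p_true = np.abs(M.conj().T @ concepts[target][x]) ** 2
    outcome = rng.choice(d, p=p_true / p_true.sum())
    for c in range(n_concepts):       # accumulate log of Eq. (4.1) product
        p_c = np.abs(M.conj().T[outcome] @ concepts[c][x]) ** 2
        log_likelihood[c] += np.log(max(p_c, 1e-300))

h = int(np.argmax(log_likelihood))    # maximum-likelihood hypothesis
```

With Haar-random concepts the pairwise distances are of constant order, so T = 200 samples make the likelihood gap large and the maximum-likelihood step recovers the target with high probability.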

We need the following lemmas to prove the correctness of Algorithm 3. First we state Result 1 of [Sen05] (Lemma 4 of the arXiv version):

###### Lemma 4 (random orthonormal measurement).

Let ρ1, ρ2 be two density matrices in C^d. Define ΔF = ‖ρ1 − ρ2‖F. There exists a universal constant k such that, for large enough d, with probability at least 1 − 2^(−Ω(d)) over the choice of a random orthonormal measurement basis M in C^d, ‖M(ρ1) − M(ρ2)‖1 ≥ k·ΔF, i.e., Δtr(M(ρ1), M(ρ2)) ≥ (k/2)·ΔF.

Note that if ρ1, ρ2 are pure states, ΔF = √2 · Δtr(ρ1, ρ2), so that for large enough d, Δtr(M(ρ1), M(ρ2)) ≥ (k/√2) · Δtr(ρ1, ρ2).

The following lemma shows how trace distance of the measured result grows when we take multiple samples.

###### Lemma 5 (trace distance amplification).

Let X1, …, XT be independent distributions, and so are Y1, …, YT. Denote the joint distribution (X1, …, XT) as X and (Y1, …, YT) as Y. Suppose that

 ∑i Δtr(Xi, Yi) = Tϵ, (4.2)

then

 Δtr(X, Y) ≥ 1 − 2^(−Ω(Tϵ²)). (4.3)
###### Proof.

By the Cauchy–Schwarz inequality,

 ∑i (Δtr(Xi, Yi))² ≥ (∑i Δtr(Xi, Yi))² / T = Tϵ². (4.4)

Then the joint fidelity is bounded by

 F(X, Y) = ∏i F(Xi, Yi) ≤ ∏i √(1 − (Δtr(Xi, Yi))²) ≤ exp[−(1/2) ∑i (Δtr(Xi, Yi))²] = 2^(−Ω(Tϵ²)), (4.5)

where the last inequality holds because √(1 − x²) ≤ exp(−x²/2). And the joint trace distance is

 Δtr(X, Y) ≥ 1 − F(X, Y) = 1 − 2^(−Ω(Tϵ²)). (4.6)

The following lemma analyzes the effectiveness of the maximum likelihood estimate.

###### Lemma 6.

For any two distributions D and D* with trace distance Δtr(D, D*) = t, we have Pr_{i∼D*}[D(i) ≤ D*(i)] ≥ t.

###### Proof.
 0 ≤ ∑_{i: D(i)≤D*(i)} D(i)
  = ∑_{i: D(i)≤D*(i)} (D(i) − D*(i)) + ∑_{i: D(i)≤D*(i)} D*(i)
  = −t + Pr_{i∼D*}[D(i) ≤ D*(i)] (4.7)
 ⇒ Pr_{i∼D*}[D(i) ≤ D*(i)] ≥ t
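Lemma 6 is easy to check numerically. The sketch below (our own, with a hypothetical helper name) tests the inequality on random distribution pairs:

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_dist(n):
    p = rng.random(n)
    return p / p.sum()

for _ in range(1000):
    D, Dstar = rand_dist(8), rand_dist(8)
    t = 0.5 * np.abs(D - Dstar).sum()      # total variation (trace) distance
    # Probability mass D* puts on outcomes where D(i) <= D*(i).
    p = Dstar[D <= Dstar].sum()
    assert p >= t - 1e-12                  # Lemma 6: p >= t
```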

Now we have everything to prove Theorem 2.

###### Proof.

Let c^* be the target concept. We bound, for each concept c ∈ C that is ϵ-far from c^*, the probability that the maximum likelihood estimate of Algorithm 3 prefers c over c^*; a union bound then bounds the probability that the hypothesis h is ϵ-far. Let ϵ′ = kϵ/(2√2), where k is the universal constant from Lemma 4. Recall that we took

 T = Θ((log|C| + log(1/δ)) / ϵ²)

samples. For all i ∈ [T], applying Lemma 4 to the pair of states (c^*(xi), c(xi)) gives that, in expectation over the random orthonormal measurement Mi,

 E_{Mi}[Δtr(Mi(c^*(xi)), Mi(c(xi)))] > (k/√2) · Δtr(c^*(xi), c(xi)). (4.8)

Since one can pad some ancilla states to increase the dimension d without changing trace distances if d is too small, we ignore the failure probability of Lemma 4. By the Chernoff bound, with probability at least 1 − 2^(−Ω(Tϵ²)) over the xi sampled from D,

 ∑i Δtr(Mi(c^*(xi)), Mi(c(xi))) > (k/(2√2)) · Tϵ = Tϵ′. (4.9)

So we can apply Lemma 5 to get that, with probability at least 1 − 2^(−Ω(Tϵ²)),

 Δtr({Mi(c^*(xi))}_i, {Mi(c(xi))}_i) ≥ 1 − 2^(−Ω(Tϵ′²)). (4.10)

By Lemma 6, the probability that the maximum likelihood estimate in step 3 prefers c over c^* is then at most 2^(−Ω(Tϵ′²)). Finally we apply a union bound over the at most |C| concepts that are ϵ-far from c^* to get

 Pr[Δ(c^*, h) ≤ ϵ] ≥ 1 − (2^(−Ω(Tϵ′²)) + 2^(−Ω(Tϵ²))) · |C| ≥ 1 − δ. (4.11)

## 5 PAC Learning Quantum Process with Mixed State Output

The random orthonormal measurement approach in Section 4 does not work here, since two high-dimensional mixed states with constant trace distance between them can have negligible Frobenius distance between them. Instead, we show that if we apply a PGM (technically, the measurement we apply is not a PGM but a minimax measurement strategy whose worst case error probability is upper bounded by the error probability of PGMs) over a carefully chosen subset of C, we can rule out the possibility of the target concept being inside some subset whose size is a constant fraction of |C|. We then repeat this procedure O(log|C|) times to pinpoint c^*.

Before we show the procedures about PGM, let us first show that we can efficiently amplify the distance between concepts by taking more samples.

###### Lemma 7 (concept distance amplification).

Let c be a quantum process concept ϵ-far from the target concept c^*. Let x1, …, xT be inputs drawn from the distribution D. With probability 1 − 2^(−Ω(Tϵ²)) over {xi}, we have

 F(⊗_{i∈[T]} c(xi), ⊗_{i∈[T]} c^*(xi)) ≤ 2^(−Ω(Tϵ²)) (5.1)

and

 Δtr(⊗_{i∈[T]} c(xi), ⊗_{i∈[T]} c^*(xi)) ≥ 1 − 2^(−Ω(Tϵ²)). (5.2)
###### Proof.

By the Chernoff bound, with probability 1 − 2^(−Ω(Tϵ²)),

 ∑i Δtr(c(xi), c^*(xi)) ≥ (1/2)Tϵ. (5.3)

Then by the Cauchy–Schwarz inequality,

 ∑i (Δtr(c(xi), c^*(xi)))² ≥ (1/4)Tϵ². (5.4)

Then the amplified fidelity is bounded by

 F(⊗i c(xi), ⊗i c^*(xi)) = ∏i F(c(xi), c^*(xi)) ≤ ∏i √(1 − (Δtr(c(xi), c^*(xi)))²) ≤ exp[−(1/2) ∑i (Δtr(c(xi), c^*(xi)))²] = 2^(−Ω(Tϵ²)), (5.5)

where the last inequality holds because √(1 − x²) ≤ exp(−x²/2). And the amplified trace distance is

 Δtr(⊗_{i∈[T]} c(xi), ⊗_{i∈[T]} c^*(xi)) ≥ 1 − F(⊗i c(xi), ⊗i c^*(xi)) = 1 − 2^(−Ω(Tϵ²)). (5.6)
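A minimal numerical illustration of this amplification (our own sketch, using a pair of pure qubit states at trace distance ϵ): the joint fidelity over T samples is the product of per-sample fidelities and decays exponentially in T, matching the 2^(−Ω(Tϵ²)) bound.

```python
import numpy as np

# Two fixed pure qubit "outputs" with trace distance eps.
eps = 0.3
theta = np.arcsin(eps)                 # for pure states, Delta_tr = sin(theta)
psi = np.array([1.0, 0.0])
phi = np.array([np.cos(theta), np.sin(theta)])

F_single = abs(psi @ phi)              # per-sample fidelity = cos(theta)
for T in [10, 50, 100]:
    F_joint = F_single ** T            # fidelity multiplies over tensor products
    bound = np.sqrt(1 - eps**2) ** T   # per-sample F <= sqrt(1 - Delta_tr^2)
    assert F_joint <= bound + 1e-12    # exponential decay in T, as in (5.5)
```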

Lemma 7 means that we can amplify the distance between tensor products of samples from quantum processes as efficiently as we can on copies of fixed quantum states. This means that PAC learning quantum process closely parallels approximate state discrimination even in the mixed state case.

Now back to the topic of PGM. When trying to apply the PGM to approximate state discrimination, the main difficulty is that there is no restriction on the concept class, so the distance between two concepts can be arbitrarily small, and this poses a difficulty to the PGM, which tries to distinguish every concept. We might even get a pathological case where two orthogonal concepts are connected by a chain of close concepts, making the PGM unable to distinguish those orthogonal concepts. To combat these issues, we partition the concept class into three subsets C1, C2, C3, chosen so that the distance (recall that the distance between two sets of concepts is Δ(C1, C2) = min_{c1∈C1, c2∈C2} Δ(c1, c2)) between C1 and C2 is at least γ, a number to be chosen later. The idea is that we give up gaining information about the concepts in C3, in exchange for a good "binary distinguishment" between C1 and C2. We apply a PGM (again, technically the measurement we apply is not a PGM) just to get a binary answer: if our measurement result points to C1 (or C3), we know the target concept is not in C2, and vice versa, so we can either rule out the possibility that the target concept is in C1 or the possibility that it is in C2. We pick C1 and C2 so that their sizes are both a constant fraction of the size of the concept class except when an "extreme case" is found, so we can always rule out a constant fraction of the concept class after the PGM measurement. Repeating O(log|C|) times, we find the target concept. A careful reader might have already recognized an extreme case: it is not possible to have constant-sized C1 and C2 separated by a gap if every concept in C is literally on top of each other. But note that in this case, we can output anything in C as the hypothesis and it will be ϵ-close to c^*. More generally, our partition algorithm will not be able to reserve a constant-sized C2 if a significant fraction of C is clustered around a concept. In such an extreme case, we choose the cluster as C1 with a γ-thick "shell" of concepts C3 around it.
If we measured "no", we can rule out C1, which is a constant fraction of C. If we measured "yes", we can output the center of the cluster as the hypothesis, and we tune γ so that everything in the cluster or the γ-shell is ϵ-close to the center.

The measurement we use to distinguish C1 and C2 is derived from the pretty good measurement and the minimax theorem. First, by slightly modifying a result of [BK02] and [AM14], we get a lemma about the distinguishing power of the PGM on two disjoint sets:

###### Lemma 8 (Binary distinguishment power of PGM).

Let {σi} be a set of density matrices and {pi} the corresponding probabilities, where pi ≥ 0 and ∑i pi = 1. (We will slightly abuse the notation when referring to subsets of the ensemble and their total probabilities.) When we do a PGM on