Inductive supervised quantum learning

05/24/2016 ∙ by Alex Monràs, et al.

In supervised learning, an inductive learning algorithm extracts general rules from observed training instances and then applies these rules to test instances. We show that this splitting of training and application arises naturally, in the classical setting, from a simple independence requirement with a physical interpretation of being non-signalling. Thus, two seemingly different definitions of inductive learning happen to coincide. This follows from properties of classical information that break down in the quantum setup. We prove a quantum de Finetti theorem for quantum channels, which shows that in the quantum case the equivalence holds in the asymptotic setting, that is, for a large number of test instances. This reveals a natural analogy between classical learning protocols and their quantum counterparts, justifying a similar treatment and allowing one to inquire about standard elements in computational learning theory, such as structural risk minimization and sample complexity.





Appendix A Supplemental material: Proof of our main result

Our main result, Theorem 3 (Theorem 1 in the main text), consists in showing that for every non-signalling CPTP map there is a symmetric one-way LOCC map that approximately reproduces all local expectation values, and is non-signalling by construction. The backbone of our result is the quantum de Finetti theorem, specifically in its form as it appears in Christandl et al. (2007), which we restate here:

Theorem 1 (Quantum de Finetti theorem Christandl et al. (2007)).

Let and be quantum systems and let be a symmetric quantum state under exchange of the systems. If admits a symmetric extension then there is a set , a POVM over on , and a map such that


where , , , only depends on the -extension and, in particular, is independent of . denotes the trace-norm of operator . In general, one can take . G and the accuracy of the approximation is independent of the dimension of .

In order to apply Theorem 1 to our problem, we also use the Choi-Jamiołkowski identification between quantum states and quantum channels Bengtsson and Życzkowski (2006).

Theorem 2 (Choi).

Every CP map can be represented by a positive semidefinite operator , such that


where , and . In addition, for any we have


The adjoint map is given by (we use the customary identification between and induced by the Hilbert-Schmidt product)


In addition, if is trace-preserving, then .
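
As a concrete illustration of the Choi representation, the following NumPy sketch builds the Choi matrix of a single-qubit channel from its Kraus operators and checks positivity, trace preservation, and the action on a state. The convention used here (unnormalised, J = sum_ij |i><j| ⊗ Φ(|i><j|), input factor first) is an assumption, since normalisation and ordering conventions vary between references. The function names and the amplitude-damping example are hypothetical choices made for illustration.

```python
import numpy as np

def choi_matrix(kraus, d_in):
    """Unnormalised Choi matrix J = sum_ij |i><j| (x) Phi(|i><j|), input factor first."""
    d_out = kraus[0].shape[0]
    J = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E_ij = np.zeros((d_in, d_in), dtype=complex)
            E_ij[i, j] = 1.0
            J += np.kron(E_ij, sum(K @ E_ij @ K.conj().T for K in kraus))
    return J

def apply_via_choi(J, rho, d_in, d_out):
    """Recover Phi(rho) = Tr_in[(rho^T (x) 1) J] under the assumed convention."""
    M = (np.kron(rho.T, np.eye(d_out)) @ J).reshape(d_in, d_out, d_in, d_out)
    return np.einsum("iaib->ab", M)

def trace_out_output(J, d_in, d_out):
    """Partial trace of J over the output factor."""
    return np.einsum("iaja->ij", J.reshape(d_in, d_out, d_in, d_out))

# Hypothetical example: amplitude damping with damping probability g.
g = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
         np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
J = choi_matrix(kraus, d_in=2)
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

# Complete positivity: J is positive semidefinite.
print(np.linalg.eigvalsh(J).min() >= -1e-12)
# Trace preservation: tracing out the output leaves the identity on the input.
print(np.allclose(trace_out_output(J, 2, 2), np.eye(2)))
# The Choi matrix reproduces the Kraus form of the channel.
print(np.allclose(apply_via_choi(J, rho, 2, 2),
                  sum(K @ rho @ K.conj().T for K in kraus)))
```

With a normalised convention the trace-preservation check would compare against the maximally mixed state instead of the identity.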

This allows us to characterize properties of channels by referring to properties of their respective Choi matrices. The non-signalling property of a quantum channel has a direct relation with the reduced states of its Choi matrix:

Lemma 1.

Let be a non-signalling quantum channel, and let be its Choi matrix. Then


and is the Choi matrix of the induced channel .

Lemma 1 is proved by straightforward evaluation.
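
The kind of evaluation behind Lemma 1 can be tried numerically in a simplified two-party setting, which is only a stand-in for the lemma and not its exact statement. The sketch assumes a channel with inputs X1, X2 and outputs Y1, Y2 (all qubits), the unnormalised Choi convention with inputs ordered first, and hypothetical example channels. It checks that for a non-signalling channel the Choi matrix with Y2 traced out equals the Choi matrix of the induced channel on party 1 tensored with the identity on X2, and that this fails for a SWAP.

```python
import numpy as np

def choi(channel, d_in):
    """Unnormalised Choi matrix, inputs first: sum_ij |i><j| (x) channel(|i><j|)."""
    J = 0
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            J = J + np.kron(E, channel(E))
    return J

def check_reduced_choi(J, d=2):
    """Trace out Y2 and test whether the result equals J1 (x) identity on X2."""
    T = J.reshape((d,) * 8)                  # axes: x1 x2 y1 y2 | x1' x2' y1' y2'
    R = np.einsum("abcdefgd->abcefg", T)     # partial trace over Y2
    J1 = np.einsum("abcebg->aceg", R) / d    # candidate Choi matrix on (X1, Y1)
    expected = np.einsum("aceg,bf->abcefg", J1, np.eye(d))
    return np.allclose(R, expected), J1.reshape(d * d, d * d)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
p = 0.4

def correlated(rho):
    """Non-signalling example: a mixture of product unitaries (shared randomness)."""
    U, V = np.kron(H, X), np.kron(I2, Z)
    return p * U @ rho @ U.conj().T + (1 - p) * V @ rho @ V.conj().T

def induced(sigma):
    """Channel seen locally by party 1 for the example above."""
    return p * H @ sigma @ H.conj().T + (1 - p) * sigma

SWAP = np.eye(4)[[0, 2, 1, 3]]

def swap(rho):
    """Signalling example: output 1 carries input 2."""
    return SWAP @ rho @ SWAP.T

ok, J1 = check_reduced_choi(choi(correlated, 4))
print(ok, np.allclose(J1, choi(induced, 2)))
print(check_reduced_choi(choi(swap, 4))[0])
```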

Applying Theorem 1 to the Choi matrix of the CPTP map , , we get an approximation to as described by the Choi matrix


For the approximation is exact, so , therefore is a POVM. The positive semidefinite quantum states describe a family of completely positive maps .

The state does not, however, represent a quantum operation which is deterministically realizable, in the first place because may not be , as is required for a trace-preserving channel. Furthermore, a quantum channel can be implemented by 1-way LOCC iff its Choi matrix is of the form


where , for all . This would ensure that all corresponding CP maps are trace-preserving, and thus the channel described by can be implemented by first performing measurement on and then applying on each of the systems .
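
The measure-then-apply structure described above can be sketched for a single test system. The code assumes qubit systems A and B, the unnormalised Choi convention with ordering (A, B input, B output), and a hypothetical two-outcome POVM with two hypothetical conditional channels. Under this assumed convention a transpose appears on the POVM elements; since the transposes again form a POVM, the form of the decomposition is unaffected.

```python
import numpy as np

def choi_from_kraus(kraus):
    """Unnormalised Choi matrix of a channel given by Kraus operators, input first."""
    d = kraus[0].shape[1]
    J = 0
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J = J + np.kron(E, sum(K @ E @ K.conj().T for K in kraus))
    return J

# Hypothetical POVM on A and one conditional channel on B per outcome.
M0 = np.array([[0.7, 0.1j], [-0.1j, 0.2]])
povm = [M0, np.eye(2) - M0]
g = 0.5
kraus_sets = [
    [np.eye(2, dtype=complex)],                                  # outcome 0: identity
    [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),     # outcome 1: amplitude damping
     np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)],
]

# Candidate 1-way LOCC form: sum_x M_x^T (x) J_x.
J_locc = sum(np.kron(M.T, choi_from_kraus(Ks)) for M, Ks in zip(povm, kraus_sets))

def measure_then_apply(rho_ab):
    """Measure the POVM on A, then apply the outcome-dependent channel on B."""
    T = rho_ab.reshape(2, 2, 2, 2)
    out = 0
    for M, Ks in zip(povm, kraus_sets):
        cond = np.einsum("ij,jbic->bc", M, T)   # Tr_A[(M (x) 1) rho], unnormalised
        out = out + sum(K @ cond @ K.conj().T for K in Ks)
    return out

# Choi matrix of the operational description, built directly.
J_direct = 0
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4), dtype=complex)
        E[i, j] = 1.0
        J_direct = J_direct + np.kron(E, measure_then_apply(E))

print(np.allclose(J_locc, J_direct))
# Trace preservation: tracing out the output gives the identity on A and B.
print(np.allclose(np.einsum("iaja->ij", J_locc.reshape(4, 2, 4, 2)), np.eye(4)))
```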

Although one does not expect that each in Eq. (9) satisfies


on average they approximately do. More importantly, we now show that the outcomes are concentrated with high probability on those which almost satisfy the condition. Let be the trace-norm and be the operator norm.
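
For readers who want to experiment, the two norms can be computed from singular values. This is a minimal sketch with hypothetical helper names. The Hölder-type inequality checked at the end is a standard fact relating the two norms and is included only as an illustration.

```python
import numpy as np

def trace_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

def operator_norm(A):
    """Largest singular value of A."""
    return np.linalg.svd(A, compute_uv=False).max()

A = np.diag([3.0, -4.0])
print(trace_norm(A), operator_norm(A))

# Standard inequality: |Tr(A B)| <= ||A||_1 * ||B||_inf.
rng = np.random.default_rng(0)
B = rng.normal(size=(2, 2))
print(abs(np.trace(A @ B)) <= trace_norm(A) * operator_norm(B) + 1e-12)
```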

Lemma 2.

Let be a non-signalling CPTP map with Choi matrix , and let and be such that

is a separable approximation of such that


Define for all and for any subset ,


Then, the following holds

  1. For any , let , . Then


Suppose the measurement is performed on the state yielding outcome , and is to be applied to each of the test instances. Of course, for this to be deterministically implementable, one needs that , which amounts to . If this condition is met approximately, one can implement a suitably modified map at the expense of implementing a slightly worse approximation to . However, if the condition is not met even approximately, the implementation cannot be expected to approximate . Lemma 2 shows that this case is unlikely to occur, since


Hence, one can slightly modify the operators into in order to satisfy Eq. (11) and ensure that in all cases, either and are close enough, or is unlikely enough so that the approximation still converges in to the actual channel given by . We call this a 1-way LOCC approximation.

Lemma 3 (1-way LOCC approximation).

Let be a symmetric, non-signalling CPTP map with Choi matrix . Then there is a POVM and there are states such that and the quantum state


is a separable approximation to ,


where is a constant depending on and .

Proof of Lemma 3.

Let and be the factors in the de Finetti approximation to , which admits a symmetric -extension by assumption. Then they satisfy Eq. (12) with . From Statement 1 in Lemma 2 we have


so that


Therefore, for we have


Thus, we can ensure that all satisfy . We can define


where is the Choi matrix of any CPTP map . By definition every has , and using we can write


Thus, Lemma 6 shows that for all ,


and the subadditivity of the trace distance () leads to


Combining this with for all ,


and the triangle inequality we get


Taking and using the triangle inequality we get


Choosing and expanding around up to leading order we get


which using leads to


the desired result. ∎
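
One ingredient of the proof above, the subadditivity of the trace distance, can be checked numerically. The sketch assumes the step refers to the standard bound that the trace norm of the difference of n-fold tensor powers of two states is at most n times the trace norm of the difference of the states. The random states and helper names are hypothetical.

```python
import functools
import numpy as np

def random_state(d, rng):
    """Random density matrix obtained by normalising G G^dagger."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def trace_norm(A):
    """Trace norm of a Hermitian matrix: sum of absolute eigenvalues."""
    return np.abs(np.linalg.eigvalsh(A)).sum()

def tensor_power(rho, n):
    """n-fold tensor power of rho."""
    return functools.reduce(np.kron, [rho] * n)

rng = np.random.default_rng(0)
rho, sigma = random_state(2, rng), random_state(2, rng)
base = trace_norm(rho - sigma)

for n in range(1, 5):
    lhs = trace_norm(tensor_power(rho, n) - tensor_power(sigma, n))
    print(n, lhs <= n * base + 1e-12)
```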

Having established a 1-way LOCC approximation bound for any symmetric non-signalling channel, we can now proceed to prove our main result (Theorem 1 in the main text):

Theorem 3 (Main result).

Let be a non-signalling quantum channel, and let be a local operator. Then, there exists a POVM on and a set of quantum channels such that the quantum channel ,



Proof of Theorem 3.

We want to obtain approximation bounds for


The specific form of is irrelevant for our purposes, beyond its symmetry among the parties. Expressing in terms of the symmetrized local channel , and in turn, in terms of its Choi matrix, we have


To ease the notation, it is convenient to define , so that Eq. (A) reads


Using Lemma 3 we can replace by its 1-way LOCC approximation ,


which satisfies


Finally, we can absorb the constant into the factors preceding . ∎

Appendix B Proofs of Lemmas and Theorem 1

We restate and prove Lemma 1 in the main text. We also mention that a related but more general result on a de Finetti theorem for non-signalling classical conditional probability distributions can be found in Christandl and Toner (2009).

Lemma 4.

For every inductive learning protocol that assigns labels to test instances , there exists a set of classifying functions and stochastic maps , such that the inductive protocol

has expected risk for all .

Proof of Lemma 4.

Consider the expected risk of protocol . Let be any permutation of elements, and let be the correspondingly permuted protocol


Furthermore, let be the symmetrized protocol,


It follows trivially that


One can define the marginal maps , which are all equal, so we refer to them as ,


Since is non-signalling, so is , namely satisfies the condition


and so we can simply write . The conditional expected risk can be expressed in terms of ,


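The symmetrization and non-signalling steps used so far can be illustrated on a toy classical protocol. The sketch assumes two test instances with binary inputs and labels, holds the training data fixed and suppresses it, and takes the risk to be the average 0-1 loss over i.i.d. test instances. The array layout P[y1, y2, x1, x2], the example protocols, and all names are hypothetical.

```python
import numpy as np

def product_protocol(f, g):
    """P[y1, y2, x1, x2]: label x1 with f and x2 with g (binary alphabets)."""
    P = np.zeros((2, 2, 2, 2))
    for x1 in range(2):
        for x2 in range(2):
            P[f(x1), g(x2), x1, x2] = 1.0
    return P

def symmetrize(P):
    """Average the protocol over the exchange of the two test instances."""
    return 0.5 * (P + P.transpose(1, 0, 3, 2))

def is_nonsignalling(P):
    """Each label's marginal must not depend on the other test instance."""
    m1 = P.sum(axis=1)   # axes: y1, x1, x2
    m2 = P.sum(axis=0)   # axes: y2, x1, x2
    return bool(np.allclose(m1[:, :, 0], m1[:, :, 1])
                and np.allclose(m2[:, 0, :], m2[:, 1, :]))

def risk(P, p_x, truth):
    """Expected average 0-1 loss for i.i.d. test inputs with distribution p_x."""
    total = 0.0
    for y1, y2, x1, x2 in np.ndindex(2, 2, 2, 2):
        loss = 0.5 * ((y1 != truth(x1)) + (y2 != truth(x2)))
        total += p_x[x1] * p_x[x2] * P[y1, y2, x1, x2] * loss
    return total

truth = lambda x: x
p_x = [0.3, 0.7]
P = product_protocol(lambda x: x, lambda x: 0)   # asymmetric but non-signalling
Ps = symmetrize(P)

# A signalling protocol for contrast: each label copies the other instance.
P_sig = np.zeros((2, 2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        P_sig[x2, x1, x1, x2] = 1.0

print(is_nonsignalling(P), is_nonsignalling(Ps), is_nonsignalling(P_sig))
# After symmetrization both marginal maps coincide.
print(np.allclose(Ps.sum(axis=1)[:, :, 0], Ps.sum(axis=0)[:, 0, :]))
# Symmetrization leaves the expected risk unchanged.
print(np.isclose(risk(P, p_x, truth), risk(Ps, p_x, truth)))
```
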
Considering fixed, is a stochastic map from to , and thus it is a convex combination of deterministic maps for some set of functions , i.e.


where is a probability measure that depends on . Then


Thus, the stochastic maps and can be combined into the protocol


which achieves


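The decomposition of a stochastic map into a convex combination of deterministic maps, used in the proof of Lemma 4, can be made explicit for finite alphabets. The following sketch assumes a column-stochastic matrix P[y, x] and uses a greedy peeling procedure. This procedure is an illustrative choice and is not claimed to be the construction intended in the proof. All names are hypothetical.

```python
import numpy as np

def decompose_stochastic(P, tol=1e-12):
    """Write a column-stochastic matrix P[y, x] as sum_k w_k * D_{f_k},
    where D_f[y, x] = 1 if y == f(x) and 0 otherwise."""
    R = np.array(P, dtype=float)
    n_out, n_in = R.shape
    cols = np.arange(n_in)
    terms = []
    for _ in range(n_out * n_in):      # each step zeroes at least one entry
        f = R.argmax(axis=0)           # most likely remaining label for each input
        w = R[f, cols].min()           # largest weight that can be peeled off
        if w <= tol:
            break
        terms.append((w, tuple(int(y) for y in f)))
        R[f, cols] -= w
    return terms

def recompose(terms, n_out):
    """Rebuild the stochastic matrix from weighted deterministic maps."""
    n_in = len(terms[0][1])
    P = np.zeros((n_out, n_in))
    for w, f in terms:
        P[list(f), np.arange(n_in)] += w
    return P

P = np.array([[0.5, 0.2],
              [0.5, 0.8]])
terms = decompose_stochastic(P)
for w, f in terms:
    print(round(w, 3), f)
print(np.allclose(recompose(terms, 2), P))
```

The weights play the role of the probability measure over classifying functions, and each tuple f lists the label assigned to each input.
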
The following proof of Theorem 1 reproduces that of the original paper Christandl et al. (2007), where, as suggested, a probability measure is replaced by an operator-valued measure.

Proof of Theorem 1.

Let us start by assuming admits a pure state extension . Then


where is the symmetric subspace of . Let also .
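
Since the proof works inside the symmetric subspace, it may help to see its projector built explicitly for small systems. The sketch below averages all permutation operators on n copies of a d-dimensional space and compares the rank with the standard dimension formula binom(n + d - 1, n). The formula is a well-known fact and is not quoted from the text above. The symbols d and n and the function name are hypothetical.

```python
import itertools
import math
import numpy as np

def symmetric_projector(d, n):
    """Projector onto the symmetric subspace of n copies of a d-dimensional space."""
    dim = d ** n
    eye = np.eye(dim).reshape((d,) * (2 * n))
    P = np.zeros((dim, dim))
    for perm in itertools.permutations(range(n)):
        # Permuting the row tensor factors of the identity gives a permutation operator.
        axes = list(perm) + list(range(n, 2 * n))
        P += eye.transpose(axes).reshape(dim, dim)
    return P / math.factorial(n)

for d, n in [(2, 2), (2, 3), (3, 2)]:
    P = symmetric_projector(d, n)
    print(d, n, round(np.trace(P)), math.comb(n + d - 1, n), np.allclose(P @ P, P))
```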

Let be a generic element, a reference state in , and the Haar measure on . Let and use .

For any , let be a POVM in , such that


This allows us to write


where is the residual state on when measuring


Then is close to a convex combination of separable and -iid states , with a distribution independent of , namely


is close to zero in trace-norm. The operator-valued measure is given by


We now bound , where


One can readily check that


On the other hand,


Notice that this is an operator in . With this we have