Inductive supervised quantum learning

05/24/2016 ∙ by Alex Monràs, et al.

In supervised learning, an inductive learning algorithm extracts general rules from observed training instances and then applies those rules to test instances. We show that this splitting of training and application arises naturally, in the classical setting, from a simple independence requirement that has a physical interpretation as non-signalling. Thus, two seemingly different definitions of inductive learning happen to coincide. This follows from properties of classical information that break down in the quantum setup. We prove a quantum de Finetti theorem for quantum channels, which shows that in the quantum case the equivalence holds in the asymptotic setting, that is, for a large number of test instances. This reveals a natural analogy between classical learning protocols and their quantum counterparts, justifying a similar treatment and allowing one to inquire about standard elements of computational learning theory, such as structural risk minimization and sample complexity.



Appendix A Supplemental material: Proof of our main result

Our main result, Theorem 3 (Theorem 1 in the main text), shows that for every non-signalling CPTP map there is a symmetric one-way LOCC map that approximately reproduces all local expectation values and is non-signalling by construction. The backbone of our result is the quantum de Finetti theorem, specifically in the form given in Christandl et al. (2007), which we restate here:

Theorem 1 (Quantum de Finetti theorem Christandl et al. (2007)).

Let and be quantum systems and let be a symmetric quantum state under exchange of the systems. If admits a symmetric extension then there is a set , a POVM over on , and a map such that

(4)

where , , , only depends on the -extension and, in particular, is independent of . denotes the trace-norm of operator . In general, one can take , and the accuracy of the approximation is independent of the dimension of .

In order to apply Theorem 1 to our problem, we also use the Choi-Jamiolkowski identification between quantum states and quantum channels Bengtsson and Życzkowski (2006).

Theorem 2 (Choi).

Every CP map can be represented by a positive semidefinite operator , such that

(5)

where , and . In addition, for any we have

(6)

The adjoint map is given by (we use the customary identification between and induced by the Hilbert-Schmidt product)

(7)

In addition, if is trace-preserving, then .
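As a numerical illustration of Theorem 2, the following is a minimal sketch, assuming the unnormalised convention J = Σ_ij |i⟩⟨j| ⊗ E(|i⟩⟨j|) with the input system as the first tensor factor; the normalisation and factor ordering used in Eqs. (5)-(7) may differ (for example by a factor of the input dimension). The function names, the amplitude-damping example and the test state are illustrative choices, not taken from the paper.

```python
import numpy as np

def choi_matrix(kraus_ops):
    """Unnormalised Choi matrix J = sum_ij |i><j| (x) E(|i><j|) of a CP map
    given by Kraus operators (assumed convention: input factor first)."""
    d_out, d_in = kraus_ops[0].shape
    J = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E_ij = np.zeros((d_in, d_in), dtype=complex)
            E_ij[i, j] = 1.0
            out = sum(K @ E_ij @ K.conj().T for K in kraus_ops)
            J += np.kron(E_ij, out)
    return J

def partial_trace_output(J, d_in, d_out):
    """Trace out the output factor; equals the identity for a trace-preserving map."""
    return np.trace(J.reshape(d_in, d_out, d_in, d_out), axis1=1, axis2=3)

def apply_via_choi(J, rho, d_in, d_out):
    """Recover the channel action as E(rho) = Tr_in[(rho^T (x) 1) J]."""
    M = np.kron(rho.T, np.eye(d_out)) @ J
    return np.trace(M.reshape(d_in, d_out, d_in, d_out), axis1=0, axis2=2)

# Hypothetical example: a qubit amplitude-damping channel with damping g.
g = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
         np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
J = choi_matrix(kraus)
rho = np.array([[0.6, 0.2j], [-0.2j, 0.4]])
direct = sum(K @ rho @ K.conj().T for K in kraus)
```

In this convention, complete positivity corresponds to J being positive semidefinite, and trace preservation to the partial trace over the output being the identity on the input.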

This allows us to characterize properties of channels by referring to properties of their respective Choi matrices. The non-signalling property of a quantum channel has a direct relation with the reduced states of its Choi matrix:

Lemma 1.

Let be a non-signalling quantum channel, and let be its Choi matrix. Then

(8)

and is the Choi matrix of the induced channel .

Lemma 1 is proved by straightforward evaluation.

Applying Theorem 1 to the Choi matrix of the CPTP map , , we get an approximation to as described by the Choi matrix

(9)

For the approximation is exact, so , therefore is a POVM. The positive semidefinite quantum states describe a family of completely positive maps .

The state does not, however, represent a quantum operation which is deterministically realizable, in the first place because may not be , as is required for a trace-preserving channel. Furthermore, a quantum channel can be implemented by 1-way LOCC iff its Choi matrix is of the form

(10)

where , for all . This would ensure that all corresponding CP maps are trace-preserving, and thus the channel described by can be implemented by first performing measurement on and then applying on each of the systems .

Although one does not expect that each in Eq. (9) satisfies

(11)

on average they approximately do. More importantly, we now show that the outcomes are concentrated with high probability on those which almost satisfy the condition. Let be the trace-norm and be the operator norm.

Lemma 2.

Let be a non-signalling CPTP map with Choi matrix , and let and be such that

is a separable approximation of such that

(12)

Define for all and for any subset ,

(13)

Then, the following holds

  1. For any , let , . Then

    (14)

Suppose the measurement is performed on the state , yielding outcome , and is then to be applied on each of the test instances. Of course, for this to be deterministically implementable, one needs that , which amounts to . If this condition is met approximately, one can implement a suitably modified map at the expense of actually implementing a slightly worse approximation to . However, if the condition is not met even approximately, the implementation cannot be expected to approximate . Lemma 2 shows that this case is unlikely to occur, since

(15)

Hence, one can slightly modify the operators into in order to satisfy Eq. (11) and ensure that in all cases, either and are close enough, or is unlikely enough so that the approximation still converges in to the actual channel given by . We call this a 1-way LOCC approximation.

Lemma 3 (1-way LOCC approximation).

Let be a symmetric, non-signalling CPTP map with Choi matrix . Then there is a POVM and there are states such that and the quantum state

(16)

is a separable approximation to ,

(17)

where is a constant depending on and .

Proof of Lemma 3.

Let and be the factors in the de Finetti approximation to , which admits a symmetric -extension by assumption. Then they satisfy Eq. (12) with . From Statement 1 in Lemma 2 we have

(18)

so that

(19)

Therefore, for we have

(20)

Thus, we can ensure that all satisfy . We can define

(21)

where is the Choi matrix of any CPTP map . By definition every has , and using we can write

(22)

Thus, Lemma 6 shows that for all ,

(23)

and the subadditivity of the trace distance () leads to

(24)

Combining this with for all ,

(25)

and the triangle inequality we get

(26)

Taking and using the triangle inequality we get

(27)

Choosing and expanding around up to leading order we get

(28)

which using leads to

(29)

the desired result. ∎

Having established a 1-way LOCC approximation bound for any symmetric non-signalling channel, we can now proceed to prove our main result (Theorem 1 in the main text):

Theorem 3 (Main result).

Let be a non-signalling quantum channel, and let be a local operator. Then, there exists a POVM on and a set of quantum channels such that the quantum channel ,

(30)

satisfies

Proof of Theorem 3.

We want to obtain approximation bounds for

(31)

The specific form of is irrelevant for our purposes, besides symmetry among the parties. Expressing in terms of the symmetrized local channel , and in turn, in terms of its Choi matrix, we have

(32)

To ease the notation, it is convenient to define , so that Eq. (32) reads

(33)

Using Lemma 3 we can replace by its 1-way LOCC approximation ,

(34)

which satisfies

(35)
(36)

Finally, we can absorb the constant into the factors preceding . ∎

Appendix B Proofs of Lemmas and Theorem 1

We restate and prove Lemma 1 in the main text. We also mention that a related but more general result on a de Finetti theorem for non-signalling classical conditional probability distributions can be found in Christandl and Toner (2009).

Lemma 4.

For every inductive learning protocol that assigns labels to test instances , there exists a set of classifying functions and stochastic maps , such that the inductive protocol

has expected risk for all .

Proof of Lemma 4.

Consider the expected risk of protocol . Let be any permutation of elements, and let be the accordingly permuted protocol

(37)

Furthermore, let be the symmetrized protocol,

(38)

It follows trivially that

(39)

One can define the marginal maps , which are all equal, so we refer to them as ,

(40)
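The symmetrization step and the equality of the marginal maps can be checked on a toy classical protocol. The sketch below is an illustration under stated assumptions: a protocol is modelled as a function `protocol(ys, xs)` returning the probability of the label tuple `ys` given the test inputs `xs` (with the training set held fixed), and the permuted protocol is assumed to act by permuting inputs and labels jointly. The toy protocol and all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def permute(t, perm):
    """Reorder the entries of tuple t according to perm."""
    return tuple(t[i] for i in perm)

def symmetrize(protocol, n):
    """Uniform average of the protocol over joint permutations of the n test slots."""
    perms = list(permutations(range(n)))
    def sym(ys, xs):
        return sum(protocol(permute(ys, p), permute(xs, p)) for p in perms) / len(perms)
    return sym

def marginal(protocol, slot, labels, n):
    """Marginal map for one slot: Pr(y_slot = y | xs), summing out the other labels."""
    def marg(y, xs):
        return sum(protocol(ys, xs) for ys in product(labels, repeat=n) if ys[slot] == y)
    return marg

# Toy non-signalling protocol on two test instances with binary labels:
# slot 0 copies its own input, slot 1 outputs a fair coin.
def toy(ys, xs):
    return Fraction(1, 2) * (1 if ys[0] == xs[0] else 0)

sym = symmetrize(toy, 2)
m0 = marginal(sym, 0, (0, 1), 2)
m1 = marginal(sym, 1, (0, 1), 2)
```

After symmetrization, `m0(y, (a, b))` and `m1(y, (b, a))` coincide and depend only on `a`, which is the toy analogue of the single marginal map referred to in the proof.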

Since is non-signalling, so is , namely satisfies the condition

(41)

and so we can simply write . The conditional expected risk can be expressed in terms of ,

(42)

Considering fixed, is a stochastic map from to , and thus it is a convex combination of deterministic maps for some set of functions , i.e.

(43)

where is a probability measure that depends on . Then

(44)

Thus, the stochastic maps and can be combined into the protocol

(45)

which achieves

(46)
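The step in the proof that writes a stochastic map as a convex combination of deterministic maps can be made concrete for finite alphabets. The following is a minimal sketch, assuming the stochastic map is given as a row-stochastic table `P[x][y] = Pr(y | x)` with exact rational entries; the greedy peeling strategy and the names are illustrative choices and not the construction used in the paper.

```python
from fractions import Fraction

def decompose(P):
    """Write a row-stochastic table P[x][y] as a list of (weight, f) terms,
    where each f is a tuple encoding a deterministic function x -> f[x]."""
    R = [list(row) for row in P]       # residual mass not yet assigned
    terms = []
    remaining = Fraction(1)            # every row of R sums to this value
    while remaining > 0:
        # For each input, pick the label with the largest residual mass.
        f = tuple(max(range(len(row)), key=row.__getitem__) for row in R)
        # Largest weight that can be peeled off without going negative.
        w = min(R[x][f[x]] for x in range(len(R)))
        for x, y in enumerate(f):
            R[x][y] -= w
        terms.append((w, f))
        remaining -= w
    return terms

P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
terms = decompose(P)
```

Each iteration sets at least one residual entry to zero, so the loop stops after at most as many steps as the table has entries; the weights play the role of the probability measure over classifying functions.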

The following proof of Theorem 1 reproduces that of the original paper Christandl et al. (2007), where, as suggested, a probability measure is replaced by an operator-valued measure.

Proof of Theorem 1.

Let us start by assuming admits a pure state extension . Then

(47)

where is the symmetric subspace of . Let also .
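For small systems the symmetric subspace can be built explicitly. The sketch below, assuming n copies of a d-dimensional system, forms the projector as the uniform average of all permutation operators and compares its rank with the standard dimension formula C(n + d - 1, n). The function names and the choice d = 2, n = 3 are illustrative.

```python
import numpy as np
from itertools import permutations, product
from math import comb, factorial

def permutation_operator(perm, d, n):
    """Unitary that reorders the n tensor factors of (C^d)^{(x) n} according to perm."""
    dim = d ** n
    P = np.zeros((dim, dim))
    for idx in product(range(d), repeat=n):
        src = np.ravel_multi_index(idx, (d,) * n)
        dst = np.ravel_multi_index(tuple(idx[p] for p in perm), (d,) * n)
        P[dst, src] = 1.0
    return P

def symmetric_projector(d, n):
    """Projector onto the symmetric subspace: average over all n! permutations."""
    return sum(permutation_operator(p, d, n) for p in permutations(range(n))) / factorial(n)

Pi = symmetric_projector(2, 3)
rank = int(round(np.trace(Pi)))
```

The cost grows as n! · d^n, so this is only a sanity check for very small n and d.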

Let be a generic element, a reference state in , and the Haar measure on . Let and use .

For any , let be a POVM in , such that

(48)

This allows us to write

(49)

where is the residual state on when measuring

(50)

Then is close to a convex combination of separable and -iid states , with a distribution independent of , namely

(51)

is close to zero in trace-norm. The operator-valued measure is given by

(52)

We now bound , where

(53)
(54)

One can readily check that

(55)

On the other hand,

(56)

Notice that this is an operator in . With this we have