# Inductive supervised quantum learning

In supervised learning, an inductive learning algorithm extracts general rules from observed training instances and then applies those rules to test instances. We show that this splitting of training and application arises naturally, in the classical setting, from a simple independence requirement with a physical interpretation of being non-signalling. Thus, two seemingly different definitions of inductive learning happen to coincide. This follows from properties of classical information that break down in the quantum setup. We prove a quantum de Finetti theorem for quantum channels, which shows that in the quantum case the equivalence holds in the asymptotic setting, that is, for a large number of test instances. This reveals a natural analogy between classical learning protocols and their quantum counterparts, justifying a similar treatment and allowing one to inquire about standard elements of computational learning theory, such as structural risk minimization and sample complexity.

## Appendix A Supplemental material: Proof of our main result

Our main result, Theorem 3 (Theorem 1 in the main text), shows that for every non-signalling CPTP map there is a symmetric one-way LOCC map that approximately reproduces all local expectation values and is non-signalling by construction. The backbone of our result is the quantum de Finetti theorem, in the form given in Christandl et al. (2007), which we restate here:

###### Theorem 1 (Quantum de Finetti theorem Christandl et al. (2007)).

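Let $A$ and $B$ be quantum systems and let $\omega_{AB_{1:k}}$ be a quantum state that is symmetric under exchange of the $B$ systems. If $\omega_{AB_{1:k}}$ admits a symmetric extension $\omega_{AB_{1:n}}$, then there is a set $G$, a POVM $M$ over $G$ on $A$, and a map $g \mapsto \phi_g$ such that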

$$\left\| \omega_{AB_{1:k}} - \int \frac{M(dg)}{d_A} \otimes \phi_g^{\otimes k} \right\|_1 \le \frac{4 d^2 k}{n} \tag{4}$$

where , , , only depends on the -extension and, in particular, is independent of . denotes the trace-norm of operator . In general, one can take . G and the accuracy of the approximation is independent of the dimension of .

In order to apply Theorem 1 to our problem, we also use the Choi-Jamiolkowski identification between quantum states and quantum channels Bengtsson and Życzkowski (2006).

###### Theorem 2 (Choi).

Every CP map $\Phi$ can be represented by a positive semidefinite operator $\phi$, such that

$$\phi = \mathrm{id}_X \otimes \Phi(\Omega), \tag{5}$$

where , and . In addition, for any we have

$$\Phi(X) = d_X \operatorname{tr}_X\!\left[ \phi^{\top_X} \left( X \otimes \mathbb{1}_Y \right) \right]. \tag{6}$$

The adjoint map is given by (we use the customary identification between and induced by the Hilbert-Schmidt product)

$$\Phi^*(Y) = d_X \operatorname{tr}_Y\!\left[ \phi^{\top_X} \left( \mathbb{1}_X \otimes Y \right) \right]. \tag{7}$$

In addition, if $\Phi$ is trace-preserving, then $\operatorname{tr}_Y[\phi] = \mathbb{1}_X / d_X$.
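The Choi correspondence of Theorem 2 can be checked numerically. The following is a minimal sketch, assuming that $\Omega$ is the normalized maximally entangled state $|\Omega\rangle = d_X^{-1/2}\sum_i |ii\rangle$ and that the channel is given by Kraus operators; the function names and the amplitude-damping example are illustrative choices, not taken from the paper.

```python
import numpy as np

def apply_kraus(kraus, X):
    """Apply the CP map defined by a list of Kraus operators to the operator X."""
    return sum(K @ X @ K.conj().T for K in kraus)

def choi_matrix(kraus, d_x):
    """Choi matrix phi = (id_X ⊗ Phi)(Omega), with Omega assumed normalized (trace one)."""
    d_y = kraus[0].shape[0]
    phi = np.zeros((d_x * d_y, d_x * d_y), dtype=complex)
    for i in range(d_x):
        for j in range(d_x):
            E_ij = np.zeros((d_x, d_x), dtype=complex)
            E_ij[i, j] = 1.0
            phi += np.kron(E_ij, apply_kraus(kraus, E_ij)) / d_x
    return phi

def channel_from_choi(phi, X, d_x, d_y):
    """Eq. (6): Phi(X) = d_X tr_X[ phi^{T_X} (X ⊗ 1_Y) ]."""
    t = phi.reshape(d_x, d_y, d_x, d_y)   # indices (x, y, x', y')
    t_pt = t.transpose(2, 1, 0, 3)        # partial transpose on X: swap x and x'
    return d_x * np.einsum("aybz,ba->yz", t_pt, X)

def reduced_on_x(phi, d_x, d_y):
    """Partial trace over Y of the Choi matrix."""
    return np.einsum("xyzy->xz", phi.reshape(d_x, d_y, d_x, d_y))

# Illustrative channel (an assumption for this example): qubit amplitude damping.
gamma = 0.3
kraus_ad = [
    np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
    np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex),
]
phi_ad = choi_matrix(kraus_ad, d_x=2)
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
```

Under these assumptions, `channel_from_choi(phi_ad, rho, 2, 2)` agrees with `apply_kraus(kraus_ad, rho)`, and `reduced_on_x(phi_ad, 2, 2)` equals `np.eye(2) / 2`, which is the trace-preservation condition.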

This allows us to characterize properties of channels by referring to properties of their respective Choi matrices. The non-signalling property of a quantum channel has a direct relation with the reduced states of its Choi matrix:

###### Lemma 1.

Let $Q$ be a non-signalling quantum channel, and let $\omega_{A(XY)_{1:n}}$ be its Choi matrix. Then

$$\operatorname{tr}_{Y_{k+1:n}}\!\left[ \omega_{A(XY)_{1:n}} \right] = \omega_{A(XY)_{1:k}} \otimes \frac{\mathbb{1}_{X_{k+1:n}}}{d^{\,n-k}} \tag{8}$$

and is the Choi matrix of the induced channel .

Lemma 1 is proved by straightforward evaluation.

Applying Theorem 1 to the Choi matrix of the CPTP map , , we get an approximation to as described by the Choi matrix

$$\eta_{A(XY)_{1:k}} = \int_G M(dg) \otimes \phi_g^{\otimes k}. \tag{9}$$

For the approximation is exact, so , therefore is a POVM. The positive semidefinite quantum states describe a family of completely positive maps .

The state $\eta_{A(XY)_{1:k}}$ does not, however, represent a quantum operation that is deterministically realizable, in the first place because $\operatorname{tr}_Y[\phi_g]$ may not be $\mathbb{1}_X/d_X$, as is required for a trace-preserving channel. Furthermore, a quantum channel can be implemented by 1-way LOCC iff its Choi matrix is of the form

$$\tilde\eta_{A(XY)_{1:k}} = \int_G M(dg) \otimes \tilde\phi_g^{\otimes k}, \tag{10}$$

where , for all . This would ensure that all corresponding CP maps are trace-preserving, and thus the channel described by can be implemented by first performing measurement on and then applying on each of the systems .
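The measure-then-apply structure described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the set $G$ is finite, the input is a product of a state on $A$ and the test states, the system $A$ is discarded after the measurement, and the POVM and channels below are arbitrary illustrative choices rather than the ones constructed in the proof.

```python
import numpy as np

def apply_kraus(kraus, rho):
    """Apply a channel given by a list of Kraus operators."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def one_way_locc(povm, channels, rho_a, test_states):
    """Measure the POVM on rho_a, then apply the channel for outcome g to every test state.

    Returns the average output state on the test systems.
    """
    out = 0
    for M_g, kraus_g in zip(povm, channels):
        p_g = np.trace(M_g @ rho_a).real          # probability of outcome g
        branch = np.ones((1, 1), dtype=complex)
        for sigma in test_states:                 # same channel on each test instance
            branch = np.kron(branch, apply_kraus(kraus_g, sigma))
        out = out + p_g * branch
    return out

def reduce_to_first(rho_12, d):
    """Partial trace over the second of two d-dimensional systems."""
    return np.einsum("ajbj->ab", rho_12.reshape(d, d, d, d))

# Illustrative two-outcome example (assumed): identity channel or bit flip.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
povm = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
channels = [[I2], [X]]
rho_a = np.array([[0.7, 0.0], [0.0, 0.3]], dtype=complex)
zero = np.array([[1, 0], [0, 0]], dtype=complex)
plus = 0.5 * np.ones((2, 2), dtype=complex)

out_a = one_way_locc(povm, channels, rho_a, [zero, zero])
out_b = one_way_locc(povm, channels, rho_a, [zero, plus])
```

Because every branch channel is trace-preserving, the reduced output on the first test system is the same in `out_a` and `out_b`: changing the second test input does not signal to the first output.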

Although one does not expect that each in Eq. (9) satisfies

$$\operatorname{tr}_Y[\phi_g] \stackrel{?}{=} \mathbb{1}_X / d_X, \tag{11}$$

on average they approximately do. More importantly, we now show that the outcomes are concentrated with high probability on those which almost satisfy the condition. Let $\|\cdot\|_1$ be the trace-norm and $\|\cdot\|_\infty$ be the operator norm.

###### Lemma 2.

Let be a non-signalling CPTP map with Choi matrix , and let and be such that

$$\eta_{A(XY)_{1:k}} = \int_G M(dg) \otimes \phi_g^{\otimes k}$$

is a separable approximation of such that

$$\left\| \eta_{A(XY)_{1:k}} - \omega_{A(XY)_{1:k}} \right\|_1 \le k\delta. \tag{12}$$

Define for all and for any subset ,

 (13)

Then, the following holds

1. For any , let , . Then

$$E_0[\bar R_\epsilon] \le \frac{d_X^2}{\epsilon^2} \left( 2\delta \left( 1 + \frac{1}{d_X} \right) + \delta^2 \right). \tag{14}$$

Consider the measurement is performed on the state yielding outcome , and is to be applied on each of the test instances. Of course, for this to be deterministically implementable, one needs that , which amounts to . If this condition is met approximately, one can implement a suitably modified map at the expense of actually implementing a slightly worse approximation to . However, if the condition is not met even approximately, the implementation cannot be expected to approximate . Lemma 2 shows that this case is unlikely to occur, since

Hence, one can slightly modify the operators into in order to satisfy Eq. (11) and ensure that in all cases, either and are close enough, or is unlikely enough so that the approximation still converges in to the actual channel given by . We call this a 1-way LOCC approximation.

###### Lemma 3 (1-way LOCC approximation).

Let be a symmetric, non-signalling CPTP map with Choi matrix . Then there is a POVM and there are states such that and the quantum state

$$\eta_{AXY} = \int_G M(dg) \otimes \tilde\phi_g \tag{16}$$

is a separable approximation to ,

$$\left\| \omega_{AXY} - \eta_{AXY} \right\|_1 \le c\, n^{-1/6} + O(n^{-1/3}), \tag{17}$$

where $c$ is a constant depending on $d_X$ and $d_Y$.

###### Proof of Lemma 3.

Let $M$ and $\phi_g$ be the factors in the de Finetti approximation to $\omega_{A(XY)_{1:k}}$, which admits a symmetric $n$-extension by assumption. Then they satisfy Eq. (12) with $\delta = 4 d_X^2 d_Y^2 / n$. From Statement 1 in Lemma 2 we have

$$\left\| E_1[G] - \frac{\mathbb{1}_X}{d_X} \right\|_\infty \le \left\| E_1[G] - \frac{\mathbb{1}_X}{d_X} \right\|_1 \le \delta, \tag{18}$$

so that

$$E_1[G] \ge \left( \frac{1}{d_X} - \delta \right) \mathbb{1}_X. \tag{19}$$

Therefore, for we have

$$g \in R_\epsilon \;\Rightarrow\; \tau_g \equiv \operatorname{tr}_Y[\phi_g] \ge E_1[G] - \epsilon \mathbb{1} > 0. \tag{20}$$

Thus, we can ensure that all satisfy . We can define

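$$\tilde\phi_g = \begin{cases} \dfrac{1}{d_X} \left( \tau_g^{-1/2} \otimes \mathbb{1}_Y \right) \phi_g \left( \tau_g^{-1/2} \otimes \mathbb{1}_Y \right) & \text{if } g \in R_\epsilon \\ \varphi & \text{if } g \in \bar R_\epsilon, \end{cases} \tag{21}$$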

where is the Choi matrix of any CPTP map . By definition every has , and using we can write

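$$\begin{aligned}
\operatorname{tr}\!\left[ \tau_g^{1/2} \right]^2 &\ge \left( \operatorname{tr} \sqrt{E_1[G] - \epsilon \mathbb{1}_X} \right)^2 \\
&\ge \left( \operatorname{tr}\!\left[ \left( \frac{1}{d_X} - \delta - \epsilon \right)^{1/2} \mathbb{1}_X \right] \right)^2 \\
&= d_X - d_X^2 (\delta + \epsilon).
\end{aligned} \tag{22}$$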

Thus, Lemma 6 shows that for all $g \in R_\epsilon$,

$$\left\| \phi_g - \tilde\phi_g \right\|_1 \le \sqrt{d_X} \sqrt{\epsilon + \delta}, \tag{23}$$

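$$\left\| \phi_g^{\otimes k} - \tilde\phi_g^{\otimes k} \right\|_1 \le \begin{cases} k \sqrt{d_X} \sqrt{\epsilon + \delta} & \text{if } g \in R_\epsilon \\ 2 & \text{if } g \in \bar R_\epsilon. \end{cases} \tag{24}$$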

Combining this with for all ,

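$$\left\| M(dg) \otimes \left( \phi_g^{\otimes k} - \tilde\phi_g^{\otimes k} \right) \right\|_1 = \operatorname{tr}[M(dg)] \left\| \phi_g^{\otimes k} - \tilde\phi_g^{\otimes k} \right\|_1, \tag{25}$$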

and the triangle inequality we get

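$$\begin{aligned}
\left\| \int_G M(dg) \otimes \left( \phi_g^{\otimes k} - \tilde\phi_g^{\otimes k} \right) \right\|_1
&\le \int_G \operatorname{tr}[M(dg)] \left\| \phi_g^{\otimes k} - \tilde\phi_g^{\otimes k} \right\|_1 \\
&\le k \sqrt{d_X} \sqrt{\epsilon + \delta} \int_{R_\epsilon} \operatorname{tr}[M(dg)] + 2 \int_{\bar R_\epsilon} \operatorname{tr}[M(dg)] \\
&\le k \sqrt{d_X} \sqrt{\epsilon + \delta} + 2 E_0[\bar R_\epsilon] \\
&\le k \sqrt{d_X} \sqrt{\epsilon + \delta} + \frac{2 d_X^2}{\epsilon^2} \left( 2\delta \left( 1 + \frac{1}{d_X} \right) + \delta^2 \right).
\end{aligned} \tag{26}$$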

Taking $k = 1$ and using the triangle inequality we get

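$$\begin{aligned}
\left\| \omega_{AXY} - \eta_{AXY} \right\|_1
&\le \left\| \omega_{AXY} - \int_G M(dg) \otimes \phi_g \right\|_1 + \left\| \int_G M(dg) \otimes \left( \phi_g - \tilde\phi_g \right) \right\|_1 \\
&\le \delta + \sqrt{d_X} \sqrt{\epsilon + \delta} + \frac{2 d_X^2}{\epsilon^2} \left( 2\delta \left( 1 + \frac{1}{d_X} \right) + \delta^2 \right).
\end{aligned} \tag{27}$$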

Choosing $\epsilon = \delta^{1/3}$ and expanding around $\delta = 0$ up to leading order we get

$$\left\| \omega_{AXY} - \int_G M(dg) \otimes \tilde\phi_g \right\|_1 \le \sqrt{d_X}\, \delta^{1/6} + O(\delta^{1/3}), \tag{28}$$

$$\left\| \omega_{AXY} - \int_G M(dg) \otimes \tilde\phi_g \right\|_1 \le 4^{1/6}\, d_X^{5/6}\, d_Y^{1/3}\, \frac{1}{n^{1/6}} + O(n^{-1/3}), \tag{29}$$

the desired result. ∎
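To get a feeling for how slowly the bound of Lemma 3 decays, one can evaluate the right-hand side of Eq. (27) numerically and compare it with the leading term of Eq. (29). The sketch below assumes $\delta = 4 d_X^2 d_Y^2 / n$ and $\epsilon = \delta^{1/3}$, which are the values consistent with Eqs. (28) and (29); the function names are illustrative.

```python
import math

def delta_de_finetti(n, d_x, d_y):
    """De Finetti accuracy per test instance, assumed to be 4 d^2 / n with d = d_X d_Y."""
    return 4.0 * (d_x * d_y) ** 2 / n

def bound_eq27(delta, d_x, eps):
    """Right-hand side of Eq. (27)."""
    tail = (2.0 * d_x**2 / eps**2) * (2.0 * delta * (1.0 + 1.0 / d_x) + delta**2)
    return delta + math.sqrt(d_x) * math.sqrt(eps + delta) + tail

def leading_term_eq29(n, d_x, d_y):
    """Leading term of Eq. (29): 4^{1/6} d_X^{5/6} d_Y^{1/3} n^{-1/6}."""
    return 4.0 ** (1 / 6) * d_x ** (5 / 6) * d_y ** (1 / 3) * n ** (-1 / 6)

def locc_bound(n, d_x, d_y):
    """Eq. (27) evaluated at eps = delta^{1/3} (assumed choice)."""
    delta = delta_de_finetti(n, d_x, d_y)
    return bound_eq27(delta, d_x, eps=delta ** (1 / 3))

# Compare the full bound with its leading term for qubit inputs and outputs.
for n in (1e10, 1e20, 1e30):
    print(n, locc_bound(n, 2, 2), leading_term_eq29(n, 2, 2))
```

The full bound approaches the leading term only for very large $n$, since the correction is of relative order $\delta^{1/6}$ with a dimension-dependent prefactor.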

Having established a 1-way LOCC approximation bound for any symmetric non-signalling channel, we can now proceed to prove our main result (Theorem 1 in the main text):

###### Theorem 3 (Main result).

Let be a non-signalling quantum channel, and let be a local operator. Then, there exists a POVM on and a set of quantum channels such that the quantum channel ,

$$\tilde Q = \int \hat M(dg) \otimes \Phi_g^{\otimes n}, \tag{30}$$

satisfies

###### Proof of Theorem 3.

We want to obtain approximation bounds for

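$$E[Q] \equiv \operatorname{tr}\!\left[ \bar S \left( Q \otimes \mathrm{id}_{Y'_{1:n}} \right) \left( \rho_{A(XY')_{1:n}} \right) \right]. \tag{31}$$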

The specific form of is irrelevant for our purposes, besides symmetry among the parties. Expressing in terms of the symmetrized local channel , and in turn, in terms of its Choi matrix, we have

To ease the notation, it is convenient to define , so that Eq. (A) reads

Using Lemma 3 we can replace by its 1-way LOCC approximation ,

which satisfies

Finally, we can absorb the constant into the factors preceding . ∎

## Appendix B Proofs of Lemmas and Theorem 1

We restate and prove Lemma 1 in the main text. We also mention that a related but more general result on a de Finetti theorem for non-signalling classical conditional probability distributions can be found in Christandl and Toner (2009).

###### Lemma 4.

For every inductive learning protocol that assigns labels to test instances , there exists a set of classifying functions and stochastic maps , such that the inductive protocol

$$\tilde P(y_{1:n}|A, x_{1:n}) = \sum_f \left[ \prod_{i=1}^n Q(y_i|x_i, f) \right] T(f|A)$$

has expected risk for all .

###### Proof of Lemma 4.

Consider the expected risk of protocol $P$. Let $\sigma$ be any permutation of $n$ elements, and let $P^{(\sigma)}$ be the accordingly permuted protocol

$$P^{(\sigma)}(y_{1:n}|A, x_{1:n}) = P(y_{\sigma(1):\sigma(n)}|A, x_{\sigma(1):\sigma(n)}). \tag{37}$$

Furthermore, let $\bar P$ be the symmetrized protocol,

$$\bar P(y_{1:n}|A, x_{1:n}) = \frac{1}{n!} \sum_{\sigma \in S_n} P^{(\sigma)}(y_{1:n}|A, x_{1:n}). \tag{38}$$

It follows trivially that

$$E[P|A] = E[P^{(\sigma)}|A] = E[\bar P|A], \quad \forall \sigma \in S_n, A. \tag{39}$$
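The invariance of the expected risk under permutation and symmetrization can be checked numerically for a small case. The following sketch assumes $n = 2$ test instances, a fixed training set $A$ (so the conditioning is dropped), a 0-1 loss $\bar\delta_{y,y'} = 1 - \delta_{y,y'}$ averaged over the test instances, and i.i.d. test pairs drawn from a randomly generated $P_{XY}$; all array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y = 3, 2

# Random protocol P[y1, y2, x1, x2]: a conditional distribution over label pairs.
P = rng.random((n_y, n_y, n_x, n_x))
P /= P.sum(axis=(0, 1), keepdims=True)

# Random joint distribution p_xy[x, y'] of instances and true labels.
p_xy = rng.random((n_x, n_y))
p_xy /= p_xy.sum()

# 0-1 loss: 1 when the assigned label differs from the true label.
loss = 1.0 - np.eye(n_y)

def expected_risk(P):
    """Average 0-1 loss over both test instances (indices: a=y1, b=y2, c=x1, d=x2, e=y1', f=y2')."""
    term1 = np.einsum("abcd,ae,ce,df->", P, loss, p_xy, p_xy)
    term2 = np.einsum("abcd,bf,ce,df->", P, loss, p_xy, p_xy)
    return 0.5 * (term1 + term2)

# Permuted protocol for the swap of the two test instances, as in Eq. (37).
P_swapped = P.transpose(1, 0, 3, 2)
# Symmetrized protocol, as in Eq. (38) with n = 2.
P_bar = 0.5 * (P + P_swapped)

print(expected_risk(P), expected_risk(P_swapped), expected_risk(P_bar))
```

The three printed risks coincide up to floating-point error, because both the loss and the distribution of test pairs are symmetric under exchange of the test instances.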

One can define the marginal maps $\bar P_i$, which are all equal, so we refer to them as $\bar P_1$,

$$\bar P_1(y|A, x_{1:n}) = \sum_{y_{2:n}} \bar P(y, y_{2:n}|A, x_{1:n}). \tag{40}$$

Since $P$ is non-signalling, so is $\bar P$; namely, $\bar P_1$ satisfies the condition

$$\bar P_1(y|A, x_{1:n}) = \bar P_1(y|A, x, x'_{2:n}), \quad \forall x'_{2:n} \tag{41}$$

and so we can simply write $\bar P_1(y|A, x)$. The conditional expected risk can be expressed in terms of $\bar P_1$,

$$E[P|A] = \sum_{x, y, y'} \bar\delta_{y, y'}\, \bar P_1(y|A, x)\, P_{XY}(x, y'). \tag{42}$$

Considering $A$ fixed, $\bar P_1$ is a stochastic map from instances $x$ to labels $y$, and thus it is a convex combination of deterministic maps $Q_f$ for some set of functions $f$, i.e.

$$\bar P_1(y|A, x) = \sum_f \mu_A(f)\, Q_f(y|x), \tag{43}$$

where $\mu_A$ is a probability measure that depends on $A$. Then

$$E[P|A] = \sum_f \mu_A(f)\, E[Q_f|A]. \tag{44}$$

Thus, the stochastic maps $Q(y|x, f) = Q_f(y|x)$ and $T(f|A) = \mu_A(f)$ can be combined into the protocol

$$\tilde P(y_{1:n}|A, x_{1:n}) = \sum_f \left[ \prod_{i=1}^n Q(y_i|x_i, f) \right] T(f|A), \tag{45}$$

which achieves

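$$\begin{aligned}
E[\tilde P|A] &= E\!\left[ \sum_f \prod_{i=1}^n Q(y_i|x_i, f)\, T(f|A) \,\middle|\, A \right] \\
&= \sum_{x_{1:n}, y_{1:n}, y'_{1:n}} \frac{1}{n} \left( \bar\delta_{y_1, y'_1} + \cdots + \bar\delta_{y_n, y'_n} \right) \sum_f \prod_{i=1}^n Q(y_i|x_i, f)\, T(f|A)\, P_{XY}(x_1, y'_1) \cdots P_{XY}(x_n, y'_n) \\
&= \sum_{x_1, y_1, y'_1} \bar\delta_{y_1 y'_1} \sum_f Q(y_1|x_1, f)\, T(f|A)\, P_{XY}(x_1, y'_1) \\
&= \sum_{x_1, y_1, y'_1} \bar\delta_{y_1 y'_1}\, \bar P_1(y_1|A, x_1)\, P_{XY}(x_1, y'_1) \\
&= E[\bar P|A].
\end{aligned} \tag{46}$$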
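The decomposition used in the proof can be made concrete for small alphabets. The sketch below takes a single-instance stochastic map $\bar P_1(y|x)$ for a fixed training set, writes it as a convex combination of deterministic classifiers $f$, and builds the inductive protocol for $n = 2$. The product measure $\mu(f) = \prod_x \bar P_1(f(x)|x)$ is one valid choice made here for illustration, not necessarily the one intended by the authors, and the numerical values of `P1` are arbitrary.

```python
import itertools
import numpy as np

def deterministic_map(f, n_y):
    """Matrix Q_f[y, x] = 1 if y == f(x), else 0."""
    Q = np.zeros((n_y, len(f)))
    Q[list(f), np.arange(len(f))] = 1.0
    return Q

def decompose(P1):
    """Write the stochastic map P1[y, x] as sum_f mu(f) Q_f using a product measure (assumed choice)."""
    n_y, n_x = P1.shape
    terms = []
    for f in itertools.product(range(n_y), repeat=n_x):   # all functions f: X -> Y
        weight = np.prod([P1[f[x], x] for x in range(n_x)])
        terms.append((weight, deterministic_map(f, n_y)))
    return terms

def inductive_protocol(terms):
    """Protocol for n = 2: draw f once, then label both test instances with the same f."""
    return sum(w * np.einsum("ac,bd->abcd", Q, Q) for w, Q in terms)

# Illustrative single-instance map P1[y, x] (columns sum to one).
P1 = np.array([[0.9, 0.2, 0.5],
               [0.1, 0.8, 0.5]])
terms = decompose(P1)
P_tilde = inductive_protocol(terms)   # indices [y1, y2, x1, x2]
```

Summing `P_tilde` over `y2` recovers `P1[y1, x1]` for every value of `x2`, so the constructed protocol has the same single-instance marginal, and hence the same expected risk, while being non-signalling by construction.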

The following proof of Theorem 1 reproduces that of the original paper Christandl et al. (2007), where, as suggested, a probability measure is replaced by an operator-valued measure.

###### Proof of Theorem 1.

Let us start by assuming $\omega_{AB_{1:k}}$ admits a pure state extension $|\Psi_{AB_{1:n}}\rangle$. Then

$$|\Psi_{AB_{1:n}}\rangle \in \mathcal{H}_A \otimes \mathcal{H}_{\mathrm{sym}(n)}, \tag{47}$$

where is the symmetric subspace of . Let also .

Let be a generic element, a reference state in , and the Haar measure on . Let and use .

For any , let be a POVM in , such that

$$\int dg\, E^g_k = \mathbb{1}_{\mathcal{H}_{\mathrm{sym}(k)}}. \tag{48}$$

This allows us to write

$$\omega_{AB_{1:k}} = \int w_g\, \omega^g_{AB_{1:k}}\, dg, \tag{49}$$

where is the residual state on when measuring

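$$w_g\, \omega^g_{AB_{1:k}} = \operatorname{tr}_{B_{k+1:n}}\!\left[ \mathbb{1}_A \otimes \mathbb{1}_{B_{1:k}} \otimes E^g_{n-k}\, \Psi_{AB_{1:n}} \right]. \tag{50}$$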

Then is close to a convex combination of separable and -iid states , with a distribution independent of , namely

$$\Delta_k = \omega_{AB_{1:k}} - \int M(dg) \otimes \phi_g^{\otimes k} \tag{51}$$

is close to zero in trace-norm. The operator-valued measure is given by

$$M(dg) = \operatorname{tr}_{B_{1:n}}\!\left[ \mathbb{1}_A \otimes E^g_n\, \Psi_{AB_{1:n}} \right] dg. \tag{52}$$

We now bound , where

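 S (53)

$$\delta = \left( 1 - \frac{\dim \mathcal{H}_{\mathrm{sym}(n-k)}}{\dim \mathcal{H}_{\mathrm{sym}(n)}} \right) \int M(dg) \otimes \phi_g^{\otimes k}. \tag{54}$$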
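The scalar prefactor in Eq. (54) can be evaluated explicitly. The sketch below assumes that $\mathcal{H}_{\mathrm{sym}(n)}$ is the symmetric subspace of $n$ copies of a $d$-dimensional space, whose dimension is the binomial coefficient $\binom{n+d-1}{n}$, and compares the prefactor with the elementary estimate $(d-1)k/(n+1)$, which follows from Bernoulli's inequality. The function names are illustrative.

```python
from math import comb

def dim_sym(n, d):
    """Dimension of the symmetric subspace of n copies of a d-dimensional space."""
    return comb(n + d - 1, n)

def de_finetti_prefactor(n, k, d):
    """Scalar prefactor 1 - dim H_sym(n-k) / dim H_sym(n) appearing in Eq. (54)."""
    return 1.0 - dim_sym(n - k, d) / dim_sym(n, d)

# The prefactor shrinks roughly like (d - 1) k / n as the extension size n grows.
for n in (10, 100, 1000):
    print(n, de_finetti_prefactor(n, k=2, d=4), (4 - 1) * 2 / (n + 1))
```

For $d = 2$ the estimate is tight, since $\dim \mathcal{H}_{\mathrm{sym}(n)} = n + 1$ and the prefactor equals $k/(n+1)$.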