A fundamental task in quantum statistics is to distinguish between two (or multiple) non-orthogonal quantum states. After considerable efforts, the resource trade-off is by now well understood in the information-theoretic limit of asymptotically many copies and quantified by quantum Stein’s lemma HP91 ; ON00 , the quantum Chernoff bound NS09 ; ACMBMAV07 , as well as refinements thereof Nagaoka06 ; Audenaert2008 ; Mosonyi2015 .
As a natural extension of quantum state discrimination, we study here the task of distinguishing between two quantum channels, in the information-theoretic limit of asymptotically many repetitions. Whereas the mathematical properties of states and channels are strongly intertwined, channel discrimination is qualitatively different from state discrimination for a variety of reasons. Most importantly, when distinguishing between two quantum channels one can employ adaptive protocols that make use of a quantum memory CDP08a . The physical scenario in which such adaptive protocols apply is that of a discriminator given “black-box” access to uses of one of the two channels, with no physical constraint on the kind of operations that the discriminator is allowed to perform. In particular, the discriminator is allowed to prepare a quantum state with a quantum memory register that is arbitrarily large, to perform adaptive quantum channels with arbitrarily large input and output quantum memories between every call to the channel, and finally to perform an arbitrary quantum measurement on the final state. See Figure 1 for a graphical depiction.
In the finite, non-asymptotic regime, such adaptive protocols are known to give an advantage over non-adaptive protocols, which are restricted to picking a fixed input state and then executing standard state discrimination on the channel outputs. For an in-depth discussion of this phenomenon, we refer to the recent works Duan09 ; Harrow10 ; Puzzuoli2017 and references therein. In fact, the advantage of adaptive protocols in this regime already manifests itself for the discrimination of classical channels (Harrow10, , Section 5). Somewhat surprisingly, however, Hayashi showed that this advantage disappears for classical channel discrimination in the information-theoretic limit of a large number of repetitions Hayashi09 . In particular, the optimal exponential error rate for the discrimination of classical channels in the sense of Stein and Chernoff is achieved by just picking a large number of copies of the best possible product-state input and then performing state discrimination on the product output states.
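As a simple numerical illustration of this classical result, the Stein exponent for two classical channels reduces to maximizing the Kullback-Leibler divergence of the output distributions over input symbols. The example channels below are chosen here for illustration and are not taken from Hayashi09:

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p||q) in nats; assumes supp(p) ⊆ supp(q).
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two classical channels given as column-stochastic matrices:
# N[y, x] = Pr[output y | input x]  (example values).
N = np.array([[0.9, 0.2],
              [0.1, 0.8]])
M = np.array([[0.6, 0.4],
              [0.4, 0.6]])

# Hayashi's result: the optimal Stein exponent, even for adaptive protocols,
# equals the best single-letter product-input exponent.
stein_exponent = max(kl(N[:, x], M[:, x]) for x in range(N.shape[1]))
```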
In contrast, in the quantum case, asymptotic channel discrimination has been studied much less systematically than the aforementioned finite, non-asymptotic regime. Notable exceptions include Cooney2016 involving replacer channels and PhysRevLett.118.100502 ; TW2016 about jointly teleportation-simulable channels. Moreover, references Yu17 ; Pirandola18 feature bounds for general quantum channels, but the exact quantitative performance of these bounds remains rather unclear in the asymptotic setting. We would also like to point to the very related quantum strategies framework of GW07 ; G09 ; G12 , as well as the quantum tester framework of CDP08b ; CDP08a .
In this paper, we extend some of the seminal classical results Hayashi09 to the quantum setting by providing a framework for deriving upper bounds on the power of adaptive protocols for asymptotic quantum channel discrimination. In particular, in order to quantify the largest distinguishability that can be realized between two quantum channels, we introduce the concept of amortized channel divergence. This then allows us to give converse bounds for adaptive channel discrimination protocols in the asymmetric hypothesis testing setting in the sense of Stein, as well as in the symmetric hypothesis testing setting in the sense of Chernoff. Now, whenever the amortized channel divergences collapse to the standard channel divergences PhysRevA.97.012332 , we immediately get single-letter converse bounds on the power of adaptive protocols for channel discrimination. Most importantly, we arrive at the characterization of the strong Stein’s lemma for classical-quantum channel discrimination. Namely, as a full extension of the corresponding classical result (Hayashi09, , Corollary 1), we have that picking many copies of the best possible product state input and then applying quantum Stein’s lemma for the product output states is asymptotically optimal. Other examples with tight characterizations include unitary and isometry channels Duan07 ; Duan09 , projective measurements Duan06 , replacer channels Cooney2016 , as well as environment-parametrized channels that are environment seizable, as given here in Definition 36 (the latter including the channels considered in PhysRevLett.118.100502 ; TW2016 ).
Intriguingly, we have to leave open the question of whether adaptive protocols improve the exponential error rate for quantum channel discrimination in the asymmetric Stein setting. Even though we provide many classes of channels for which adaptive protocols do not give an advantage in the asymptotic limit, we suspect that in general such a gap exists. We emphasise that this might already occur for entanglement-breaking channels or even quantum-classical channels (measurements). Moreover, this would also be consistent with the known advantage of adaptive protocols in the symmetric Chernoff setting Duan09 ; Harrow10 ; Duan16 . From a learning perspective and following Hayashi’s comments for the classical case (Hayashi09, , Section 1), this leaves open the possibility that quantum memory is asymptotically helpful for designing active learning protocols for inferring unknown parameters of quantum systems.
Our paper is structured as follows. In Section II, we introduce our notation, and in Section III, we give the precise information-theoretic settings for asymptotic quantum channel discrimination. As our main technical tool, we then introduce amortized channel divergences and analyse their mathematical properties in Section IV. Based on this framework, we proceed to present various converse bounds on the power of adaptive protocols for quantum channel discrimination in Section V. This is followed by our main result in Section VI, the strong Stein’s lemma for classical-quantum channel discrimination. Section VII discusses various other examples for which tight characterisations are available. We end with Section VIII, where we conclude and discuss open questions.
Here we introduce our notation and give the relevant definitions needed later.
Throughout, quantum systems are denoted by , , and and have finite dimensions , , and , respectively. Linear operators acting on system are denoted by and positive semi-definite operators by . Quantum states of system are denoted by and pure quantum states by . A maximally entangled state of Schmidt rank is given by
where and are orthonormal bases. Quantum channels are completely positive and trace-preserving maps from to and are denoted by . Classical systems are denoted by , , and and have finite dimensions , , and , respectively. For , the Schatten norms are defined as
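For concreteness, a Schatten p-norm can be computed numerically from the singular values; the following minimal numpy sketch (not part of the paper) uses this characterization:

```python
import numpy as np

def schatten_norm(X, p):
    # Schatten p-norm of X: the l_p norm of its singular values.
    s = np.linalg.svd(X, compute_uv=False)
    if np.isinf(p):
        return float(s[0])            # p = inf: operator (spectral) norm
    return float(np.sum(s ** p) ** (1.0 / p))
```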
In this work, we also consider superchannels CDP08 , which are linear maps that take a quantum channel as input and output a quantum channel. Such superchannels have previously been considered in various contexts in quantum information theory LM15 ; WFD17 ; CG18 . To define them, let denote the set of all linear maps from to . Similarly, let denote the set of all linear maps from to . Let denote a linear supermap, taking to . A quantum channel is a particular kind of linear map, and any linear supermap that takes an arbitrary quantum channel as input and is required to output a quantum channel must preserve the properties of complete positivity and trace preservation. Any such transformation is called a superchannel. In CDP08 , it was proven that any superchannel can be physically realized as follows. If
for an arbitrary input channel and some output channel , then the physical realization of the superchannel is as follows:
where is a pre-processing channel, system corresponds to some memory or environment system, and is a post-processing channel.
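As a toy numerical sketch of this physical realization (all system sizes and the specific channels below are illustrative assumptions, not from the paper), one can implement a superchannel by appending a qubit memory with a pre-processing channel, letting the input channel act on its part, and discarding the memory with a post-processing channel:

```python
import numpy as np

def apply_kraus(rho, kraus):
    # Apply a channel specified by a list of Kraus operators.
    return sum(K @ rho @ K.conj().T for K in kraus)

def superchannel(input_channel, rho):
    # Physical realization: post-processing ∘ (input channel ⊗ id_E) ∘ pre-processing.
    # Pre-processing: append a qubit memory E in the state |0><0|.
    e0 = np.array([[1.0, 0.0], [0.0, 0.0]])
    rho_AE = np.kron(rho, e0)
    # The input channel acts on system A only (tensored with identity on E).
    rho_BE = apply_kraus(rho_AE, [np.kron(K, np.eye(2)) for K in input_channel])
    # Post-processing: discard (trace out) the memory E.
    return rho_BE.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

# Example input channel: qubit dephasing with Kraus {sqrt(1/2) I, sqrt(1/2) Z}.
Z = np.diag([1.0, -1.0])
dephase = [np.sqrt(0.5) * np.eye(2), np.sqrt(0.5) * Z]

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
out = superchannel(dephase, rho)
```

Here the trivial pre- and post-processing simply reproduce the action of the input channel itself; a nontrivial superchannel would entangle the memory with the input and process it afterwards.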
ii.2 Quantum entropies
The quantum relative entropy for is defined as Ume62
whenever either and is not orthogonal to in the Hilbert-Schmidt inner product, or and . Otherwise, we set . In the above and throughout the paper, we employ the convention that inverses are to be understood as generalized inverses. For , we define the Petz-Rényi divergence in the limit as
We have that
and for general density operators as the following limit: . An explicit expression for the limiting value above is available in (Mosonyi2017, , Lemma 3.1). In the limit , the log-Euclidean Rényi divergence converges to the quantum relative entropy (Mosonyi2017, , Lemma 3.4):
In analogy to the Chernoff divergence representation in (10) we also define the log-Euclidean Chernoff distance as
The log-Euclidean Rényi divergence comes up in our work due to the following “divergence sphere optimization,” holding for not too large and states and (Nagaoka06, , Remark 1):
All of the above quantum Rényi divergences reduce to the corresponding classical versions by embedding probability distributions into diagonal, commuting quantum states.
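This reduction can be checked numerically. The following sketch (not part of the paper; it uses the natural logarithm and example distributions chosen here) evaluates the quantum relative entropy and the Petz-Rényi divergence on diagonal states and compares them with their classical counterparts:

```python
import numpy as np

def _powm_psd(A, t, eps=1e-12):
    # Matrix power of a PSD matrix via its eigenbasis (generalized-inverse convention).
    w, V = np.linalg.eigh(A)
    wt = np.where(w > eps, np.clip(w, eps, None) ** t, 0.0)
    return (V * wt) @ V.conj().T

def _logm_psd(A, eps=1e-12):
    w, V = np.linalg.eigh(A)
    lw = np.where(w > eps, np.log(np.clip(w, eps, None)), 0.0)
    return (V * lw) @ V.conj().T

def rel_ent(rho, sigma):
    # Quantum relative entropy D(rho||sigma) = Tr[rho (log rho - log sigma)] (nats).
    return float(np.real(np.trace(rho @ (_logm_psd(rho) - _logm_psd(sigma)))))

def petz_renyi(rho, sigma, alpha):
    # Petz-Renyi divergence: log(Tr[rho^alpha sigma^(1-alpha)]) / (alpha - 1).
    Q = float(np.real(np.trace(_powm_psd(rho, alpha) @ _powm_psd(sigma, 1 - alpha))))
    return np.log(Q) / (alpha - 1)

# Embed probability distributions as diagonal, commuting quantum states.
p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])
rho, sigma = np.diag(p), np.diag(q)
classical_kl = float(np.sum(p * np.log(p / q)))
classical_renyi2 = float(np.log(np.sum(p ** 2 / q)))
```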
Iii Settings for asymptotic channel discrimination
In this section, we describe the information-theoretic settings for asymptotic quantum channel discrimination that we study. We emphasise that this is in contrast to most of the previous work that has focused on the finite, non-asymptotic regime.
iii.1 Protocol for quantum channel discrimination
The problem of quantum channel discrimination is made mathematically precise by the following hypothesis testing problems for quantum channels. Given two quantum channels and acting on an input system and an output system , a general adaptive strategy for discriminating them is as follows.
We allow the preparation of an arbitrary input state , where is an ancillary register. The th use of a channel accepts the register as input and produces the register as output. After each invocation of the channel or , an (adaptive) channel is applied to the registers and , yielding a quantum state or in registers , depending on whether the channel is equal to or . That is,
for every on the left-hand side, and for every on the right-hand side. Finally, a quantum measurement is performed on the systems to decide which channel was applied. The outcome corresponds to a final decision that the channel is , while the outcome corresponds to a final decision that the channel is . We define the final decision probabilities as
Figure 1 depicts such a general protocol for channel discrimination when the channel or is called three times.
In what follows, we use the simplifying notation to identify a particular strategy using channels and a final measurement . For simplicity, this shorthand also includes the preparation of the initial state , which can be understood as arising from the action of an initial channel for which the input systems and are trivial. This naturally gives rise to the two possible error probabilities:
In what follows, we discuss the behaviour of the type I and type II error probabilities in various asymmetric and symmetric settings.
In the above specification of quantum channel discrimination, the corresponding physical setup is that the discriminator has “black box” access to uses of one of the two channels, meaning that the channel is some device in the laboratory of the discriminator, who has physical access to both the input and output systems of the channel and is allowed to apply arbitrary procedures to distinguish the two possibilities. As such, the above method of discriminating the channels is the most natural and general in this setting. Other physical constraints motivate different models of channel discrimination protocols, and in fact, there could be a large number of physically plausible channel discrimination strategies to consider, depending on the physical constraints of the discriminator(s). For example, if the channels being compared have input and output systems in different physical locations, as would be the case for a long-haul fiber-optic cable, then it might not be feasible to carry out such a general channel discrimination protocol as described above (two parties in distant laboratories would be needed), and it would be meaningful to consider a different channel discrimination protocol. However, the channel discrimination protocol described above is the most general, and if a limitation is established for the distinguishability of two channels in this model, then the same limitation applies to any other channel discrimination model that could be considered.
Another kind of channel discrimination strategy often considered in the literature is a parallel discrimination strategy, in which a state is prepared, either of the tensor-power channels or is applied, and then a joint measurement is performed on the systems . As noted in CDP08a , a parallel channel discrimination strategy of the channels and is a special case of an adaptive channel discrimination strategy as detailed above. Indeed, the first state in an adaptive protocol could be with the system of identified with the systems of , and then the role of the first adaptive channel would be simply to swap in system of for the second channel call, the second adaptive channel would swap in system of for the third channel call, etc. As such, parallel channel discrimination is not the most general approach to consider, and as stated previously, any limitation placed on the distinguishability of the channels from an adaptive discrimination strategy serves as a limitation when using a parallel discrimination strategy.
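A minimal numerical sketch of a non-adaptive parallel strategy with a fixed product input (the qubit output states below are example values chosen here for illustration): the trace distance between the tensor-power outputs, and hence the distinguishability, grows with the number of channel uses:

```python
import numpy as np

def trace_distance(rho, sigma):
    # (1/2) * || rho - sigma ||_1 for Hermitian matrices.
    w = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(w)))

def tensor_power(X, n):
    # n-fold Kronecker power of X.
    Y = np.array([[1.0]])
    for _ in range(n):
        Y = np.kron(Y, X)
    return Y

# Output states of the two channels for a fixed input state (example values):
rho = np.diag([0.9, 0.1])
sigma = np.diag([0.6, 0.4])
dists = [trace_distance(tensor_power(rho, n), tensor_power(sigma, n))
         for n in (1, 2, 3)]
```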
To the best of our knowledge, adaptive quantum channel discrimination protocols were first studied by Chiribella et al. CDP08a , whereas the particular information-theoretic quantities that we introduce in the following Sections III.2–III.5 go back to Hayashi Hayashi09 for the classical case and to Cooney et al. Cooney2016 for the quantum case.
iii.2 Asymmetric setting – Stein
For asymmetric hypothesis testing, we minimize the type II error probability, under the constraint that the type I error probability does not exceed a constant . We are then interested in characterizing the non-asymptotic quantity
as well as the asymptotic quantities
iii.3 Strong converse exponent – Han-Kobayashi
The strong converse exponent is a refinement of the asymmetric hypothesis testing quantity discussed above. For , we are interested in characterizing the non-asymptotic quantity
as well as the asymptotic quantities
The interpretation is that the type II error probability is constrained to tend to zero exponentially fast at a rate , but then if is too large, the type I error probability will necessarily tend to one exponentially fast, and we are interested in the exact rate of exponential convergence. Note that this strong converse exponent is only non-trivial if is sufficiently large.
iii.4 Error exponent – Hoeffding
The error exponent is another refinement of asymmetric hypothesis testing, in the sense that the type II error probability is constrained to decrease exponentially with exponent . We are then interested in characterizing the error exponent of the type I error probability under this constraint. That is, we are interested in characterizing the non-asymptotic quantity
as well as the asymptotic quantities
Note that this error exponent is non-trivial only if is not too large.
iii.5 Symmetric setting – Chernoff
Here we are interested in minimizing the total error probability of guessing incorrectly, that is, symmetric hypothesis testing, which is sometimes also described as the Bayesian setting of hypothesis testing. Given an a priori probability that the first channel is selected, the non-asymptotic symmetric error exponent is defined as (the quantity underlying the non-asymptotic symmetric error exponent was previously studied in CDP08a ; G12 and shown to be related to the norm defined therein; see GW07 ; CDP08b ; G09 for related work)
Given that the expression above involves an optimization over all final measurements, abbreviated by , we can employ the well-known result relating the optimal error probability to the trace distance H69 ; H73 ; Hel76 to conclude that
with the equalities following, e.g., from (Yu17, , Theorem 12). That is, choosing a priori probabilities different from does not affect the asymptotic symmetric error exponent. We have the following relation between the asymptotic Hoeffding error exponent and the asymptotic symmetric error exponent:
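The Helstrom bound relating the optimal error probability to the trace distance can be evaluated directly; the following sketch (with example states chosen here) illustrates the two extreme cases:

```python
import numpy as np

def helstrom_error(rho, sigma, p=0.5):
    # Optimal error probability for discriminating rho (prior p) from sigma
    # (prior 1 - p):  (1 - || p*rho - (1-p)*sigma ||_1) / 2.
    w = np.linalg.eigvalsh(p * rho - (1 - p) * sigma)
    return 0.5 * (1.0 - float(np.sum(np.abs(w))))
```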
iii.6 Energy-constrained channel discrimination
The protocols for quantum channel discrimination could be energy-constrained as well. This is an especially important consideration when discriminating bosonic Gaussian channels S17 , for which the theory could become trivial without such an energy constraint imposed. For example, if the task is to discriminate two pure-loss bosonic Gaussian channels of different transmissivities and there is no energy constraint, then these channels can be perfectly discriminated with a single call: one would send in a coherent state of arbitrarily large photon number, and the states output from the two different channels become orthogonal in the limit of infinite photon number (see, e.g., (Winter17, , Section 2)).
To develop the formalism of energy-constrained channel discrimination, let be a Hamiltonian acting on the channel input Hilbert space; for simplicity, we take to be a positive semi-definite operator throughout. Then, for the channel discrimination protocol described in Section III.1 to be energy-constrained, we demand that the average energy of the reduced states at all of the channel inputs satisfies
where . It then follows that an unconstrained protocol corresponds to choosing and , so that the “energy constraint” in (40) is automatically satisfied for all quantum states.
The resulting quantities of interest then depend on the Hamiltonian and energy constraint , and we write to denote the strategy employed. We write the type I and II error probabilities as and , respectively. The resulting optimized quantities of interest from the previous sections are then defined in the same way, but additionally depend on the Hamiltonian and energy constraint . We denote them by
Iv Amortized distinguishability of quantum channels
In order to analyse the hypothesis testing problems for quantum channels as discussed in Section III.1, we now introduce the concept of the amortized distinguishability of quantum channels. This allows us to reduce questions about the operational problems of hypothesis testing to mathematical questions about quantum channels, states, and distinguishability measures of them. In the following, we also detail many properties of the amortized distinguishability of quantum channels.
iv.1 Generalized divergences
From this inequality, we find in particular that for all states , , the following identity holds WWY14
and that for an arbitrary isometric channel , we have that WWY14
We call a generalized divergence faithful if the inequality holds for an arbitrary state , and strongly faithful if for arbitrary states we have if and only if . Moreover, a generalized divergence is sub-additive with respect to tensor-product states if for all and all we have
Examples of interest are in particular the quantum relative entropy, the Petz-Rényi divergences, the sandwiched Rényi divergences, or the Chernoff distance — as defined in Section II.
where is a probability distribution, is an orthonormal basis, and and are sets of states. We note that this property holds for trace distance, quantum relative entropy, and the Petz-Rényi and sandwiched Rényi quasi-entropies and , respectively. A generalized divergence is jointly convex if
Any generalized divergence is jointly convex if it satisfies the direct-sum property, a fact which follows by applying the defining property in (45) and data processing under partial trace.
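The data-processing inequality underlying this argument can be checked numerically. The following sketch (natural-logarithm convention; the states and the qubit depolarizing channel are example choices made here) verifies that applying the same channel to both states does not increase their relative entropy:

```python
import numpy as np

def _logm_psd(A, eps=1e-12):
    # Matrix logarithm on a PSD matrix, with the generalized-inverse convention.
    w, V = np.linalg.eigh(A)
    lw = np.where(w > eps, np.log(np.clip(w, eps, None)), 0.0)
    return (V * lw) @ V.conj().T

def rel_ent(rho, sigma):
    # Quantum relative entropy in nats.
    return float(np.real(np.trace(rho @ (_logm_psd(rho) - _logm_psd(sigma)))))

def depolarize(rho, lam):
    # Qubit depolarizing channel: rho -> (1 - lam) * rho + lam * I/2.
    return (1 - lam) * rho + lam * np.trace(rho) * np.eye(2) / 2

rho = np.array([[0.8, 0.3], [0.3, 0.2]])
sigma = np.eye(2) / 2
before = rel_ent(rho, sigma)
after = rel_ent(depolarize(rho, 0.3), depolarize(sigma, 0.3))
```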
Based on generalized divergences, one can define a generalized channel divergence as a measure for the distinguishability of two quantum channels PhysRevA.97.012332 . The idea behind the following measure of channel distinguishability is to allow for an arbitrary input state to be used to distinguish the channels:
Definition 2 (Generalized channel divergence PhysRevA.97.012332 ).
Let be a generalized divergence and . The generalized channel divergence of and is defined as
Note that it immediately follows from the axioms on generalized divergences together with the Schmidt decomposition that without loss of generality we can restrict the supremum to pure states and choose system isomorphic to system . Hence, if the channel input system is finite-dimensional, then the optimization problem in (51) becomes bounded. Particular instances of generalized channel divergences include the diamond norm of the difference of and K97 , as well as the Rényi channel divergence from Cooney2016 .
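Restricting to pure inputs, the optimization in the definition can be lower-bounded by sampling. The following sketch (identity channel versus a qubit depolarizing channel, with parameters and states chosen here for illustration, not taken from the paper) also shows that an entangled input can outperform a product input:

```python
import numpy as np

def _logm_psd(A, eps=1e-12):
    w, V = np.linalg.eigh(A)
    lw = np.where(w > eps, np.log(np.clip(w, eps, None)), 0.0)
    return (V * lw) @ V.conj().T

def rel_ent(rho, sigma):
    # Quantum relative entropy in nats (generalized-inverse convention).
    return float(np.real(np.trace(rho @ (_logm_psd(rho) - _logm_psd(sigma)))))

def depolarize_A(rho_AR, lam):
    # (Dep_lam ⊗ id_R)(rho_AR) for a two-qubit state; the channel acts on factor A.
    rho_R = rho_AR.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
    return (1 - lam) * rho_AR + lam * np.kron(np.eye(2) / 2, rho_R)

def divergence_at(psi, lam):
    # D( psi || (Dep_lam ⊗ id_R)(psi) ) for a pure input psi on A ⊗ R.
    rho = np.outer(psi, psi.conj())
    return rel_ent(rho, depolarize_A(rho, lam))

lam = 0.5
rng = np.random.default_rng(1)
samples = rng.normal(size=(50, 4)) + 1j * rng.normal(size=(50, 4))
candidates = [v / np.linalg.norm(v) for v in samples]
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)        # maximally entangled input
prod = np.array([1, 0, 0, 0], dtype=complex)     # product input |00>

best = max(divergence_at(psi, lam) for psi in candidates + [phi])
product_value = divergence_at(prod, lam)
```

Sampling only yields a lower bound on the channel divergence, but it already exhibits the gap between product and entangled inputs.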
iv.2 Amortized channel divergence
We now define the amortized channel divergence as a measure of the distinguishability of two quantum channels. The idea behind this measure, in contrast to the generalized channel divergence recalled above, is to consider two different states and that can be input to the channels and , in order to explore the largest distinguishability that can be realized between the channels. However, from a resource-theoretic perspective, these initial states themselves could have some distinguishability, and so it is sensible to subtract off the initial distinguishability of the states and from the final distinguishability of the channel output states and . This procedure leads to the amortized channel divergence:
Definition 3 (Amortized channel divergence).
Let be a generalized divergence, and let . We define the amortized channel divergence as
Note that in general the supremum cannot be restricted to pure states only, and moreover, there is a priori no dimension bound on the system . Hence, the optimization problem in (52) can in general be unbounded.
We note here that the idea behind amortized channel divergence is inspired by related ideas from entanglement theory, in which one quantifies the entanglement of a quantum channel by the largest difference in entanglement between the output and input states of the channel BHLS03 ; LHL03 ; KW17a . Several properties of a channel’s amortized entanglement were shown in KW17a , and in the following sections, we establish several important properties of the amortized channel divergence, the most notable one being a data processing inequality, i.e., that it is monotone under the action of a superchannel. Some of these properties are related to those recently considered in Yuan2018 for the quantum relative entropy of channels defined in Cooney2016 ; PhysRevA.97.012332 . Moreover, very recently a special case of the amortized channel divergence was proposed in CE18 , and we discuss this more in Remark 22.
iv.3 Properties of amortized channel divergence
The generalized channel divergence is never larger than its amortized version:
Proposition 4 (Distinguishability does not decrease under amortization).
Let be a faithful generalized divergence and . Then we have that
The proof is immediate: we can choose in the optimization of to be equal to and then apply the faithfulness assumption. As we will see, a question fundamental to the problem of channel discrimination is to find instances of divergences and quantum channels for which the opposite inequality holds as well:
If this inequality holds, we say that there is an “amortization collapse,” since, when combined with the inequality in (53), it yields the equality , understood as the amortized channel divergence collapsing to the generalized channel divergence.
Additionally, amortized channel divergences are faithful whenever the underlying generalized divergence is faithful, as the following lemma states.
Proposition 5 (Faithfulness).
If a generalized divergence is strongly faithful on states, then its associated amortized channel divergence is strongly faithful for channels, meaning that if and only if .
Suppose that the channels are identical: . Then we have that
This follows because
which follows from the data processing inequality. Equality is achieved by picking and invoking strong faithfulness of the underlying measure. Now suppose that . Then, we have by definition that
Since we have that
this means that
We could then pick equal to the maximally entangled state, and from faithfulness of the underlying measure, deduce that the Choi states are equal. But if this is the case, then the channels are equal. ∎
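The Choi-state argument used in the proof can be illustrated numerically: two Kraus representations describe the same channel exactly when their Choi matrices coincide. The following sketch (completely dephasing qubit channel, chosen here as an example) checks this for two different Kraus decompositions:

```python
import numpy as np

def choi(kraus_ops, d):
    # Choi matrix J(N) = (N ⊗ id)(|Phi><Phi|) with |Phi> = sum_i |ii> (unnormalized),
    # computed as sum over Kraus K of vec(K) vec(K)†, where vec(K)[m*d + i] = K[m, i].
    J = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus_ops:
        v = np.asarray(K, dtype=complex).reshape(d * d)
        J += np.outer(v, v.conj())
    return J

# Two different Kraus representations of the completely dephasing qubit channel:
Z = np.diag([1.0, -1.0])
kraus_1 = [np.sqrt(0.5) * np.eye(2), np.sqrt(0.5) * Z]
kraus_2 = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
```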
The data processing inequality is the statement that a distinguishability measure for quantum states should not increase under the action of the same channel on these states. It is one of the most fundamental principles of information theory, and this is the reason why the notion of generalized divergence is a useful abstraction. As an extension of this concept, here we prove a data processing inequality for the amortized channel divergence, which establishes that it does not increase under the action of the same superchannel on the underlying channels. The generalized channel divergence of PhysRevA.97.012332 satisfies this property (as announced in G18 ), and we show here that the amortized channel divergence satisfies this property as well.
Proposition 6 (Data processing).
Let . Let be a superchannel as described in (4). Then the following inequality holds
Set and as the respective channels that are output from the superchannel . Let and be arbitrary input states for and , respectively. Set and , where is the pre-processing quantum channel from (4). Then
The first inequality follows from data processing with the pre-processing channel . The next two equalities follow from definitions. The second-to-last inequality follows from data processing with the post-processing channel . The final inequality follows because the states and are particular states, but the amortized channel divergence involves an optimization over all such input states. Since the chain of inequalities holds for arbitrary input states and , we conclude the inequality in the statement of the proposition by taking a supremum over all such states. ∎
Joint convexity is a natural property that a measure of channel distinguishability should obey. The statement is that channel distinguishability should not increase under a mixing of the channels under consideration.
Proposition 7 (Joint convexity).
Let for all , and let be a probability distribution. Then if the underlying generalized divergence obeys the direct-sum property in (49), the amortized channel divergence is jointly convex, in the sense that
where and .
Let and be arbitrary states. Then, we have that
The first inequality follows from data processing. The first equality follows from the direct-sum property. The final inequality follows from optimizing. Since the inequality holds for an arbitrary choice of states and , we conclude the statement of the proposition. ∎
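The corresponding state-level joint convexity of the quantum relative entropy, on which such arguments rest, can be checked numerically; the following sketch uses example states chosen here for illustration:

```python
import numpy as np

def _logm_psd(A, eps=1e-12):
    w, V = np.linalg.eigh(A)
    lw = np.where(w > eps, np.log(np.clip(w, eps, None)), 0.0)
    return (V * lw) @ V.conj().T

def rel_ent(rho, sigma):
    # Quantum relative entropy in nats.
    return float(np.real(np.trace(rho @ (_logm_psd(rho) - _logm_psd(sigma)))))

lam = 0.3
rho1, sigma1 = np.diag([0.9, 0.1]), np.diag([0.5, 0.5])
rho2 = np.array([[0.6, 0.2], [0.2, 0.4]])
sigma2 = np.diag([0.2, 0.8])

# Joint convexity: the divergence of the mixtures is at most the mixture
# of the divergences.
lhs = rel_ent(lam * rho1 + (1 - lam) * rho2, lam * sigma1 + (1 - lam) * sigma2)
rhs = lam * rel_ent(rho1, sigma1) + (1 - lam) * rel_ent(rho2, sigma2)
```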
The following stability property is a direct consequence of the definition of amortized channel divergence.
Proposition 8 (Stability).
Let . Then, we have
where denotes the identity channel on a quantum system of arbitrary size.
For channels that are jointly covariant with respect to a group, we find that it suffices to optimize over and whose reduced states on satisfy the symmetry. As such, the following lemma represents a counterpart to (PhysRevA.97.012332, , Proposition II.4), which established a related statement for the generalized channel divergence.
Lemma 9 (Symmetries).
Let be jointly covariant with respect to a group , as defined above. It then suffices to optimize over states and such that
Each step in what follows is a consequence of data processing. Consider that