In the theory of data compression in classical information theory, one wishes to encode a symbol set with a code, which is a mapping from the symbol set to the set of all finite strings (or sequences) of elements from an alphabet, usually taken to be the binary alphabet. The range of the code is frequently referred to as the codebook and its elements are called codewords. Since we compress long strings (sequences) of messages, concatenation is used to extend the code to the set containing all finite strings from the symbol set; this extension is called the extended code. A code is said to be uniquely decodable if its extended code is an injective function; in that case, the decoding function is the inverse of the extended code. If each symbol of the symbol set
that we wish to encode is always prepared with the same probability, independent of the string of symbols that have appeared earlier, then the sequence
of random variables which gives us the string of symbols to be encoded is independent and identically distributed (i.i.d.) with values in the symbol set, with a common probability mass function. For any member of this sequence of random variables, the Shannon entropy is defined as
In his original works on the subject ([33, 34]), Shannon proved the Noiseless Coding Theorem, which states that, for any , -many binary bits per symbol are sufficient in order to encode strings of symbols if each entry of the sequence is prepared in an i.i.d. way, with probability of error tending to zero as the length of the strings tends to infinity. Moreover, Shannon showed that for any , if at most bits are used per symbol, then the probability of error tends to 1 as the length of the strings tends to infinity. Thus the Shannon entropy can be interpreted as the minimum expected number of binary bits per symbol that are necessary in order to encode strings of symbols with arbitrarily small error (i.e. asymptotically lossless coding), given that the elements of the string of symbols are prepared in an i.i.d. way.
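As an illustrative aside (not part of the original development), the Shannon entropy appearing in the theorem is easy to compute numerically. The following Python sketch, with function names of our own choosing, evaluates the entropy of a probability mass function in bits per symbol:

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy H = -sum_i p_i log2 p_i, in bits per symbol."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A biased binary source: H is well below 1 bit, so long i.i.d. strings
# drawn from it can be compressed below 1 bit per symbol.
H = shannon_entropy([0.9, 0.1])
```

For a uniform binary source the entropy is exactly 1 bit, so no compression is possible; for the biased source above it is roughly 0.47 bits per symbol.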
The setting of quantum data compression for indeterminate-length quantum codes is similar to the setting of classical data compression. In this case, the symbol set
contains the symbol states which are normalized vectors spanning a Hilbert space. Here we only consider the compression of pure quantum states, therefore we restrict our attention to normalized vectors or pure states. The classical binary alphabet
is replaced by the set of qubits, which is the standard orthonormal basis of the Hilbert space . The classical codebook is replaced by the free Fock space . A quantum code is a linear isometry , and the corresponding extended code is a map which is defined on the free Fock space
by “concatenation,” i.e. by tensor products of the values of the code in the free Fock space . The quantum code is called uniquely decodable if its extended code is also an isometry.
The Noiseless Coding Theorem was extended to indeterminate-length quantum codes in 1995 by Schumacher . Schumacher showed that, for any , -many qubits per symbol are sufficient in order to encode strings of symbol states if each entry of the sequence is prepared in an i.i.d. way, with probability of error tending to zero as the length of the strings tends to infinity. Here is the ensemble state representing the quantum ensemble , and is the von Neumann entropy of the density matrix given by
Moreover, Schumacher showed that for any , if at most qubits are used per symbol, then the probability of error tends to 1 as the length of the strings tends to infinity. Thus the von Neumann entropy can be interpreted as the minimum expected number of qubits per symbol that are necessary in order to encode strings of symbol states with arbitrarily small error (i.e. asymptotically lossless coding) given that the elements of the string of symbol states are prepared in an i.i.d. way.
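For concreteness, the von Neumann entropy in Schumacher's theorem can be computed from the eigenvalues of the density matrix. The sketch below is our own numerical illustration; the ensemble of two equally weighted, non-orthogonal pure states is an assumed example, not taken from the text:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]            # discard numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

# Two equally weighted, non-orthogonal qubit states (illustrative choice).
psi0 = np.array([1.0, 0.0])
psi1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho = 0.5 * np.outer(psi0, psi0) + 0.5 * np.outer(psi1, psi1)
S = von_neumann_entropy(rho)   # strictly below 1 bit
```

Because the symbol states overlap, S(rho) is strictly smaller than the Shannon entropy of the weights (here 1 bit), which is precisely why quantum compression can use fewer qubits per symbol than the classical count of bits.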
Indeterminate-length quantum codes were considered by Schumacher and Westmoreland in , and later by Müller, Rogers and Nagarajan in [24, 25]; and Bellomo, Bosyk, Holik and Zozor in . In all three of these papers, the authors prove a version of the quantum Kraft-McMillan Theorem which states that every uniquely decodable quantum code must satisfy an inequality in terms of the lengths of its eigenstates. Their presentations are very similar to that of the classical Kraft-McMillan Theorem ([12, Theorems 5.2.1 and 5.5.1]) except that these authors did not provide a converse statement. In Theorem 3.9, we present a modified version of the quantum Kraft-McMillan Theorem giving a converse statement, thus characterizing the uniquely decodable quantum codes. Our Theorem 3.9 comes in handy when we define an optimal quantum code that corresponds to a given ensemble.
In Subsection 3.2 we introduce the notion of quantum stochastic ensemble and Markov ensemble, allowing us to prepare strings of symbol states for quantum data compression such that the appearance of each symbol in the string may depend on the previous symbols; i.e. the strings of symbol states are not necessarily prepared in an i.i.d. way. A stochastic ensemble is a sequence , where for each such that is the probability mass function of a discrete stochastic process , is a collection of vector states referred to as the symbol states and is the probability that the string of quantum symbols is encoded, for each and . Our main results, Theorems 4.3 and 4.6, give quantum dynamical entropy interpretations for the average minimum codeword length per symbol as the length of strings of symbol states tend to infinity when the coding is assumed to be lossless. These results extend the result of Schumacher  and Bellomo et al.  which state that for an i.i.d. prepared quantum ensemble the optimal codeword length per symbol is equal to the von Neumann entropy of the initial ensemble state for asymptotically lossless coding. In our result we use the quantum Markov chain approach to quantum dynamical entropy originally introduced in .
2. Prerequisites: Two Notions of Non-commutative Random Walks
In this section we recall the definitions of open quantum random walks, introduced by Attal et al. in , and quantum Markov chains, introduced by Accardi in . Both are generalizations of random walks to the non-commutative (quantum) setting.
2.1. Open Quantum Random Walks
In this subsection we recall the definitions of open quantum random walks (OQRWs) introduced by Attal et al. in . Quantum random walks (QRWs) are a generalization of classical random walks to quantum mechanics, the properties of which are significantly different from those of their classical counterparts (see e.g. ). Here we only consider discrete-time QRWs, although the continuous-time version has been defined (see e.g. ). QRWs come in two flavors: unitary QRWs (UQRWs), introduced by Aharonov et al. in  and independently by Meyer in , used to describe closed-system dynamics, and OQRWs for open-system dynamics. The study of QRWs has enjoyed much interest in recent years [7, 14, 15, 16, 27, 36], and applications have been found in many areas including quantum computing [17, 18], the study of brain networks , and biology [28, 29].
We begin with a tensored Hilbert space , where the coin Hilbert space, , is meant to represent the number,
, of internal degrees of freedom (or chirality) for a walker and the position Hilbert space, (or more generally if is countably infinite), is meant to represent the position of a random walker on an at most countable vertex set . The vertices will be represented by a fixed orthonormal basis of . A completely positive, trace-preserving (CP-TP) map (or quantum channel) , where denotes the space of trace-class operators on , is an open quantum random walk if it has the following Kraus decomposition:
where for some for each , where denotes the space of all bounded operators on . The operator is meant to describe the change in the coin-state degrees of freedom when the random walker moves from site to site .
It is clear that an OQRW, given by Equation (1), must satisfy
where denotes the identity operator on the Hilbert space , or equivalently
2.2. Quantum Dynamical Entropy via Quantum Markov Chains
In this subsection we recall the definition of quantum Markov chains (QMCs) and of dynamical entropy thereon. The QMC was introduced by Accardi in , and its use for describing dynamical entropy was first introduced in  in terms of the Accardi-Ohya-Watanabe (AOW) entropy. Another QMC approach was introduced by Tuyls in  for the study of the Alicki-Fannes (AF) entropy, which was introduced in  and is often referred to as ALF entropy to emphasize Lindblad’s contributions. Finally, a generalization of both QMC approaches was given in , where the authors introduced the Kossakowski-Ohya-Watanabe (KOW) entropy. Throughout this paper, we will mainly follow the terminology and notation of  and .
Let be a von Neumann algebraic system (algebraic probability space), where denotes the set of all normal states on the von Neumann algebra . Throughout this paper we will, for simplicity, ignore the GNS construction and assume that for some separable Hilbert space and we will identify each normal state, with its density operator , the space of trace class operators on , through the identification . An algebraic probability space together with an automorphism and an initial state will be denoted by the triple and referred to as a quantum dynamical system. We will be mainly interested in quantum dynamical systems whose dynamics are completely positive and unital maps.
Fix a quantum dynamical system . We will refer to any completely positive, unital map as a transition expectation. Let be an operational partition of unity; i.e. , for each , and . Following [35, Page 413] (see also [19, Equation 3.14]), we will consider the transition expectation given by the equation
for some fixed orthonormal basis of . If is a completely positive, unital map, we will also make use of the transition expectation
If is a Hilbert space, is the von Neumann algebra of all bounded operators on , is a density operator on and, for some , is a transition expectation, then the pair is called a quantum Markov chain. We will be specifically interested in quantum Markov chains whose transition expectation is given by Equation (4). Given a quantum Markov chain, we define the quantum Markov state on by the equation
for all and . Notice that the assumption that the transition expectation is unital implies that is compatible in the sense that
for all and . Moreover, it was shown in [2, Proposition 3.7] that the state on indeed exists.
The joint correlations for are given by the density matrices satisfying
for all and .
Finally, if is a completely positive, unital map on the von Neumann algebra and is a density operator on , then the dynamical entropy of with respect to is given by
where is the von Neumann entropy and the transition expectation is given by Equation (4). Further, given a subalgebra of , the dynamical entropy of with respect to is given by
The dynamical entropy above is the generalized AF dynamical entropy as defined by the authors of . The description we give is very similar to that of the AF dynamical entropy given by Tuyls in ; however, we do not restrict ourselves to -automorphisms as does the standard construction of AF dynamical entropy.
3. Data Compression
In what follows, all codings will be done into strings of bits or strings of qubits, for classical and quantum codes respectively. Therefore all codewords will be strings of elements from a binary alphabet (in the classical case) or, possibly, superpositions of strings from a quantum binary alphabet, which is an orthonormal basis of the Hilbert space (in the quantum case). The extensions to -bits or -qubits can easily be done in both cases.
3.1. Classical Codes and the Kraft Inequality
Let be a finite or countable set equipped with the power set -algebra , and let be a random variable with values in . The set will be referred to as the symbol set that we wish to encode. In the literature, the set is referred to as the set of objects, the message set, or sometimes even the index set. For any set , we will set equal to the set which is the collection of all possible finite strings from , where denotes the empty set (or empty string). Lastly, let be the binary alphabet. A code is a mapping from to , the set of finite strings with letters in the binary alphabet . The range of the code, , is referred to as the codebook and its elements are the codewords. Moreover, for each , we refer to as the codeword of the symbol . For each , we call the length of (denoted by ) the unique integer such that .
The expected length of a code on a symbol set is given by
where is the probability mass function (pmf) of the random variable and the expectation is taken with respect to .
We extend the code by concatenation to obtain the extended code, also called the extension of , . That is to say
and we define . We call the code uniquely decodable whenever its extension is injective; i.e. is uniquely decodable whenever all strings of symbols from are pairwise distinguishable. In lossless coding we are only interested in uniquely decodable codes.
An extremely useful class of uniquely decodable codes are the so-called instantaneous (or prefix-free) codes. A code is said to be prefix-free if no codeword is the prefix of another; i.e. for every distinct pair there is no such that . Prefix-free codes are called instantaneous because the decoder is able to read out each codeword from a string of codewords, instantaneously, as soon as she sees that word appear in a string (without waiting for the entire string).
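The prefix-free condition can be tested mechanically. The following Python sketch (a helper of our own, not from the text) exploits the fact that in lexicographic order any string lying between a word and one of its extensions must share that word as a prefix, so adjacent comparisons suffice:

```python
def is_prefix_free(codebook):
    """True iff no codeword is a prefix of another (instantaneous code).

    After sorting, a prefix sorts immediately before its extensions,
    so it is enough to compare each word with its successor.
    """
    words = sorted(codebook)
    return all(not words[i + 1].startswith(words[i])
               for i in range(len(words) - 1))
```

For example, {0, 10, 110, 111} is instantaneous, while {0, 01} is uniquely decodable but not prefix-free: the decoder must look one symbol ahead before committing to a codeword.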
The Kraft-McMillan Inequality is fundamental in classical data compression.
(Kraft-McMillan Inequality, [12, Theorems 5.2.1 and 5.5.1]) For any uniquely decodable code over a symbol set with cardinality , the codeword lengths must satisfy the inequality
Conversely, given a set of codeword lengths that satisfies this inequality, there exists an instantaneous code with these code lengths.
The Kraft-McMillan Inequality is sometimes referred to only as the Kraft Inequality. This is due to the fact that Kraft was the first to prove the inequality in , although his original result refers only to instantaneous codes. McMillan later extended Kraft’s work to include all uniquely decodable codes in . Furthermore, it is worth noting that the Kraft-McMillan inequality can be extended to a countable set of symbols (see Theorem 5.2.2 and the corollary following Theorem 5.5.1 in ). When including countable sets of symbols, the inequality is referred to as the Extended Kraft-McMillan Inequality.
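To make the converse direction of the theorem concrete, the sketch below (helper names are our own) checks the binary Kraft sum and carries out the standard binary-tree construction of an instantaneous code with prescribed lengths:

```python
def kraft_sum(lengths):
    """Left-hand side of the binary Kraft-McMillan inequality."""
    return sum(2.0 ** (-l) for l in lengths)

def canonical_code(lengths):
    """Instantaneous code with the given lengths, assuming kraft_sum <= 1.

    Standard construction: walk the binary tree in order of increasing
    length, handing out the next unused node at each depth.
    """
    code, next_val, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_val <<= (l - prev_len)          # descend to depth l
        code.append(format(next_val, "0{}b".format(l)))
        next_val += 1
        prev_len = l
    return code
```

For instance, the lengths (1, 2, 3, 3) have Kraft sum exactly 1 and produce the complete instantaneous code {0, 10, 110, 111}.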
An immediate corollary to the Kraft-McMillan Inequality is the following:
Given any uniquely decodable code with codeword lengths , there exists an instantaneous code with these same code lengths.
We call a uniquely decodable code optimal whenever the expected length is minimized; i.e. the optimal uniquely decodable code is given by
where the last equality follows from Theorem 3.1. We set the optimal expected length of the random variable . The results for the optimal expected length are summarized in the following:
([12, Theorem 5.4.1]) Let be a random variable with range in the symbol set . Then the optimal expected length of satisfies the inequality
where is the Shannon entropy of , i.e. where is the pmf of .
Well known examples of codes which satisfy the inequality of Theorem 3.4 are the so-called Huffman codes and Shannon-Fano codes.
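A Huffman code achieving the bound of Theorem 3.4 can be built greedily by repeatedly merging the two least probable entries; the implementation below is our own sketch using only the standard library:

```python
import heapq

def huffman_lengths(pmf):
    """Codeword lengths of a binary Huffman code for the given pmf.

    The resulting expected length L satisfies H(X) <= L < H(X) + 1.
    """
    if len(pmf) == 1:
        return [1]
    lengths = [0] * len(pmf)
    heap = [(p, [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1           # merged symbols sit one level deeper
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths
```

For the dyadic pmf (1/2, 1/4, 1/8, 1/8) the resulting lengths are (1, 2, 3, 3) and the expected length equals the entropy, 1.75 bits, so the lower bound of the theorem is attained.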
In the above theorem, we are only interested in the compressibility of single codewords. Suppose instead that we wish to compress strings of codewords with code distributions given by a stochastic process . Then, for each , Theorem 3.4 holds for the random vector , giving
For each , we set
to be the optimal expected codeword length per symbol for the first symbols. We can then express the optimal expected codeword length per symbol (over all symbols) in terms of the entropy rate, which is a dynamical entropy for stochastic processes. The entropy rate of a stochastic process is given by
whenever the limit exists. There are many instances when it is known that the above limit exists (e.g. stationary stochastic processes, see [12, Theorem 4.2.1]).
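For a stationary Markov chain the entropy rate has a closed form: minus the sum over states i of the invariant probability of i times the entropy of the i-th row of the transition matrix. The numerical sketch below uses an assumed two-state chain, not an example from the text:

```python
import numpy as np

def markov_entropy_rate(P, pi):
    """Entropy rate -sum_i pi_i sum_j P_ij log2 P_ij of a stationary chain."""
    P, pi = np.asarray(P, float), np.asarray(pi, float)
    rate = 0.0
    for i in range(len(pi)):
        for j in range(len(pi)):
            if P[i, j] > 0:
                rate -= pi[i] * P[i, j] * np.log2(P[i, j])
    return rate

# Assumed example: symmetric two-state chain; (1/2, 1/2) is invariant.
P = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
rate = markov_entropy_rate(P, pi)
```

For this chain the rate is the binary entropy of the flip probability, about 0.47 bits per symbol, well below the 1 bit of its uniform marginal; when the rows of P are identical the chain is i.i.d. and the rate reduces to the Shannon entropy of that common row.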
([12, Theorem 5.4.2]) The optimal expected codeword length per symbol for a stochastic process satisfies
Moreover, if is such that the limit defining entropy rate exists (e.g. is a stationary stochastic process), then
In particular, if consists of independent identically distributed (i.i.d.) copies of a random variable , then
This finishes our brief overview of data compression in classical information theory. For a more detailed exposition see [12, Chapter 5].
3.2. Quantum Data Compression
We begin with the description of indeterminate-length quantum codes, whose preliminary investigation began with Schumacher  and Braunstein et al. in , and which were formalized in . We may think of the codes introduced in the previous section as variable-length codes; the term indeterminate-length is used to draw attention to the fact that a quantum code must allow for superpositions of codewords, including superpositions containing codewords of different lengths. We will mainly follow the formalism of , as opposed to the zero-extended forms of . A description of the connection between these two formalisms can be found in .
For any Hilbert space , we will denote by the free Fock space of , where . We will denote the scalar by and refer to it as the empty string. Let be an ensemble of pure states, or simply ensemble, where is the pmf of a random variable and is an element of a -dimensional Hilbert space , for each , such that . The collection will be referred to as the symbol states of the ensemble . An (indeterminate-length) quantum code, , over a quantum binary alphabet , which is an orthonormal basis for , is a linear isometry . The extended quantum code of is the linear mapping given by
The quantum code is said to be uniquely decodable if the extended quantum code is an isometry. Throughout this paper, we will restrict ourselves only to the situation where the range of is a subset of for some ; i.e. there is a finite upper bound on the length of all codewords.
The authors of  allow non-empty strings to map to the empty string. In their paper, the lengths of the codewords are sent along a classical side channel, which makes that convention possible. Without the classical side channel (as is the approach in the present paper), allowing non-empty strings to map to the empty string would cause the quantum code to fail to be uniquely decodable.
Let be an ensemble whose symbol states span a Hilbert space of dimension . Consider a classical uniquely decodable code, , on a symbol set, , with -many symbols. We will construct a corresponding uniquely decodable quantum code, , from by identifying the classical binary alphabet with the quantum binary alphabet and the symbol set, , with any orthonormal basis of ; this construction is given in . Fix an orthonormal basis of and define the quantum code by the equation
It is clear that , where is the length of , and that is an orthonormal set, so that is a linear isometry. Furthermore, since is uniquely decodable, the map defined by the equation
is a linear isometry for each . Since the extended quantum code is given by
we see that is a linear isometry and hence is uniquely decodable. We will refer to quantum codes constructed from classical ones by Equation (10) as classical-quantum encoding schemes (c-q schemes).
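A c-q scheme can be realized numerically as an isometry matrix into a truncated Fock space. The sketch below (our own basis-indexing convention, with a hypothetical codebook) verifies the defining isometry property:

```python
import numpy as np
from itertools import product

def fock_basis(max_len):
    """Binary strings of length 0..max_len, indexing the truncated Fock space."""
    return [""] + ["".join(b) for l in range(1, max_len + 1)
                   for b in product("01", repeat=l)]

def cq_isometry(codebook):
    """Isometry of a c-q scheme: the i-th symbol basis state is sent to
    the computational-basis codeword state in the truncated Fock space."""
    basis = fock_basis(max(len(w) for w in codebook))
    V = np.zeros((len(basis), len(codebook)))
    for i, word in enumerate(codebook):
        V[basis.index(word), i] = 1.0
    return V

V = cq_isometry(["0", "10", "11"])   # hypothetical prefix-free codebook
```

Since the codewords are distinct strings, the columns of V are orthonormal, so V†V is the identity on the symbol space; this is exactly the isometry condition used in the text.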
Notice that the symbol states of the ensemble are not directly encoded by the ’s unless and there exists a permutation of such that for every . In fact need not belong to for any , but can in general be in a superposition of different lengths. (Hence the term indeterminate-length quantum codes.)
The Kraft-McMillan Inequality (Theorem 3.1) was initially extended to the quantum domain in  and subsequently in  and . Before presenting (a slightly different) Quantum Kraft-McMillan Inequality, we will first introduce the length observable and quantum codes with length eigenstates. The length observable acting on is given by
where is the orthogonal projection onto the subspace of .
We say that a quantum code has length eigenstates if has the form
for some orthonormal basis of and some sequence such that, for each , for some .
Note that the ’s are orthogonal due to being a linear isometry. It is easy to see that every c-q scheme is a quantum code with length eigenstates.
Lastly, for each , we will refer to the elements of the set as the length eigenstates of , and we will refer to , where, for each , , as the length eigenvalues of .
The quantum versions of the Kraft-McMillan Inequality proved in [32, Section II.C] and [24, Theorem 3.6] are more general than the one proved in [9, Theorem 1], although the formalisms are quite different in all three. Our version of the quantum Kraft-McMillan Inequality, presented below, is a generalization of [9, Theorem 1], but is not quite in the full generality of [32, Section II.C] (in the forward direction) because we only consider uniquely decodable codes (as opposed to the more general notion of condensable codes considered in ). However, our version does have a converse statement, similar to the classical Kraft-McMillan Inequality, which is missing from the aforementioned quantum versions.
(Quantum Kraft-McMillan Inequality) Any uniquely decodable quantum code with length eigenstates over a binary alphabet must satisfy the inequality
Conversely, if is a linear isometry with length eigenstates satisfying the above inequality, then there exists a c-q scheme with the same number of length eigenstates for each .
For the forward direction we adapt the proof of [32, Subsection II.C.] to our formalism. Let be a uniquely decodable quantum code with length eigenstates of the form
and let be the length eigenvalues of . For each , let
be the collection of length strings consisting of -many codewords and let
be the number of length eigenstates of , for each . Then, by the unique decodability of , each element of has a unique representation as a string of codewords and the elements of are pairwise orthogonal, and hence we have
Set so that . Summing the above inequality over we obtain
Notice that the left-hand side of this inequality grows exponentially in , whereas the right-hand side grows only linearly. This forces the base of the exponential on the left-hand side to be at most 1. Hence we must have that
Notice that the inequality in Equation (13) is simply a restatement of the classical Kraft-McMillan inequality.
Conversely, suppose that is a linear isometry with length eigenstates satisfying the quantum Kraft-McMillan Inequality, and define , and as above. Then
and hence the classical Kraft-McMillan inequality is also valid. Thus, by the converse of the classical Kraft-McMillan Theorem, one can find a classical uniquely decodable code which has exactly -many codewords of length , for each . The c-q scheme constructed from this classical code has the desired properties. ∎
We would like to find a quantum code which minimizes the amount of resources required. Unfortunately there are numerous ways to define the length of a codeword for an indeterminate-length quantum code (e.g. base length , exponential length [9, Definition 6], etc.). Here, we follow [9, Definition 3] and define the length of a codeword , which is a normalized vector in given by for a unique symbol state , as the expectation with respect to the length observable in Equation (11). Explicitly, the length of a codeword will be given by a function , defined as follows:
where denotes the set of length eigenvalues of .
Again we follow  and, for any ensemble , we define the ensemble state of by
If is a quantum code on define the average codeword length with respect to the ensemble by
We denote by the optimal quantum code with length eigenstates for the ensemble if
where the second and third equality follow from Theorem 3.9, and the in the third equality denote the length eigenvalues of . The existence of follows from the existence of in Equation (8) by the backward direction of Theorem 3.9. The optimal average codeword length for the ensemble is given by
It is shown in [9, Theorem 2] that the optimal c-q scheme (and hence the optimal quantum code with length eigenstates, by the converse of Theorem 3.9) is given by the classical Huffman codes. The bounds on in terms of the von Neumann entropy follow immediately.
The minimum average codeword length for an ensemble is bounded as follows,
See [9, Theorem 3]. ∎
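As a sanity check on these bounds, the average codeword length Tr(Λ V ρ V†) can be evaluated directly for a small c-q scheme; the codebook, basis ordering, and ensemble below are our own illustrative choices:

```python
import numpy as np

basis = ["", "0", "1", "00", "01", "10", "11"]   # Fock basis up to length 2
Lam = np.diag([float(len(s)) for s in basis])    # length observable

# Hypothetical c-q code sending the three symbol states to |0>, |10>, |11>.
V = np.zeros((len(basis), 3))
for col, word in enumerate(["0", "10", "11"]):
    V[basis.index(word), col] = 1.0

rho = np.diag([0.5, 0.25, 0.25])                 # ensemble state (diagonal here)
avg_len = float(np.trace(Lam @ V @ rho @ V.T))   # Tr(Lambda V rho V^†)
```

The result, 1.5 qubits, matches the classical expected length for the dyadic weights (1/2, 1/4, 1/4); since the ensemble state is diagonal, its von Neumann entropy is also 1.5 bits, so the lower bound of the proposition is attained here.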
Next, we wish to consider the optimal average codeword length per symbol for a collection of ensembles , where and probabilities given by the pmf of a stochastic process . We will refer to such collections of ensembles as stochastic ensembles. Note that, by the definitions of a stochastic process, a stochastic ensemble must be compatible in the following sense:
for all . Notice that we allow for the possibility that preparations of the ensemble at each time be dependent upon previous preparations. If the preparations of the ensemble are independent and identically prepared copies of ; i.e. the stochastic process is made up of i.i.d. copies of a random variable , then and , where . For each , let
be the optimal average codeword length per symbol for the first symbols with respect to the ensemble , where is given by Equation (15). Notice that the optimal average codeword length per symbol is defined analogously to the classical case in Equation (9). Then, from Theorem 3.10, we have
In the following section, we will relate the above quantities to the dynamical entropy of a QMC.
4. Optimal Data Compression via Quantum Dynamical Entropy
4.1. An open quantum random walk associated with a stationary Markov ensemble
Consider a Markov process with values in for some and with pmf . Define the associated stochastic ensemble by setting , whose symbol states span , for each . We will refer to a stochastic ensemble governed by a Markov process as the Markov ensemble governed by . Whenever the Markov process is stationary we will refer to the Markov ensemble as being stationary. Recall that a Markov process is stationary if and only if there exists a transition matrix and an initial distribution such that the pmf of is given by , for each and , and is invariant with respect to ; i.e. . In this subsection and the following one we consider only stationary Markov ensembles. Let be a stationary Markov ensemble governed by a stationary Markov process having transition matrix and initial distribution . Setting , so that for each , the following sequence of ensemble states which represent this collection of ensembles is defined:
and for each with ,
Let and , and define the OQRW over by
where , and is any unitary operator on satisfying , for all . It is clear that for each (i.e. Equation (2) is satisfied), and hence is an OQRW.
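The normalization condition in Equation (2) and trace preservation can be checked numerically for a minimal OQRW. The two-site example below, with site-independent transition operators, is an assumed illustration rather than the walk constructed above:

```python
import numpy as np

def kraus_op(B, i, j, n):
    """Kraus operator B tensor |i><j| on (coin space C^2) ⊗ (position C^n)."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return np.kron(B, E)

# Assumed two-site example: from any site the walker jumps to site 0 or 1,
# with transition operators satisfying B0^†B0 + B1^†B1 = I.
B0 = np.sqrt(1.0 / 3.0) * np.eye(2)
B1 = np.sqrt(2.0 / 3.0) * np.eye(2)
n = 2
kraus = [kraus_op(B, i, j, n) for j in range(n) for i, B in enumerate([B0, B1])]

# Equation (2): the Kraus operators of an OQRW sum to the identity.
total = sum(K.conj().T @ K for K in kraus)

def oqrw_step(rho):
    """One step of the open quantum random walk, M(rho) = sum_k K rho K^†."""
    return sum(K @ rho @ K.conj().T for K in kraus)

rho0 = np.kron(np.diag([1.0, 0.0]), np.diag([1.0, 0.0]))  # walker at site 0
rho1 = oqrw_step(rho0)   # trace is preserved by the CP-TP map
```

After one step the position marginal of rho1 places probability 1/3 on site 0 and 2/3 on site 1, mirroring a classical random walk, while the coin degrees of freedom evolve through the operators B0 and B1.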
4.2. A quantum Markov chain representation of the above OQRW
For each we set
Consider the quantum dynamical system , with