Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes

08/26/2018 ∙ by Samuel Loomis, et al. ∙ University of California-Davis

Among the predictive hidden Markov models that describe a given stochastic process, the ϵ-machine is strongly minimal in that it minimizes every Rényi-based memory measure. Quantum models can be smaller still. In contrast with the ϵ-machine's unique role in the classical setting, however, among the class of processes described by pure-state hidden quantum Markov models, there are those for which there does not exist any strongly minimal model. Quantum memory optimization then depends on which memory measure best matches a given problem circumstance.







I Introduction

When studying classical stochastic processes, we often seek models and representations of the underlying system that allow us to simulate and predict future dynamics. If the process is memoryful, then models that generate it or predict its future actions must also have memory. Memory, however, comes at some resource cost; both in a practical sense—consider, for instance, the substantial resources required to generate predictions of weather and climate [1, 2]—and in a theoretical sense—seen in analyzing thermodynamic systems such as information engines [3]. It is therefore beneficial to seek out a process’ minimally resource-intensive implementations.

Predicting and simulating classical processes, and monitoring the memory required, led to a generalization of statistical mechanics called computational mechanics [4, 5, 6, 7]. To date, computational mechanics has focused on discrete stochastic processes. These are probability measures Pr over strings of symbols taking values in a finite alphabet 𝒜. The minimal information processing required to predict the sequence is represented by a type of hidden Markov model called the ϵ-machine. The statistical complexity C_μ—the memory rate for the ϵ-machine to simultaneously generate many copies of a process—is a key quantity and a proposed invariant for measuring the process' structural complexity.

When simulating classical processes, quantum systems can be constructed that have smaller memory requirements than the ϵ-machine [8, 9]. The q-machine is a particular implementation of quantum simulation that has shown advantage in memory rate over a wide range of processes; often the advantage is unbounded [10, 11, 12, 13]. For quantum models, the minimal memory rate has been determined in cases such as the Ising model [11] and the Perturbed Coin Process [14], where the q-machine attains the minimum rate. And so, though a given q-machine's memory can be readily calculated [15], in many cases the absolute minimum is not known.

Properly accounting for memory requires an appropriate formalism for resources themselves. The field of resource theory has recently emerged in quantum information theory as a toolkit for addressing resource consumption in the contexts of entanglement, thermodynamics, and numerous other quantum and classical resources [16]. Its fundamental challenge is to determine when one system, or resource, can be converted to another using a predetermined set of free operations.

Resource theory is closely allied with two other areas of mathematics, namely majorization and lattice theory. Figure 1 depicts their relationships.

Figure 1: Triumvirate of resource theory, majorization, and lattice theory.

On the one hand, majorization is a preorder relation on positive vectors (typically probability distributions) computed by evaluating a set of inequalities [17]. If the majorization relations hold between two vectors, then one can be converted to the other using a certain class of operations. Majorization is used in some resource theories to numerically test for convertibility between two resources [18, 19, 20].

Lattice theory, on the other hand, concerns partially ordered sets and their suprema and infima, if they exist [21]. Functions that quantify the practical uses of a resource are monotonic with respect to the partial orders induced by convertibility and majorization. Optimization of practical measures of memory is then related to the problem of finding the extrema of the lattice. Majorization and resource convertibility are both relations that generate lattice-like structures on the set of systems.

Here, we examine the memory costs of classical and quantum models of stochastic processes via majorization. Using lattice-theoretic intuition, we then define the concept of strong optimization, which occurs when a particular model simultaneously optimizes all measures of memory via its extremal position in the lattice. We show that among classical predictive models, the ϵ-machine is strongly minimal. Following this, we show that the ϵ-machine is strongly maximal with respect to a subset of quantum models, but that no strongly minimal quantum model exists in some circumstances. These results constitute initial steps toward a resource theory of memoryful information processing.

II Majorization and optimization

The majorization of positive vectors provides a qualitative description of how concentrated the quantity of a vector is over its components. For ease of comparison, consider vectors p = (p_1, …, p_n) whose components are nonnegative, p_i ≥ 0, and sum to some constant value, which we take to be unity: ∑_i p_i = 1. For our purposes, we interpret these vectors as probability distributions.

Our introduction to majorization here follows Ref. [17]. The historical definition of majorization is also the most intuitive, starting with the concept of a transfer operation.

[Transfer operation] A transfer operation on a vector p selects two indices i, j such that p_i > p_j and transforms the components in the following way: p_i ↦ p_i − ϵ and p_j ↦ p_j + ϵ, where 0 < ϵ ≤ p_i − p_j, while leaving all other components equal: p_k ↦ p_k for k ∉ {i, j}.

Intuitively, these operations reduce concentration, since they act to equalize the disparity between two components, in such a way as to not create greater disparity in the opposite direction. This is the principle of transfers.

Suppose now that we have two vectors p and q and that there exists a sequence of transfer operations transforming q into p. We then say that q majorizes p, denoted q ≻ p. The relation ≻ defines a preorder on the set of distributions, as it is reflexive and transitive but not necessarily antisymmetric.
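To make the definition concrete, here is a minimal Python sketch of a transfer operation; the distributions and ϵ values are hypothetical, chosen only to exhibit a two-step sequence of transfers:

```python
def transfer(p, i, j, eps):
    """Move probability eps from the larger component p[i] to the
    smaller component p[j], without overshooting: 0 < eps <= p[i] - p[j]."""
    assert p[i] > p[j] and 0 < eps <= p[i] - p[j]
    q = list(p)
    q[i] -= eps
    q[j] += eps
    return q

# q majorizes p when some sequence of transfers turns q into p.
q = [0.7, 0.2, 0.1]
step1 = transfer(q, 0, 2, 0.2)   # equalizing transfer: [0.5, 0.2, 0.3]
p = transfer(step1, 0, 1, 0.1)   # a second transfer:   [0.4, 0.3, 0.3]
```

Each transfer conserves the total and never reverses the disparity it acts on, in keeping with the principle of transfers.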

There are, in fact, a number of equivalent criteria for majorization. We list three relevant to our development in the following composite theorem.

[Majorization Criteria] Given two vectors q and p with the same total sum, let their orderings be given by the permuted vectors q↓ and p↓ such that q↓_1 ≥ q↓_2 ≥ ⋯ ≥ q↓_n, and the same for p↓. Then the following statements are equivalent:

  1. Hardy-Littlewood-Pólya: For every k, ∑_{i=1}^{k} p↓_i ≤ ∑_{i=1}^{k} q↓_i;

  2. Principle of transfers: q can be transformed to p via a sequence of transfer operations;

  3. Schur-Horn: There exists a unitary matrix U such that p = Λq, where Λ_ij = |U_ij|², a uni-stochastic matrix.

The Hardy-Littlewood-Pólya criterion provides a visual representation of majorization in the form of the Lorenz curve. For a distribution p, the Lorenz curve is simply the function β_p(k) = ∑_{i=1}^{k} p↓_i. See Fig. 2. We can see that q ≻ p so long as the area under p's Lorenz curve is completely contained in the area under q's.

Figure 2: q and p are comparable and the first majorizes the second: q ≻ p. (Tick marks indicate kinks in the Lorenz curves.)
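The Hardy-Littlewood-Pólya criterion translates directly into a numerical test. A minimal sketch, assuming unit-sum lists of floats (distributions of different lengths are zero-padded, matching the convention adopted at the end of this section); the example vectors are hypothetical:

```python
def lorenz(p):
    """Lorenz curve of p sampled at integer points: cumulative sums
    of the components sorted in decreasing order."""
    s = sorted(p, reverse=True)
    curve, total = [], 0.0
    for x in s:
        total += x
        curve.append(total)
    return curve

def majorizes(q, p, tol=1e-12):
    """Hardy-Littlewood-Polya test: q majorizes p iff q's Lorenz curve
    lies on or above p's at every point (equal totals assumed)."""
    Lq, Lp = lorenz(q), lorenz(p)
    n = max(len(Lq), len(Lp))
    Lq += [Lq[-1]] * (n - len(Lq))   # pad with zero-probability events
    Lp += [Lp[-1]] * (n - len(Lp))
    return all(a >= b - tol for a, b in zip(Lq, Lp))

assert majorizes([0.7, 0.2, 0.1], [0.4, 0.3, 0.3])
# An incomparable pair: the Lorenz curves cross, so neither majorizes.
a, b = [0.6, 0.15, 0.15, 0.1], [0.5, 0.35, 0.1, 0.05]
assert not majorizes(a, b) and not majorizes(b, a)
```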

The Lorenz curve can be understood via a social analogy, by examining rhetoric of the form “The top x% of the population owns y% of the wealth”. Letting y be a function of x in this statement, we have the Lorenz curve of a wealth distribution. (Majorization, in fact, has its origins in the study of income inequality.)

If neither of p and q majorizes the other, they are incomparable. (See Fig. 3.)

As noted, majorization is a preorder, since there may exist distinct p and q such that p ≻ q and q ≻ p. This defines an equivalence relation p ∼ q between distributions. Every preorder can be converted into a partial order by considering the equivalence classes [p].

Figure 3: p and q are incomparable: their Lorenz curves cross.

If majorization, in fact, captures important physical properties of the distributions, we should expect that these properties may be quantified. The class of monotones that quantifies the preorder of majorization comprises the Schur-convex and Schur-concave functions.

[Schur-convex (-concave) functions] A function f is called Schur-convex (-concave) if p ≻ q implies f(p) ≥ f(q) (f(p) ≤ f(q)).

An important class of Schur-concave functions consists of the Rényi entropies: H_α(p) = (1/(1−α)) log₂ ∑_i p_i^α. In particular, the three limits H_1 (α → 1), H_0 (α → 0), and H_∞ (α → ∞)—the Shannon entropy, topological entropy, and min-entropy, respectively—describe important practical features of a distribution. In order, they describe the asymptotic rate at which the outcomes can be accurately conveyed, the single-shot resource requirements for the same task, and the probability of error in guessing the outcome if no information is conveyed at all (or, alternatively, the single-shot rate at which randomness can be extracted from the distribution) [22, 23]. As such, they play a significant role in communication and memory storage.

The example of two incomparable distributions p and q can be analyzed in terms of the Rényi entropies if we plot H_α(p) and H_α(q) as functions of α, as in Fig. 4.

Figure 4: Rényi entropies of the two incomparable distributions p and q from Fig. 3.

The central question we explore in the following is when majorization allows one to simultaneously optimize all entropy monotones and, alternatively, when each monotone must be optimized separately. This leads to defining strong maxima and strong minima.

[Strong maximum (minimum)] Let B be a set of probability distributions. If a distribution p ∈ B satisfies q ≻ p (q ≺ p) for all q ∈ B, then p is a strong maximum (minimum) of the set B.

The extrema names derive from the fact that the strong maximum maximizes the Rényi entropies and the strong minimum minimizes them. One can extend the definitions to the case where p ∉ B but p is the least upper bound, in the sense that any other r satisfying q ≻ r for all q ∈ B must obey p ≻ r. This case would be called a strong supremum (or, in the other direction, a strong infimum). These constructions may not be unique, as ≻ is a preorder and not a partial order; if we sort by equivalence class, however, the strongly maximal (minimal) class is unique if it exists.

In lattice-theoretic terms, the strong maximum is essentially the lattice-theoretic notion of a meet and the strong minimum is a join [21].

One example of strong minimization is found in quantum mechanics. Let ρ be a state and B_ρ a maximal projective measurement that diagonalizes it. For a given measurement B, let Pr_B be the probability distribution obtained by measuring ρ with B. Then Pr_{B_ρ} ≻ Pr_B for all maximal projective measurements B. (This follows from the unitary matrices that transform from the basis of B_ρ to that of B, and the Schur-Horn lemma.)
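This strong minimality is easy to verify numerically for a qubit. The sketch below assumes a hypothetical state with eigenvalues (0.9, 0.1); by the Born rule, a projective measurement rotated by θ from the eigenbasis yields a distribution related to the eigenvalues by a uni-stochastic matrix, and so majorized by them:

```python
from math import cos, sin, pi

lam = [0.9, 0.1]   # eigenvalues of a hypothetical qubit state rho

def measurement_dist(theta):
    """Outcome distribution for the maximal projective measurement whose
    basis is rotated by theta from rho's eigenbasis:
    Pr(k) = sum_i |U_ki|^2 lam_i, with |U|^2 uni-stochastic."""
    c2, s2 = cos(theta) ** 2, sin(theta) ** 2
    return [c2 * lam[0] + s2 * lam[1], s2 * lam[0] + c2 * lam[1]]

def majorizes(q, p, tol=1e-12):
    """Hardy-Littlewood-Polya partial-sum test for equal-length vectors."""
    Lq, Lp = sorted(q, reverse=True), sorted(p, reverse=True)
    tq = tp = 0.0
    for x, y in zip(Lq, Lp):
        tq += x
        tp += y
        if tq < tp - tol:
            return False
    return True

# The eigenvalue distribution majorizes every measured distribution.
for theta in (0.1, pi / 6, pi / 4, 1.0):
    assert majorizes(lam, measurement_dist(theta))
```

At θ = π/4 the measured distribution is the flat (1/2, 1/2), the unique strong maximum of this family.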

Another, recent example is found in Ref. [24], which considers the set B_ϵ(p) of all distributions ϵ-close to p under the total variation distance. This set has a strong minimum, called the steepest distribution, and a strong maximum, called the flattest distribution.

When a strong minimum or maximum does not exist, we refer to the individual extrema of the various monotones as weak extrema.

We close with a technical note on how to compare distributions over different numbers of events. There are generally two standards for such comparisons, depending on application. In the resource theory of informational nonequilibrium [20], one compares distributions over different numbers of events by “squashing” their Lorenz curves so that the x-axis ranges from 0 to 1. Under this comparison, extending a distribution with zero-probability events increases its informational nonequilibrium. In the following, however, we adopt the standard of simply extending the smaller distribution by adding events of zero probability; a distribution and its zero-padded extension are then considered equivalent. This choice is driven by our interest in the Rényi entropic costs and not in the overall nonequilibrium. (The latter is more naturally measured by the Rényi negentropies log₂ n − H_α(p), where n is the number of events.)
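The padding convention can be sketched as follows. Zero-padding leaves every Rényi entropy unchanged, while the negentropy log₂ n − H_α(p), relevant to informational nonequilibrium, does change:

```python
from math import log2

def pad(p, n):
    """Extend p with zero-probability events up to length n."""
    return list(p) + [0.0] * (n - len(p))

def renyi(p, alpha):
    p = [x for x in p if x > 0]   # zero-probability events drop out here
    if alpha == 1:
        return -sum(x * log2(x) for x in p)
    return log2(sum(x ** alpha for x in p)) / (1 - alpha)

p = [0.5, 0.5]
padded = pad(p, 3)   # [0.5, 0.5, 0.0], equivalent to p in our convention
for alpha in (0.5, 1, 2):
    assert abs(renyi(p, alpha) - renyi(padded, alpha)) < 1e-12

# The Renyi negentropy log2(n) - H_alpha does grow under padding.
assert log2(3) - renyi(padded, 1) > log2(2) - renyi(p, 1)
```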

III Strong minimality of the ϵ-machine

The general task we set ourselves is simulating classical processes.

[Bi-infinite process] A bi-infinite process over an alphabet 𝒜 is a probability measure Pr over the set of all bi-infinite strings … x_{−2} x_{−1} x_0 x_1 x_2 …, where the past and the future are constructed by concatenating elements of 𝒜.

Though defined over bi-infinite strings, the measure gives probabilities Pr(w) for seeing finite-length words w = x_0 ⋯ x_{ℓ−1}. This can be taken as an alternate definition of the process measure.

Here, we focus on finite predictive models.

[Finite predictive model] A finite predictive model is a triplet (𝒮, 𝒜, {T^(x)}) of hidden states 𝒮, an output alphabet 𝒜, and nonnegative transition matrices T^(x) with components T^(x)_{ss'} = Pr(x, s' | s) for s, s' ∈ 𝒮 and x ∈ 𝒜, satisfying the properties:

  1. Irreducibility: T = ∑_x T^(x) is stochastic and irreducible.

  2. Unifilarity: T^(x)_{ss'} = Pr(x | s) δ_{s', f(s,x)} for some conditional probability Pr(x | s) and deterministic function f: 𝒮 × 𝒜 → 𝒮.

A finite predictive model is a type of hidden Markov model [25], whose dynamic is to transition between states at each timestep while emitting a symbol x with probabilities determined by the transition matrices T^(x). Unifilarity ensures that, given the model state s and symbol x, the next state f(s, x) is unique.

Given a finite predictive model (𝒮, 𝒜, {T^(x)}), the state transition matrix T has a single left eigenvector π of eigenvalue 1, by the Perron-Frobenius theorem, satisfying πT = π. We call this state distribution the stationary state. Using it, we define the process generated by the model as Pr(w) = π T^(w) 1, where T^(w) = T^(x_0) ⋯ T^(x_{ℓ−1}) for w = x_0 ⋯ x_{ℓ−1} and 1 is the vector with all 1's for its components. This describes a stationary process. If we let δ_s represent the state distribution that assigns the state s probability 1, then δ_s T^(w) 1 is the probability of seeing word w after starting in state s.

Given a model M with stationary distribution π, we define the model's Rényi memory as H_α(π). This includes the topological memory H_0(π), the statistical memory H_1(π), and the min-memory H_∞(π). Given a process P, we define the Rényi complexity C_α(P) as the minimal memory over all models that generate P [4]. These include the topological complexity C_0, the statistical complexity C_μ = C_1, and the min-complexity C_∞.
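These definitions can be sketched numerically. The two-state unifilar model below is hypothetical (not one of the paper's examples); its stationary distribution is the left eigenvector of the summed transition matrix, found here by power iteration:

```python
from math import log2

# Hypothetical 2-state unifilar model over alphabet {0, 1}:
# T^{(x)}[s][s'] = Pr(x, s' | s).
T0 = [[0.0, 0.5],   # from state A: emit 0, go to B
      [0.0, 0.8]]   # from state B: emit 0, stay in B
T1 = [[0.5, 0.0],   # from state A: emit 1, stay in A
      [0.2, 0.0]]   # from state B: emit 1, go to A

def stationary(Ts, iters=200):
    """Left eigenvector of T = sum_x T^{(x)} at eigenvalue 1, found by
    power iteration from the uniform distribution (Perron-Frobenius)."""
    n = len(Ts[0])
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[s] * sum(T[s][t] for T in Ts) for s in range(n))
                for t in range(n)]
    return dist

def shannon(p):
    return -sum(x * log2(x) for x in p if x > 0)

pi = stationary([T0, T1])   # converges to (2/7, 5/7) for this model
mem = shannon(pi)           # the model's statistical memory H_1(pi)
assert abs(sum(pi) - 1.0) < 1e-9
```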

Among the class of finite predictive models, a particularly distinguished member is the ϵ-machine [4]: [Generator ϵ-machine] A generator ϵ-machine is a finite predictive model such that for each pair of distinct states s ≠ s', there exists a word w such that Pr(w | s) ≠ Pr(w | s'). In other words, a generator ϵ-machine must be irreducible, unifilar, and its states must be probabilistically distinct, so that no pair of distinct states predicts the same future.

An important result of computational mechanics is that the generator ϵ-machine is unique with respect to the process it generates [26]. This is a combined consequence of the equivalence of the generator definition with another, called the history ϵ-machine, which is provably unique [6]. That is, given an ϵ-machine, there is no other ϵ-machine that generates the same process. A further important result is that the ϵ-machine minimizes both the statistical complexity C_μ and the topological complexity C_0.

To fix intuitions, consider now several examples of models and their processes. First, consider the Biased Coin Process, a memoryless process in which, at each time step, a coin is flipped with probability p of generating one symbol and probability 1 − p of generating the other. Figure 5 displays three models for it. Model (a) is the process' ϵ-machine, and models (b) and (c) are alternative two-state finite predictive models. Notice that in both models (b) and (c), the two states generate equivalent futures.


Figure 5: (a) ϵ-machine for a coin flipped with bias p. (b) Alternate two-state representation of the same process. (c) Alternate representation with biases to stay in the current state or to switch states.

Continuing, Fig. 6 displays two alternative models of the Even-Odd Process. This process produces sequences formed by concatenating strings of an odd number of one symbol to strings of an even number of the other. We see in (a) the process' ϵ-machine. In (b), we see an alternative finite predictive model; two of its states predict the same futures and so are not probabilistically distinct. Both play the role of a single ϵ-machine state, in terms of the futures they predict.


Figure 6: (a) ϵ-machine for the Even-Odd Process. (b) Refinement of the Even-Odd Process ϵ-machine, where one of the ϵ-machine's states has been split into two.

We can compare these examples using Lorenz curves of the state distributions, as shown in Fig. 7. Here, recall, we adopted the convention of comparing two distributions over different numbers of states by extending the smaller system to include zero-probability states. We notice that the ϵ-machine's state distribution always majorizes the state distribution of the alternative machines.


Figure 7: (a) Lorenz curves for Fig. 5 (a)'s ϵ-machine and Fig. 5 (b)'s alternative predictor of the Biased Coin Process. (b) Same comparison for the Even-Odd Process ϵ-machine of Fig. 6 (a) and the alternative predictor of Fig. 6 (b).

The key to formalizing this observation is the following lemma.

[State Merging] Let M be a finite predictive model that is not an ϵ-machine. Then the machine created by merging its probabilistically equivalent states is the ϵ-machine of the process generated by M.

Let s ∼ s' be the equivalence relation that holds if Pr(w | s) = Pr(w | s') for all w. Let 𝒮/∼ consist of the set of equivalence classes generated by this relation. For a given class c, consider the transition probabilities Pr(x | s) associated with each s ∈ c. For each x such that Pr(x | s) > 0, there is an outcome state f(s, x). Comparing with another state s' in the same class c, we have the outcome states f(s', x). For the future predictions of both states s and s' to be equivalent, they must also be equivalent after seeing the symbol x. That is, Pr(xw | s) = Pr(xw | s') for all w also implies Pr(w | f(s, x)) = Pr(w | f(s', x)) for all w. But Pr(xw | s) = Pr(x | s) Pr(w | f(s, x)), and so we have f(s, x) ∼ f(s', x) for all x.

The upshot of these considerations is that we can define a consistent and unifilar transition dynamic on 𝒮/∼, given by the matrices T^(x)_{cc'} = Pr(x | s) δ_{c', [f(s,x)]} for any s ∈ c and c' ∈ 𝒮/∼. It inherits unifilarity from the original model, as well as irreducibility. It has probabilistically distinct states, because we have already merged all of the probabilistically equivalent states. Therefore, the resulting machine is the ϵ-machine of the process generated by M.

The state-merging procedure here is an adaptation of the Hopcroft algorithm for minimization of deterministic (nonprobabilistic) finite automata, which is itself an implementation of the Nerode equivalence relation [27]. It has been applied previously to analyze synchronization in ϵ-machines [28].
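A sketch of the merging step as Moore-style partition refinement, applied to a hypothetical redundant two-state model of a biased coin in which both states predict identical futures (so they collapse into a single ϵ-machine state); state labels, probabilities, and data layout are all illustrative choices:

```python
# Hypothetical redundant model of a coin with bias 0.6: from either
# state, emit "0" w.p. 0.6 (go to state 0) or "1" w.p. 0.4 (go to state 1).
states = [0, 1]
alphabet = ["0", "1"]
emit = {(0, "0"): 0.6, (0, "1"): 0.4, (1, "0"): 0.6, (1, "1"): 0.4}
succ = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}

def merge_states(states, alphabet, emit, succ):
    """Partition refinement: start with one block and split whenever two
    states disagree on symbol probabilities or on successor blocks."""
    block = {s: 0 for s in states}
    while True:
        sig = {s: tuple((x, round(emit.get((s, x), 0.0), 12),
                         block.get(succ.get((s, x))))
                        for x in alphabet)
               for s in states}
        labels = sorted(set(sig.values()), key=repr)
        new = {s: labels.index(sig[s]) for s in states}
        if new == block:       # refinement has stabilized
            return block
        block = new

classes = merge_states(states, alphabet, emit, succ)
assert classes[0] == classes[1]   # both states merge into one class
```

The equivalence classes returned here are the states of the merged machine; for models with genuinely distinct futures, the refinement loop splits blocks until the classes are probabilistically distinct.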

Using Lemma 7, we can prove the main result of this section:

[Strong Minimality of the ϵ-machine] Let M_ϵ be the ϵ-machine of process P and M be any other finite predictive model generating P. Let the stationary distributions be π_ϵ and π, respectively. Then π_ϵ ≻ π.

By Lemma 7, the states of the ϵ-machine are formed by merging equivalence classes of states of the finite predictive model M. Since the machines are otherwise equivalent, the stationary probability of a class c is simply the sum of the stationary probabilities of each s ∈ c: π_ϵ(c) = ∑_{s ∈ c} π(s). One can then construct π from π_ϵ by a series of transfer operations in which probability is shifted out of the class state c into the new states s ∈ c. Since the two distributions are related by a series of transfer operations, π_ϵ ≻ π.

It immediately follows from this that the ϵ-machine minimizes not only the statistical complexity C_μ and the topological complexity C_0 but every other Rényi complexity C_α as well.

The uniqueness of the ϵ-machine is extremely important in formulating this result. This property follows from understanding predictive models as partitions of the past, with the ϵ-machine corresponding to the coarsest graining of these predictive partitions [6]. Other paradigms for modeling will not necessarily have this underlying structure, and so may not have strongly minimal solutions. In the following, we see that this is, in fact, the case for pure-state quantum machines.

IV Strong quantum advantage

A pure-state quantum model can be generalized from the classical case by replacing the classical states s with quantum-mechanical pure states |η_s⟩ and the symbol-labeled transition matrices T^(x) with symbol-labeled Kraus operators K^(x).

[Pure-state quantum model] A pure-state quantum model is a quintuplet (ℋ, 𝒜, 𝒮, {|η_s⟩}, {K^(x)}) of a Hilbert space ℋ, an output alphabet 𝒜, pure states |η_s⟩ ∈ ℋ corresponding to some set of state labels s ∈ 𝒮, and Kraus operators K^(x) with x ∈ 𝒜, satisfying the properties:

  1. Completeness: The Kraus operators satisfy ∑_x K^(x)† K^(x) = I.

  2. Unifilarity: K^(x) |η_s⟩ ∝ |η_{f(s,x)}⟩ for some deterministic function f: 𝒮 × 𝒜 → 𝒮.

This is a particular kind of hidden quantum Markov model [29] in which we assume the dynamics can be described by the evolution of pure states. This is practically analogous to the assumption of unifilarity in the classical predictive setting.

It is not necessarily the case that the states |η_s⟩ form an orthonormal basis; rather, nonorthogonality is the intended advantage [8, 9]. Overlap between the states allows for a smaller von Neumann entropy for the stationary state of the process. We formalize this shortly.
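The effect of overlap can be checked in the simplest case: a uniform mixture of two pure qubit states with real overlap c has eigenvalues (1 ± c)/2, so any nonzero overlap pushes the von Neumann entropy below the classical 1 bit. A sketch (the states and weights are hypothetical):

```python
from math import log2, sqrt

def mixture_entropy(c):
    """Von Neumann entropy of rho = (|a><a| + |b><b|) / 2 for two pure
    qubit states with real overlap <a|b> = c; eigenvalues are (1 +/- c)/2."""
    lam = [(1 + c) / 2, (1 - c) / 2]
    return -sum(x * log2(x) for x in lam if x > 0)

# Orthogonal states (c = 0) reproduce the classical 1 bit of memory;
# any nonzero overlap strictly reduces it.
assert abs(mixture_entropy(0.0) - 1.0) < 1e-12
for c in (0.2, 0.5, 0.9):
    assert mixture_entropy(c) < 1.0

# Cross-check the eigenvalue formula for c = 0.5 with |a> = |0> and
# |b> = c|0> + sqrt(1 - c^2)|1>, via the 2x2 trace/determinant formula.
c = 0.5
a, b = (1.0, 0.0), (c, sqrt(1 - c * c))
rho = [[(a[i] * a[j] + b[i] * b[j]) / 2 for j in range(2)] for i in range(2)]
tr = rho[0][0] + rho[1][1]
det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
disc = sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]   # (0.75, 0.25) = ((1+c)/2, (1-c)/2)
assert abs(eigs[0] - 0.75) < 1e-9 and abs(eigs[1] - 0.25) < 1e-9
```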

It is assumed that the Kraus operators have a unique stationary state ρ_π. One way to compute it is to note that taking Pr(x | s) = ⟨η_s| K^(x)† K^(x) |η_s⟩ and the function f determines a finite predictive model as defined above. The model's stationary state π is related to the stationary state of the quantum model via ρ_π = ∑_s π_s |η_s⟩⟨η_s|.

The process generated by a pure-state quantum model has the word distribution Pr(w) = Tr(K^(w) ρ_π K^(w)†) for words w, where K^(w) is the ordered product of the Kraus operators of the symbols of w.

The eigenvalues of the stationary state ρ_π form a distribution λ. The Rényi entropies of this distribution define the von Neumann-Rényi entropies of the state: S_α(ρ_π) = H_α(λ).

We noted previously that for a given state these are strongly minimal over the entropies of all projective, maximal measurements on the state. Given a model with stationary state ρ_π, we may simply write S_α(ρ_π) as the Rényi memory of the model. Important limits, as before, are the topological memory S_0, the statistical memory S_1, and the min-memory S_∞, which represent physical limitations on memory storage for the generator.

To properly compare pure-state quantum models and classical predictive models, we define the classical equivalent model of a pure-state quantum model.

[Classical equivalent model] Let (ℋ, 𝒜, 𝒮, {|η_s⟩}, {K^(x)}) be a pure-state quantum model with probabilities Pr(x | s) and deterministic function f. Its classical equivalent is the classical finite predictive model with state set 𝒮, alphabet 𝒜, and symbol-labeled transition matrices T^(x)_{ss'} = Pr(x | s) δ_{s', f(s,x)} generated by the state-to-symbol probabilities and the deterministic function f.

We now prove that a finite classical predictive model strongly maximizes all pure-state quantum models of which it is the classical equivalent.

[Strong quantum advantage] Let (ℋ, 𝒜, 𝒮, {|η_s⟩}, {K^(x)}) be a pure-state quantum model with stationary state ρ_π, and let M be its classical equivalent model with stationary state π (with n = |𝒮|). Let d = dim ℋ. (We have d ≤ n: if not, then we can take the smaller Hilbert space that spans the states.) Let λ be an n-dimensional vector whose first d components are the eigenvalues of ρ_π and whose remaining elements are 0. Then λ ≻ π.

We know that ρ_π = ∑_s π_s |η_s⟩⟨η_s|, where the |η_s⟩ are generally nonorthogonal. However, we can also write ρ_π in its eigenbasis: ρ_π = ∑_i λ_i |i⟩⟨i|, where ⟨i|j⟩ = δ_ij. Then the two sets of vectors can be related via √π_s |η_s⟩ = ∑_i V_si √λ_i |i⟩, where V is a matrix comprised of rows of orthonormal d-dimensional vectors [30]. Now, we have π_s = ∑_i |V_si|² λ_i. Note that V is not square; but since we have taken λ_i = 0 for i > d, we can simply extend V into a square unitary matrix by filling out the bottom n − d rows with more orthonormal vectors. This leaves the equation unchanged. We can then write π = Λλ, where Λ_si = |V_si|² is uni-stochastic. Then by Theorem 1, λ ≻ π.

[Corollary] S_α(ρ_π) ≤ H_α(π) for all α.

This follows from the definitions of the von Neumann-Rényi entropies and the Schur-concavity of H_α.

Many alternative pure-state quantum models may describe the same process. The “first mark”, so to speak, for quantum models is the q-machine [9, 15], which directly embeds the dynamics of the ϵ-machine into a quantum system while already leveraging the memory advantage due to state overlap.

[q-Machine] Given an ϵ-machine (𝒮, 𝒜, {T^(x)}), where T^(x)_{ss'} = Pr(x | s) δ_{s', f(s,x)} for some deterministic function f, construct the corresponding q-machine in the following way:

  1. The states |η_s⟩ are built to satisfy a recursive relation in which each state is a superposition over its emitted symbols and successor states, weighted by the amplitudes √Pr(x | s).

  2. ℋ is the space spanned by the states |η_s⟩.

  3. The Kraus operators are determined by the relations K^(x) |η_s⟩ = √Pr(x | s) |η_{f(s,x)}⟩.

One can check that this satisfies the completeness relations and has the correct probability dynamics for the process generated by the ϵ-machine.

That the q-machine offers a statistical memory advantage with respect to the ϵ-machine was previously shown in Ref. [31], and a topological memory advantage in Ref. [14]. Theorem IV and Corollary IV imply these, as well as advantage with respect to every other Rényi measure of memory.

V Weak quantum minimality

An open problem is to determine the minimal quantum pure-state representation of a given classical process. This problem is solved in some specific instances, such as the Ising model [11] and the Perturbed Coin Process [14]; in these cases the minimum is known to be the q-machine. We denote the smallest value of the Rényi entropy of the stationary state over all pure-state models as C_α^q, called the quantum Rényi complexities, including the limits: the quantum topological complexity C_0^q, the quantum min-complexity C_∞^q, and the quantum statistical complexity C_q = C_1^q. If a strongly minimal quantum pure-state model exists, these complexities are all attained by the same pure-state model. One of our primary results in this section is that for some processes this does not occur.

We start by examining two examples. The first, the MBW Process introduced in Ref. [29], demonstrates a process whose q-machine is not minimal in the von Neumann (statistical) complexity. Consider the process generated by the 4-state MBW machine shown in Fig. 8.

Figure 8: The 4-state MBW Process as a Markov chain (which is also the process' ϵ-machine).

This process' HMM is simply a Markov chain, and its representation in Fig. 8 is its ϵ-machine. If we take the chain's states {|k⟩} as an orthonormal basis of a Hilbert space, we can construct the q-machine with the states |σ_j⟩ = ∑_k √T_jk |k⟩. Since it is a Markov chain, we can write the Kraus operators as K^(k) = |σ_k⟩⟨e_k|, where {⟨e_k|} is the basis dual to the signal states. This is a special case of the construction used in Ref. [13].

Let's examine the majorization between the q-machine and the Markov model via the Lorenz curves of λ, the eigenvalues of the q-machine's stationary state, and of π, the stationary state of the Markov chain. See Fig. 9.

Figure 9: Lorenz curves for the 4-state MBW ϵ-machine and the associated q-machine.

It turns out that there is a smaller quantum model, embedded in two dimensions, that gives the proper transition probabilities for the 4-state MBW model. Figure 10 compares the Lorenz curve of its stationary eigenvalues to that of the q-machine. One sees that the dimensionally smaller model does not majorize the q-machine, but it does have a lower statistical memory S_1. (On the other hand, the q-machine has a smaller min-memory S_∞.)

Figure 10: Lorenz curves for the 4-state MBW q-machine and a dimensionally smaller model.

Now consider something in the opposite direction: the 3-state MBW model displayed in Fig. 11, a version of the previous example with three states instead of four. We will compute the corresponding q-machine and show that there also exists a dimensionally smaller representation. In this case, however, the smaller representation is not smaller in its statistical memory.

The q-machine of this Markov chain is given by the states |σ_j⟩ = ∑_k √T_jk |k⟩, with Kraus operators defined similarly to before. We can examine the majorization between the q-machine and the Markov model by plotting the Lorenz curves of the eigenvalues of the q-machine's stationary state and of the stationary state of the Markov chain, shown in Fig. 12.

Figure 11: The 3-state MBW Process as a Markov chain (which is the process' ϵ-machine).
Figure 12: Lorenz curves for the 3-state MBW ϵ-machine and the associated q-machine.

The lower-dimensional model is given by states embedded in two dimensions that reproduce the proper transition probabilities for the 3-state MBW model. Figure 13 compares the Lorenz curve of its stationary eigenvalues to that of the q-machine. We see that it does not majorize the q-machine. And, this time, this is directly manifested by the fact that the smaller-dimension model has a larger statistical memory S_1.

Figure 13: Lorenz curves for the 3-state MBW q-machine and a dimensionally smaller model.

After seeing the ϵ-machine's strong minimality with respect to other classical models and its strong maximality with respect to quantum models, it is certainly tempting to conjecture that a strongly minimal quantum model exists. However, the examples just explored cast serious doubt: none of them is a strong minimum. One way to prove that no strong minimum exists for, say, the 3-state MBW Process is to show that there does not exist any other quantum model in two dimensions that generates the process. This would imply that no other model can majorize the two-dimensional model above; and, since that model is not strongly minimal, no strongly minimal solution can exist.

Appendix A proves exactly this—thus, demonstrating a counterexample to the strong minimality of quantum models.

[Weak Minimality for the 3-state MBW Process] The two-dimensional quantum model above weakly minimizes the topological complexity over all quantum generators of the 3-state MBW Process; consequently, the 3-state MBW Process has no strongly minimal model.

Figure 14: Proposed majorization saddle structure of model-space: the ϵ-machine is located at a saddle point with respect to majorization, where classical deviations (state splitting) move up the lattice and quantum deviations (utilizing state overlap) move down the lattice.

VI Concluding remarks

Majorizing states provides a means to compare a process’ alternative models in both the classical and quantum regimes. Majorization implies the simultaneous minimization of a large host of functions. As a result we showed that:

  1. The ϵ-machine's stationary distribution majorizes that of every classical predictive model of the same process, and so the ϵ-machine simultaneously minimizes many different measures of memory cost.

  2. The q-machine, and indeed any quantum realization of the ϵ-machine, always majorizes the ϵ-machine, and so simultaneously improves on all of these measures of memory cost.

  3. For at least one process, there does not exist any quantum pure-state model that majorizes all quantum pure-state models of that process. Thus, while an ϵ-machine may be improved upon by different possible quantum models, there is no single quantum model that is unambiguously the “best” choice.

Imagining the ϵ-machine as an invariant “saddle point” in the majorization structure of model-space, Fig. 14 depicts the implied geometry. That is, despite its nonminimality among all models, the ϵ-machine still occupies a topologically important position in model-space—one that is invariant to one's choice of memory measure. However, no similar model plays the topologically minimal role for quantum pure-state models.

The quantum statistical complexity C_q has been offered as an alternative quantum measure of structural complexity—a rival of the statistical complexity C_μ [32]. One implication of our results here is that the nature of this quantum minimum is fundamentally different from that of C_μ. This observation should help further explorations into the techniques required to compute C_q and the physical circumstances in which it is most relevant. That the physical meaning of C_q involves generating an asymptotically large number of realizations of a process may imply that it cannot be accurately computed by only considering machines that generate a single realization. This is in contrast to C_μ which, being strongly minimized by the ϵ-machine, must be attainable in the single-shot regime along with measures like C_0 and C_∞.

In this way, the quantum realm again appears ambiguous. Ambiguity in structural complexity has been previously observed in the sense that there exist pairs of processes whose ordering by the classical complexity C_μ is opposite their ordering by the quantum complexity C_q [33]. The classical and quantum paradigms for modeling can disagree on simplicity—there is no universal Ockham's Razor. How this result relates to strong versus weak optimization deserves further investigation.

The methods and results here should also be extended to analyze classical generative models, which in many ways resemble the quantum models in their functionality [34, 35, 36]. These drop the requirement of unifilarity, similar to how the quantum models relax the notion of orthogonality. Important questions to pursue in this vein are whether generative models are strongly maximized by the ϵ-machine and whether they have their own strong minimum or, like the quantum models, only weak minima in different contexts.

To close, we explored only finite-state, discrete-time processes. Processes with infinite memory [37] and continuous-time generation [38, 39] are also common in nature. Applying our results to understand these requires further mathematical development.


The authors thank Fabio Anza, John Mahoney, Cina Aghamohammadi, and Ryan James for helpful discussions. As a faculty member, JPC thanks the Santa Fe Institute and the Telluride Science Research Center for their hospitality during visits. This material is based upon work supported by, or in part by, John Templeton Foundation grant 52095, Foundational Questions Institute grant FQXi-RFP-1609, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract W911NF-13-1-0390 and grant W911NF-18-1-0028, and via Intel Corporation support of CSC as an Intel Parallel Computing Center.


  • [1] E. N. Lorenz. Deterministic nonperiodic flow. J. Atmos. Sci., 20:130, 1963.
  • [2] E. N. Lorenz. The problem of deducing the climate from the governing equations. Tellus, XVI:1, 1964.
  • [3] A. B. Boyd, D. Mandal, and J. P. Crutchfield. Identifying functional thermodynamics in autonomous Maxwellian ratchets. New J. Physics, 18:023049, 2016.
  • [4] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Let., 63:105–108, 1989.
  • [5] J. P. Crutchfield. The calculi of emergence: Computation, dynamics, and induction. Physica D, 75:11–54, 1994.
  • [6] C. R. Shalizi and J. P. Crutchfield. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys., 104:817–879, 2001.
  • [7] J. P. Crutchfield. Between order and chaos. Nature Physics, 8:17–24, 2012.
  • [8] M. Gu, K. Wiesner, E. Rieper, and V. Vedral. Quantum mechanics can reduce the complexity of classical models. Nature Comm., 3(762), 2012.
  • [9] J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Occam’s quantum strop: Synchronizing and compressing classical cryptic processes via a quantum channel. Scientific Reports, 6:20495, 2016.
  • [10] C. Aghamohammadi, J. R. Mahoney, and J. P. Crutchfield. Extreme quantum advantage when simulating classical systems with long-range interaction. Scientific Reports, 7(6735), 2017.
  • [11] W. Y. Suen, J. Thompson, A. J. P. Garner, V. Vedral, and M. Gu. The classical-quantum divergence of complexity in modelling spin chains. Quantum, 1:25, 2017.
  • [12] A. J. P. Garner, Q. Liu, J. Thompson, V. Vedral, and M. Gu. Provably unbounded memory advantage in stochastic simulation using quantum mechanics. New J. Physics, 19:103009, 2017.
  • [13] C. Aghamohammadi, S. P. Loomis, J. R. Mahoney, and J. P. Crutchfield. Extreme quantum memory advantage for rare-event sampling. Phys. Rev. X, 8:011025, 2018.
  • [14] J. Thompson, A. J. P. Garner, J. R. Mahoney, J. P. Crutchfield, V. Vedral, and M. Gu. Causal asymmetry in a quantum world. Phys. Rev. X, 8:031013, 2018.
  • [15] P. M. Riechers, J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Minimized state-complexity of quantum-encoded cryptic processes. Phys. Rev. A, 93(5):052317, 2016.
  • [16] B. Coecke, T. Fritz, and R. W. Spekkens. A mathematical theory of resources. Info. Comput., 250:59–86, 2016.
  • [17] A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: Theory of Majorization and Its Applications. Springer, New York, NY, 3 edition, 2011.
  • [18] M. A. Nielsen. Conditions for a class of entanglement transformations. Phys. Rev. Lett., 83(436), 1999.
  • [19] M. Horodecki and J. Oppenheim. Fundamental limitations for quantum and nanoscale thermodynamics. Nature Comm., 4(2059), 2013.
  • [20] G. Gour, M. P. Müller, V. Narasimhachar, R. W. Spekkens, and N. Y. Halpern. The resource theory of informational nonequilibrium in thermodynamics. Phys. Rep., 583:1–58, 2015.
  • [21] G. Grätzer. Lattice Theory: Foundation. Springer, Basel, 2010.
  • [22] R. Renner and S. Wolf. Smooth Rényi entropy and applications. In IEEE Information Theory Society, editor, 2004 IEEE Intl. Symp. Info. Th.: Proceedings, page 232, Piscataway, N.J., 2004. IEEE.
  • [23] M. Tomamichel. A Framework for Non-Asymptotic Quantum Information Theory. PhD thesis, ETH Zurich, Zurich, 2012.
  • [24] M. Horodecki, J. Oppenheim, and C. Sparaciari. Extremal distributions under approximate majorization. J. Phys. A: Math. Theor., 51(305301), 2018.
  • [25] D. R. Upper. Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley, 1997. Published by University Microfilms Intl, Ann Arbor, Michigan.
  • [26] N. F. Travers and J. P. Crutchfield. Equivalence of history and generator ϵ-machines. arXiv [math.PR].
  • [27] J. Hopcroft. An algorithm for minimizing states in a finite automaton. In Z. Kohavi and A. Paz, editors, Theory of Machines and Computations, pages 189–196, New York, 1971. Academic Press.
  • [28] N. F. Travers and J. P. Crutchfield. Exact synchronization for finite-state sources. J. Stat. Physics, 145:1181–1201, 2011.
  • [29] A. Monras, A. Beige, and K. Wiesner. Hidden quantum Markov models and non-adaptive read-out of many-body states. Appl. Math. Comput. Sci., 3:93, 2011.
  • [30] L. P. Hughston, R. Jozsa, and W. K. Wootters. A complete classification of quantum ensembles having a given density matrix. Phys. Lett. A, 183:12–18, 1993.
  • [31] J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Occam’s quantum strop: Synchronizing and compressing classical cryptic processes via a quantum channel. Scientific Reports, 6(20495), 2016.
  • [32] R. Tan, D. R. Terno, J. Thompson, V. Vedral, and M. Gu. Towards quantifying complexity with quantum mechanics. Eur. J. Phys. Plus, 129:191, 2014.
  • [33] C. Aghamohammadi, J. R. Mahoney, and J. P. Crutchfield. The ambiguity of simplicity in quantum and classical simulation. Phys. Lett. A, 381(14):1223–1227, 2017.
  • [34] W. Löhr and N. Ay. Non-sufficient memories that are sufficient for prediction. In J. Zhou, editor, Complex Sciences 2009, volume 4 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 265–276. Springer, New York, 2009.
  • [35] W. Löhr and N. Ay. On the generative nature of prediction. Adv. Complex Sys., 12(02):169–194, 2009.
  • [36] J. B. Ruebeck, R. G. James, J. R. Mahoney, and J. P. Crutchfield. Prediction and generation of binary Markov processes: Can a finite-state fox catch a Markov mouse? Chaos, 28(013109), 2018.
  • [37] J. P. Crutchfield and S. Marzen. Signatures of infinity: Nonergodicity and resource scaling in prediction, complexity and learning. Phys. Rev. E, 91(050106), 2015.
  • [38] J. P. Crutchfield and S. Marzen. Structure and randomness of continuous-time, discrete-event processes. J. Stat. Phys., 169(2):303–315, 2017.
  • [39] T. J. Elliot, A. J. P. Garner, and M. Gu. Quantum self-assembly of causal architecture for memory-efficient tracking of complex temporal and symbolic dynamics.

Appendix A: Weak Minimality of the 2D Quantum Model

Here, we prove that the 2D pure-state quantum representation of the 3-state MBW process is unique. We show this by considering the entire class of 2D models and applying the completeness constraint.
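The completeness constraint invoked here is the standard Kraus condition Σ_x K_x† K_x = I. A short numerical sketch of ours shows the check; the amplitude-damping-style operators below are a toy example, not the MBW model's:

```python
import numpy as np

def is_complete(kraus, tol=1e-10):
    """Check the Kraus completeness condition: sum_x K_x^dagger K_x = I."""
    d = kraus[0].shape[1]
    total = sum(K.conj().T @ K for K in kraus)
    return np.allclose(total, np.eye(d), atol=tol)

# Toy pair of Kraus operators on a qubit (illustrative parameters).
p = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - p)]])
K1 = np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])
print(is_complete([K0, K1]))  # True
```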

We note that a pure-state quantum model of the 3-state MBW process must have three states and three dual states such that:


We list the available geometric symmetries that leave the final stationary state unchanged:

  1. A phase transformation on each state;

  2. a phase transformation on each dual state; and

  3. a simultaneous unitary transformation of the states and dual states.

From these symmetries we can fix gauge in the following ways:

  1. Set to be real and positive for all .

  2. Set .

  3. Set and set to be real and positive.

These gauge fixings allow us to write:

for suitable real parameters and a phase.

That these states are embedded in a 2D Hilbert space means there must exist some linear consistency conditions. For some triple of numbers we can write:

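Since any three vectors in a two-dimensional Hilbert space are linearly dependent, the coefficient triple can be read off as the right null space of the 2×3 matrix whose columns are the states. A small numerical sketch of ours, using arbitrary example vectors rather than the MBW states:

```python
import numpy as np

# Columns: three example state vectors in a 2D Hilbert space (arbitrary choices).
eta = np.array([[1.0, 0.0, 1 / np.sqrt(2)],
                [0.0, 1.0, 1 / np.sqrt(2)]], dtype=complex)

# The right null space of this 2x3 matrix supplies the triple (a, b, c)
# with a|eta_1> + b|eta_2> + c|eta_3> = 0.
coeffs = np.linalg.svd(eta)[2][-1].conj()
print(np.allclose(eta @ coeffs, 0))  # True
```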
Up to a constant, we use our parameters to choose:

Consistency requires that this relationship between vectors be preserved by the Kraus operator dynamic. Consider the matrix . The vector must be a null vector of it; i.e. . This first requires that the matrix be degenerate. One way to enforce this is to check that the characteristic polynomial has an overall factor of . For simplicity, we compute the characteristic polynomial of :

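The degeneracy requirement, that the characteristic polynomial carry an overall factor of λ (equivalently, a zero eigenvalue, i.e. vanishing determinant), can be verified numerically. A sketch of ours on a generic singular 3×3 matrix, not the specific operator above:

```python
import numpy as np

def has_zero_eigenvalue(M, tol=1e-10):
    """The characteristic polynomial of M has an overall factor of lambda
    exactly when det(M) = 0, i.e. when M is degenerate (singular)."""
    return abs(np.linalg.det(M)) < tol

# Degenerate example: the third row is the sum of the first two.
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
print(has_zero_eigenvalue(M))  # True

# Its right null vector (the sought coefficient vector), via SVD:
null_vec = np.linalg.svd(M)[2][-1]
print(np.allclose(M @ null_vec, 0))  # True
```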
To have an overall factor of , we need:

Typically, there will be several ways to choose phases that cancel out terms, but in this case, since the sum of the magnitudes of the complex terms is 8, the only way to cancel is at the extreme point where , , and , giving:

Recapitulating the results so far, the model has the form:

We now need to enforce that . We have the three equations: