I Introduction
When studying classical stochastic processes, we often seek models and representations of the underlying system that allow us to simulate and predict future dynamics. If the process is memoryful, then models that generate it or predict its future actions must also have memory. Memory, however, comes at some resource cost; both in a practical sense—consider, for instance, the substantial resources required to generate predictions of weather and climate [1, 2]—and in a theoretical sense—seen in analyzing thermodynamic systems such as information engines [3]. It is therefore beneficial to seek out a process' minimally resource-intensive implementations.
Predicting and simulating classical processes, and monitoring the memory required, led to a generalization of statistical mechanics called computational mechanics [4, 5, 6, 7]. To date, computational mechanics focused on discrete stochastic processes. These are probability measures over strings of symbols taking values in a finite alphabet $\mathcal{A}$. The minimal information processing required to predict the sequence is represented by a type of hidden Markov model called the $\varepsilon$-machine. The statistical complexity $C_\mu$—the memory rate for the $\varepsilon$-machine to simultaneously generate many copies of a process—is a key quantity and a proposed invariant for measuring the process' structural complexity.

When simulating classical processes, quantum systems can be constructed that have smaller memory requirements than the $\varepsilon$-machine [8, 9]. The q-machine is a particular implementation of quantum simulation that has shown advantage in memory rate over a wide range of processes; often the advantage is unbounded [10, 11, 12, 13]. For quantum models, the minimal memory rate has been determined in cases such as the Ising model [11] and the Perturbed Coin Process [14], where the q-machine attains the minimum rate. And so, though a given q-machine's memory rate can be readily calculated [15], in many cases the absolute minimum is not known.
Properly accounting for memory requires an appropriate formalism for resources themselves. The field of resource theory has recently emerged in quantum information theory as a toolkit for addressing resource consumption in the contexts of entanglement, thermodynamics, and numerous other quantum and classical resources [16]. Its fundamental challenge is to determine when one system, or resource, can be converted to another using a predetermined set of free operations.
Resource theory is closely allied with two other areas of mathematics, namely majorization and lattice theory. Figure 1 depicts their relationships.
On the one hand, majorization is a preorder relation on positive vectors (typically probability distributions) computed by evaluating a set of inequalities [17]. If the majorization relations hold between two vectors, then one can be converted to the other using a certain class of operations. Majorization is used in some resource theories to numerically test for convertibility between two resources [18, 19, 20].

Lattice theory, on the other hand, concerns partially ordered sets and their suprema and infima, if they exist [21]. Functions that quantify the practical uses of a resource are monotonic with respect to the partial orders induced by convertibility and majorization. Optimization of practical measures of memory is then related to the problem of finding the extrema of the lattice. Majorization and resource convertibility are both relations that generate lattice-like structures on the set of systems.
Here, we examine the memory costs of classical and quantum models of stochastic processes via majorization. Using lattice-theoretic intuition, we then define the concept of strong optimization, which occurs when a particular model simultaneously optimizes all measures of memory via its extremal position in the lattice. We show that among classical predictive models, the $\varepsilon$-machine is strongly minimal. Following this, we show that the $\varepsilon$-machine is strongly maximal with respect to a subset of quantum models but that no strongly minimal quantum model exists in some circumstances. These results constitute initial steps toward a resource theory of memoryful information processing.
II Majorization and optimization
The majorization of positive vectors provides a qualitative description of how concentrated the quantity of a vector is over its components. For ease of comparison, consider vectors $p = (p_1, \ldots, p_n)$ whose components all sum to some constant value, which we take to be unity, $\sum_i p_i = 1$, and are nonnegative, $p_i \geq 0$. For our purposes, we interpret these vectors as probability distributions.
Our introduction to majorization here follows Ref. [17]. The historical definition of majorization is also the most intuitive, starting with the concept of a transfer operation.
[Transfer operation] A transfer operation on a vector $p$ selects two indices $i$, $j$ such that $p_i > p_j$ and transforms the components in the following way:
$$p_i \mapsto p_i - \epsilon~, \qquad p_j \mapsto p_j + \epsilon~,$$
where $0 \leq \epsilon \leq p_i - p_j$, while leaving all other components equal: $p_k \mapsto p_k$ for $k \neq i, j$.
Intuitively, these operations reduce concentration, since they act to equalize the disparity between two components, in such a way as to not create greater disparity in the opposite direction. This is the principle of transfers.
Suppose now that we have two vectors $p$ and $q$ and that there exists a sequence of transfer operations $T_1, \ldots, T_m$ such that $q = T_m \circ \cdots \circ T_1(p)$. We say that $p$ majorizes $q$, denoted $p \succ q$. The relation $\succ$ defines a preorder on the set of distributions, as it is reflexive and transitive but not necessarily antisymmetric.
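As a concrete sketch, a single transfer operation can be implemented in a few lines (an illustrative example, not from the original text; the function name and the sample vector are our own choices):

```python
import numpy as np

def transfer(p, i, j, eps):
    """One transfer operation: move eps of probability from the larger
    component p[i] to the smaller p[j], with 0 <= eps <= p[i] - p[j]."""
    assert p[i] >= p[j] and 0 <= eps <= p[i] - p[j]
    q = np.array(p, dtype=float)
    q[i] -= eps
    q[j] += eps
    return q

p = np.array([0.7, 0.2, 0.1])
q = transfer(p, 0, 2, 0.2)   # -> [0.5, 0.2, 0.3]
print(q, q.sum())            # total probability is preserved
```

Note that the bound on `eps` guarantees the disparity between the two chosen components shrinks without reversing sign, which is exactly the principle of transfers.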
There are, in fact, a number of equivalent criteria for majorization. We list three relevant to our development in the following composite theorem.
[Majorization criteria] Given two vectors $p$ and $q$ with the same total sum, let their orderings be given by the permuted vectors $p^\downarrow$ and $q^\downarrow$, such that $p^\downarrow_1 \geq p^\downarrow_2 \geq \cdots \geq p^\downarrow_n$ and the same for $q^\downarrow$. Then the following statements are equivalent:

Hardy–Littlewood–Pólya: For every $k$, $\sum_{i=1}^{k} p^\downarrow_i \geq \sum_{i=1}^{k} q^\downarrow_i$;

Doubly stochastic matrix: $q = Dp$ for some doubly stochastic matrix $D$, that is, a matrix with nonnegative entries whose rows and columns each sum to one;

Principle of transfers: $p$ can be transformed to $q$ via a sequence of transfer operations.
The Hardy–Littlewood–Pólya criterion provides a visual representation of majorization in the form of the Lorenz curve. For a distribution $p$, the Lorenz curve is simply the function $\beta_p(k) = \sum_{i=1}^{k} p^\downarrow_i$. See Fig. 2. We can see that $p \succ q$ so long as the area under $\beta_q$ is completely contained in the area under $\beta_p$.
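The Hardy–Littlewood–Pólya criterion translates directly into a numerical test. The following sketch (our own illustration; `lorenz_curve` and `majorizes` are assumed helper names) builds the Lorenz curve from the decreasing rearrangement and compares partial sums, padding with zero-probability events so vectors of different lengths can be compared:

```python
import numpy as np

def lorenz_curve(p):
    """Cumulative sums of the distribution sorted in decreasing order."""
    return np.cumsum(np.sort(p)[::-1])

def majorizes(p, q, tol=1e-12):
    """Hardy-Littlewood-Polya test: p majorizes q iff every partial sum
    of p's decreasing rearrangement dominates the corresponding one of q."""
    n = max(len(p), len(q))
    # pad with zero-probability events so both vectors have length n
    p = np.pad(np.asarray(p, float), (0, n - len(p)))
    q = np.pad(np.asarray(q, float), (0, n - len(q)))
    return bool(np.all(lorenz_curve(p) >= lorenz_curve(q) - tol))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.35, 0.25]
print(majorizes(p, q))  # True: p is more concentrated
print(majorizes(q, p))  # False
```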
The Lorenz curve can be understood via a social analogy, by examining rhetoric of the form "The top $x\%$ of the population owns $y\%$ of the wealth". Let $y$ be a function of $x$ in this statement, and we have the Lorenz curve of a wealth distribution. (Majorization, in fact, has its origins in the study of income inequality.)
If neither $p$ nor $q$ majorizes the other, they are incomparable. (See Fig. 3.)
As noted, majorization is a preorder, since there may exist distinct $p$ and $q$ such that $p \succ q$ and $q \succ p$. This defines an equivalence relation $p \sim q$ between distributions. Every preorder can be converted into a partial order by considering the equivalence classes $[p]$.
If majorization, in fact, captures important physical properties of distributions, we should expect that these properties can be quantified. The class of monotones that quantify the preorder of majorization are called Schur-convex and Schur-concave functions.

[Schur-convex (-concave) functions] A function $f$ is called Schur-convex (-concave) if $p \succ q$ implies $f(p) \geq f(q)$ ($f(p) \leq f(q)$).
An important class of Schur-concave functions consists of the Rényi entropies:
$$H_\alpha(p) = \frac{1}{1 - \alpha} \log_2 \sum_i p_i^\alpha~.$$
In particular, the three limits
$$H(p) = \lim_{\alpha \to 1} H_\alpha(p)~, \qquad H_0(p) = \lim_{\alpha \to 0} H_\alpha(p)~, \qquad H_\infty(p) = \lim_{\alpha \to \infty} H_\alpha(p)$$
—Shannon entropy, topological entropy, and min-entropy, respectively—describe important practical features of a distribution. In order, they describe the asymptotic rate at which the outcomes can be accurately conveyed, the single-shot resource requirements for the same task, and the probability of error in guessing the outcome if no information is conveyed at all (or, alternatively, the single-shot rate at which randomness can be extracted from the distribution) [22, 23]. As such, they play a significant role in communication and memory storage.
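A minimal sketch of the Rényi entropies and their three limits (illustrative code of our own; the branches for $\alpha = 0, 1, \infty$ implement the limiting formulas rather than the generic expression):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) in bits, with the standard limits
    alpha -> 0 (topological), 1 (Shannon), inf (min-entropy)."""
    p = np.asarray([x for x in p if x > 0], dtype=float)
    if alpha == 0:
        return np.log2(len(p))          # log of the support size
    if alpha == 1:
        return -np.sum(p * np.log2(p))  # Shannon entropy
    if alpha == np.inf:
        return -np.log2(np.max(p))      # min-entropy
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 0))       # log2(3) ~ 1.585
print(renyi_entropy(p, 1))       # 1.5
print(renyi_entropy(p, np.inf))  # 1.0
```

As expected, $H_\alpha$ is nonincreasing in $\alpha$ for any fixed distribution.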
The example of two incomparable distributions $p$ and $q$ can be analyzed in terms of the Rényi entropies if we plot $H_\alpha(p)$ and $H_\alpha(q)$ as functions of $\alpha$, as in Fig. 4.
The central question we explore in the following is when majorization makes it possible to simultaneously optimize all entropy monotones and, alternatively, when each monotone must be optimized separately. This leads to defining strong maxima and strong minima.
[Strong maximum (minimum)] Let $\mathcal{B}$ be a set of probability distributions. If a distribution $\overline{p} \in \mathcal{B}$ satisfies $q \succ \overline{p}$ ($\overline{p} \succ q$) for all $q \in \mathcal{B}$, then $\overline{p}$ is a strong maximum (minimum) of the set $\mathcal{B}$.
The extrema names derive from the fact that the strong maximum maximizes the Rényi entropies and the strong minimum minimizes them. One can extend the definitions to the case where $\overline{p} \notin \mathcal{B}$, but $\overline{p}$ is the least upper bound, in the sense that any other $p'$ satisfying $q \succ p'$ for all $q \in \mathcal{B}$ must obey $\overline{p} \succ p'$. This case would be called a strong supremum (or, in the other direction, a strong infimum). These constructions may not be unique, as $\succ$ is a preorder and not a partial order; however, if we sort by equivalence class, then the strongly maximal (minimal) class is unique if it exists.
In lattice-theoretic terms, the strong maximum is essentially the lattice-theoretic notion of a meet and the strong minimum is a join [21].
One example of strong minimization is found in quantum mechanics. Let $\rho$ be a state and $B_\rho$ a maximal (rank-1) projective measurement that diagonalizes it. For a given measurement $M$, let $\Pr(M|\rho)$ be the corresponding probability distribution that comes from measuring $\rho$ with $M$. Then $\Pr(B_\rho|\rho) \succ \Pr(M|\rho)$ for all maximal projective measurements $M$. (This follows from the unitary matrices that transform from the basis of $B_\rho$ to that of $M$, and the Schur–Horn lemma.)
Another, recent example is found in Ref. [24], which considers the set of all distributions close to $p$ under the total variation distance:
$$B_\epsilon(p) = \{ q : \delta(p, q) \leq \epsilon \}~.$$
This set has a strong minimum, called the steepest distribution, and a strong maximum, called the flattest distribution.
When a strong minimum or maximum does not exist, we refer to the individual extrema of the various monotones as weak extrema.
We close with a technical note on how to compare distributions over different numbers of events. There are generally two standards for such comparisons, depending on application. In the resource theory of informational nonequilibrium [20], one compares distributions over different numbers of events by "squashing" their Lorenz curves so that the horizontal axis ranges from $0$ to $1$. Under this comparison, the distribution $(1/2, 1/2, 0)$ has more informational nonequilibrium than $(1/2, 1/2)$. In the following, however, we adopt the standard of simply extending the smaller distribution by adding events of zero probability. In this convention, $(1/2, 1/2, 0)$ and $(1/2, 1/2)$ are considered equivalent. This choice is driven by our interest in the Rényi entropic costs and not in the overall nonequilibrium. (The latter is more naturally measured by the Rényi negentropies $\log_2 d - H_\alpha$, where $d$ is the number of events.)
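The distinction between the two standards can be made concrete in a few lines (our own illustration of the zero-padding convention and the negentropy measure):

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, ignoring zero-probability events."""
    p = np.asarray([x for x in p if x > 0], dtype=float)
    return -np.sum(p * np.log2(p))

# Under the zero-padding standard, (1/2, 1/2) and (1/2, 1/2, 0) are
# equivalent: all their Renyi entropies agree.
print(shannon([0.5, 0.5]) == shannon([0.5, 0.5, 0.0]))  # True

# The negentropy log2(d) - H, by contrast, depends on the number of
# events d, and so distinguishes the two.
print(np.log2(2) - shannon([0.5, 0.5]))       # 0.0
print(np.log2(3) - shannon([0.5, 0.5, 0.0]))  # ~0.585
```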
III Strong minimality of the ε-machine
The general task we set ourselves is simulating classical processes.
[Bi-infinite process] A bi-infinite process over an alphabet $\mathcal{A}$ is a probability measure $\mathbb{P}$ over the set of all bi-infinite strings $\overleftrightarrow{x} = \cdots x_{-1} x_0 x_1 \cdots$, where the past and the future are constructed by concatenating elements of $\mathcal{A}$.
Though defined over bi-infinite strings, the measure gives probabilities for seeing finite-length words $w \in \mathcal{A}^\ell$, defined as $\Pr(w) = \mathbb{P}(\{\overleftrightarrow{x} : x_1 \cdots x_\ell = w\})$. This can be taken as an alternate definition of the process measure.
Here, we focus on finite predictive models.
[Finite predictive model] A finite predictive model is a triple $(\mathcal{S}, \mathcal{A}, \{T^{(x)}\})$ of hidden states $\mathcal{S}$, an output alphabet $\mathcal{A}$, and nonnegative transition matrices $T^{(x)}$ with elements $T^{(x)}_{ss'}$ for $s, s' \in \mathcal{S}$ and $x \in \mathcal{A}$, satisfying the properties:

Irreducibility: $T = \sum_x T^{(x)}$ is stochastic and irreducible;

Unifilarity: $T^{(x)}_{ss'} = \Pr(x|s)\, \delta_{s', f(s,x)}$ for some conditional probability $\Pr(x|s)$ and deterministic function $f : \mathcal{S} \times \mathcal{A} \to \mathcal{S}$.
A finite predictive model is a type of hidden Markov model [25], whose dynamic is to transition between states at each time step while emitting a symbol with probabilities determined by the transition matrices $T^{(x)}$. Unifilarity ensures that, given the model state $s$ and symbol $x$, the next state $f(s, x)$ is unique.
Given a finite predictive model $M$, the state transition matrix $T$ has a single left eigenstate $\pi$ of eigenvalue $1$, by the Perron–Frobenius theorem, satisfying $\pi T = \pi$. We call this state distribution the stationary state. Using it, we define the process generated by $M$ as $\Pr(w) = \pi T^{(w)} \mathbf{1}$, where $T^{(w)} = T^{(x_1)} \cdots T^{(x_\ell)}$ for the word $w = x_1 \cdots x_\ell$ and $\mathbf{1}$ is the vector with all 1's for its components. This describes a stationary process. If we let $\delta_s$ represent the state distribution that assigns the state $s$ probability $1$, then $\delta_s T^{(w)} \mathbf{1}$ is the probability of seeing word $w$ after starting in state $s$.

Given a model $M$ with stationary distribution $\pi$, we define the model's Rényi memory as $C_\alpha(M) = H_\alpha(\pi)$. This includes the topological memory $C_0(M)$, the statistical memory $C_1(M)$, and the min-memory $C_\infty(M)$. Given a process $\mathcal{P}$, we define the Rényi complexity $C_\alpha(\mathcal{P})$ as the minimal memory over all models that generate $\mathcal{P}$ [4]. These include the topological complexity $C_0(\mathcal{P})$, the statistical complexity $C_\mu(\mathcal{P}) = C_1(\mathcal{P})$, and the min-complexity $C_\infty(\mathcal{P})$.
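These definitions are straightforward to compute. The sketch below uses a hypothetical two-state unifilar model chosen purely for illustration (the transition matrices are our own assumption, not taken from the paper's figures); it finds the stationary distribution as the left eigenvector of $T$ and evaluates word probabilities:

```python
import numpy as np

# Hypothetical two-state unifilar model over alphabet {0, 1}:
# T[x][s, s'] = Pr(emit x, go to s' | in state s).
T = {
    0: np.array([[0.5, 0.0],
                 [0.0, 0.0]]),
    1: np.array([[0.0, 0.5],
                 [1.0, 0.0]]),
}
Ttot = T[0] + T[1]   # row-stochastic state-transition matrix

# Stationary distribution: left eigenvector of Ttot with eigenvalue 1.
w, v = np.linalg.eig(Ttot.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def word_probability(word, pi):
    """Pr(w) = pi T^(x1) ... T^(xl) 1."""
    vec = pi.copy()
    for x in word:
        vec = vec @ T[x]
    return vec.sum()

print(pi)                          # stationary state distribution
print(word_probability([1, 1], pi))
```

For this particular toy model, the stationary distribution works out to $(2/3, 1/3)$, and its Shannon entropy is the model's statistical memory $C_1$.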
Among the class of finite predictive models, a particularly distinguished member is the $\varepsilon$-machine [4]:

[Generator $\varepsilon$-machine] A generator $\varepsilon$-machine is a finite predictive model such that for each pair of distinct states $s, s' \in \mathcal{S}$, there exists a word $w$ such that $\delta_s T^{(w)} \mathbf{1} \neq \delta_{s'} T^{(w)} \mathbf{1}$.

In other words, a generator $\varepsilon$-machine must be irreducible, unifilar, and its states must be probabilistically distinct, so that no pair of distinct states predict the same future.
An important result of computational mechanics is that the generator $\varepsilon$-machine is unique with respect to the process it generates [26]. This is a combined consequence of the equivalence of the generator definition with another, called the history $\varepsilon$-machine, which is provably unique [6]. That is, given a process, there is no other $\varepsilon$-machine that generates it. A further important result is that the $\varepsilon$-machine minimizes both the statistical complexity $C_\mu$ and the topological complexity $C_0$.
To fix intuitions, consider now several examples of models and their processes. First, consider the Biased Coin Process, a memoryless process in which, at each time step, a coin is flipped with probability $p$ of generating one symbol and probability $1 - p$ of generating the other. Figure 5 displays three models for it. Model (a) is the process' $\varepsilon$-machine, and models (b) and (c) are each two-state alternative finite predictive models. Notice that in both models (b) and (c), the two states generate equivalent futures.
Continuing, Fig. 6 displays two alternative models of the Even-Odd Process. This process produces sequences formed by concatenating strings of an odd number of one symbol with strings of an even number of the other. We see in (a) the process' $\varepsilon$-machine. In (b), we see an alternative finite predictive model; notice that two of its states predict the same futures and so are not probabilistically distinct. In terms of the futures they predict, they both play the role of a single state in the $\varepsilon$-machine.

We can compare these examples using Lorenz curves of the state distributions, as shown in Fig. 7. Here, recall, we adopted the convention of comparing two distributions over different numbers of states by extending the smaller system to include zero-probability states. We notice that the $\varepsilon$-machine's state distribution always majorizes the state distribution of the alternative machines.
The key to formalizing this observation is the following lemma.
[State merging] Let $M$ be a finite predictive model that is not an $\varepsilon$-machine. Then the machine created by merging its probabilistically equivalent states is the $\varepsilon$-machine of the process generated by $M$.
Let $s \sim s'$ be the equivalence relation that holds when $\delta_s T^{(w)} \mathbf{1} = \delta_{s'} T^{(w)} \mathbf{1}$ for all words $w$. Let $\mathcal{S}/{\sim}$ consist of the set of equivalence classes generated by this relation. For a given class $\sigma \in \mathcal{S}/{\sim}$, consider the transition probabilities $\Pr(x|s)$ associated with each $s \in \sigma$. For each $x$ such that $\Pr(x|s) > 0$, there is an outcome state $f(s, x)$. Comparing with another state $s'$ in the same class $\sigma$, we have the set of outcome states $f(s', x)$. For the future predictions of both states $s$ and $s'$ to be equivalent, they must also be equivalent after seeing the symbol $x$. That is, $\delta_s T^{(w)} \mathbf{1} = \delta_{s'} T^{(w)} \mathbf{1}$ for all $w$ also implies $\delta_{f(s,x)} T^{(w)} \mathbf{1} = \delta_{f(s',x)} T^{(w)} \mathbf{1}$ for all $w$. But $\Pr(x|s) = \Pr(x|s')$, and so we have $f(s, x) \sim f(s', x)$ for all $x$.
The upshot of these considerations is that we can define a consistent and unifilar transition dynamic on $\mathcal{S}/{\sim}$, given by the matrices $\widetilde{T}^{(x)}_{\sigma \sigma'} = \Pr(x|s)\, \delta_{\sigma', [f(s,x)]}$ for any $s \in \sigma$, where $[f(s,x)]$ denotes the class of $f(s,x)$. It inherits unifilarity from the original model as well as irreducibility. It has probabilistically distinct states because we have already merged all of the probabilistically equivalent states. Therefore, the resulting machine is the $\varepsilon$-machine of the process generated by $M$.
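The merging construction can be sketched as a partition-refinement loop (an illustrative implementation with assumed data structures: each state maps symbols to (probability, successor) pairs; the example model is hypothetical):

```python
def merge_equivalent_states(trans):
    """Partition-refinement sketch of the state-merging lemma: blocks are
    split until all states in a block agree on their emission
    probabilities Pr(x|s) and map into the same block under f(s, x)."""
    states = sorted(trans)

    def signature(s, block_of):
        # emission probabilities and successor blocks of state s
        return tuple(sorted((x, round(p, 12), block_of[s2])
                            for x, (p, s2) in trans[s].items()))

    block_of = {s: 0 for s in states}
    while True:
        sigs, new_block_of = {}, {}
        for s in states:
            sig = (block_of[s], signature(s, block_of))
            new_block_of[s] = sigs.setdefault(sig, len(sigs))
        if new_block_of == block_of:
            return block_of
        block_of = new_block_of

# Hypothetical redundant model of a biased coin: both states emit 0 with
# probability 0.75 and 1 with probability 0.25, so they predict identical
# futures and merge into a single state.
trans = {
    'A': {0: (0.75, 'B'), 1: (0.25, 'A')},
    'B': {0: (0.75, 'A'), 1: (0.25, 'B')},
}
print(merge_equivalent_states(trans))  # {'A': 0, 'B': 0}
```

When no refinement step splits a block, the surviving blocks are exactly the probabilistically distinct states.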
The state-merging procedure here is an adaptation of the Hopcroft algorithm for minimization of deterministic (nonprobabilistic) finite automata, which is itself an implementation of the Nerode equivalence relation [27]. It has been applied previously to analyze synchronization in $\varepsilon$-machines [28].
Using Lemma 7, we can prove the main result of this section:
[Strong minimality of the $\varepsilon$-machine] Let $M_\varepsilon$ be the $\varepsilon$-machine of process $\mathcal{P}$ and $M$ be any other finite generating machine. Let the stationary distributions be $\pi_\varepsilon$ and $\pi$, respectively. Then $\pi_\varepsilon \succ \pi$.
By Lemma 7, the states of the $\varepsilon$-machine are formed by merging equivalence classes of states of the finite predictive model $M$. Since the machines are otherwise equivalent, the stationary probability of a merged state $\sigma$ is simply the sum of the stationary probabilities of each $s \in \sigma$. That is:
$$\pi_\varepsilon(\sigma) = \sum_{s \in \sigma} \pi(s)~.$$
One can then construct $\pi$ from $\pi_\varepsilon$ by a series of transfer operations in which probability is shifted out of the merged state $\sigma$ into the new states $s \in \sigma$. Since the two distributions are related by a series of transfer operations, $\pi_\varepsilon \succ \pi$.
It immediately follows from this that the $\varepsilon$-machine not only minimizes the statistical complexity $C_\mu$ and the topological complexity $C_0$, but every other Rényi complexity $C_\alpha$ as well.
The uniqueness of the $\varepsilon$-machine is extremely important in formulating this result. This property follows from the understanding of predictive models as partitions of the past, with the $\varepsilon$-machine corresponding to the coarsest graining of these predictive partitions [6]. Other paradigms for modeling will not necessarily have this underlying structure and so may not have strongly minimal solutions. In the following, we see this is, in fact, the case for pure-state quantum machines.
IV Strong quantum advantage
A pure-state quantum model can be generalized from the classical case by replacing the classical states with quantum-mechanical pure states $|\eta_s\rangle$ and the symbol-labeled transition matrices with symbol-labeled Kraus operators $K^{(x)}$.
[Pure-state quantum model] A pure-state quantum model is a quintuple $(\mathcal{H}, \mathcal{A}, \mathcal{S}, \{|\eta_s\rangle\}, \{K^{(x)}\})$ of a Hilbert space $\mathcal{H}$, an output alphabet $\mathcal{A}$, pure states $|\eta_s\rangle \in \mathcal{H}$ corresponding to some set of state labels $s \in \mathcal{S}$, and Kraus operators $K^{(x)}$ with $x \in \mathcal{A}$, satisfying the properties:

Completeness: the Kraus operators satisfy $\sum_x K^{(x)\dagger} K^{(x)} = I$;

Unifilarity: $K^{(x)} |\eta_s\rangle = \sqrt{\Pr(x|s)}\, |\eta_{f(s,x)}\rangle$ for some conditional probability $\Pr(x|s)$ and deterministic function $f : \mathcal{S} \times \mathcal{A} \to \mathcal{S}$.
This is a particular kind of hidden quantum Markov model [29] in which we assume the dynamics can be described by the evolution of pure states. This is practically analogous to the assumption of unifilarity in the classical predictive setting.
It is not necessarily the case that the states $|\eta_s\rangle$ form an orthonormal basis; rather, nonorthogonality is the intended advantage [8, 9]. Overlap between the states allows for a smaller von Neumann entropy of the stationary state of the process. We formalize this shortly.
It is assumed that the Kraus operators have a unique stationary state $\rho_\pi$. One way to compute it is to note that taking the probabilities $\Pr(x|s)$ and the function $f(s, x)$ from the unifilarity property determines a finite predictive model as defined above. The model's stationary state $\pi$ is related to the stationary state of the quantum model via:
$$\rho_\pi = \sum_{s \in \mathcal{S}} \pi_s\, |\eta_s\rangle \langle \eta_s|~.$$
The process generated by a pure-state quantum model has the word distribution, for words $w = x_1 \cdots x_\ell$:
$$\Pr(w) = \mathrm{Tr}\!\left[ K^{(x_\ell)} \cdots K^{(x_1)} \rho_\pi K^{(x_1)\dagger} \cdots K^{(x_\ell)\dagger} \right]~.$$
The eigenvalues of the stationary state form a distribution $\lambda = (\lambda_1, \lambda_2, \ldots)$. The Rényi entropies of this distribution form the von Neumann–Rényi entropies of the state:
$$S_\alpha(\rho_\pi) = H_\alpha(\lambda)~.$$
We noted previously that, for a given state, these are strongly minimal over the entropies of all maximal projective measurements on the state. Given a model with stationary state $\rho_\pi$, we may simply write $S_\alpha(\rho_\pi)$ as the Rényi memory of the model. Important limits, as before, are the topological memory $S_0$, the statistical memory $S_1$, and the min-memory $S_\infty$, which represent physical limitations on memory storage for the generator.
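The von Neumann–Rényi memories reduce to classical Rényi entropies of eigenvalue spectra, as in this sketch (our own example; the equal mixture of $|0\rangle$ and $|+\rangle$ is an assumed illustration of how state overlap lowers the entropy, not one of the paper's models):

```python
import numpy as np

def von_neumann_renyi(rho, alpha):
    """S_alpha(rho) = H_alpha of rho's eigenvalue distribution."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    if alpha == 0:
        return float(np.log2(len(lam)))
    if alpha == 1:
        return float(-np.sum(lam * np.log2(lam)))
    if alpha == np.inf:
        return float(-np.log2(lam.max()))
    return float(np.log2(np.sum(lam ** alpha)) / (1 - alpha))

# Equal mixture of |0> and |+> (nonorthogonal states): its entropy is
# strictly below the 1 bit of the corresponding classical mixture.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ketp, ketp)
print(von_neumann_renyi(rho, 1))  # ~0.60 bits < 1 bit
```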
To properly compare pure-state quantum models and classical predictive models, we define the classical equivalent model of a pure-state quantum model.
[Classical equivalent model] Let $(\mathcal{H}, \mathcal{A}, \mathcal{S}, \{|\eta_s\rangle\}, \{K^{(x)}\})$ be a pure-state quantum model, with probabilities $\Pr(x|s)$ and deterministic function $f(s, x)$ given by unifilarity. Its classical equivalent model is the classical finite predictive model with state set $\mathcal{S}$, alphabet $\mathcal{A}$, and symbol-labeled transition matrices $T^{(x)}_{ss'} = \Pr(x|s)\, \delta_{s', f(s,x)}$ generated by the state-to-symbol probabilities and deterministic function.
We now prove that a finite classical predictive model strongly maximizes all pure-state quantum models of which it is the classical equivalent.
[Strong quantum advantage] Let $Q$ be a pure-state quantum model with stationary state $\rho_\pi$, and let $M$ be the classical equivalent model with stationary state $\pi$ (with $\rho_\pi = \sum_s \pi_s |\eta_s\rangle\langle\eta_s|$). Let $n = |\mathcal{S}|$ and $d = \dim \mathcal{H}$. (We have $d \leq n$: if not, then we can take a smaller Hilbert space that spans the states.) Let $\lambda$ be an $n$-dimensional vector where the first $d$ components are the eigenvalues of $\rho_\pi$ and the remaining elements are $0$. Then $\lambda \succ \pi$.
We know that:
$$\rho_\pi = \sum_{s} \pi_s\, |\eta_s\rangle \langle \eta_s|~,$$
where $\langle \eta_s | \eta_s \rangle = 1$. However, we can also write $\rho_\pi$ in its eigenbasis:
$$\rho_\pi = \sum_{i=1}^{d} \lambda_i\, |i\rangle \langle i|~,$$
where $\langle i | j \rangle = \delta_{ij}$. Then the two sets of vectors can be related via:
$$\sqrt{\pi_s}\, |\eta_s\rangle = \sum_{i=1}^{d} U_{is}\, \sqrt{\lambda_i}\, |i\rangle~,$$
where $U$ is a $d \times n$ matrix comprised of rows of orthonormal $n$-dimensional vectors [30]. Now, we have:
$$\pi_s = \sum_{i} |U_{is}|^2\, \lambda_i~.$$
Note that $U$ is not square, but since we have taken $\lambda_i = 0$ for $d < i \leq n$, we can simply extend $U$ into a square unitary matrix by filling out the bottom $n - d$ rows with more orthonormal vectors. This leaves the equation unchanged. We can then write $\pi = D\lambda$, where $D_{si} = |U_{is}|^2$ is doubly stochastic. Then by Theorem 1, $\lambda \succ \pi$.
As a corollary, $S_\alpha(\rho_\pi) \leq C_\alpha(M)$ for all $\alpha$. This follows from the definitions of the von Neumann–Rényi entropies and the Schur-concavity of $H_\alpha$.
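The theorem can be checked numerically for small examples: build $\rho_\pi$ from an ensemble of nonorthogonal states, extract its eigenvalues, and test the Hardy–Littlewood–Pólya inequalities (an illustrative sketch; the three-state qubit ensemble below is our own assumption, not one of the paper's models):

```python
import numpy as np

def majorizes(p, q, tol=1e-12):
    """Hardy-Littlewood-Polya partial-sum test, padding with zeros."""
    n = max(len(p), len(q))
    p = np.pad(np.sort(p)[::-1], (0, n - len(p)))
    q = np.pad(np.sort(q)[::-1], (0, n - len(q)))
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))

# Hypothetical ensemble: three nonorthogonal qubit states with weights pi.
pi = np.array([0.5, 0.3, 0.2])
kets = [np.array([1.0, 0.0]),
        np.array([1.0, 1.0]) / np.sqrt(2),
        np.array([0.0, 1.0])]
rho = sum(w * np.outer(k, k) for w, k in zip(pi, kets))
lam = np.linalg.eigvalsh(rho)[::-1]   # eigenvalues of rho, decreasing

# The theorem: the eigenvalues (padded with zeros) majorize pi.
print(majorizes(lam, pi))  # True
print(majorizes(pi, lam))  # False: the advantage is strict here
```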
Many alternative pure-state quantum models may describe the same process. The "first mark", so to speak, for quantum models is the q-machine [9, 15], which directly embeds the dynamics of the $\varepsilon$-machine into a quantum system while already leveraging the memory advantage due to state overlap.
[q-Machine] Given an $\varepsilon$-machine with transition matrices $T^{(x)}_{ss'} = \Pr(x|s)\, \delta_{s', f(s,x)}$ for some deterministic function $f$, construct the corresponding q-machine in the following way:

The states $|\eta_s\rangle$ are built to satisfy the recursive relation:
$$|\eta_s\rangle = \sum_{x} \sqrt{\Pr(x|s)}\; |x\rangle \otimes |\eta_{f(s,x)}\rangle~;$$

$\mathcal{H}$ is the space spanned by the states $\{|\eta_s\rangle\}$;

The Kraus operators are determined by the relations:
$$K^{(x)} |\eta_s\rangle = \sqrt{\Pr(x|s)}\; |\eta_{f(s,x)}\rangle~.$$
One can check that this satisfies the completeness relations and has the correct probability dynamics for the process generated by the $\varepsilon$-machine.
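For Markov chains, a q-machine-style construction reduces to a few lines, as in this sketch (the three-state chain is a hypothetical example of our own, not a chain from the paper; it illustrates that the spectral entropy of the quantum stationary state falls below the classical statistical memory):

```python
import numpy as np

# For a Markov chain the model emits the next state as its symbol, and
# the quantum states can be taken as |eta_s> = sum_s' sqrt(T[s,s']) |s'>.
T = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])   # hypothetical doubly stochastic chain

w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()                # uniform here: (1/3, 1/3, 1/3)

etas = np.sqrt(T)                 # row s holds the amplitudes of |eta_s>
rho = sum(p * np.outer(e, e) for p, e in zip(pi, etas))

def shannon(p):
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

lam = np.linalg.eigvalsh(rho)
print(shannon(pi))   # classical statistical memory: log2(3) bits
print(shannon(lam))  # quantum memory: strictly smaller
```

The gap between the two printed values is exactly the state-overlap advantage formalized by the strong quantum advantage theorem.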
V Weak quantum minimality
An open problem is to determine the minimal quantum pure-state representation of a given classical process. This problem is solved in some specific instances, such as the Ising model [11] and the Perturbed Coin Process [14]; in these cases the minimum is known to be the q-machine. We denote the smallest value of the Rényi entropy of the stationary state, over all pure-state quantum models generating a process, as $C^q_\alpha$, called the quantum Rényi complexities, including the limits: the quantum topological complexity $C^q_0$, the quantum min-complexity $C^q_\infty$, and the quantum statistical complexity $C_q = C^q_1$. If a strongly minimal quantum pure-state model exists, these complexities are all attained by the same pure-state model. One of our primary results in this section is that, for some processes, this does not occur.
We start by examining two examples. The first, the MBW Process introduced in Ref. [29], demonstrates an $\varepsilon$-machine whose q-machine is not minimal in the von Neumann complexity. Consider the process generated by the 4-state MBW machine shown in Fig. 8.
This process' HMM is simply a Markov chain, and its representation in Fig. 8 is its $\varepsilon$-machine. If we take the states $\{|s\rangle\}$ as an orthonormal basis of a Hilbert space, we can construct the q-machine with the states:
$$|\eta_s\rangle = \sum_{s'} \sqrt{T_{ss'}}\; |s'\rangle~.$$
Since it is a Markov chain, we can write the Kraus operators as $K^{(x)} = \sum_s \sqrt{\Pr(x|s)}\, |\eta_{f(s,x)}\rangle \langle \widetilde{\eta}_s|$, where $\{\langle \widetilde{\eta}_s|\}$ is the dual basis to $\{|\eta_s\rangle\}$. This is a special case of the construction used in Ref. [13]. For q-machines of Markov chains, then, this reduces to $K^{(x)} = |\eta_x\rangle \langle x|$.
Let's examine the majorization between the q-machine and the Markov model via the Lorenz curves of $\lambda$, the eigenvalues of $\rho_\pi$, and $\pi$, the stationary state of the Markov chain. See Fig. 9.
It turns out that there is a smaller quantum model, embedded in two dimensions, whose states give the proper transition probabilities for the 4-state MBW model. Figure 10 compares the Lorenz curve of its stationary eigenvalues to those of the q-machine. One sees that the dimensionally smaller model does not majorize the q-machine, but it does have a lower statistical memory. (On the other hand, the q-machine has a smaller min-memory.)
Now consider something in the opposite direction: the 3-state MBW model, displayed in Fig. 11. This is a generalization of the previous example to three states instead of four. We will compute the corresponding q-machine and show that there also exists a dimensionally smaller representation. In this case, however, the smaller representation is not smaller in its statistical memory.
The q-machine of this Markov chain is given by the states:
$$|\eta_s\rangle = \sum_{s'} \sqrt{T_{ss'}}\; |s'\rangle~,$$
with Kraus operators defined similarly to before. We can examine the majorization between the q-machine and the Markov model by plotting the Lorenz curves of $\lambda$, the eigenvalues of $\rho_\pi$, and $\pi$, the stationary state of the Markov chain, shown in Fig. 12.
The lower-dimensional model is given by states embedded in two dimensions that yield the proper transition probabilities for the 3-state MBW model. Figure 13 compares the Lorenz curve of its stationary eigenvalues to that of the q-machine. We see that it does not majorize the q-machine. And, this time, this is directly manifested in the fact that the smaller-dimension model has a larger statistical memory.
After seeing the $\varepsilon$-machine's strong minimality with respect to other classical models and its strong maximality with respect to quantum models, it is certainly tempting to conjecture that a strongly minimal quantum model exists. However, the examples just explored cast serious doubt: none of the models covered above are strong minima. One way to prove that no strong minimum exists for, say, the 3-state MBW Process is to show that there does not exist any other quantum model in two dimensions that generates the process. This would imply that no other model can majorize the two-dimensional model. And, since that model is not strongly minimal, no strongly minimal solution can exist.
Appendix A proves exactly this—thus, demonstrating a counterexample to the strong minimality of quantum models.
[Weak minimality for the 3-state MBW Process] The two-dimensional quantum model weakly minimizes topological complexity among all quantum generators of the 3-state MBW Process; consequently, the 3-state MBW Process has no strongly minimal model.
VI Concluding remarks
Majorizing states provides a means to compare a process’ alternative models in both the classical and quantum regimes. Majorization implies the simultaneous minimization of a large host of functions. As a result we showed that:

The $\varepsilon$-machine majorizes all classical predictive models of the same process, and so simultaneously minimizes many different measures of memory cost.

The q-machine, and indeed any quantum realization of the $\varepsilon$-machine, always majorizes the $\varepsilon$-machine, and so simultaneously improves on all the measures of memory cost.

For at least one process, there does not exist any quantum pure-state model that majorizes all quantum pure-state models of that process. Thus, while an $\varepsilon$-machine may be improved upon by different possible quantum models, there is no single quantum model that is unambiguously the "best" choice.
Imagining the $\varepsilon$-machine as an invariant "saddle point" in the majorization structure of model-space, Fig. 14 depicts the implied geometry. That is, we see that despite its nonminimality among all models, the $\varepsilon$-machine still occupies a topologically important position in model-space—one that is invariant to one's choice of memory measure. However, no similar model plays the topologically minimal role for quantum pure-state models.
The quantum statistical complexity $C_q$ has been offered up as an alternative quantum measure of structural complexity—a rival of the statistical complexity $C_\mu$ [32]. One implication of our results here is that the nature of this quantum minimum is fundamentally different from that of $C_\mu$. This observation should help further explorations into techniques required to compute $C_q$ and the physical circumstances in which it is most relevant. That the physical meaning of $C_q$ involves generating an asymptotically large number of realizations of a process may imply that it cannot be accurately computed by only considering machines that generate a single realization. This is in contrast to $C_\mu$ which, being strongly minimized, must be attainable in the single-shot regime along with measures like $C_0$ and $C_\infty$.
In this way, the quantum realm again appears ambiguous. Ambiguity in structural complexity has been previously observed in the sense that there exist pairs of processes $\mathcal{P}$ and $\mathcal{P}'$ such that $C_\mu(\mathcal{P}) < C_\mu(\mathcal{P}')$ but $C_q(\mathcal{P}) > C_q(\mathcal{P}')$ [33]. The classical and quantum paradigms for modeling can disagree on simplicity—there is no universal Ockham's Razor. How this result relates to strong versus weak optimization deserves further investigation.
The methods and results here should also be extended to analyze classical generative models which, in many ways, bear resemblances in their functionality to the quantum models [34, 35, 36]. These drop the requirement of unifilarity, similar to how the quantum models relax the notion of orthogonality. Important questions to pursue in this vein are whether generative models are strongly maximized by the $\varepsilon$-machine and whether they have their own strong minimum or, like the quantum models, only weak minima in different contexts.
Acknowledgments
The authors thank Fabio Anza, John Mahoney, Cina Aghamohammadi, and Ryan James for helpful discussions. As a faculty member, JPC thanks the Santa Fe Institute and the Telluride Science Research Center for their hospitality during visits. This material is based upon work supported by, or in part by, John Templeton Foundation grant 52095, Foundational Questions Institute grant FQXi-RFP-1609, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract W911NF-13-1-0390 and grant W911NF-18-1-0028, and via Intel Corporation support of CSC as an Intel Parallel Computing Center.
References
 [1] E. N. Lorenz. Deterministic nonperiodic flow. J. Atmos. Sci., 20:130, 1963.
 [2] E. N. Lorenz. The problem of deducing the climate from the governing equations. Tellus, XVI:1, 1964.
 [3] A. B. Boyd, D. Mandal, and J. P. Crutchfield. Identifying functional thermodynamics in autonomous Maxwellian ratchets. New J. Physics, 18:023049, 2016.
 [4] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Let., 63:105–108, 1989.
 [5] J. P. Crutchfield. The calculi of emergence: Computation, dynamics, and induction. Physica D, 75:11–54, 1994.
 [6] C. R. Shalizi and J. P. Crutchfield. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys., 104:817–879, 2001.
 [7] J. P. Crutchfield. Between order and chaos. Nature Physics, 8:17–24, 2012.
 [8] M. Gu, K. Wiesner, E. Rieper, and V. Vedral. Quantum mechanics can reduce the complexity of classical models. Nature Comm., 3(762), 2012.
 [9] J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Occam’s quantum strop: Synchronizing and compressing classical cryptic processes via a quantum channel. Scientific Reports, 6:20495, 2016.
 [10] C. Aghamohammadi, J. R. Mahoney, and J. P. Crutchfield. Extreme quantum advantage when simulating classical systems with longrange interaction. Scientific Reports, 7(6735), 2017.
 [11] W. Y. Suen, J. Thompson, A. J. P. Garner, V. Vedral, and M. Gu. The classicalquantum divergence of complexity in modelling spin chains. Quantum, 1:25, 2017.
 [12] A. J. P. Garner, Q. Liu, J. Thompson, V. Vedral, and M. Gu. Provably unbounded memory advantage in stochastic simulation using quantum mechanics. New J. Physics, 19:103009, 2017.
 [13] C. Aghamohammadi, S. P. Loomis, J. R. Mahoney, and J. P. Crutchfield. Extreme quantum memory advantage for rareevent sampling. Phys. Rev. X, 8:011025, 2018.
 [14] J. Thompson, A. J. P. Garner, J. R. Mahoney, J. P. Crutchfield, V. Vedral, and M. Gu. Causal asymmetry in a quantum world. Phys. Rev. X, 8:031013, 2018.
 [15] P. M. Riechers, J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Minimized statecomplexity of quantumencoded cryptic processes. Phys. Rev. A, 93(5):052317, 2016.
 [16] B. Coecke, T. Fritz, and R. W. Spekkens. A mathematical theory of resources. Info. Comput., 250:59–86, 2016.
 [17] A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: Theory of Majorization and Its Applications. Springer, New York, NY, 3 edition, 2011.
 [18] M. A. Nielsen. Conditions for a class of entanglement transformations. Phys. Rev. Lett., 83(436), 1999.
 [19] M. Horodecki and J. Oppenheim. Fundamental limitations for quantum and nanoscale thermodynamics. Nature Comm., 4(2059), 2013.
 [20] G. Gour, M. P. Müller, V. Narasimhachar, R. W. Spekkens, and N. Y. Halpern. The resource theory of informational nonequilibrium in thermodynamics. Phys. Rep., 583:1–58, 2015.
 [21] G. Grätzer. Lattice Theory: Foundation. Springer, Basel, 2010.
 [22] R. Renner and S. Wolf. Smooth Rényi entropy and applications. In IEEE Information Theory Society, editor, 2004 IEEE Intl. Symp. Info. Th.: Proceedings, page 232, Piscataway, N.J., 2004. IEEE.
 [23] M. Tomamichel. A Framework for Non-Asymptotic Quantum Information Theory. PhD thesis, ETH Zurich, Zurich, 2012.
 [24] M. Horodecki, J. Oppenheim, and C. Sparaciari. Extremal distributions under approximate majorization. J. Phys. A: Math. Theor., 51(305301), 2018.
 [25] D. R. Upper. Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley, 1997. Published by University Microfilms Intl, Ann Arbor, Michigan.
 [26] N. F. Travers and J. P. Crutchfield. Equivalence of history and generator machines. arxiv.org:1111.4500 [math.PR].
 [27] J. Hopcroft. An algorithm for minimizing states in a finite automaton. In Z. Kohavi and A. Paz, editors, Theory of Machines and Computations, pages 189–196, New York, 1971. Academic Press.
 [28] N. F. Travers and J. P. Crutchfield. Exact synchronization for finite-state sources. J. Stat. Physics, 145:1181–1201, 2011.
 [29] A. Monras, A. Beige, and K. Wiesner. Hidden quantum Markov models and non-adaptive readout of many-body states. Appl. Math. Comput. Sci., 3:93, 2011.
 [30] L. P. Hughston, R. Jozsa, and W. K. Wootters. A complete classification of quantum ensembles having a given density matrix. Phys. Lett. A, 183:12–18, 1993.
 [31] J. R. Mahoney, C. Aghamohammadi, and J. P. Crutchfield. Occam’s quantum strop: Synchronizing and compressing classical cryptic processes via a quantum channel. Scientific Reports, 6:20495, 2016.
 [32] R. Tan, D. R. Terno, J. Thompson, V. Vedral, and M. Gu. Towards quantifying complexity with quantum mechanics. Eur. Phys. J. Plus, 129:191, 2014.
 [33] C. Aghamohammadi, J. R. Mahoney, and J. P. Crutchfield. The ambiguity of simplicity in quantum and classical simulation. Phys. Lett. A, 381(14):1223–1227, 2017.
 [34] W. Löhr and N. Ay. Non-sufficient memories that are sufficient for prediction. In J. Zhou, editor, Complex Sciences 2009, volume 4 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 265–276. Springer, New York, 2009.
 [35] W. Löhr and N. Ay. On the generative nature of prediction. Adv. Complex Sys., 12(02):169–194, 2009.
 [36] J. B. Ruebeck, R. G. James, J. R. Mahoney, and J. P. Crutchfield. Prediction and generation of binary Markov processes: Can a finite-state fox catch a Markov mouse? Chaos, 28(013109), 2018.
 [37] J. P. Crutchfield and S. Marzen. Signatures of infinity: Nonergodicity and resource scaling in prediction, complexity and learning. Phys. Rev. E, 91(050106), 2015.
 [38] J. P. Crutchfield and S. Marzen. Structure and randomness of continuous-time, discrete-event processes. J. Stat. Phys., 169(2):303–315, 2017.
 [39] T. J. Elliott, A. J. P. Garner, and M. Gu. Quantum self-assembly of causal architecture for memory-efficient tracking of complex temporal and symbolic dynamics. arxiv.org:1803.05426.
Appendix A: Weak Minimality of
Here, we prove that  is the unique 2D representation of the 3-state MBW process. We show this by considering the entire class of 2D models and applying the completeness constraint.
We note that a pure-state quantum model of the 3-state MBW process must have three states , , and , along with three dual states , , and , such that:
and:
We list the available geometric symmetries that leave the final stationary state unchanged:
1. Phase transformation on each state, ;
2. Phase transformation on each dual state, ; and
3. Unitary transformation and .
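As a sanity check on the first and third symmetries, the following sketch verifies that per-state phases leave the stationary state exactly unchanged and that a global unitary only relabels the Hilbert space. The states here are random placeholders and the uniform stationary distribution is an illustrative assumption; the actual model states are fixed by the constraints in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 2D pure states |eta_j>; illustrative only.
states = [rng.normal(size=2) + 1j * rng.normal(size=2) for _ in range(3)]
states = [v / np.linalg.norm(v) for v in states]
probs = np.full(3, 1 / 3)  # assumed uniform stationary distribution

def stationary_state(vecs):
    """Mixture of projectors: rho = sum_j p_j |eta_j><eta_j|."""
    return sum(p * np.outer(v, v.conj()) for p, v in zip(probs, vecs))

rho = stationary_state(states)

# Symmetry 1: a phase on each state, |eta_j> -> e^{i phi_j} |eta_j>,
# leaves every projector -- and hence rho -- unchanged.
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=3))
rho_phase = stationary_state([ph * v for ph, v in zip(phases, states)])
assert np.allclose(rho, rho_phase)

# Symmetry 3: a global unitary acts by conjugation, rho -> U rho U^dag,
# relabeling the Hilbert space without changing the spectrum.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rho_rot = stationary_state([U @ v for v in states])
assert np.allclose(np.sort(np.linalg.eigvalsh(rho)),
                   np.sort(np.linalg.eigvalsh(rho_rot)))
```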
From these symmetries we can fix the gauge in the following ways:
1. Set to be real and positive for all .
2. Set .
3. Set and set to be real and positive.
These gauge fixings allow us to write:
for , and and a phase .
Because these three states are embedded in a 2D Hilbert space, they must be linearly dependent, giving a linear consistency condition: for some triple of numbers we can write:
Up to a constant, we use our parameters to choose:
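The linear-dependence claim above can be checked numerically: any three vectors in a 2D Hilbert space are linearly dependent, and the dependence coefficients can be read off the null space of the matrix whose columns are the states. A minimal sketch with random placeholder states (the actual triple in the text is determined by the gauge-fixed form above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Three placeholder states |eta_j> as the columns of a 2x3 matrix;
# any three vectors in 2D are necessarily linearly dependent.
states = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))

# The right-singular vector beyond the two nonzero singular values
# spans the null space, giving a triple (c_1, c_2, c_3) with
# c_1 |eta_1> + c_2 |eta_2> + c_3 |eta_3> = 0.
_, _, vh = np.linalg.svd(states)
c = vh[-1].conj()

assert np.allclose(states @ c, 0)
assert np.isclose(np.linalg.norm(c), 1)  # the triple is nontrivial
```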
Consistency requires that this relationship between vectors is preserved by the Kraus-operator dynamic. Consider the matrix . The vector must be a null vector of ; i.e., . This first requires that be degenerate. One way to enforce this is to check that the characteristic polynomial has an overall factor of . For simplicity, we compute the characteristic polynomial of :
To have an overall factor of , we need:
Typically, there are several ways to choose phases so that vectors cancel, but in this case, since the sum of the magnitudes of the complex terms is 8, the only way to cancel is at the extreme point where , , and , and:
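The degeneracy requirement is equivalent to a vanishing determinant: the characteristic polynomial det(λI − M) has an overall factor of λ exactly when its constant term, which is ±det(M), vanishes, and then M has the required null vector. A sketch with a hypothetical rank-deficient 3×3 matrix standing in for the matrix built from the Kraus-operator dynamic:

```python
import numpy as np

# Hypothetical stand-in for the matrix built from the Kraus-operator
# dynamic: rank-deficient by construction (third row = first + second).
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 3.0, 3.0]])

# Characteristic-polynomial coefficients, leading term first. The
# constant term is det(lambda*I - M) at lambda = 0, i.e. -det(M) for a
# 3x3 matrix; it vanishes iff lambda is an overall factor.
coeffs = np.poly(M)
assert abs(coeffs[-1]) < 1e-9

# The matching null vector, as the consistency condition requires.
w, v = np.linalg.eig(M)
null_vec = v[:, np.argmin(np.abs(w))]
assert np.allclose(M @ null_vec, 0)
```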
To recapitulate the results so far, has the form:
We now need to enforce that . We have the three equations: