Robust inference of memory structure for efficient quantum modelling of stochastic processes

by   Matthew Ho, et al.

A growing body of work has established the modelling of stochastic processes as a promising area of application for quantum technologies; it has been shown that quantum models are able to replicate the future statistics of a stochastic process whilst retaining less information about the past than any classical model must – even for a purely classical process. Such memory-efficient models open a potential future route to study complex systems in greater detail than ever before, and suggest profound consequences for our notions of structure in their dynamics. Yet, to date, methods for constructing these quantum models are based on having prior knowledge of the optimal classical model. Here, we introduce a protocol for blind inference of the memory structure of quantum models – tailored to take advantage of quantum features – direct from time-series data, in the process highlighting the robustness of their structure to noise. This in turn provides a way to construct memory-efficient quantum models of stochastic processes whilst circumventing certain drawbacks that manifest solely as a result of classical information processing in classical inference protocols.




I Introduction

Complex processes are prevalent throughout the world, taking the form of natural processes such as the weather Lynch (2008); Bauer et al. (2015) and DNA sequences Pavlos et al. (2015), as well as artificial processes like the stock market Preis et al. (2012) and traffic Kerner and Rehborn (1996). We construct models of these processes in order to better understand their structure and predict their behaviour. Within complexity science, the field of computational mechanics Crutchfield and Young (1989); Shalizi (2001); Crutchfield (2011) offers a systematic approach to understanding the intrinsic computation of a process by identifying the causal links between its past and future, and has been used to study a diverse set of dynamics such as deterministic chaos in the logistic map Crutchfield and Young (1989); Crutchfield (2011), cellular automata Hanson and Crutchfield (1997), the dripping faucet experiment Gonçalves et al. (1998), stock markets Park et al. (2007), and neural spike trains Haslinger et al. (2010). A key component of the approach is the so-called ε-machine, which as a valuable byproduct represents the most parsimonious causal model of a process.

In recent decades, the prospect of using quantum effects in information processing has emerged, promising advantages for a range of applications in terms of algorithmic speed-ups Montanaro (2016), secure communication Bennett and Brassard (2014), and beyond. Stochastic modelling is no exception to this, and a growing body of work has established that when information is encoded into a quantum memory, causal models of a stochastic process can be designed that function whilst retaining less information about the past than is classically possible Gu et al. (2012); Mahoney et al. (2016); Riechers et al. (2016); Thompson et al. (2017); Elliott and Gu (2018); Binder et al. (2018); Elliott et al. (2019a); Liu et al. (2019). This quantum memory advantage can grow unbounded Garner et al. (2017); Aghamohammadi et al. (2017a); Elliott and Gu (2018); Elliott et al. (2019a); Thompson et al. (2018); Elliott et al. (2019b), and has been verified experimentally Palsson et al. (2017); Jouneghani et al. (2017); Ghafari et al. (2019). Like its classical counterpart, the amount of information stored within these quantum models has been suggested as a measure of structural complexity in stochastic dynamics Tan et al. (2014); Suen et al. (2017); Aghamohammadi et al. (2017b); Suen et al. (2018).

Currently, systematic approaches to constructing such quantum models are predicated on having a prior exact statistical description of the process, or knowledge of its ε-machine. As a result, to apply these tools to real-world systems we must first use classical inference protocols to construct an ε-machine Crutchfield and Young (1989); Shalizi and Klinker (2004); Strelioff and Crutchfield (2014), and then use this as a basis to construct a corresponding quantum model. It is desirable to instead have a model inference protocol to go directly from data to the quantum model, avoiding any extra computational overhead associated with also determining the classical model. In this vein, here we introduce such a protocol for directly inferring the memory structure of a quantum model of a stochastic process – which we show is robust to statistical noise. The protocol is tailored specifically for quantum models, taking advantage of features that allow it to avoid some approximations that must be made in classical information processing. Fig. 1 provides a schematic of our motivation.

Figure 1: Schematic context of our work. Quantum information processing has been shown to provide a more memory-efficient route to stochastic modelling than classically possible. However, current approaches to constructing quantum models first require classical models to be inferred; here we introduce a blind inference protocol for going straight from raw data to quantum structure. The quantities $C_\mu$, $C_q$, and $\tilde{C}_q$ represent the information stored by the minimal classical, quantum, and inferred quantum causal models respectively.

The layout of this article is as follows. In Section II we outline the general framework of stochastic processes and computational mechanics as is relevant here, as well as the more efficient quantum models. Section III provides the core of our results, introducing the inference protocol, showing its robustness to statistical fluctuations, and justifying its accuracy. The efficacy of our inference protocol is then demonstrated in practice with two toy processes in Section IV. Finally, we conclude in Section V, and discuss some future directions.

II Framework

II.1 Stochastic processes

We consider discrete-time stochastic processes represented by a bi-infinite probabilistic string of outcomes $\ldots X_{-1} X_0 X_1 \ldots$, where the $X_t$ are random variables that take on values $x_t$ drawn from an alphabet $\mathcal{A}$, and the subscript $t$ represents the timestep. Consecutive strings $X_{t:t'} := X_t X_{t+1} \ldots X_{t'-1}$ are called words, with the left index inclusive and the right exclusive. We consider stationary processes, such that $P(X_{0:L}) = P(X_{t:t+L})$ for all $t \in \mathbb{Z}$ and $L \in \mathbb{N}$. We partition the process into (semi-infinite) pasts and futures, denoted $\overleftarrow{X} := X_{-\infty:0}$ and $\overrightarrow{X} := X_{0:\infty}$ respectively, where $t = 0$ is taken to be the present.

The Markov order is an important property of a process that defines an effective history length; a process is said to have Markov order $R$ if $R$ is the smallest value such that $P(\overrightarrow{X} | X_{-R:0}) = P(\overrightarrow{X} | \overleftarrow{X})$ is satisfied Racca et al. (2007). That is, it is the smallest block length of the most recent past that provides a sufficient statistic of the future. When $R = 1$ the process is said to be Markovian.

II.2 Models

Computational mechanics. Computational mechanics Crutchfield and Young (1989); Shalizi (2001); Crutchfield (2011) provides a formal statistical framework for identifying and analysing structure in complex processes. Its modus operandi involves the minimal causal representation of stochastic processes (here, by causal we mean that the representation stores no information about the future of the process that could not be deduced from past observations), which may be determined by a systematic clustering of pasts. Specifically, the causal states of a process are a set of equivalence classes on the pasts, defined according to the relation

$\overleftarrow{x} \sim \overleftarrow{x}' \iff P(\overrightarrow{X} | \overleftarrow{x}) = P(\overrightarrow{X} | \overleftarrow{x}');$ (1)

that is, two pasts belong to the same causal state iff they give rise to statistically-identical futures. We label the causal states as $\{s_j\}$.

Because of the deterministic assignment of pasts to causal states, it can be seen that transitions between causal states are also deterministic conditional on the output symbol; that is, given a past $\overleftarrow{x} \in s_j$, upon emission of the next symbol $x$ the new past $\overleftarrow{x}x$ must belong to causal state $\lambda(x, s_j)$, where $\lambda$ is a deterministic update function. This deterministic transition structure is sometimes referred to as unifilarity, and allows us to represent the process as a deterministic edge-emitting hidden Markov model (HMM) known as the ε-machine, where the causal states form the hidden states of the model, and the edge emissions the observed symbols Crutchfield and Young (1989); Shalizi (2001).

The amount of information stored by the ε-machine can be quantified by the Shannon entropy of the stationary distribution on causal states:

$C_\mu := -\sum_j P(s_j) \log_2 P(s_j),$ (2)

where $P(s_j) = \sum_{\overleftarrow{x} \in s_j} P(\overleftarrow{x})$. Across all (classical) causal representations of a process, the ε-machine minimises the information cost of its corresponding memory states, and it is in this sense that we refer to it as being minimal (or optimal). Because of this distinguished feature, $C_\mu$ is called the statistical complexity, and is considered a quantifier of structure in the process Crutchfield (2011), in some sense representing how much information about the past is needed to produce the future. This quantity is lower bounded by the mutual information between the past and future Shalizi (2001); in general this bound is not tight, and the difference is referred to as the modelling overhead Mahoney et al. (2011).
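As a small illustrative sketch (with a hypothetical stationary distribution over causal states, not one taken from the paper), the statistical complexity is simply the Shannon entropy of the causal-state occupation probabilities:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 log 0 = 0 by convention
    return float(-np.sum(p * np.log2(p)))

# Hypothetical stationary distribution over three causal states
C_mu = shannon_entropy([0.5, 0.25, 0.25])  # = 1.5 bits
```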

When dealing with raw data, one must estimate the probabilities through inference, which will be subject to unavoidable statistical fluctuations due to the finite amount of data. As such, when applying the equivalence relation Eq. (1) a threshold ε-tolerance must be permitted, where pasts are assigned to the same causal state if their conditional future distributions are ‘close enough’ Crutchfield and Young (1989) (it should be noted that there is no fixed definition of how this tolerance should be implemented, but typically it would be appropriate to use some form of statistical distance between the conditional distributions, with some maximal allowed distance for merging parameterised by ε). Adjusting the strictness of this tolerance induces different levels of coarse-graining: if too narrow, the fluctuations will lead to additional spurious causal states that would have been merged with knowledge of the exact distributions; if too loose, pasts with different conditional futures can be merged. Depending on this degree of coarse-graining, the obtained value of the statistical complexity will vary; it is sensitive to statistical fluctuations, and is generally not robust to noise.

Quantum computational mechanics. When considering quantum methods of information processing, the minimality of ε-machines no longer holds; it has been shown that causal quantum models can be found with lower information costs Gu et al. (2012), using non-orthogonal memory states to reduce the modelling overhead. The current state-of-the-art quantum models Liu et al. (2019) are based on unitary interactions between the memory subsystem and a probe ancilla:

$U |\sigma_j\rangle |0\rangle = \sum_x e^{i\varphi_{jx}} \sqrt{P(x|s_j)}\, |\sigma_{\lambda(x, s_j)}\rangle |x\rangle,$ (3)

where the first subspace contains the memory, the second the probe (measured after the interaction to produce the symbol for that timestep), $\{|\sigma_j\rangle\}$ are the quantum memory states (in one-to-one correspondence with the causal states $\{s_j\}$), and the phase factors $\varphi_{jx}$ are tunable parameters. Successive applications of $U$ on the quantum memory state and probe at each timestep will yield a string of outputs from the probe measurement that are statistically distributed according to the modelled process (after each timestep, the probe ancilla is either reset, or a fresh ancilla is introduced), as depicted in Fig. 2.

Figure 2: Unitary quantum models of stochastic processes. Repeated unitary interactions between a quantum memory and probe ancilla produce a string of stochastic outputs when the probe is measured. The specific form of the interaction depends on the particular process being modelled; the output statistics will then be as specified by this process.

The corresponding memory cost is called the quantum statistical memory, given by the von Neumann entropy of the quantum memory states:

$C_q := -\mathrm{Tr}(\rho \log_2 \rho),$ (4)

where $\rho = \sum_j P(s_j)\, |\sigma_j\rangle\langle\sigma_j|$. The title of quantum statistical complexity is reserved for the minimum of this quantity over all causal quantum models; there is as yet no systematic approach to finding this minimal model, however, and optimising over the phase factors is a cumbersome task. For this reason, we shall here use the best quantum models for which a systematic construction method is known: phaseless unitary quantum models Binder et al. (2018), given by Eq. (3) with all phase factors set to zero. Despite not generally being minimal, the corresponding quantum statistical memory of these models has still been suggested as a quantifier of structure Gu et al. (2012); Tan et al. (2014); Suen et al. (2017); Aghamohammadi et al. (2017b); Suen et al. (2018), often emphasising different features to the classical statistical complexity.
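As a small illustration of why non-orthogonal memory states reduce memory cost, the following sketch (with illustrative numbers of our own, not taken from the paper) compares the von Neumann entropy of an equal mixture of two non-orthogonal pure states against the Shannon entropy of the corresponding classical labels:

```python
import numpy as np

def entropy_bits(eigs):
    """Entropy in bits from a set of eigenvalues/probabilities."""
    eigs = np.asarray(eigs, dtype=float)
    eigs = eigs[eigs > 1e-12]
    return float(-np.sum(eigs * np.log2(eigs)))

# Two memory states with overlap c = 0.8 (illustrative value)
c = 0.8
sigma = [np.array([1.0, 0.0]), np.array([c, np.sqrt(1 - c**2)])]
p = [0.5, 0.5]

rho = sum(pj * np.outer(s, s) for pj, s in zip(p, sigma))
C_q = entropy_bits(np.linalg.eigvalsh(rho))   # von Neumann entropy
C_mu = entropy_bits(p)                        # classical cost: 1 bit
assert C_q < C_mu  # non-orthogonality lowers the memory requirement
```

For an equal mixture of two pure states with overlap $c$, the eigenvalues of the memory state are $(1 \pm c)/2$, so the quantum cost falls below 1 bit whenever the states are not orthogonal.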

III Inference protocol

We here introduce an inference protocol for the quantum statistical memory $\tilde{C}_q$ (we use tildes to represent estimated quantities) of the phaseless unitary quantum models that can be used to investigate structure in time-series data. The protocol is tailored specifically to take advantage of features of the specific model, bypassing the need to construct the ε-machine as an intermediate step. It is agnostic to the causal architecture of the process, and requires only an estimate of the Markov order.

The inference protocol is based on a set of postulated quantum memory states given by clustering pasts in which the last $L$ symbols are identical; we thus have a memory state $|\varsigma_{x_{-L:0}}\rangle$ for each of the $|\mathcal{A}|^L$ possible $L$-length words $x_{-L:0}$. The choice of $L$ should correspond to the estimated Markov order of the process (or at least, the effective Markov order – see below). From the data we then estimate the conditional probabilities $\tilde{P}(x_0|x_{-L:0})$, and implicitly define the quantum memory states to satisfy the interaction

$U |\varsigma_{x_{-L:0}}\rangle |0\rangle = \sum_{x_0} \sqrt{\tilde{P}(x_0|x_{-L:0})}\, |\varsigma_{x_{-L+1:1}}\rangle |x_0\rangle .$ (5)

A set of quantum memory states and corresponding interaction can be found through a recursive expression for the overlaps of the states and employing a reverse Gram-Schmidt procedure Binder et al. (2018). The estimated quantum statistical memory is then given by the von Neumann entropy of the corresponding stationary state of the memory:

$\tilde{C}_q := -\mathrm{Tr}(\tilde{\rho} \log_2 \tilde{\rho}),$ (6)

where $\tilde{\rho} = \sum_{x_{-L:0}} \tilde{P}(x_{-L:0})\, |\varsigma_{x_{-L:0}}\rangle\langle\varsigma_{x_{-L:0}}|$.
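The protocol can be sketched numerically. The following illustrative implementation (our own sketch, not the authors' code) estimates length-$2L$ word frequencies from a symbol string, forms the Gram matrix of the postulated memory states from the Bhattacharyya overlaps of their conditional future distributions, and returns the entropy of its spectrum; this Gram-matrix route relies on the spectral equivalence discussed in Sec. III.2:

```python
import numpy as np
from collections import Counter

def estimate_Cq(data, L):
    """Estimate the quantum statistical memory of a symbol string.

    Frequentist sketch: estimate length-2L word probabilities, form the
    Gram matrix of the postulated memory states (one per L-word), and
    take the entropy of its spectrum, which matches that of rho-tilde.
    """
    n = len(data) - 2 * L + 1
    joint = Counter(data[i:i + 2 * L] for i in range(n))
    joint = {w: c / n for w, c in joint.items()}

    pasts = sorted({w[:L] for w in joint})
    P_past = {a: sum(v for w, v in joint.items() if w[:L] == a) for a in pasts}

    # Conditional distributions over length-L futures, P(future | past)
    cond = {a: {w[L:]: v / P_past[a] for w, v in joint.items() if w[:L] == a}
            for a in pasts}

    # Overlaps of memory states are Bhattacharyya coefficients of futures
    G = np.zeros((len(pasts), len(pasts)))
    for i, a in enumerate(pasts):
        for j, b in enumerate(pasts):
            ov = sum(np.sqrt(cond[a].get(f, 0.0) * cond[b].get(f, 0.0))
                     for f in set(cond[a]) | set(cond[b]))
            G[i, j] = np.sqrt(P_past[a] * P_past[b]) * ov

    eigs = np.linalg.eigvalsh(G)
    eigs = eigs[eigs > 1e-10]
    return float(-np.sum(eigs * np.log2(eigs)))
```

For a period-2 string the two length-1 pasts have orthogonal conditional futures, giving an estimate of 1 bit; for an i.i.d. fair coin all pasts merge into a single memory state and the estimate is close to zero.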

To show that this protocol provides a faithful estimate of the quantum statistical memory, we first prove two properties of the above construction:

Self-merging of quantum memory states.
We show that when $L$ is at least as large as the Markov order, the overlap of quantum memory states assigned to different pasts in the same causal state is unity when exact probabilities are used.

Robustness of quantum statistical memory.
We show that the quantum statistical memory of the phaseless unitary quantum model is insensitive to small perturbations in the probabilities.

With these properties we can then discuss the accuracy of the inference protocol:


Inference protocol.
We indicate how the accuracy of our estimate of quantum statistical memory scales with the amount of data, and how it converges for sufficiently large data streams.

III.1 Self-merging of quantum memory states

We first show that the blind construction Eq. (5) will automatically adopt the causal architecture of the process (i.e., that pasts belonging to the same causal state are assigned to the same memory state) without explicit need to apply the causal equivalence relation Elliott et al. (2019a), provided that exact probabilities are used, and the chosen $L$ is at least as large as the Markov order of the process. That is, the quantum memory states we construct will correspond to the same states as would be obtained from the phaseless form of the model Eq. (3), but without prior knowledge of how the pasts are clustered into causal states. In turn, this means the blind construction will faithfully replicate the process, with the same quantum statistical memory.

To see this, let $R$ denote the Markov order of the process, and recall that this means $P(\overrightarrow{X}|X_{-R:0}) = P(\overrightarrow{X}|\overleftarrow{X})$. Since the Markov order can alternatively be expressed as the longest history length needed to determine the causal state, all pasts where the latest $R$ symbols are identical belong to the same causal state Mahoney et al. (2016). We can see that if $L \geq R$, the construction already correctly merges all pasts where the latest $L$ symbols are identical.

By analogy with the corresponding methods for phaseless unitary models Binder et al. (2018), we can express the overlaps of the memory states through the recursion

$\langle \varsigma_{x'_{-L:0}} | \varsigma_{x_{-L:0}} \rangle = \sum_{x_0} \sqrt{P(x_0|x_{-L:0})\, P(x_0|x'_{-L:0})}\, \langle \varsigma_{x'_{-L+1:1}} | \varsigma_{x_{-L+1:1}} \rangle .$ (7)

Iteratively applying this relation, we obtain that

$\langle \varsigma_{x'_{-L:0}} | \varsigma_{x_{-L:0}} \rangle = \lim_{M \to \infty} \sum_{x_{0:M}} \sqrt{P(x_{0:M}|x_{-L:0})\, P(x_{0:M}|x'_{-L:0})} ;$ (8)

if $L \geq R$ we then have

$P(x_{0:M}|x_{-L:0}) = P(x_{0:M}|\overleftarrow{x}),$ (9)

where the full pasts $\overleftarrow{x}$ and $\overleftarrow{x}'$ can be taken as any pasts with the correct corresponding last $L$ symbols. We are thus able to conclude that

$\langle \varsigma_{x'_{-L:0}} | \varsigma_{x_{-L:0}} \rangle = 1 \iff P(\overrightarrow{X}|\overleftarrow{x}) = P(\overrightarrow{X}|\overleftarrow{x}'),$ (10)

which can be seen as an instantiation of the causal equivalence relation, i.e., two pasts are mapped to the same memory state iff they have the same conditional future statistics.

III.2 Robustness of quantum statistical memory

We next show that the quantum statistical memory of our construction is robust to small perturbations of the probabilities. Consider mapping

$P(x_{0:L}) \to P(x_{0:L})\left(1 + \epsilon f(x_{0:L})\right),$ (11)

where $f$ governs the relative changes in the distribution for each string, and $\epsilon$ the strength of the perturbation. We here outline a proof that the perturbation to $C_q$ scales smoothly with $\epsilon$; full details may be found in Appendix A.

The Gram matrix of a quantum state $\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$ is defined as $G_{jk} := \sqrt{p_j p_k}\, \langle \psi_k | \psi_j \rangle$, and can be shown to have the same spectrum as $\rho$ Jozsa (2000); Horn and Johnson (2012); Riechers et al. (2016). As such, it is possible to define the Gram matrix of our construction as

$G_{x_{-L:0} x'_{-L:0}} = \sqrt{P(x_{-L:0})\, P(x'_{-L:0})}\, \langle \varsigma_{x'_{-L:0}} | \varsigma_{x_{-L:0}} \rangle,$ (12)

and correspondingly, from its spectrum calculate $C_q$.

Consider that we have $L \geq R$, such that it is possible to express the overlaps of the quantum memory states as

$\langle \varsigma_{x'_{-L:0}} | \varsigma_{x_{-L:0}} \rangle = \sum_{x_{0:L}} \sqrt{P(x_{0:L}|x_{-L:0})\, P(x_{0:L}|x'_{-L:0})} .$ (13)

Note that we only need consider $L$ steps into the future, as this uniquely determines the subsequent memory state, independent of the past. Now replace each of the probability distributions in this expression by their corresponding perturbed forms, which may be obtained from the marginals of Eq. (11). We can then calculate the perturbed form of the corresponding Gram matrix using this expression for the overlaps, and show that its spectrum varies smoothly with $\epsilon$. Since the von Neumann entropy is a continuous function of the spectrum of a state Nielsen and Chuang (2010), we thus find that it too smoothly deforms with $\epsilon$.

Hence, we can conclude that the quantum statistical memory is robust to small perturbations in the probability distributions. Due to the self-merging of our quantum memory states, we can also see that the quantum statistical memory of phaseless unitary quantum models is similarly robust in general. This is in contrast to the classical statistical complexity, which can vary discontinuously with the probabilities – notably, whenever the perturbation triggers a new merging of pasts into a causal state, or conversely, the splitting of a causal state.
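This contrast can be seen numerically in a minimal two-past example (illustrative numbers of our own): as the two conditional future distributions are perturbed apart, the quantum memory cost grows smoothly from zero, whereas a classical model's state count – and hence its statistical complexity – jumps discontinuously the moment any merging tolerance is exceeded:

```python
import numpy as np

def cq_two_pasts(fut_a, fut_b, w=(0.5, 0.5)):
    """C_q for two equally-likely pasts with given one-step future dists."""
    ov = sum(np.sqrt(pa * pb) for pa, pb in zip(fut_a, fut_b))
    G = np.array([[w[0], np.sqrt(w[0] * w[1]) * ov],
                  [np.sqrt(w[0] * w[1]) * ov, w[1]]])
    e = np.linalg.eigvalsh(G)
    e = e[e > 1e-12]
    return float(-np.sum(e * np.log2(e)))

# C_q grows smoothly from 0 as the futures separate; a classical model
# would jump from 1 state (0 bits) to 2 states (1 bit) at the tolerance.
vals = [cq_two_pasts([0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps])
        for eps in (0.0, 0.01, 0.05, 0.1)]
assert vals[0] < 1e-9 and all(a < b for a, b in zip(vals, vals[1:]))
```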

III.3 Inference protocol

With these two results in hand, we are now in a position to argue the accuracy of our inference protocol. By parsing the data and adopting a frequentist approach to estimate the marginal distribution of words of length $2L$ (for some chosen $L$), we are able to estimate the conditional probabilities needed for Eq. (5). Moreover, we can use these marginals to construct estimates for the probabilities needed to evaluate Eq. (12) (the conditional probabilities can be obtained from the marginals, with the assumption that $L$ is at least as large as the Markov order), and in turn, estimate $\tilde{C}_q$. From the results of the previous two subsections, we can be assured that this should be a faithful estimate, provided that the estimated marginals are close to the exact distributions, and $L \geq R$.

Let us first consider the error in our estimates of the marginals. With any inference from finite data there will of course be statistical fluctuations; this is not problematic for our inference protocol provided that these fluctuations are sufficiently small. These fluctuations can be treated in the same manner as the perturbations of the previous section, but with the perturbation terms now being stochastic variables. In Appendix B we show that the size of the error in the inferred Gram matrix and the estimated quantum statistical memory approximately scales as $\sqrt{|\mathcal{A}|^{3L}/N}$, where $N$ is the size of the data stream considered.

The next question is how we determine the choice of $L$. From the above, we see that taking too large an $L$ will lead to untenably large fluctuations. We must therefore cap it at some value where $|\mathcal{A}|^{3L} \ll N$; as a rough guideline we suggest $L \lesssim \log_{|\mathcal{A}|}(N)/3$. On the other hand, we require $L$ to be large enough that it matches or exceeds the Markov order of the process, in order to effect the (approximate) self-merging of quantum memory states.

When a process has a large, or even infinite, Markov order, it may not be possible to have an $L$ that satisfies both of these requirements. Nevertheless, while the Markov order tells us how far back into the past memory effects can persist, it does not inform us how strong they are. It is often the case that these long-range historical dependencies have only a minor influence on the future, and that the recent past is much more relevant. In such instances, the influence of the distant past can be thought of as a small perturbation to the statistics with respect to the recent past alone, and so from the previous subsection we can expect that they have minimal impact on the requisite quantum statistical memory. We therefore introduce the concept of an effective Markov order $R_{\mathrm{eff}}$, which encapsulates the idea that a sufficiently-long string of past observations that is shorter than the Markov order may nevertheless still be ‘good enough’ to capture most of the predictive information contained in the past.

We define the effective Markov order as the smallest length of a string of past observations for which the influence of considering an additional symbol one step further into the past does not exceed some threshold. Specifically, we define $R_{\mathrm{eff}}$ as the smallest integer $\ell$ that satisfies

$\left\langle D\!\left( P(X_0 | X_{-\ell:0}),\, P(X_0 | X_{-(\ell+1):0}) \right) \right\rangle \leq \delta,$ (14)

where $\delta$ is the parameter defining the threshold (strictly, we have a family of effective Markov orders for the process, parameterised by $\delta$), the expectation value is taken over the distribution of strings of length $\ell + 1$, and $D$ is some distance measure between probability distributions; for the purposes of this work we will use the trace distance $D(P, Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$. We can estimate the effective Markov order from the data, and use this as a guide for choosing a value for $L$ in the inference protocol.
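A frequentist sketch of estimating the effective Markov order from data in the spirit of Eq. (14) (our own illustrative implementation; the function names and the averaging over observed pasts are ours):

```python
from collections import Counter

def trace_distance(P, Q):
    """D(P, Q) = (1/2) * sum_x |P(x) - Q(x)| for dict-valued distributions."""
    keys = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in keys)

def effective_markov_order(data, delta, ell_max=6):
    """Smallest ell for which looking one extra symbol into the past
    changes the one-step predictive distribution by at most delta,
    averaged over observed pasts."""
    def predictive(k):
        # P(next symbol | last k symbols), from counts
        counts = Counter((data[i:i + k], data[i + k])
                         for i in range(len(data) - k))
        totals = Counter()
        for (past, sym), c in counts.items():
            totals[past] += c
        dists = {p: {s: c / totals[q] for (q, s), c in counts.items() if q == p}
                 for p in totals}
        return dists, totals

    for ell in range(1, ell_max + 1):
        short, _ = predictive(ell)
        longer, totals = predictive(ell + 1)
        norm = sum(totals.values())
        avg = sum((c / norm) * trace_distance(longer[p], short[p[1:]])
                  for p, c in totals.items())
        if avg <= delta:
            return ell
    return None
```

For a period-3 string such as repeated '011', one past symbol leaves the next output ambiguous but two suffice, so the estimate returns 2.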

IV Examples

We now demonstrate the efficacy of our protocol with two toy example processes: the $R$-$k$ golden mean and nemo processes. We use an exact HMM representation of the processes to generate a representative string of outputs, and infer $\tilde{C}_q$ directly from this time-series. We choose the initial state according to the stationary distribution of the HMM, such that the output statistics are representative of the stationary state of the process. We will look at how the estimate for the quantum statistical memory varies both with the amount of data $N$ and the history length $L$, highlighting the range of values that would be considered appropriate given the above considerations regarding the (effective) Markov order and expected size of fluctuations.
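Generating such a representative string from an edge-emitting HMM can be sketched as follows; for brevity we use the plain (Markov order 1) golden mean machine with illustrative transition probabilities, rather than the paper's 3-2 golden mean:

```python
import random

def sample_hmm(transitions, pi, n, seed=0):
    """Sample an output string from an edge-emitting HMM.
    transitions[state] = list of (probability, symbol, next_state)."""
    rng = random.Random(seed)
    state = rng.choices(list(pi), weights=list(pi.values()))[0]
    out = []
    for _ in range(n):
        probs, syms, nexts = zip(*transitions[state])
        i = rng.choices(range(len(probs)), weights=probs)[0]
        out.append(syms[i])
        state = nexts[i]
    return ''.join(out)

# Golden-mean-style machine: the word '00' never occurs
# (illustrative parameters; stationary distribution is (2/3, 1/3))
T = {'A': [(0.5, '0', 'B'), (0.5, '1', 'A')],
     'B': [(1.0, '1', 'A')]}
pi = {'A': 2 / 3, 'B': 1 / 3}
s = sample_hmm(T, pi, 10000)
assert '00' not in s
```

Starting the machine from its stationary distribution, as here, mirrors the choice made in the text so that the output statistics reflect the stationary process.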

IV.1 Golden mean process

We first look at the $R$-$k$ golden mean process family. Here, $R$ and $k$ are tunable parameters that correspond to the Markov order and cryptic order of the process respectively Crutchfield et al. (2009); Mahoney et al. (2009, 2011, 2016) (the cryptic order is a counterpart to the Markov order, describing the minimum length of past observations required to be certain of the present causal state, given that one knows the entire future). In what follows we will consider the 3-2 golden mean process specifically, as represented by its ε-machine in Fig. 3(a).

Figure 3: Golden mean process. (a) ε-machine for the 3-2 golden mean process; the label on each edge denotes the output symbol of the indicated transition between states, together with the probability with which it occurs. (b) Average trace distance for variation of the symbol a given number of steps in the past; hollow circles denote the points beyond which the statistics are considered undersampled, and the dashed line the threshold $\delta$. (c) Comparison of exact and estimated quantum statistical memory for different lengths of data stream; the variation with $L$ is shown for the estimated quantities, and the vertical line indicates the effective Markov order for this threshold. For plots (b) and (c) we take .

In Fig. 3(b) we plot the expectation of the trace distance between the conditional distributions for pasts differing in a symbol increasingly far into the past, from which we can infer an effective Markov order for the process as defined in Eq. (14). The limits of finite data are already visible in this plot, with the instability clear when the history length is too large relative to $N$, due to undersampling of the process statistics. The hollow circle on each plot represents the point at which the cap on history length from statistical fluctuations is reached – beyond this point we consider the statistics to be undersampled. Setting a threshold $\delta$, we would assign an effective Markov order of 3, which aligns with the true Markov order of the process. Fig. 3(c) displays the estimated $\tilde{C}_q$; we see that at the Markov order the estimate is very close to the exact value of $C_q$, with statistical noise gradually degrading the quality of the estimate at larger $L$ when we have insufficient data. This highlights both the efficacy of the protocol, and the importance of selecting an appropriate value for $L$.

IV.2 Nemo process

As a second example, we consider the nemo process, which can be represented by its ε-machine as in Fig. 4(a). A key feature of this process is that it has infinite Markov order: a contiguous string of zeros of any length cannot be exactly synchronised to a causal state. As such, with this example it is not possible to choose an $L$ that matches the Markov order of the process. Nevertheless, we will show that the effective Markov order can provide a suitable proxy.

Figure 4: Nemo process. (a) ε-machine for the nemo process. (b) Average trace distance for variation of the symbol a given number of steps in the past; hollow circles denote the points beyond which the statistics are considered undersampled, and the dashed line the threshold $\delta$. (c) Comparison of exact and estimated quantum statistical memory for different lengths of data stream; the variation with $L$ is shown for the estimated quantities, and the vertical line indicates the effective Markov order for this threshold. For plots (b) and (c) we take and .

Fig. 4(b) shows the expected trace distance for variation in the symbol a given number of steps into the past; setting a tolerance $\delta$ we assign an effective Markov order $R_{\mathrm{eff}}$. Examining the estimated quantum statistical memory [Fig. 4(c)], we see that setting $L = R_{\mathrm{eff}}$ does indeed appear to provide an accurate estimate of $C_q$, striking a balance between allowing sufficiently long histories to capture most of the past dependency, while not going as far as to underfit. Note that in this case we should consider the shortest data stream to provide insufficient data for a good estimate, as the prescribed cap on $L$ is significantly smaller than the effective Markov order – practically, this could be deduced from the trace distance, which never drops below the threshold value.

V Discussion

We have introduced a protocol for estimating the information cost of quantum simulation of stochastic processes. We have shown that both this quantity and our protocol are robust to small statistical perturbations. This provides a means to characterise structure in a process according to the amount of (quantum) resources needed to capture its behaviour, analogous to corresponding classical quantities Crutchfield and Young (1989); Shalizi and Klinker (2004); Strelioff and Crutchfield (2014). Moreover, this provides a key step towards blind construction of quantum models that efficiently replicate the behaviour of such processes.

An essential consideration to be made in this latter vein is the capabilities of current and near-term quantum technologies. Our inference protocol accurately captures the information that must be stored by a quantum model of a process – appropriately indicating the amount of structure in the process – at the expense of indicating a multitude of memory states, typically in excess of the number of causal states. The number of memory states is parameterised by a companion metric, the topological memory; quantum advantages can also be found in terms of this measure Thompson et al. (2018); Ghafari et al. (2019); Liu et al. (2019); Elliott et al. (2019b); Loomis and Crutchfield (2019). Future work will investigate methods of compression in this parameter via truncation in terms of the quantum state space, opening up the possibility to implement the inferred constructions experimentally. The accuracy of these inferred models can then be explored using recently-developed quantifiers of process distinguishability Yang et al. (2019).

We thank Chew Lock Yue, Suen Whei Yeap, and Suryadi for discussions. This work was funded by the Lee Kuan Yew Endowment Fund (Postdoctoral Fellowship), Singapore Ministry of Education Tier 1 grant RG190/17, FQXi-RFP-1809 from the Foundational Questions Institute and Fetzer Franklin Fund, a donor advised fund of Silicon Valley Community Foundation, Singapore National Research Foundation Fellowship NRF-NRFF2016-02, and NRF-ANR grant NRF2017-NRF-ANR004 VanQuTe. T.J.E. thanks the Centre for Quantum Technologies for their hospitality.

Appendix A Detailed proof of robustness of quantum statistical memory

Recall from the main text that the Gram matrix of a quantum state $\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$ is defined as $G_{jk} := \sqrt{p_j p_k}\, \langle \psi_k | \psi_j \rangle$ Jozsa (2000); Horn and Johnson (2012); Riechers et al. (2016). If one considers a purification $|\Psi\rangle = \sum_j \sqrt{p_j}\, |\psi_j\rangle |j\rangle$ of the original state, then $\rho$ can be recovered by taking a partial trace over the second subsystem, and $G$ by tracing out the first. The Gram matrix thus has the same spectrum as the original state, and so can be used to calculate functions of this spectrum, such as the entropy. As stated in Eq. (12), for our construction the Gram matrix is given by


Using Eq. (13), this can be expanded as


We now examine how this changes when the probabilities are replaced by their perturbed versions $P(x_{0:L})(1 + \epsilon f(x_{0:L}))$. Consider expanding out the square root of the product of two such perturbations:

$\sqrt{\left(1 + \epsilon f(x_{0:L})\right)\left(1 + \epsilon f(x'_{0:L})\right)} = 1 + \frac{\epsilon}{2}\left( f(x_{0:L}) + f(x'_{0:L}) \right) + O(\epsilon^2).$
Substituting this into Eq. (A), we obtain


Thus, we can write




From Weyl’s inequality, it then follows that the perturbations to the eigenvalues of $G$ are bounded by the spectral norm of the perturbation matrix Horn and Johnson (2012). Clearly, this norm scales with $\epsilon$, and so the perturbation to the spectrum of $G$ varies continuously with $\epsilon$. Finally, since the von Neumann entropy of a quantum state is given by the Shannon entropy of its spectrum – and is a continuous function of it Nielsen and Chuang (2010) – the quantum statistical memory is smoothly deformed by the perturbation, and so is robust.
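The Weyl bound invoked here can be checked numerically on a random Hermitian matrix and perturbation (an illustrative sanity check of the inequality, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = (A + A.T) / 2   # Hermitian "Gram" matrix
E = rng.normal(size=(5, 5)); E = (E + E.T) / 2   # perturbation direction

for eps in (1e-1, 1e-2, 1e-3):
    shift = np.max(np.abs(np.linalg.eigvalsh(A + eps * E)
                          - np.linalg.eigvalsh(A)))
    # Weyl: each eigenvalue moves by at most the spectral norm of eps*E
    assert shift <= eps * np.linalg.norm(E, 2) + 1e-12
```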

Appendix B Scaling of statistical noise

We now examine the effects of statistical noise in our estimates of word probabilities on $\tilde{C}_q$. These fluctuations can be considered as a (stochastic) perturbation of the form of Eq. (11), allowing us to employ the results above. We will set $\epsilon$ to 1, folding the full scaling of the perturbation into the stochastic terms $f$.

Recall that the corrections to the eigenvalues arising from the perturbation are bounded by the spectral norm of the perturbation matrix – i.e., its largest eigenvalue – which in turn is bounded by the product of the dimension of the matrix with its largest element. The elements of this matrix are given in Eq. (20); to assess how they scale we will replace the statistical variables by their standard errors. Since the word probabilities $P(x_{0:L})$ are described by binomial distributions (a randomly selected string can be assigned as either being the given word or not), the standard error is given by

$\sigma_{x_{0:L}} = \sqrt{\frac{P(x_{0:L})\left(1 - P(x_{0:L})\right)}{N}} .$
Inserting this into Eq. (20), we can assess the scale of the elements. The probability of obtaining a given word of length $L$ falls off approximately exponentially with $L$. Let us assume all long words are roughly evenly distributed, and take $P(x_{0:L}) \approx |\mathcal{A}|^{-L}$. Considering that there are roughly $|\mathcal{A}|^L$ terms in the sum, we find that the largest element scales as $\sqrt{|\mathcal{A}|^L / N}$. Finally, since the dimension of the matrix scales as $|\mathcal{A}|^L$, the spectral norm (and thus the bound on the size of perturbations to the spectrum of the Gram matrix) scales as $\sqrt{|\mathcal{A}|^{3L} / N}$.