1 Introduction
Stochastic process models of dynamical systems play a central role in scientific investigations across a broad range of disciplines, from computer science, to physics, to biology. Continuoustime Markov chains (CTMCs), in particular, have emerged over the last two decades as an especially powerful class of models to capture the intrinsic discreteness and stochasticity of biological systems at the single cell level. The ensuing crossfertilisation of stochastic systems biology with methods emerging from the formal modelling of computer systems has led to the dramatic explosion of a new interdisciplinary field at the intersection of computer science and biology.
Despite the unquestionable success of these efforts, scaling formal analysis techniques to larger systems remains a major challenge, since such systems usually result in very large statespaces, making subsequent analysis particularly onerous. In most cases, retrieving the evolution of the state distribution, while theoretically possible (by solving the KolmogorovChapman equations), in practice is prohibitively expensive. These hurdles also affect statistical techniques based on Monte Carlo sampling, since trajectories from CTMCs with a large statespace typically exhibit very frequent transitions and therefore require the generation of a very large number of random numbers. A popular alternative is therefore to rely on model approximations, by constructing alternative models which in some sense approximate the system’s behaviour. In the special case of CTMCs with a population structure (pCTMCs), fluid approximations
replacing the original dynamics with a deterministic set of ordinary differential equations (ODEs) have seen great success, due to their scalability and their well understood convergence properties
[1, 2, 3]. Such approximations rely on the particular structure of the statespace of pCTMCs; to the best of our knowledge, fluid approximations for general CTMCs have not been developed.In this paper, we propose a general strategy to obtain a fluid approximation for any CTMC. Our approach utilises manifold learning approaches, popular in machine learning, to embed the transition graph of a general CTMC in a Euclidean space. A powerful Bayesian nonparametric regression method complements the embedding, by inferring a drift vector field over the Euclidean space to yield a continuous process, which we term the
geometric fluid approximation. Crucially we show that in a simple pCTMC case our geometric fluid approximation is consistent with the standard fluid approximation. Empirical results on a range of examples of CTMCs without a population structure show that our approach captures well the average behaviour of the CTMC trajectories, and can be useful to solve efficiently approximate reachability problems.2 Background theory and related work
We briefly review here the mathematical definition of CTMCs, as well as the foundations of fluid approximation for population CTMCs (pCTMCs) [4, 5, 3], highlighting the specific aspects of pCTMCs that enable such a construction.
2.1 Continuous time Markov chains
A continuous time Markov Chain (CTMC) is a continuoustime, Markovian stochastic process over a finite statespace . The process is characterised by its generator matrix
, encoding the infinitesimal probability of transitioning between any two states. A more formal definition of CTMC
[5] can be given as follows:Definition 2.1.
Let , be a rightcontinuous process with values in a countable set . Let be a generator matrix on with jump matrix , such that for :
and  
The process is a continuoustime Markov chain with initial state distribution and generator matrix (CTMC()), if it satisfies:

; and

for , where and , such that , and .
This definition emphasises the socalled holding (or residence) times
, i.e. the random time that the system spends into any one state. The exponential distribution of the holding times is a simple consequence of the Markovian nature of the process; it also naturally suggests an exact algorithm to sample trajectories of CTMCs by drawing repeatedly exponential random numbers. This consideration forms the basis of the celebrated
Gillespie’s algorithm, widely used in the field of systems biology and known as the stochastic simulation algorithm (SSA) [6].An important special instance of the CTMC is the socalled population CTMC (pCTMC). Population CTMCs model systems consisting of indistinguishable agents belonging to different agent types, or species. The statespace is therefore identified with a vector of integer numbers, with each entry counting the species population, and transitions occurring at different rates depending on the counts of each agent type.
The transitions in a pCTMC can be regarded as occurrences of chemical reactions, written as
where for every species , particles of are consumed and particles are created (species unit vectors are orthonormal: ). The reaction rate constant is a factor in the propensity function , which constructs the matrix of the CTMC.
2.2 Continuous relaxation and the fluid limit
As discussed in the previous section, the Markovian nature of CTMCs naturally provides an exact sampling algorithm for drawing CTMC trajectories. The same Markovian nature also leads to a set of ordinary differential equations (ODEs) governing the evolution of the singletime marginal state probability, the celebrated ChapmanKolmogorov equations (CKE), which in the case of pCTMCs go under the name of Master equation. Unfortunately, such equations are rarely practically solvable, and analysis of CTMCs is often reliant on computationally intensive simulations.
In the case of pCTMCs, a more concise description in terms of the collective dynamics of population averages is however available. Starting with the seminal work of van Kampen [7], and motivated by the interpretation of pCTMCs as chemical reaction systems, several approximation schemes have been developed which relax the original pCTMC to a continuous stochastic process; see [8] for a recent review.
In this paper, we are primarily interested in the socalled fluid approximation, which replaces the pCTMC with a set of ODEs, which capture the average behaviour of the system. Fluid approximations have been intensely studied and their limiting behaviour is well understood, providing specific error guarantees. There are two characteristics of a pCTMC which are instrumental to enabling the fluid approximation. Firstly, there is a natural interpretation of the states as points in a vector space, where each dimension represents a species. Secondly, a drift vector field can be naturally defined by extending the propensity function to be defined on the whole vector space, which is a polynomial function of the number of agents of each type (i.e. polynomial function of the elements of the system state vector).
Established guarantees
Following Darling and Norris [3], we examine and formalise the aspects of pCTMCs which render them especially amenable to the fluid approximation. As mentioned, the first is that pCTMC statespaces are countable and there exists an obvious ordering. We can therefore write a trivial linear mapping from the discrete, countable statespace to a continuous Euclidean space , where is the number of agent types in the system.
The second aspect is that rates of transition from each state to all others (i.e. elements of the matrix) can be expressed as a function of the state vector . A drift vector can be defined as
for each . Since is some parametric function of in pCTMCs (due to the indistinguishable nature of the agents) the definition of the drift vector can be extended over the entire Euclidean space to produce the drift vector field , where . The authors show that such constructions will produce a fluid approximation solution obeying some error guarantees, under the conditions that:

initial conditions converge, i.e.
;

mean dynamics converge, i.e. is a Lipschitz field, such that
as ; and

noise converges to zero, i.e. that ,
and
where are positive constants.
There are many ways to satisfy the above criteria, but a common one (used in pCTMCs) is “hydrodynamic scaling”, where the increments of the state Markov process mapped to the Euclidean space are and the jump rate is . The criteria above are derived formally in [3, 9] and ensure that:
The first exit time from a suitably selected domain of the Euclidean mapping of the Markov chain statespace , converges in probability to the first exit time of the fluid limit.
Canonical embedding of pCTMCs
In the canonical embedding for continuous relaxation of pCTMCs, we construct an Euclidean space, where each dimension corresponds to the concentration of each species in the system. The states are then given coordinates for the dimension and are uniformly positioned in at every interval. The motivation is that in the limit of , the distance between neighbouring states will tend to 0 in the embedding, and jump sizes will similarly vanish, producing an approximately continuous trajectory of the system in the continuous concentration space.
3 Methodology
As discussed in the previous section, the fluid approximation of pCTMCs is critically reliant on the structure of the statespace of pCTMCs being isomorphic to a lattice in . This enables the definition of a drift vector field, which can be then naturally extended to the whole ambient space and, under mild assumption, leads to convergence under suitable scaling. Neither of these ingredients are obviously available in the general case of CTMCs lacking a population structure.
In this section, we describe the proposed methodology for a geometric fluid approximation for CTMCs. We motivate our approach by describing an exact, if trivial, general embedding of a CTMC’s statespace into a very highdimensional space. Such an embedding however affords the nontrivial insight that suitable approximate embeddings may be obtained considering the spectral geometry of the generator matrix. This provides an unexpected link with a set of techniques from machine learning, diffusion maps, which embed graphs into Euclidean spaces. The geometry of diffusion maps is well studied, and their distance preservation property is particularly useful for our purpose of obtaining a fluid approximation.
Diffusion maps however provide only one ingredient to a fluid approximation; they do not define an ODE flow over the ambient Euclidean space. To do so, we use Gaussian Process regression: this provides a smooth interpolation of the dynamic field between embedded states. Smoothness guarantees that nearby states in the CTMC (which are embedded to nearby points in Euclidean space by virtue of the distance preservation property of diffusion maps) will have nearby drift values, somewhat enforcing the pCTMC property that the transition rates are a function of the state vector.
This twostep strategy provides a general approach to associate a deterministic flow on a vector space to a CTMC. We empirically validate that such flow indeed approximates the mean behaviour of the CTMCs on a range of examples in the next section. Prior to the empirical section, we prove a theorem showing that, in the special case of pCTMCs of birth/ death type, our geometric construction returns the same fluid approximation as the standard pCTMC construction, providing an important consistency result.
3.1 Eigenembeddings of CTMCs
Trivial embedding of CTMCs
Consider a CTMC with initial distribution and generator matrix , on countable statespace . The single time marginal over at time of the process obeys the CKE:
(1) 
where is a column vector. Given an arbitrary embedding of the states in some continuous space, , the projected mean , obeys:
(2)  
(3) 
where refers to the coordinate of state . In general, step (2) is only possible for , with the identity matrix (i.e. with ).
We note that choosing the trivial embedding (i.e. each state mapped to the vertex of the probability ()simplex), equates the fluid process to the original CKE:
(4) 
The fluid approximation
For any embedding , the standard fluid approximation defines the drift at any state , to be:
(5) 
where is a matrix, and is the th row of .
In order to extend the drift over the entire space , we let be the transition kernel between any point and any state . Then we naturally define the continuous vector field to be
(6) 
Trivially, yields the original CKE as shown above,
(7) 
Similarly, in the fluid approximation with , and , one has
(9) 
We can therefore claim the following: in the case of , or for a symmetric , the fluid approximation process is exactly equivalent to the projected mean. This suggests that an approximate, low dimensional representation might be obtained by truncating the spectral expansion of the generator matrix of the CTMC. Spectral analysis of a transport operator is also the approach taken by diffusion maps
, a method which is part of the burgeoning field of manifold learning for finding lowdimensional representations of highdimensional data.
Markov chain as a random walk on a graph
Another avenue to reach the same conclusion is to consider the CTMC as a random walk on an abstract graph, where vertices represent states of the chain, and weighted directed edges represent possible transitions. From this perspective, it is natural to seek an embedding of the graph in a suitable lowdimensional vector space; it is well known in the machine learning community that an optimal (in a sense specified in Section 3.2) embedding can be obtained by spectral analysis of the transport operator linked to the random walk on the graph. We expect that if the graph geometry is a discrete approximation of some continuous space, the latter will serve well as a continuous statespace for a fluid limit approximation, when endowed with an appropriate drift vector field to capture the nongeometric dynamics in the graph.
3.2 Diffusion maps
A natural method to embed the CTMC states in continuous space for our purposes is diffusion maps [10, 11, 12, 13, 14]. This is a manifold learning method, where the authors consider a network defined by a symmetric adjacency matrix, with the goal of finding coordinates for the network vertices on a continuous manifold (as is usually the case with similarities of points in highdimensional spaces).
Diffusion on a manifold
The method follows from taking the normalised similarities to be the transition kernel of a diffusion process, evolving on a hidden manifold where the network vertices lie and with a smooth boundary . Resting on this, a family of diffusion operators are constructed which can be spectrally analysed to yield coordinates for each vertex on the manifold. The continuous operators which are theoretically constructed are assumed to be approximated by the analogous discrete operators which are constructed from data. The method can be thought to optimally preserve the normalised diffusion distance of the diffusion process on the highdimensional manifold, as Euclidean distance in the embedding. Diffusion distance between two vertices at time is defined to be the distance between the probability densities over the statespace, each initialised at respectively, and after a time has passed:
where is a Hilbert space in which the distance is defined, with , the inverse of the steadystate distribution .
Diffusion with drift for asymmetric networks
The methodology of diffusion maps has been extended in [15] to deal with learning manifold embeddings for directed weighted networks. Given an asymmetric adjacency matrix, the symmetric part is extracted and serves as a discrete approximation to a geometric operator on the manifold. Spectral analysis of the relevant matrix can then yield embedding coordinates for the nodes of the network. In the same manner as for the original formulation of diffusion maps a set of backward evolution operators are derived, the two relevant ones being:
(10)  
and  (11) 
The operators are parametrised by , which determines how affected the diffusion process on the manifold is by a sampling potential, . Choosing allows us to spectrally analyse a discrete approximation to the LaplaceBeltrami operator , separating it from the density dependent term in the diffusion operator .
Diffusion maps for CTMCs
For an arbitrary CTMC(), we regard to be a discrete approximation of the operator . However, it is unclear how one can extract the geometrically relevant component under a hidden potential and parameter . In practice, therefore, we assume a uniform measure on the manifold, i.e. constant , which renders a discrete approximation of (the choice of no longer matters); further, we take the sampling transition kernel corresponding to this operator to be composed of a symmetric and antisymmetric part (without loss of generality) which renders . This contains the relevant geometric information about the network, with the first eigenvectors of the operator used as embedding coordinates in a dimensional Euclidean space (ignoring the first eigenvector which is trivial by construction). A detailed exposition of the method as it relates to our purposes of embedding a Markov chain network can be found in Appendix A.
It should be noted that, while diffusion maps have been used to construct lowdimensional approximations of highdimensional SDEs [14], and to embed a discretetime Markov chain in continuous space with an accompanying advective field [15], doing the same for a continuoustime Markov chain has not been attempted. Distinctively, the focus of that work was not to clear a path between discrete and continuous state Markov processes, but rather the lowdimensional embedding of processes or sample points. In terms of the convenient table presented in [12] and restated here in Table 1
, we seek to examine the omitted entry that completes the set of Markov models; this is the third entry added here to the original table, taking
and the limit to be the case of a CTMC with finite generator matrix .Case  Operator  Stochastic Process 

finite
matrix 
RW in discrete space
discrete in time (DTMC) 

operators

RW in continuous space
discrete in time 

infinitesimal generator
matrix 
Markov jump process; discrete in space, continuous in time 

infinitesimal
generator 
diffusion process
continuous in space & time 
3.3 Gaussian processes for inferring drift vector field
Diffusion maps provide a convenient way to embed the CTMC graph into a Euclidean space ; however, the pushforward CTMC dynamics is only defined on the image of the embedding, i.e. where the embedded states are. In order to define a fluid approximation, we require a continuous drift vector field to be defined everywhere in . A natural approach is to treat this extension problem as a regression problem, where we use the pushforward dynamics at the isolated state embeddings as observations. We therefore use Gaussian processes (GPs), a nonparametric Bayesian approach to regression, to infer a smooth function that has the appropriate drift vectors where states lie.
A Gaussian process is a collection of random variables
indexed by a continuous quantity , which follows a distribution over a family of functions . Over the Hilbert space, the distribution can be thought of as an infinitedimensional Gaussian distribution over function values, where each dimension corresponds to a point on the domain of the function. We write
where is the mean function, and is the covariance kernel of the distribution over . The choice of kernel acts as the inner product in the space of functions , and so determines the kind of functions over which the distribution is defined. Certain kernels define a distribution over a dense subspace of , and we therefore say that the GP is a universal approximator — it can approximate any function in arbitrarily well. One such kernel is the squared exponential:
where the constants and
are hyperparameters: the
amplitude and lengthscale respectively.Gaussian process regression
Suppose we observe evaluations of an (otherwise hidden) function, at points
of the function’s domain. Once an appropriate prior is established, we are able to perform Bayesian inference to obtain a posterior distribution over possible functions consistent with our observations. In Gaussian process regression, the prior is the distribution given by the kernel. Prediction of the function value
at an unobserved domain point , conditioning on observations at points , is given in terms of the distribution:Since the integral involves only normal distributions, it is tractable and has a closed form solution, which is again a normal distribution. The observations may also be regarded as noisy, which will allow the function to deviate from the observed value in order to avoid extreme fluctuations. Using an appropriate noise model (Gaussian noise), retains the tractability and normality properties. Usually the mean of the predictive distribution is used as a point estimate of the function value. For a comprehensive understanding of Gaussian processes for regression purposes, we refer to
[16].In our case, the choice of kernel and its hyperparameters is critical, especially when the density of states is low. In the limit of infinite observations of the function, the Gaussian process will converge to the true function over , if the function is in the space defined by the kernel, regardless of the hyperparameters chosen. However, the number of states we embed is finite and so the choice of an appropriate prior can greatly aid the Gaussian process in inferring a good drift vector field. Here, we use the standard squared exponential kernel with a different lengthscale for each dimension, and select hyperparameters which optimise the likelihood of the observations. The optimisation is performed via gradient descent since the gradient for the marginal likelihood is available.
3.4 Consistency result
The geometric fluid approximation scheme is applicable in general to all CTMCs; it is therefore natural to ask whether it reduces to the standard fluid approximation on pCTMCs. We have the following result.
Theorem 3.1.
Let be a pCTMC whose underlying transition graph is a multidimensional grid graph. The geometric fluid approximation of coincides with the standard fluid approximation in the hydrodynamic scaling limit.
The proof relies on the explicit computation of the spectral decomposition of the Laplacian operator of a grid graph [17], and appeals to the universal approximation property of Gaussian Processes [16]. We conjecture this result to hold for general pCTMCs. See Appendix A.2 for the full proof.
Intuitively, away from the boundaries of the network, the coordinates of the embedded states approach the classical concentration embedding, where each dimension corresponds to a measure of concentration for each species. As the network grows (i.e. allowing larger maximum species numbers in the statespace of the chain) states are mapped closer together, reducing jump size, but preserving the ordering. The spacing of states near the centre of the population size is almost regular, approaching the classical density embedding, and the GP smoothing will therefore converge to the classical extended drift field.
4 Empirical observations
Experimental evidence of our geometric fluid approximation is necessary to give an indication of the method’s validity, and a better intuition for its domain of effectiveness. We apply the geometric fluid approximation to a range of CTMCs with differing structure, and present the experimental results in this section. The CTMC models we used are defined in Section 4.1.
There is no absolute way to assess whether the method produces a good approximation to the true probability density evolution; we therefore focus on two comparisons: how close the geometric fluid trajectory over time is to the empirical mean of the original CTMC, mapped on the same statespace (Section 4.2); and how close the firstpassage time (FPT) estimate from the fluid approximation is to the true FPT cumulative density function (estimated by computationally intensive Monte Carlo sampling; Section 4.4).
Further, we demonstrate in Section 4.3 how the method is applicable to a subset of the CTMC graph, such that only a neighbourhood of the statespace is embedded. This may result in fluid approximations for graphs whose global structure is not particularly amenable to embedding in a lowdimensional Euclidean space, and so is useful for gauging the behavioural characteristics of the system near a section of the statespace.
In all figures in this section, red lines are solutions of our geometric fluid approximation, obtained via numerical integration of the drift vector field as inferred by GP regression, and blue lines are the mean of CTMC trajectories mapped to the embedding space, which were obtained via Gillespie’s exact stochastic simulation algorithm (SSA) [6]. Finally, in figures showing trajectories on the diffusion maps (DM) manifold, grey line intersections are embedded states (the grey lines being the possible transitions, or edges of the network).
4.1 Models
We examine an array of models to assess the applicability domain of our method. The models are defined below and empirical comparisons for each are presented throughout this section.
Two species birthdeath processes
This model describes two independent birthdeath processes for two species, and serves as a basic sanity check. The CTMC graph has a 2D grid structure and in this sense resembles the system in Theorem 3.1. In the usual chemical reaction network notation, we write:
for the two species , and note that there is a system size variable such that the count for each species cannot exceed ; this produces a finitestate CTMC that can be spectrally decomposed and embedded. Note further that the birth process involves no particles here, and so transitioning from state to state (or from to ) occurs at the same rate of per second . Conversely, death processes are unimolecular reactions, such that transitioning from to occurs at a rate of per second , as the chemical reaction network interpretation dictates.
Two species LotkaVolterra model
This is a LotkaVolterra model of a predatorprey system. Allowed interactions are prey birth, predators consuming prey and reproducing, and predator death. The interactions with associated reaction rates are defined below in the usual chemical reaction network notation:
where prey is represented by species (rabbits) and predators by species (foxes), with maximum predator and prey numbers of .
SIRS model
We describe a widely used stochastic model of disease spread in a fixed population, wherein agents can be in three states: susceptible, infected, and recovered () and a contagious disease spreads from infected individuals to susceptible ones. After some time, infected individuals recover and are immune to the disease, before losing the immunity and reentering the susceptible state. We define a pCTMC for the process as follows:
where the constants have been chosen such that the ODE steady state is reached some time after s. The state of the pCTMC at time is , where refers to the number of agents in state at time , and so on for all species.
Genetic switch model
This is a popular model for the expression of a gene, when the latter switches between two activation modes: active and inactive [18, 19]. While active, the gene is transcribed into mRNA at a much faster rate than while inactive (factor of ). The gene switches between the two modes stochastically with a slow rate. We have the following reactions:
where the active and inactive modes are represented by the species and respectively, with a maximum count of 1. Despite being able to express this model in the usual chemical reaction network language, we emphasize that the binary nature of the switch prohibits usual scaling arguments for reaching the fluid limit.
4.2 Assessing fluid solution and mean trajectory in embedding space
In our geometric fluid approximation, we create a map using directed diffusion maps to embed the CTMC states into a Euclidean space of small dimensionality, and use Gaussian process regression to infer a drift vector field over the space. The resulting continuous trajectories, which we refer to as the geometric fluid approximation, are in this section compared to average trajectories of the CTMC systems, projected on the same space. The latter are obtained by drawing 1000 trajectories of the CTMC using the SSA algorithm, and taking a weighted average of the state positions in the embedding space.
Our geometric fluid approximation does well for pCTMC models, where we know that the statespace can be naturally embedded in a Euclidean manifold. This is especially true for systems like the independent birthdeath processes of two species, which do not involve heavy asymmetries in the graph structure. The more the structure deviates from a pCTMC and the more asymmetries in the structure, the larger the deviations we expect from the mean SSA trajectory. Additionally, we expect large deviations in the case of bimodal distributions over the statespace, as is the general case for fluid approximations. This is because the latter are pointmass approximations of a distribution, and so are naturally more suited to approximate unimodal, concentrated densities.
Two species birthdeath processes
As a sanity check, we examine how our method approximates the mean trajectory of the trivial system of two independent birthdeath processes described above. The true distribution for such a system is unimodal in the usual concentration space, and the graph has the structure of a 2D grid lattice with no asymmetries. As shown in Figure 1, the geometric fluid approximation is very close to the empirical mean trajectory, which supports our consistency theorem and expectations for agreement in the case of symmetric graphs.
LotkaVolterra model
We perform our geometric fluid approximation for the nontrivial case of a LotkaVolterra system, which models a closed predatorprey system as described above. The asymmetric consumption reaction distorts the grid structure representative of the Euclidean square two species space. Therefore, the manifold recovered is the Euclidean square with shrinkage along the consumption dimension — more shrinkage is observed where predators and prey numbers are higher, since this implies faster consumption reactions. We observe in Figure 2 that the fluid estimate keeps close to the mean initially and slowly diverges; however, the qualitative characteristics of the trajectory remain similar.
SIRS model
The SIRS model gives us the opportunity to compare trajectories in the embedding space of the geometric fluid, with trajectories in the concentration space used by the standard fluid approximation. We observe in Figure 3 good agreement with the empirical mean trajectory for both fluid methods.
The classical fluid trajectory (Figure 2(a)) is attainable in terms of the concentration of each species; it evolves according to coupled ODEs:
(12)  
(13)  
(14) 
where , and is the total population. Increasing linearly scales the ODE solution without affecting the dynamics; the SSA average converges to the ODE solution as . Similarly, Figure 2(b) shows the fluid solution in obtained by our geometric fluid approximation.
Genetic switch model
The model of a genetic switch is a departure from the usual pCTMC structure, since the binary switch introduces very slow mixing between two birthdeath processes each with a different fixed point. The bimodality of the resulting steadystate distribution is problematic to capture for any pointmass trajectory, and quickly leads to divergence of the fluid trajectory from the mean. With the particularly slow switching rate of s, our method produces fluid trajectories close to the mean trajectory for up to s, mostly because the mixing is very slow and the distribution remains relatively concentrated for a long time (Figure 4). However, with the faster rate of s, our fluid approximation quickly diverges from the mean trajectory (Figure 5), as the expected result of faster mixing.
pCTMC perturbations
It is expected that the method will perform well for CTMCs that are in some sense similar to a pCTMC, but cannot be exactly described by a chemical reaction network. We therefore demonstrate how the method performs for perturbations of a LotkaVolterra system. To achieve the perturbation, we add noise to every existing transition rate (nonzero element of ) of the LotkaVolterra system we had above. The perturbed transition matrix is described in terms of the LotkaVolterra matrix by
(15) 
for all , where , and as usual. The projection in Figure 6 shows that our method performs reasonably well near the pCTMC regime, where no classical continuous statespace approximation method exists.
A different kind of perturbation is achieved by randomly removing possible transitions of the original pCTMC. This amounts to setting some offdiagonal elements of the matrix to 0, and readjusting the diagonal so that all rows sum to 0. In order to avoid creating absorbing states or isolated states, we remove transitions randomly with a probability of 0.1. Our method performs reasonably under both kinds of perturbations, as seen in Figure 7.
4.3 Embedding a subset of the system
The empirical success of the method on perturbed pCTMC systems encouraged further exploration in cases where there is no global continuous approximation method, but the CTMC graph has regions which resemble a pCTMC structure, or are otherwise suitable for embedding in a continuous space. Consequently, we sought to embed only a subset of the statespace of a CTMC. Embedding statespace subsets can be useful for CTMCs that have a particularly disordered global structure (e.g. require many dimensions, or have areas on the manifold with low density), but which may contain a neighbourhood of the statespace that better admits a natural embedding. Additionally, one could introduce coffin states near the boundary of a pCTMC to apply the method on reachability problems.
A subset includes every reachable state within transitions from a selected root state , denoted as . Transitions from or to states outside the selected subset are ignored, and the remaining matrix is embedded in . The drift vectors on boundary states lack all components of transitions outside the subset, and so the probability flux is inaccurate on the boundary. Figure 8 shows the LotkaVolterra model subset , embedded in . We can see that the behaviour near the root state is close to the projected sample mean evolution, despite the boundary issues.
4.4 First passage times
Another common quest of such approximation techniques is estimating the first passage time (FPT) distribution for a target subset of states of the Markov chain. Literature on this is rich — there has been significant effort in this direction, utilising both established probability evolution methods and constructing new theoretical methods tailored to this problem [20, 21, 22]
. The former is possible since FPT estimation can be formulated as the classical problem of estimating how the probability distribution over the statespace evolves for a modified version of the Markov chain in question.
Specifically, consider a Markov chain with rate matrix for the statespace . Let be a set of target states for which we want to estimate the distribution for the FPT , given some initial state . The FPT cumulative density function (cdf) is equivalent to the probability mass on the set at time , if every state in is made absorbing. In this manner, many methods for approximating probability density evolution over the statespace of a CTMC can also be used to approximate FPT distributions.
The fluid proximity approach
A natural avenue to estimate the FPT when a fluid approximation to the CTMC exists, is to consider how close the fluid solution is to the target set . The classical fluid approximation usually relies on population structured CTMCs, where the target set is often a result of some population ratio threshold (e.g. all states where more than 30% of the total population is of species : ). Since the set is defined in terms of population ratios, it is trivial to map threshold ratios to the continuous concentration space where the pCTMC is embedded, and hence define corresponding concentration regions. The time at which the fluid ODE solution enters that region of concentration space is then an approximation for the FPT cdf. The latter will of course be a step function (from 0 to 1) since the solution is the trajectory of a point mass. Keeping the same threshold ratios for the target set, and scaling the population size
should drive the true FPT cdf towards the fluid approximation. If more moments of the probability distribution are approximated (for instance in moment closure methods) one can derive bounds for the FPT cdf; these can be made tighter as higher order moments are considered, as shown in
[21].In our case, the fluid ODE solution only tracks the first moment of the distribution which implies a point mass approximation. Additionally, we have done away with the population structure requirement, such that thresholds for defining target sets are no longer trivially projected to the continuous space where we embed the chain. The latter challenge is overcome by considering the Voronoi tessellation of the continuous space, where each embedded state serves as the seed for a Voronoi cell. We then say that the fluid solution has entered the target region if it has entered a cell whose seed state belongs in the target set . Equivalently, the solution is in the target region when it is closer (with Euclidean distance as the metric) to any target state than to any nontarget state.
Checking which is the closest state is computationally cheap, and so we can produce FPT estimates at little further cost from the fluid construction. Results for the SIRS model, the LotkaVolterra and perturbed LotkaVolterra models follow.
FPT in the SIRS model
We define a set of barrier states in the SIRS model, , and examine the FPT distribution of the system into the set , with initial state . Note that the trivial scaling laws for this model, owing to the fixed population size, makes it simple to identify corresponding barrier regions in concentration space: . We can therefore compare the fluid solution FPT estimate to the empirical CDF (trajectories drawn by the SSA), as well as to our own fluid construction with an embedding given by diffusion maps and a drift vector field estimated via a Gaussian process. Figure 9 shows that our approach is in good agreement with both the empirical mean FPT and the classical fluid result.
FPT in the LoktaVolterra model
Here we embed the LotkaVolterra model, and define the barrier set of states for which we estimate FPT cdfs, with initial state , for various system sizes .
We show in Figure 10 (left) how our fluid construction estimates an FPT close to the SSA cdf. This is expected when embedding a structured model such as the LotkaVolterra, where two dimensions are adequate to preserve the network topology and the Gaussian process can well approximate the continuous drift vector field. Finally, we show in Figure 10 (right) that a good estimate of the FPT is recovered for the perturbed LotkaVolterra, which is no longer a chemical reaction network.
5 Conclusion
CTMCs retain a central role as models of stochastic behaviour across a number of scientific and engineering disciplines. For pCTMCs, model approximation techniques such as fluid approximations have played a central role in enabling scalable analysis of such methods. These approximations, however, critically rely on structural features of pCTMCs which are not shared by general CTMCs. In this paper, we presented a novel construction based on machine learning which extends fluid approximation techniques to general CTMCs. Our new construction, the geometric fluid approximation, is (with certain hyperparameters) equivalent to classical fluid approximations for a class of pCTMCs; empirically, the geometric fluid approximation provides good quality approximations in a number of nontrivial case studies from epidemiology, ecology and systems biology.
Some potential paths forward become apparent under the lens of this work.Firstly, our method might be optimised to accommodate particular structures of CTMCs, for example by designing specific kernels for the GP regression part. This might be an effective way to incorporate domain knowledge and further improve the quality of the geometric approximation.
Secondly, we can extend this methodology by approximating the diffusion matrix field as well as the drift vector field. This would enable us to define a diffusion process on the manifold and so construct an approximating pdf rather than a point mass. An evolving pdf will be comparable to solutions produced by Van Kampen’s system size expansion, moment closure methods, and the chemical Langevin equation for the case of CTMCs representing chemical reaction networks.
Finally, we have shown how the geometric fluid approximation can be used for estimating first passage times. In general, it would be interesting to extend this component to define methodologies to approximate more complex path properties, such as temporal logic formulae which are often encountered in computer science applications [23, 24].
References
 [1] Kurtz TG. 1971 Limit Theorems for Sequences of Jump Markov Processes Approximating Ordinary Differential Processes. Journal of Applied Probability 8, 344–356.
 [2] Hillston J. 2005 Fluid flow approximation of PEPA models. In Second International Conference on the Quantitative Evaluation of Systems (QEST’05) pp. 33–42.
 [3] Darling R, Norris J. 2008 Differential equation approximations for Markov chains. Probability Surveys 5, 37–79.
 [4] Gardiner CW. 2009 Stochastic methods: a handbook for the natural and social sciences. Springer series in synergetics. Berlin: Springer 4th ed edition.
 [5] Norris JR. 1998 Markov Chains. Cambridge University Press.
 [6] Gillespie DT. 1977 Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81, 2340–2361.
 [7] Kampen NGv. 1961 A POWER SERIES EXPANSION OF THE MASTER EQUATION. Canadian Journal of Physics 39, 551–567.
 [8] Schnoerr D, Sanguinetti G, Grima R. 2017 Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. Journal of Physics A: Mathematical and Theoretical 50, 093001.
 [9] Darling RWR. 2002 Fluid Limits of Pure Jump Markov Processes: a Practical Guide. arXiv:math/0210109. arXiv: math/0210109.
 [10] Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW. 2005 Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences 102, 7426–7431.
 [11] Coifman RR, Lafon S. 2006 Diffusion maps. Applied and Computational Harmonic Analysis 21, 5–30.

[12]
Nadler B, Lafon S, Kevrekidis I, Coifman RR. 2006a Diffusion Maps, Spectral Clustering and Eigenfunctions of FokkerPlanck Operators. In Weiss Y, Schölkopf B, Platt JC, editors,
Advances in Neural Information Processing Systems 18 pp. 955–962. MIT Press.  [13] Nadler B, Lafon S, Coifman RR, Kevrekidis IG. 2006b Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis 21, 113–127.
 [14] Coifman RR, Kevrekidis IG, Lafon S, Maggioni M, Nadler B. 2008 Diffusion Maps, Reduction Coordinates, and Low Dimensional Representation of Stochastic Systems. Multiscale Modeling & Simulation 7, 842–864.
 [15] Perraultjoncas DC, Meila M. 2011 Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplaciantype Operators. In ShaweTaylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, editors, Advances in Neural Information Processing Systems 24 pp. 990–998. Curran Associates, Inc.
 [16] Rasmussen CE, Williams CKI. 2006 Gaussian processes for machine learning. Adaptive computation and machine learning. Cambridge, Mass: MIT Press. OCLC: ocm61285753.
 [17] Kłopotek MA. 2017 Spectral Analysis of Laplacian of a Multidimensional Grid Graph. arXiv:1707.05210 [math]. arXiv: 1707.05210.
 [18] Vu TN, Wills QF, Kalari KR, Niu N, Wang L, Rantalainen M, Pawitan Y. 2016 BetaPoisson model for singlecell RNAseq data analyses. Bioinformatics 32, 2128–2135.
 [19] Larsson AJM, Johnsson P, HagemannJensen M, Hartmanis L, Faridani OR, Reinius B, Segerstolpe a, Rivera CM, Ren B, Sandberg R. 2019 Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254.
 [20] Darling DA, Siegert AJF. 1953 The First Passage Problem for a Continuous Markov Process. The Annals of Mathematical Statistics 24, 624–639.
 [21] Hayden RA, Stefanek A, Bradley JT. 2012 Fluid computation of passagetime distributions in large Markov models. Theoretical Computer Science 413, 106–141.
 [22] Schnoerr D, Cseke B, Grima R, Sanguinetti G. 2017 Efficient LowOrder Approximation of FirstPassage Time Distributions. Physical Review Letters 119, 210601.
 [23] Milios D, Sanguinetti G, Schnoerr D. 2018 Probabilistic Model Checking for ContinuousTime Markov Chains via Sequential Bayesian Inference. In McIver A, Horvath A, editors, Quantitative Evaluation of Systems vol. 11024 pp. 289–305. Cham: Springer International Publishing.
 [24] Bortolussi L, Milios D, Sanguinetti G. 2016 Smoothed model checking for uncertain ContinuousTime Markov Chains. Information and Computation 247, 235–253.
 [25] Belkin M, Niyogi P. 2003 Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation 15, 1373–1396.
Appendix A Diffusion maps for Markov chains
There exists extensive literature examining the implications of diffusion maps, as well as their limitations and strengths [10, 11, 12, 13, 14]. What follows is therefore not an attempt to rederive these results or convince the reader of the validity of the method, but rather to set notation and highlight the aspects that are relevant to our purposes. The exposition below is also necessary to act as a foundation for the results of PerraultJoncas and Meilă that build upon the original concept of diffusion maps as put forth by Coifman, Lafon, Nadler, and Kevrekidis.
a.1 Undirected graphs
In [13, 14], the authors consider a family of densitynormalised (i.e. anisotropic) symmetric kernels
characterising the distance between highdimensional points
. The kernel used here is the radial basis function
, which provides a similarity between points based on the Euclidean distance in the original space. The densitynormalising factor depends on the manifold density, , and the choice of the power leads to transition kernels of different diffusion process operators (see below). For a finite set of points we can construct an adjacency matrix whose elements are given by the kernel, for a network with points as nodes and weighted undirected edges.Assuming that the points were sampled by observing a diffusion process in the space , the authors then take the forward Markov transition probability kernel to be
where is the graph Laplacian normalisation factor. Since this is the transition probability for the putative continuous diffusion process evolving in the space , the (forward) infinitesimal diffusion operator of the process is given by
where is the identity operator, and is a (forward) transport operator defined as , which evolves a function according to and the manifold measure .
By asymptotic expansion of the relevant integrals, they show that the forward and backward operator pair is
and  (16)  
(17) 
respectively.
We then regard the adjacency matrix of a given network to be a discrete approximation of the transition kernel defined over continuous space. From that, we can construct discrete (in time and space) approximations to the diffusion operators above by performing the necessary normalisations. To retrieve the embedding coordinates for each network vertex one needs to spectrally analyse the approximation to the diffusion operator, taking the 1 to eigenvectors
ordered by the associated eigenvalues
with , to be the vertices’ coordinates in the first dimensions of the embedding. The first eigenvector is discarded as a trivial dimension where every vertex has the same coordinate by construction. Thus, the dimensional diffusion map at time is defined as:where we have discarded associated with as a trivial dimension. The time parameter refers to the diffusion distance after time which is preserved as Euclidean distance in the embedding space. Trivially, as all network nodes are mapped to the same point since the diffusion distance vanishes.
The parameter adjusts the effect that the manifold density has on the diffusion process. Choosing recovers the LaplaceBeltrami operator as the backward diffusion operator, if the points approximately lie on a manifold . Thus, the diffusion map corresponds to an embedding of the points unaffected by the manifold density (such that if two different networks were sampled from the same manifold but with different densities, we would recover consistent positions of the points on ). Choosing is equivalent to the Laplacian eigenmaps method which preceded diffusion maps [25]. If the vertices are sampled uniformly from the hidden manifold, Laplacian eigenmap becomes equivalent to analysing the LaplaceBeltrami operator, and so constructing a diffusion map with and with will recover the same embedding [11].
Consider now an Itô stochastic differential equation (SDE) of the form
(18) 
where is the dimensional Brownian motion. A probability distribution over the statespace of this system with condition , evolves forward in time according to the FokkerPlanck (FP) equation, also known as the Kolmogorov forward equation (KFE):
(19) 
with the sums running over all dimensions and denoting partial derivatives with respect to the th dimension () [4]. Similarly, the probability distribution for and condition satisfies
(20) 
where the differentiations are with respect to . Terms in the backward FPE become directly identifiable with the backward operator if we take and .
The original formulation of diffusion maps, as described above, assumes a symmetric kernel . Given a CTMC with a symmetric generator matrix , the methodology laid out so far would be sufficient to recover an embedding for the states on a continuous compact manifold , on which we can define an SDE approximation to the Markov jump process of the CTMC. Encouragingly, it has also been shown that the jump process would satisfy the reflecting (no flux) conditions on the manifold boundary , as required by a diffusion FP operator defined on such a manifold — i.e. for a point where is a normal unit vector at , and a function ,
a.2 Embedding unweighted, undirected, grid graphs
Taking the case of a pCTMC produced by a particular class of chemical reaction networks, we show that the embedding produced by Laplacian eigenmaps [25] (equivalent to diffusion maps with ) is consistent in some respect to the canonical (manual) embedding for the fluid limit of chemical reaction systems. To further simplify our analysis, we will consider an unweighted, undirected version of the matrix of the CTMC as the adjacency matrix of the network to be embedded. This implies that we ignore any density information of the vertices (states) on the manifold, and any directional component. We will later return to how this information affects our results.
Laplacian eigenmaps embedding
Assume that we have symmetric similarity matrix between points. We construct the Laplacian matrix , with . The Laplacian eigenmaps algorithm solves the minimisation problem
(21)  
(22) 
where the constraint serves to exclude the trivial solution of mapping everything to the same point. The solution is a matrix with each column vector corresponding to the dimensional coordinate embedding of each datum. It is shown that the solution to the problem is the eigenvector matrix corresponding to the lowest eigenvalues of , excluding the solution.
This emphasis on preserving local information allows us to appropriate the algorithm for embedding the network of states without having to calculate global state separation. For a CTMC described by a transition matrix , we transform to be an adjacency matrix between the nodes (states) of the network (CTMC) by placing an undirected edge of weight 1 between states which are separated by a single transition and 0 otherwise:
(23) 
If the network is connected and (the dimensionality for the embedding space) is picked appropriately, the algorithm will attempt to preserve local dimensions and therefore global ones if the network fits in that space. If is chosen higher than necessary, some states which are far apart might be placed closer together in the embedding, but local distances will still be preserved.
We examine a particular case of pCTMCs, produced by allowing reactions that only change the count of a single species per reaction. This produces an adjacency matrix for the network of states describing a grid network in dimensions. Following the derivation for the eigenvectors of the Laplacian of such a network presented in [17], we find that the lowest eigenvalue (excluding ) is degenerate (), and associated with eigenvectors . Their elements are
(24) 
where the index is the mapping of the node to its integer grid coordinates. Therefore, the embedded coordinate of a node in dimension is , where is the integer grid position of the node in that dimension. We observe that away from the boundaries (i.e. near the centre of the grid ) and for large , the argument of is near , so we approach the linear part of
. This means that near the centre states are almost uniformly distributed, as in the canonical embedding.
Further, we define the repeating unit volume in the network to be the space (in the embedding) enclosed by all the neighbour states of any state.
(25)  
(26) 
We then observe that . This is desirable for the fluid approximation since the density of states increases everywhere as states increase, and jump sizes vanish, approximating a continuous process.
To generalise this to nongrid cases, suppose that:

the CTMC network in question is generated by the tessellation of a Euclidean dimensional space into polygons, and

we pick the dimensionality of the embedding space for the Laplacian eigenmaps to be ,
then the Laplacian eigenmaps will preserve the ordering of the states in every dimension of the space. The embedding space will still be tessellated by the network by the same gons, but with differing edge lengths. If the number of tessellating units generating the network is increased, the Laplacian eigenmaps embedding will similarly preserve order but shrink the tessellating unit volumes. The density of states in the embedding space will therefore increase as the number of states in the network increases.
Generalising to weighted, undirected graphs
If we allow the graph a weighted but symmetric adjacency matrix , Laplacian eigenmaps (diffusion maps with ) will be unable to separate the underlying manifold on which the graph lies from the density on the manifold according to which the graph vertices are sampled . In the case of the two species pCTMC considered above, the hightransition rates near high population counts will cause a shrinking distortion on the manifold recovered, as the algorithm tries to preserve the closer distances with a uniform sampling measure (Figure 11, left).
To recover the inherent Riemannian geometry of , we choose . This takes into account the effects that a nonuniform sampling for the discrete approximation of might have, recovering the geometry of . In the diffusion map language above, choosing corresponds to spectrally analysing a discrete approximation to the LaplaceBeltrami operator , separating it from the density dependent term in the diffusion operator . Returning to the two species pCTMC, it retains the regular hypercubic structure of , but distributes states on it according to the sampling density. The sampling density also corresponds to the steady state distribution of the system, so states will be placed closer together near the steady state (Figure 11, right).
a.3 Directed graphs
Our focus necessarily shifts on embedding an arbitrary CTMC with no symmetry condition on . Following PerraultJoncas and Meilă [15] assume that we observe a graph , with nodes sampled a diffusion process on a manifold with density and edge weights given by the (nonsymmetric) kernel . The directional component of the kernel is further assumed to be derived from a vector field on without loss of kernel generality. As the authors saliently put it: “The question is then as follows: can the generative process’ geometry , distribution , and directionality , be recovered from ?”
In the same manner as for the original formulation of diffusion maps a set of backward evolution operators are derived, the two relevant ones being:
and  (27)  
(28) 
To construct this family of operators, the kernel is first decomposed into its symmetric and antisymmetric parts,
and further normalised according to either the asymmetric , or symmetric outdegree distribution . The subscript indices denote the type of kernel used to construct the operator and the outdegree distribution used to normalise it (such that associates to the full asymmetric kernel normalised with asymmetric degree distribution , and so on).
Discrete approximations for these operators can be constructed for an asymmetric kernel matrix of distances between highdimensional points, . The symmetric matrix can be extracted and the necessary eigendecomposition carried out to yield an embedding, where . However, given the infinitesimal generator of a CTMC , we do not have access to , but rather to the discrete approximation of the final evolution operator, . In order to recover the initial kernel matrix that gave rise to , we take , a uniform measure on the manifold , and a small value for . This makes the transformations from to reversible, since
(29)  
(30)  
(31) 
In the above, is a diagonal matrix which forces the diagonal of to be 1, as expected from a distancebased kernel matrix. The final step is the familiar uniformisation procedure which approximates a CTMC with a DTMC. The choice of
Comments
There are no comments yet.