In this paper, we consider stochastic processes arising from random perturbed deterministic dynamical systems. The dynamics of such a stochastic process, say , is a combination of the random perturbation and the underlying deterministic dynamics. The ergodicity of , and the rate of ergodicity, i.e., the speed of convergence of the law of to its invariant probability measure, is a significant quantity given by the spectral gap of the infinitesimal generator of . From an applied point of view, knowing the speed of convergence is very useful to sampling, uncertainty quantification, and sensitivity analysis [12, 24, 37].
However, the ergodicity of a stochastic process is difficult to study in an quantitative way. The spectral method, which tries to estimate the spectral gap directly, only works for a limited class of problems such as over-damped Langevin dynamics[2, 30]. The probabilistic approach, on the other hand, although being “softer” and more applicable, usually does not give a precise bound in most of existing results. For instance, by constructing a Lyapunov function and showing the minorization condition of a certain “small set”, one can easily deduce the geometric ergodicity [18, 19, 36]. Nevertheless, the rate of geometric ergodicity obtained in this way is far from being optimal. In most cases, we only know that the speed of convergence to steady state is approximately being for some , but is usually too close to to be useful in practice.
The computational study of the ergodicity, on the other hand, is far from being mature. While one can compute eigenvalues of the discretized infinitesimal generator for low-dimensional problems (1D or 2D) just as discussed in, this approach does not work well if lives in a higher dimensional space. One can also obtain the convergence rate by computing the correlation decay of a test function using the Monte Carlo simulation. However, as discussed in 
, the correlation (or auto-correlation) has small expectation and large variance, which results in an unrealistic requirement of large amount of samples in real simulations. In addition, the selection of test functions is very subjective.
The main goal of this paper is to propose a coupling approach, a powerful tool that has been used in many rigorous and computational studies [5, 16, 23, 34, 35], to numerically compute the geometric ergodicity. Traditionally, coupling method is mainly used in the theoretical study of stochastic dynamics. This is partially because for stochastic differential equations, a numerically simulated trajectory only approximate the real trajectory at discrete times with certain accuracy. As a result, on a continuous state space, two numerical trajectories can easily “miss” each other even if the actual trajectory have been coupled together. We solve this problem by using the maximal coupling whenever two trajectories are sufficiently close to each other, and develop a numerical coupling scheme. By applying to various examples, we show that our numerical coupling algorithm works well for iterative mappings with random perturbations, stochastic differential equations with non-degenerate diffusion, as well as high-dimensional oscillators. Also, it can be adapted for certain systems with degenerate diffusion with some extra computational cost.
A secondary goal of our study is to reveal the dependence of geometric ergodicity of the perturbed stochastic systems on the underlying deterministic dynamics. By applying on random perturbed circle maps with distinct chaotic behaviors, our result shows that the rate of geometric ergodicity, or heuristically the spectral property, can reveal, in some sense, the mixing property of the underlying dynamics. For example, when the magnitude of noise decreases, the rate of geometric ergodicity drops “quickly” when the underlying dynamics is ergodic but not mixing (quasi-periodic orbits), while the rate drops “dramatically” when the underlying dynamics admits a stable periodic orbit (See Section 4 for more details). Our simulation also shows that larger time scale separation between slow and fast deterministic dynamics can increase, when random noise is added, the rate of geometric convergence to the invariant probability distribution of the stochastic dynamics. This is consistent with the heuristic argument.
The paper is organized as follows. Section 2 gives necessary probability and dynamical system preliminaries for our study. Our numerical algorithms are described in Section 3. Section 4 studies the connection of geometric ergodicity with the chaotic properties of the deterministic dynamics by several examples of iterative mappings with distinct mixing properties on circle. Examples of stochastic differential equations with various deterministic or random structures are studied in Section 5. We make conclusion and further discussions in Section 6.
2. Probability and dynamical systems preliminary
2.1. Markov process and geometric ergodicity
Let be a state space, which can be , or a subset of with -field Consider a Markov process on where can be , , or for with transition probabilities i.e., for any is a measurable function for each fixed , and is a probability measure for each fixed such that
For a reference measure on is said to be -irreducible if for any , implies that for some .
For any probability measure on the evolution of is denoted as
A measure is called invariant if holds for any A Markov process is said to be ergodic if it admits a unique invariant probability measure such that for any Throughout this paper, we assume that the Markov process is ergodic with an invariant probability measure It is not hard to see that is -irreducible.
The emphasis of this paper is the geometric ergodicity. An ergodic Markov process is said to be geometrically ergodic with rate if for -a.e.
where is the total variation distance, i.e., We say that is geometrically contracting with rate if for -almost every initial pairs it holds that
Note that by the triangular inequality, geometric ergodicity implies the geometric contraction.
2.2. Coupling of Markov processes
Let and be two probability measures on . A coupling between and is a probability measure on whose first and second marginals are respective and
For random variablesand taking values in with respective distribution and , we have the following well-known inequality (see, e.g., Lemma 3.6. in )
In the present paper, we consider the coupling of Markov processes. Given Markov processes and on sharing the same transition probabilities A coupling of and is a stochastic process on such that
The first and second marginal processes and are respective copies of and ;
If be such that , then for all .
The first meeting time of and denoted as is called the coupling time. The coupling is said to be successful if the coupling time is almost surely finite, i.e.,
Let be a coupling of a Markov process (admitting transition probabilities ) with initial distribution Then
By the definition of coupling time, implies that By noting that (resp. ) is the distribution of (resp. ), it follows from (2.1) that
We call (2.2) the coupling inequality. A coupling is said to be an optimal coupling if the equality in (2.2) is achieved. Besides, a coupling is said to be a Markov coupling if is a Markov process on A Markov coupling is further called irreducible if it is -irreducible, where in the unique invariant probability measure of and
In the present paper, we estimate the rate of geometric ergodicity from below via (2.2
). However, in practice, we cannot compute the coupling time for all initial values. Therefore, some theoretical arguments are necessary to extend the result from one initial value to almost all initial values. The following lemma plays a such role that enable us to extend the finiteness of moment generating function of the coupling time from one pair of initial values to-almost every pairs of initial values.
Assume is an irreducible Markov coupling. If there exists a pair of initial value and a constant such that
then (2.3) holds for -a.e. pair of initial values.
Suppose that the proposition does not hold, then there exists a measurable set with such that
holds for any pair
One problem with Lemma 2.2 is that many efficient couplings we shall use, such as the reflection coupling introduced in Section 3, are not irreducible. On the other hand, although the independent coupling (i.e., the two marginal processes are updated independently all the time) brings about the irreducibility, it is usually not efficient for the coupling process. In fact, most stochastic processes in (e.g., a strong-Feller process), including all the numerical examples in this paper, are non-atomic, which means that any two independent trajectories of , say and , satisfy (Without loss of generality, we assume that ) So independent coupling of a non-atomic Markov process has zero probability of being coupled successfully in finite time.
To overcome this difficulty, we introduce the coupling with independent components. Still, without loss of generality, we assume . A coupling with independent components means that at each step before coupled, there is a positive probability (can be very small) that the two marginal processes are updated in an independent way. The following lemma shows that a coupling with independent components of a non-atomic Markov process is irreducible. In this way, we can use a mixture of independent coupling and other more efficient couplings to achieve both the irreducibility and the coupling efficiency.
Assume is a coupling with independent components of a non-atomic Markov process. Then is -irreducible.
It is sufficient to show that for any product set with positive measure, there exists some such that .
By the ergodicity, since has positive -measure, there exists such that for all . Similarly, there exists such that for all . Let . Because there is a positive probability that independent updates be chosen for , and the Markov process is non-atomic, we have .
Assume is a coupling with independent components of a non-atomic Markov process. If there exists an initial value and a constant such that
then (2.5) holds for -a.e. initial values .
Assume there exists a measurable set with such that
holds for any .
Let be an arbitrary initial value. By the irreducibility of , there exists a time such that the probability of is strictly positive. Denote the restricted measure of on by . Now let and . Because is a coupling with independent components, the probability that and remain being independent for is strictly positive. Since the Markov process is non-atomic, so with probability independent updates will not make and couple. Hence, there exists a positive number such that
for any measurable set .
It follows from Lemmata 2.2, 2.3, and 2.4 that for any coupling with independent components, the finiteness of can be generalized from one initial value to almost every initial values. In addition, by Markov inequality, the geometric ergodicity (and contraction) follows from the finiteness of . However, the moment generating function is difficult to compute in practice, especially when is close to the critical value. Hence, we use the exponential tail instead. This is justified by the following Lemma.
For any two initial distributions and , if
then for any it holds that
In fact, from equation (2.6) we have
Thus, the tail of the integral is upper bounded by
which goes to zero as goes to infinity.
Hence, must be finite.
Combine the above lemmata together, we have the following proposition.
Let be a coupling with independent components of a non-atomic Markov process.
(i) If has an exponential decay rate for an initial pair , then for any for -a.e. initial pairs (or ) is geometrically contracting with rate
(ii) If admits an exponential decay rate for an initial value , then (or ) is geometrically ergodic with rate .
2.3. An upper bound of geometric rate
The coupling inequality (2.2) usually only gives a lower bound of the geometric convergence/contraction rate. We argue that the upper bound of the geometric ergodicity can be estimated by using first passage times because of the existence of the optimal coupling.
For the sake of simplicity, we consider discrete-time Markov processes. Recall that for an optimal coupling, the equality in (2.2) holds. It has been shown that for any two mutually singular probabilities and on an optimal coupling with initial distribution exists and was explicitly constructed in [17, 38].
Let and be two discrete-time Markov processes on sharing the same transition probabilities with initial distributions and respectively, where Assume that there exists a sequence of disjoint pairs of sets such that , Let
Then if denote
the geometric rate of contraction is at most .
Let be the optimal coupling of the Markov process with transition probabilities with initial distribution Then and are respective copies of and such that
Note at the coupling time we have This means that before time , either has exited from or has exited from . Note for any (resp. ) has the same distribution as (resp. ). This completes the proof. ∎
In Section 4.4, for a random perturbed circle map with a stable 2-periodic orbit, we give both an upper and a lower bound of the geometrically ergodic rate through the first exit time and the coupling time, receptively.
2.4. Deterministic dynamics and random perturbations
By a discrete- or continuous-time deterministic dynamical system, we mean an iterative mapping
or an ordinary differential equation (ODE)
is a vector field onwhich is locally Lipschitz continuous.
In this paper, we mainly focus on the Markov processes arising from random perturbations of deterministic dynamical systems. To be specific, we shall consider
(i) the random perturbation of a discrete-time dynamics given by
where are independent random variables taking values in which will be defined specifically in each particular situation;
(ii) the random perturbation of a continuous-time dynamics (2.7) given by a stochastic differential equation (SDE) on
where is a matrix-valued function, and is a Wiener process on . Also, and are assumed to be smooth enough to give a well-defined solution for all .
Below we introduce basic notions of chaotic behaviors for deterministic dynamical systems. In section 4, we shall see that although geometric ergodicity usually holds when random perturbations are added, the rate can vary, as the random perturbation vanishes, in different ways if the unperturbed dynamics admits essentially distinct chaotic properties.
Periodicity, quasi-periodicity and ergodicity. For a deterministic dynamical system on the state space is called a periodic point if there exists such that for any where denotes the solution of the system with initial value The minimum of such is called the period of . In particular, is called a fixed point if
When the state space a quasi-periodic solution for a deterministic dynamics on is usually involved with several irrational independent frequencies. In particular, a map on is said to be quasi-periodic if its derivative is a constant irrational number. Quasi-periodicity characterizes the kind of recurrent behavior that any positive-measured set can be visited infinitely often by a quasi-periodic orbit. This is in fact a special case of the Birkhoff ergodic theorem, which says that any positive-measured set can be visited repeatedly by orbits starting from almost every initial point, with average visiting times proportional to the set measure.
Hyperbolicity, uniform expanding and exponential (resp. polynomial) mixing. Both periodicity and quasi-periodicity only tell us the average recurrence behavior of the system. To develop more chaotic properties, we usually require some hyperbolic property of the system. Roughly speaking, a smooth deterministic system is said to be uniform (resp. non-uniform) hyperbolic if its tangent map admits both expanding and contracting directions with expanding and contracting rate being uniform (resp. non-uniform). By uniform (resp. non-uniform) expanding we mean that the tangent map only admits expanding directions with uniform (non-uniform) expanding rate. Uniform and non-uniform hyperbolicity or expanding dynamics always come with mixing behaviors in terms of decay of correlations of measure evolutions.
For deterministic dynamical systems, the decay of correlations looks at the system in a statistical way that how quickly the evolution of a certain observable converges to its expectation with respect to the invariant measure (if exists). A deterministic dynamics is said to be exponential (resp. polynomial) mixing if it has an exponential (resp. polynomial) rate of decay of correlations. It has been shown that uniform hyperbolic or expanding systems (e.g., the uniform expanding maps on ) is exponential mixing, while for systems with non-uniform hyperbolicity, the exponential mixing property may be lost. A well-known example illustrating this is the almost expanding map with only one neutral fixed point. This kind of system is known to be at most polynomial mixing (see ).
2.5. Numerical scheme of SDEs
In the real simulations, SDE (2.9) is numerically computed at discrete times. We usually choose a time step size and consider discrete-time trajectories . To avoid confusion and make notations consistent, we let be the true trajectory of SDE (2.9), and be the trajectory of the numerical integrator. In addition, we denote as the time- sample chain of such that , and as the time- sample chain of with .
The most commonly used numerical schemes of stochastic differential equations is the Euler-Maruyama scheme
where are standard normal random variables independent for each . Note that the time- sample chain fits the setting of discrete-time random perturbed dynamics (2.8):
Euler-Maruyama method can be improved to the Milstein method. The D Milstein method reads
In particular, in any dimensions, Euler-Maruyama method coincide with Milstein method if is a constant matrix.
Now we recall the strong and weak approximations defined in . Let be a given finite time. If
holds for each sufficiently small , then we say that converges strongly to with order . Let denote the space of times continuously differentiable functions with polynomial growth rate for the function itself and all partial derivatives up to order . If for any test function and any given finite time , we have
then we say that converges to weakly with order .
It is well known that under suitable regularity conditions, Euler-Maruyama scheme has strong order of convergence and weak order of convergence , and Milstein scheme has strong order of convergence .
3. Description of algorithm
By numerically studying the coupling times, the main idea of this paper is to use the exponential tail of coupling time to estimate the geometric ergodicity of a stochastic process. Assume that numerically we obtain, for a pair of initial value that
then it follows from Proposition 2.6 that
holds for almost every pair of initial values. The numerical verification of geometric ergodicity is similar, except is replaced by samples from the invariant probability measure.
Since this paper mainly studies coupling times in a numerical way, we consider, for the sake of definiteness, the time-discrete Markov process as it fits the case of both random perturbations of an iterative mapping and the time- sample chain of a numerical trajectory of an SDE.
3.1. Coupling methods
Consider a Markov coupling . In the theoretical proof, and are coupled when which is usually done by making both trajectories enter a “small set” which satisfies the minorization condition . However, in the numerical estimation of coupling times, these couplings are not the most efficient ones. Instead, we will use a mixture of the following coupling methods to achieve numerical coupling efficiently.
Independent Coupling. Independent coupling means when running the coupling process , the noise terms in the two marginal processes and are independent until they are coupled. In other words, we have
where and are independent random variables for each In the theoretical studies, independent coupling is frequently used together with renewal theory to prove different rates of convergence to the invariant probability measure. In this paper, the independent coupling is used to make the coupling process have some “independent components” so that Lemmata 2.3 and 2.4 are applicable.
Synchronous Coupling. Another commonly used way to couple two processes is the synchronous coupling. Contrary to the independent coupling for which randomness in the two stochastic trajectories are totally unrelated, in the synchronous coupling, we put the same randomness to both processes until they are coupled. Thus, we have
where for The advantage of synchronous coupling is that if the deterministic part of the system already has enough stability so that it admits a random attractor, then will approach to quickly when driven by the same noise term.
Reflection Coupling. When the state space has dimension bigger than , two Wiener processes will meet less often than the 1D case. This makes independent coupling less effective. Then the reflection coupling plays a role.
Take the Euler-Maruyama scheme of the SDE
as an example, where is an invertible constant matrix. Recall that the Euler-Maruyama scheme of reads as
where is a normal random variable with mean zero and covariance matrix . The reflection coupling means that we run the time- chain as
while run as
where is a projection matrix with
In other words, the noise term is reflected against the hyperplane that orthogonally passes the midpoint of the line segment connectingand .
It has been proved that reflection coupling is optimal for Brownian motions [21, 35]. It also works well for many SDEs [7, 8, 14, 15, 35], including Langevin dynamics with degenerate noise [5, 16]. The reflection coupling introduced above is still applicable for some non-constant under suitable assumptions . However, for general non-constant , the “true reflection” is given by the Kendall-Cranston coupling with respect to the Riemannian matrix [10, 20, 26], which is more difficult to implement numerically.
Maximal Coupling. When running numerical simulations, the above three couplings can only bring close to . We still need a mechanism to make with certain positive probability. The maximal coupling aims to achieve this. It is derived to couple two trajectories as much as possible at the next step, which is in fact modified from the now well-known Doeblin coupling . We adopt the name “maximal coupling” from .
Assume that at certain time , takes the value . Denote the probability measures associated with and by and , respectively. Let be the “minimum probability measure” of and such that
where is a normalizer that makes a probability measure. Then the next step is sampled as
with probability ,
with probability ,
with probability ,
with probability ,
In other words, and are coupled if and only if the two samples fall into a“common future” simultaneously. We remark that the classical version of Doeblin coupling requires the two trajectories enter a certain predefined “small set” simultaneously. Then a construction called the Nummelin split guarantees them to be coupled with certain positive probability. When running numerical simulations, such a construction becomes unnecessary. We can couple them whenever the probability distributions of their next step have enough overlap.
3.2. Numerical Algorithm
We propose the following two numerical algorithms to estimate the exponential tail of the coupling time for the rate of geometric contraction and and geometric ergodicity, respectively. The input of Algorithm 1 is a pair of initial points , and the output is a lower bound of the geometric contraction rate of . Algorithm 2 takes input of one point and produces a lower bound of the convergence rate of . In Algorithm 2, we need to sample from the invariant probability measure. This is done by choosing the initial value of from a long trajectory of , such that is approximately sampled from the invariant distribution .
Since the geometric ergodicity implies the geometric contraction, in practice, it is sufficient to only run Algorithm 2 to detect the rate of geometric convergence/contraction.
It remains to discuss the implementation of the maximal coupling. If the probability density function of bothand can be explicitly given, denoted by and respectively, one can perform coupling by comparing these two probability density functions. The algorithm is described in Algorithm 3. It is similar to the implementation described in [23, 25]
As discussed in Section 2.2, the reflection or synchronous coupling does not give an irreducible process in general, and we use a mixture of independent coupling and reflection (or synchronous) coupling so that the coupling has “independent components”. To achieve this, at each step, we generate an i.i.d. Bernoulli random variable with , which is independent of everything else. The independent coupling is chosen whenever and we use reflection (or synchronous) coupling for otherwise. It then follows from Lemmata 2.3 and 2.4 that the exponential tail of the coupling time can be generalized to almost every initial values.
In practice, for all examples we have tested and all couplings we have used, exponential tails starting from different initial values have the same rate. We believe that the requirement of independent components is only a technical limitation. Lemmata 2.3 and Lemma 2.4 should hold a very general class of irreducible Markov processes and couplings.
4. Geometric ergodicity of time-discrete stochastic dynamics
It has long been observed that for qualitatively different deterministic dynamical systems, their small random perturbations also have qualitatively different asymptotic dynamics . In this section, we perform numerical examples of random perturbations of four deterministic dynamics on with qualitatively different chaotic behavior: (1) an expanding map with exponential mixing rate; (2) a circle map admitting a neutral fixed point and a polynomial mixing rate; (3) a circle map admitting quasi-periodic orbits on ; (4) a circle map admitting a stable periodic orbit. We note that the chaotic behaviors of dynamics in (1) – (4) becomes weaker. For random perturbations of the above four deterministic dynamics, the rate of geometric convergence are compared under different magnitudes of noise. Qualitative different changes of the geometric convergence rates are observed among the above four examples as the noise vanishes. In general, when the noise is sufficiently small, as the underlying deterministic dynamic becomes more chaotic, the geometric convergence rate decreases in a slower way.
4.1. Smooth expanding circle maps
Consider a deterministic dynamics given by the iterative mapping :
It is easy to see that for , is uniformly expanding. It is well known that the dynamical system admits an exponential mixing rate and an invariant probability measure with smooth density.
We consider a random perturbation of through a Markov process given by
where are i.i.d. standard normal random variables, and is the noise magnitude.
In our simulations, we run Algorithm 2 with samples and collect coupling times. The slope of exponential of the coupling time gives the rate of geometric ergodicity. Noise magnitudes are chosen to be . The vs. plots are demonstrated in log-linear plot in Figure 1. We can see that the coupling time has an exponential tail. The slope of the exponential tail vs. is also plotted in Figure 1 (lower right corner). The slope of exponential tail drops linearly with the strength of noise. This is expected because
has a standard deviation. Hence, the distance between two trajectories need to be in order to couple. If we assume that is well-mixed, heuristically two trajectories should take time to be close to each other.
4.2. Circle map with neutral fixed point
The second example is a circle map with a neutral fixed point. Consider
where is a parameter. It is well known that the deterministic dynamical system has a neutral fixed point at . As a result, the dynamical system has power-law mixing rate .
Now we consider , the small random perturbation of , which is given by
where are i.i.d. standard normal random variables.
Still, we run Algorithm 2 with samples and collect coupling times to compute the rate of geometric ergodicity. Noise magnitudes are the same as before. The vs. plots are demonstrate in log-linear plot in Figure 2. The coupling time still has a good exponential tail. The slope of the exponential tail vs. is also plotted in Figure 2 (lower right corner). In spite of slower mixing rate, the slope of exponential tail drops linearly with the strength of noise, which is same as the exponential mixing example. This is because the slow mixing of is caused by a longer return time from very small neighborhood of the neutral fixed point. Very small noise is sufficient to “shake” trajectories away from the neutral fixed point and maintain a suitable mixing rate. Hence, the effect of slower-mixing rate can not be observed unless the noise term becomes extremely small. We refer [3, 4] for some recent theoretical results for similar maps with very small random perturbation.
4.3. Quasi-periodic map
The next example is the quasi-periodic map
It is easy to see that the orbit of is quasi-periodic, which is ergodic but not mixing. Now we consider the Markov process that is given by
where are i.i.d. standard normal random variables.
The rate of geometric ergodicity is computed by running Algorithm 2 with samples and collect coupling times. Noise magnitudes are same as before. The vs. plots are demonstrate in log-linear plot in Figure 3. Tails of the coupling time are still exponentials. The slope of the exponential tail vs. is also plotted in Figure 3 (lower right corner). Different from the previous two examples, in this example, the slope of exponential tail drops super-linearly with the strength of noise. We find that a cubic polynomial function can fit this slope vs. curve fairly well, as shown in Figure 3 lower right corner. The heuristic reason for the slope is the following. Without mixing, the only force that brings two trajectories together is the diffusion, which takes time to move distance. Hence one expect two trajectories to be “well mixed” after time. In addition they need to be close to each other to couple. This brings the requiring total time needed to .
4.4. Stable periodic orbit
The last example is the Logistic map. Consider
The deterministic dynamical system admits a periodic orbit with period that attracts all initial values on . The two values of this periodic orbit, and are approximately and respectively.
Now we consider the small random perturbation of , which is a Markov process given by
where are i.i.d. standard normal random variable.
We compute the rate of geometric ergodicity of with different noise magnitudes by running Algorithm 2 with trajectories. The parameter is chosen to be , and , as coupling is extremely slow in this example if the noise is small. Slopes of exponential tails of the coupling times are demonstrated in Figure