. Key features of these complex systems are multiscale dynamics, high-dimensional phase space, nonlinear energy transfers, highly non-Gaussian probability density functions (PDFs), intermittent instability, random internal and external forcing as well as extreme events. Understanding and predicting these complex turbulent dynamical systems have both scientific and practical significance. Many excitable media models belong to such types of nonlinear dynamical stochastic systems[33, 56, 45, 23]. These excitable media models are ubiquitous in nature and can be found in physical, chemical, and biological systems that are far from thermodynamic equilibrium. One important feature of these excitable medium is that they are susceptible to finite perturbations, where a variety of wave patterns can be triggered. Examples of excitable medium include solitary waves, target like patterns, and spiral waves.
For nonlinear dynamical stochastic models, the high dimensionality imposes great challenges to the understanding of the underlying mechanisms, the design of effective prediction schemes and various uncertainty quantification problems [4, 47]. In particular, the high dimensionality leads to both a large computational cost and an enormous storage even for one single model simulation. Moreover, it is often of interest to calculate the time evolution of the statistics, especially the covariance matrix that includes the information of the interactions between different components [1, 29]. Since analytical solutions for complex nonlinear systems are typically unavailable, Monte Carlo methods are often applied. However, the affordable sample size is very limited due to the computational cost while the size of the covariance matrix, which is the square of the state variables, can be many magnitudes larger in many real applications that involve millions or billions of state variables. Furthermore, non-Gaussian features due to the nonlinear nature of the underlying dynamics such as extreme events can greatly affect the calculation of the covariance matrix and introduce extra difficulties. Therefore, understanding the covariance structures in nonlinear dynamical stochastic models and then developing efficient computing strategies that can exploit these structures become extremely important.
To facilitate the calculation of the covariance matrix, various approximations are applied in practice. One widely used strategy is to adopt Gaussian approximations for the underlying dynamics since there are many well-developed analytical and numerical methods for sampling high-dimensional Gaussian distributions. However, completely ignoring the non-Gaussian information in the underlying dynamics often leads to large model errors. One effective strategy is to decompose the entire phase space into a high dimensional Gaussian part and a low dimensional non-Gaussian part using the conditional Gaussian framework[11, 12, 9, 10, 38, 32]. Another strategy is to exploit the fact that the majority of the problem uncertainty is distributed within a low dimensional subspace of the Karhuen-Loève expansion, then uncertainty quantification (UQ) can be operated just in this subspace for sampling and computational simplicity [51, 15, 13, 7, 19, 46, 14].
This paper focuses on a different structure-exploiting strategy known as localization, which is widely used in many areas including the numerical weather prediction [26, 22, 25] and efficiently quantifying the uncertainty in excitable media [6, 44]. In many applications, the system state variables represent the information at different spatial locations, and the dependence of certain information between different grid points decays quickly when their distance becomes large. In other words, two grid points can be regarded as nearly independent if they are far from each other. The localization strategy exploits this feature and adopts a band matrix as the approximation of the covariance [48, 2]. Thus, the localization allows the development of much simpler approaches in analyzing the covariance matrix and thus facilitates the study of many important UQ problems.
While the localization strategy has a wide range of real-world applications, it remains unclear about the role of different dynamical components on affecting the covariance localization. In addition, despite many empirical evidences, there is no rigorous justifiction of the covariance decay as a function of the spatial distance. Nevertheless, these are crucial topics in practice for at least two reasons. First, it is known that applying localization by approximating the full covariance matrix with a band matrix may introduce some biases in quantifying the uncertainty of the underlying system [53, 42]
and the current bias estimates[53, 41] require the knowledge of the underlying covariance matrix as well as a quantitative description of the decay in the covariance. Second, in order to apply localization in practice, one often needs to specify a localization radius , which represents the typical distance for two components to be roughly independent. Unfortunately, in the absence of a rigorous theory to understand the covariance decay behavior, determining often relies on exhaustive tuning and data fitting, which can be expensive and inaccurate.
This paper aims at developing a theoretical framework to understand the covariance behavior for a large class of stochastically coupled nonlinear systems in high dimensions that include many excitable media models and other nonlinear complex models. Rigorous mathematical analysis shows that the local effect from the diffusion results in an exponential decay of the components in the covariance matrix as a function of the distance while the global effect due to the mean field interaction synchronizes different components and contributes to a global covariance. To achieve this, a covariance comparison principle is established. Then by explicitly computing the covariance propagation in a linear surrogate model, an upper bound on the covariance entries can be obtained.
Two important applications of these theoretical results are illustrated. First, a spatial averaging sampling strategy is developed. It makes use of the statistical symmetry to replace the samples from repeated experiments using ensemble methods by samples at different spatial grid points [8, 49], which greatly reduces the number of repeated experiments while the accuracy remains the same. In other words, combining the spatial averaging strategy with a small number of Monte Carlo simulation of the underlying nonlinear complex systems, the covariance with a localization structure can be sampled in an efficient and accurate fashion, at least for a finite time. Secondly, the same structure can be exploited to improve the accuracy of data assimilation [30, 37, 18, 31]. The theory developed here provides a practical guideline to choose the localization radius in a systematical way that facilitates an effective data assimilation for large dimensional systems.
The rest of the article is organized as follows. Section 2 states the main theoretical results for the spatial localization for nonlinear dynamical stochastic models and the two applications. Section 3 includes the detailed analysis of the covariance matrix with both local and nonlocal effects. Numerical simulations of a linear model and a stochastically coupled FitzHugh-Nagumo (FHN) model for excitable media in different dynamical regimes are shown in Section 4. The article is concluded in Section 5. Details of proofs are included in Appendix.
2 Problem setup and main results
To describe the behavior of nonlinear dynamical stochastic models for an excitable media, we consider a general stochastic diffusion reaction model with mean field interaction on a one dimensional torus :
is a spatial white noise. We can discretize (2.1) in space, and write down the dynamics at the -th grid point as
where are independent standard Wiener processes.
The deterministic forcings in (2.2
) can be classified into two groups based on their range of action. Heredescribes how does a component force its neighbors and itself while models the McKean Vlasov type of mean field interaction . We will adapt this classification to simplify our notation. Moreover, we will consider the general setting where is not univariate. This will be useful when we discuss the FHN model, as we need both the membrane voltage and the recovery forcing.
Specifically, we let be a stochastic process in . For simplicity, we assume all block share the same dimension, that is and . Suppose each block follows an SDE
In (2.3), are independent standard Wiener processes of dimension , and is symmetric. The index should be interpreted in a cyclic fashion, that is . The natural distance between indices is given by
For simplicity, we assume are i.i.d. samples from .
Our main results can be summarized as the follows.
Suppose the following constants are independent of
Here denotes the maximum eigenvalue. Then for any test function
denotes the maximum eigenvalue. Then for any test functionand , there is a constant independent on , such that
Here . And if , the bound can be improved to .
The exact value of can be found below in Theorem 3.3. The main conclusion here is that when the dimension is large, for components that are far from each other, their covariance is close to zero. In other words, the covariance between model components is significant only for nearby components. Following , we call such a covariance structure to be local. Below, we illustrate two important applications that show such a property is crucial for uncertainty quantification (UQ).
2.1 Sampling with spatial averaging
One fundamental UQ question regarding SDEs such as (2.3) is how to efficiently compute for a given test function and ensure that the estimation error is small. The standard Monte Carlo approach involves simulating (2.3) multiple times, and use the sample average as an estimator. However, with the increase of the dimension , the computational costs to generate each independent sample increases dramatically. Therefore, only a small number of samples can be adopted in practice.
One important observation is that (2.3) is invariant under index shifting. Therefore, by symmetry, shares a common distribution for all . As a consequence, each of them can be viewed as an individual, despite not independent, sample. Exploiting this perspective, it has been proposed in  the following spatial average as an estimator of
Since the samples are independent, the statistical properties of is determined by the ones of
It is straightforward to verify that and are unbiased:
The accuracy of these estimators can be measured by their variance. By independence, we have, which is given by
Therefore we have
In contrast, if we consider the standard estimator , its variance is , which is of multiple of the one of . In other words, the spatial averaging has increase the effective sample size by a factor of .
2.2 Localization in data assimilation
. One major application of DA is numerical weather prediction, where forecasts are obtained by combining simulations of the dynamical models with satellite data. The associated dimensions of such data assimilation problems are extremely high, often in orders of millions in order to describe variables in different temporal and spatial scales of the underlying nonlinear complex systems. Moreover, for these refined models, the current computational power can only provide around one hundred simulation samples. DA algorithms, for example the ensemble Kalman filter (EnKF), need to use these 100 samples to represent the covariance matrix of model components.
The classic sample covariance,
is not accurate in this setting. A classic random matrix results indicates that if are i.i.d. samples from , the covariance estimation error . Therefore, the estimator is very inaccurate in the situations with the small samples and high dimensions, i.e., . The intuition behind this mathematical phenomenon is that, spurious correlation may exist between each pair of components with a small probability because of the sampling effect, but there are pairs of components. Thus, the resulting errors are very likely to be significant.
Nevertheless, if we are aware that covariance is local as described in Theorem 2.1, then the erroneous estimation can be made accurate. Note that the far-off-diagonal entries are close to zero. Thus, a natural way is to truncate these entries. In particular, with a given bandwidth and covariance matrix , we define the localization of as
It has been shown in  that such a localization technique can significantly improve the sampling accuracy. A simplified version in  shows that if is Gaussian distributed, then there exists a uniform constant , so that for any ,
In other words, if is a fixed bandwidth, it requires only many samples to find an accurate estimate of .
On the other hand, one needs to notice that (2.7) indicates is a good estimator of , but not necessarily . In order for it to also be a good estimator of , one needs to be small as well. Theorem 2.1 can guarantee this if there is only local interaction, i.e. . Since a covariance matrix’s -norm is bounded by its -norm, we can plug in the estimate in Theorem 2.1 with and find
Since localization involves using very simple numerical operation to significantly improve the estimation of the covariance matrix, it has been widely applied in all DA algorithms, in particular, the ensemble Kalman filter [17, 24, 21, 57, 40, 27, 55, 54]. Moreover, adaptation of the localization strategy can be applied to Bayesian inverse problems where the associated covariance matrices follow the description of Theorem 2.1. In particular, it has been shown in 
that blocking the high dimensional components, and then applying Gibbs sampler, the resulting Markov chain Monte Carlo algorithms will converge to the desired distribution with a rate independent of the dimension. One fundamental assumption of all these practical computational methods is that the underlying system is spatially localized, yet there has been no rigorous justification of this assumption. This article closes this gap for systems of type of (2.3).
To facilitate our discussion, we will use the following notations: denotes the norm of , while is the inner product;
denotes the identity matrix of dimension. Given a matrix , the norm is the operator norm of ; denotes the maximum eigenvalue of and is its Frobenius norm. We use to denote the -entry of .
When we have a vectorof dimension, we often view it as the concatenation of sub-blocks of dimension . The decomposition can be written as . Given a matrix , we write its -th sub-block as or . The readers should not confuse these notations with the matrix entries . In particular, we can write down the entries of :
With a function , denotes its Jacobian matrix. denotes the partial derivative with respect to , which is the -th block-column of .
For two symmetric matrices indicates that is positive semidefinite.
3 Analysis of covariance matrices
We provide the explicit statements of our result in this section. The detailed proofs are allocated in Appendix.
3.1 Linear covariance propagation
Consider the following multivariate SDE:
Our main result is established through a covariance comparison principle between (3.1) and a linear surrogate.
Suppose the covariance propagation of (3.1) is dominated by in the following sense:
Let , and
Then for any test function with bounded gradient,
Here, , and are dimensional matrices. Their sub-blocks are given by
Since this equation is linear, we can apply the Duhamel’s formula and write the solution as
By Ito’s isometry,
3.2 A representation of the covariance matrix
One difficulty in reaching Theorem 3.1 is how to compute the covariance matrix for a multivariate SDE. We will establish a framework for covariance matrix computation, starting from a simple Stein’s lemma type of formula:
Suppose are independent standard Gaussian random variables of dimension
are independent standard Gaussian random variables of dimension, and are smooth functions with bounded first order derivatives. Let , then
In order to apply Lemma 3.2, we need to represent as a function of certain Gaussian random variables. These Gaussian random variables are the increments of the Wiener processes driving . To setup this connection, recall that one way to simulate (3.1
) is to perturb its ordinary differential equation (ODE) version with random noises. In particular, consider the ODE system
Let be the mapping from to . We suppress the dependence of on , for notational simplicity. Then one way to simulate (3.1) is generate the following iterates
Then is a numerical approximation of if when is large. A standard verification Lemma of this statement is provided in Appendix. We write the -th block of as . are dimensional standard Gaussian noise for .
The strategy here is to consider instead of , because its dependence on the noise realization is easier to handle using Lemma 3.2. We can show a similar result for as in Theorem 3.1. Then in the limit , we have our desired results. The details are shown in Appendix. As a remark, it might be possible and more elegant to represent the covariance of directly using tools from Malliavin calculus , and avoid the procedures of discretization and taking continuous limit. Yet the current proof strategy can be easier to understand for an applied math audience.
3.3 Diffusion and mean field interaction
While (3.2) provides a general upper bound, from its formulation is not easy to see the covariance decay we expect. To make the covariance decay explicit, we try to use instead a homogenous linear surrogate. We update the constant definition in (2.4) to the spatial inhomogeneous setting:
They are finite under the conditions of Theorem 3.1. We can build a homogeneous linear surrogate by letting
Then the linear surrogate (3.3) corresponds
where the discrete Laplacian is given by
which is closely related to the heat equation.
The spatial covariance of is built up by the diffusion effect of and the mixing effect of mean field interaction . These two effects operate in different ways. The diffusion propagates the diagonal entries in the covariance matrix to the off-diagonal ones, it contributes to the exponential decay term in Theorem 2.1. The mean field interaction synchronizes the components, and contributes to a global covariance of scale .
We can establish the following explicit bounds in terms of , instead of the matrix :
When there is only mean field interaction, i.e. , the result can be simplified by choosing ,
When there is only diffusion effect, that is , the result can be simplified as
It is worth noticing the result holds for all . This indicates the covariance structure actually decays faster than exponential. In practice, one can try to minimize the right hand side with respect to , or directly compute the covariance of (3.7) to find the exact local covariance structure. While sharper theoretical upper bound maybe obtainable by applying formulas as in , they are complicated and work only asymptotically. Our current upper bound is simpler, and sufficient for applications discussed in Sections 2.1 and 2.2.
With Theorem 3.3, it is straightforward to show the spatial averaging estimator is consistent:
Since in (2.6), . ∎
3.4 Long time stability
In many applications, the long time scenario is of interest. For example, if we want to find the equilibrium measure of , we often simulate for a long time, and use its distribution as an approximate. Likewise, if we are interested in data assimilation with models as (2.3), the algorithms are usually iterated for an extended period. In order to apply the spatial averaging strategy in Section 2.1, and the localization strategy in Section 2.2 for these operations, we need the local covariance structure to be stable in time.
In the view of Theorem 3.3, we can guarantee the local covariance structure is stable if . In particular we have the following corollary by letting in the estimate.
If , then there is a , so that if , we have
4 Test Examples
In this section, we demonstrate our theoretical findings with several numerical test examples.
4.1 A linear model
We start with a linear model, where the analytic solutions associated with this linear model can be written down explicitly (See Appendix B). Therefore, this model is able to provide insights and validation of the theories developed in the previous sections.
The linear model is as follows:
with periodic boundary conditions. Here is the averaged value of all the state variables,
In the linear model (4.1), the coefficients of damping , diffusion , mean field interaction and the stochastic noise are all constants. Thus, the flow field is statistically homogeneous. Below, we fix
while the diffusion and mean field interaction coefficients and vary in different cases.
In addition, only one-layer model is utilized here and we consider the covariance of different themselves. Therefore,
Furthermore, . Below, we study the role of the mean field interaction and the diffusion. All the results shown in this subsection are computed using the exact solution given in Appendix B, which allows us to understand Theorem 3.3 without being interfered by numerical errors.
1. Linear model with only mean field interaction. First, we study the case with only mean field interaction. In other words, we set the diffusion coefficient . The constants in (4.3) reduce to
According to (3.8) in Theorem 3.3, the covariance bound is proportional to and the coefficient is provided by letting . Therefore, assuming the initial covariance is zero, we have the following result
In Figure 4.1, we show the covariance between and for at . Here the initial conditions are zero everywhere. Different columns show the covariance with different . The spatiotemporal simulations are also included to provide intuitions. Without the diffusion and the initial covariance, the covariance between and all other with has the same non-zero value due to the global effect from the mean field interaction.
In Figure 4.2, the dependence of the covariance between and as a function of is shown. It is clear that the covariance decays as a function of , which validates the results in Theorem 3.3 (and that in (4.6)). The black dashed line in panel (b) (log-log scale) shows the bound in (4.6), where the constant is given by the coefficient on the right hand side of (4.6) in front of .
2. Linear model with only diffusion. Next, we study the case with only diffusion. In other words, we set the mean field interaction coefficient . The constants in (4.3) reduce to
In Figure 4.3, we show the covariance between and for at with fixed. Here the initial conditions are zero everywhere. Different columns show the covariance with different strengths of the diffusion coefficient . The spatiotemporal simulations are also included to provide intuitions. Without the mean field interaction and the initial covariance, the covariance between and for being far from decays to zero, reflecting the local contribution due to only the diffusion (without mean field interaction). On the other hand, the increase of allows the increase of the covariance between and for far from . This is clearly illustrated in the spatiotemporal patterns and is consistent with our intuition as well.
Figure 4.4 shows the covariance between and for with and at . In panel (b), the logarithm scale is shown, which indicates a linear dependence of the log covariance on the distance . This validates the exponential decay of the covariance as a function of the distance as shown in (4.8). The black dashed line shows the bound in (4.8), where the constant is given by the coefficient on the right hand side of (4.8) in front of by taking . We have tested other , with which the bounds all above the actual covariance. Therefore, we validate the theoretical results with only diffusion, described by (3.9).
3. Linear model with both the mean field interaction and diffusion. Finally, we study the case with both mean field interaction and diffusion. Figure 4.5 is similar to Figure 4.3 but with a non-zero mean field interaction . Therefore, unlike the results in Figure 4.5, the covariance between and where is far from is non-zero due to the mean field interaction that has a global impact. In Figure 4.6, it is also clear that when increases the covariance between and first experience an exponential decay and then the covariance remains as a constant. Thus, the exponential decay is the main contribution to the covariance behavior for grid points that are close to while the mean field interaction plays the dominant role at the location that is far away. These simulations are consistent with the theoretical results shown in Theorem 3.3.
4.2 A stochastically coupled FHN model
In this subsection, we explore the covariance structure in a stochastically coupled FitzHugh-Nagumo (FHN) model, which is a nonlinear model that describes activation and deactivation dynamics of spiking neurons[33, 8]. The model reads,
where and are activator and inhibitor variables. Periodic boundary conditions are imposed on variables. In (4.9), the timescale ratio leads to a slowfast structure of the model. The parameter such that the system has a global attractor in the absence of noise and diffusion . The random noise is able to drive the system above the threshold level of global stability and triggers limit cycles intermittently. Note that with , the model reduces to the classical FHN model with a single neuron and it contains the model families with both coherence resonance and self-induced stochastic resonance . With different choices of the noise strength and the diffusion coefficient , the system in (4.9) exhibits rich dynamical behaviors. Finally, in (4.9), the variable is the mean field interaction.
The goal of this subsection is to study the spatial structure of the covariance due to both the local effect (i.e., diffusion) and the global effect (i.e., mean field interaction).
4.2.1 Basic properties
For the convenience of stating the theoretical results, we slightly change of notation of (4.9) in this paragraph,
To apply the framework we developed, it is natural to let such that