To accurately capture the behavior of dynamical systems, it is essential to assess the effects of input uncertainties on model predictions der2009aleatory; helton2011quantification. To introduce the methodology, consider a continuous dynamical system evolving on a smooth manifold:
where the underlying dynamical system evolves in a complete metric space with a countable dense set
, the vector defines the model parameters, is the temporal index, and is a Lipschitz vector field strogatz2001nonlinear.
From a dynamical evolution perspective, the state is uniquely determined by the preceding state and possibly some noise, following ergodic theory giannakis2019data. The discrete-time dynamics can be defined as:
where is a smooth diffeomorphism (an invertible function that maps one differentiable manifold to another in such a way that both the function and its inverse are smooth) that maps the state to . In this case, all uncertainty in the system originates from the uncertainty in the initial system state giannakis2019data; arbabi2017ergodic.
This study assumes that Eq. 2 is configured with a random initial state with known joint probability density function
. The goal here is to carry out uncertainty quantification and propagation by means of the probability density function (PDF). Therefore, the solution scheme requires reformulating the governing equations from ordinary/partial differential equations (ODEs/PDEs) to their stochastic format li2009stochastic; ghanem2003stochastic
. Several approaches have been developed in the past few decades, which can be classified into two main categories: macro-scale methods and micro-scale methods.
1.1 Macro-scale evolution
Macro-scale methods focus on the computation of the distribution parameters, for instance, mean
in the case of a normal distribution melchers2018structural; low2013new. In practice, an analytical expression in terms of a general probability evolution equation is available only for limited linear systems, where the task of tracking the evolution of the PDF can be accomplished using the homogeneity and additivity properties. Specifically, if the input PDF is Gaussian, the evolution of probability can be represented by a sequence of Gaussian density functions with a shifted center and scaled variation. Similar analytical expressions for nonlinear systems may be obtained via linearization, and these methods are effective if only the second-order statistics of the probability evolution are of interest. For a more general and highly nonlinear system, macro-scale methods have difficulty in describing the multi-mode shape or tails of a non-Gaussian distribution cook2016gaussian; kareem2008numerical.
1.2 Micro-scale trajectory
Micro-scale methods resort to a set of random realizations of the uncertain inputs using Monte Carlo (MC) sampling, or collocation points. In particular, collocation based methods use a grid of points for which PDFs are obtained through finite difference (FD) or finite element methods (FEM) ghanem2003stochastic; melchers2018structural; xiu2002wiener
. The global PDF then results from a mixture of these PDFs. Interpolation functions are used to describe the PDF at any arbitrary point in space and time, which allows the description of the probability densities at points other than the collocation ones. However, it should be noted that FD and FEM may encounter challenges when applied to high-dimensional random spaces since their integral domain is not tailored to the random space. By contrast, MC methods use an ensemble of independent sample trajectories to represent the evolution of probability, where the PDF at any time is approximated by a set of discrete probabilities at sample points robert2013monte. MC therefore disregards information given by the PDF around particular samples, or, in other words, it assumes the region around a sample to be homogeneous. Therefore, in order to minimize the error in the representation of the real PDF, MC has to employ a large number of samples to fill the random space, or to keep the region of each sample as small as possible. Considering the computational cost of examining a great number of samples, possible remedy strategies include (a) constructing an easy-to-evaluate surrogate/emulator that is trained using a small number of propagated data rajashekhar1993new; luo2020bayesian; luo2019deep; (b) utilizing advanced sampling strategies such as importance sampling, adaptive sampling, etc., to generate high-quality samples nie2004new; au2001estimation; papaioannou2015mcmc; and (c) prescreening the most influential variables by global sensitivity analysis techniques sudret2008global; wei2015variable.
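The micro-scale trajectory idea can be sketched in a few lines. The one-step dynamics below is a hypothetical cubic map chosen only for illustration; it is not one of the systems studied in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step dynamics: a mild cubic nonlinearity.
def propagate(x):
    return x + 0.1 * x**3

# Micro-scale (Monte Carlo): each sample carries a discrete probability 1/N,
# and the PDF is represented only through the ensemble of trajectories.
n_samples = 100_000
x0 = rng.normal(loc=0.0, scale=1.0, size=n_samples)  # random initial states
x1 = propagate(x0)

# Empirical moments of the propagated ensemble; for this map the exact
# variance is E[x^2] + 0.2 E[x^4] + 0.01 E[x^6] = 1 + 0.6 + 0.15 = 1.75.
mean_1, var_1 = x1.mean(), x1.var()
print(mean_1, var_1)
```

Note that the estimate converges only at the usual Monte Carlo rate, which is exactly the sample-count burden discussed above.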
1.3 Meso-scale parcel
In this study, a generalized perspective on the dynamical evolution of the PDF of interest is introduced, following the seminal works presented in Yin2013theory; Yin2013method. Viewed from this perspective, the PDF of the input uncertainties is modeled by a mixture of Gaussians bishop2006pattern; kurtz2013cross
. Characterization of each component of the mixture includes the computation of the mean and covariance, and an active learning method is hereby proposed to accelerate the parameter estimation, where the means of these Gaussian components are directly estimated by a low-discrepancy sequence or a clustering algorithm robert2013monte; zhang2013structural; kaufman2009finding
, and the optimal covariances are obtained through a complexity-reduced expectation maximization (EM) algorithm dempster1977maximum; mclachlan2007algorithm. Next, the evolution of each Gaussian component is expressed in a convolution format that produces an evolutionary kernel li2004probability; wan2000unscented. The integral can be efficiently solved by the third-degree spherical-radial cubature rule arasaratnam2009cubature; jia2013high. Finally, assembling the resolved integral results of all components provides the evolutionary PDF of interest.
Compared to the aforementioned macro-scale evolution and micro-scale trajectory, this meso-scale parcel scheme has two advantages. First, mixture modeling can provide asymptotically converged approximations to an arbitrary probability distribution bishop2006pattern; kaufman2009finding. Second, the computational complexity of tracking the evolution of the PDF of interest is much lower than that of sampling-based methods, where the number of samples required by the MC method to adequately estimate the PDF increases significantly xiu2002wiener; robert2013monte. Moreover, the meso-scale parcel is an effective alternative at an intermediate scale between the macro- and micro-scale methods: the macro-scale evolution can be regarded as a special case of the meso-scale representation in which the statistical structure contains only one component, and the micro-scale trajectory can be cast in the meso-scale format with as many components as samples and the Dirac delta function as an indicator. Fig. 1 gives a graphic illustration of these three perspectives.
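The first advantage, asymptotic convergence of mixtures, can be illustrated with a minimal one-dimensional sketch. The bimodal target and all numbers below are hypothetical, not taken from the paper; the point is only that equal-weight Gaussian parcels centered on representative points reproduce the target's mass and first moment.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical bimodal target: draw centers from it, then place an
# equal-weight Gaussian "parcel" of bandwidth h on every center.
centers = np.concatenate([rng.normal(-2.0, 0.5, 250), rng.normal(2.0, 0.5, 250)])
h = 0.3

def mixture_pdf(x):
    # p(x) ~= (1/K) * sum_k N(x; c_k, h^2)
    return norm.pdf(x[:, None], centers[None, :], h).mean(axis=1)

x = np.linspace(-6.0, 6.0, 1201)
p = mixture_pdf(x)
# Trapezoid-rule checks: total mass ~ 1, mean ~ 0 (the target is symmetric).
mass = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(x))
mean = np.sum(0.5 * (p[1:] * x[1:] + p[:-1] * x[:-1]) * np.diff(x))
print(mass, mean)
```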
The paper is organized as follows. Section 2 provides the definition of the problems of interest. Section 3 gives the computational guidelines of the proposed meso-scale scheme. Section 4 interprets the meso-scale uncertainty propagation of two special cases: conservative and Markov models. Section 5 provides application examples to demonstrate the effectiveness of the proposed meso-scale based method. Finally, conclusions are drawn in Section 6, with a discussion of the relative advantages and limitations of the proposed scheme and potential future work.
2 Methodology: a meso-scale probability evolution scheme
For notational brevity, let us formulate a stochastic dynamical system in a general model form strogatz2001nonlinear:
where is an input random vector and represents the corresponding output at time instance , which is also a random vector.
The goal is to obtain the evolutionary PDF of . Hence, a governing equation with respect to the instantaneous PDF should first be derived li2004probability
. In the case of continuous random variables, such a PDF can be written in a convolution form as li2009stochastic:
where is the Dirac delta function and is the joint PDF describing the input uncertainties, which is assumed to be available in this work.
The main point is to find a computationally efficient representation of in order to compute the evolutionary PDF of . This naturally leads to a geometric segmentation, and mixture modeling, which learns a probabilistic model by assuming the data are generated from a mixture of distributions, is adopted here bishop2006pattern; kaufman2009finding. Hence, is approximated through a convex combination of a series of weighted base distributions as:
with denoting the weighting factor for the base distribution. Additionally, equality along with box constraints of are imposed as:
In practice, the normal or multivariate normal distribution function is extensively used, as the Gaussian mixture model (GMM) is capable of approximating any arbitrary density function when sufficiently many base terms are included dempster1977maximum; mclachlan2007algorithm. Accordingly, Eq. 6 can be rewritten as:
Hence, the remaining work is to use optimization methods to identify the parameters and to substitute the optimized parameters into Eq. 5 to compute the instantaneous PDF:
3 Computational algorithms and implementation details
The computational algorithm of the proposed meso-scale scheme contains two major steps. First, one has to determine the weight, location, and scale of each meso-scale component. Second, the optimized meso-scale components are integrated into a single kernel by Eq. 9. To ensure the precision of the constructed kernel in describing the dynamical behavior, the number of component PDFs has to be sufficiently large kurtz2013cross; kaufman2009finding. As a result, effective optimization of the model parameters is a prerequisite, as conventional algorithms such as expectation-maximization by definition involve the optimization of a large number of parameters dempster1977maximum; mclachlan2007algorithm.
To reduce the computational complexity, the weighting factors are assumed to be homogeneous, that is, in Eq. 6 is assumed sufficiently large, and hence we have . Next, the locations can be directly determined by either a low-discrepancy sequence or clustering algorithms (Section 3.1), and the only parameters that need to be optimized at this stage are ; evaluation of this reduced model can be efficiently completed by any standard optimization algorithm (Section 3.2). Moreover, spherical-radial integration is introduced to compute the evolutionary kernel, where a third-degree spherical-radial cubature rule is adopted for efficient numerical integration (Section 3.3).
3.1 Determination of the representative points
To illustrate the connections between location parameters and tessellation/cluster centroids, let us consider an open set . The sub-domain set is called a tessellation of if for and . Each is assigned to a point . The representative point set (or rep-point set for brevity) corresponds to a tessellation of and also provides a candidate for . The task of finding can be approximately performed by finding the best partition of bishop2006pattern; kaufman2009finding. A measure of the quality of the partition can be given by the quadrature error that is defined as du1999centroidal:
where denotes a general norm and is a Lipschitz constant. If measured through information discrepancy using entropy theory, the quality of partition stated in Eq. 10 can be reformulated as:
is the cumulative distribution function (CDF) of, and is the indicator function. This -discrepancy coincides with the Kolmogorov-Smirnov distance illian2008statistical. Hence, can be determined in two ways: through a low-discrepancy sequence by minimizing the -discrepancy (Section 3.1.1), or through clustering points by minimizing the quadrature error (Section 3.1.2). Fig. 2 provides a graphic illustration of these two approaches.
3.1.1 Low-discrepancy sequence approach
The concept of low discrepancy sequence (LDS) is central to the quasi-Monte Carlo method (QMC) and to the number theoretic method (NTM) robert2013monte; zhang2013structural; illian2008statistical
. An LDS is basically a uniformly distributed point set generated in a unit hypercube , where is the dimension of the input random space. The rep-point set for distributions different from the uniform can be obtained via transformation as:
where is a transformation function. Several effective LDSs, e.g., the good lattice point (GLP) set, good point (GP) set, Halton sequence, Haber sequence, Hammersley sequence, Faure sequence, Sobol sequence, etc., have been shown to provide lower -discrepancies than random point sets nie2004new; sudret2008global; wei2015variable; zhang2013structural. may be simply taken as the inverse of the CDF of , that is, . Generally, has independent marginals, i.e. . Therefore,
where the -discrepancy of concerning is the same as the one for . Techniques for generating for elliptically contoured and multivariate Liouville distributions can be found in the literature fang2018symmetric. In particular, for a multivariate standard Gaussian distribution, the Box-Muller transformation provides a good alternative to the method based on the inverse of the CDF. Then, can be taken as the means of the components of a GMM, i.e. . An advantage of this manipulation is that the rep-point set transformed from an LDS makes the components distributed more uniformly than a random point set. A GLP set on is shown in Fig. 2. (a.1), its transformation to a Gaussian distribution is shown in Fig. 2. (a.2), and the corresponding Voronoi tessellation is provided in Fig. 2. (a.3).
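The inverse-CDF transformation of an LDS can be sketched as follows. This uses SciPy's quasi-Monte Carlo module with a scrambled Halton sequence as a stand-in for the LDSs named above (the paper's examples use a GLP set, which SciPy does not provide), and a standard Gaussian target:

```python
import numpy as np
from scipy.stats import norm, qmc

# Low-discrepancy sequence on the unit hypercube [0, 1]^2.
d, n = 2, 128
u = qmc.Halton(d=d, scramble=True, seed=0).random(n)

# Transform to a standard Gaussian via the inverse CDF, applied marginal by
# marginal (valid because the target has independent marginals).
x = norm.ppf(u)

# These points then serve as the means of the Gaussian mixture components.
print(x.shape)
```

For correlated or non-Gaussian targets, the componentwise inverse CDF no longer applies directly, which is precisely where the clustering approach of the next subsection takes over.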
3.1.2 Clustering analysis approach
While the transformation of an LDS usually suits independent random variables, clustering methods provide an approach for dependent random variables kaufman2009finding. In essence, clustering methods aim to minimize the last term of Eq. 10 instead of minimizing the -discrepancy dempster1977maximum; mclachlan2007algorithm
. For example, for the K-means clustering algorithm, this minimization is equivalent to seeking the mass centroid of, i.e.
and it can be implemented as follows: (1) Initialize the cluster centers from by randomly sampling from ; (2) Randomly sample an auxiliary point set from , where ; (3) Assign respectively to their nearest cluster centers based on the Euclidean distance; (4) Update each center with the mean of the auxiliary points assigned to it; (5) Repeat steps (3) and (4) until no longer changes. The second row of Fig. 2 graphically illustrates the connection between cluster centroids and the proposed meso-scale statistical structures.
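Steps (1)-(5) above can be sketched directly. The correlated Gaussian input below is a hypothetical stand-in for a dependent random vector; only the ability to sample from the input distribution is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

def rep_points_kmeans(sample_from_p, n_centers, n_aux=20_000, max_iter=100):
    """K-means rep-points for a density known only through sampling."""
    centers = sample_from_p(n_centers)            # (1) initialize centers
    aux = sample_from_p(n_aux)                    # (2) auxiliary point set, N >> n
    for _ in range(max_iter):
        # (3) assign each auxiliary point to its nearest center (Euclidean)
        d2 = ((aux[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # (4) update each center with the mean of the points assigned to it
        new = np.array([aux[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(n_centers)])
        if np.allclose(new, centers):             # (5) stop when centers settle
            break
        centers = new
    return centers

# Hypothetical dependent 2-D input: a correlated Gaussian.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
sampler = lambda m: rng.multivariate_normal(np.zeros(2), cov, size=m)
centers = rep_points_kmeans(sampler, n_centers=16)
print(centers.shape)
```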
3.2 Determination of the Gaussian components
With computed location parameters, the approximation capability of the constructed mixture model is primarily determined by the complexity of the selected kernel bishop2006pattern; kaufman2009finding. There are three types of kernels, namely, homogeneous kernel, inscribed kernel, and adaptive kernel. Fig. 3 gives a summary of these kernels.
To demonstrate the influences that model configuration has on the approximation capability, we resort to the concept of Gaussian complexity, which is defined as eldan2018gaussian:
where are independent standard Gaussian random variables and represents the set of interest.
According to eldan2018gaussian, the approximation space of interest to us is the space of functions that take values of order and whose Lipschitz constant is . It is clear that such functions trivially have complexity at most . Hence, the kernel type and the number of kernels are the most important factors that should be addressed in the implementation. Fig. 4
graphically illustrates the influences of these two factors. Note that each circle represents a Gaussian component; for instance, homogeneous kernels indicate that the same variance has been assigned to every component. Also, the heatmap comparison is based on the complexity of the configured model using Eq. 15.
For numerical implementation, the optimal set of is assumed given by . The optimal set of can be determined by the EM algorithm. It should be observed that EM could be used to determine all the parameters, including the weights , means , and covariances dempster1977maximum; mclachlan2007algorithm. However, in this case and are determined in advance, thereby significantly reducing the complexity of the optimization problem and allowing the use of more kernels. The corresponding EM algorithm can be implemented according to the steps stated in Appendix A.
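The complexity-reduced EM described above can be sketched as follows: weights are fixed at equal values, means are fixed at the precomputed rep-points, and only the covariances are iterated. This is a simplified illustration of the scheme in Appendix A, with hypothetical data and rep-points.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_covariances(data, means, n_iter=50, reg=1e-6):
    """EM with weights fixed at 1/K and means fixed: only covariances update."""
    n, d = data.shape
    K = len(means)
    covs = np.array([np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities under equal weights (weights cancel).
        dens = np.column_stack([mvn.pdf(data, means[k], covs[k]) for k in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted scatter about the *fixed* means.
        for k in range(K):
            diff = data - means[k]
            covs[k] = (resp[:, k, None, None]
                       * (diff[:, :, None] * diff[:, None, :])).sum(0) \
                      / resp[:, k].sum() + reg * np.eye(d)
    return covs

rng = np.random.default_rng(2)
data = rng.normal(size=(2000, 2))            # hypothetical data set
means = np.array([[-1.0, 0.0], [1.0, 0.0]])  # rep-points fixed in advance
covs = em_covariances(data, means)
print(covs.shape)
```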
3.3 Determination of the evolutionary kernels
Similarly to what is done in macro-scale methods, the second-order statistics can be used to describe each evolutionary kernel given by Eq. 9. Specifically, the objective is the estimation of the first two statistical moments of the system states , i.e., the mean
and the variance
where denotes the number of auxiliary points assigned to the evolutionary kernel and denotes the auxiliary point with weight for .
The auxiliary points can be chosen from an LDS, a random point set, a sigma-point set, or a cubature point set illian2008statistical. A Gaussian cubature point set is favorable here since the integrand is a Gaussian density function, whose symmetric and radial shape facilitates efficient integration rules arasaratnam2009cubature; jia2013high. Explicitly, an integral with the integrand of the form
where is some arbitrary nonlinear function, denotes the region of integration, and is the known weighting function, can be transformed into a spherical-radial integration. Therefore, Eq. 16 can be expressed as:
where represents the number of cubature points used in the spherical-radial cubature, while and are constants to be determined arasaratnam2009cubature; jia2013high. If has mean and covariance , the above integral can be further simplified to:
Through a third-degree spherical-radial cubature rule with and covariance , Eq. 21 can be written as:
where represents the dimension of the input random space and denotes the vector:
Substituting these back into Eq. 16 and Eq. 17 allows the determination of the evolutionary kernel. The global probability evolution is obtained by assembling all the evolutionary kernels. Fig. 5. (a) illustrates a 2-dimensional cubature point set of a component PDF. The rep-points and their corresponding cubature points of a GMM of the bivariate Gaussian PDF are shown in Fig. 5. (b).
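A compact sketch of the third-degree spherical-radial cubature rule: for an n-dimensional Gaussian it uses 2n points, placed at the mean plus or minus sqrt(n) times the columns of a Cholesky factor of the covariance, each with weight 1/(2n). The sanity check at the end exploits the fact that the rule is exact for linear maps; the matrices are arbitrary illustrative values.

```python
import numpy as np

def cubature_points(mu, Sigma):
    """Third-degree spherical-radial cubature: 2n points, equal weight 1/(2n)."""
    n = len(mu)
    L = np.linalg.cholesky(Sigma)
    xi = np.sqrt(n) * np.concatenate([np.eye(n), -np.eye(n)])   # (2n, n)
    return mu + xi @ L.T, np.full(2 * n, 1.0 / (2 * n))

def propagate_moments(f, mu, Sigma):
    """First two moments of f(x), x ~ N(mu, Sigma), via the cubature rule."""
    pts, w = cubature_points(mu, Sigma)
    y = np.array([f(p) for p in pts])
    mean = (w[:, None] * y).sum(0)
    diff = y - mean
    cov = (w[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(0)
    return mean, cov

# Sanity check: the rule is exact for a linear map y = A x + b.
A = np.array([[1.0, 0.5], [0.0, 2.0]]); b = np.array([1.0, -1.0])
mu = np.array([0.3, -0.2]); Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
m, P = propagate_moments(lambda x: A @ x + b, mu, Sigma)
print(m, P)   # expect A mu + b and A Sigma A^T
```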
4 Meso-scale perspective of two classical typologies of probability evolution
4.1 Meso-scale scheme for conservative models
A conservative model places the randomness of a stochastic system in an augmented random initial condition. Particularly, random model parameters can be written in a state space where is derived from a random initial condition via deterministic dynamics li2009stochastic; ghanem2003stochastic. The random excitations can also be written in a state space by a series expansion, where the random coefficients are regarded as random initial conditions and the deterministic basis functions are regarded as deterministic dynamics melchers2018structural; xiu2002wiener. Thus, , , and can be assembled and formulated in an augmented state as:
with random initial condition
where and are respectively the augmented velocity function and state function of initial condition .
To compute the probability evolution of a conservative model from a meso-scale perspective (Eq. 6), the input PDF is expressed as:
Correspondingly, the evolutionary PDF is derived in a manner similar to Eq. 9 and reads:
Using the third-order Gaussian cubature rule arasaratnam2009cubature, the evolutionary kernel is represented with mean
From the viewpoint of state-space modeling, the meso-scale scheme treats conservative models as a nonlinear filter, i.e., a cubature Kalman filter (CKF). The multi-dimensional integrals involved in the time (predictive density) and measurement (posterior density) Bayesian updates of the CKF are efficiently addressed via the cubature rule. Such a derivative-free method broadens the applicability of the proposed meso-scale scheme wan2000unscented; arasaratnam2009cubature; jia2013high.
4.2 Meso-scale scheme for Markov models
The second type of model that is closely connected to the proposed meso-scale scheme is the Markov model grigoriu2013stochastic. By definition, Markov models posit that the randomness of a stochastic system is injected sequentially by random excitations. The initial condition may also be random. Usually, a Markov model is written as
where denotes the random excitation. It is a Wiener process with mean and covariance . Moreover, such a Wiener process satisfies:
In a similar manner (Eq. 6), the input PDF is and it is expressed by the meso-scale representation:
Accordingly, the probability evolution is:
where denotes the integral domain, including infinite time slices from to . With the derivative-free discretization, the second-order statistics become:
Note that the integral domain is not but , which is time-varying. Therefore, a new GMM should be constructed for as each random excitation is injected eldan2018gaussian. Considering that is an additive noise and independent of , a GMM can be initially constructed for and then updated according to . However, instead of constructing a series of new GMMs between time and , a simpler alternative method can be followed fang2018symmetric. The computational details are given in Appendix B.
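The additive-noise shortcut above can be sketched for a single Gaussian component: propagate the component through the dynamics with the cubature rule, then inflate its covariance by the process-noise covariance, rather than rebuilding a mixture at every step. The linear dynamics and noise level below are hypothetical, chosen so that the exact answer is known.

```python
import numpy as np

# One discrete step of a Markov model x_{k+1} = f(x_k) + w_k, w_k ~ N(0, Q).
# Because the noise is additive and independent of the state, each Gaussian
# component is propagated through f and its covariance is then inflated by Q.

def step_component(f, mu, Sigma, Q):
    n = len(mu)
    L = np.linalg.cholesky(Sigma)
    xi = np.sqrt(n) * np.concatenate([np.eye(n), -np.eye(n)])
    pts = mu + xi @ L.T                        # third-degree cubature points
    y = np.array([f(p) for p in pts])
    mean = y.mean(axis=0)                      # equal weights 1/(2n)
    diff = y - mean
    cov = diff.T @ diff / (2 * n) + Q          # additive process noise enters here
    return mean, cov

# Hypothetical linear dynamics for illustration; the rule is exact here.
F = np.array([[1.0, 0.1], [0.0, 0.95]])
f = lambda x: F @ x
Q = 0.01 * np.eye(2)
mu, Sigma = np.zeros(2), np.eye(2)
mu1, Sigma1 = step_component(f, mu, Sigma, Q)
print(mu1, Sigma1)   # expect F mu and F Sigma F^T + Q
```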
5 Numerical examples
On the basis of the aforementioned fundamentals, four examples are presented here, some of which have been used by others to illustrate various features of the response. The examples address, in turn, a linear transformation, a nonlinear transformation, a Duffing oscillator, and a nonlinear system with uncertain parameters under random excitation.
5.1 Example I: Linear Transformation
Let us consider a linear transformation like:
where with mean and . This linear transformation converts an uncorrelated Gaussian distribution into a correlated one. The inverse of Eq. 43 gives:
The Jacobian of the transformation is and the evolutionary PDF is:
Here we use a good lattice point (GLP) set with 89 points to determine the means of the GMM for . Each evolutionary kernel is calculated via a third-order Gaussian cubature rule. The analytical and meso-scale solutions of are shown in Fig. 6.(a) and (b) respectively. Fig. 6.(c) plots them together for a better comparison. In addition, the kernel density estimation (KDE) solution for based on quasi-Monte Carlo (QMC) with the same GLP set is also calculated, as shown in Fig. 6.(d). The KDE solution is obtained by minimizing the MSE with Gaussian kernel density functions. Note that the KDE solution shows a false multi-modal nature. A possible reason is that this micro-scale method loses part of the information on around each particular point of the GLP set and fails to record its evolution, which is thus not exactly reflected in the KDE solution. By contrast, the proposed meso-scale method maintains the information on the PDF and can record the evolution precisely.
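The change-of-variables logic of this example can be checked numerically. The matrix below is an assumed stand-in (the transformation used in the example is not reproduced here), and random centers replace the GLP rep-points; the check is that the propagated meso-scale mixture agrees exactly with the analytic pushforward density p_Y(y) = p_X(A^{-1} y) / |det A|, since a linear map sends each Gaussian component exactly to a Gaussian.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(3)

# Assumed linear map (the matrix used in the example is not reproduced here).
A = np.array([[1.0, 0.0], [0.6, 0.8]])

# Meso-scale input representation: 89 equal-weight Gaussian components with a
# shared small covariance; random centers stand in for the GLP rep-point set.
centers = rng.standard_normal((89, 2))
comp_cov = 0.35**2 * np.eye(2)

# Each component propagates exactly: N(m, S) -> N(A m, A S A^T).
out_means = centers @ A.T
out_cov = A @ comp_cov @ A.T

def pdf_in(x):
    return np.mean([mvn.pdf(x, c, comp_cov) for c in centers])

def pdf_out(y):
    return np.mean([mvn.pdf(y, m, out_cov) for m in out_means])

# Change-of-variables check: p_Y(y) = p_X(A^{-1} y) / |det A|.
y = np.array([0.4, -0.3])
lhs = pdf_out(y)
rhs = pdf_in(np.linalg.solve(A, y)) / abs(np.linalg.det(A))
print(lhs, rhs)
```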
5.2 Example II: Nonlinear Transformation
In this example, we consider the nonlinear transformation:
where the input uncertainties take the same distribution as in the previous example. By inverting Eq. 46, two real roots for can be found:
The Jacobian in this case has the same value for both roots:
Therefore, the nonlinearly transformed probability distribution can be expressed as:
Similarly, a GLP set and the third-order Gaussian cubature rule are used to determine . The analytical and meso-scale solutions are shown in Fig. 7.(a) and (b) respectively, where the size of the mesh is . Fig. 7.(c) gives a contour plot and Fig. 7.(d) shows a KDE solution, where the bandwidth of the kernel-smoothing window is set to . It can be observed that a false single-mode PDF, rather than the exact double-mode one, is computed via the normal kernel smoothing function, while the meso-scale method successfully identifies the two extreme regions.
5.3 Example III: Duffing Oscillator
Consider a Duffing oscillator subjected to a Gaussian white noise excitation, governed by the equation caughey1971nonlinear; bergman1992robust:
where the nominal parameters are , , , and . The initial condition is a Gaussian random vector with mean and covariance:
The analytical solution of the stationary PDF is given by:
This example has been studied in the past using the finite element method, which showed a distinct advantage over Monte Carlo simulation in problems of small dimension like this one. Here this problem is revisited using the meso-scale based methodology. The initial PDF is constructed by a GMM with 350 component PDFs, and then updated at each time step (0.015 seconds). Fig. 8 shows the sampling trajectories of the PDF from a meso-/micro-scale perspective. The computational algorithm provided in Appendix B is used to calculate the influence from . Therefore, the total number of simulations is , where 4 is the number of cubature points for each evolutionary kernel (see Fig. 5 for details). This total is lower than the usual number of nodes in a finite element method.
As time progresses, approaches a stationary PDF, whose meso-scale solution is depicted in Fig. 9. (c). Meanwhile, the meso-scale approximation of the input uncertainties is described in Fig. 9. (a). Differences between the analytical expression and the meso-scale solutions are provided for comparison. The results suggest a good match between the two solutions. Furthermore, it can be observed that the error distribution reflects the locations of the adopted meso-scale components. The maximum error lies in a relatively small range, demonstrating the effectiveness of the proposed PDF evolution modeling scheme.
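For reference, a micro-scale (Euler-Maruyama) ensemble for a white-noise-driven Duffing oscillator can be sketched as below. The damping ratio, frequency, nonlinearity coefficient, and noise intensity are assumed values for illustration only; the nominal parameters of this example are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed parameters (the paper's nominal values are not reproduced here):
# damping ratio, natural frequency, cubic coefficient, noise spectral level.
zeta, omega, eps, S0 = 0.1, 1.0, 0.5, 0.2
dt, n_steps, n_traj = 0.015, 2000, 5000

x = rng.normal(0.0, 0.5, size=n_traj)          # random initial displacement
v = rng.normal(0.0, 0.5, size=n_traj)          # random initial velocity

for _ in range(n_steps):
    # Euler-Maruyama step for x'' + 2*zeta*omega*x' + omega^2*(x + eps*x^3) = w(t)
    dw = rng.normal(0.0, np.sqrt(2 * np.pi * S0 * dt), size=n_traj)
    x, v = x + v * dt, v + (-2*zeta*omega*v - omega**2*(x + eps*x**3)) * dt + dw

# Ensemble statistics of the (near-stationary) displacement.
print(x.mean(), x.var())
```

The meso-scale scheme replaces each of these bare samples with a Gaussian parcel whose mean and covariance are updated per step by the cubature rule.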
5.4 Example IV: Nonlinear Structure with Uncertain Parameters Subjected to Random Excitation
In the last example, let us consider a 10-story 2-bay uncertain shear frame subjected to random ground motion li2009stochastic. The lumped masses, , are listed in Table 1:
Meanwhile, other structural characteristics are given as: ; column square cross-section with dimension ; beams with infinite stiffness. The damping matrix is , where and are respectively the mass matrix and the initial stiffness matrix. Assume and . The Bouc-Wen model is used to model the restoring force as:
where is the initial stiffness; is the inter-story drift and is the hysteretic component satisfying:
in which the parameters take the value , , , , and . The initial Young’s modulus is an uncertain parameter. The random ground motion is represented by a randomly scaled El Centro record with peak ground acceleration (PGA) as the random variable. The total probabilistic information is listed in Table 2.
As previously discussed in Section 4.1, this system can be recast into a conservative model by placing the total random variables in an augmented random initial condition. For comparison, the meso-scale method and QMC are used to compute the probability evolution. The second-order statistics of the top floor displacement calculated by both methods are shown in Fig. 10.(b) and (c). Note that both methods give similar second-order statistics. In addition, the meso-scale method can also provide the evolution of probability as shown in Fig. 10.(d), which is not given by QMC.
6 Concluding remarks
In this paper, probability evolution has been investigated from a meso-scale perspective. The proposed scheme is able not only to maintain the information concerning the PDF with an equivalent statistical structure but also to track the evolution of this statistical structure. This is accomplished by utilizing a Gaussian mixture model (GMM) for the representation of the input PDF, and by solving a series of mixtures of Gaussian integrals for the probability evolution. We have demonstrated that such meso-scale computation frees the macro-scale methods from the limitations of second-order statistics and enhances the expressibility of micro-scale methods by maintaining the PDFs around samples. Furthermore, the connection between the proposed meso-scale scheme and the standard conservative and Markov models has been established in the context of stochastic modeling. The third-degree spherical-radial cubature rule is introduced to further reduce the number of parameters involved in the optimization of the GMM. The efficacy of the meso-scale method is verified by several examples. In summary, the results have demonstrated the merit of the proposed method and the new meso-scale perspective on probability evolution that it offers.
As further extensions to this work, future studies may address issues such as: (1) generalizing the method to the probabilistic regime, i.e., incorporating the Bayesian framework into the current scheme; (2) scaling the uncertainty quantification process to ultra-high-dimensional stochastic processes via the adoption of deep latent space models, variational autoencoders (VAEs), generative adversarial networks (GANs), etc.; and (3) applying the method to a broader range of engineering problems.
This research has been supported in part by the National Science Foundation under Grant Agreement No. 1520817 and No. 1612843. A. Kareem gratefully acknowledges the financial support of Robert M Moran professorship.
A Expectation-maximization algorithm for covariance computation
The expectation-maximization (EM) algorithm is closely related to maximum likelihood estimation (MLE), an effective approach that underlies many machine learning algorithms. The EM algorithm performs MLE for learning parameters in probabilistic models with latent variables. With and determined (see Section 3.2), the use of the EM algorithm to optimize can be summarized in six steps:
Initialize the set as .
If is an LDS, then randomly choose an initial set and an auxiliary point set , where .
If is a cluster set, then set the initial value of as:
where is the auxiliary point set assigned to the cluster, with .
Estimate the log likelihood:
Evaluate the responsibilities (expectation step):
Re-estimate the following parameters (maximization step):
where . Then re-estimate the log likelihood with the given by Eq. 58.
Check for convergence for either the parameters or the log likelihood.
If not converged, repeat steps ; if converged, set .
B Computational procedures for Markov models
The overall updating scheme for computing the first two statistical moments can be summarized in two steps.
Utilize the samples of the additive noise to estimate the updated values of the mean and covariance of each evolutionary kernel: