1 Introduction
To accurately capture the behavior of dynamical systems, it is essential to assess the effects of input uncertainties on model predictions der2009aleatory; helton2011quantification. To introduce the methodology, consider a continuous dynamical system evolving on a smooth manifold:
(1) 
where the underlying dynamical system evolves in a complete metric space with a countable dense set
, vector
defines the model parameters, is the temporal index, and is a Lipschitz vector field strogatz2001nonlinear. From a dynamical evolution perspective, the state is uniquely determined by the state and possibly some noise, following ergodic theory giannakis2019data. The discrete-time dynamics can be defined as:
(2) 
where is a smooth diffeomorphism (an invertible function that maps one differentiable manifold to another such that both the function and its inverse are smooth) that maps the state to . In this case, all uncertainty in the system originates from the uncertainty in the initial system state giannakis2019data; arbabi2017ergodic.
This study assumes that Eq. 2 is configured with a random initial state with known joint probability density function
. The goal here is to carry out uncertainty quantification and propagation by means of the probability density function (PDF). Therefore, the solution scheme requires reformulating the governing equations from ordinary/partial differential equations (ODEs/PDEs) into their stochastic form
li2009stochastic; ghanem2003stochastic. Several approaches have been developed in the past few decades, which can be classified into two main categories: macroscale methods and microscale methods.
1.1 Macroscale evolution
Macroscale methods focus on the computation of the distribution parameters, for instance, mean
in the case of normal distribution
melchers2018structural; low2013new. In practice, an analytical expression in terms of a general probability evolution equation is available only for limited linear systems, where the task of tracking the evolution of the PDF can be accomplished using homogeneity and additivity properties. Specifically, if the input PDF is Gaussian, the evolution of probability can be represented by a sequence of Gaussian density functions with a shifted center and scaled variation. Similar analytical expressions for nonlinear systems may be obtained via linearization, and these methods are effective when only the second-order statistics of the probability evolution are of interest. For a more general and highly nonlinear system, macroscale methods have difficulty describing the multimodal shape or tails of a non-Gaussian distribution cook2016gaussian; kareem2008numerical.
1.2 Microscale trajectory
Microscale methods resort to a set of random realizations of the uncertain inputs using Monte Carlo (MC) sampling, or collocation points. In particular, collocation-based methods use a grid of points for which PDFs are obtained through finite difference (FD) or finite element methods (FEM) ghanem2003stochastic; melchers2018structural; xiu2002wiener
. The global PDF then results from a mixture of these PDFs. Interpolation functions are used to describe the PDF of any arbitrary point in space and time, which allows the description of probability densities at arbitrary points in addition to the interpolation ones. However, it should be noted that FD and FEM may encounter challenges when applied to a high-dimensional random space since their integral domain is not tailored to the random space. By contrast, MC methods use an ensemble of independent sample trajectories to represent the evolution of probability, where the PDF at any time is approximated by a set of discrete probabilities at sample points
robert2013monte. MC therefore disregards information given by the PDF around particular samples, or, in other words, it assumes the region around a sample to be homogeneous. Therefore, in order to minimize the error in the representation of the real PDF, MC has to employ a large number of samples to fill the random space, or to keep the region of each sample as small as possible. Considering the computational cost of examining a great number of samples, possible remedy strategies include (a) constructing an easy-to-evaluate surrogate/emulator that is trained using a small number of propagated data rajashekhar1993new; luo2020bayesian; luo2019deep; (b) utilizing advanced sampling strategies such as importance sampling, adaptive sampling, etc., to generate high-quality samples nie2004new; au2001estimation; papaioannou2015mcmc; and (c) prescreening the most influential variables by global sensitivity analysis techniques sudret2008global; wei2015variable.
1.3 Mesoscale parcel
In this study, a generalized perspective on the dynamical evolution of the PDF of interest is introduced, following the seminal works presented in Yin2013theory; Yin2013method. Viewed from this perspective, the PDF of the input uncertainties is modeled by a mixture of Gaussians bishop2006pattern; kurtz2013cross
. Characterization of each component of the mixture includes the computation of the mean and covariance, and an active learning method is hereby proposed to accelerate the parameter estimation, where the means of these Gaussian components are directly estimated by a lowdiscrepancy sequence or a clustering algorithm
robert2013monte; zhang2013structural; kaufman2009finding, and the optimal covariances are obtained through a complexityreduced expectation maximization (EM) algorithm
dempster1977maximum; mclachlan2007algorithm. Next, the evolution of each Gaussian component is expressed in a convolution format that produces an evolutionary kernel li2004probability; wan2000unscented. The integral can be efficiently solved by the third-degree spherical-radial cubature rule arasaratnam2009cubature; jia2013high. Finally, assembling the resolved integral results of all components provides the evolutionary PDF of interest. Compared to the aforementioned macroscale evolution and microscale trajectory, this mesoscale parcel scheme has two advantages. First, mixture modeling can provide asymptotically convergent approximations to an arbitrary probability distribution
bishop2006pattern; kaufman2009finding. Second, the computational complexity of tracking the evolution of the PDF of interest is much lower than that of sampling-based methods, where the number of samples required by the MC method to adequately estimate the PDF increases significantly xiu2002wiener; robert2013monte. Moreover, the mesoscale parcel is an effective alternative at an intermediate scale between the macro/microscale methods: the macroscale evolution can be regarded as a special case of the mesoscale representation in which the statistical structure contains only one component, and the microscale trajectory can be cast in the mesoscale format with as many components as the number of samples and the Dirac delta function as an indicator. Fig. 1 gives a graphic illustration of these three perspectives.
The paper is organized as follows. Section 2 provides the definition of the problems of interest. Section 3 gives the computational guidelines of the proposed mesoscale scheme. Section 4 interprets the mesoscale uncertainty propagation of two special cases: conservative and Markov models. Section 5 provides application examples to demonstrate the effectiveness of the proposed mesoscale-based method. Finally, conclusions are drawn in Section 6 with some discussion of the relative advantages and limitations of the proposed scheme, and potential future work.
2 Methodology: a mesoscale probability evolution scheme
For notational brevity, let us formulate a stochastic dynamical system in a general model form strogatz2001nonlinear:
(3) 
where is an input random vector and represents the corresponding output at time instant , which is also a random vector.
The goal is to obtain the evolutionary PDF of . Hence, a governing equation with respect to the instantaneous PDF should first be derived li2004probability
. In the case of continuous random variables, such PDF can be written in a convolution form as
li2009stochastic:(4) 
Solving Eq. 4 in the context of the physical model stated in Eq. 3 using the principle of probability preservation gives the integral expression of the instantaneous PDF in the random input space:
(5) 
where is the Dirac delta function and is the joint PDF describing the input uncertainties, which is assumed to be available in this work.
The main point is to find a computationally efficient representation of to compute the evolutionary PDF of . This naturally leads to a geometric segmentation, and mixture modeling, which learns a probabilistic model by assuming the data are generated from a mixture of distributions, is adopted here bishop2006pattern; kaufman2009finding. Hence, is approximated through a convex combination of a series of weighted base distributions:
(6) 
with denoting the weighting factor of the base distribution. Additionally, an equality constraint along with box constraints on is imposed:
(7) 
In practice, the normal or multivariate normal distribution is extensively used as the base distribution, since the Gaussian mixture model (GMM) is capable of approximating any arbitrary density function when sufficiently many base terms are included
dempster1977maximum; mclachlan2007algorithm. Accordingly, Eq. 6 can be rewritten as:
(8)
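As a minimal illustration of the Gaussian mixture form above, the density can be evaluated as a convex combination of multivariate normal components. The weights, means, and covariances below are hypothetical placeholders, not values from the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """Evaluate a Gaussian mixture density at a point x (Eq. 6 / Eq. 8)."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

# Hypothetical two-component mixture in 2-D; weights obey Eq. 7 constraints
weights = [0.4, 0.6]
means = [np.zeros(2), np.array([2.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

x = np.array([1.0, 1.0])
p = gmm_pdf(x, weights, means, covs)
```

With more components, the same construction approximates increasingly complex, multimodal input densities.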
Henceforth, the remaining work is to identify the parameters using optimization methods and to substitute the optimized parameters into Eq. 5 to compute the instantaneous PDF:
(9) 
3 Computational algorithms and implementation details
The computational algorithm of the proposed mesoscale scheme contains two major steps. First, one has to determine the weight, location, and scale of each mesoscale component. Second, the optimized mesoscale components are integrated into a single kernel by Eq. 9. To ensure the precision of the constructed kernel in describing the dynamical behavior, the number of component PDFs has to be sufficiently large kurtz2013cross; kaufman2009finding. As a result, effective optimization of the model parameters is a prerequisite, since conventional algorithms such as expectation-maximization by definition involve the optimization of a large number of parameters dempster1977maximum; mclachlan2007algorithm.
To reduce the computational complexity, the weighting factors are assumed to be homogeneous; that is, in Eq. 6 is assumed sufficiently large, and hence we have . Next, the locations can be directly determined by either a low-discrepancy sequence or clustering algorithms (Section 3.1), and the only parameters that need to be optimized at this stage are ; evaluation of this reduced model can be efficiently completed by any standard optimization algorithm (Section 3.2). Moreover, spherical-radial integration is introduced to compute the evolutionary kernel, where a third-degree spherical-radial cubature rule is adopted for efficient numerical integration (Section 3.3).
3.1 Determination of the representative points
To illustrate the connections between location parameters and tessellation/cluster centroids, let us consider an open set . The subdomain set is called a tessellation of if for and . Each is assigned to a point . The representative point set (or rep-point set for brevity) corresponds to a tessellation of and also provides a candidate for . The task of finding can be approximately performed by finding the best partition of bishop2006pattern; kaufman2009finding. A measure of the quality of the partition is given by the quadrature error defined as du1999centroidal:
(10) 
where denotes a general norm and is a Lipschitz constant. If measured through information discrepancy using entropy theory, the quality of partition stated in Eq. 10 can be reformulated as:
(11) 
where
is the cumulative distribution function (CDF) of
and is the indicator function. This discrepancy coincides with the Kolmogorov-Smirnov distance illian2008statistical. Hence, can be determined in two ways: through a low-discrepancy sequence by minimizing the discrepancy (Section 3.1.1), or through clustering points by minimizing the quadrature error (Section 3.1.2). Fig. 2 provides a graphic illustration of these two approaches.
3.1.1 Low-discrepancy sequence approach
The concept of the low-discrepancy sequence (LDS) is central to the quasi-Monte Carlo method (QMC) and to the number-theoretic method (NTM) robert2013monte; zhang2013structural; illian2008statistical
. An LDS is basically a uniformly distributed point set
generated in a unit hypercube , where is the dimension of the input random space. The rep-point set for distributions other than the uniform can be obtained via the transformation:
(12)
where is a transformation function. Several effective LDSs, e.g., the good lattice point (GLP) set, good point (GP) set, Halton sequence, Haber sequence, Hammersley sequence, Faure sequence, Sobol sequence, etc., have been shown to provide lower discrepancies than random point sets nie2004new; sudret2008global; wei2015variable; zhang2013structural. may be simply taken as the inverse of the CDF of , that is . Generally, has independent marginals, i.e. . Therefore,
(13) 
where the discrepancy of with respect to is the same as that for . Techniques for generating for elliptically contoured and multivariate Liouville distributions can be found in the literature fang2018symmetric. In particular, for a multivariate standard Gaussian distribution, the Box-Muller transformation provides a good alternative to the method based on the inverse of the CDF. Then, can be taken as the means of the components of a GMM, i.e. . An advantage of this manipulation is that a rep-point set transformed from an LDS makes the components distributed more uniformly than a random point set. A GLP set on is shown in Fig. 2(a.1), its transformation to a Gaussian distribution is shown in Fig. 2(a.2), and the corresponding Voronoi tessellation is provided in Fig. 2(a.3).
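The inverse-CDF transform of an LDS can be sketched as follows. This uses a Halton sequence (rather than the GLP set used in the paper) for convenience, and the target mean and covariance are hypothetical:

```python
import numpy as np
from scipy.stats import norm, qmc

d, n = 2, 89                      # input dimension, number of rep-points
sampler = qmc.Halton(d=d, scramble=True, seed=0)
u = sampler.random(n)             # low-discrepancy points in the unit hypercube

# Inverse-CDF transform to a standard Gaussian rep-point set (Eq. 12-13)
z = norm.ppf(u)

# Shift/scale to a hypothetical target Gaussian with mean mu and covariance
mu = np.array([1.0, -1.0])
L = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.5]]))
rep_points = mu + z @ L.T
```

Scrambling keeps the points strictly inside the open hypercube, so `norm.ppf` never hits the endpoints 0 or 1; for a standard Gaussian target, the Box-Muller transform mentioned above is an equally valid alternative to `norm.ppf`.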
3.1.2 Clustering analysis approach
While the transformation of an LDS usually suits independent random variables, clustering methods provide an approach for dependent random variables kaufman2009finding. In essence, clustering methods aim to minimize the last term of Eq. 10 instead of minimizing the discrepancy dempster1977maximum; mclachlan2007algorithm
. For example, for the Kmeans clustering algorithm, this minimization is equivalent to seeking the mass centroid of
, i.e.(14) 
and it can be implemented as follows: (1) initialize the cluster centers by randomly sampling from ; (2) randomly sample an auxiliary point set from , where ; (3) assign each auxiliary point to its nearest cluster center based on the Euclidean distance; (4) update each center with the mean of the auxiliary points assigned to it; (5) repeat steps (3) and (4) until no longer changes. The second row of Fig. 2 graphically illustrates the connection between cluster centroids and the proposed mesoscale statistical structures.
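Steps (1)-(5) above can be sketched as a small K-means routine; the target distribution (a standard bivariate Gaussian) and the cluster count are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_rep_points(sample_fn, n_centers, n_aux=5000, n_iter=50):
    """K-means centroids of samples from p_X, used as GMM location parameters."""
    centers = sample_fn(n_centers)          # (1) initialize centers from p_X
    aux = sample_fn(n_aux)                  # (2) auxiliary point set
    for _ in range(n_iter):
        # (3) assign each auxiliary point to its nearest center (Euclidean)
        d2 = ((aux[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # (4) update each center with the mean of its assigned points
        new = np.array([aux[labels == k].mean(axis=0)
                        if np.any(labels == k) else centers[k]
                        for k in range(n_centers)])
        if np.allclose(new, centers):       # (5) stop once centers settle
            break
        centers = new
    return centers

# Hypothetical target: standard bivariate Gaussian
sample_std_normal = lambda n: rng.standard_normal((n, 2))
centers = kmeans_rep_points(sample_std_normal, n_centers=10)
```

Unlike the LDS transform, this approach needs only the ability to sample from the input distribution, which is why it suits dependent random variables.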
3.2 Determination of the Gaussian components
With computed location parameters, the approximation capability of the constructed mixture model is primarily determined by the complexity of the selected kernel bishop2006pattern; kaufman2009finding. There are three types of kernels, namely, homogeneous kernel, inscribed kernel, and adaptive kernel. Fig. 3 gives a summary of these kernels.
To demonstrate the influences that model configuration has on the approximation capability, we resort to the concept of Gaussian complexity, which is defined as eldan2018gaussian:
(15) 
where are independent standard Gaussian random variables and represents the set of interest.
According to eldan2018gaussian, the approximation space of interest to us consists of functions that take values of order and whose Lipschitz constant is . Such functions trivially have complexity at most . Hence, the kernel type and the number of kernels are the most important factors to be addressed in the implementation. Fig. 4
graphically illustrates the influences of these two factors. Note that each circle represents a Gaussian component, for instance, homogeneous kernels indicate the same variance has been assigned. Also, the heatmap comparison is based on the complexity of the configured model using
Eq. 15.
For numerical implementation, the optimal set of is assumed given by . The optimal set of can be determined by the EM algorithm. It should be observed that EM could be used to determine all the parameters, including the weights , means , and covariances dempster1977maximum; mclachlan2007algorithm. However, as anticipated, in this case and are determined in advance, thereby significantly reducing the complexity of the optimization problem and allowing the use of more kernels. The corresponding EM algorithm can be implemented according to the steps stated in Appendix A.
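A reduced EM iteration of the kind described here (weights fixed at equal values, means fixed at the rep-points, only covariances updated) can be sketched as follows; the data set and the two fixed rep-points are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_covariances(data, means, n_iter=30, reg=1e-6):
    """Reduced EM: weights fixed at 1/K, means fixed; optimize covariances only."""
    n, d = data.shape
    K = len(means)
    covs = [np.eye(d) for _ in range(K)]
    for _ in range(n_iter):
        # E-step: responsibilities (homogeneous weights 1/K cancel in the ratio)
        dens = np.column_stack(
            [multivariate_normal(means[k], covs[k]).pdf(data) for k in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update only the covariances; means stay at the rep-points
        for k in range(K):
            diff = data - means[k]
            w = resp[:, k]
            covs[k] = (w[:, None] * diff).T @ diff / w.sum() + reg * np.eye(d)
    return covs

# Hypothetical bimodal data with two fixed rep-points at the mode centers
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-2, 0.5, (200, 2)), rng.normal(2, 0.5, (200, 2))])
means = [np.array([-2.0, -2.0]), np.array([2.0, 2.0])]
covs = em_covariances(data, means)
```

Because only the covariances enter the M-step, each iteration is far cheaper than full EM, which is what permits a large number of kernels in practice.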
3.3 Determination of the evolutionary kernels
Similarly to what is done in macroscale methods, the secondorder statistics can be used to describe each evolutionary kernel given by Eq. 9. Specifically, the objective is the estimation of the first two statistical moments of the system states , i.e., the mean
(16) 
and the variance
(17)  
where denotes the number of auxiliary points assigned to the evolutionary kernel and denotes the auxiliary point with weight for .
The auxiliary points can be chosen from an LDS, a random point set, a sigma-point set, or a cubature point set illian2008statistical. A Gaussian cubature point set is favorable here since the integrand is a Gaussian density function, whose symmetric and radial shape facilitates efficient integration rules arasaratnam2009cubature; jia2013high. Explicitly, an integral with an integrand of the form
(18) 
where is some arbitrary nonlinear function, denotes the region of integration, and is the known weighting function, can be transformed into a spherical-radial integration. Therefore, Eq. 16 can be expressed as:
(19) 
with
(20) 
where represents the number of cubature points used in the sphericalradial cubature, while and are constants to be determined arasaratnam2009cubature; jia2013high. If has mean and covariance , the above integral can be further simplified to:
(21) 
Through a third-degree spherical-radial cubature rule with and covariance , Eq. 21 can be written as:
(22) 
with
(23) 
where represents the dimension of the input random space and denotes the vector:
(24) 
(25) 
Substituting these back into Eq. 16 and Eq. 17 allows the determination of the evolutionary kernel. The global probability evolution is obtained by assembling all the evolutionary kernels. Fig. 5. (a) illustrates a 2dimensional cubature point set of a component PDF. The reppoints and their corresponding cubature points of a GMM of the bivariate Gaussian PDF are shown in Fig. 5. (b).
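The standard third-degree spherical-radial rule uses 2n equally weighted points placed at ± sqrt(n) along the columns of a square root of the covariance. A sketch of propagating a kernel's mean and covariance through a nonlinear map is below; the map f, mean, and covariance are hypothetical:

```python
import numpy as np

def cubature_moments(f, mu, P):
    """Propagate mean/covariance of N(mu, P) through f using the
    third-degree spherical-radial cubature rule (2n points, weights 1/2n)."""
    n = len(mu)
    L = np.linalg.cholesky(P)
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # (n, 2n) directions
    pts = mu[:, None] + L @ xi                             # cubature points
    ys = np.array([f(pts[:, i]) for i in range(2 * n)])
    mean = ys.mean(axis=0)                                 # equal weights 1/2n
    diff = ys - mean
    cov = diff.T @ diff / (2 * n)
    return mean, cov

# Hypothetical nonlinear map applied to a 2-D Gaussian kernel
mu = np.array([1.0, 0.5])
P = np.diag([0.1, 0.2])
f = lambda x: np.array([x[0] + 0.1 * x[1] ** 2, x[1]])
m, C = cubature_moments(f, mu, P)
```

For an affine map the rule is exact, which is a convenient sanity check; for nonlinear maps it matches the Gaussian-weighted integral to third degree.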
4 Mesoscale perspective of two classical typologies of probability evolution
4.1 Mesoscale scheme for conservative models
A conservative model places the randomness of a stochastic system in an augmented random initial condition. Particularly, random model parameters can be written in a state space where is derived from a random initial condition via deterministic dynamics li2009stochastic; ghanem2003stochastic. The random excitations can also be written in a state space by a series expansion, where the random coefficients are regarded as random initial conditions and the deterministic basis functions are regarded as deterministic dynamics melchers2018structural; xiu2002wiener. Thus, , , and can be assembled and formulated in an augmented state as:
(26) 
with random initial condition
(27) 
It is understood that a dynamic stochastic system with given states (Eq. 26) and initial conditions (Eq. 27) can be expressed in a Lagrangian differential format grigoriu2013stochastic:
(28) 
where and are respectively the augmented velocity function and state function of initial condition .
To compute the probability evolution of a conservative model from a mesoscale perspective (Eq. 6), the input PDF is expressed as:
(29) 
Correspondingly, the evolutionary PDF is derived in a similar manner to Eq. 9 and it writes as:
(30) 
where
(31) 
Using the thirdorder Gaussian cubature rule arasaratnam2009cubature, the evolutionary kernel is represented with mean
(32) 
and covariance
(33) 
where
(34) 
Remark (1).
From the viewpoint of state-space modeling, the mesoscale scheme treats conservative models as a nonlinear filter, i.e., the cubature Kalman filter (CKF). The multidimensional integrals involved in the time (predictive density) and measurement (posterior density) Bayesian updates of the CKF are efficiently addressed via the cubature rule. Such a derivative-free method broadens the applicability of the proposed mesoscale scheme wan2000unscented; arasaratnam2009cubature; jia2013high.
4.2 Mesoscale scheme for Markov models
The second type of model that is closely connected to the proposed mesoscale scheme is the Markov model grigoriu2013stochastic. By definition, Markov models posit that the randomness of a stochastic system is injected sequentially by random excitations. The initial condition may also be random. Usually, a Markov model is written as
(35) 
where denotes the random excitation. It is a Wiener process with mean and covariance . Moreover, such a Wiener process satisfies:
(36) 
In a similar manner (Eq. 6), the input PDF is and it is expressed by the mesoscale representation:
(37) 
Accordingly, the probability evolution is:
(38) 
(39)  
where denotes the integral domain, comprising infinitely many time slices from to . With the derivative-free discretization, the second-order statistics become:
(40) 
and
(41) 
where
(42) 
Remark (2).
Note that the integral domain is not but , which is time-varying. Therefore, a new GMM should be constructed for as each random excitation is injected eldan2018gaussian. Considering that is additive noise independent of , a GMM can be initially constructed for and then updated according to . However, instead of constructing a series of new GMMs between time and , a simpler alternative can be followed fang2018symmetric. The computational details are given in Appendix B.
5 Numerical examples
On the basis of the aforementioned fundamentals, four examples are presented here, some of which have been used by others to illustrate various features of the response. The examples address, in turn, a linear transformation, a nonlinear transformation, a Duffing oscillator, and a nonlinear system with uncertain parameters under random excitation.
5.1 Example I: Linear Transformation
Let us consider the linear transformation:
(43)  
where with mean and . This linear transformation converts an uncorrelated Gaussian distribution into a correlated one. The inverse of Eq. 43 gives:
(44)  
The Jacobian of the transformation is and the evolutionary PDF is:
(45) 
Here we use a good lattice point (GLP) set with 89 points to determine the means of the GMM for . Each evolutionary kernel is calculated via the third-order Gaussian cubature rule. The analytical and mesoscale solutions of are shown in Fig. 6(a) and (b), respectively. Fig. 6(c) plots them together for a better comparison. In addition, the kernel density estimation (KDE) solution for based on quasi-Monte Carlo (QMC) with the same GLP set is also calculated, as shown in Fig. 6(d). The KDE solution is obtained by minimizing the MSE with Gaussian kernel density functions. Note that the KDE solution shows a false multimodal nature. A possible reason is that this microscale method loses partial information about around each particular point of the GLP set and fails to record its evolution, which is thus not exactly reflected in the KDE solution. By contrast, the proposed mesoscale method maintains the information on the PDF and can record the evolution precisely.
5.2 Example II: Nonlinear Transformation
In this example, we consider the nonlinear transformation:
(46)  
where the input uncertainties take the same distribution as in the previous example. Inverting Eq. 46 yields two real roots for :
(47)  
The Jacobian in this case has the same value for both roots:
(48) 
Therefore, the nonlinearly transformed probability distribution can be expressed as:
(49) 
Similarly, a GLP set and the third-order Gaussian cubature rule are used to determine . The analytical and mesoscale solutions are shown in Fig. 7(a) and (b), respectively, where the size of the mesh is . Fig. 7(c) gives a contour plot and Fig. 7(d) shows a KDE solution, where the bandwidth of the kernel-smoothing window is set to . It can be observed that a false single-mode PDF, rather than the exact double-mode one, is computed via the normal kernel smoothing function, while the mesoscale method successfully identifies the existence of two extreme regions.
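The two-root change-of-variables construction used in this example can be sketched on a scalar stand-in, since the exact transformation of Eq. 46 is not reproduced here: for the hypothetical map y = x² of a standard Gaussian input, each real root x = ±sqrt(y) contributes p_X(x)/|dy/dx| to the output density:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical scalar analogue of Eq. 47-49: y = x^2 has two real roots,
# x = +sqrt(y) and x = -sqrt(y), with identical Jacobian |dy/dx| = 2*sqrt(y).
def p_y(y, px=norm.pdf):
    r = np.sqrt(y)
    jac = 2.0 * r
    return (px(r) + px(-r)) / jac

# The transformed density must still integrate to one
total, _ = quad(p_y, 0.0, 50.0)
```

Summing the root contributions is exactly what produces the double-mode structure that a plain KDE of propagated samples can smear into a single mode.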
5.3 Example III: Duffing Oscillator
Consider a Duffing oscillator subjected to a Gaussian white noise excitation, governed by the equation caughey1971nonlinear; bergman1992robust:
(50)
where the nominal parameters are , , , and . The initial condition is a Gaussian random vector with mean and covariance:
(51) 
The analytical solution of the stationary PDF is given by:
(52) 
This example has been studied in the past using the finite element method, which showed a distinct advantage over Monte Carlo simulation in problems of small dimension such as this one. Here this problem is revisited using the mesoscale-based methodology. The initial PDF is constructed by a GMM with 350 component PDFs and then updated at each time step (0.015 seconds). Fig. 8 shows the sampling trajectories of the PDF from a meso/microscale perspective. The computational algorithm provided in Appendix B is used to calculate the influence of . Therefore, the total number of simulations is , where 4 is the number of cubature points for each evolutionary kernel (see Fig. 5 for details). This total is lower than the usual number of nodes in a finite element method.
As time progresses, approaches a stationary PDF, whose mesoscale solution is depicted in Fig. 9(c). Meanwhile, the mesoscale approximation of the input uncertainties is depicted in Fig. 9(a). Differences between the analytical expression and the mesoscale solutions are provided for comparison. The results suggest a good match between the two solutions. Furthermore, it can be observed that the error distribution reflects the locations of the adopted mesoscale components. The maximum error lies in a relatively small range, demonstrating the effectiveness of the proposed PDF evolution modeling scheme.
5.4 Example IV: Nonlinear Structure with Uncertain Parameters Subjected to Random Excitation
In the last example, let us consider a 10-story 2-bay uncertain shear frame subjected to random ground motion li2009stochastic. The lumped masses, , are listed in Table 1:
Story  1    2    3    4    5    6    7    8    9    10
Mass   0.5  1.1  1.1  1.0  1.0  1.1  1.3  1.2  1.2  1.2
Meanwhile, other structural characteristics are given as: ; columns with square cross-section of dimension ; beams with infinite stiffness. The damping matrix is , where and are, respectively, the mass matrix and the initial stiffness matrix. Assume and . The Bouc-Wen model is used to represent the restoring force:
(53) 
where is the initial stiffness, is the inter-story drift, and is the hysteretic component satisfying:
(54) 
in which the parameters take the value , , , , and . The initial Young’s modulus is an uncertain parameter. The random ground motion is represented by a randomly scaled El Centro record with peak ground acceleration (PGA) as the random variable. The total probabilistic information is listed in Table 2.
Parameter  Distribution  Mean  C.O.V. 
E  Normal  Pa  
PGA  Normal 
As previously discussed in Section 4.1, this system can be recast into a conservative model by placing the total random variables in an augmented random initial condition. For comparison, the mesoscale method and QMC are used to compute the probability evolution. The secondorder statistics of the top floor displacement calculated by both methods are shown in Fig. 10.(b) and (c). Note that both methods give similar secondorder statistics. In addition, the mesoscale method can also provide the evolution of probability as shown in Fig. 10.(d), which is not given by QMC.
6 Concluding remarks
In this paper, probability evolution has been investigated from a mesoscale perspective. The proposed scheme is able not only to maintain the information concerning the PDF with an equivalent statistical structure but also to track the evolution of this statistical structure. This is accomplished by utilizing a Gaussian mixture model (GMM) to represent the input PDF, and by solving a series of mixtures of Gaussian integrals for the probability evolution. We have demonstrated that such mesoscale computation frees macroscale methods from the limitations of second-order statistics and enhances the expressivity of microscale methods by maintaining the PDFs around samples. Furthermore, the connection between the proposed mesoscale scheme and the standard conservative and Markov models has been established in the context of stochastic modeling. The third-degree spherical-radial cubature rule is introduced to further reduce the number of parameters involved in the optimization of the GMM. The efficacy of the mesoscale method is verified by several examples. In summary, the results demonstrate the merit of the proposed method and the new mesoscale perspective it offers on probability evolution.
As further extensions of this work, future studies may address the following issues: (1) generalizing the method to the probabilistic regime, i.e., incorporating the Bayesian framework into the current scheme; (2) scaling the uncertainty quantification process to ultra-high-dimensional stochastic processes via the adoption of deep latent-space models, variational autoencoders (VAEs), generative adversarial networks (GANs), etc.; and (3) applying the method to a broader range of engineering problems.
Acknowledgement
This research has been supported in part by the National Science Foundation under Grant Agreements No. 1520817 and No. 1612843. A. Kareem gratefully acknowledges the financial support of the Robert M. Moran Professorship.
References
A Expectation-maximization algorithm for covariance computation
The expectation-maximization (EM) algorithm is closely related to maximum likelihood estimation (MLE), an effective approach that underlies many machine learning algorithms. The EM algorithm performs MLE for learning parameters in probabilistic models with latent variables. With determined and (see Section 3.2), using the EM algorithm to optimize can be summarized in six steps:
1. Initialize the set as .
   - If is an LDS, then randomly choose an initial set and an auxiliary point set , where .
   - If is a cluster set, then set the initial value of as:
(55) where is the auxiliary point set assigned to the cluster, with .
2. Estimate the log likelihood:
(56)
3. Evaluate the responsibilities (expectation step):
(57)
4. Re-estimate the following parameters (maximization step):
(58) where . Then re-estimate the log likelihood with the given by Eq. 58.
5. Check for convergence of either the parameters or the log likelihood.
6. If not converged, repeat steps ; if converged, set .
B Computational procedures for Markov models
The overall updating scheme for computing the first two statistical moments can be summarized in two steps.

Utilize the samples of the additive noise to estimate the updated values of the mean and covariance of each evolutionary kernel:
(64) and