1 Introduction
Complex nonlinear dynamic models with an intractable likelihood or moments are increasingly common in economics. A popular approach to estimating these models is to match informative sample moments with simulated moments from a fully parameterized model using SMM. However, economic models are rarely fully parametric since theory usually provides little guidance on the distribution of the shocks. The Gaussian distribution is often used in applications but, in practice, different choices of distribution may have different economic implications; this is illustrated below. Yet few results on semiparametric simulation-based estimation are available to address this issue.
This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for both the structural parameters and the distribution of the shocks and explains how to implement it. The dynamic models considered in this paper have the form:
(1)  
(2) 
The observed outcome variable is , are exogenous regressors and
is a vector of unobserved latent variables. The unknown parameters include
, a finite dimensional vector, and the distribution of the shocks . The functions are known, or can be computed numerically, up to and . The Sieve-SMM estimator extends the existing Sieve-GMM literature to more general dynamics with latent variables and the literature on sieve simulation-based estimation of some static models.
The estimator in this paper has two main building blocks. The first is a sample moment function, such as the empirical characteristic function (CF) or the empirical CDF; infinite dimensional moments are needed to identify the infinite dimensional parameters. As in the finite dimensional case, the estimator simply matches the sample moment function with the simulated moment function. To handle this continuum of moment conditions, this paper adopts the objective function of Carrasco & Florens (2000); Carrasco et al. (2007a) in a semi-nonparametric setting.
The second building block is to nonparametrically approximate the distribution of the shocks using the method of sieves, as numerical optimization over an infinite dimensional space is generally not feasible. Typical sieve bases include polynomials and splines which approximate smooth regression functions. Mixtures are particularly attractive to approximate densities for three reasons: they are computationally cheap to simulate from, they are known to have good approximation properties for smooth densities, and draws from the mixture sieve are shown in this paper to satisfy the smoothness regularity conditions required for the asymptotic results. Restrictions on the number of mixture components, the tails and the smoothness of the true density ensure that the bias is small relative to the variance so that valid inferences can be made in large samples. To handle potentially fat tails, this paper also introduces a Gaussian and tails mixture. The tail densities in the mixture are constructed to be easy to simulate from and also satisfy smoothness properties. The algorithm below summarizes the steps required to compute the estimator.
To illustrate the class of models considered and the usefulness of the mixture sieve for economic analysis, consider the first empirical application in Section 5 where the growth rate of consumption is assumed to follow the process:
(3)  
(4) 
Compared to the general model (1)-(2), the corresponds to the outcome , the latent variable is and the parameters are . This very simple model, with a flexible distribution for the shocks , can explain the low level of the risk-free rate with a simple power utility and recent monthly data. In comparison, the Long-Run Risks models rely on more complex dynamics and recursive utilities (Bansal & Yaron, 2004) and the Rare Disasters literature involves very large, low-frequency shocks that are hard to quantify (Rietz, 1988; Barro, 2006b). Empirically, the Sieve-SMM estimates of the distribution of in the model (3)-(4) imply both a higher welfare cost of business cycle fluctuations and an annualized risk-free rate that is up to 4 percentage points lower than predicted by Gaussian shocks. Also, in this example the risk-free rate is tractable, up to a quadrature over , when using Gaussian mixtures:
In comparison, for a general distribution the risk-free rate depends on all moments but does not necessarily have a closed form. The mixture thus combines flexible econometric estimation with convenient economic modelling. [Footnote 1: Gaussian mixtures are also convenient in more complicated settings where the model needs to be solved numerically. For instance, all the moments of a Gaussian mixture are tractable and quadrature is easy so that it can be applied to both the perturbation method and the projection method (see e.g. Judd, 1996, for a review of these methods) instead of the more commonly applied Gaussian distribution.]
As in the usual sieve literature, this paper provides a consistency result and derives the rate of convergence of the structural and infinite dimensional parameters, as well as asymptotic normality results for finite dimensional functionals of these parameters. While the main results only provide low-level conditions for a specific choice of moments and sieve basis, Appendix F provides high-level conditions which can be used for a larger class of bounded moments and sieve bases. These results also allow nonparametric estimation of quantities other than the distribution of the shocks. While the results apply to static and dynamic models alike, two important differences arise in dynamic models compared to the existing literature on sieve estimation: proving uniform convergence of the objective function and controlling the dynamic accumulation of the nonparametric approximation bias.
The first challenge is to establish the rate of convergence of the objective function for dynamic models. To allow for the general dynamics (1)-(2) with latent variables, this paper adapts results from Andrews & Pollard (1994) and Ben Hariz (2005) to construct an inequality for uniformly bounded empirical processes which may be of independent interest. It allows the simulated data to be nonstationary when the initial is not taken from the ergodic distribution. It holds under the geometric ergodicity condition found in Duffie & Singleton (1993). The boundedness condition is satisfied by the CF and the CDF for instance. Also, the inequality implies a larger variance than typically found in the literature for iid or strictly stationary data with limited dependence induced by the moments. [Footnote 2: See Chen (2007, 2011) for a review of sieve M-estimation with iid and dependent data.]
The second challenge is that in the model (1)-(2) the nonparametric bias accumulates dynamically. At each time period the bias appears because draws are taken from a mixture approximation instead of the true ; this bias is also transmitted from one period to the next since depends on . To ensure that this bias does not accumulate too much, a decay condition is imposed on the DGP. For the consumption process (3)-(4), this condition holds if both and are strictly less than . The resulting bias is generally larger than in static models and usual sieve estimation problems. Together, the increased variance and bias imply a slower rate of convergence for the Sieve-SMM estimates. Hence, in order to achieve the rate of convergence required for asymptotic normality, the Sieve-SMM requires additional smoothness of the true density . Note that the problem of bias accumulation seems quite generic to sieve estimation of dynamic models: if the computation of the moments or likelihood involves a filtering step then the bias accumulates inside the prediction error of the filtered values. [Footnote 3: This is related to the accumulation of errors studied in the approximation of DSGE models (see e.g. Peralta-Alva & Santos, 2014). Note that in the present estimation context, the error in the moments involves the difference between dimensional integral over the true and the approximated distribution of the shocks which complicates the analysis. This is also related to the propagation of prediction error in the filtering of unobserved latent variables using e.g. the Kalman or Particle filter.]
Monte-Carlo simulations illustrate the properties of the estimator and the effect of dynamics on the bias and the variance of the estimator. Two empirical applications highlight the importance of estimating the distribution of the shocks. The first is the example discussed above, and the second estimates a different stochastic volatility model on a long daily series of exchange rate data. The Sieve-SMM estimator suggests notable asymmetry and fat tails in the shocks, even after controlling for the time-varying volatility. As a result, commonly used parametric estimates for the persistence are significantly downward biased, which has implications for forecasting; this effect is confirmed by the Monte-Carlo simulations.
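The role of a decay condition can be seen in a deliberately simplified case. The following is only an illustrative sketch for a scalar AR(1) with hypothetical notation ($\rho$, $e_t$ and $\tilde e_t$ are assumptions introduced here, not the general condition used for the results): $e_t$ stands for a shock drawn from the true distribution and $\tilde e_t$ for its counterpart drawn from the approximating distribution.

```latex
\begin{align*}
y_t &= \rho\, y_{t-1} + e_t, \qquad
\tilde y_t = \rho\, \tilde y_{t-1} + \tilde e_t, \qquad y_0 = \tilde y_0, \\
y_t - \tilde y_t &= \sum_{j=0}^{t-1} \rho^{j}\,\bigl(e_{t-j} - \tilde e_{t-j}\bigr), \\
|y_t - \tilde y_t| &\le \sup_{s \le t} |e_s - \tilde e_s| \sum_{j=0}^{t-1} |\rho|^{j}
\le \frac{\sup_{s \le t} |e_s - \tilde e_s|}{1 - |\rho|}
\quad \text{if } |\rho| < 1 .
\end{align*}
```

In this toy case the per-period approximation error is inflated by at most the factor $1/(1-|\rho|)$, which stays finite only when the persistence is strictly below one; without decay the accumulated error can grow with $t$.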
Related Literature
The Sieve-SMM estimator presented in this paper combines two literatures: sieve estimation and the Simulated Method of Moments (SMM). This section provides a non-exhaustive review of the existing methods and results to introduce the new challenges in the combined setting.
A key aspect of simulation-based estimation is the choice of moments . The Simulated Method of Moments (SMM) estimator of McFadden (1989) relies on unconditional moments, the Indirect Inference (IND) estimator of Gouriéroux et al. (1993) uses auxiliary parameters from a simpler, tractable model and the Efficient Method of Moments (EMM) of Gallant & Tauchen (1996) uses the score of the auxiliary model. Simulation-based estimation has been applied to a wide array of economic settings: early empirical applications of these methods include the estimation of discrete choice models (Pakes, 1986; Rust, 1987), DSGE models (Smith, 1993) and models with occasionally binding constraints (Deaton & Laroque, 1992). More recent empirical applications include the estimation of earning dynamics (Altonji et al., 2013), of labor supply (Blundell et al., 2016) and the distribution of firm sizes (Gourio & Roys, 2014). Simulation-based estimation can also be applied to models that are not fully specified as in Berry et al. (1995); these models are not considered in this paper.
To achieve parametric efficiency, a number of papers consider using nonparametric moments but assume the distribution is known. [Footnote 4: See e.g. Gallant & Tauchen (1996); Fermanian & Salanié (2004); Kristensen & Shin (2012); Gach & Pötscher (2010); Nickl & Pötscher (2011).] To avoid dealing with the nonparametric rate of convergence of the moments, Carrasco et al. (2007a) use the continuum of moments implied by the CF. This paper uses a similar approach in a semi-nonparametric setting. In statistics, Bernton et al. (2017) use the Wasserstein, or Kantorovich, distance between the empirical and simulated distributions. This distance relies on unbounded moments and is thus excluded from the analysis in this paper.
General asymptotic results are given by Pakes & Pollard (1989) for SMM with iid data and Lee & Ingram (1991); Duffie & Singleton (1993) for time-series. Gouriéroux & Monfort (1996) provide an overview of simulation-based estimation methods.
While most of the literature discussed so far deals with fully parametric SMM models, there are a few papers concerned with sieve simulation-based estimation. Bierens & Song (2012) provide a consistency result for Sieve-SMM estimation of a static first-price auction model. [Footnote 5: In order to do inference on , they propose to invert a simulated version of Bierens (1990)’s ICM test statistic.] A recent working paper by Bierens & Song (2017) introduces covariates in the same auction model and gives an asymptotic normality result for the coefficients on the covariates. Newey (2001) uses a sieve simulated IV estimator for a measurement error model and proves consistency as both and go to infinity. These papers consider specific static models and provide limited asymptotic results. Furthermore, they consider sampling methods for the simulations that are very computationally costly (see Section 2.3 for a discussion). [Footnote 6: Additionally, an incomplete working paper by Blasques (2011) uses the high-level conditions in Chen (2007) for a “Semi-NonParametric Indirect Inference” estimator. These conditions are very difficult to verify in practice and additional results are needed to handle the dynamics. Also, to avoid using sieves and SMM in moment conditions models that are tractable up to a latent variable, Schennach (2014) proposes an Entropic Latent Variable Integration via Simulation (ELVIS) method to build estimating equations that only involve the observed variables. Dridi & Renault (2000) propose a Semi-Parametric Indirect Inference based on a partial encompassing principle.]
An alternative to using sieves in SMM estimation involves using more general parametric families to model the first 3 or 4 moments flexibly. Ruge-Murcia (2012, 2017) considers the skew Normal and the Generalized Extreme Value distributions to model the first 3 moments of productivity and inflation shocks. Gospodinov & Ng (2015); Gospodinov et al. (2017) use the Generalized Lambda family to flexibly model the first 4 moments of the shocks in a non-invertible moving average and a measurement error model. However, in applications where the moments depend on the full distribution of the shocks, which is the case if the data is nonseparable in the shocks , then the estimates will be sensitive to the choice of parametric family. Also, quantities of interest such as welfare estimates and asset prices that depend on the full distribution will also be sensitive to the choice of parametric family.
Another related literature is the sieve estimation of models defined by moment conditions. These models can be estimated using either Sieve-GMM, Sieve Empirical Likelihood or Sieve Minimum Distance (see Chen, 2007, for a review). Applications include nonparametric estimation of IV regressions [Footnote 7: See e.g. Hall & Horowitz (2005); Carrasco et al. (2007b); Blundell et al. (2007); Darolles et al. (2011); Horowitz (2011).], quantile IV regressions [Footnote 8: See e.g. Chernozhukov & Hansen (2005); Chernozhukov et al. (2007); Horowitz & Lee (2007).] and the semi-nonparametric estimation of asset pricing models [Footnote 9: See e.g. Hansen & Richard (1987); Chen & Ludvigson (2009); Chen et al. (2013); Christensen (2017).], for instance. Existing results cover the consistency and the rate of convergence of the estimator as well as asymptotic normality of functionals of the parameters for both iid and dependent data. See e.g. Chen & Pouzo (2012, 2015) and Chen & Liao (2015) for recent results with iid data and dependent data.
In the empirical Sieve-GMM literature, an application closely related to the dynamics encountered in this paper appears in Chen et al. (2013). The authors show how to estimate an Euler equation with recursive preferences when the value function is approximated using sieves. Recursive preferences require a filtering step to recover the latent variable. As in the Sieve-SMM setting, this has implications for bias accumulation in parameter dependent time-series properties. Existing results, based on coupling methods (see e.g. Doukhan et al., 1995; Chen & Shen, 1998), do not apply to this class of moments and the authors rely on Bootstrap inference without formal justification.
Notation
The following notation and assumptions will be used throughout the paper: the parameter of interest is . The finite dimensional parameter space is compact and the infinite dimensional set of densities is possibly non-compact. The sets of mixtures satisfy , is the data dependent dimension of the sieve set . The dimension increases with the sample size: as . Using the notation of Chen (2007), is the mixture approximation of the density . The vector of shocks has dimension and density . The total variation distance between two densities is and the supremum (or sup) norm is . For simplification, the following convention will be used and , where and correspond to the Euclidean norm of and respectively. is a norm on the mixture components: where is the Euclidean norm and are the mixture parameters. For a functional , its pathwise, or Gâteaux, derivative at in the direction is ; it will be assumed to be continuous in and linear in . For two sequences and , the relation implies that there exists such that for all .
Structure of the Paper
The paper is organized as follows: Section 2 introduces the Sieve-SMM estimator, explains how to implement it in practice and provides important properties of the mixture sieve. Section 3 gives the main asymptotic results: under regularity conditions, the estimator is consistent. Its rate of convergence is derived, and under further conditions, finite dimensional functionals of the estimates are asymptotically normal. Section B provides two extensions, one to include auxiliary variables in the CF and another to allow for dynamic panels with small . Section 4 provides Monte-Carlo simulations to illustrate the theoretical results. Section 5 gives empirical examples for the estimator. Section 6 concludes. Appendix A gives some information about the CF and details on how to compute the estimator in practice as well as identification and additional asymptotic normality results for the stochastic volatility model. Appendix B provides extensions of the main results to moments of auxiliary variables and short panel data. Appendix C provides additional Monte-Carlo simulations for short panels. Appendix D provides additional empirical results to the ones presented in the main text. Appendix E provides the proofs of the main results and the extensions. The online supplement [Footnote 10: The online supplement can be found at http://jjforneron.com/SieveSMM/Supplement.pdf.] includes Appendix F, which provides results for more general moment functions and sieve bases, and Appendix G, which provides the proofs for these results.
2 The Sieve-SMM Estimator
This section introduces the notation used in the remainder of the paper. It describes the class of DGPs considered in the paper and describes the DGP of the leading example in more detail. It discusses the choice of mixture sieve, moments and objective function as well as some important properties of the mixture sieve. The simple running example used throughout the analysis is based on the empirical applications of Section 5.
Example 1 (Stochastic Volatility Models).
In both empirical applications, follows an AR(1) process with lognormal stochastic volatility
The first empirical application estimates a linear volatility process:
The second empirical application estimates a lognormal stochastic volatility process:
In both applications with the restrictions and . The first application approximates with a mixture of Gaussian distributions; the second adds two tail components to model potential fat tails. Using the notation given in (1)-(2), the latent variable is given by , where and (or ).
Stochastic volatility (SV) models in Example 1 are intractable because of the latent volatility. With lognormal volatility, the model becomes tractable after taking the transformation (see e.g. Kim et al., 1998) and the problem can be cast as a deconvolution problem (Comte, 2004). However, the transformation removes all the information about asymmetries in , which turn out to be empirically significant (see Section 5). In the parametric case, alternatives to using the transformation involve Bayesian simulation-based estimators such as the Particle Filter and Gibbs sampling or EMM for frequentist estimation.
2.1 Sieve Basis - Gaussian and Tails Mixture
The following definition introduces the Gaussian and tails mixture sieve that will be used in the paper. It combines a simple Gaussian mixture with two tail densities which model asymmetric fat tails parametrically. Drawing from this mixture is computationally simple: draw uniform and Gaussian random variables, then switch between the Gaussians and the tails depending on the uniform and the mixture weights . The tail draws are simple functions of uniform random variables.
Definition 1 (Gaussian and Tails Mixture).
A random variable follows a component Gaussian and Tails mixture if its density has the form:
where is the standard Gaussian density and its left and right tail components are
with for and for . To simulate from the Gaussian and tails mixture, draw , and compute and . Then, for :
follows the Gaussian and tails mixture .
For applications where fat tails are deemed unlikely, as in the first empirical application, the weights can be set to zero to get a Gaussian only mixture. If and then the left and right tails satisfy:
When then draws from the tail components have finite expectation; they also have finite variance if . More generally, for the th moment to be finite, , is necessary. Gallant & Nychka (1987) also add a parametric component to model fat tails by mixing a Hermite polynomial density with a Student density. Neither the Hermite polynomial nor the Student distribution has closed-form quantiles, which makes them impractical for simulation. Here, the densities are constructed to be easy to simulate from. The tail indices will be estimated along with the remaining parameters of the mixture distribution.
The indicator function introduces discontinuities in the parameter . Standard derivative-free optimization routines such as the Nelder-Mead algorithm (Nelder & Mead, 1965) as implemented in the NLopt library of Johnson (2014) can handle this estimation problem as illustrated in Section 4. [Footnote 11: The NLopt library is available for C++, Fortran, Julia, Matlab, Python and R among others.]
In the finite mixture literature, mixture components are known to be difficult to identify because of possible label switching and the likelihood is globally unbounded. [Footnote 12: See e.g. McLachlan & Peel (2000) for a review of estimation, identification and applications of finite mixtures. See also Chen et al. (2014b) for some recent results.] Using the characteristic function rather than the likelihood resolves the unbounded likelihood problem as discussed in Yu (1998). More importantly, the object of interest in this paper is the mixture density itself rather than the mixture components. As a result, permutations of the mixture components are not a concern since they do not affect the density .
2.2 Continuum of Moments and Objective Function
As in the parametric case, the moments need to be informative enough to identify the parameters. In Sieve-SMM estimation, the parameter is infinite dimensional so that no finite dimensional vector of moments could possibly identify . As a result, this paper relies on moment functions which are themselves infinite dimensional.
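The simulation step described for the sieve can be sketched for the Gaussian-only case (tail weights set to zero). This is a minimal illustration rather than the paper's implementation: the function name `draw_gaussian_mixture` and the parameter values are hypothetical, and the tail components are omitted because their formulas are not reproduced here. The uniforms select the mixture component and the standard normals are shifted and scaled; both are drawn once so that the draws are a deterministic function of the mixture parameters.

```python
import numpy as np

def draw_gaussian_mixture(weights, means, sds, uniforms, normals):
    """Map fixed uniform and standard normal draws into Gaussian-mixture draws.

    The uniforms pick the component through the cumulative weights; the
    normals are shifted and scaled by that component's mean and std. dev.
    """
    cum_weights = np.cumsum(np.asarray(weights, dtype=float))
    # Component index j such that cum_weights[j-1] < u <= cum_weights[j].
    component = np.searchsorted(cum_weights, uniforms, side="left")
    # Guard against round-off in the last cumulative weight.
    component = np.minimum(component, len(cum_weights) - 1)
    return np.asarray(means)[component] + np.asarray(sds)[component] * normals

rng = np.random.default_rng(0)
n = 200_000
uniforms = rng.uniform(size=n)         # drawn once, reused for every parameter value
normals = rng.standard_normal(size=n)  # drawn once, reused for every parameter value

# Hypothetical two-component mixture: mean 0 and variance 0.25 + 1 = 1.25.
draws = draw_gaussian_mixture([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5], uniforms, normals)
```

Because the component choice is an indicator in the uniforms, the draws are discontinuous in the weights, which is consistent with the use of a derivative-free optimizer mentioned in the text.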
The leading choice of moment function in this paper is the empirical characteristic function for the joint vector of lagged observations :
where is the imaginary number such that . [Footnote 13: The moments can also be expressed in terms of sines and cosines since .]
The CF is one-to-one with the joint distribution of
, so that the model is identified by if and only if the distribution of identifies the true . Using lagged variables makes it possible to identify the dynamics in the data; Knight & Yu (2002) show how the characteristic function can identify parametric dynamic models. Some useful properties of the CF are given in Appendix A.1.
Besides the CF, another choice of bounded moment function is the CDF. While the CF is a smooth transformation of the data, the empirical CDF has discontinuities at each point of support of the data which could make numerical optimization more challenging. Also, the CF around summarizes the information about the tails of the distribution (see Ushakov, 1999, page 30). This information is thus easier to extract from the CF than the CDF. The main results of this paper can be extended to any bounded moment function satisfying a Lipschitz condition. [Footnote 14: Appendix F allows for more general non-Lipschitz moment functions and other sieve bases. However, the conditions required for these results are more difficult to check.]
Since the moments are infinite dimensional, this paper adopts the approach of Carrasco & Florens (2000); Carrasco et al. (2007a) to handle the continuum of moment conditions [Footnote 15: Carrasco & Florens (2000) provide a general theory for GMM estimation with a continuum of moment conditions. They show how to efficiently weight the continuum of moments and propose a Tikhonov (ridge) regularization approach to invert the singular variance-covariance operator. Earlier results, without optimal weighting, include Koul (1986) for minimum distance estimation with a continuum of moments.]:
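As an illustration of the moment function, the following minimal sketch computes an empirical CF for an observation stacked with its lags on a finite set of evaluation points. The function name `empirical_cf`, the argument `n_lags` and the evaluation points are hypothetical choices made for this example.

```python
import numpy as np

def empirical_cf(y, taus, n_lags=1):
    """Empirical CF of (y_t, ..., y_{t-n_lags}) evaluated at each row of taus."""
    y = np.asarray(y, dtype=float)
    # Row t of `stacked` is (y_t, y_{t-1}, ..., y_{t-n_lags}).
    stacked = np.column_stack(
        [y[n_lags - k : len(y) - k] for k in range(n_lags + 1)]
    )
    # Average of exp(i * tau' y_t) over t, for every tau in the grid.
    return np.exp(1j * stacked @ np.asarray(taus, dtype=float).T).mean(axis=0)

rng = np.random.default_rng(1)
y = rng.standard_normal(500)
taus = np.array([[0.0, 0.0], [0.5, -0.5], [1.0, 2.0]])
psi_hat = empirical_cf(y, taus)
```

The values are complex numbers with modulus at most one, and the value at the origin is exactly one, which reflects the boundedness of the CF used in the text.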
(5) 
The objective function is a weighted average of the squared norm of the difference between the empirical and the simulated moment functions. As discussed in Carrasco & Florens (2000) and Carrasco et al. (2007a), using the continuum of moments avoids the problem of constructing an increasing vector of moments. The weighting density is chosen to be the multivariate normal density for the main results. Other choices for are possible as long as it has full support and is such that . As an example, the exponential distribution satisfies these two conditions, while the Cauchy distribution does not satisfy the second. In practice, choosing to be the Gaussian density with same mean and variance as gave satisfactory results in Sections 4 and 5. [Footnote 16: Monte-Carlo experiments not reported in this paper showed similar results when using the exponential density for instead of the Gaussian density.] In the appendix, the results allow for a bounded linear operator which plays the role of the weight matrix in SMM and GMM as in Carrasco & Florens (2000). Carrasco & Florens (2000); Carrasco et al. (2007a) provide theoretical results for choosing and approximating the optimal operator in the parametric setting. Similar work is left to future research in this semi-nonparametric setting.
Given the sieve basis, the moments and the objective function, the estimator is defined as an approximate minimizer of :
(6) 
where and corresponds to numerical optimization and integration errors. Indeed, the integral in (5) needs to be evaluated numerically. Quadrature and sparse quadrature were found to give satisfactory results when is not too large (less than ). For larger dimensions, quasi-Monte-Carlo integration using either the Halton or Sobol sequence gave satisfactory results. [Footnote 17: See e.g. Heiss & Winschel (2008); Holtz (2011) for an introduction to sparse quadrature in economics and finance, and Owen (2003) for quasi-Monte-Carlo sampling.] All Monte-Carlo simulations and empirical results in this paper are based on quasi-Monte-Carlo integration. Additional computational details are given in Appendix A.2.
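The numerical integration step can be sketched as follows, assuming a Gaussian weighting density and a Halton sequence pushed through the Gaussian quantile function. All function names are hypothetical, the identity weighting is used instead of an optimal operator, and the hand-written Halton generator is a stand-in for a library implementation. Since the evaluation points are quasi-draws from the weighting density, the weighted integral is approximated by a plain average.

```python
import numpy as np
from statistics import NormalDist

def van_der_corput(n_points, base):
    """First n_points of the van der Corput sequence in the given base."""
    seq = np.empty(n_points)
    for i in range(n_points):
        k, frac, value = i + 1, 1.0 / base, 0.0
        while k > 0:
            k, digit = divmod(k, base)
            value += digit * frac
            frac /= base
        seq[i] = value
    return seq

def gaussian_halton_grid(n_points, dim, scale=1.0):
    """Halton points mapped to Gaussian quasi-draws (sketch supports dim <= 6)."""
    primes = [2, 3, 5, 7, 11, 13][:dim]
    inv_cdf = NormalDist().inv_cdf
    cols = [[scale * inv_cdf(float(u)) for u in van_der_corput(n_points, p)]
            for p in primes]
    return np.array(cols).T

def cf(data, taus):
    """Empirical CF of the rows of data at each row of taus."""
    return np.exp(1j * data @ taus.T).mean(axis=0)

def objective(data, simulated, taus):
    """Average of |CF_data - CF_sim|^2 over quasi-draws from the weighting density."""
    diff = cf(data, taus) - cf(simulated, taus)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
data = rng.standard_normal((300, 2))
simulated = data + 1.0                  # a deliberately different sample
taus = gaussian_halton_grid(64, dim=2)
q_same = objective(data, data, taus)    # identical samples give zero
q_diff = objective(data, simulated, taus)
```

The objective is bounded between 0 and 4 because each CF has modulus at most one.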
Example 1 (Continued) (Stochastic Volatility).
The following illustrates the steps of the Sieve-SMM algorithm for the stochastic volatility model with a Gaussian only mixture:

fix , and ,

construct a grid , e.g. Box-Muller transformed Sobol sequence,

compute the sample Characteristic Function over the grid

draw , and, where

minimize the objective , computed as follows:

compute

simulate using and

compute as above and
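The steps above can be sketched end to end as follows. This is a hedged illustration, not the paper's code: the log-volatility specification in `simulate_sv`, the softmax and log parameterization of the mixture, all function names and all parameter values are assumptions made for the example; plain Gaussian draws stand in for the Sobol grid, and the mean-zero and unit-variance restrictions on the mixture are not imposed.

```python
import numpy as np

def mixture_draws(weights, means, sds, uniforms, normals):
    """Gaussian-only mixture draws from fixed uniforms and normals."""
    component = np.searchsorted(np.cumsum(weights), uniforms)
    component = np.minimum(component, len(weights) - 1)
    return np.asarray(means)[component] + np.asarray(sds)[component] * normals

def simulate_sv(theta, e1, e2):
    """Hypothetical AR(1) with log-volatility dynamics, started from zero."""
    rho_y, mu_s, rho_s, kappa = theta
    y = np.zeros(len(e1))
    log_sigma, y_prev = 0.0, 0.0
    for t in range(len(e1)):
        log_sigma = mu_s + rho_s * log_sigma + kappa * e2[t]
        y_prev = rho_y * y_prev + np.exp(log_sigma) * e1[t]
        y[t] = y_prev
    return y

def lagged_cf(y, taus):
    """Empirical CF of (y_t, y_{t-1}) at each row of taus."""
    pairs = np.column_stack([y[1:], y[:-1]])
    return np.exp(1j * pairs @ taus.T).mean(axis=0)

def sieve_smm_objective(params, data_cf, taus, shocks):
    """params = 4 structural values, then k raw weights, k means, k log std devs."""
    uniforms, normals, e2 = shocks
    theta = params[:4]
    raw_w, means, log_sds = np.split(np.asarray(params[4:], dtype=float), 3)
    weights = np.exp(raw_w) / np.exp(raw_w).sum()  # softmax keeps weights on the simplex
    e1 = mixture_draws(weights, means, np.exp(log_sds), uniforms, normals)
    y_sim = simulate_sv(theta, e1, e2)
    return float(np.mean(np.abs(data_cf - lagged_cf(y_sim, taus)) ** 2))

rng = np.random.default_rng(0)
n, n_sim = 500, 2000
taus = rng.standard_normal((50, 2))  # grid of evaluation points, fixed once
y_obs = simulate_sv((0.3, -0.1, 0.9, 0.2),
                    rng.standard_normal(n), rng.standard_normal(n))
data_cf = lagged_cf(y_obs, taus)     # sample CF, computed once
# Shocks drawn once and held fixed across objective evaluations.
shocks = (rng.uniform(size=n_sim), rng.standard_normal(n_sim),
          rng.standard_normal(n_sim))

params = np.array([0.3, -0.1, 0.9, 0.2, 0.0, 0.0, -0.5, 0.5, -0.2, -0.2])
q = sieve_smm_objective(params, data_cf, taus, shocks)
```

Holding the uniforms and normals fixed makes the objective a deterministic function of the parameters, so repeated evaluations at the same point agree exactly; the function can then be passed to a derivative-free optimizer such as Nelder-Mead.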

2.3 Approximation Rate and Smoothness of the Mixture Sieve
This subsection provides more details on the approximation and smoothness properties of the mixture sieve. It also provides the necessary restrictions on the true density to be estimated. Gaussian mixtures can approximate any smooth univariate density but the rate of this approximation depends on both the smoothness and the tails of the density (see e.g. Kruijer et al., 2010). The tail densities parametrically model asymmetric fat tails in the density. This is useful in the second empirical example where exchange rate data may exhibit larger tails. The following lemma extends the approximation results of Kruijer et al. (2010) to multivariate densities with independent components and potentially fat tails.
Lemma 1 (Approximation Properties of the Gaussian and Tails Mixture).
Suppose that the shocks are independent with density . Suppose that each marginal can be decomposed into a smooth density and the two tails of Definition 1:
Let each satisfy the assumptions of Kruijer et al. (2010):

Smoothness: is times continuously differentiable with bounded th derivative.

Tails: has exponential tails, i.e. there exists such that:

Monotonicity in the Tails: is strictly positive and there exists such that is weakly decreasing on and weakly increasing on .
and for all . Then there exists a Gaussian and tails mixture satisfying the restrictions of Kruijer et al. (2010):

Bandwidth: .

Location Parameter Bounds: with
such that as :
where or .
The space of true densities satisfying the assumptions will be denoted as and is the corresponding space of Gaussian and tails mixtures .
Note that additional restrictions on may be required for identification, such as mean zero, unit variance or symmetry. The assumption that the shocks are independent is not too strong for structural models where this, or a parametric factor structure, is typically assumed. Note that under this assumption, there is no curse of dimensionality because the components can be approximated separately. Also, the restriction is only required for the approximation in supremum norm.
An important difficulty which arises in simulating from a nonparametric density is that draws are a very nonlinear transformation of the nonparametric density . As a result, standard regularity conditions such as Hölder and smoothness are difficult to verify and may only hold under restrictive conditions. The following discusses these regularity conditions for the methods used in the previous literature. Then, a smoothness result for the mixture sieve is provided in Lemma 2 below.
Bierens & Song (2012) use Inversion Sampling: they compute the CDF from the nonparametric density and draw . Computing the CDF and its inverse to simulate is very computationally demanding. Also, while the CDF is linear in the density, its inverse is a highly nonlinear transformation of the density. Hence, Hölder and smoothness results for the draws are much more challenging to prove without further restrictions.
Newey (2001) uses Importance Sampling for which Hölder conditions are easily verified but requires for consistency alone. Furthermore, the choice of importance distribution is very important for the finite sample properties (the effective sample size) of the simulated moments. In practice, the importance distribution should give sufficient weight to regions for which the nonparametric density has more weight. Since the nonparametric density is unknown ex-ante, this is hard to achieve in practice.
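The sensitivity of importance sampling to the choice of importance distribution can be illustrated with the effective sample size of the importance weights, here computed with Kish's formula. This is a generic illustration and not the procedure of Newey (2001); the densities, shifts and function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a N(mean, sd^2) random variable."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    weights = np.asarray(weights, dtype=float)
    return weights.sum() ** 2 / np.sum(weights ** 2)

rng = np.random.default_rng(0)
n = 10_000
draws = rng.standard_normal(n)  # importance distribution: N(0, 1)

# Importance weights for a target close to, and far from, the importance density.
w_close = normal_pdf(draws, 0.2, 1.0) / normal_pdf(draws, 0.0, 1.0)
w_far = normal_pdf(draws, 3.0, 1.0) / normal_pdf(draws, 0.0, 1.0)

ess_close = effective_sample_size(w_close)
ess_far = effective_sample_size(w_far)
```

When the target puts most of its mass where the importance distribution has few draws, a handful of weights dominate and the effective sample size collapses, which is the finite sample concern raised in the text.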
Gallant & Tauchen (1993) use Accept/Reject (outside of an estimation setting): however, it is not practical for simulation-based estimation. Indeed, the required number of draws to generate an accepted draw depends on both the instrumental density and the target density . The latter varies with the parameters during the optimization. This also makes the smoothness properties challenging to establish. In comparison, the following lemma shows that the required smoothness condition is satisfied by draws from a mixture sieve.
Lemma 2 (Smoothness of Simulated Mixture Sieves).
Suppose that
with and , and . If then there exists a finite constant which only depends on such that:
Lemma 2 is key in proving the smoothness conditions of the moments required to establish the convergence rate of the objective function and stochastic equicontinuity results. Here, the smoothness constant depends on both the bound and the number of mixture components . [Footnote 18: See e.g. Andrews (1994); Chen et al. (2003) for examples of smooth functions.] Kruijer et al. (2010) showed that both the total variation and supremum norms are bounded above by the pseudo-norm on the mixture parameters up to a factor which depends on the bandwidth . As a result, the pseudo-norm controls the distance between densities and the simulated draws as well. Furthermore, draws from the tail components are shown in the Appendix to be smooth in . Hence, draws from the Gaussian and tails mixture are smooth in both and .
3 Asymptotic Properties of the Estimator
This section provides conditions under which the Sieve-SMM estimator in (6) is consistent, and derives its rate of convergence and asymptotic normality results for linear functionals of .
3.1 Consistency
Consistency results are given under low-level conditions on the DGP using the Gaussian and tails mixture sieve with the CF. [Footnote 19: Consistency results allowing for non-mixture sieves and other moments are given in Appendix F.1.] First, the population objective is:
(7) 
The objective depends on because are not covariance stationary: the moments can depend on . Under geometric ergodicity, it has a well-defined limit: [Footnote 20: Since the CF is bounded, the dominated convergence theorem can be used to prove the existence of the limit.]
In the definition of the objective and its limit , the expectation is taken over both the data and the simulated samples . The following assumption provides a set of sufficient conditions on the true density , the sieve space and a first set of conditions on the model (identification and time-series properties) to prove consistency.
Assumption 1 (Sieve, Identification, Dependence).
Suppose the following conditions hold:
i. (Sieve Space) the true density and the mixture sieve space satisfy the assumptions of Lemma 1 with as and is compact and .
ii. (Identification) where is the Gaussian density. For any and for all , is strictly positive and weakly decreasing in both and .
iii. (Dependence) is strictly stationary and mixing with exponential decay, the simulated are geometrically ergodic, uniformly in .
Condition i. is stronger than the usual condition in the sieve literature (see e.g. Chen, 2007). The additional term comes from the nonlinearity of the mixture sieve. The fourth power is due to the dependence: the inequality in Lemma G15 provides a bound of order instead of for i.i.d. data.
Condition ii. is the usual identification condition. It is assumed that the information from the joint distribution of uniquely identifies . Proving general global identification results is quite challenging in this setting and is left to future research. Local identification in the sense of Chen et al. (2014a) is also challenging to prove here because the dynamics imply that the distribution of is a convolution of with the distribution of . Since the stationary distributions of and are the same, the resulting distribution is the fixed point of its convolution with . This makes derivatives with respect to difficult to compute in many dynamic models. Note that the identification assumption does not exclude ill-posedness. [Footnote 21: See e.g. Carrasco et al. (2007b) and Horowitz (2014) for a review of ill-posedness in economics.] The space is assumed to include the necessary restrictions (if any) for identification such as mean zero and unit variance. Global identification results for the stochastic volatility model in Example 1 are given in Appendix A.4.
Condition iii. is common in SMM estimation with dependent data (see e.g. Duffie & Singleton, 1993). In this setting, it implies two important features: the simulated are mixing (Liebscher, 2005), and the initial condition bias is negligible: . [Footnote 22: See Proposition F5 in the supplemental material for the second result.]
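The negligible effect of the initial condition can be seen in a minimal sketch. It assumes a stable linear AR(1) process as a hypothetical stand-in for the simulated series, which is not the model studied in this paper; the persistence 0.9 and the starting values are illustrative.

```python
import random


def simulate_ar1(rho, y0, shocks):
    """Simulate y_t = rho * y_{t-1} + e_t starting from y0."""
    path = []
    y = y0
    for e in shocks:
        y = rho * y + e
        path.append(y)
    return path


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Same shocks, two very different initial conditions.
path_a = simulate_ar1(0.9, 0.0, shocks)
path_b = simulate_ar1(0.9, 10.0, shocks)
gaps = [abs(a - b) for a, b in zip(path_a, path_b)]
print(gaps[0], gaps[49], gaps[-1])
```

In this linear toy case the gap between the two paths after t periods is exactly 0.9^t times the initial gap, so the influence of the starting value vanishes geometrically.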
Assumption 2 (Data Generating Process).
Conditions y(ii), u(ii) correspond to the usual Hölder conditions in GMM and M-estimation but placed on the DGP itself rather than the moments. Since the cosine and sine functions are Lipschitz, this implies that the moments are Hölder continuous as well. [Footnote 23: For any choice of moments that preserve identification and are Lipschitz, the main results will hold assuming and are bounded. For both the Gaussian and the exponential density, these quantities turn out to be bounded.] In general, Lipschitz transformations preserve smoothness properties (see e.g. Andrews, 1994; van der Vaart & Wellner, 1996); here, additional conditions on are required to handle the continuum of moments with unbounded support.
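The Lipschitz property of the characteristic function terms, and the reason an unbounded index set calls for extra care, can be checked with a short sketch. The function names, the grid of frequencies and the two outcome values are hypothetical and chosen for illustration only.

```python
import cmath


def cf_gap(tau, y_a, y_b):
    """Distance between the CF terms exp(i*tau*y) at two outcome values."""
    return abs(cmath.exp(1j * tau * y_a) - cmath.exp(1j * tau * y_b))


def lipschitz_bound(tau, y_a, y_b):
    """Elementary bound: the gap is at most min(2, |tau| * |y_a - y_b|)."""
    return min(2.0, abs(tau) * abs(y_a - y_b))


taus = [0.1, 1.0, 10.0, 100.0]
y_a, y_b = 0.30, 0.31
gaps = [cf_gap(t, y_a, y_b) for t in taus]
bounds = [lipschitz_bound(t, y_a, y_b) for t in taus]
for t, g, b in zip(taus, gaps, bounds):
    print(t, g, b)
```

For a fixed frequency the gap is Lipschitz in the outcome, but the Lipschitz constant grows with the absolute value of the frequency. Over an unbounded range of frequencies the bound is therefore not uniform, which is consistent with the need for additional conditions when integrating over the continuum of moments.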
The decay conditions y(i), u(i) together with condition y(iii) ensure that the differences due to do not accumulate too much with the dynamics. As a result, keeping the shocks fixed, the Hölder condition applies to as a whole. It also implies that the nonparametric approximation bias does not accumulate too much. These conditions are similar to the Unit Circle condition of Duffie & Singleton (1993), which they propose as an alternative to geometric ergodicity for uniform LLNs and CLTs. The decay conditions play a crucial role here since they control the nonparametric bias of the estimator.
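The role of the decay conditions can be illustrated with a minimal sketch in which the shocks are held fixed and only a parameter is perturbed. It assumes a hypothetical AR(1) recursion with persistence below one, which is not the model of this paper; the values of `rho` and `delta` are illustrative.

```python
import random


def simulate_path(rho, shocks, y0=0.0):
    """Simulate y_t = rho * y_{t-1} + e_t and return the path including y0."""
    path = [y0]
    for e in shocks:
        path.append(rho * path[-1] + e)
    return path


rng = random.Random(1)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]

rho, delta = 0.5, 0.01
path_a = simulate_path(rho, shocks)
path_b = simulate_path(rho + delta, shocks)

# Largest gap between the two paths over the whole sample.
max_gap = max(abs(a - b) for a, b in zip(path_a, path_b))

# Contraction argument: the gap d_t satisfies |d_t| <= rho*|d_{t-1}| + delta*M,
# where M is the largest absolute value along the perturbed path.
bound = delta * max(abs(y) for y in path_b) / (1.0 - rho)
print(max_gap, bound)
```

Because the recursion contracts past differences at rate `rho`, the gap between the two paths stays of the order of `delta` uniformly over time instead of growing with the sample length. Without such decay, small parameter or approximation errors could compound along the simulated path.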
Condition u(iii) ensures that the DGP preserves the smoothness properties derived for mixture draws in Lemma 2. This condition does not appear in the usual sieve literature which does not simulate from a nonparametric density. In the SMM literature, a Lipschitz or Hölder condition is usually given on the moments directly. Note that a condition analogous to u(iii) would also be required for SMM estimation of a parametric distribution.
Assumption 2 does not impose that the DGP be smooth. This allows for kinks in or as in the sample selection model or the models of Deaton (1991) and Deaton & Laroque (1992). Assumption 2 in Appendix E.2 extends Assumption 2 to allow for possible discontinuities in . The following shows how to verify the conditions of Assumption 2 in Example 1 with volatility shocks. [Footnote 24: Some additional examples are given in Appendix F.4. They are not tied to the use of mixtures, and as a result, impose stronger restrictions on the density such as bounded support.]
Example 1 (Continued) (Stochastic Volatility).
implies y(i) holds. Also:
and thus condition y(ii) is satisfied assuming is bounded. Since has mean zero and unit variance, is bounded if , and for some . For condition y(iii), take and :