
# A Sieve-SMM Estimator for Dynamic Models

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for the parameters and the distribution of the shocks in nonlinear dynamic models where the likelihood and the moments are not tractable. An important concern with SMM, which matches sample with simulated moments, is that a parametric distribution is required but economic quantities that depend on this distribution, such as welfare and asset-prices, can be sensitive to misspecification. The Sieve-SMM estimator addresses this issue by flexibly approximating the distribution of the shocks with a Gaussian and tails mixture sieve. The asymptotic framework provides consistency, rate of convergence and asymptotic normality results, extending existing sieve estimation theory to a new framework with more general dynamics and latent variables. Monte-Carlo simulations illustrate the finite sample properties of the estimator. Two empirical applications highlight the importance of the distribution of the shocks for estimates and counterfactuals.


## 1 Introduction

Complex nonlinear dynamic models with an intractable likelihood or moments are increasingly common in economics. A popular approach to estimating these models is to match informative sample moments with simulated moments from a fully parameterized model using SMM. However, economic models are rarely fully parametric since theory usually provides little guidance on the distribution of the shocks. The Gaussian distribution is often used in applications but, in practice, different choices of distribution may have different economic implications; this is illustrated below. Yet few results on semiparametric simulation-based estimation are available to address this issue.

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for both the structural parameters and the distribution of the shocks and explains how to implement it. The dynamic models considered in this paper have the form:

$$y_t = g_{obs}(y_{t-1}, x_t, \theta, f, u_t) \tag{1}$$

$$u_t = g_{latent}(u_{t-1}, \theta, f, e_t), \quad e_t \sim f. \tag{2}$$

The observed outcome variable is $y_t$, $x_t$ are exogenous regressors and $u_t$ is a vector of unobserved latent variables. The unknown parameters include $\theta$, a finite dimensional vector, and the distribution of the shocks $f$. The functions $g_{obs}, g_{latent}$ are known, or can be computed numerically, up to $\theta$ and $f$. The Sieve-SMM estimator extends the existing Sieve-GMM literature to more general dynamics with latent variables and the literature on sieve simulation-based estimation of some static models.

The estimator in this paper has two main building blocks: the first one is a sample moment function, such as the empirical characteristic function (CF) or the empirical CDF; infinite dimensional moments are needed to identify the infinite dimensional parameters. As in the finite dimensional case, the estimator simply matches the sample moment function with the simulated moment function. To handle this continuum of moment conditions, this paper adopts the objective function of Carrasco & Florens (2000); Carrasco et al. (2007a) in a semi-nonparametric setting.

The second building block is to nonparametrically approximate the distribution of the shocks using the method of sieves, as numerical optimization over an infinite dimension space is generally not feasible. Typical sieve bases include polynomials and splines which approximate smooth regression functions. Mixtures are particularly attractive to approximate densities for three reasons: they are computationally cheap to simulate from, they are known to have good approximation properties for smooth densities, and draws from the mixture sieve are shown in this paper to satisfy the $L^2$-smoothness regularity conditions required for the asymptotic results. Restrictions on the number of mixture components, the tails and the smoothness of the true density ensure that the bias is small relative to the variance so that valid inferences can be made in large samples. To handle potentially fat tails, this paper also introduces a Gaussian and tails mixture. The tail densities in the mixture are constructed to be easy to simulate from and also satisfy $L^2$-smoothness properties. The algorithm below summarizes the steps required to compute the estimator.

To illustrate the class of models considered and the usefulness of the mixture sieve for economic analysis, consider the first empirical application in Section 5 where the growth rate of consumption is assumed to follow the process:

$$\Delta c_t = \mu_c + \rho_c \Delta c_{t-1} + \sigma_t e_{t,1}, \quad e_{t,1} \sim f \tag{3}$$

$$\sigma_t^2 = \mu_\sigma + \rho_\sigma \sigma_{t-1}^2 + \kappa_\sigma e_{t,2}, \quad e_{t,2} \sim \chi^2_1. \tag{4}$$

Compared to the general model (1)-(2), $\Delta c_t$ corresponds to the outcome $y_t$, the latent variable is $\sigma_t^2$ and the parameters are $\theta = (\mu_c, \rho_c, \mu_\sigma, \rho_\sigma, \kappa_\sigma)$. This very simple model, with a flexible distribution for the shocks $e_{t,1}$, can explain the low level of the risk-free rate with a simple power utility and recent monthly data. In comparison, Long-Run Risks models rely on more complex dynamics and recursive utilities (Bansal & Yaron, 2004) and the Rare Disasters literature involves very large, low frequency shocks that are hard to quantify (Rietz, 1988; Barro, 2006b). Empirically, the Sieve-SMM estimates of the distribution of $e_{t,1}$ in the model (3)-(4) imply both a higher welfare cost of business cycle fluctuations and an annualized risk-free rate that is up to 4 percentage points lower than predicted by Gaussian shocks. Also, in this example the risk-free rate is tractable, up to a quadrature over $\sigma_{t+1}$, when using Gaussian mixtures:

$$r^{mixt}_t = -\log(\delta) + \gamma\mu_c + \gamma\rho_c \Delta c_t - \log\left( \sum_{j=1}^{k} \omega_j E_t\left[ e^{-\gamma \sigma_{t+1}\mu_j + \frac{\gamma^2}{2}\sigma^2_{t+1}[\sigma_j^2 - 1]} \right] \right).$$

In comparison, for a general distribution the risk-free rate depends on all moments but does not necessarily have closed form. The mixture thus combines flexible econometric estimation with convenient economic modelling.[^1]

[^1]: Gaussian mixtures are also convenient in more complicated settings where the model needs to be solved numerically. For instance, all the moments of a Gaussian mixture are tractable and quadrature is easy so that it can be applied to both the perturbation method and the projection method (see e.g. Judd, 1996, for a review of these methods) instead of the more commonly applied Gaussian distribution.

As in the usual sieve literature, this paper provides a consistency result and derives the rate of convergence of the structural and infinite dimensional parameters, as well as asymptotic normality results for finite dimensional functionals of these parameters. While the main results only provide low-level conditions for a specific choice of moments and sieve basis, Appendix F provides high-level conditions which can be used for a larger class of bounded moments and sieve bases. These results also make it possible to nonparametrically estimate quantities other than the distribution of the shocks. While the results apply to static and dynamic models alike, two important differences arise in dynamic models compared to the existing literature on sieve estimation: proving uniform convergence of the objective function and controlling the dynamic accumulation of the nonparametric approximation bias.

The first challenge is to establish the rate of convergence of the objective function for dynamic models. To allow for the general dynamics (1)-(2) with latent variables, this paper adapts results from Andrews & Pollard (1994) and Ben Hariz (2005) to construct an inequality for uniformly bounded empirical processes which may be of independent interest. It allows the simulated data to be non-stationary when the initial value $(y_0, u_0)$ is not taken from the ergodic distribution. It holds under the geometric ergodicity condition found in Duffie & Singleton (1993). The boundedness condition is satisfied by the CF and the CDF for instance. Also, the inequality implies a larger variance than typically found in the literature for iid or strictly stationary data with limited dependence induced by the moments.[^2]

[^2]: See Chen (2007, 2011) for a review of sieve M-estimation with iid and dependent data.

The second challenge is that in the model (1)-(2) the nonparametric bias accumulates dynamically. At each time period the bias appears because draws are taken from a mixture approximation instead of the true $f$; this bias is also transmitted from one period to the next since $(y_t, u_t)$ depends on $(y_{t-1}, u_{t-1})$. To ensure that this bias does not accumulate too much, a decay condition is imposed on the DGP. For the consumption process (3)-(4), this condition holds if both $|\rho_c|$ and $|\rho_\sigma|$ are strictly less than $1$. The resulting bias is generally larger than in static models and usual sieve estimation problems. Together, the increased variance and bias imply a slower rate of convergence for the Sieve-SMM estimates. Hence, in order to achieve the rate of convergence required for asymptotic normality, the Sieve-SMM requires additional smoothness of the true density $f$. Note that the problem of bias accumulation seems quite generic to sieve estimation of dynamic models: if the computation of the moments or likelihood involves a filtering step then the bias accumulates inside the prediction error of the filtered values.[^3]

[^3]: This is related to the accumulation of errors studied in the approximation of DSGE models (see e.g. Peralta-Alva & Santos, 2014). Note that in the present estimation context, the error in the moments involves the difference between dimensional integral over the true and the approximated distribution of the shocks which complicates the analysis. This is also related to the propagation of prediction error in the filtering of unobserved latent variables using e.g. the Kalman or Particle filter.

Monte-Carlo simulations illustrate the properties of the estimator and the effect of dynamics on the bias and the variance of the estimator. Two empirical applications highlight the importance of estimating the distribution of the shocks. The first is the example discussed above, and the second estimates a different stochastic volatility model on a long daily series of exchange rate data. The Sieve-SMM estimator suggests notable asymmetry and fat tails in the shocks, even after controlling for the time-varying volatility. As a result, commonly used parametric estimates for the persistence are significantly downward biased which has implications for forecasting; this effect is confirmed by the Monte-Carlo simulations.

### Related Literature

The Sieve-SMM estimator presented in this paper combines two literatures: sieve estimation and the Simulated Method of Moments (SMM). This section provides a non-exhaustive review of the existing methods and results to introduce the new challenges in the combined setting.

A key aspect of simulation-based estimation is the choice of moments $\hat\psi_n$. The Simulated Method of Moments (SMM) estimator of McFadden (1989) relies on unconditional moments, the Indirect Inference (IND) estimator of Gouriéroux et al. (1993) uses auxiliary parameters from a simpler, tractable model and the Efficient Method of Moments (EMM) of Gallant & Tauchen (1996) uses the score of the auxiliary model. Simulation-based estimation has been applied to a wide array of economic settings: early empirical applications of these methods include the estimation of discrete choice models (Pakes, 1986; Rust, 1987), DSGE models (Smith, 1993) and models with occasionally binding constraints (Deaton & Laroque, 1992). More recent empirical applications include the estimation of earning dynamics (Altonji et al., 2013), of labor supply (Blundell et al., 2016) and the distribution of firm sizes (Gourio & Roys, 2014). Simulation-based estimation can also be applied to models that are not fully specified as in Berry et al. (1995); these models are not considered in this paper.

To achieve parametric efficiency, a number of papers consider using nonparametric moments but assume the distribution is known.[^4] To avoid dealing with the nonparametric rate of convergence of the moments, Carrasco et al. (2007a) use the continuum of moments implied by the CF. This paper uses a similar approach in a semi-nonparametric setting. In statistics, Bernton et al. (2017) use the Wasserstein, or Kantorovich, distance between the empirical and simulated distributions. This distance relies on unbounded moments and is thus excluded from the analysis in this paper.

[^4]: See e.g. Gallant & Tauchen (1996); Fermanian & Salanié (2004); Kristensen & Shin (2012); Gach & Pötscher (2010); Nickl & Pötscher (2011).

General asymptotic results are given by Pakes & Pollard (1989) for SMM with iid data and Lee & Ingram (1991); Duffie & Singleton (1993) for time-series. Gouriéroux & Monfort (1996) provide an overview of simulation-based estimation methods.

While most of the literature discussed so far deals with fully parametric SMM models, there are a few papers concerned with sieve simulation-based estimation. Bierens & Song (2012) provide a consistency result for Sieve-SMM estimation of a static first-price auction model.[^5] A recent working paper by Bierens & Song (2017) introduces covariates in the same auction model and gives an asymptotic normality result for the coefficients on the covariates. Newey (2001) uses a sieve simulated IV estimator for a measurement error model and proves consistency as both and go to infinity. These papers consider specific static models and provide limited asymptotic results. Furthermore, they consider sampling methods for the simulations that are very computationally costly (see Section 2.3 for a discussion).[^6]

[^5]: In order to do inference on , they propose to invert a simulated version of the ICM test statistic of Bierens (1990).

[^6]: Additionally, an incomplete working paper by Blasques (2011) uses the high-level conditions in Chen (2007) for a "Semi-NonParametric Indirect Inference" estimator. These conditions are very difficult to verify in practice and additional results are needed to handle the dynamics. Also, to avoid using sieves and SMM in moment conditions models that are tractable up to a latent variable, Schennach (2014) proposes an Entropic Latent Variable Integration via Simulation (ELVIS) method to build estimating equations that only involve the observed variables. Dridi & Renault (2000) propose a Semi-Parametric Indirect Inference based on a partial encompassing principle.

An alternative to using sieves in SMM estimation involves using more general parametric families to model the first 3 or 4 moments flexibly. Ruge-Murcia (2012, 2017) considers the skew Normal and the Generalized Extreme Value distributions to model the first 3 moments of productivity and inflation shocks. Gospodinov & Ng (2015); Gospodinov et al. (2017) use the Generalized Lambda family to flexibly model the first 4 moments of the shocks in a non-invertible moving average and a measurement error model. However, in applications where the moments depend on the full distribution of the shocks, which is the case if the data is non-separable in the shocks $e_t$, the estimates will be sensitive to the choice of parametric family. Quantities of interest that depend on the full distribution, such as welfare estimates and asset prices, will also be sensitive to the choice of parametric family.

Another related literature is the sieve estimation of models defined by moment conditions. These models can be estimated using either Sieve-GMM, Sieve Empirical Likelihood or Sieve Minimum Distance (see Chen, 2007, for a review). Applications include nonparametric estimation of IV regressions,[^7] quantile IV regressions,[^8] and the semi-nonparametric estimation of asset pricing models,[^9] for instance. Existing results cover the consistency and the rate of convergence of the estimator as well as asymptotic normality of functionals of the parameters for both iid and dependent data. See e.g. Chen & Pouzo (2012, 2015) and Chen & Liao (2015) for recent results with iid data and dependent data.

[^7]: See e.g. Hall & Horowitz (2005); Carrasco et al. (2007b); Blundell et al. (2007); Darolles et al. (2011); Horowitz (2011).

[^8]: See e.g. Chernozhukov & Hansen (2005); Chernozhukov et al. (2007); Horowitz & Lee (2007).

[^9]: See e.g. Hansen & Richard (1987); Chen & Ludvigson (2009); Chen et al. (2013); Christensen (2017).

In the empirical Sieve-GMM literature, an application closely related to the dynamics encountered in this paper appears in Chen et al. (2013). The authors show how to estimate an Euler equation with recursive preferences when the value function is approximated using sieves. Recursive preferences require a filtering step to recover the latent variable. As in the Sieve-SMM setting, this has implications for bias accumulation in parameter dependent time-series properties. Existing results, based on coupling methods (see e.g. Doukhan et al., 1995; Chen & Shen, 1998), do not apply to this class of moments and the authors rely on Bootstrap inference without formal justification.

### Notation

The following notation and assumptions will be used throughout the paper: the parameter of interest is . The finite dimensional parameter space is compact and the infinite dimensional set of densities is possibly non-compact. The sets of mixtures satisfy , is the data dependent dimension of the sieve set . The dimension increases with the sample size: as . Using the notation of Chen (2007), is the mixture approximation of the density . The vector of shocks has dimension and density . The total variation distance between two densities is and the supremum (or sup) norm is . For simplification, the following convention will be used and , where and correspond to the Euclidean norm of and respectively. is a norm on the mixture components: where is the Euclidean norm and are the mixture parameters. For a functional , its pathwise, or Gâteaux, derivative at in the direction is , it will be assumed to be continuous in and linear in . For two sequences and , the relation implies that there exists such that for all .

### Structure of the Paper

The paper is organized as follows: Section 2 introduces the Sieve-SMM estimator, explains how to implement it in practice and provides important properties of the mixture sieve. Section 3 gives the main asymptotic results: under regularity conditions, the estimator is consistent. Its rate of convergence is derived, and under further conditions, finite dimensional functionals of the estimates are asymptotically normal. Section B provides two extensions, one to include auxiliary variables in the CF and another to allow for dynamic panels with small $T$. Section 4 provides Monte-Carlo simulations to illustrate the theoretical results. Section 5 gives empirical examples for the estimator. Section 6 concludes. Appendix A gives some information about the CF and details on how to compute the estimator in practice as well as identification and additional asymptotic normality results for the stochastic volatility model. Appendix B provides extensions of the main results to moments of auxiliary variables and short panel data. Appendix C provides additional Monte-Carlo simulations for short panels. Appendix D provides additional empirical results to the ones presented in the main text. Appendix E provides the proofs of the main results and the extensions. The online supplement[^10] includes Appendix F, which provides results for more general moment functions and sieve bases, and Appendix G, which provides the proofs for these results.

[^10]: The online supplement can be found at http://jjforneron.com/SieveSMM/Supplement.pdf.

## 2 The Sieve-SMM Estimator

This section introduces the notation used in the remainder of the paper. It describes the class of DGPs considered in the paper and describes the DGP of the leading example in more detail. It discusses the choice of mixture sieve, moments and objective function as well as some important properties of the mixture sieve. The simple running example used throughout the analysis is based on the empirical applications of Section 5.

###### Example 1 (Stochastic Volatility Models).

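In both empirical applications, $y_t$ follows an AR(1) process with log-normal stochastic volatility

$$y_t = \mu_y + \rho_y y_{t-1} + \sigma_t e_{t,1}.$$

The first empirical application estimates a linear volatility process:

$$\sigma_t^2 = \mu_\sigma + \rho_\sigma \sigma_{t-1}^2 + \kappa_\sigma e_{t,2}, \quad e_{t,2} \sim \chi^2_1.$$

The second empirical application estimates a log-normal stochastic volatility process:

$$\log(\sigma_t) = \mu_\sigma + \rho_\sigma \log(\sigma_{t-1}) + \kappa_\sigma e_{t,2}, \quad e_{t,2} \overset{iid}{\sim} N(0,1).$$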

In both applications with the restrictions and . The first application approximates $f$ with a mixture of Gaussian distributions, the second adds two tail components to model potential fat tails. Using the notation given in (1)-(2), the latent variable is given by , where and (or ).

Stochastic volatility (SV) models in Example 1 are intractable because of the latent volatility. With log-normal volatility, the model becomes tractable after taking the transformation (see e.g. Kim et al., 1998) and the problem can be cast as a deconvolution problem (Comte, 2004). However, the transformation removes all the information about asymmetries in $f$, which turn out to be empirically significant (see Section 5). In the parametric case, alternatives to using the transformation involve Bayesian simulation-based estimators such as the Particle Filter and Gibbs sampling or EMM for frequentist estimation.
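The two data generating processes of Example 1 can be simulated in a few lines. The following is a minimal sketch rather than the paper's implementation: the function name `simulate_sv`, the starting values and the parameter values are assumptions chosen for illustration, and standard normal draws stand in for draws of $e_{t,1}$ from $f$.

```python
import numpy as np

def simulate_sv(e1, e2, mu_y, rho_y, mu_s, rho_s, kappa_s, log_vol=True):
    """Simulate y_t = mu_y + rho_y * y_{t-1} + sigma_t * e_{t,1}.

    log_vol=True : log(sigma_t) = mu_s + rho_s * log(sigma_{t-1}) + kappa_s * e_{t,2}
    log_vol=False: sigma_t^2    = mu_s + rho_s * sigma_{t-1}^2    + kappa_s * e_{t,2}
    """
    n = len(e1)
    y = np.empty(n)
    sigma = np.empty(n)
    # Starting values are an arbitrary choice for this sketch (an assumption).
    y_prev = mu_y / (1.0 - rho_y)
    s_prev = mu_s / (1.0 - rho_s)  # state is log(sigma) or sigma^2
    for t in range(n):
        s_prev = mu_s + rho_s * s_prev + kappa_s * e2[t]
        sigma[t] = np.exp(s_prev) if log_vol else np.sqrt(s_prev)
        y_prev = mu_y + rho_y * y_prev + sigma[t] * e1[t]
        y[t] = y_prev
    return y, sigma

rng = np.random.default_rng(0)
n = 1000
e1 = rng.standard_normal(n)  # placeholder for draws from the unknown f
# Log-normal volatility with Gaussian volatility shocks.
y_log, sig_log = simulate_sv(e1, rng.standard_normal(n), 0.0, 0.3, -0.1, 0.9, 0.2)
# Linear volatility with chi-squared(1) volatility shocks.
y_lin, sig_lin = simulate_sv(e1, rng.chisquare(1, n), 0.0, 0.3, 0.05, 0.9, 0.05,
                             log_vol=False)
```

In the linear case the volatility state stays positive here because the intercept, persistence and shocks are all non-negative; other parameter values may need an explicit guard.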

### 2.1 Sieve Basis - Gaussian and Tails Mixture

The following definition introduces the Gaussian and tails mixture sieve that will be used in the paper. It combines a simple Gaussian mixture with two tail densities which model asymmetric fat tails parametrically. Drawing from this mixture is computationally simple: draw uniform and Gaussian random variables, then switch between the Gaussians and the tails depending on the uniform and the mixture weights $\omega$. The tail draws are simple functions of uniform random variables.

###### Definition 1 (Gaussian and Tails Mixture).

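A random variable $e_t$ follows a $k$ component Gaussian and Tails mixture if its density has the form:

$$f_{\omega,\mu,\sigma}(e_t) = \sum_{j=1}^{k} \frac{\omega_j}{\sigma_j} \phi\left(\frac{e_t - \mu_j}{\sigma_j}\right) + \frac{\omega_{k+1}}{\sigma_{k+1}} 1_{e_t \leq \mu_{k+1}} f_L\left(\frac{e_t - \mu_{k+1}}{\sigma_{k+1}}\right) + \frac{\omega_{k+2}}{\sigma_{k+2}} 1_{e_t \geq \mu_{k+2}} f_R\left(\frac{e_t - \mu_{k+2}}{\sigma_{k+2}}\right)$$

where $\phi$ is the standard Gaussian density and its left and right tail components are

$$f_L(e_t, \xi_L) = \frac{(2+\xi_L)|e_t|^{1+\xi_L}}{[1+|e_t|^{2+\xi_L}]^2} \text{ for } e_t \leq 0, \qquad f_R(e_t, \xi_R) = \frac{(2+\xi_R)\, e_t^{1+\xi_R}}{[1+e_t^{2+\xi_R}]^2} \text{ for } e_t \geq 0$$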

with $f_L(e_t, \xi_L) = 0$ for $e_t > 0$ and $f_R(e_t, \xi_R) = 0$ for $e_t < 0$. To simulate from the Gaussian and tails mixture, draw , and compute and . Then, for $\omega_0 = 0$:

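$$e_t = \sum_{j=1}^{k+2} 1_{\nu \in \left[\sum_{l=0}^{j-1}\omega_l,\ \sum_{l=0}^{j}\omega_l\right]} \left(\mu_j + \sigma_j Z_j\right)$$

follows the Gaussian and tails mixture $f_{\omega,\mu,\sigma}$.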
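Definition 1 can be turned into a sampler by inverting the CDF of the tail densities. A minimal sketch follows, under stated assumptions: the inverse $F_R^{-1}(u) = (u/(1-u))^{1/(2+\xi_R)}$ is derived here by integrating the stated $f_R$, which gives $F_R(e) = 1 - 1/(1+e^{2+\xi_R})$, and is not quoted from the paper; the left tail is taken as the mirror image. The names `draw_tail` and `draw_mixture` are hypothetical, and one normal is drawn per observation rather than one per component, which yields the same distribution.

```python
import numpy as np

def draw_tail(u, xi):
    """Inverse-CDF draw from f_R(e) = (2+xi) e^(1+xi) / (1 + e^(2+xi))^2, e >= 0."""
    return (u / (1.0 - u)) ** (1.0 / (2.0 + xi))

def draw_mixture(n, omega, mu, sigma, xi_L, xi_R, rng):
    """Draw n values from a Gaussian and tails mixture.

    omega, mu, sigma have length k+2: the first k entries are the Gaussian
    components, entry k is the left tail and entry k+1 is the right tail.
    """
    omega, mu, sigma = (np.asarray(a, dtype=float) for a in (omega, mu, sigma))
    k = len(omega) - 2
    nu = rng.uniform(size=n)
    # Component label: interval of cumulative weights in which nu falls.
    j = np.searchsorted(np.cumsum(omega), nu)
    j = np.minimum(j, k + 1)  # guard against rounding in the cumulative sum
    z = rng.standard_normal(n)
    z_left = -draw_tail(rng.uniform(size=n), xi_L)   # non-positive draws
    z_right = draw_tail(rng.uniform(size=n), xi_R)   # non-negative draws
    z = np.where(j == k, z_left, z)
    z = np.where(j == k + 1, z_right, z)
    return mu[j] + sigma[j] * z

rng = np.random.default_rng(1)
e = draw_mixture(20000, omega=[0.4, 0.4, 0.1, 0.1], mu=[-0.5, 0.5, -1.0, 1.0],
                 sigma=[1.0, 1.0, 1.0, 1.0], xi_L=1.0, xi_R=1.0, rng=rng)
```

For estimation, the uniforms and normals would be drawn once and held fixed across parameter values, so that the simulated moments change only through the parameters.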

For applications where fat tails are deemed unlikely, as in the first empirical application, the weights $\omega_{k+1}, \omega_{k+2}$ can be set to zero to get a Gaussian only mixture. If and then the left and right tails satisfy:

$$f_L(e) \sim |e|^{-3-\xi_L} \text{ as } e \to -\infty, \qquad f_R(e) \sim e^{-3-\xi_R} \text{ as } e \to +\infty.$$

When then draws from the tail components have finite expectation, they also have finite variance if . More generally, for the -th moment to be finite, , is necessary. Gallant & Nychka (1987) also add a parametric component to model fat tails by mixing a Hermite polynomial density with a Student density. Neither the Hermite polynomial nor the Student distribution has closed-form quantiles, which is not practical for simulation. Here, the densities are constructed to be easy to simulate from. The tail indices will be estimated along with the remaining parameters of the mixture distribution.

The indicator function introduces discontinuities in the parameter $\omega$. Standard derivative-free optimization routines such as the Nelder-Mead algorithm (Nelder & Mead, 1965) as implemented in the NLopt library of Johnson (2014) can handle this estimation problem as illustrated in Section 4.[^11]

[^11]: The NLopt library is available for C++, Fortran, Julia, Matlab, Python and R among others.

In the finite mixture literature, mixture components are known to be difficult to identify because of possible label switching and the likelihood is globally unbounded.[^12] Using the characteristic function rather than the likelihood resolves the unbounded likelihood problem as discussed in Yu (1998). More importantly, the object of interest in this paper is the mixture density itself rather than the mixture components. As a result, permutations of the mixture components are not a concern since they do not affect the density $f_{\omega,\mu,\sigma}$.

[^12]: See e.g. McLachlan & Peel (2000) for a review of estimation, identification and applications of finite mixtures. See also Chen et al. (2014b) for some recent results.

### 2.2 Continuum of Moments and Objective Function

As in the parametric case, the moments need to be informative enough to identify the parameters. In Sieve-SMM estimation, the parameter $\beta = (\theta, f)$ is infinite dimensional so that no finite dimensional vector of moments could possibly identify $\beta$. As a result, this paper relies on moment functions which are themselves infinite dimensional.

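The leading choice of moment function in this paper is the empirical characteristic function for the joint vector of lagged observations $(\mathbf{y}_t, \mathbf{x}_t)$:

$$\hat\psi_n(\tau) = \frac{1}{n} \sum_{t=1}^{n} e^{i\tau'(\mathbf{y}_t, \mathbf{x}_t)}, \quad \forall \tau \in \mathbb{R}^{d_\tau}$$

where $i$ is the imaginary number such that $i^2 = -1$.[^13]

[^13]: The moments can also be expressed in terms of sines and cosines since $e^{i\tau'(\mathbf{y}_t, \mathbf{x}_t)} = \cos(\tau'(\mathbf{y}_t, \mathbf{x}_t)) + i \sin(\tau'(\mathbf{y}_t, \mathbf{x}_t))$.

The CF is one-to-one with the joint distribution of $(\mathbf{y}_t, \mathbf{x}_t)$, so that the model is identified by the CF if and only if the distribution of $(\mathbf{y}_t, \mathbf{x}_t)$ identifies the true parameters. Using lagged variables makes it possible to identify the dynamics in the data; Knight & Yu (2002) show how the characteristic function can identify parametric dynamic models. Some useful properties of the CF are given in Appendix A.1.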
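The empirical CF of lagged observations is straightforward to compute on a finite set of points $\tau$. The sketch below assumes a univariate series without regressors; `lagged_matrix` and `empirical_cf` are hypothetical helper names, and the average is taken over the $n - L$ available rows, which differs from a $1/n$ normalization only by a constant factor.

```python
import numpy as np

def lagged_matrix(y, L):
    """Rows are (y_t, y_{t-1}, ..., y_{t-L}) for every t with L available lags."""
    n = len(y)
    return np.column_stack([y[L - l : n - l] for l in range(L + 1)])

def empirical_cf(Y, tau):
    """Empirical CF of the rows of Y (n_obs x d) at the rows of tau (m x d).

    Returns m complex numbers: the sample mean of exp(i * tau_j' Y_t) over t.
    """
    return np.exp(1j * (Y @ tau.T)).mean(axis=0)

rng = np.random.default_rng(0)
y = rng.standard_normal(500)
Y = lagged_matrix(y, L=2)              # shape (498, 3)
tau = rng.standard_normal((50, 3))     # evaluation points, one row per tau_j
psi = empirical_cf(Y, tau)             # 50 complex-valued moments
```

Each moment has modulus at most one, which is the boundedness property the asymptotic results rely on.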

Besides the CF, another choice of bounded moment function is the CDF. While the CF is a smooth transformation of the data, the empirical CDF has discontinuities at each point of support of the data which could make numerical optimization more challenging. Also, the CF around $\tau = 0$ summarizes the information about the tails of the distribution (see Ushakov, 1999, page 30). This information is thus easier to extract from the CF than the CDF. The main results of this paper can be extended to any bounded moment function satisfying a Lipschitz condition.[^14]

[^14]: Appendix F allows for more general non-Lipschitz moment functions and other sieve bases. However, the conditions required for these results are more difficult to check.

Since the moments are infinite dimensional, this paper adopts the approach of Carrasco & Florens (2000); Carrasco et al. (2007a) to handle the continuum of moment conditions:[^15]

[^15]: Carrasco & Florens (2000) provide a general theory for GMM estimation with a continuum of moment conditions. They show how to efficiently weight the continuum of moments and propose a Tikhonov (ridge) regularization approach to invert the singular variance-covariance operator. Earlier results, without optimal weighting, include Koul (1986) for minimum distance estimation with a continuum of moments.

$$\hat Q^S_n(\beta) = \int \left| \hat\psi_n(\tau) - \hat\psi^S_n(\tau, \beta) \right|^2 \pi(\tau)\, d\tau. \tag{5}$$

The objective function is a weighted average of the square norm between the empirical and the simulated moment functions. As discussed in Carrasco & Florens (2000) and Carrasco et al. (2007a), using the continuum of moments avoids the problem of constructing an increasing vector of moments. The weighting density $\pi$ is chosen to be the multivariate normal density for the main results. Other choices for $\pi$ are possible as long as it has full support and is such that . As an example, the exponential distribution satisfies these two conditions, while the Cauchy distribution does not satisfy the second. In practice, choosing $\pi$ to be the Gaussian density with same mean and variance as gave satisfying results in Sections 4 and 5.[^16] In the appendix, the results allow for a bounded linear operator which plays the role of the weight matrix in SMM and GMM as in Carrasco & Florens (2000). Carrasco & Florens (2000); Carrasco et al. (2007a) provide theoretical results for choosing and approximating the optimal operator in the parametric setting. Similar work is left to future research in this semi-nonparametric setting.

[^16]: Monte-Carlo experiments not reported in this paper showed similar results when using the exponential density for $\pi$ instead of the Gaussian density.

Given the sieve basis, the moments and the objective function, the estimator $\hat\beta_n$ is defined as an approximate minimizer of $\hat Q^S_n$:

$$\hat Q^S_n(\hat\beta_n) \leq \inf_{\beta \in B_{k(n)}} \hat Q^S_n(\beta) + O_p(\eta_n) \tag{6}$$

where and corresponds to numerical optimization and integration errors. Indeed, since the integral in (5) needs to be evaluated numerically, some form of numerical integration is required. Quadrature and sparse quadrature were found to give satisfying results when $d_\tau$ is not too large (less than ). For larger dimensions, quasi-Monte-Carlo integration using either the Halton or Sobol sequence gave satisfying results.[^17] All Monte-Carlo simulations and empirical results in this paper are based on quasi-Monte-Carlo integration. Additional computational details are given in Appendix A.2.

[^17]: See e.g. Heiss & Winschel (2008); Holtz (2011) for an introduction to sparse quadrature in economics and finance, and Owen (2003) for quasi-Monte-Carlo sampling.
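The integral in (5) can be approximated by averaging the squared modulus of the CF difference over points drawn from the weighting density. The following sketch is an illustration under assumptions: $\pi$ is taken to be standard normal, pseudo-random draws replace the quasi-Monte-Carlo sequences used in the paper, and the names `cf` and `q_hat` are hypothetical.

```python
import numpy as np

def cf(Y, tau):
    """Empirical CF of the rows of Y (n x d) at the rows of tau (m x d)."""
    return np.exp(1j * (Y @ tau.T)).mean(axis=0)

def q_hat(Y_data, Y_sim, tau):
    """Approximate (5): mean over tau_j ~ pi of |psi_n(tau_j) - psi_n^S(tau_j)|^2."""
    diff = cf(Y_data, tau) - cf(Y_sim, tau)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
tau = rng.standard_normal((500, 1))             # draws from pi = N(0, 1)
data = rng.standard_normal((2000, 1))
sim_close = rng.standard_normal((2000, 1))      # same distribution as the data
sim_far = 3.0 + rng.standard_normal((2000, 1))  # shifted distribution
print(q_hat(data, sim_close, tau), q_hat(data, sim_far, tau))
```

Because each CF has modulus at most one, the approximated objective always lies between 0 and 4, and it is small when the simulated and observed samples share the same distribution.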

###### Example 1 (Continued) (Stochastic Volatility).

The following illustrates the steps involved in the Sieve-SMM algorithm for the stochastic volatility model with a Gaussian only mixture:

• fix , and ,

• construct a grid $(\tau_1, \dots, \tau_m)$, e.g. Box-Muller transformed Sobol sequence,

• compute the sample Characteristic Function over the grid

$$\hat\psi_n = (\hat\psi_n(\tau_1), \dots, \hat\psi_n(\tau_m)) = \frac{1}{n} \sum_{t=L+1}^{n} \left( e^{i\tau_1'\mathbf{y}_t}, \dots, e^{i\tau_m'\mathbf{y}_t} \right), \quad \mathbf{y}_t = (y_t, \dots, y_{t-L}),$$

• draw , and, where

• minimize the objective , computed as follows:

• compute

• simulate using and

• compute as above and
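The steps above can be sketched end to end for a small problem. Everything specific below is an assumption made for illustration and not the paper's implementation: two Gaussian components, one simulated path, one lag, $\mu_y$ and $\mu_\sigma$ fixed at zero, shocks standardized with the analytical mixture mean and variance, pseudo-random $\tau$ draws instead of a Sobol sequence, and SciPy's Nelder-Mead routine in place of NLopt.

```python
import numpy as np
from scipy.optimize import minimize

def lags(y, L):
    n = len(y)
    return np.column_stack([y[L - l : n - l] for l in range(L + 1)])

def cf(Y, tau):
    return np.exp(1j * (Y @ tau.T)).mean(axis=0)

def mixture_draws(a, mu, log_sd, nu, Z):
    """Two-component Gaussian mixture draws from fixed uniforms nu and normals Z (n x 2)."""
    w1 = 1.0 / (1.0 + np.exp(-a))            # weight of component 1, in (0, 1)
    w = np.array([w1, 1.0 - w1])
    sd = np.exp(log_sd)
    j = (nu > w1).astype(int)                # indicator selecting the component
    raw = mu[j] + sd[j] * Z[np.arange(len(nu)), j]
    m = w @ mu                               # analytical mixture mean
    v = w @ (sd**2 + mu**2) - m**2           # analytical mixture variance
    return (raw - m) / np.sqrt(v)            # impose mean 0 and variance 1

def simulate_sv(par, nu, Z, e2):
    rho_y, rho_s, kappa_s = np.tanh(par[0]), np.tanh(par[1]), np.exp(par[2])
    e1 = mixture_draws(par[3], par[4:6], par[6:8], nu, Z)
    y = np.empty(len(e1))
    y_prev, s = 0.0, 0.0
    for t in range(len(e1)):
        s = rho_s * s + kappa_s * e2[t]      # log(sigma_t), intercept fixed at 0
        y_prev = rho_y * y_prev + np.exp(s) * e1[t]
        y[t] = y_prev
    return y

def objective(par, psi_data, tau, L, nu, Z, e2):
    y_sim = simulate_sv(par, nu, Z, e2)
    if not np.all(np.isfinite(y_sim)):
        return 4.0                           # upper bound of |psi - psi_S|^2
    return float(np.mean(np.abs(psi_data - cf(lags(y_sim, L), tau)) ** 2))

rng = np.random.default_rng(0)
n, L, m = 500, 1, 200
tau = rng.standard_normal((m, L + 1))        # points drawn from a Gaussian pi
# Simulation draws are made once and held fixed during the optimization.
nu, Z, e2 = rng.uniform(size=n), rng.standard_normal((n, 2)), rng.standard_normal(n)
# Artificial "observed" data generated from known parameter values.
true_par = np.array([np.arctanh(0.5), np.arctanh(0.9), np.log(0.2),
                     0.0, -0.5, 0.5, 0.0, 0.0])
y_data = simulate_sv(true_par, rng.uniform(size=n),
                     rng.standard_normal((n, 2)), rng.standard_normal(n))
psi_data = cf(lags(y_data, L), tau)

start = np.array([0.2, 0.5, -1.0, 0.1, -0.2, 0.2, 0.1, 0.1])
res = minimize(objective, start, args=(psi_data, tau, L, nu, Z, e2),
               method="Nelder-Mead", options={"maxiter": 300})
```

A derivative-free method is used because the component indicator makes the objective discontinuous in the mixture weight. With so few iterations the result is only a rough starting point; a serious run would use more simulated paths and restarts.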

### 2.3 Approximation Rate and L2-Smoothness of the Mixture Sieve

This subsection provides more details on the approximation and $L^2$-smoothness properties of the mixture sieve. It also provides the necessary restrictions on the true density to be estimated. Gaussian mixtures can approximate any smooth univariate density but the rate of this approximation depends on both the smoothness and the tails of the density (see e.g. Kruijer et al., 2010). The tail densities parametrically model asymmetric fat tails in the density. This is useful in the second empirical example where exchange rate data may exhibit larger tails. The following lemma extends the approximation results of Kruijer et al. (2010) to multivariate densities with independent components and potentially fat tails.

###### Lemma 1 (Approximation Properties of the Gaussian and Tails Mixture).

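Suppose that the shocks are independent with density $f = \prod_j f_j$. Suppose that each marginal $f_j$ can be decomposed into a smooth density $f_{j,S}$ and the two tails $f_L, f_R$ of Definition 1:

$$f_j = (1 - \omega_{j,1} - \omega_{j,2}) f_{j,S} + \omega_{j,1} f_L + \omega_{j,2} f_R.$$

Let each $f_{j,S}$ satisfy the assumptions of Kruijer et al. (2010):

1. Smoothness: $f_{j,S}$ is $r$-times continuously differentiable with bounded $r$-th derivative.

2. Tails: $f_{j,S}$ has exponential tails, i.e. there exist $\bar e, M_f, a, b > 0$ such that:

$$f_{j,S}(e) \leq M_f e^{-a|e|^b}, \quad \forall |e| \geq \bar e.$$

3. Monotonicity in the Tails: $f_{j,S}$ is strictly positive and there exist $\underline{e} < \bar e$ such that $f_{j,S}$ is weakly decreasing on $[\bar e, \infty)$ and weakly increasing on $(-\infty, \underline{e}]$.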

and for all . Then there exists a Gaussian and tails mixture satisfying the restrictions of Kruijer et al. (2010):

1. Bandwidth: .

2. Location Parameter Bounds: with

such that as :

$$\|f - \Pi_k f\|_{F} = O\left(\frac{\log[k]^{2r/b}}{k^{r}}\right)$$

where or .

The space of true densities satisfying the assumptions will be denoted as and is the corresponding space of Gaussian and tails mixtures .
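To get a feel for how the bound in Lemma 1 behaves, the order of the rate can be evaluated numerically. The sketch below is only illustrative: the function name `approximation_rate` and the values of `r` and `b` are hypothetical, and the constant hidden in the O(.) notation is ignored.

```python
import math

def approximation_rate(k, r, b):
    """Evaluate log(k)^(2r/b) / k^r, the order of the approximation error in Lemma 1."""
    return math.log(k) ** (2.0 * r / b) / k ** r

# Hypothetical smoothness r and tail exponent b, for illustration only.
for k in (10, 100, 1000):
    print(k, approximation_rate(k, r=2, b=2))
```

Larger `r` (smoother densities) and larger `b` (thinner tails) both make the expression shrink faster in `k`.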

Note that additional restrictions on $f$ may be required for identification, such as mean zero, unit variance or symmetry. The assumption that the shocks are independent is not too strong for structural models, where this or a parametric factor structure is typically assumed. Note that under this assumption, there is no curse of dimensionality because the components $f_j$ can be approximated separately. Also, the restriction is only required for the approximation in supremum norm.

An important difficulty which arises in simulating from a nonparametric density is that the draws are a highly nonlinear transformation of the nonparametric density. As a result, standard regularity conditions such as Hölder and L2-smoothness are difficult to verify and may only hold under restrictive conditions. The following discusses these regularity conditions for the methods used in the previous literature. Then, an L2-smoothness result for the mixture sieve is provided in Lemma 2 below.

Bierens & Song (2012) use Inversion Sampling: they compute the CDF from the nonparametric density and draw by applying the inverse CDF to uniform random numbers. Computing the CDF and its inverse to simulate is computationally very demanding. Also, while the CDF is linear in the density, its inverse is a highly non-linear transformation of the density. Hence, Hölder and L2-smoothness results for the draws are much more challenging to prove without further restrictions.

Newey (2001) uses Importance Sampling for which Hölder conditions are easily verified but requires for consistency alone. Furthermore, the choice of importance distribution is very important for the finite sample properties (the effective sample size) of the simulated moments. In practice, the importance distribution should give sufficient weight to regions for which the nonparametric density has more weight. Since the nonparametric density is unknown ex-ante, this is hard to achieve in practice.
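The sensitivity of the effective sample size to the importance distribution can be illustrated with a small simulation. This is a sketch under assumptions that are not in the paper: the target is a standard normal, the importance distributions are centered normals with different standard deviations, and the effective sample size is measured with the usual Kish formula $(\sum w)^2 / \sum w^2$.

```python
import numpy as np

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

rng = np.random.default_rng(0)
S = 10_000
for proposal_sd in (1.0, 3.0, 0.5):
    draws = proposal_sd * rng.standard_normal(S)                # importance draws
    w = normal_pdf(draws) / normal_pdf(draws, sd=proposal_sd)   # target / proposal
    print(proposal_sd, effective_sample_size(w))
```

When the importance distribution matches the target, all weights are equal and the effective sample size equals `S`; a proposal that is too narrow or too wide wastes draws.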

Gallant & Tauchen (1993) use Accept/Reject (outside of an estimation setting); however, it is not practical for simulation-based estimation. Indeed, the required number of draws to generate an accepted draw depends on both the instrumental density and the target density. The latter varies with the parameters during the optimization. This also makes the L2-smoothness properties challenging to establish. In comparison, the following lemma shows that the required L2-smoothness condition is satisfied by draws from a mixture sieve.

###### Lemma 2 (L2-Smoothness of Simulated Mixture Sieves).

Suppose that

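$$e_t^s = \sum_{j=1}^{k(n)} \mathbb{1}_{\nu_t^s \in \left[\sum_{l=0}^{j-1}\omega_l,\ \sum_{l=0}^{j}\omega_l\right]}\left(\mu_j + \sigma_j Z_{t,j}^s\right), \quad \tilde{e}_t^s = \sum_{j=1}^{k(n)} \mathbb{1}_{\nu_t^s \in \left[\sum_{l=0}^{j-1}\tilde\omega_l,\ \sum_{l=0}^{j}\tilde\omega_l\right]}\left(\tilde\mu_j + \tilde\sigma_j Z_{t,j}^s\right)$$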

with and , and . If then there exists a finite constant which only depends on such that:

Lemma 2 is key in proving the L2-smoothness conditions of the moments required to establish the convergence rate of the objective function and stochastic equicontinuity results. Here, the L2-smoothness constant depends on both the bound and the number of mixture components . (Footnote 18: See e.g. Andrews (1994); Chen et al. (2003) for examples of L2-smooth functions.) Kruijer et al. (2010) showed that both the total variation and supremum norms are bounded above by the pseudo-norm on the mixture parameters up to a factor which depends on the bandwidth . As a result, the pseudo-norm controls the distance between densities and the simulated draws as well. Furthermore, draws from the tail components are shown in the Appendix to be L2-smooth in . Hence, draws from the Gaussian and tails mixture are L2-smooth in both and .
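The construction of the draws in Lemma 2 can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: the function name `mixture_draws` and all parameter values are hypothetical, and the first cumulative weight is taken to be zero. The uniform and Gaussian shocks are drawn once and held fixed, so that only the mixture parameters change between the two sets of draws.

```python
import numpy as np

def mixture_draws(omega, mu, sigma, nu, Z):
    """Return mu_j + sigma_j * Z[t, j], where j is the component whose
    cumulative-weight interval contains the uniform shock nu[t]."""
    edges = np.concatenate([[0.0], np.cumsum(omega)])    # interval endpoints
    j = np.searchsorted(edges, nu, side="right") - 1     # component index per draw
    j = np.clip(j, 0, len(omega) - 1)                    # guard against rounding at the top edge
    rows = np.arange(len(nu))
    return mu[j] + sigma[j] * Z[rows, j]

rng = np.random.default_rng(0)
T, k = 1000, 3
nu = rng.uniform(size=T)               # uniform shocks, held fixed
Z = rng.standard_normal((T, k))        # Gaussian shocks, held fixed

omega = np.array([0.2, 0.5, 0.3])      # hypothetical mixture weights
mu = np.array([-1.0, 0.0, 1.5])        # hypothetical locations
sigma = np.array([0.5, 1.0, 0.8])      # hypothetical scales

e = mixture_draws(omega, mu, sigma, nu, Z)
e_tilde = mixture_draws(omega + np.array([0.01, -0.01, 0.0]), mu + 0.01, sigma, nu, Z)
print(np.mean((e - e_tilde) ** 2))     # mean squared difference between the two sets of draws
```

Changing the locations or scales moves every draw continuously, whereas changing the weights switches the component only for the draws whose uniform shock lies close to an interval endpoint. This is why the difference between draws is controlled in mean square rather than pointwise.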

## 3 Asymptotic Properties of the Estimator

This section provides conditions under which the Sieve-SMM estimator in (6) is consistent, derives its rate of convergence and asymptotic normality results for linear functionals of .

### 3.1 Consistency

Consistency results are given under low-level conditions on the DGP using the Gaussian and tails mixture sieve with the CF. (Footnote 19: Consistency results allowing for non-mixture sieves and other moments are given in Appendix F.1.) First, the population objective is:

$$Q_n(\beta) = \int \left|\mathbb{E}\left(\hat\psi_n(\tau) - \hat\psi_n^S(\tau,\beta)\right)\right|^2 \pi(\tau)\,d\tau. \tag{7}$$

The objective depends on because are not covariance stationary: the moments can depend on . Under geometric ergodicity, it has a well-defined limit (footnote 20: since the CF is bounded, the dominated convergence theorem can be used to prove the existence of the limit):

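$$Q_n(\beta) \xrightarrow{\,n\to\infty\,} Q(\beta) = \int \left|\lim_{n\to\infty}\mathbb{E}\left(\hat\psi_n(\tau) - \hat\psi_n^S(\tau,\beta)\right)\right|^2 \pi(\tau)\,d\tau.$$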

In the definition of the objective $Q_n$ and its limit $Q$, the expectation is taken over both the data and the simulated samples . The following assumption provides a set of sufficient conditions on the true density , the sieve space and a first set of conditions on the model (identification and time-series properties) to prove consistency.

###### Assumption 1 (Sieve, Identification, Dependence).

Suppose the following conditions hold:

1. (Sieve Space) the true density and the mixture sieve space satisfy the assumptions of Lemma 1 with as and is compact and .

2. (Identification) where is the Gaussian density. For any and for all , is strictly positive and weakly decreasing in both and .

3. (Dependence) is strictly stationary and -mixing with exponential decay, the simulated are geometrically ergodic, uniformly in .

Condition i. is stronger than the usual condition in the sieve literature (see e.g. Chen, 2007). The additional term comes from the non-linearity of the mixture sieve. The fourth power is due to the dependence: the inequality in Lemma G15 provides a bound of order instead of for iid data.

Condition ii. is the usual identification condition. It is assumed that the information from the joint distribution of uniquely identifies . Proving general global identification results is quite challenging in this setting and is left to future research. Local identification in the sense of Chen et al. (2014a) is also challenging to prove here because the dynamics imply that the distribution of is a convolution of with the distribution of . Since the stationary distributions of and are the same, the resulting distribution is the fixed point of its convolution with . This makes derivatives with respect to difficult to compute in many dynamic models. Note that the identification assumption does not exclude ill-posedness. (Footnote 21: See e.g. Carrasco et al. (2007b) and Horowitz (2014) for a review of ill-posedness in economics.) The space is assumed to include the necessary restrictions (if any) for identification such as mean zero and unit variance. Global identification results for the stochastic volatility model in Example 1 are given in Appendix A.4.

Condition iii. is common in SMM estimation with dependent data (see e.g. Duffie & Singleton, 1993). In this setting, it implies two important features: the simulated are -mixing (Liebscher, 2005), and the initial condition bias is negligible: . (Footnote 22: See Proposition F5 in the supplemental material for the second result.)
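The statement that the initial condition becomes negligible can be illustrated with a toy example. The sketch below assumes a hypothetical linear AR(1) recursion, which is much simpler than the models covered by the paper: two paths are simulated from different starting values with the same shocks, and the gap between them shrinks geometrically.

```python
import numpy as np

def simulate_ar1(y0, mu, rho, shocks):
    """Simulate y_t = mu + rho * y_{t-1} + e_t starting from y0."""
    path = np.empty(len(shocks) + 1)
    path[0] = y0
    for t, e in enumerate(shocks, start=1):
        path[t] = mu + rho * path[t - 1] + e
    return path

rng = np.random.default_rng(0)
shocks = rng.standard_normal(200)      # common shocks for both paths
rho = 0.9                              # hypothetical persistence, |rho| < 1

a = simulate_ar1(0.0, 0.1, rho, shocks)
b = simulate_ar1(5.0, 0.1, rho, shocks)
gap = np.abs(a - b)                    # equals 5 * rho**t in this linear case
print(gap[[0, 10, 50, 200]])
```

In this linear case the gap is exactly the initial difference times `rho**t`, so the contribution of the starting value to any time average vanishes as the sample grows.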

###### Assumption 2 (Data Generating Process).

is simulated according to the dynamic model (1)-(2) where and satisfy the following Hölder conditions for some , is either the total variation or supremum norm and:

1. ;

2. ;

3. ;

4. ;

5. ;

6. ;

for any , , and .

Conditions y(ii), u(ii) correspond to the usual Hölder conditions in GMM and M-estimation but placed on the DGP itself rather than the moments. Since the cosine and sine functions are Lipschitz, it implies that the moments are Hölder continuous as well. (Footnote 23: For any choice of moments that preserve identification and are Lipschitz, the main results will hold assuming and are bounded. For both the Gaussian and the exponential density, these quantities turn out to be bounded.) In general, Lipschitz transformations preserve L2-smoothness properties (see e.g. Andrews, 1994; van der Vaart & Wellner, 1996); here, additional conditions on are required to handle the continuum of moments with unbounded support.

The decay conditions y(i), u(i) together with condition y(iii) ensure that the differences due to do not accumulate too much with the dynamics. As a result, keeping the shocks fixed, the Hölder condition applies to as a whole. It also implies that the nonparametric approximation bias does not accumulate too much. These conditions are similar to Duffie & Singleton (1993)’s -Unit Circle condition which they propose as an alternative to geometric ergodicity for uniform LLNs and CLTs. The decay conditions play a crucial role here since they control the nonparametric bias of the estimator.
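As a worked illustration of why a decay condition prevents parameter differences from accumulating, consider a hypothetical linear recursion $y_t = \mu + \rho\, y_{t-1} + e_t$ with $|\rho| < 1$; this special case is an assumption made for the example and is not the general model of the paper. Holding the shocks and $\rho$ fixed and replacing $\mu$ with $\tilde\mu$ gives

$$y_t - \tilde{y}_t = (\mu - \tilde\mu) + \rho\,(y_{t-1} - \tilde{y}_{t-1}) = \rho^{t}(y_0 - \tilde{y}_0) + (\mu - \tilde\mu)\sum_{j=0}^{t-1}\rho^{j},$$

so that

$$|y_t - \tilde{y}_t| \le |\rho|^{t}\,|y_0 - \tilde{y}_0| + \frac{|\mu - \tilde\mu|}{1 - |\rho|}.$$

The bound is uniform in $t$: the difference between the two paths stays proportional to the difference in parameters instead of growing with the length of the simulated sample.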

Condition u(iii) ensures that the DGP preserves the L2-smoothness properties derived for mixture draws in Lemma 2. This condition does not appear in the usual sieve literature, which does not simulate from a nonparametric density. In the SMM literature, a Lipschitz or Hölder condition is usually given on the moments directly. Note that a condition analogous to u(iii) would also be required for SMM estimation of a parametric distribution.

Assumption 2 does not impose that the DGP be smooth. This allows for kinks in or as in the sample selection model or the models of Deaton (1991) and Deaton & Laroque (1992). A version of Assumption 2 given in Appendix E.2 extends it to allow for possible discontinuities in . The following shows how to verify the conditions of Assumption 2 in Example 1 with volatility shocks. (Footnote 24: Some additional examples are given in Appendix F.4. They are not tied to the use of mixtures and, as a result, impose stronger restrictions on the density such as bounded support.)

###### Example 1 (Continued) (Stochastic Volatility).

implies y(i) holds. Also:

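$$|\mu_{y,1} + \rho_{y,1}\,y_{t-1} - \mu_{y,2} - \rho_{y,2}\,y_{t-1}| \le \left(|\mu_{y,1} - \mu_{y,2}| + |\rho_{y,1} - \rho_{y,2}|\right)\left(1 + |y_{t-1}|\right)$$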

and thus condition y(ii) is satisfied assuming is bounded. Since has mean zero and unit variance, is bounded if , and for some . For condition y(iii), take and :

 |σtet,1−~σtet,1|≤|et,1|√|σ2t−~σ2t|,|σte