Adaptive MCMC for Generalized Method of Moments with Many Moment Conditions

by   Masahiro Tanaka, et al.

A generalized method of moments (GMM) estimator is unreliable when the number of moment conditions is large, that is, comparable to or larger than the sample size. While a number of remedies for this problem have been proposed in the classical GMM literature, the literature on its Bayesian counterpart (i.e., Bayesian inference using a GMM criterion as a quasi-likelihood) has paid scant attention to it. This paper fills this gap by proposing an adaptive Markov chain Monte Carlo (MCMC) approach to GMM inference with many moment conditions. In particular, it focuses on the adaptive tuning of a weighting matrix on the fly. Our proposal consists of two elements. The first is the random update of a weighting matrix, which substantially reduces computational cost while maintaining the accuracy of the estimation. The second is the use of the nonparametric eigenvalue-regularized precision matrix estimator, which contributes to numerical stability. A simulation study and a real data application are then presented to illustrate the performance of the proposed approach in comparison with existing approaches.




1 Introduction

The generalized method of moments (GMM) is a widely used statistical framework (Hansen, 1982; Hall, 2005). Under GMM, unknown parameters are estimated via a set of moment conditions. A parameter estimate is obtained by minimizing a GMM criterion, a quadratic form composed of the vector of sample means of the moment conditions and a weighting matrix. While GMM uses only lower-order moments, and is thus statistically less efficient than full-information methods such as the maximum likelihood method, it has many advantages, including robustness to model misspecification, nonparametric treatment of heteroskedasticity, and computational simplicity.

This paper concerns the Bayesian version of the GMM. A GMM criterion can be viewed as a quasi-likelihood, being theoretically equivalent to the Laplace approximation of the true likelihood around its mode (Chernozhukov and Hong, 2003). Exploiting this feature, one can conduct a (quasi-)Bayesian inference by replacing the true likelihood with a GMM criterion, as discussed by, for example, Kim (2002) and Yin (2009) (see also Belloni and Chernozhukov (2009) and Li and Jiang (2016) for a discussion of theoretical properties). Posterior draws from a quasi-posterior density (the product of the quasi-likelihood and the prior density) can be simulated using standard Bayesian Markov chain Monte Carlo (MCMC) techniques, such as the Metropolis-Hastings algorithm. In the following, we call this inferential approach Bayesian GMM, in contradistinction to the classical GMM.

In applications, a GMM criterion often has many moment conditions, making the estimator considerably unreliable. Cases where the number of moment conditions can be large include dynamic panel models (e.g., Arellano and Bond, 1991; Blundell and Bond, 1998; Roberts and Rosenthal, 2009; Vieira et al., 2012), instrumental variable methods (e.g., Chernozhukov and Hansen, 2005, 2013), and identification through heteroskedasticity (Lewbel, 2012).

In the literature on classical GMM, many remedies for this problem have been proposed, such as systematic moment selection (Andrews, 1999; Andrews and Lu, 2001; Hall and Peixe, 2003; Hall et al., 2007; Okui, 2009; Donald et al., 2009; Canay, 2010; DiTraglia, 2016; Chang and DiTraglia, 2018), averaging (Chen et al., 2016), and shrinkage estimation (Liao, 2013; Fan and Liao, 2014; Cheng and Liao, 2015; Caner et al., 2018). By contrast, the literature on Bayesian GMM has paid scant attention to the problem, even though remedies tailored to classical GMM are not straightforwardly applicable to Bayesian GMM. The purpose of this paper is thus to fill this gap by proposing a novel method to deal with Bayesian GMM with many moment conditions.

For both classical and Bayesian GMM, choosing a good weighting matrix is not a trivial problem. It is theoretically optimal to set the weighting matrix to the precision matrix (i.e., the inverse of the covariance matrix) of the moment conditions, evaluated at the true parameter values. Since this approach is infeasible in practice, two-step and continuously updated estimators are commonly used in classical GMM (Hansen, 1982; Hansen et al., 1996). By contrast, the literature on Bayesian GMM has paid less attention to the choice of the weighting matrix. Chernozhukov and Hong (2003), who use the random-walk Metropolis-Hastings algorithm, suggest recomputing the weighting matrix each time a parameter proposal is drawn. This approach is motivated by setting the weighting matrix to a locally optimal one; a posterior mean estimate of the weighting matrix is supposed to be nearly optimal on average. In this approach, the unknown parameters and the weighting matrix are updated concurrently. Consequently, the surface of the quasi-posterior becomes complicated, making the MCMC algorithm inefficient and unstable. To tackle this problem, Yin et al. (2011) propose an approach they call stochastic GMM, where the unknown parameters are updated one by one and the corresponding weighting matrix is updated accordingly. Their approach improves the numerical stability of the posterior simulator by suppressing changes in the posterior in a single cycle. However, this approach requires so many inversions of the covariance matrix that it is not practical for models with many moment conditions.

There are two difficulties in setting a weighting matrix when the number of moment conditions is large. First, it is computationally demanding because the inversion of the sample covariance matrix is repeatedly computed. This problem is peculiar to Bayesian GMM. Second, as in classical GMM, the sample estimate of the covariance matrix of the moment conditions is unreliable, and the inversion of the covariance matrix can amplify estimation errors.

In this paper, we develop an adaptive MCMC approach to deal with the problem of many moment conditions in Bayesian GMM. The proposal consists of two main contributions. First, we propose to update the weighting matrix randomly using the recursive mean of the posterior samples. In our approach, adaptation probabilities are set to be exponentially decreasing, which ensures the validity of the MCMC algorithm and significantly saves computational cost. Second, we propose estimating the precision matrix of the moment conditions using the nonparametric eigenvalue-regularized precision matrix estimator developed by Lam (2016). This estimator is more numerically stable than the standard estimator. Through a series of Monte Carlo experiments, we show that the proposed approach outperforms existing ones in terms of both statistical and computational efficiency. Even if the number of moment conditions is significantly smaller than the sample size, a GMM estimator can be ill-posed. While our primary focus is the problem posed by many moment conditions, the proposed approach can also be beneficial in cases where the number of moment conditions is not so large, as shown in the subsequent sections.

The rest of the paper is structured as follows. Section 2 introduces the proposed approach. Section 3 conducts a simulation study. In Section 4, we apply the approach to a real data problem as an example. Section 5 concludes the paper with a discussion.

2 Methodology

2.1 Setup and challenges

We consider the Bayesian inference of a statistical model by means of a set of moment conditions. Assume that a likelihood function can be approximated by a quasi-likelihood based on a generalized method of moments (GMM) criterion (Hansen, 1982). We call this inferential approach Bayesian GMM (Kim, 2002; Yin, 2009). Given data $\mathcal{D}$ and an $L$-dimensional parameter $\theta$, a quasi-likelihood is derived from the GMM criterion:

$$q(\mathcal{D} \mid \theta) \propto \exp\left\{ -\frac{N}{2}\, \bar{m}(\theta)^{\top} W\, \bar{m}(\theta) \right\}, \qquad \bar{m}(\theta) = \frac{1}{N} \sum_{n=1}^{N} m_n(\theta),$$

where $m_n(\theta)$ is a $K$-dimensional vector of moment conditions for the $n$-th observation, $\bar{m}(\theta)$ contains the sample means of the moment conditions, $W$ is a symmetric positive definite weighting matrix, and $N$ is the sample size. A GMM criterion can be seen as the Laplace approximation of the negative true likelihood evaluated around the mode (Chernozhukov and Hong, 2003). Given a prior density $p(\theta)$, the posterior density is approximated as

$$p(\theta \mid \mathcal{D}) \approx \frac{q(\mathcal{D} \mid \theta)\, p(\theta)}{\int q(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta},$$

where the denominator is generally unknown but constant. The posterior samples are drawn from this target density (evaluated up to the normalizing constant) using Bayesian simulation techniques. For simplicity, we consider using the random-walk Metropolis-Hastings (RWMH) algorithm as in previous studies (e.g., Chernozhukov and Hong, 2003; Yin, 2009).
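The setup above can be illustrated with a minimal Python sketch, under assumptions that are not taken from the paper: the log quasi-likelihood is taken to be $-(N/2)\,\bar{m}^{\top} W \bar{m}$ up to a constant, the weighting matrix is held fixed, the prior is flat, and the proposal is a Gaussian random walk. The function names `gmm_log_quasi_likelihood` and `rwmh`, the toy mean-estimation model, and the step size are all hypothetical.

```python
import numpy as np

def gmm_log_quasi_likelihood(theta, moments, W):
    """Log quasi-likelihood -(N/2) * mbar' W mbar (up to a constant) for a fixed W."""
    M = moments(theta)          # N-by-K matrix: one row of moment conditions per observation
    N = M.shape[0]
    mbar = M.mean(axis=0)       # K-vector of sample means of the moment conditions
    return -0.5 * N * (mbar @ W @ mbar)

def rwmh(log_target, theta0, n_draws, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    logp = log_target(theta)
    draws = np.empty((n_draws, theta.size))
    for s in range(n_draws):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_prop = log_target(proposal)
        # Symmetric proposal: the MH ratio reduces to the ratio of target kernels.
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        draws[s] = theta
    return draws

# Toy model (hypothetical): estimate a mean from the single moment condition x_n - theta.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=200)
moments = lambda theta: (x - theta[0])[:, None]
W = np.array([[1.0 / x.var()]])     # fixed weighting matrix: inverse sample variance
log_target = lambda theta: gmm_log_quasi_likelihood(theta, moments, W)  # flat prior
draws = rwmh(log_target, [0.0], n_draws=5000, step=0.2, rng=rng)
posterior_mean = draws[1000:].mean()   # discard the first 1,000 draws as warmup
```

In this toy case the quasi-posterior is centered at the sample mean of `x`, which gives a simple sanity check for the sampler.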

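As in classical GMM, the statistical efficiency of the Bayesian GMM critically depends on the choice of the weighting matrix $W$. $W$ is optimal when it is set to the precision matrix of the moment conditions based on the true parameter values $\theta_0$, that is, $W = \{\mathrm{Cov}[m_n(\theta_0)]\}^{-1}$, where $m_n(\theta)$ denotes the vector of moment conditions for the $n$-th observation. This choice is optimal in that it minimizes the Kullback-Leibler divergence of the true data generating process to the set of all asymptotically less restrictive distributions.

Let $M(\theta)$ denote an $N$-by-$K$ matrix of the moment conditions, whose $n$-th row is $m_n(\theta)^{\top}$. The optimal choice of weighting matrix in finite sample is the inverse of the sample covariance matrix of the rows of $M(\theta_0)$.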

In classical GMM, it is common practice to employ the two-step (Hansen, 1982) or continuously updated estimators (Hansen et al., 1996).

The two-step estimation method obtains a first-stage estimate using an arbitrary weighting matrix (e.g., an identity matrix), then obtains a second-stage estimate using a weighting matrix set to the precision matrix of the moment conditions based on the first-stage estimate. The continuously updating estimation method repeats the two-step estimation more than once.
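The two-step procedure just described can be sketched in Python for a linear IV model with one endogenous regressor. This is an illustration under assumptions not taken from the paper: the moment conditions are $z_n (y_n - \beta x_n)$, for which the GMM minimizer has a closed form, and the function names `linear_iv_gmm` and `two_step_gmm` as well as the simulated data are hypothetical.

```python
import numpy as np

def linear_iv_gmm(y, x, Z, W):
    """Closed-form GMM estimate for y = beta * x + e with moments z_n * (y_n - beta * x_n)."""
    zx, zy = Z.T @ x, Z.T @ y
    return (zx @ W @ zy) / (zx @ W @ zx)

def two_step_gmm(y, x, Z):
    K = Z.shape[1]
    beta_first = linear_iv_gmm(y, x, Z, np.eye(K))       # step 1: identity weighting matrix
    M = Z * (y - beta_first * x)[:, None]                # N-by-K moment conditions at step-1 estimate
    W = np.linalg.inv(np.atleast_2d(np.cov(M, rowvar=False)))  # step 2: precision matrix
    return linear_iv_gmm(y, x, Z, W)

# Simulated data (hypothetical design, chosen only to exercise the functions).
rng = np.random.default_rng(1)
N, K, beta_true = 2000, 3, 0.5
Z = rng.standard_normal((N, K))
e = rng.standard_normal(N)
v = 0.5 * e + rng.standard_normal(N)    # correlation with e makes x endogenous
x = Z @ np.ones(K) + v
y = beta_true * x + e
beta_hat = two_step_gmm(y, x, Z)
```

The second step inverts a $K$-by-$K$ sample covariance matrix, which is the operation that becomes unreliable when $K$ approaches the sample size.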

While in classical GMM the choice of the weighting matrix $W$ does not affect the consistency of the parameter estimate, Bayesian GMM does not inherit this property due to the use of a prior. A sub-optimal choice of $W$ can decrease the curvature of the quasi-likelihood, making the inference undesirable from a Bayesian perspective: the more uncertain we are about the true values $\theta_0$, the less the prior is likely to contribute to the posterior. Therefore, for Bayesian GMM, there is an urgent need to choose $W$ efficiently before or along with the posterior simulation. We take the latter route: choosing $W$ on the fly.

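Despite its critical importance, the practical choice of the weighting matrix $W$ in the context of Bayesian GMM has received rather scant attention. A straightforward approach to choosing $W$, which is employed by, for instance, Chernozhukov and Hong (2003) and Yin (2009), can be described as follows. At the $s$-th MCMC iteration, given the current parameters $\theta^{(s-1)}$, a proposal $\theta'$ is simulated from a proposal density $g(\theta' \mid \theta^{(s-1)})$. The weighting matrix is set to the precision matrix of the moment conditions based on $\theta'$, that is, the parameter vector and weighting matrix are concurrently proposed and updated (i.e., accepted or rejected). We call this approach the concurrent GMM. The Metropolis-Hastings (MH) ratio is calculated as

$$\alpha = \min\left\{ 1, \frac{q(\mathcal{D} \mid \theta', W(\theta'))\, p(\theta')\, g(\theta^{(s-1)} \mid \theta')}{q(\mathcal{D} \mid \theta^{(s-1)}, W(\theta^{(s-1)}))\, p(\theta^{(s-1)})\, g(\theta' \mid \theta^{(s-1)})} \right\},$$

where $q(\mathcal{D} \mid \theta, W)$ denotes the quasi-likelihood under the weighting matrix $W$, $W(\theta)$ the precision matrix of the moment conditions evaluated at $\theta$, and $p(\theta)$ the prior density.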

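Yin et al. (2011) argue that this approach is numerically unstable, because the concurrent updating of the parameter vector $\theta$ and the weighting matrix $W$ complicates the surface of the target kernel, resulting in inefficient moves of the MH sampler. They propose an alternative approach, named stochastic GMM, where the elements of $\theta$ are updated one by one, keeping the remaining elements unchanged. This approach is designed to update $\theta$ and $W$ gradually, suppressing instantaneous changes in the shape of the target kernel. Let $\theta^{(s,l)}$ denote the state at the $s$-th MCMC iteration after the $l$-th parameter has been updated. Once a proposed value of the $l$-th element is simulated, a proposal $\theta'$ is constructed by replacing the $l$-th element of $\theta^{(s,l-1)}$ with the proposed value, and the MH ratio is computed as in the concurrent GMM, with $\theta^{(s,l-1)}$ as the current state and the weighting matrices evaluated at $\theta'$ and $\theta^{(s,l-1)}$.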

As previously mentioned, when the number of moment conditions is large, this approach is computationally heavy, because it requires many matrix inversions.

There are two challenges regarding the choice of the weighting matrix for Bayesian GMM, especially when the number of moment conditions $K$ is large, that is, when $K$ is comparable to or even larger than the sample size $N$. The first challenge is computational cost. The existing approaches require repeated inversion of the sample covariance of the moment conditions, thus imposing severe computational loads. Second, when $K$ is large, the covariance of the moment conditions is ill-estimated, and estimation errors are amplified through matrix inversions. As mentioned in Section 1, remedies in the classical GMM literature cannot be straightforwardly imported to Bayesian GMM. Using the Moore-Penrose generalized inverse is a simple solution, but it does not work well, as shown by the simulation study reported in Section 3 (see Satchachai and Schmidt (2008) on this point for frequentist GMM).

2.2 Proposed approach

The proposal of this paper comprises two elements: random update of the weighting matrix and regularized precision matrix estimation.

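First, we consider randomly updating the weighting matrix $W$. While the existing methods compute $W$ for each MCMC cycle, we treat $W$ as a tuning parameter, and update it on the fly as in adaptive MCMC algorithms (Haario et al., 2001; Andrieu and Thoms, 2008; Roberts and Rosenthal, 2009). Our adaptation procedure is motivated by Bhattacharya and Dunson (2011). At the $s$-th MCMC iteration, the adaptation of $W$ occurs with a probability that decreases exponentially in $s$, regardless of the previous proposal being accepted or rejected. For example, in the simulation study below, we choose the constants of this probability so that the probability of adaptation is around 0.1 at the beginning of the MCMC, and then decreases exponentially to zero. If an adaptation occurs, $W$ is updated using the mean of the sample obtained so far; at the $s$-th iteration, $W$ is set to the precision matrix of the moment conditions evaluated at the recursive mean of the draws. After the warmup iterations, $W$ is fixed to the end. This adaptation strategy satisfies the convergence condition in Theorem 5 of Roberts and Rosenthal (2007). In our implementation, at every iteration, a random variable $u$ is simulated from a standard uniform distribution, $u \sim \mathcal{U}(0, 1)$, and $W$ is updated if $u$ is smaller than the adaptation probability. (Here $\mathcal{U}(a, b)$ denotes a uniform distribution with support on the interval $(a, b)$.) At the $s$-th iteration, given a proposal $\theta'$, the MH ratio is calculated as

$$\alpha = \min\left\{ 1, \frac{q(\mathcal{D} \mid \theta', W)\, p(\theta')\, g(\theta^{(s-1)} \mid \theta')}{q(\mathcal{D} \mid \theta^{(s-1)}, W)\, p(\theta^{(s-1)})\, g(\theta' \mid \theta^{(s-1)})} \right\},$$

where $q(\mathcal{D} \mid \theta, W)$ denotes the quasi-likelihood under the weighting matrix $W$, $p(\theta)$ the prior density, and $g$ the proposal density; the same $W$ enters the numerator and the denominator.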

This treatment of the weighting matrix $W$ does not conflict with the theoretical results of Bayesian GMM in existing papers, since in the theoretical analyses the weighting matrix is pre-fixed (see, e.g., Kim (2002); Chernozhukov and Hong (2003); Belloni and Chernozhukov (2009); Li and Jiang (2016)). In a sense, there is a discrepancy between theory and practical computation in how a weighting matrix is treated, and our treatment of $W$ accords with the theoretical results better than the existing approaches do. A serious theoretical investigation of the effects of estimation/tuning of a weighting matrix on the posterior density is an important topic but is left for future studies.
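The random adaptation schedule can be sketched in Python as follows. The parametric form $\exp(a_0 + a_1 s)$ and the constants `a0` and `a1` are assumptions made for illustration: the text only states that the probability starts around 0.1 and decreases exponentially, and the names `adaptation_probability` and `adaptation_times` are hypothetical.

```python
import math
import random

def adaptation_probability(s, a0=math.log(0.1), a1=-1e-3):
    """Hypothetical schedule p(s) = exp(a0 + a1 * s); a0 and a1 are illustrative constants."""
    return math.exp(a0 + a1 * s)

def adaptation_times(n_warmup, seed=0):
    """Return the warmup iterations at which the weighting matrix would be recomputed."""
    rng = random.Random(seed)
    times = []
    for s in range(1, n_warmup + 1):
        u = rng.random()                      # u ~ U(0, 1)
        if u < adaptation_probability(s):     # adapt with probability p(s)
            times.append(s)                   # here W would be rebuilt from the recursive mean
    return times

times = adaptation_times(20000)
```

Under these illustrative constants the weighting matrix is recomputed only at a small fraction of the warmup iterations, which is the source of the computational saving relative to recomputing it in every cycle.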

Next, we propose to compute the weighting matrix $W$ using the nonparametric eigenvalue-regularized (NER) precision matrix estimator (Lam, 2016). (Abadir et al. (2014) consider a closely related covariance estimator. In frequentist GMM, Doran and Schmidt (2006) suggest using principal components of a weighting matrix. A strategy using the standard principal component analysis to estimate the weighting matrix does not work for Bayesian GMM and is therefore not considered in this paper; the simulation results are available upon request.)

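Given $\theta$, the $N$-by-$K$ matrix of moment conditions is partitioned as $M(\theta) = (M_1^{\top}, M_2^{\top})^{\top}$, where the sizes of $M_1$ and $M_2$ are $N_1$-by-$K$ and $(N - N_1)$-by-$K$, respectively. The covariance matrices of the sub-samples, $\tilde{\Sigma}_1$ and $\tilde{\Sigma}_2$, are computed in a standard manner from $M_1$ and $M_2$. The eigenvalue decomposition of $\tilde{\Sigma}_i$ is represented by $\tilde{\Sigma}_i = P_i D_i P_i^{\top}$, $i = 1, 2$, where $D_i$ is a diagonal matrix containing the eigenvalues of $\tilde{\Sigma}_i$, and $P_i$ is a matrix composed of the corresponding eigenvectors. Following Lam (2016), the covariance matrix is estimated as

$$\hat{\Sigma} = P_1\, \mathrm{diag}\!\left( P_1^{\top} \tilde{\Sigma}_2 P_1 \right) P_1^{\top},$$

where $\mathrm{diag}(A)$ denotes a diagonal matrix that has the same diagonal elements as a square matrix $A$. Therefore, the corresponding precision matrix is given by

$$\hat{\Sigma}^{-1} = P_1 \left[ \mathrm{diag}\!\left( P_1^{\top} \tilde{\Sigma}_2 P_1 \right) \right]^{-1} P_1^{\top}. \tag{2}$$

Lam (2016) suggests improving this estimator by averaging many (e.g., 50) estimates using different sets of partitioned data that are generated via random permutation. For robustness, we also randomly permute the rows of $M(\theta)$ once per computation of $W$.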
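A minimal Python sketch of the NER precision estimator follows, based on the split-sample construction of Lam (2016): eigenvectors come from the first block and the regularized eigenvalues from the second. It assumes centered sample covariances for the two blocks and a single random permutation per call; the function name `ner_precision`, its arguments, and the test data are hypothetical.

```python
import numpy as np

def ner_precision(M, split, rng=None):
    """NER precision sketch. M: N-by-K moment conditions; split: rows in the first block."""
    if rng is not None:
        M = M[rng.permutation(M.shape[0])]          # one random permutation of the rows
    M1, M2 = M[:split], M[split:]
    S1 = np.atleast_2d(np.cov(M1, rowvar=False))    # covariance of the first block (assumed centered)
    S2 = np.atleast_2d(np.cov(M2, rowvar=False))    # covariance of the second block
    _, P1 = np.linalg.eigh(S1)                      # eigenvectors of the first block
    d = np.einsum("ij,jk,ki->i", P1.T, S2, P1)      # diagonal of P1' S2 P1
    return (P1 / d) @ P1.T                          # P1 diag(d)^{-1} P1'

# More moment conditions than observations (K = 50 > N = 40).
rng = np.random.default_rng(2)
M = rng.standard_normal((40, 50))
W = ner_precision(M, split=24, rng=rng)
```

Because the eigenvectors are orthonormal and the regularized eigenvalues are positive for generic data, the result is symmetric positive definite even when the ordinary sample covariance matrix is singular.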

The choice of the split location is non-trivial. Theorem 5 of Lam (2016, p. 941) suggests that when , it is asymptotically efficient to choose , with some constants . There are two difficulties in this regard. First, this asymptotic property is not applicable when goes to a constant smaller than 1. Second, there is no practical guidance for setting . Lam (2016) proposes to choose to minimize the following criterion by means of a grid search:


where the superscripts for s denote indices for different permutations, is a number of permutations executed, and denotes the Frobenius norm. He considers the following grid as a set of candidates for :


In our framework, one might consider tuning the split location adaptively based on the above criterion as well. However, we do not adopt such a strategy, because the criterion is not informative enough to pin down the optimal split location, as shown in the subsequent section. The default choice in this paper is the median of Lam’s (2016) grid. As shown in the next section, simulated posteriors are not sensitive to the split location, as long as it is within a moderate range.

3 Simulation Study

We compare the proposed approach with alternatives. (The programs in this paper are written in Matlab 2016a (64bit) and executed on an Ubuntu Desktop 16.04 LTS (64bit), running on Intel Xeon E5-2607 v3 processors (2.6GHz).) We compare the nonparametric eigenvalue-regularized (NER) precision matrix estimator given by (2) with the standard estimators, which are based on the inverse of the sample covariance matrix of the moment conditions or on its Moore-Penrose generalized inverse. Five adaptation strategies are considered. The first is fixing the weighting matrix of the moment conditions based on the true parameter value (Oracle), the second is the concurrent Bayesian GMM (Concurrent) (Chernozhukov and Hong, 2003; Yin, 2009), and the third is the stochastic GMM (Stochastic) (Yin et al., 2011). The fourth is an MCMC version of the continuously updating GMM estimator (Hansen et al., 1996) (Continuous), that is, the weighting matrix is updated in each cycle based on the current recursive means of the sampled parameters. The fifth is the random update strategy we propose (Random).

We adopt an instrumental variable (IV) regression as a laboratory. The true data generating process is specified by the following two equations, for $n = 1, \dots, N$,

$$y_n = \theta x_n + u_n, \tag{5}$$

$$x_n = z_n^{\top} \delta + v_n, \tag{6}$$

where $y_n$ is a response variable, $x_n$ is an endogenous covariate, $z_n$ is a $K$-dimensional vector of instruments with coefficients $\delta$, $u_n$ and $v_n$ are normally distributed errors, and $\mathcal{N}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$. $\theta$ is a coefficient to be inferred. The instruments and their corresponding coefficients are generated as follows, for ,

The signal-to-noise ratios of equations (5) and (6) are fixed to one. The variances of the errors are chosen as

Unknown parameter is inferred through a set of moment conditions,

We assign a uniform prior on , . The prior is set as centered at and symmetric around the true value to minimize prior-induced bias.

The sample size is fixed at . We consider three scenarios with different numbers of instruments . For posterior sampling, we employ an adaptive MH sampler of Vihola (2012), which automatically tunes the covariance of a proposal density. The tuning parameters of the sampler are chosen as in Vihola (2012). For all experiments, we simulate a total of 70,000 draws: the initial 20,000 draws are used for warmup and the subsequent 50,000 for posterior estimates. We evaluate the results according to three measures. The first is the failure rate (Fail): when the estimated interquantile range of a target posterior density is larger than 1 or smaller than 0.01, we regard the MCMC run as failed. The second is the mean squared error of the posterior mean estimate (MSE), and the third the total computation time measured in seconds (Speed). We conduct 500 experiments.

The upper part of Table 1 reports the results for , the middle part for , and the lower part for . The left half of Table 1 shows the results for the standard precision matrix estimator and the right half those for the NER estimator. There are three points worth mentioning. First, Concurrent is the obvious loser: high probability of failure, large MSE, and high computational cost. The relative advantage of Stochastic to Concurrent in terms of numerical stability is in line with Yin et al. (2011). Second, in terms of MSE, all Stochastic, Continuous and Random work well and are largely comparable. Third, Random is much faster than Stochastic and Continuous. Figure 1 provides a typical example of recursive posterior mean and occurrence of random adaptation (NER estimator, ). From this figure, a posterior mean is fairly fast to converge, which indicates that most updates of the weighting matrix in Continuous are essentially redundant. To conclude, we find Random has a good balance between statistical and computational efficiency.

Next, we compare the results of the alternative precision matrix estimators. When , while the posterior simulations using the standard estimator are unsuccessful, the NER estimator always provides reasonable posterior estimates. Therefore, when , only the NER estimator is a viable option. In terms of MSE, the NER estimator outperforms the standard estimator overall. Even when the number of moment conditions is smaller than the sample size , the NER estimator is likely to obtain a more accurate posterior estimate than the standard precision estimator. It is also worth mentioning that, when , the posterior simulation using the NER estimator is almost as precise as the cases with . A comparison between the results for the Oracle cases with different precision estimators and reveals that the NER estimator is not better than the standard one, but the gain from the numerical stability of the NER estimator outweighs its efficiency loss. The increased computation cost incurred by the NER estimator can be mitigated by using the Random adaptation method. When the NER estimator is used, Stochastic, Continuous and Random yield virtually the same MSEs. Therefore, a combination of Random and the NER estimator is preferred.

We also investigate the sensitivity of the above results to the choice of the split location. We conduct Monte Carlo experiments using different split locations and the Random adaptation strategy. Following Lam (2016), we consider the grid of (4) (each value is rounded to the nearest integer). Table 2 shows that the NER estimator consistently outperforms the standard estimator, irrespective of the split location choice. In our testing environment, as the split location becomes smaller, the MSE tends to be smaller, regardless of the number of instruments. To investigate how much this result accords with the criterion based on the Frobenius norm (3), we simulate the values of (3) for different random permutations of the moment conditions using the true parameter. Panel (a) of Figure 2 reports the median and 90 percentile intervals of the simulated values for a fine grid . We only report the results for , as those for are qualitatively similar. As evident from the panel, an extremely high split location is not preferred, but the criterion is not informative enough to select a good split location from a considerably large range. The variability of the criterion is not attributable to the small sample size. We conduct the same simulation as in panel (a) but with the sample size increased to . Panel (b) of Figure 2 shows the results. When , the minimum is no longer the best choice. As in the case of , the values of the criterion based on the Frobenius norm are almost indifferent over a large range. As such, we recommend setting the split location to approximately half the sample size as the default.

4 Application

To demonstrate the proposed method, we apply it to a demand analysis for automobiles. Berry et al. (1995) consider an IV regression model of demand for automobiles specified by

$$\log(s_{it}) - \log(s_{0t}) = \alpha p_{it} + x_{it}^{\top} \beta + \varepsilon_{it},$$

where $s_{it}$ denotes the market share of product $i$ on market $t$, with subscript 0 denoting the outside option. The treatment $p_{it}$ is the product price, $x_{it}$ is a vector of covariates, $\varepsilon_{it}$ is an error term, and $\alpha$ and $\beta$ are the parameters to be estimated. The primary focus of this application is inference of $\alpha$.

We consider two specifications. (All data are extracted from the R package hdm (version 0.2.3).) The first specification coincides with Berry et al. (1995) as follows. The vector of covariates includes four covariates, namely, an air conditioning dummy, the horsepower-to-weight ratio, miles per dollar, and vehicle size. The set of instruments contains the four covariates and ten variables, namely, the sum of each covariate taken across models made by product $i$’s firm, the sum of each covariate taken across competitor firms’ products, the total number of models produced by product $i$’s firm, and the total number of models produced by the firm’s competitors. The second specification is an extension of the first, which is considered in Chernozhukov et al. (2015). The covariates and instruments are extended from the first case by incorporating a time trend, quadratic and cubic terms of all continuous covariates, and first-order interaction terms. The numbers of the instruments in the first and second specifications are 10 and 48, respectively. The sample size is 2,217, which is larger than the numbers of instruments. Nevertheless, because of the ill-posedness of the data set, the covariance of a classical estimator is nearly singular. We use a constant prior; thus, if the relationship between the instruments and the treatment is linear and the distributions of residuals are normal, a posterior estimate coincides with a two-stage least squares estimate. The posterior estimate is obtained using different combinations of precision matrix estimators and adaptation of proposal density. A total of 250,000 posterior draws are sampled, and the last 200,000 are used for posterior analysis. We set and .

Table 3 summarizes the results of the posterior estimate for the coefficient on price. Although the number of moment conditions is considerably smaller than the sample size, MCMC runs using existing adaptation strategies (Concurrent and Stochastic) and the standard precision estimator fail to converge. By contrast, MCMC runs using the NER estimator obtain sensible posterior samples, irrespective of adaptation strategy. For comparison, Table 3 also includes the estimates obtained using four alternative methods. The first two are conventional: ordinary least squares (OLS) and two-stage least squares (2SLS). The other two are state-of-the-art: IV with instrument selection based on a least absolute shrinkage and selection operator (Chernozhukov et al., 2015), and Bayesian IV with a factor shrinkage prior (Hahn et al., 2018). Chernozhukov et al. (2015) propose to select fewer relevant instruments, while Hahn et al. (2018) propose to compress observed information into a few latent factors. The two methods assume a linear relationship between the instruments and the endogenous variable and Gaussianity of the error terms, while our method does not impose such assumptions. These alternative methods obtain larger estimates than the conventional ones, and the estimates depend considerably on the set of (potential) instruments. By contrast, our method estimates the coefficient to be intermediate between OLS and 2SLS, nearly irrespective of the choice of instruments.

5 Discussion

We propose a new adaptive MCMC approach to infer Bayesian GMM with many moment conditions. Our proposal consists of two elements. The first is the use of a nonparametric eigenvalue-regularized precision matrix estimator (Lam, 2016) for estimating the weighting matrix (i.e., the precision matrix of the moment conditions based on the recursive mean of the unknown parameters). This prevents us from ill-estimating the weighting matrix. The second is the use of random adaptation. By setting adaptation probability as exponentially decreasing, it can significantly reduce the computational burden, while retaining statistical efficiency. We show the superiority of the proposed approach over existing approaches through simulation, and demonstrate the approach by applying it to a demand analysis for automobiles.

There are many promising research areas that stem from this study. First, a theoretical investigation of the effects of tuning/estimation of a weighting matrix on the posterior density is needed but absent in the literature. Second, while the proposed approach seems to be fairly robust to the split location, there is room for improvement by finding a better choice of it. Third, while this paper addresses only problems caused by many moment conditions, problems caused by many unknown parameters are also important. The proposed method should serve as a stepping stone for the further development of inferential methods for high-dimensional Bayesian GMM. Finally, it is worth conducting a thorough comparison between the proposed approach and existing frequentist and Bayesian approaches tailored to a specific class of models such as IV regressions and dynamic panel models.


  • Abadir et al. (2014) Abadir, K. M., Distaso, W., and Žikeš, F. (2014). Design-free estimation of variance matrices. Journal of Econometrics, 181(2):165–180.
  • Andrews (1999) Andrews, D. W. (1999). Consistent moment selection procedures for generalized method of moments estimation. Econometrica, 67(3):543–563.
  • Andrews and Lu (2001) Andrews, D. W. and Lu, B. (2001). Consistent model and moment selection procedures for gmm estimation with application to dynamic panel data models. Journal of Econometrics, 101(1):123–164.
  • Andrieu and Thoms (2008) Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive mcmc. Statistics and computing, 18(4):343–373.
  • Arellano and Bond (1991) Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte carlo evidence and an application to employment equations. Review of Economic Studies, 58(2):277–297.
  • Belloni and Chernozhukov (2009) Belloni, A. and Chernozhukov, V. (2009). On the computational complexity of mcmc-based estimators in large samples. Annals of Statistics, 37(4):2011–2055.
  • Berry et al. (1995) Berry, S., Levinsohn, J., and Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63(4):841–890.
  • Bhattacharya and Dunson (2011) Bhattacharya, A. and Dunson, D. B. (2011). Sparse bayesian infinite factor models. Biometrika, 98(2):291–306.
  • Blundell and Bond (1998) Blundell, R. and Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics, 87(1):115–143.
  • Canay (2010) Canay, I. A. (2010). Simultaneous selection and weighting of moments in gmm using a trapezoidal kernel. Journal of Econometrics, 156(2):284–303.
  • Caner et al. (2018) Caner, M., Han, X., and Lee, Y. (2018). Adaptive elastic net gmm estimation with many invalid moment conditions: Simultaneous model and moment selection. Journal of Business and Economic Statistics, 36(1):24–46.
  • Chang and DiTraglia (2018) Chang, M. and DiTraglia, F. J. (2018). A generalized focused information criterion for gmm. Journal of Applied Econometrics, 33(3):378–397.
  • Chen et al. (2016) Chen, X., Jacho-Chávez, D. T., and Linton, O. (2016). Averaging of an increasing number of moment condition estimators. Econometric Theory, 32(1):30–70.
  • Cheng and Liao (2015) Cheng, X. and Liao, Z. (2015). Select the valid and relevant moments: An information-based LASSO for GMM with many moments. Journal of Econometrics, 186(2):443–464.
  • Chernozhukov and Hansen (2005) Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73(1):245–261.
  • Chernozhukov and Hansen (2013) Chernozhukov, V. and Hansen, C. (2013). Quantile models with endogeneity. Annual Review of Economics, 5(1):57–81.
  • Chernozhukov et al. (2015) Chernozhukov, V., Hansen, C., and Spindler, M. (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review, 105(5):486–90.
  • Chernozhukov and Hong (2003) Chernozhukov, V. and Hong, H. (2003). An MCMC approach to classical estimation. Journal of Econometrics, 115(2):293–346.
  • DiTraglia (2016) DiTraglia, F. J. (2016). Using invalid instruments on purpose: Focused moment selection and averaging for GMM. Journal of Econometrics, 195(2):187–208.
  • Donald et al. (2009) Donald, S. G., Imbens, G. W., and Newey, W. K. (2009). Choosing instrumental variables in conditional moment restriction models. Journal of Econometrics, 152(1):28–36.
  • Doran and Schmidt (2006) Doran, H. E. and Schmidt, P. (2006). GMM estimators with improved finite sample properties using principal components of the weighting matrix, with an application to the dynamic panel data model. Journal of Econometrics, 133(1):387–409.
  • Fan and Liao (2014) Fan, J. and Liao, Y. (2014). Endogeneity in high dimensions. Annals of Statistics, 42(3):872.
  • Haario et al. (2001) Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242.
  • Hahn et al. (2018) Hahn, P. R., He, J., and Lopes, H. (2018). Bayesian factor model shrinkage for linear IV regression with many instruments. Journal of Business and Economic Statistics, 36(2):278–287.
  • Hall (2005) Hall, A. R. (2005). Generalized Method of Moments. Oxford University Press.
  • Hall et al. (2007) Hall, A. R., Inoue, A., Jana, K., and Shin, C. (2007). Information in generalized method of moments estimation and entropy-based moment selection. Journal of Econometrics, 138(2):488–512.
  • Hall and Peixe (2003) Hall, A. R. and Peixe, F. P. (2003). A consistent method for the selection of relevant instruments. Econometric Reviews, 22(3):269–287.
  • Hansen (1982) Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054.
  • Hansen et al. (1996) Hansen, L. P., Heaton, J., and Yaron, A. (1996). Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics, 14(3):262–280.
  • Kim (2002) Kim, J.-Y. (2002). Limited information likelihood and Bayesian analysis. Journal of Econometrics, 107(1-2):175–193.
  • Lam (2016) Lam, C. (2016). Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Annals of Statistics, 44(3):928–953.
  • Lewbel (2012) Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business and Economic Statistics, 30(1):67–80.
  • Li and Jiang (2016) Li, C. and Jiang, W. (2016). On oracle property and asymptotic validity of Bayesian generalized method of moments. Journal of Multivariate Analysis, 145:132–147.
  • Liao (2013) Liao, Z. (2013). Adaptive GMM shrinkage estimation with consistent moment selection. Econometric Theory, 29(5):857–904.
  • Okui (2009) Okui, R. (2009). The optimal choice of moments in dynamic panel data models. Journal of Econometrics, 151(1):1–16.
  • Roberts and Rosenthal (2007) Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2):458–475.
  • Roberts and Rosenthal (2009) Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349–367.
  • Satchachai and Schmidt (2008) Satchachai, P. and Schmidt, P. (2008). GMM with more moment conditions than observations. Economics Letters, 99(2):252–255.
  • Vieira et al. (2012) Vieira, F., MacDonald, R., and Damasceno, A. (2012). The role of institutions in cross-section income and panel data growth models: A deeper investigation on the weakness and proliferation of instruments. Journal of Comparative Economics, 40(1):127–140.
  • Vihola (2012) Vihola, M. (2012). Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing, 22(5):997–1008.
  • Yin (2009) Yin, G. (2009). Bayesian generalized method of moments. Bayesian Analysis, 4(2):191–207.
  • Yin et al. (2011) Yin, G., Ma, Y., Liang, F., and Yuan, Y. (2011). Stochastic generalized method of moments. Journal of Computational and Graphical Statistics, 20(3):714–727.