
# Detecting Identification Failure in Moment Condition Models

This paper develops an approach to detect identification failures in a large class of moment condition models. This is achieved by introducing a quasi-Jacobian matrix which is asymptotically singular under higher-order local identification as well as weak/set identification; in these settings, standard asymptotics are not valid. Under (semi)-strong identification, where standard asymptotics are valid, this matrix is asymptotically equivalent to the usual Jacobian matrix. After re-scaling, it is thus asymptotically non-singular. Together, these results imply that the eigenvalues of the quasi-Jacobian can detect potential local and global identification failures. Furthermore, the quasi-Jacobian is informative about the span of the identification failure. This information permits two-step identification robust subvector inference without any a priori knowledge of the underlying identification structure. Monte-Carlo simulations and empirical applications illustrate the results.

12/21/2020


## 1 Introduction

The Generalized Method of Moments (GMM) of Hansen & Singleton (1982) is a powerful estimation framework which does not require the model to be fully specified parametrically. Under regularity conditions, the estimates are consistent and asymptotically normal. In particular, the moment conditions should uniquely identify the finite dimensional parameters. This is very difficult to verify in practice and, as noted in the literature, is often assumed. Yet, when identification fails or nearly fails, the Central Limit Theorem provides a poor finite sample approximation for the distribution of the estimates. This has motivated a vast amount of research on tests which are robust to identification failure. As discussed in the literature review, much of this work has focused on tests for the full parameter vector. Potentially conservative confidence intervals for scalar parameters can then be built by projecting confidence sets for the full parameter vector (Dufour & Taamouti, 2005) or by using a Bonferroni approach (McCloskey, 2017).

The contribution of this paper is two-fold: First, it introduces a quasi-Jacobian matrix which is singular under both local (first-order) and global identification failure and is informative about the coefficients involved in the failure. This is the main contribution of the paper and provides an approach similar to Cragg & Donald (1993) and Stock & Yogo (2005) but in a non-linear setting. Second, the information from the first step allows for two-step identification robust subvector inference, akin to type I inference in Andrews & Cheng (2012) but without a priori knowledge of the identification structure.

To detect identification failures, this paper constructs a quasi-Jacobian matrix which corresponds to the best linear approximation of the sample moment function over a region of the parameter space where these moments are close to zero, as defined by a bandwidth. To find the best linear approximation, two loss functions are considered: the supremum norm measures the largest difference between the moments and their approximation, while the least-squares criterion focuses on the average difference. The sup-norm approximation yields strong and intuitive results, while the least-squares approximation can be computed easily by OLS using the moments as the dependent variable.

The asymptotic behaviour of the quasi-Jacobian matrix, computed under these two loss functions, is studied under four identification regimes: strong, semi-strong (also known as nearly-weak identification; Antoine & Renault, 2009), higher-order local, and weak (or set) identification. The GMM estimator is consistent and asymptotically normal in the first two regimes, consistent but not asymptotically normal in the third, and inconsistent in the fourth. Hence, the last two regimes correspond to settings where the finite sample distribution of the estimator is poorly approximated by standard asymptotics. Under (semi)-strong identification, the quasi-Jacobian matrix is asymptotically equivalent to the usual Jacobian matrix; after re-scaling, it is asymptotically non-singular. Under higher-order, weak or set identification, the quasi-Jacobian matrix is asymptotically singular, with eigenvalues vanishing at a rate determined by the bandwidth used in the approximation and the nature of the identification failure. Furthermore, the quasi-Jacobian is vanishing on the span of the identification failure, i.e. the directions in which identification fails.

Building on these results, this paper constructs a two-step procedure for testing linear hypotheses on the parameter of the form:

 H0: Rθ = c vs. H1: Rθ ≠ c, (1)
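As a concrete illustration, the pair (R, c) in (1) simply stacks linear restrictions on θ. The values below are hypothetical and only demonstrate the encoding:

```python
import numpy as np

# Hypothetical restriction: theta = (theta1, theta2, theta3)' and we test
# H0: theta2 = 0 and theta1 - theta3 = 1 jointly, i.e. R theta = c.
R = np.array([[0.0, 1.0,  0.0],
              [1.0, 0.0, -1.0]])
c = np.array([0.0, 1.0])

theta_null = np.array([1.5, 0.0, 0.5])   # satisfies both restrictions
theta_alt  = np.array([1.5, 0.3, 0.5])   # violates theta2 = 0

print(np.allclose(R @ theta_null, c))    # True
print(np.allclose(R @ theta_alt, c))     # False
```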

for a given restriction matrix R and vector c. Given evidence of identification failure, indicated by a small value of the smallest eigenvalue of the quasi-Jacobian matrix, the two steps used to conduct inference can be summarized as follows (under strong and semi-strong identification, standard inference using the Wald, QLR or LM test is valid; lack of evidence for weak and higher-order identification would indicate that these tests can be used):

1. The first step splits the parameter vector into two sets of parameters: one set of parameters is fixed, given evidence that these might be weakly, set or higher-order identified; the restricted combination Rθ is also fixed to match the null hypothesis (1). The other set of parameters, for which there is no evidence of identification failure, is assumed to follow (semi)-strong asymptotics.

2. The second step relies on projection inference (see e.g. Scheffé, 1953; Dufour, 1990; Dufour & Taamouti, 2005, 2007) for the parameters fixed in the first step, while concentrating out the remaining parameters. The test statistic needs to be robust to identification failure: one can use the S, K or CQLR statistics of Stock & Wright (2000), Kleibergen (2005) and Andrews & Mikusheva (2016b), for instance.

Step 2 has previously been discussed in the literature (see e.g. Kleibergen, 2005; Andrews & Mikusheva, 2016b, among others). The main challenge to implementing this step in practice has been determining which nuisance parameters are (semi)-strongly identified when the others are fixed. When such a decomposition is known ex-ante and identification strength depends on the value of the (semi)-strongly identified parameters (the term (semi)-strong refers to cases where identification can be either strong or semi-strong), Andrews & Cheng (2012) show how to conduct uniformly valid inference. In this paper, this ex-ante knowledge is not required since the quasi-Jacobian is vanishing on the span of the identification failure. In practice, a cutoff is required to distinguish matrices that are vanishing from those that are not. A rule-of-thumb, similar to that of Stock & Yogo (2005), is provided to construct this cutoff when detecting weak/set as well as higher-order identification. It relies on a Nagar approximation of the size distortion under semi-strong asymptotics.
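The second step can be sketched numerically. The snippet below implements projection inference with the S-statistic of Stock & Wright (2000) in a toy linear moment model; the design, grids and cutoff are hypothetical choices for illustration, not the paper's procedure:

```python
import numpy as np

# Toy data: true (theta1, theta2) = (1, 0.5); theta2 is treated as the
# nuisance parameter that is fixed on a grid in the first step.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

def moments(theta1, theta2):
    """k = 2 moment functions: residual and residual times x."""
    e = y - theta1 - theta2 * x
    return np.column_stack([e, e * x])

def s_stat(theta1, theta2):
    m = moments(theta1, theta2)
    g = m.mean(axis=0)
    omega = np.cov(m, rowvar=False)          # estimated moment variance
    return n * g @ np.linalg.solve(omega, g)

crit = 5.991                                 # chi-square(2), 95% critical value

# Projection confidence set for theta1: keep theta1 if the S-statistic
# is accepted for SOME value of the fixed nuisance parameter theta2.
grid1 = np.linspace(0.0, 2.0, 41)
grid2 = np.linspace(-1.0, 2.0, 61)
ci = [t1 for t1 in grid1
      if min(s_stat(t1, t2) for t2 in grid2) <= crit]
if ci:
    print(min(ci), max(ci))                  # projection interval for theta1
```

The minimum over the nuisance grid is what makes the interval a projection: it is conservative in general, in line with the discussion above.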

The two-step approach described above is shown to yield tests that are asymptotically valid under certain conditions. In particular, it is assumed that the search for the restrictions in the first step is sequential, nested and pre-determined. In practice, the researcher fixes an increasing number of coefficients until identification is restored, according to the quasi-Jacobian. This more disciplined approach avoids the difficulties of studying data-driven search procedures, which would complicate the analysis. Sequential procedures fit naturally in settings where some parameters are more credibly identified than others. The search procedure is shown to restore point identification with probability going to one. If the remaining parameters are (semi)-strongly identified (weak or higher-order identification of these parameters can be detected using the above, so this is not particularly restrictive assuming these are the only other possible identification regimes), then the second step yields valid inference procedures as discussed in the previous literature.

Also, under strong and semi-strong identification, the linear approximation can be used to construct estimates that are asymptotically equivalent to the GMM estimator. This approach effectively replaces the non-smooth/discontinuous moments with smoothed linear moments, making global optimization simple. This may be of practical interest. Finally, the quasi-Jacobian can be used in the usual sandwich formula when the moments are non-smooth, as in quantile IV and SMM estimation of discrete choice models.

Monte-Carlo simulations illustrate the large sample behaviour of the quasi-Jacobian matrix and the two-step inference procedure in several designs. These include a non-linear least-squares model where the nuisance parameter is not identified. This is similar to simulations in Andrews & Cheng (2012) and Cheng (2015), but without assuming the identification structure is known.

The approach is then applied to two empirical settings. The first application considers the Euler equation in U.S. data. This is a well known example where identification is suspected to fail. The methods developed in this paper suggest that the discount rate is (semi)-strongly identified while the risk-aversion parameter is poorly identified, as suggested in Stock & Wright (2000). Some investigation into the source of the identification failure reveals that the moments are highly redundant and amount to a single moment condition; this implies that one should use one of the identification- and singularity-robust tests developed in Andrews & Guggenberger (2019). The second application considers quantile IV estimation of the demand for fish (Chernozhukov et al., 2007). The results suggest weak identification of the price elasticity of demand.

### Structure of the Paper

After a review of the literature and an overview of the notation used in the paper, Section 2 introduces the setting, the linear approximations, precise definitions of the identification regimes considered and the main assumptions used in the paper. Section 3 derives the asymptotic behaviour of the quasi-Jacobian matrix. Section 4 describes the two-step inference procedures in more detail, including the algorithms used to determine which parameters to fix, the rules-of-thumb for choosing the cutoffs, and the asymptotic results for the inference procedures. Section 5 provides a Monte-Carlo example to illustrate some of the results from the previous sections. An empirical example is provided in Section 6. Section 7 concludes. Appendices A and B provide the proofs for the main results of Sections 3 and 4, respectively. The Supplement consists of Appendices C, D, E, F, G and H, which provide additional and preliminary results for the main text and their proofs, as well as additional Monte-Carlo and empirical results.

### Related Literature

The literature on the identification of economic models is quite vast. An extensive review is given in Lewbel (2018). Within this literature, this paper mainly relates to three topics: local and global identification of finite dimensional parameters in the population, detecting identification failure in finite samples and identification robust inference.

Koopmans & Reiersol (1950) provide one of the earliest general formulations of the identification problem at the population level. To paraphrase the authors, the main problem is to determine whether the distribution of the data, assumed to be generated from a given class of models, is consistent with one, and only one, set of structural parameters. In the likelihood setting, Fisher (1967) and Rothenberg (1971) give sufficient conditions for local and global identification of the structural parameters as the unique solution to a non-linear system of equations. These include the well-known rank condition and strict convexity. For GMM, Komunjer (2012) introduced weaker conditions for global identification. In the present paper, singularity of the quasi-Jacobian will appear when either global or local identification fails for a large class of moment conditions.

In linear models, global identification amounts to a rank condition on the slope of the moments. This insight was used to construct several pre-testing procedures in linear IV models for identification failure (Cragg & Donald, 1993; Stock & Yogo, 2005). Pre-tests based on the null of strong identification were given by Hahn & Hausman (2002) in linear IV and Inoue & Rossi (2011); Bravo et al. (2012) in non-linear models. Note that pre-testing for strong identification in the first step can be problematic for two-step inference procedures when power is low in the first step. For non-linear models, Wright (2003) tests the local identification condition with a rank test at every point of a robust confidence set. Antoine & Renault (2017) rely on a distorted J-statistic to detect local identification failure. Arellano et al. (2012) develop a test for underidentification when a single coefficient is unidentified. In this paper, identification strength is summarized by the smallest eigenvalue of the quasi-Jacobian matrix under weak and set identification. This is both convenient and easy to communicate. Residual curvature also matters when pre-testing for higher-order identification as discussed in Section 4.2.2.

Given the impact of (near) identification failure on standard inferences (see e.g. Choi & Phillips, 1992; Dufour, 1997; Staiger & Stock, 1997 in the case of IV regression), a large body of literature has developed identification robust tests. Most consider inference on the full parameter vector (see e.g. Anderson & Rubin, 1949; Stock & Wright, 2000; Moreira, 2003; Kleibergen, 2005; Andrews & Mikusheva, 2016b; Chen et al., 2018). Few consider the topological features of the identified set to conduct inferences, with the notable exception of Andrews & Mikusheva (2016a). For subvector inferences, a common approach is to construct a confidence set for the full vector and project it on the dimension of interest (Dufour & Taamouti, 2005, 2007), or to use a Bonferroni correction (McCloskey, 2017). These methods might be conservative; however, as discussed in Section 4, Remark 2, when the nuisance parameters are completely unidentified, projection inference may actually have exact asymptotic coverage. To increase power, one can concentrate out nuisance parameters that are known to be strongly identified. A series of papers starting with Andrews & Cheng (2012), including Andrews & Cheng (2013, 2014), Cheng (2015), Han & McCloskey (2019) and Cox (2017), considers uniformly valid subvector inferences in a class of models where the identification structure is known and identification strength is driven by some (semi)-strongly identified coefficients. As discussed in Andrews & Mikusheva (2016b), computing the least favorable distribution required for their uniform (type II) inference may be numerically challenging or unfeasible in some settings. Under higher-order local identification, the estimates are consistent but have a non-standard limiting distribution (Rotnitzky et al., 2000; Dovonon & Hall, 2018). This issue is known (for instance, van der Vaart, 1998, when discussing higher-order Taylor expansions in Chapter 3.3, argues that "it is necessary to determine carefully the rate of all terms in the expansion […] before neglecting the 'remainder'") but much less studied than weak and set identification. Dovonon et al. (2019) study the properties of identification robust tests in second-order identified models. Lee & Liao (2018) show how to conduct standard inference in second-order identified models with known identification structure.

### Notation

For any matrix (or vector) A, ∥A∥ denotes the Frobenius (Euclidean) norm of A. For any rectangular matrix B, the singular values of B refer to the square roots of the eigenvalues of B′B; σmax(B) and σmin(B) refer to the largest and smallest singular values of B, respectively. With some abuse of notation, these singular values will be referred to as eigenvalues. For a weighting matrix W, the norm ∥A∥W is computed as ∥W1/2A∥. For any two positive sequences (an), (bn): an = O(bn) if an/bn is bounded; an = o(bn) if an/bn → 0; an ≍ bn if both an = O(bn) and bn = O(an). For (Xn) a sequence of random variables and (an) a positive sequence, Xn = Op(an) and Xn = op(an) denote boundedness and convergence to zero in probability of Xn/an, respectively.

## 2 Setting and Assumptions

Following Hansen & Singleton (1982), the econometrician wants to estimate the solution θ0 of the system of unconditional moment equations:

 gn(θ0) def= E(¯gn(θ0)) = 0, (2)

where θ0 ∈ Θ, a compact subset of a finite-dimensional Euclidean space, and ¯gn denotes the sample moments computed from a sample of iid or stationary random variables. Throughout, it is assumed that at least one such θ0 exists (this can be achieved in misspecified models by re-centering the moments). The population moments gn are allowed to depend on the sample size n, as in Stock & Wright (2000), and gn is assumed to be continuously differentiable on Θ.

Given the sample moments ¯gn and a sequence of positive definite weighting matrices Wn, the GMM estimator solves the minimization problem:

 ^θn = argminθ∈Θ ∥¯gn(θ)∥2Wn,

where ∥g∥2Wn = g′Wng.

### 2.1 Linear-Approximations and the quasi-Jacobian Matrix

The quasi-Jacobian matrix is defined below as the slope of a local linear approximation under a given norm.

###### Definition 1.

(Sup-Norm and Least-Squares Approximations) Let K be a kernel function and κn a bandwidth. The sup-norm approximation solves:

 (An,∞, Bn,∞) = argminA,B supθ∈Θ ∥A + Bθ − ¯gn(θ)∥ × ^Kn(θ), (3)

where ^Kn(θ) = K(∥¯gn(θ)∥Wn/κn). The least-squares approximation solves:

 (An,LS, Bn,LS) = argminA,B ∫Θ ∥A + Bθ − ¯gn(θ)∥2 × ^Kn(θ)dθ, (4)

with the same weights ^Kn(θ). The quasi-Jacobian refers to the slope matrix Bn computed using either the least-squares (LS) or sup-norm (∞) approximation.

The sup-norm approximation solves a non-smooth optimization problem and is thus more computationally demanding. However, the theory for the sup-norm approximation is very intuitive and will be quite useful to understand the relation between the quasi-Jacobian and identification failure. In practice, it will be more convenient to compute the least-squares approximation:

 (A′n,LS, B′n,LS)′ = (∫Θ X(θ)X(θ)′ ^Kn(θ)dθ)−1 ∫Θ X(θ)¯gn(θ)′ ^Kn(θ)dθ, X(θ) = (1, θ′)′.

The two integrals can be approximated using Monte-Carlo methods such as importance sampling, Markov-Chain Monte-Carlo and Sequential Monte-Carlo methods (Robert & Casella, 2004). In this paper, quasi-Monte-Carlo integration with the low-discrepancy Sobol sequence was used and provided satisfactory results. See e.g. Owen (2003); Lemieux (2009) for an overview of quasi-Monte-Carlo integration.

Implementation is straightforward: the Sobol sequence provides a grid for θ over which ¯gn and ^Kn are evaluated. One then simply regresses the evaluated moments on the grid points and an intercept using weighted least-squares with ^Kn as weights. If K has compact support, one can omit all grid points with ^Kn(θ) = 0 from the regression. The quasi-Jacobian collects the slope coefficients in this weighted linear regression.
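This implementation can be sketched in a few lines. The moment function, kernel and bandwidth below are toy choices for illustration (none come from the paper); in the exactly linear case the weighted regression recovers the slope of the moments:

```python
import numpy as np
from scipy.stats import qmc

# Toy linear moments: g_bar(theta) = G (theta - theta_star), so the
# quasi-Jacobian should recover the slope G exactly.
theta_star = np.array([0.5, -0.25])
G = np.array([[1.0, 0.3],
              [0.0, 0.8]])

def g_bar(theta):
    return G @ (theta - theta_star)

def quasi_jacobian(g_bar, lo, hi, kappa=1.0, n_grid=256, seed=0):
    d = len(lo)
    grid = lo + (hi - lo) * qmc.Sobol(d=d, scramble=True, seed=seed).random(n_grid)
    G_mat = np.array([g_bar(t) for t in grid])      # n_grid x k moments
    u = np.linalg.norm(G_mat, axis=1) / kappa
    w = np.maximum(0.0, 1.0 - u)                    # compact (triangular) kernel
    X = np.hstack([np.ones((n_grid, 1)), grid])     # regressors (1, theta')
    keep = w > 0                                    # drop zero-weight grid points
    Xw = X[keep] * w[keep, None]
    coef = np.linalg.solve(Xw.T @ X[keep], Xw.T @ G_mat[keep])
    return coef[0], coef[1:].T                      # intercept A, slope B

A, B = quasi_jacobian(g_bar, lo=np.array([-1.0, -1.0]), hi=np.array([2.0, 1.0]))
print(np.round(B, 6))   # recovers G in the linear case
```

For non-linear moments, the recovered slope depends on the region selected by the kernel and bandwidth, which is exactly the smoothing discussed in the heuristic below.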

The theory for the least-squares approximation, while similar to the sup-norm case, involves additional topological arguments and the convergence of a quasi-posterior under higher-order, weak and set identification, making the intuition somewhat more difficult to convey.

For linear models such as OLS or linear IV, the approximation is exact: An recovers the intercept and Bn the slope of the moments. The quasi-Jacobian is close to singular when the regressors are nearly multicollinear in OLS or when the instruments are not sufficiently relevant in IV. The rank of Bn is thus informative about identification failure in these models. This extends to non-linear models.

The following gives a heuristic description of the behaviour of the quasi-Jacobian when identification holds or fails. Formal results will be provided in the next section. First note that the kernel and bandwidth play a very important role here, as they select all potential solutions of the moment condition (2). When the moment equations have a unique solution θ0, then ^Kn(θ) ≠ 0 holds only in small neighborhoods of θ0, with high probability. If, in addition, ¯gn is smooth, then the discrepancy becomes:

 ¯gn(θ)−A−Bθ≃[¯gn(θ0)+∂θ¯gn(θ0)(θ−θ0)]−[A+Bθ0+B(θ−θ0)]

so that Bn is a smoothed approximation of the usual Jacobian matrix.

In locally point identified models, the Jacobian and quasi-Jacobian will have full rank. Local, or first-order, identification failure appears when the Jacobian is singular. The expansion above then implies that the eigenvalues of the quasi-Jacobian are informative about local identification failure.

When the model is set identified there are, by definition, at least two solutions θ0 ≠ θ1 to the moment equations (2). The linear approximation implies that:

 An,LS/∞+Bn,LS/∞θ≃¯gn(θ)=O(κn), for θ∈{θ0,θ1}.

For small κn this implies:

 Bn,LS/∞[θ0−θ1]=Op(κn)≃0.

Given that θ0 ≠ θ1, this implies that the quasi-Jacobian must be close to singular in large samples. Both the bandwidth κn and the distance between the solutions will determine how close to singular it will be. Overall, both local and global identification failures imply near singularity of the quasi-Jacobian in large samples.
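To make the last point concrete, the span of the failure can be read off the singular value decomposition of the quasi-Jacobian; the matrix below is made up for illustration:

```python
import numpy as np

# Toy quasi-Jacobian: the second column is (nearly) vanishing, mimicking
# an identification failure in the theta_2 direction.
B = np.array([[1.0, 0.0],
              [0.4, 1e-8]])

U, s, Vt = np.linalg.svd(B)
sigma_min = s[-1]          # smallest singular value (eigenvalue, in the
v_min = Vt[-1]             # paper's terminology) and its right singular vector

print(sigma_min)           # close to zero: evidence of failure
print(v_min)               # ~ (0, 1): the failure is in the theta_2 direction
```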

### 2.2 Identification Regimes

The following describes the four identification regimes considered in this paper. Their implications for the GMM estimator are summarized in Table 1. Examples 1 and 2 illustrate the definitions.

###### Example 1 (Non-Linear Least-Squares).

This example is adapted from Cheng (2015). Consider the following non-linear regression model:

 yt=θ1x1,t+θ1θ2x2,t+et

with et iid with mean 0 and variance σ2, and regressors normalized such that E(x1,tx2,t) = 0 and E(x21,t) = E(x22,t) = 1. The estimating moments are the first-order conditions of the least-squares problem.
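A hedged numerical sketch of this example: assuming the moments are least-squares moment conditions with normalized regressors (an assumption made here for illustration, not taken from the paper), the population Jacobian is singular when θ1 = 0 and nearly singular when θ1 is small:

```python
import numpy as np

# Assumed population Jacobian of the moment conditions under the
# normalization E[x1^2] = E[x2^2] = 1, E[x1 x2] = 0 (own derivation,
# for illustration only): identification of theta2 is driven by theta1.
def jacobian(th1, th2):
    return -np.array([[1 + th2**2, th1 * th2],
                      [th1 * th2,  th1**2]])

for th1 in (1.0, 0.1, 0.0):
    s = np.linalg.svd(jacobian(th1, th2=0.5), compute_uv=False)
    print(th1, s.min())   # smallest singular value shrinks with theta1
```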

###### Example 2 (Possibly Noninvertible MA(1) Model).

This example is adapted from Gospodinov & Ng (2015). Consider the MA(1) model:

 yt=σ[et−ϑet−1]

where et is iid with mean 0, unit variance and known skewness κ3. Using the moments E(y2t) and E(ytyt−1) only identifies (ϑ, σ2) up to the observationally equivalent pair (1/ϑ, σ2ϑ2) when |ϑ| ≠ 1. Assuming invertibility (|ϑ| < 1) restores point identification. Gospodinov & Ng (2015) show that when κ3 ≠ 0, the additional information provided by E(y2tyt−1) allows to identify (ϑ, σ2) in the population without imposing invertibility.
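The observational equivalence in this example can be checked numerically; the autocovariance and third-moment formulas below follow from standard MA(1) algebra (re-derived here, not copied from the paper):

```python
def autocovs(theta, sigma2):
    # y_t = sigma * (e_t - theta * e_{t-1}) with Var(e_t) = 1:
    gamma0 = sigma2 * (1 + theta**2)     # Var(y_t)
    gamma1 = -sigma2 * theta             # Cov(y_t, y_{t-1})
    return gamma0, gamma1

def third_moment(theta, sigma2, kappa3):
    # E[y_t^2 y_{t-1}] = kappa3 * sigma^3 * theta^2 (own derivation)
    return kappa3 * sigma2**1.5 * theta**2

theta, sigma2, kappa3 = 0.5, 1.3, 0.8
print(autocovs(theta, sigma2))
print(autocovs(1 / theta, sigma2 * theta**2))              # identical pair
print(third_moment(theta, sigma2, kappa3),
      third_moment(1 / theta, sigma2 * theta**2, kappa3))  # these differ
```

The two parameter pairs match the first two autocovariances exactly, while a nonzero third-order cumulant separates them, which is the identification channel exploited by Gospodinov & Ng (2015).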

###### Definition 2.

(Point Identification) The model is point identified if there exists θ0 ∈ Θ with gn(θ0) = 0 such that, for all ε > 0, there exists η(ε) > 0 with:

 inf∥θ−θ0∥≥ε∥gn(θ)∥W≥η(ε),∀n≥1, (5)

where W is a non-stochastic positive semi-definite weighting matrix.

Definition 2 corresponds to the case where θ0 is unique and thus globally identified. Additional regularity conditions combined with this assumption imply that ^θn is consistent for θ0 (see e.g. Newey & McFadden, 1994, Theorem 2.6).

###### Definition 3.

(Strong Identification) The model is strongly identified if it is point identified and there exist C̲ > 0 and ε > 0 such that ∥θ − θ0∥ ≤ ε implies:

 ∥gn(θ)∥W≥C––∥θ−θ0∥,∀n≥1. (6)

Definition 3 is satisfied when the Jacobian has full rank with smallest eigenvalue bounded below, and the moments are approximately linear around θ0. With additional regularity conditions, it implies that ^θn is asymptotically Gaussian (see e.g. Newey & McFadden, 1994, Theorem 3.2). Standard inferences using the Wald, QLR and LM tests are asymptotically valid.

###### Example 1 (Continued).

The Jacobian of the moments evaluated at θ0,n implies the following:

 ∂θgn(θ0,n) = (1 0; 0 θ1,0), with gn(θ) = ∂θgn(θ0,n)(θ − θ0).

Note that θ1,0 is the smallest eigenvalue of this matrix; θ1,0 bounded away from zero implies that the eigenvalues of ∂θgn(θ0,n) are bounded below as well.

###### Example 2 (Continued).

The estimating moments match the second moments and the third-order cumulant of yt:

 gn(ϑ, σ2) = (E(y2t) − σ2(1 + ϑ2), E(ytyt−1) + σ2ϑ, E(y2tyt−1) − κ3σ3ϑ2)′.

Suppose that |ϑ0| is bounded away from 1 and κ3 is bounded away from 0. Point identification holds since the first two moments identify (ϑ, σ2) up to the pair (1/ϑ0, σ20ϑ20), which the third moment rules out. It can also be shown that the eigenvalues of the Jacobian are bounded below when |ϑ0| is bounded away from one.

###### Definition 4.

(Semi-Strong Identification) The model is semi-strongly identified if it is point identified and

1. there exist ¯C ≥ C̲ > 0 and ε > 0 such that ∥θ − θ0∥ ≤ ε implies:

 ¯¯¯¯C∥∂θgn(θ0)(θ−θ0)∥≥∥gn(θ)∥W≥C––∥∂θgn(θ0)(θ−θ0)∥,∀n≥1 (7)

2. √n·λmin(∂θgn(θ0)′∂θgn(θ0)) → ∞ as n → ∞,

3. for any θ1, θ2 with ∥θ1 − θ0∥ ≤ ε and ∥θ2 − θ0∥ ≤ ε:

 ∥gn(θ1)−gn(θ2)−∂θgn(θ2)(θ1−θ2)∥=O(∥∂θgn(θ2)(θ1−θ2)∥2),

4. for any θ1 with ∥θ1 − θ0∥ ≤ ε:

 [∂θgn(θ1)−∂θgn(θ0)][∂θgn(θ0)′∂θgn(θ0)]−1[∂θgn(θ1)−∂θgn(θ0)]′=o(1).

Definition 4 ii. implies that the Jacobian can be vanishing in one or several directions, but not too fast. Conditions iii.-iv. also imply that the second-order term is vanishing relative to the first-order term. As a result, the moments remain approximately linear around θ0, as in Definition 3. Together with additional regularity conditions this implies that, after re-scaling, ^θn will be asymptotically Gaussian. However, the convergence is slower than the usual √n-rate (Antoine & Renault, 2009; Andrews & Cheng, 2012). Standard inferences using the Wald, QLR and LM tests are asymptotically valid.

###### Example 1 (Continued).

Consider a drifting sequence θ1,0,n → 0 with √n·θ1,0,n → ∞: the model is then semi-strongly identified.

###### Definition 5.

(Higher-Order Local Identification) The model is locally identified at a higher order r ≥ 2 if it is point identified and there exist constants ¯Cj ≥ C̲j > 0, for j = 1, …, r, together with projection matrices Pj satisfying ∑rj=1 Pj = I, such that ∥θ − θ0∥ ≤ ε implies:

 r∑j=1¯¯¯¯Cj∥Pj(θ−θ0)∥j≥∥gn(θ)∥W≥r∑j=1C––j∥Pj(θ−θ0)∥j,∀n≥1. (8)

Definition 5 corresponds to cases where the moments are not approximately linear around θ0. As a result, higher-order terms affect the limiting distribution of the GMM estimator ^θn. Together with additional regularity conditions, this assumption implies that some components of ^θn converge at a slower-than-√n rate to a non-Gaussian limiting distribution. Wald, QLR and LM statistics have non-standard limiting distributions (see e.g. Rotnitzky et al., 2000; Dovonon & Hall, 2018); standard inferences are not asymptotically valid.

###### Example 2 (Continued).

Suppose that ϑ0 = 1 and κ3 = 0. Condition iii.a. holds since there is a unique solution and the moments are continuous. Omitting the third moment condition, the Jacobian becomes:

 ∂θgn(θ0) = (−2σ20ϑ0 −(1+ϑ20); σ20 ϑ0) = (−2σ20 −2; σ20 1)

which is singular and implies first-order identification failure of the model. Taking the derivative again:

 ∂2θ,ϑgn(θ0) = (−2σ20ϑ0 −2ϑ0; 0 1) = (−2σ20 −2; 0 1), ∂2θ,σ2gn(θ0) = (−2ϑ0 0; 1 0) = (−2 0; 1 0).

Note that v = (1, −σ20)′ spans the null space of the Jacobian, and both second-order derivatives are non-singular on the span of v, which implies second-order identification (see Dovonon & Hall, 2018). Indeed, consider a parametrization of θ − θ0 in terms of (h1, h2); then:

 gn(θ)= ∂θgn(θ0)(11)h1+∂2θ,ϑgn(θ0)2((h1+h2)2h21−h22)+∂2θ,σ2gn(θ0)2(h21−h22(h1−h2)2)+o(∥(h1,h2)∥2).

The conditions are then satisfied by taking ∥(h1, h2)∥ small enough. More generally, Gospodinov & Ng (2015) show that first-order identification generally fails when ϑ0 = 1 and κ3 = 0.

###### Definition 6.

(Weak and Set Identification) The model is said to be weakly or set identified if there exist at least two elements in the weakly identified set:

 Θ0={θ∈Θ,limn→∞√n∥gn(θ)∥W<+∞}. (9)

Definition 6 occurs when global identification fails or nearly fails. Under strong, semi-strong and higher-order identification, a robust and conservative confidence set would concentrate around a single point θ0. Definition 6 collects all models where this phenomenon does not occur. The GMM estimator is typically not consistent (Staiger & Stock, 1997; Stock & Wright, 2000; Andrews & Cheng, 2012) and has a non-Gaussian limiting distribution. Standard inferences using the Wald, QLR and LM tests are not asymptotically valid.

Definition 6 nests the definition of Stock & Wright (2000) who consider a drifting sequence of moments:

 gn(θ) = g1(γ) + g2(β,γ)/√n,

where g1, g2 are two functions satisfying Definition 3, for instance. They show that the estimates of both β and γ are inconsistent, even though γ would be consistently estimable for a fixed β. Definition 6 also nests the setting of Andrews & Cheng (2012), where the identification strength of a subvector is determined by a drifting sequence of a (semi)-strongly identified scalar coefficient.

###### Example 1 (Continued).

Consider the drifting sequence θ1,0,n = c/√n. For any θ2, √n∥gn(θ1,0,n, θ2)∥W remains bounded, so that Θ0 is not a singleton.

###### Example 2 (Continued).

Consider a drifting sequence of moments with skewness κ3,n = c/√n. The moment conditions become:

 gn(1/ϑ0, σ20ϑ20) = (0, 0, cσ30/√n·[sign(ϑ0)ϑ20 − ϑ0])′.

This implies that √n∥gn(1/ϑ0, σ20ϑ20)∥W is bounded. As a result, Θ0 is not a singleton when |ϑ0| ≠ 1 and σ0 is bounded.

### 2.3 Main Assumptions

The following provides the main assumptions on the moments, weighting matrix, kernel and bandwidth needed to derive the results in Section 3.

###### Assumption 1.

(Bandwidth, Kernel)

1. (Bandwidth) κn → 0 and √n·κn → ∞ as n → ∞,

2. (Compact Kernel) K is Lipschitz-continuous on [0, ∞) with K(u) > 0 for 0 ≤ u < 1 and K(u) = 0 for u ≥ 1,

3. (Exponential Kernel) K is exponential in u, e.g. proportional to the Gaussian density; the implied effective bandwidth is assumed to satisfy the same rate conditions as in i. as n → ∞.

In line with the heuristic discussion above, the bandwidth is assumed to be small. Condition i. ensures that it converges to 0 more slowly than the √n-rate; for larger bandwidths, the approximation would also capture second-order non-linearities under (semi)-strong identification. A Law of the Iterated Logarithm can be invoked to set:

 κn=√2log(log[n])/n, (10)

so that the rate conditions hold (see also Andrews & Soares, 2010; Andrews & Cheng, 2012 for choices of such sequences). In smaller samples, one can also set κn using a quantile of a chi-square distribution. Two types of kernels are considered. Compact kernels (condition ii.) are used in both sup-norm and least-squares approximations. The Lipschitz-continuity condition simplifies some of the proofs in Section 3, but numerical experiments showed almost no difference with the uniform kernel. Exponential kernels are considered only for the least-squares approximation. A simple example is the Gaussian density, which provides a quasi-Bayesian interpretation to the least-squares approximation, as discussed in Section 3.2. Again, there were only negligible numerical differences between the quasi-Jacobian computed with the compact and the exponential kernel in the examples considered in this paper.
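The rule-of-thumb bandwidth in (10) is easy to compute; the snippet checks that κn shrinks while √n·κn grows, as the rate condition requires:

```python
import math

# Bandwidth from (10): kappa_n = sqrt(2 * log(log(n)) / n).
def kappa(n):
    return math.sqrt(2.0 * math.log(math.log(n)) / n)

for n in (100, 1_000, 100_000):
    print(n, round(kappa(n), 5), round(math.sqrt(n) * kappa(n), 3))
# kappa_n decreases with n while sqrt(n) * kappa_n = sqrt(2 log log n)
# increases slowly: the bandwidth shrinks more slowly than the root-n rate.
```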

###### Assumption 2.

(Sample Moments, Weighting Matrix)

1. (Uniform CLT, Tightness) the empirical process √n[¯gn − gn] converges weakly to a Gaussian process as n → ∞,

2. (Discoverability of Θ0) the weakly identified set Θ0 satisfies:

 supn≥1supθ∈Θ0√n∥gn(θ)∥W<+∞,
3. (Stochastic Equicontinuity) uniformly in θ1, θ2 ∈ Θ,

 √n[¯gn(θ1)−¯gn(θ2)−(gn(θ1)−gn(θ2))]=op(1),
4. (Smoothness) gn is continuously differentiable on Θ; uniformly in θ1, θ2 ∈ Θ,

 ∥gn(θ1)−gn(θ2)−∂θgn(θ2)(θ1−θ2)∥=O(∥θ1−θ2∥2),
5. (Weighting Matrix) Wn is positive definite and Lipschitz continuous, with eigenvalues bounded above and away from zero.

The high-level conditions in Assumption 2 are quite common in GMM estimation. Condition i. allows for non-smooth and possibly discontinuous sample moments, as in quantile-IV (Chernozhukov & Hansen, 2005) or SMM estimation (Pakes & Pollard, 1989). For primitive conditions, see van der Vaart & Wellner (1996) for iid data and Dedecker & Louhichi (2002) for strictly stationary time-series data. Condition ii. ensures that the weakly identified set can be conservatively estimated using the kernel weights ^Kn, so that all directions of the identification failure can be detected. For the exponential kernel, the multiplicative constant in the weights cancels out in the least-squares approximation; a similar argument appears in the proof of the Bernstein-von Mises Theorem in Bayesian statistics (van der Vaart, 1998; Chernozhukov & Hong, 2003). Condition iii. is the usual stochastic equicontinuity condition (Andrews, 1994). Condition iv. is only required under strong identification: Definition 4 provides stronger conditions to control the higher-order terms, and it is not required under higher-order and weak identification. Condition v. is automatically satisfied for the identity matrix Wn = I, but also holds for the optimal weighting matrix, under uniform consistency for iid data or for the HAC estimator for time-series data, with additional conditions on its eigenvalues as well as Lipschitz continuity. Given the generality of the high-level assumptions, the results accommodate models where a (semi)-strongly identified nuisance parameter is concentrated out:

 ^θn=argminθ∈Θ∥¯gn(θ,^η(θ))∥Wn.

The results could be further extended to well identified infinite dimensional nuisance parameters; this is left to future research.

## 3 Asymptotic Behaviour of the Linear Approximations

This section derives the asymptotic behaviour of the pair (An, Bn) under strong and semi-strong identification and characterizes the behaviour of the quasi-Jacobian under higher-order and weak/set identification. The sup-norm and least-squares approximations are treated separately. Table 2 summarizes the results. At the population level, the results imply (taking the population moments and letting the bandwidth shrink to zero) that the quasi-Jacobian is the usual Jacobian in first-order globally identified models and is singular under either local or global identification failure. This provides a simple characterization of first-order and global identification failure for GMM in the population.

### 3.1 Sup-norm approximation

###### Theorem 1.

(Behaviour of the Sup-Norm Approximation under Strong Identification)
Suppose that the model is strongly identified and that Assumptions 1 i., ii. and 2 hold. Then the sup-norm approximation satisfies:

 An,∞ =¯gn(θ0)−Bn,∞θ0+op(n−1/2), Bn,∞ =∂θgn(θ0)+op(n−1/2κ−1n).

Theorem 1 shows that the sup-norm approximation is asymptotically equivalent to the usual first-order expansion of the moments around θ0. This implies that Bn,∞ is a consistent estimator of the Jacobian in the sandwich formula when computing standard errors. This can be particularly useful when ¯gn is non-smooth or discontinuous.

###### Theorem 2.

(Behaviour of the Sup-Norm Approximation under Semi-Strong Identification)
Suppose that the model is semi-strongly identified, that Assumptions 1 i., ii. and 2 hold, and that the bandwidth and moments are such that:

 κ2n=o(∣∣λmin(∂θgn(θ0)′∂θgn(θ0))∣∣),

then the sup-norm approximation satisfies:

 An,∞ =¯gn(θ0)−Bn,∞θ0+op(n−1/2), Bn,∞Hn =∂θgn(θ0)Hn+op(n−1/2κ−1n),

where Hn is a sequence of non-singular normalizing matrices.

Under semi-strong identification, ^θn is asymptotically Gaussian. The rate of convergence of each coefficient depends on the eigenvalues of ∂θgn(θ0)′∂θgn(θ0) - i.e. the singular values of ∂θgn(θ0) - and its eigenvectors (consider the singular value decomposition ∂θgn(θ0) = UΣV′, where Σ is the diagonal matrix of singular values). In practice, the standard errors adjust for the rate of convergence automatically, similarly to series and sieve inferences (Pagan & Ullah, 1999; Chen & Pouzo, 2015), so that the usual t-statistic is asymptotically Gaussian. And again, Bn,∞ can be used in the sandwich formula to compute standard errors. The scaled convergence of Bn,∞ in Theorem 2 has implications in terms of convergence of the spectral decomposition of Bn,∞. Indeed, let vj be the j-th right singular vector of