Penalized Sieve GEL for Weighted Average Derivatives of Nonparametric Quantile IV Regressions

02/26/2019 ∙ by Xiaohong Chen, et al. ∙ berkeley college ∙ Yale University

This paper considers estimation and inference for a weighted average derivative (WAD) of a nonparametric quantile instrumental variables regression (NPQIV). NPQIV is a non-separable and nonlinear ill-posed inverse problem, which might be why there is no published work on the asymptotic properties of any estimator of its WAD. We first characterize the semiparametric efficiency bound for a WAD of a NPQIV, which, unfortunately, depends on an unknown conditional derivative operator and hence an unknown degree of ill-posedness, making it difficult to know if the information bound is singular or not. In either case, we propose a penalized sieve generalized empirical likelihood (GEL) estimation and inference procedure, which is based on the unconditional WAD moment restriction and an increasing number of unconditional moments that are implied by the conditional NPQIV restriction, where the unknown quantile function is approximated by a penalized sieve. Under some regularity conditions, we show that the self-normalized penalized sieve GEL estimator of the WAD of a NPQIV is asymptotically standard normal. We also show that the quasi likelihood ratio statistic based on the penalized sieve GEL criterion is asymptotically chi-square distributed regardless of whether or not the information bound is singular.




1 Introduction

Since the seminal paper by Koenker and Bassett (1978), quantile regressions and functionals of quantile regressions have been the subjects of ever-expanding theoretical research and applications in economics, statistics, biostatistics, finance, and many other science and social science disciplines. See Koenker (2005) and the forthcoming Handbook of Quantile Regression (2017) for the latest theoretical advances and empirical applications.

The presence of endogenous regressors is common in many empirical applications of structural models in economics and other social sciences. The Nonparametric Quantile Instrumental Variable (NPQIV) regression was, to our knowledge, first proposed in Chernozhukov and Hansen (2005) and Chernozhukov et al. (2007). This model is a leading example of nonlinear and non-separable ill-posed inverse problems in econometrics, which has been an active research topic following the Nonparametric (mean) Instrumental Variables (NPIV) regression studied by Newey and Powell (2003), Hall and Horowitz (2005), Blundell et al. (2007), Carrasco et al. (2007), Darolles et al. (2011) and others. See, for example, Horowitz and Lee (2007), Chen and Pouzo (2009, 2012, 2015), Gagliardini and Scaillet (2012), Chernozhukov and Hansen (2013), Chen et al. (2014) and others for recent work on the NPQIV and its various extensions.

In this paper, we consider estimation and inference for a Weighted Average Derivative (WAD) functional of a NPQIV. For models without nonparametric endogeneity, WAD functionals of nonparametric (conditional) mean regression and of quantile regression have been extensively studied in both statistics and econometrics. In particular, under some mild regularity conditions, plug-in estimators for WADs of any nonparametric mean and quantile regressions can be shown to be semiparametrically efficient and root-n asymptotically normal (where n is the sample size). See, for example, Newey and Stoker (1993), Newey (1994), Newey and Powell (1999), Ackerberg et al. (2014) and the references therein. Although unknown functions of endogenous regressors occur frequently in empirical work, the ill-posed nature of NPIV and NPQIV means that there is not yet much research on WAD functionals of either model. In fact, even for the simpler NPIV model, which is a linear and separable ill-posed inverse problem, it is still a difficult question whether a linear functional of a NPIV could be estimated at the root-n rate; see, e.g., Severini and Tripathi (2012) and Davezies (2016). Although Ai and Chen (2007) provide low-level sufficient conditions for a root-n consistent and asymptotically normal estimator of the WAD of the NPIV model, and Ai and Chen (2012) provide a semiparametric efficient estimator of the WAD for that model, to our knowledge, there is no published work on semiparametric efficient estimation of the WAD for the NPQIV model yet.

We first characterize the semiparametric efficiency bound for the WAD functional of a NPQIV model. Unfortunately, the bound depends on an unknown conditional derivative operator and hence an unknown degree of ill-posedness. Therefore, it is difficult to know if the semiparametric information bound is singular or not. Further, even if a researcher assumes that the information bound is non-singular and the WAD is root-n consistently estimable, the results in Ai and Chen (2012) and Chen and Santos (2018) show that a simple plug-in estimator of a WAD might not be semiparametrically efficient. This is in contrast to the results of Newey and Stoker (1993) and Ackerberg et al. (2014), who show that plug-in estimators of a WAD of a nonparametric mean and quantile regression are semiparametrically efficient.

We then propose penalized sieve Generalized Empirical Likelihood (GEL) estimation of the WAD for the NPQIV model, which is based on the unconditional WAD moment restriction and an increasing number of unconditional moments implied by the conditional moment restriction of the NPQIV model, where the unknown quantile function is approximated by a flexible penalized sieve. Under some regularity conditions, we show that the self-normalized penalized sieve GEL estimator of the WAD of a NPQIV is asymptotically standard normal. We also show that the Quasi Likelihood Ratio (QLR) statistic based on the penalized sieve GEL criterion is asymptotically chi-squared distributed regardless of whether the information bound is singular or not; this can be used to construct confidence sets for the WAD of NPQIV without the need to estimate the variance or to know the precise convergence rate of the WAD estimator.
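The confidence-set construction mentioned above amounts to inverting the QLR statistic: keep every candidate value of the WAD whose QLR statistic lies below a chi-squared critical value. The following is a minimal sketch of that inversion, not the paper's implementation. It assumes a scalar WAD and therefore one degree of freedom (the text does not state the degrees of freedom), and `toy_qlr` is a purely hypothetical quadratic stand-in for the actual penalized sieve GEL QLR statistic.

```python
from statistics import NormalDist


def chi2_1_critical(alpha: float) -> float:
    """(1 - alpha) quantile of a chi-squared variable with 1 degree of freedom.

    Uses the fact that a chi-squared(1) variable is a squared standard normal.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def qlr_confidence_set(qlr, grid, alpha=0.05):
    """Return the grid points whose QLR statistic does not exceed the critical value."""
    crit = chi2_1_critical(alpha)
    return [theta for theta in grid if qlr(theta) <= crit]


def toy_qlr(theta, theta_hat=0.3, n=400):
    """Hypothetical stand-in for the QLR statistic (assumption, not from the paper)."""
    return n * (theta - theta_hat) ** 2


grid = [i / 100 for i in range(0, 101)]
cs = qlr_confidence_set(toy_qlr, grid)
print(min(cs), max(cs))
```

The design point is that the procedure needs only the statistic and a critical value: no variance estimate and no knowledge of the convergence rate enter the computation.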

Our estimation procedure builds upon Donald et al. (2003), who approximate a conditional moment restriction by an increasing sequence of unconditional moment restrictions, and then consider estimation of the Euclidean parameter (of fixed and finite dimension) and specification tests based on GEL (and related) procedures. For the same model, Kitamura et al. (2004) directly estimate the conditional moment restriction via kernel methods and then apply a kernel-based conditional empirical likelihood (EL) to estimate the Euclidean parameter. However, the model considered in these papers does not contain any unknown functions, and the residuals are assumed to be twice continuously differentiable with respect to the Euclidean parameter at its true value. For the semiparametric conditional moment restriction in which the unknown function could depend on an endogenous variable, Otsu (2011) and Tao (2013) consider a sieve conditional EL extension of Kitamura et al. (2004), and Sueishi (2017) provides a sieve unconditional GEL extension of Donald et al. (2003), where the unknown function is approximated by a finite-dimensional linear sieve (series) as in Ai and Chen (2003). However, like Ai and Chen (2003), all these papers assume residuals that are twice continuously differentiable with respect to the unknown parameters, and hence rule out the NPQIV model.

Parente and Smith (2011) study GEL properties for non-smooth residuals in unconditional moment models, but require the dimensions of both the moment vector and the parameter to be fixed and finite. Finally, Horowitz and Lee (2007), Gagliardini and Scaillet (2012), Chen and Pouzo (2009, 2012, 2015), and Chernozhukov et al. (2015) do include the NPQIV model, but none of these papers addresses the issues of estimation and inference for the WAD of the NPQIV.

The rest of the paper is organized as follows. Section 2 introduces notation and the model. Section 3 characterizes the semiparametric efficiency bound for the WAD of the NPQIV model. Section 4 introduces a flexible penalized sieve GEL procedure. Section 5 derives the consistency and the convergence rates of the penalized sieve GEL estimator for the NPQIV model. Section 6 establishes the asymptotic distributions of the WAD estimator and of the QLR statistic based on penalized sieve GEL for the WAD of a NPQIV. Section 7 concludes with a discussion of extensions.

2 Preliminaries and Notation


The observable data vector consists of the outcome variable, the endogenous variable and the instrumental variable (IV); we assume the observable data is distributed according to a probability distribution. In order to simplify the exposition, we restrict attention to real-valued continuous random variables, i.e., we assume the data vector has a density; extending our results to vector-valued endogenous and instrumental variables would be straightforward but cumbersome in terms of notation.

Notation. For any subset, , of an Euclidean space let be the class of Borel probability measures over . For any , we use

to denote its probability density function (pdf) (with respect to Lebesgue (Leb) measure) and

to denote its support. We also use () to denote the marginal probability (pdf) of a random variable ; and () to denote the conditional probability (pdf) of given . For expectation, we write to be explicit about the fact that is the measure of integration; throughout we sometimes use when is the true probability of the data. The term “wpa1” stands for “with probability approaching one (under )”; for any two real-valued sequences denotes for some finite and universal; is defined analogously. For any , we use to denote the class of measurable functions such that ; as usual denotes the class of essentially bounded real-valued functions. We use to denote the Euclidean norm, and .

For any subset of a vector space , denotes the smallest linear space containing ; for any subspace , denotes its orthogonal complement in . For any linear operator, , let and ; it is bounded if and only if . For any linear bounded operator , denotes its generalized inverse; see, e.g., Engl et al. (1996).

2.1 The WAD of the NPQIV model

Let , where , i.e., is a Sobolev space of order , here should be viewed as a weak derivative of (see Brezis (2010)). We note that is a Hilbert space under the norm , and is a Hilbert space under the norm . In this paper we measure convergence in using another norm for (such as ). The parameter set is given by , where is bounded and convex and is a set that contains additional restrictions on which will be specified below. We assume that is such that there exists a parameter that satisfies


for , where is a nonnegative, continuously differentiable scalar function in and should be viewed as the weighting function of the average derivative, , of .

The following assumption ensures that the conditions above uniquely identify ; it will be maintained throughout the paper and will not be explicitly referenced in the results below.

Assumption 1.

There is a unique that satisfies model (1)-(2).

The interior assumption is needed only for the asymptotic distribution results in Section 6. In cases where has an empty interior, one can use the concept of relative interior of . This assumption is clearly high level. The goal of this paper is to characterize the asymptotic behavior of a modified GEL estimator of , taking as given the identification part; for a discussion of primitive conditions for Assumption 1, we refer the reader to Chen et al. (2014) and references therein.

The following assumption imposes additional restrictions over the primitives: , and .

Assumption 2.

(i) has a continuously differentiable pdf, , such that: the marginal density of is uniformly bounded, zero at the boundary of the support and ; the marginal density of is uniformly bounded away from 0 on its support; , ; (ii) is convex and such that for all , ; (iii) for all in a -neighborhood of .

Part (i) of this condition imposes differentiability and boundedness restrictions on different elements of ; part (ii) ensures that which allows for an alternative representation for using integration by parts (see expression 3 below); part (iii) is a high level assumption and essentially implies as well as continuity of .

3 Efficiency Bound for

By definition of , Assumption 2 and integration by parts, it follows that



For the derivations of the efficiency bound, it is important to recall that depends on , so we sometimes use to denote . Finally, observe that under our assumptions over and , .
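The integration-by-parts step can be checked numerically. The sketch below is an illustration under assumed notation, since the displayed equation did not survive extraction: `W` is the endogenous variable with density `f` that vanishes at the boundary of its support, `mu` is the weight, `h` is the structural function, and the WAD is taken to be the expectation of `mu(W) * h'(W)`. The specific choices of `f`, `mu` and `h` are hypothetical and were picked so that both sides have a closed form.

```python
def trapezoid(func, a, b, m=20000):
    """Composite trapezoid rule for the integral of func over [a, b]."""
    step = (b - a) / m
    total = 0.5 * (func(a) + func(b))
    for i in range(1, m):
        total += func(a + i * step)
    return total * step


def f(w):
    """Assumed density of W on [0, 1]; it is zero at both endpoints."""
    return 6.0 * w * (1.0 - w)


def mu(w):
    """Assumed nonnegative, continuously differentiable weight."""
    return 1.0 + w


def h(w):
    """Assumed structural function."""
    return w ** 3


def dh(w):
    """Derivative of h."""
    return 3.0 * w ** 2


def d_mu_f(w):
    """Derivative of mu(w) * f(w) = 6 (w - w^3)."""
    return 6.0 * (1.0 - 3.0 * w ** 2)


# Left side: E[mu(W) h'(W)] written as an integral against the density.
lhs = trapezoid(lambda w: mu(w) * dh(w) * f(w), 0.0, 1.0)
# Right side: minus the integral of h times the derivative of (mu * f).
# The boundary term mu * f * h vanishes because f is zero at 0 and 1.
rhs = -trapezoid(lambda w: h(w) * d_mu_f(w), 0.0, 1.0)
print(lhs, rhs)
```

Both integrals equal 1.5 analytically for these choices, which is why the boundary condition on the density in Assumption 2 matters: without it the boundary term would not drop out.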

The formal definition of the efficiency bound for the unknown parameter is given at the beginning of Appendix A. Loosely speaking, the efficiency bound is a lower bound for the asymptotic variance of all locally regular and asymptotically linear estimators of this parameter; see Bickel et al. (1998) for details and formal definitions. If it is infinite, then the parameter cannot be estimated at the root-n rate by these estimators. We now derive this bound. For this, we introduce some useful notation. For any , let

Let be given by

for all and . The fact that maps into follows from Jensen inequality and the fact that (see Assumption 2). Its adjoint operator is denoted as . Finally, let

and . Then and .

Theorem 3.1.

Suppose Assumptions 1 and 2 hold and . Then

  1. The efficiency bound of is finite iff .

  2. If it is finite, its efficient variance is given by


Proof. See Appendix A. ∎

The first result in Theorem 3.1 is obtained following the approach of Bickel et al. (1998). The condition ensures that only the “identified part” of — that is, the part of that is orthogonal to the kernel of — matters for computing the weighted average derivative; we refer the reader to Appendix A and the paper by Severini and Tripathi (2012) for further discussion.

Severini and Tripathi (2012) provide an analogous result to Theorem 3.1(1) for linear functionals in a nonparametric linear IV regression model. Our condition , is analogous to theirs, but with a subtle yet important difference. In Severini and Tripathi (2012), the object that plays the role of does not depend on , whereas in our case it does. This observation changes the nature of our condition vis-a-vis theirs, because, in our setup, implies a restriction on since both quantities, and depend on it. [Footnote 1: It is worth pointing out that this restriction was not imposed as one of the conditions that defined the model used to construct the tangent space; see Appendix A for a definition.] It is also important to note that, if is compact, then the range of is a strict subset of so that may not hold. Hence, in this case the weighted average derivative may not be root-n estimable, and, moreover, the condition that determines the finiteness of the efficiency bound depends on unknown quantities. This observation highlights a difference with the no-endogeneity case, where the efficiency bound is always finite, provided that (see Newey and Stoker (1993)).

Another discrepancy between the no-endogeneity case and ours is that in the former case the “plug-in” estimator is always efficient (see Newey and Stoker (1993), Newey (1994)) due to the fact that the tangent space is the whole of . On the other hand, for NPQIV Chen and Santos (2018) show that the closure of the tangent space is the whole space iff the is dense in , which in turn is equivalent to . This last condition is comparable to a completeness condition on the conditional distribution of the exogenous variable given the endogenous ones, which may or may not hold for a particular . [Footnote 2: In the NPIV setting, is equivalent to the pdf of given satisfying a completeness condition.]

The second result in Theorem 3.1 follows from projecting the influence function onto the closure of the tangent space (see Bickel et al. (1998) and Van der Vaart (2000) and references therein). So as to shed some light on the expression for the efficiency bound, we point out that it corresponds to the efficiency bound of the semiparametric sequential conditional moment model via the “orthogonalized moments” approach in Ai and Chen (2012). In their notation, let and . Note that (and ). The model (1)-(2) becomes equivalent to their orthogonalized moment model:


The expression in our Theorem 3.1(2) coincides with their theorem 2.3 semiparametric efficient variance bound for of the model (4). Also see proposition 3.3 in Ai and Chen (2012) for the semiparametric efficient variance bound for the WAD of a NPIV model.

4 The Penalized-Sieve-GEL Estimator

In this section we introduce our estimator for . In order to do this, it will be useful to define some quantities. Given the i.i.d. sample , let be the corresponding empirical probability. Let be a complete basis in . For any , let be vector-valued function of , and for any , let

Let be an open interval that contains . For any , any and any , denote , and .
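To make the moment construction concrete, the following sketch stacks the unconditional WAD moment on top of the NPQIV residual multiplied by the first J instrument basis functions. It is an illustration under assumed notation, not the paper's definition, whose display was lost in extraction: observations are triples `(y, w, x)` of outcome, endogenous variable and instrument, `tau` is the quantile level, `h` and `dh` are a candidate function and its derivative, `mu` is the weight, and `power_basis` is a hypothetical choice of basis.

```python
def power_basis(x, J):
    """First J terms of a polynomial basis in the instrument (assumed choice)."""
    return [x ** j for j in range(J)]


def moment_vector(y, w, x, theta, h, dh, mu, tau, J):
    """Stack the WAD moment and J unconditional moments for one observation.

    The first entry is mu(w) * h'(w) - theta. The remaining entries are the
    quantile residual 1{y <= h(w)} - tau times each instrument basis function.
    """
    wad_moment = mu(w) * dh(w) - theta
    resid = (1.0 if y <= h(w) else 0.0) - tau
    return [wad_moment] + [resid * q for q in power_basis(x, J)]


def sample_moments(data, theta, h, dh, mu, tau, J):
    """Average the stacked moment vector over the sample."""
    n = len(data)
    rows = [moment_vector(y, w, x, theta, h, dh, mu, tau, J) for (y, w, x) in data]
    return [sum(col) / n for col in zip(*rows)]


# Toy sample of (y, w, x) triples; all values are made up for illustration.
data = [(0.2, 0.5, 1.0), (0.9, 0.5, 2.0)]
g_bar = sample_moments(
    data,
    theta=1.0,
    h=lambda w: w,
    dh=lambda w: 1.0,
    mu=lambda w: 1.0,
    tau=0.5,
    J=2,
)
print(g_bar)
```

Note that the indicator makes the moment vector non-smooth in `h`, which is the feature that rules out the differentiability assumptions of the earlier GEL literature discussed in the introduction.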

Let be strictly concave, twice-continuously differentiable with Lipschitz continuous second derivative; and ; see, e.g., Smith (1997) and Donald et al. (2003) for examples of such functions. For any , let

If were a finite-dimensional compact set with , then could be estimated by the GEL procedure: (see, e.g., Donald et al. (2003)).
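The inner step of a GEL procedure, for a fixed parameter value, maximizes the sample average of a concave function of a linear combination of the moments. The sketch below illustrates that inner maximization with plain Newton iterations; it is not the paper's algorithm. The two carrier functions are the standard continuous-updating (CUE) and exponential-tilting (ET) choices from the GEL literature, normalized so that the first and second derivatives at zero equal -1. The names `gel_inner_sup`, `CARRIERS` and `solve` are hypothetical, and the moment rows in `g` are made-up numbers.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


# Each carrier returns (value, first derivative, second derivative) at v.
CARRIERS = {
    "cue": lambda v: (-0.5 * (1.0 + v) ** 2, -(1.0 + v), -1.0),
    "et": lambda v: (-math.exp(v), -math.exp(v), -math.exp(v)),
}


def gel_inner_sup(g, carrier="cue", steps=25):
    """Maximize (1/n) * sum_i s(lam' g_i) over lam with undamped Newton steps."""
    s = CARRIERS[carrier]
    n, J = len(g), len(g[0])
    lam = [0.0] * J
    for _ in range(steps):
        grad = [0.0] * J
        hess = [[0.0] * J for _ in range(J)]
        for gi in g:
            v = sum(l * x for l, x in zip(lam, gi))
            _, s1, s2 = s(v)
            for a in range(J):
                grad[a] += s1 * gi[a] / n
                for b in range(J):
                    hess[a][b] += s2 * gi[a] * gi[b] / n
        step = solve(hess, grad)
        lam = [l - d for l, d in zip(lam, step)]
    value = sum(s(sum(l * x for l, x in zip(lam, gi)))[0] for gi in g) / n
    return value, lam


# Made-up moment evaluations: four observations, two moments each.
g = [[1.0, 0.5], [-0.5, 1.0], [0.2, -0.3], [0.1, 0.4]]
value_cue, lam_cue = gel_inner_sup(g, "cue")
print(value_cue, lam_cue)
```

For the CUE carrier the inner problem is quadratic, so its maximum equals -1/2 plus one half of the quadratic form of the sample mean of the moments in the inverse of their uncentered second-moment matrix. That closed form is a convenient correctness check and hints at why GEL and optimally weighted GMM criteria are closely related. Undamped Newton steps are used only for brevity; a practical implementation would need a line search and, for carriers such as EL, a domain restriction on `lam`.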

Due to the presence of the infinite-dimensional nuisance parameter in the NPQIV model (1), the parameter space is an infinite-dimensional function space that is typically a non-compact subset in and hence the identifiable uniqueness condition needed for consistency in -norm might fail; see, e.g., Newey and Powell (2003) and Chen (2007). The above GEL procedure needs to be regularized to regain consistency and/or to speed up the rate of convergence in -norm. To this end, we introduce a regularizing structure, which, jointly with , consists of a sequence of sieve spaces in , and a sequence of penalties with tuning parameters and a penalty function .

The Penalized-Sieve-GEL (PSGEL) estimator is defined as

for any . If the “arg min” in the previous expression is empty, one can replace it by an approximate minimizer.
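Since the display defining the estimator did not survive extraction, the following is a hedged sketch of the general shape that a penalized sieve GEL criterion takes, based only on the ingredients named in the text (a GEL objective, a sieve space, a tuning parameter and a penalty function). All symbols are assumed notation rather than the paper's: $Z_i$ is the $i$-th observation, $g_J$ the stacked vector of $J+1$ moments, $s$ the concave GEL function, $\Lambda_J$ the set over which the inner supremum is taken, $\mathcal{H}_k$ the sieve space, $\gamma_k$ the tuning parameter and $\mathrm{Pen}$ the penalty.

```latex
% Requires amsmath for \operatorname*. Assumed notation throughout.
\[
  (\hat{\theta}_n, \hat{h}_n) \in
  \operatorname*{arg\,min}_{(\theta, h) \in \Theta \times \mathcal{H}_k}
  \left\{
    \sup_{\lambda \in \Lambda_J}
    \frac{1}{n} \sum_{i=1}^{n} s\bigl(\lambda^{\top} g_J(Z_i, \theta, h)\bigr)
    + \gamma_k \, \mathrm{Pen}(h)
  \right\}
\]
```

The penalty term is what distinguishes this from an unpenalized sieve GEL problem: it restores the compactness that the infinite-dimensional parameter space lacks.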

The following assumption imposes restrictions over the regularizing structure . Let be a basis functions in , and .

Assumption 3.

(i) is a basis in , and for each finite ;
(ii) For all , is closed and convex, and , i.e., for any there is an such that ; and for some finite , ;
(iii) (a) is lower semi-compact (in ), , , and , and (b) there exists an such that for any , any and any , if then .

Condition (i) is mild (see Donald et al. (2003) (DIN) and the discussion therein). Condition (ii) essentially defines the sieve space. Part (a) of Condition (iii) is standard in ill-posed problems (see Chen and Pouzo (2012)); Part (b) is not. If is bounded, then the condition is vacuous. If this is not the case, then the condition requires to be “stronger” than the norm. The need to bound arises from the fact that, in many instances, in the proofs we need to control uniformly on (e.g., see Lemma SM.II.3 in the Supplemental Material SM.II). Additionally, in our setup, is useful to link to because the structure of the problem implies a natural bound for — and thus, through Assumption 3(iii), a bound for —, as shown in the following lemma.

Lemma 4.1.

For any and any ,


Proof. See Appendix B. ∎

The bound, however, may depend on and thus may affect the convergence rate. Below, we will set in the right-hand-side (RHS) to a particular value in and use the resulting bound to construct what we call an “effective sieve space”.

5 Consistency and Convergence Rates of the PSGEL Estimator

This section establishes the consistency and the rates of convergence of the PSGEL estimator to the true parameter under a given norm over . In this and the next section, we note that the implicit constants inside the do not depend on .

5.1 Effective sieve space

Throughout the paper we use the following notation. Let ; and for any . For any , let

Let be a slowly diverging positive sequence, e.g., , which is introduced solely to avoid keeping track of constants. Finally we let

The sequence of sets, , can be viewed as the sequence of “effective” sieve spaces, because, as the following lemma shows, wpa1 the estimator (and, trivially, the sieve approximator ) both belong to it.

Assumption 4.

(i) ; (ii) , , for some ; (iii) .

Lemma 5.1.

Let Assumptions 1, 2, 3 and 4 hold. Then, for any , wpa1.


Proof. See Appendix D. ∎

The proof of this Lemma follows from Lemma 4.1 with and Lemma D.1 with and in Appendix D. The latter lemma provides a bound for in terms of and . With this in mind, the components of are intuitive: is related to the “variance” of , where is a bound for . The term is related to the “bias” and reflects the fact that is a sieve approximate to .

Remark 5.1.

As explained above, Lemma 5.1 and Assumption 3(iii) are used to ensure that is bounded. If the construction of directly implies for some fixed constant , then should replace in the definition of . This is applicable every time appears below.

5.2 Relation to Penalized Sieve GMM

As expected, the asymptotic properties of the PSGEL estimator are closely related to those of an approximate minimizer of a GMM criterion associated with the following expression: for any and any , let

where . That is, is the optimally weighted (population) GMM criterion function associated with the vector of moments .

For what follows, it will be useful to define the following intermediate quantity which can be viewed as a (sequence) of pseudo-true parameters. For each , let

We note that for any ; but as we restrict to the effective sieve space , it could be that for any . The following lemma guarantees that is in fact non-empty.

Lemma 5.2.

Let Assumptions 2 and 3 hold. Then, for each , is non-empty.


Proof. See Appendix D. ∎

While this lemma shows that is non-empty, it may not be a singleton. Nevertheless, for model (1)-(2), it is easy to choose some finite-dimensional linear sieve and some strictly convex penalty such that is in fact a singleton. Therefore the next assumption is effectively a way to suggest choices of a regularizing structure:

Assumption 5.

For any , is single-valued.

Let . For each , the projection of onto the linear span of is denoted as , where

where by Assumption 3.

The next lemma provides sufficient conditions that ensure convergence of to the true parameter .

Lemma 5.3.

Let Assumptions 1, 2, 3 and 5 hold. Suppose . Then: .


Proof. See Appendix D. ∎

5.3 Convergence rates

A crucial part of establishing the convergence rate of is to bound the rate of . For this it is important to quantify how well the population sieve GMM criterion function separates points in around . To do this, we define, for each, , as


The function is analogous to the one used in the standard identifiable uniqueness condition (see White and Wooldridge (1991), Newey and McFadden (1994)). Within the ill-posed inverse literature this function is akin to the notion of sieve measure of ill-posedness used in Blundell et al. (2007) and Chen and Pouzo (2012, 2015). The following lemma establishes some useful properties.

Lemma 5.4.

Let Assumptions 2, 3 and 5 hold. Then: for each , iff and is continuous and non-decreasing in .


Proof. See Appendix D. ∎

It is worth noting that even though for all , it could happen that as diverges. This behavior reflects the ill-posed nature of the problem.

We now present some high-level assumptions used to establish the convergence rate of the PSGEL estimator. The first of these assumptions introduces, and imposes restrictions on, a positive real-valued sequence that is common in the GEL literature (see the Appendix in Donald et al. (2003)). It ensures that the ball belongs to for any (see Lemma SM.II.3 in the Supplemental Material SM.II). The assumption also restricts the rates of and the rate at which diverges relative to :

Assumption 6.

(i) Assumption 4 holds; (ii) , for some , and .

Recall that the sequence diverges arbitrarily slowly like , and the bound is allowed to grow (slowly) at the rate of . Assumption 6 slightly strengthens Assumption 4.

The following assumption is a high-level condition that controls the supremum of the process over the classes and .

Assumption 7.

There exists a positive real-valued sequence, , such that, for any , and for all , .

For instance, if and are P-Donsker, then is uniformly bounded. [Footnote 3: Restrictions on the “complexity” of these classes are implicit restrictions on the “complexity” of ; see Chen et al. (2003) and Van der Vaart (2000).] But if this is not the case, then may diverge as (or ) grows.

The next theorem establishes the convergence rate of the PSGEL estimator; in particular it establishes the rate for the estimator of the infinite dimensional component .

Theorem 5.1.

Suppose Assumptions 1, 2, 3, 5 and 7 hold. For any satisfying Assumption 6, there exists a finite constant such that



Proof. See Appendix C. ∎

The rate of convergence of the PSGEL estimator is composed of two standard terms reflecting the “approximation error” and the “sampling error” . The component , reflects the ill-posed nature of the estimation problem. As noted previously, even though, for a fixed , for , this relationship can deteriorate as diverges, which implies that