A unified study of nonparametric inference for monotone functions

The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise convergence in distribution of a class of generalized Grenander-type estimators of a monotone function. This broad class allows the minorization or majorization operation to be performed on a data-dependent transformation of the domain, possibly yielding benefits in practice. Additionally, we provide simpler conditions and more concrete distributional theory in the important case that the primitive estimator and data-dependent transformation function are asymptotically linear. We use our general results in the context of various well-studied problems, and show that we readily recover classical results established separately in each case. More importantly, we show that our results allow us to tackle more challenging problems involving parameters for which the use of flexible learning strategies appears necessary. In particular, we study inference on monotone density and hazard functions using informatively right-censored data, extending the classical work on independent censoring, and on a covariate-marginalized conditional mean function, extending the classical work on monotone regression functions. In addition to a theoretical study, we present numerical evidence supporting our large-sample results.

1 Introduction

1.1 Background

In many scientific settings, investigators are interested in learning about a function known to be monotone, either due to probabilistic constraints or in view of existing scientific knowledge. The statistical treatment of nonparametric monotone function estimation has a long and rich history. Early on, Grenander (1956) derived the nonparametric maximum likelihood estimator (NPMLE) of a monotone density function, now commonly referred to as the Grenander estimator. Since then, monotone estimators of many other parameters, including hazard and regression functions, have been proposed and studied.

In the literature, most monotone function estimators have been constructed via empirical risk minimization. Specifically, these are obtained by minimizing the empirical risk over the space of non-decreasing, or non-increasing, candidate functions based on an appropriate loss function. The theoretical study of these estimators has often hinged strongly on their characterization as empirical risk minimizers. This is the case, for example, for the asymptotic theory developed by Prakasa Rao (1969) and Prakasa Rao (1970) for the NPMLE of monotone density and hazard functions, respectively, and by Brunk (1970) for the least-squares estimator of a monotone regression function. Kim and Pollard (1990) unified the study of these various estimators by studying the argmin process typically driving the pointwise distributional theory of monotone empirical risk minimizers.

Many of the parameters treated in the literature on monotone function estimation can be viewed as an index of the statistical model, in the sense that the model space is in bijection with the product space corresponding to the parameter of interest and an additional variation-independent parameter. In such cases, identifying an appropriate loss function is often easy, and a risk minimization representation is therefore usually available. However, when the parameter of interest is a complex functional of the data-generating mechanism, an appropriate loss function may not be readily available. This occurs often, for example, when identification of the parameter of interest based on the observed data distribution requires adjustment for sampling complications (e.g., informative treatment attribution, missing data or loss to follow-up). It is thus imperative to develop and study estimation methods that do not rely upon risk minimization.

It is a simple fact that the primitive of a non-decreasing function is convex. This observation serves as motivation to consider as an estimator of the function of interest the derivative of the greatest convex minorant (GCM) of an estimator of its primitive function. In the literature on monotone function estimation, many estimators obtained as empirical risk minimizers can alternatively be represented as the left derivative of the GCM of some primitive estimator. This is because the definition of the GCM is intimately tied to the necessary and sufficient conditions for optimization of certain risk functionals over the convex cone of monotone functions (see, e.g., Chapter 2 of Groeneboom and Jongbloed, 2014). In particular, Grenander’s NPMLE of a monotone density equals the left derivative of the GCM of the empirical distribution function. In the recent literature, estimators obtained in this fashion have thus been referred to as being of Grenander-type. Leurgans (1982) is an early example of a general study of Grenander-type estimators for a class of regression problems.

In a seminal paper, Groeneboom (1985) introduced an approach to studying GCMs based on an inversion operation. This approach has facilitated the theoretical study of certain Grenander-type estimators without the need to utilize their representation as empirical risk minimizers. For example, under the assumption of independent right-censoring, Huang and Wellner (1995) used this approach to derive large-sample properties of a monotone hazard function estimator obtained by differentiating the GCM of the Nelson-Aalen estimator of the cumulative hazard function. This general strategy was also used by van der Vaart and van der Laan (2006), who derived and studied an estimator of a covariate-marginalized survival curve based on current-status data, including possibly high-dimensional and time-varying covariates. More recently, there has been interest in deriving general results for Grenander-type estimators applicable to a variety of cases. For instance, Anevski and Hössjer (2006) derived pointwise distributional limit results for Grenander-type estimators in a very general setting including, in particular, dependent data. Durot (2007), Durot et al. (2012) and Lopuhaä and Musta (2016) derived limit results for the estimation error of Grenander-type estimators under $L_p$, supremum and Hellinger norms, respectively. Durot et al. (2013) studied the problem of testing the equality of generic monotone functions with independent samples. Durot and Lopuhaä (2014), Beare and Fang (2017) and Lopuhaä and Musta (2018a) studied properties of the least concave majorant of an arbitrary estimator of the primitive function of a monotone parameter. The monograph of Groeneboom and Jongbloed (2014) also summarizes certain large-sample properties for these estimators.

1.2 Contribution and organization of the article

In this paper, we wish to address the following three key objectives:

1. to provide a unified framework for studying a large class of nonparametric monotone function estimators that implies classical results but also applies in more complicated, modern applications;

2. to derive tractable sufficient conditions under which estimators in this class are known to be consistent and have a non-degenerate limit distribution upon proper centering and scaling;

3. to illustrate the use of this general framework to construct targeted estimators of monotone parameters that are possibly complex summaries of the observed data distribution, and whose estimation may require the use of data-adaptive estimators of nuisance functions.

Our first goal is to introduce a class of monotone estimators that allow the greatest convex minorization process to be performed on a possibly data-dependent transformation of the domain. For many monotone estimators in the literature, the greatest convex minorization is performed on a transformation of the domain. A strategic domain transformation can lead to significant benefits in practice, including in some cases the elimination of the need to estimate challenging nuisance parameters. Unfortunately, to our knowledge, existing results for general Grenander-type estimators do not apply in a straightforward manner in cases in which a data-dependent transformation of the domain has been used. We will define a class that permits such transformations, and demonstrate both how this class encompasses many existing estimators in the literature and how a transformation can be strategically selected in novel problems.

Our second goal is to derive sufficient conditions on the estimator of the primitive function and domain transformation that imply consistency and pointwise convergence in distribution of the monotone function estimator. As noted above, general results on pointwise convergence in distribution for the class of Grenander-type estimators, applicable in a wide variety of settings, were provided in Anevski and Hössjer (2006). Our work differs from that of Anevski and Hössjer (2006) in a few important ways. First, the role and implications of domain transformations – which, as we show, are often important in practice – were not explicitly considered in Anevski and Hössjer (2006). To our knowledge, the class of generalized Grenander-type estimators we consider in this paper, which allow for domain transformations, has not previously been studied in a unified manner, and hence, general results for this class do not currently exist. Second, in addition to pointwise distributional results, we study weak consistency. Third, in Sections 4, 5 and 6, we pay special attention to parameters for which asymptotically linear estimators of the primitive and transformation functions can be constructed – in such cases, relatively straightforward sufficient conditions can be developed, and the limit distribution has a simpler form. While these results are weaker than those in Section 3 and in Anevski and Hössjer (2006) because they apply only to a special case, they are useful in many settings. We demonstrate the utility of these results for three groups of examples – estimation of monotone density, hazard and regression functions – and show that our results coincide with established results in these settings.

Our third goal is to discuss and illustrate Grenander-type estimation in cases in which nonparametric estimation of the primitive function requires estimation of challenging nuisance parameters. In this sense, our work follows the lead of van der Vaart and van der Laan (2006), whose setting is of this type. More generally, such primitive functions arise frequently, for example, when the observed data unit represents a coarsened version of an ideal data structure, and the coarsening occurs randomly conditional on observed covariates (Heitjan and Rubin, 1991). In our general results, we provide sufficient conditions that can be readily applied to such primitive estimators. To demonstrate the application of our theory in coarsened data structures, we consider extensions of the three classical monotone problems above to more complex settings in which covariates must be accounted for, because either the censoring process or the treatment allocation mechanism is informative, as is typical in observational studies. Specifically, we derive novel estimators of monotone density and hazard functions for use when the survival data are subject to right-censoring that may depend on covariates, and a novel estimator of a monotone dose-response curve for use when the relationship between the exposure and outcome is confounded by recorded covariates. Unlike for their classical analogues, in these more difficult problems, nonparametric estimation of the primitive function involves nuisance functions for which flexible estimation strategies (e.g., machine learning) must be employed. As van der Vaart and van der Laan (2006) were able to achieve in a particular problem, our general framework explicitly allows the integration of such strategies while still yielding estimators with a tractable limit theory.

Our paper is organized as follows. In Section 2, we define the class of estimators we consider and briefly introduce our three working examples. In Section 3, we present our most general results for the consistency and convergence in distribution of our class of estimators. In Section 4, we provide refined results, including simpler sufficient conditions and distributional results, for the special case in which the primitive and transformation estimators are asymptotically linear. In Section 5, we apply our general theory in three examples, both for classical parameters and for the novel extensions we consider. In Section 6, we provide results from simulation studies that evaluate the validity of the theory in two examples. We provide concluding remarks in Section 7. The proofs of all theorems and additional technical details are provided in the Supplementary Material.

2 Generalized Grenander-type estimators

2.1 Statistical setup and definitions

Throughout, we make use of the following definitions. For intervals , define as the space of bounded, real-valued functions on , as the subset of non-decreasing and càdlàg (right-continuous with left-hand limits) functions on , and as the further subset of functions whose range is contained in . The GCM operator is defined for any as the pointwise supremum over all convex functions on . We note that is necessarily convex. For , we denote by the generalized inverse mapping , and for a left-differentiable , we denote by the left derivative of .

We are interested in making inference about an unknown function determined by the true data-generating mechanism for an interval . We denote the endpoints of by and . We define the primitive function of pointwise for each as , where if we assume the integral exists. The general results we present in Section 3 apply in contexts with either independent or dependent data. Starting in Section 4, we focus on problems in which the data consist of independent observations from an unknown distribution contained in a nonparametric model . In such cases, we denote by a prototypical data unit, by the support of under , and we set .

In its simplest formulation, a Grenander-type estimator of is given by for some estimator of . However, as a critical step in unifying classical estimators and constructing procedures with possibly improved properties, we wish to allow the GCM procedure to be performed on a possibly data-dependent transformation of the domain . To do so, we first define for any interval the operator as for each and . We set , with possibly depending on , and suppose that a domain transform is chosen. We may then consider the domain-transformed parameter , which has primitive defined pointwise as for . As with and , is non-decreasing and is convex. Thus, it must be true that for each at which is left-continuous and such that for all . This observation motivates us to consider estimators of of the form , where , and are estimators of , and , respectively, and we define . We refer to any such estimator as being of the generalized Grenander-type. This class, of course, contains the standard Grenander-type estimators: setting and for the identity mapping yields . We note that, in this formulation, we require the domain over which the GCM is performed to be bounded, but not so for the domain of . Additionally, we assume that the left endpoint of is fixed at 0, while the upper endpoint may depend on . However, this entails no loss in generality, since if the desired domain is instead , where now also depends on , we can define and similarly shift by to obtain the new domain .

Defining , we suppose that we have at our disposal estimators and of and , respectively, as well as a weakly consistent estimator of . In this work, we study the properties of a generic generalized Grenander-type estimator of of the form

$$\theta_n := \mathrm{Iso}_{J_n}\left(\Gamma_n \circ \Phi_n^{-},\, \Phi_n\right). \tag{1}$$

Specifically, our goal is to provide sufficient conditions on the triple under which $\theta_n$ is consistent, and under which a suitable standardization of $\theta_n$ converges in distribution to a nondegenerate limit. As stated above, our only requirement for the third component of the triple, the weakly consistent estimator introduced above, is that it tend in probability to its target. Therefore, our focus will be on the pair $(\Gamma_n, \Phi_n)$.

We note that estimators taking form constitute a more restrictive class than the set of all estimators of the form for arbitrary . Our focus on this slightly less general form is motivated by two reasons. First, as we will see in various examples, often has a simpler form than , and in such cases, it may be significantly easier to verify required regularity conditions for and to derive limit distribution properties based on rather than . Second, many celebrated monotone estimators in the literature follow this particular form. This can be seen by noting that, if is a right-continuous step function with jumps at points , then for each the estimator given in (1) equals the slope at of the greatest convex minorant of the diagram of points , where . We highlight well-known examples of estimators of this type below. In brief, we sacrifice a little generality for a substantial gain in the ease of application of our results, both for well-known and novel monotone estimators. Nevertheless, conditions on the pair under which consistency and distributional results hold for can be derived similarly.
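The cumulative-sum-diagram characterization above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the diagram consists of the origin followed by the points (Φn(x_k), Γn(x_k)) with strictly increasing, positive first coordinates, and the function names `gcm_left_slopes` and `generalized_grenander` are hypothetical.

```python
from typing import List, Sequence


def gcm_left_slopes(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Left-hand slopes of the greatest convex minorant (GCM) of the points
    (xs[i], ys[i]), evaluated at xs[1], ..., xs[-1].

    Assumes xs is strictly increasing. The GCM of a finite diagram is the
    lower convex hull of its points, built here with a single stack pass.
    """
    hull = [0]  # indices of the diagram points that are vertices of the GCM
    for i in range(1, len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # If slope(a, b) >= slope(b, i), point b lies on or above the
            # chord from a to i, so it is not a vertex of the GCM.
            if (ys[b] - ys[a]) * (xs[i] - xs[b]) >= (ys[i] - ys[b]) * (xs[b] - xs[a]):
                hull.pop()
            else:
                break
        hull.append(i)

    slopes = [0.0] * (len(xs) - 1)
    for a, b in zip(hull, hull[1:]):
        s = (ys[b] - ys[a]) / (xs[b] - xs[a])
        for i in range(a + 1, b + 1):
            slopes[i - 1] = s  # left slope of the GCM at xs[i]
    return slopes


def generalized_grenander(phi_vals: Sequence[float],
                          gamma_vals: Sequence[float]) -> List[float]:
    """Slopes of the GCM of the diagram {(0, 0)} U {(phi_k, gamma_k)}.

    phi_vals plays the role of the transformation estimator evaluated at its
    jump points, gamma_vals the primitive estimator at the same points.
    Starting the diagram at the origin is an assumption of this sketch.
    """
    if any(p2 <= p1 for p1, p2 in zip([0.0] + list(phi_vals), phi_vals)):
        raise ValueError("phi_vals must be positive and strictly increasing")
    return gcm_left_slopes([0.0] + list(phi_vals), [0.0] + list(gamma_vals))


if __name__ == "__main__":
    # Transformed domain: empirical CDF values k/n; primitive: cumulative sums / n.
    phi = [0.25, 0.5, 0.75, 1.0]
    gamma = [0.25, 1.0, 1.5, 2.5]
    print(generalized_grenander(phi, gamma))  # [1.0, 2.5, 2.5, 4.0]
```

The stack-based hull keeps the computation linear in the number of diagram points, which is one reason the step-function form of the transformation estimator is convenient in practice.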

2.2 Examples

Before proceeding to our main results, we briefly discuss the several examples we will use to illustrate how our framework allows us to not only obtain results on classical estimators in the monotone estimation literature directly, but also tackle more complex problems for which no estimators are currently available. These examples will be studied extensively in Section 5.

Example 1: monotone density function

Suppose that is a univariate positive random variable with non-decreasing density function , and that is right-censored by an independent random censoring time . The observed data unit is , where and , with distribution implied by the true marginal distributions of and . The parameter of interest is , the density function of with support . Taking to be the identity function, we get that . Here, both and represent the distribution function of , and plays no role. A natural estimator of can be obtained by taking to be the Kaplan-Meier estimator of the distribution function . With the identity map, and , the estimator is precisely the estimator studied by Huang and Wellner (1995). When with probability one, is the empirical distribution function based on , and is precisely the Grenander estimator, the NPMLE of .

In Section 5, we extend estimation of a monotone density function to the setting in which the data are subject to possibly informative right-censoring. Specifically, we only require and to be independent conditionally upon a vector of baseline covariates. We will study the estimator defined by differentiating the GCM of a one-step estimator of . In this context, estimation of requires estimation of nuisance functions. We will use our general results to provide conditions on the nuisance estimators that imply consistency and distributional results for .

Example 2: monotone hazard function

Suppose now that is a univariate positive random variable with non-decreasing hazard function . In this example, we are interested in . Setting to be the survival function of , we note that , and so, taking to satisfy makes . The restricted mean lifetime function satisfies this condition. Using this transformation, the estimator of the monotone hazard function only requires estimation of .

In Section 5, we again extend estimation of a monotone hazard function to allow the data to be subject to possibly informative right-censoring using the same one-step estimator of that will be introduced in Example 1 and the data-dependent transformation . We will show that, once the simpler details regarding the estimation of a monotone density are established, the asymptotic properties of this estimator of a monotone hazard are obtained essentially for free.

Example 3: monotone regression function

As our last example, we study estimation of a non-decreasing regression function. In the simplest setup, the data unit is and we are interested in . Assume without loss of generality that the data are sorted according to the observed values of . Taking to be the support of and to be the marginal distribution function of , we have that for each , and for each . Thus, and are natural nonparametric estimators of and , respectively. Then, is the classical monotone least-squares estimator of .
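For the simplest regression setup above, the monotone least-squares fit can also be computed directly with the pool-adjacent-violators algorithm (PAVA). The sketch below is illustrative only: the function name `pava` is hypothetical, and the inputs are assumed to be the responses already sorted by the observed covariate values, as in the text.

```python
from typing import List, Optional, Sequence


def pava(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares non-decreasing fit to y (pool adjacent violators).

    y must already be ordered by the covariate. Each block stores
    [weighted mean, total weight, number of observations].
    """
    if w is None:
        w = [1.0] * len(y)
    blocks: List[List[float]] = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) >= 2 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * int(count))
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

With unit weights, the fitted values coincide with the slopes of the greatest convex minorant of the cumulative sum diagram of the sorted responses, which is the sense in which the least-squares estimator is of Grenander type.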

In Section 5, we consider an extension to estimation of a covariate-marginalized regression function, for use when the relationship between exposure and outcome of interest is confounded. Specifically, we will consider the data unit , with representing a vector of potential confounders, and focus on . Under untestable causal identifiability conditions, is the mean of the counterfactual outcome obtained by setting exposure at level . This parameter plays a critical role in causal inference, particularly when the available data are obtained from an observational study and the exposure assignment process may be informative. As before, tackling this more complex parameter will require estimation of certain nuisance functions.

3 General results

We begin with our first set of results on the large-sample properties of . Our goal is to establish conditions under which consistency and pointwise convergence in distribution hold. First, we provide general results on the consistency of , both pointwise and uniformly. We note that the results of Anevski and Hössjer (2006), Durot (2007), Durot et al. (2012) and Lopuhaä and Musta (2016) imply conditions for consistency of Grenander-type estimators. However, because the objective of their work is to establish distributional theory for a global discrepancy between the estimated and true monotone function, the conditions they require are stronger than needed for consistency alone. Also, their work is restricted to Grenander-type estimators, without data-dependent transformations of the domain.

Below, we refer to the sets and for .

Theorem 1 (Weak consistency).
1. Suppose is continuous at and, for some such that , is strictly increasing and continuous on . If , and tend to zero in probability, then .

2. Suppose and are uniformly continuous on , and is strictly increasing on . If and tend to zero in probability, then for each fixed .

We note that in part 1 of Theorem 1, we require uniform convergence of and to obtain a pointwise result for – this will also be the case for Theorem 2 below. This is because the GCM is a global procedure, and so, the value of depends on even for not near . Without uniform consistency of , may indeed fail to be pointwise consistent. Also, we note that in part 1 of Theorem 1, we require that and tend to zero uniformly over the set . This requirement stems from the fact that only depends on through the composition , and so, values of only matter at points in the range of . In part 1, we also require that tend to zero uniformly in a neighborhood of , while in part 2, we require that tend to zero uniformly over . These requirements allow us to obtain results for values that are possibly outside for all . In many applications, it may be the case that and both tend to zero in probability uniformly over , which implies convergence over .

The weak conditions required for Theorem 1 are especially important for the extensions of the classical parameters that we consider in Section 5. The estimators we propose often require estimating difficult nuisance parameters, such as conditional hazard, density and mean functions. While under mild conditions it is typically possible to construct uniformly consistent estimators of these nuisance parameters, ensuring a given local or uniform rate of convergence often requires additional knowledge about the true function. Thus, Theorem 1 is useful for guaranteeing consistency under weak conditions.

We now provide lower bounds on the convergence rate of , both pointwise and uniformly, depending on (a) the uniform rates of convergence of and , and (b) the moduli of continuity of and .

Theorem 2 (Rates of convergence).

Let be given. Suppose that, for some , and is strictly increasing and continuous on . Let be a fixed sequence such that , and are bounded in probability.

1. If there exist and such that for all and for all , then

$$r_n^{\alpha_1\alpha_2/(1+\alpha_1\alpha_2)}\,[\theta_n(x)-\theta_0(x)] = O_P(1).$$
2. If is constant on , then .

Let be a fixed sequence such that and are bounded in probability, and suppose that is strictly increasing on .

1. If there exist and such that for all and for all , then

$$r_n^{\alpha_1\alpha_2/(1+\alpha_1\alpha_2)}\,\|\theta_n-\theta_0\|_{\infty,\,I_{n,\beta_n}} = O_P(1)$$

for any (possibly random) positive real sequence such that .

We note here that the uniform results only cover subintervals of the interval over which the GCM procedure is performed. This should not be surprising given the poor behavior of Grenander-type estimators at the boundary of the GCM interval, as discussed, for example, in Woodroofe and Sun (1993), Kulikov and Lopuhaä (2006) and Balabdaoui et al. (2011). Various boundary corrections have been proposed – applying these in our general framework is an interesting avenue for future work.

We also note that, in Theorem 2, when and are locally or globally Lipschitz, then $\alpha_1 = \alpha_2 = 1$ and the resulting rate is $r_n^{1/2}$, which yields $n^{1/4}$ when $r_n = n^{1/2}$. This rate is slower than the $n^{1/3}$ rate that is often achievable for pointwise convergence when and are differentiable at and the primitive estimator converges at rate $n^{1/2}$, as we discuss below. However, the assumptions in Theorem 2 are significantly weaker than typically required for the $n^{1/3}$ rate of convergence: they constrain the supremum norm of the estimation error rather than its modulus of continuity, and hold when the true function is Lipschitz but not differentiable. Our results also cover situations in which or are in Hölder classes. The rates provided by Theorem 2 should thus be seen as lower bounds on the true rate, for use when less is known about the properties of the estimation error or of the true functions. The distributional results we provide below recover the usual rates under stronger conditions.
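As a small worked check of the arithmetic in this remark, the sketch below evaluates the exponent appearing in Theorem 2, read here as α1α2/(1 + α1α2), using exact fractions. The helper names `rate_exponent` and `rate_in_n` are hypothetical and serve only to illustrate the calculation.

```python
from fractions import Fraction


def rate_exponent(alpha1, alpha2) -> Fraction:
    """Power applied to r_n in the rate bound: a1*a2 / (1 + a1*a2)."""
    product = Fraction(alpha1) * Fraction(alpha2)
    return product / (1 + product)


def rate_in_n(r_exponent, alpha1, alpha2) -> Fraction:
    """If r_n = n**r_exponent, return the power of n in the resulting rate."""
    return Fraction(r_exponent) * rate_exponent(alpha1, alpha2)


if __name__ == "__main__":
    # Lipschitz case: both exponents equal to one.
    print(rate_exponent(1, 1))                 # 1/2
    print(rate_in_n(Fraction(1, 2), 1, 1))     # 1/4
    # A Hölder-type case with both exponents equal to one half.
    print(rate_exponent(Fraction(1, 2), Fraction(1, 2)))  # 1/5
```

The exponent increases towards one as the product α1α2 grows, so smoother moduli of continuity bring the guaranteed rate closer to the rate of the primitive and transformation estimators themselves.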

For a fixed sequence of positive real numbers, we now study the pointwise convergence in distribution of at an interior point at which has a strictly positive derivative. The rate depends on two interdependent factors. First, we suppose that there exists some such that as for some constant . Second, writing and , we suppose that there exists a sequence of positive real numbers such that the appropriately localized process

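$$W_{n,x} : u \mapsto c_n^{\alpha+1}\left\{\Gamma_{n,0}(x + u c_n^{-1}) - \Gamma_{n,0}(x) - \theta_0(x)\left[\Phi_{n,0}(x + u c_n^{-1}) - \Phi_{n,0}(x)\right]\right\}$$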

converges weakly. We note that depends on . As we formalize below, if , then has a nondegenerate limit distribution under some conditions. We now introduce some of the conditions that we build upon:

(A1) for each , converges weakly in to a tight limit process with almost surely lower semi-continuous sample paths;

(A2) for every , ;

(A3) there exist , and a sequence such that is decreasing, , and for all large and .

In addition, we introduce conditions on the uniform convergence of estimators and :

(A4) for some ;

(A5) .

Theorem 3 (Convergence in distribution).

If is an interior point of at which is continuously differentiable with positive derivative and satisfies , conditions (A1)–(A5) imply that

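$$r_n[\theta_n(x)-\theta_0(x)] \xrightarrow{d} \Phi_0'(x)^{-1}\,\partial_{-}\mathrm{GCM}_{\mathbb{R}}\left\{v \mapsto W_x(v) + \left[\frac{\pi_0(x)\Phi_0'(x)}{\alpha+1}\right]|v|^{\alpha+1}\right\}(0)$$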

with . If in addition , and possesses stationary increments, then

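$$r_n[\theta_n(x)-\theta_0(x)] \xrightarrow{d} -\theta_0'(x)\,\operatorname*{argmin}_{u\in\mathbb{R}}\left\{W_x(u) + \tfrac{1}{2}\theta_0'(x)\Phi_0'(x)u^2\right\}.$$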

Furthermore, if with a standard two-sided Brownian motion process satisfying , then with and .

The latter limit distribution is referred to as a scaled Chernoff distribution, since is said to follow the standard Chernoff distribution. This distribution appears prominently in classical results in nonparametric monotone function estimation and has been extensively studied (e.g., Groeneboom and Wellner, 2001). It can also be defined as the distribution of the slope at zero of .
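To give a feel for the standard Chernoff distribution, the following sketch draws approximate samples by discretizing a two-sided Brownian motion on a bounded grid and locating the minimizer of the path plus a quadratic drift. It is a crude Monte Carlo illustration under stated assumptions (a finite window and a fixed grid step, both chosen arbitrarily here), not a method used in the paper; the function name `simulate_chernoff_argmin` is hypothetical.

```python
import math
import random


def simulate_chernoff_argmin(rng: random.Random,
                             half_width: float = 3.0,
                             step: float = 0.01) -> float:
    """One approximate draw of argmin_u { W(u) + u**2 } over a grid on
    [-half_width, half_width], where W is two-sided Brownian motion
    with W(0) = 0.

    The two half-lines use independent Gaussian random walks, which is the
    standard construction of a two-sided Brownian motion.
    """
    n_steps = int(round(half_width / step))
    sd = math.sqrt(step)  # standard deviation of each Brownian increment
    best_u, best_val = 0.0, 0.0  # at u = 0 the objective equals W(0) = 0
    for sign in (1.0, -1.0):
        w = 0.0
        for k in range(1, n_steps + 1):
            w += rng.gauss(0.0, sd)
            u = sign * k * step
            val = w + u * u
            if val < best_val:
                best_val, best_u = val, u
    return best_u


if __name__ == "__main__":
    rng = random.Random(2024)
    draws = [simulate_chernoff_argmin(rng, half_width=2.5, step=0.02)
             for _ in range(500)]
    mean = sum(draws) / len(draws)
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
    print(f"sample mean {mean:.3f}, sample sd {sd:.3f}")
```

Because the quadratic term dominates quickly, truncating the window at a few units loses very little mass, but both the window and the grid step introduce approximation error that this sketch does not quantify.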

Theorem 3 applies in the common setting in which is differentiable at with positive derivative – in other words, when . However, as in Wright (1981) and Anevski and Hössjer (2006), Theorem 3 also applies in additional situations, including when has derivatives at , with null derivatives of order and positive derivative of order . Nevertheless, Theorem 3 does not cover situations in which is flat in a neighborhood of . The limit distribution of the Grenander estimator at flat points was studied in Carolan and Dykstra (2008), but it appears that similar results have not been derived for Grenander-type or generalized Grenander-type estimators.

We note the similarity of our Theorem 3 to Theorem 2 of Anevski and Hössjer (2006). For the special case in which is the identity transform, the consequents of the two results coincide. Our result explicitly permits alternative transforms. Both results require weak convergence of a stochastic part of the primitive process, and also require the same local rate of growth of . Additionally, condition (A2) is implied if for every and positive, there exists a finite such that , as in Assumption A5 of Anevski and Hössjer (2006). However, the remaining conditions and methods of proof differ. To prove our result, we first generalize the switch relation of Groeneboom (1985) and use it to convert into the probability that the minimizer of a process involving falls below some value. After establishing weak convergence of this process, we then use conditions (A2) through (A5) to justify application of the argmin continuous mapping theorem. In contrast, Anevski and Hössjer (2006) establish their result using a direct appeal to convergence in distribution of to , where is a local limit process and its weak limit. They also provide lower-level sufficient conditions for this convergence. It may be possible to establish the consequent of Theorem 3, permitting in particular the use of a non-trivial transformation , using Theorem 2 of Anevski and Hössjer (2006) or a suitable generalization thereof. We have specified our sufficient conditions with applications to the setting and in mind, as we discuss at length in the next section.

Suppose that is the limit process that arises when no domain transformation is used in the construction of a generalized Grenander-type estimator, that is, when both and are taken to be the identity map. In this case, under (A1)–(A5), Theorem 3 indicates that

It is natural to ask how this limit distribution compares to the one obtained using a non-trivial transformation . In particular, does using change the pointwise distributional results for ? The answer is of course negative whenever and are equal in distribution, since is a homogeneous operator. A more detailed discussion of this question and lower-level conditions are provided in the next section.
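To make the role of a domain transformation concrete, the following Python sketch may be helpful. It is an illustration under stated assumptions rather than part of our formal development, and the names `gcm_left_slopes` and `isotonic_fit` are hypothetical. It relies on the standard fact that the isotonic least-squares fit is the left derivative of the greatest convex minorant of the cumulative sum diagram. We read this as the case in which the transformation is the empirical distribution function of the covariate, so that the ordered covariate values map to i/n, and the primitive estimator is the scaled cumulative sum of the responses.

```python
import numpy as np

def gcm_left_slopes(u, g):
    """Left derivative of the greatest convex minorant (GCM) of the points
    (u[0], g[0]), ..., (u[m], g[m]), evaluated at u[1], ..., u[m].
    Assumes u is strictly increasing."""
    hull = [0]  # indices of the GCM vertices found so far
    for j in range(1, len(u)):
        # Pop the last vertex while it lies on or above the chord to point j.
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            slope_in = (g[i1] - g[i0]) / (u[i1] - u[i0])
            slope_out = (g[j] - g[i1]) / (u[j] - u[i1])
            if slope_out <= slope_in:
                hull.pop()
            else:
                break
        hull.append(j)
    slopes = np.empty(len(u) - 1)
    # The GCM is linear between consecutive vertices.
    for i0, i1 in zip(hull[:-1], hull[1:]):
        slopes[i0:i1] = (g[i1] - g[i0]) / (u[i1] - u[i0])
    return slopes

def isotonic_fit(y_sorted):
    """Isotonic fit of responses already ordered by covariate value.
    The domain is transformed to the grid i/n (assumed: no covariate ties)."""
    y = np.asarray(y_sorted, dtype=float)
    n = len(y)
    u = np.arange(n + 1) / n                       # transformed domain
    g = np.concatenate(([0.0], np.cumsum(y) / n))  # primitive on that domain
    return gcm_left_slopes(u, g)

print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))  # the violating pair (3, 2) is pooled to 2.5
```

Working on the transformed grid i/n makes the result depend on the covariate values only through their ranks, which is one practical appeal of this particular transformation.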

4 Refined results for asymptotically linear primitive and transformation estimators

4.1 Distributional results

In applications of their main result, Anevski and Hössjer (2006) focus primarily on providing lower-level conditions to characterize the relationship between various dependence structures and asymptotic results for monotone regression and density function estimation. Anevski and Soulier (2011), Dedecker et al. (2011) and Bagchi et al. (2016) provide additional applications of Anevski and Hössjer (2006) to monotone function estimation with dependent data. Our Theorem 3 could be used, for instance, to relax the common assumption of a uniform design in the analysis of monotone regression estimators. Here, we pursue an alternative direction, focusing instead on providing lower-level conditions for consistency of and convergence in distribution of for use in the important setting in which , , the data are independent and identically distributed, and and are asymptotically linear estimators. Such settings arise frequently, for instance, when the primitive and transformation parameters are smooth mappings of the data-generating mechanism.

Below, we write to denote for any probability measure and -integrable function . We also use to denote the empirical distribution of independent observations from so that for any .

Suppose that there exist functions $D^*_{x,0}$ and $L^*_{x,0}$ depending on such that, for each , and both and are finite, and

$$\Gamma_n(x) - \Gamma_0(x) = \mathbb{P}_n D^*_{x,0} + H_{x,n} \quad\text{and}\quad \Phi_n(x) - \Phi_0(x) = \mathbb{P}_n L^*_{x,0} + R_{x,n}, \tag{2}$$

where $H_{x,n}$ and $R_{x,n}$ are stochastic remainder terms. If and tend to zero in probability, we say that $\Gamma_n$ and $\Phi_n$ are uniformly asymptotically linear over as estimators of $\Gamma_0$ and $\Phi_0$, respectively. The objects $D^*_{x,0}$ and $L^*_{x,0}$ are referred to as the influence functions of $\Gamma_n$ and $\Phi_n$, respectively, under sampling from .
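As a simple illustration of display (2), consider the case in which the primitive estimator is the empirical distribution function. Its influence function is then the centered indicator and the remainder term vanishes identically. The Python sketch below checks this numerically; the Uniform(0, 1) data-generating mechanism, the sample size and the function names are assumptions made only for this example.

```python
import numpy as np

def influence_ecdf(obs, x, F0_x):
    """Influence function of the empirical CDF at x: 1{o <= x} - F0(x)."""
    return (obs <= x).astype(float) - F0_x

def linearization_remainder(obs, x, F0):
    """Remainder Gamma_n(x) - Gamma_0(x) - P_n D*_{x,0} when Gamma_n is the
    empirical CDF and Gamma_0 = F0 is the true CDF."""
    gamma_n = np.mean(obs <= x)
    linear_term = np.mean(influence_ecdf(obs, x, F0(x)))
    return gamma_n - F0(x) - linear_term

rng = np.random.default_rng(0)
obs = rng.uniform(size=500)                  # assumed Uniform(0, 1) sample
F0 = lambda x: float(np.clip(x, 0.0, 1.0))   # true CDF under that assumption

remainders = [linearization_remainder(obs, x, F0) for x in np.linspace(0.1, 0.9, 9)]
print(max(abs(r) for r in remainders))  # zero up to floating-point rounding
```

For estimators that are nonlinear functionals of the data-generating mechanism, the remainder is generally nonzero, and the uniform negligibility requirement becomes a substantive condition.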

Assessing consistency and uniform consistency of is straightforward when display (2) holds. For example, if the classes and are -Donsker, and and are bounded in probability, then and are both bounded in probability. Thus, Theorems 1 and 2 can be directly applied with provided the required conditions on and hold. As such, we focus here on deriving a refined version of Theorem 3 for use whenever display (2) holds.

It is reasonable to expect the linear terms and to drive the behavior of the standardized difference in Theorem 3. The natural rate here is , for which Kim and Pollard (1990) provide intuition. Our first goal in this section is to provide sufficient conditions for weak convergence of the process , where is the empirical process and we define the localized difference function . Kim and Pollard (1990) also provide detailed conditions for weak convergence of processes of this type. Building upon their results, we are able to provide simplified sufficient conditions for convergence in distribution of when and are uniformly asymptotically linear estimators.

We begin by introducing conditions we will refer to. First, we define and suppose that has envelope function . The first two conditions concern the size of for small in terms of bracketing or uniform entropy numbers, which for completeness we define here – see van der Vaart and Wellner (1996) for a comprehensive treatment. Denote by the norm of a given -square-integrable function . The bracketing number of a class with respect to the norm is the smallest number of -brackets needed to cover , where an -bracket is any set of functions with and such that . The covering number of with respect to the norm is the smallest number of -balls in required to cover . The uniform covering number is the supremum of over all discrete probability measures such that , where is an envelope function for . We consider conditions on the size of :


(B1) for some constants and , either (B1a) or (B1b) for all and small enough;

(B2) , and for all , , as .

Condition (B1) replaces the notion of uniform manageability of the class for small as defined in Kim and Pollard (1990), whereas condition (B2) directly corresponds to their condition (vi). Since bounds on the bracketing and uniform entropy numbers have been derived for many common classes of functions, condition (B1) can be readily checked in practice. Together, conditions (B1) and (B2) ensure that is a relatively small class, and this helps to establish the weak convergence of the localized process .

As in Kim and Pollard (1990), to guarantee that the covariance function of this localized process stabilizes, it suffices that be bounded for small enough and that, up to a scaling factor possibly depending on , tend to the covariance function of a two-sided Brownian motion as . Below, we provide simple conditions that imply these two statements for a broad class of settings that includes our examples.
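The stabilization of the covariance function can be seen explicitly in a toy case. The sketch below assumes that the primitive estimator is the empirical distribution function, that the data are Uniform(0, 1), and that the localized difference at offset u is the difference of the influence functions at x + u and at x; the scaling factor is then 1/δ. These choices, like the function names, are assumptions made for illustration only. Under them the covariance is available in closed form, so no simulation is needed.

```python
def localized_cov_uniform(u, v):
    """Exact covariance under Uniform(0, 1) of g_u and g_v, where
    g_u(o) = 1{o <= x + u} - 1{o <= x} - u and x, x + u, x + v lie in (0, 1).
    The result does not depend on x because the density is constant."""
    overlap = min(abs(u), abs(v)) if u * v > 0 else 0.0
    return overlap - u * v

def two_sided_bm_cov(a, b):
    """Covariance of a standard two-sided Brownian motion at times a and b."""
    return min(abs(a), abs(b)) if a * b > 0 else 0.0

def scaled_gap(delta, a, b):
    """Distance between the rescaled localized covariance and its limit."""
    scaled = localized_cov_uniform(delta * a, delta * b) / delta
    return abs(scaled - two_sided_bm_cov(a, b))

for delta in (0.1, 0.01, 0.001):
    print(delta, scaled_gap(delta, 1.0, 2.0), scaled_gap(delta, -1.0, 2.0))
```

In this example the gap equals δ|ab|, so it vanishes linearly in δ. The term −uv that produces it comes from the smooth part of the covariance, which does not affect the limit.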

The covariance function of the Gaussian process to which converges weakly is defined pointwise as . The behavior of near dictates the covariance of the local limit process and hence the scale parameter . If is differentiable in at , it follows that and converges at a faster rate, although possibly with an asymptotic bias. When instead scaled Chernoff asymptotics apply, the covariance function can typically be written as

$$\Sigma_0(s,t) = \Sigma^*_0(s,t) + \iint_{-\infty}^{s \wedge t} A_0(s,t,v,w)\, H_0(dv, w)\, Q_0(dw) \tag{3}$$

for some functions , and depending on , where is a probability measure induced by on some measurable space . In this representation, is taken to be the differentiable portion of the covariance function, which does not contribute to the scale parameter. The second summand is not differentiable at and makes tend to a non-zero limit. We consider cases in which , and satisfy the following conditions:


(B3) Representation (3) holds, and for some , setting , it is also true that:


(B3a) is symmetric in its arguments and continuously differentiable on ;

(B3b) is symmetric in its first two arguments, and is differentiable for -almost every and each , with derivative continuous in each in for -almost every and satisfying the boundedness condition

 ∬x+δ−∞sups,t∈Bδ(x)|A′0(s,t,v,w)|H0(dv,w)