1 Introduction
Consider the model
(1)  $Y_i = f(X_i) + V^{1/2}(X_i)\, \varepsilon_i, \qquad i = 1, \dots, n,$
where $X_1, \dots, X_n$ are independent and identically distributed (i.i.d.) univariate random design points, and $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. with zero mean, unit variance, and are independent of $\{X_i\}_{i=1}^n$. In this paper, we study the optimal estimation of the variance function $V$ under both local and global squared risks. Variance estimation is a fundamental statistical problem [Von Neumann, 1941, 1942; Rice, 1984; Hall et al., 1990] with wide applications. It is useful in, for example, construction of confidence bands for the mean function, estimation of the signal-to-noise ratio [Verzelen and Gassiat, 2018], and selection of the optimal kernel bandwidth [Fan, 1992].
When the design points $x_1, \dots, x_n$ are fixed, estimation of $V$ in (1) has been studied in, for example, Müller and Stadtmüller [1987], Hall and Carroll [1989], Ruppert et al. [1997], Härdle and Tsybakov [1997], Fan and Yao [1998], and Wang et al. [2008]. In particular, when $x_i = i/n$, and $f$ and $V$ in (1) are $\alpha$- and $\beta$-Hölder smooth, respectively, Wang et al. [2008] established a minimax rate of the order $n^{-4\alpha} \vee n^{-\frac{2\beta}{2\beta+1}}$ under both local and global squared risks. Here, for two real numbers $a$ and $b$, $a \vee b := \max(a, b)$.
In contrast, our study focuses on the case where $X_1, \dots, X_n$ are i.i.d. random design points on the real line. For this, we show that when $f$ and $V$ in (1) are $\alpha$- and $\beta$-Hölder smooth, respectively, the minimax rate of estimating $V$ is of the order $n^{-\frac{8\alpha\beta}{4\alpha\beta + 2\alpha + \beta}} \vee n^{-\frac{2\beta}{2\beta+1}}$ under both local and global squared risks. This result has several noteworthy implications:

The minimax rates in the random and fixed design settings share a common component, $n^{-\frac{2\beta}{2\beta+1}}$, as well as the same transition boundary $\alpha = \frac{\beta}{4\beta+2}$.

For $\alpha < \frac{\beta}{4\beta+2}$, a faster rate is achievable with a random design.

Unlike in the fixed design setting, for $\alpha < \frac{\beta}{4\beta+2}$, the smoothness indices $\alpha$ and $\beta$ are now entangled in the minimax rate in the random design case.
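Writing the fixed-design rate as $n^{-4\alpha} \vee n^{-\frac{2\beta}{2\beta+1}}$ and the random-design rate as $n^{-\frac{8\alpha\beta}{4\alpha\beta+2\alpha+\beta}} \vee n^{-\frac{2\beta}{2\beta+1}}$, the shared boundary and the speed-up below it follow from elementary algebra; a worked check:

```latex
% Fixed design: the two components balance when
4\alpha = \tfrac{2\beta}{2\beta+1}
  \iff \alpha = \tfrac{\beta}{4\beta+2}.
% Random design: the two components balance when
\tfrac{8\alpha\beta}{4\alpha\beta+2\alpha+\beta} = \tfrac{2\beta}{2\beta+1}
  \iff 4\alpha(2\beta+1) = 4\alpha\beta + 2\alpha + \beta
  \iff 4\alpha\beta + 2\alpha = \beta
  \iff \alpha = \tfrac{\beta}{4\beta+2}.
% Below the boundary, the random design rate is strictly faster:
\tfrac{8\alpha\beta}{4\alpha\beta+2\alpha+\beta} > 4\alpha
  \iff 2\beta > 4\alpha\beta + 2\alpha + \beta
  \iff \alpha < \tfrac{\beta}{4\beta+2}.
```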
We now discuss this minimax rate in more detail. The upper bound of the minimax rate is achieved by an estimator that combines local polynomial regression with pairwise differences, the latter of which is formulated via U-statistics. Our analysis of this estimator hence relies on the four-term Bernstein inequality in Giné et al. [2000], and, unlike classic kernel methods, requires no smoothness assumption on the design density.
For the lower bound, due to the entanglement of $\alpha$ and $\beta$ in the minimax rate and the additional randomness of $X_1, \dots, X_n$, the derivation is much more involved than its counterpart in the fixed design setting. We tackle the first difficulty, the entanglement of $\alpha$ and $\beta$, via a proper localization technique in the construction of the mean function $f$, depicted in Figure 2 in Section 3.2. The second difficulty, caused by the randomness of $X_1, \dots, X_n$, is resolved with a new trapezoid-shaped construction of the mean $f$, aided by a result due to Kolchin et al. [1978] on the sparse multinomial distribution. This result helps characterize the asymptotic behavior of the locations of $X_1, \dots, X_n$ and plays a key role in our proof, but to our knowledge it has not been widely used in the nonparametric statistics literature.
In the special case of constant variance, (1) reduces to
(2)  $Y_i = f(X_i) + V^{1/2} \varepsilon_i, \qquad i = 1, \dots, n,$
and the goal becomes estimation of the scalar $V$. In this case, the problem is linked to estimation of a quadratic functional, which has been studied in depth in the other two benchmark nonparametric models: the density model [Bickel and Ritov, 1988; Laurent, 1996; Giné and Nickl, 2008] and the white noise model [Donoho and Nussbaum, 1990; Fan, 1991; Laurent and Massart, 2000]. In the density model, one observes an i.i.d. univariate sequence from some unknown density $f$, and the goal is to estimate $\int f^2$. In the white noise model, one observes a continuous-time process $\{Y_t\}$ from $\mathrm{d}Y_t = f(t)\,\mathrm{d}t + n^{-1/2}\,\mathrm{d}W_t$ for $t \in [0, 1]$, with $\{W_t\}$ a standard Wiener process, and the goal is to estimate $\int_0^1 f^2$. Under an $\alpha$-smoothness condition on $f$, the minimax rate in both of the aforementioned cases is $n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1}$ (cf. Theorems 1(ii) and 2(ii) in Bickel and Ritov [1988], Theorem 4 in Fan [1991]).
Following Doksum and Samarov [1995], a quadratic functional of interest under (2) is
(3)  $Q = \int f^2(x)\, w(x)\, p(x)\, \mathrm{d}x,$
where $p$ is the design density and $w$ is some weight function. Assuming in (2) that $f$ is $\alpha$-Hölder smooth, we show that the minimax rate of estimating $V$ and $Q$ (when $p$ is unknown) is $n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1}$, thereby unifying the minimax rate of quadratic functional estimation in all three benchmark nonparametric models.
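As a consistency check on the rates as written above, the heteroscedastic random-design rate recovers this constant-variance rate when $V$ becomes arbitrarily smooth:

```latex
\lim_{\beta \to \infty} \frac{8\alpha\beta}{4\alpha\beta + 2\alpha + \beta}
  = \lim_{\beta \to \infty} \frac{8\alpha}{4\alpha + 2\alpha/\beta + 1}
  = \frac{8\alpha}{4\alpha + 1},
\qquad
\lim_{\beta \to \infty} \frac{2\beta}{2\beta + 1} = 1,
```

so $n^{-\frac{8\alpha\beta}{4\alpha\beta+2\alpha+\beta}} \vee n^{-\frac{2\beta}{2\beta+1}}$ tends to $n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1}$, matching the rate for model (2).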
In this paper, we also provide extensions of (2) to multivariate cases, with a focus on the multivariate nonparametric regression model
(4)  $Y_i = f(X_i) + V^{1/2} \varepsilon_i, \qquad i = 1, \dots, n,$
and the nonparametric additive model
(5)  $Y_i = \sum_{j=1}^{d} f_j(X_{ij}) + V^{1/2} \varepsilon_i, \qquad i = 1, \dots, n,$
in both fixed and random designs. Here, $X_i = (X_{i1}, \dots, X_{id})^\top \in \mathbb{R}^d$ for some fixed positive integer $d$. Details are given in Sections 4.1 and 4.2.
model | stated in | minimax rate | transition boundary
(1), fixed | Wang et al. [2008] | $n^{-4\alpha} \vee n^{-\frac{2\beta}{2\beta+1}}$ | $\alpha = \frac{\beta}{4\beta+2}$
(1), random | Theorems 3, 4, 5 | $n^{-\frac{8\alpha\beta}{4\alpha\beta+2\alpha+\beta}} \vee n^{-\frac{2\beta}{2\beta+1}}$ | $\alpha = \frac{\beta}{4\beta+2}$
(2), fixed | Wang et al. [2008] | $n^{-4\alpha} \vee n^{-1}$ | $\alpha = \frac{1}{4}$
(2), random | Theorems 1, 2 | $n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1}$ | $\alpha = \frac{1}{4}$
(4), fixed (GD) | Proposition 3 | |
(4), fixed (DD) | Proposition 4 | |
(4), random | Propositions 1, 2 | |
(5), fixed (GD) | Proposition 5 | |
(5), fixed (DD) | Proposition 6 | |
(5), random | Propositions 7, 8 | |
We summarize the minimax rates in all of the aforementioned models in Table 1.
The rest of the paper is organized as follows. Section 2 presents the simple model (2) with constant variance. Section 3 discusses its heteroscedastic extension (1). Section 4 discusses the multivariate nonparametric regression model (4), the additive model (5), and connections of our study to quadratic functional estimation as well as to variance estimation in the linear model. The essential lower bound proof of the minimax rate under model (2) is presented in Section 5, with the rest of the proofs given in a supplement.
The notation used throughout the paper is as follows. For any positive integer $n$, $[n]$ denotes the set $\{1, \dots, n\}$. For any real number $x$, we use $\lceil x \rceil$ to denote the smallest integer greater than or equal to $x$. For any positive integer $d$, $\mathbf{0}_d$ denotes the zero vector of dimension $d$, and $\mathbf{I}_d$ denotes the identity matrix of dimension $d$. For a real vector $v$, $\| v \|_\infty$ denotes its infinity norm. For a real matrix $M$, $\| M \|$ denotes its spectral norm and $\det(M)$ denotes its determinant. For an $m$ times differentiable function $g$ with some positive integer $m$, we use $g^{(k)}$ to denote its $k$th derivative for $k \in [m]$. For identically distributed random variables $X$ and $Y$, we use $P_X$ and $p_X$ to denote the distribution and density of $X$, $P_{X,Y}$ to denote the joint distribution of $(X, Y)$, and $p_{X,Y}$ to denote the density of $(X, Y)$. Similar notation applies to identically distributed random vectors. For a positive integer $d$ and $\mu \in \mathbb{R}^d$, $N_d(\mu, \Sigma)$ stands for the $d$-dimensional normal distribution with mean $\mu$ and covariance $\Sigma$; we will drop the subscript for simplicity when $d = 1$. $\Phi$ and $\varphi$ represent the standard normal distribution and density. For two probability measures $P$ and $Q$ defined on a common space $\mathcal{X}$, $\mathrm{TV}(P, Q)$ denotes their total variation distance, that is, $\mathrm{TV}(P, Q) = \sup_{A \subseteq \mathcal{X}} | P(A) - Q(A) |$. For two real positive sequences $\{a_n\}$ and $\{b_n\}$, $a_n \lesssim b_n$ if $a_n \leq C b_n$ for some absolute constant $C > 0$. We say $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$.

2 Homoscedastic case
To illustrate some of the main ideas developed in this paper, we begin with a discussion of the elementary univariate homoscedastic nonparametric regression model (2):
$Y_i = f(X_i) + V^{1/2} \varepsilon_i, \qquad i = 1, \dots, n.$
Here, $X_1, \dots, X_n$ are i.i.d. copies of a univariate random variable $X$, $f$ belongs to an $\alpha$-Hölder class that will be specified soon, and $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. copies of a variable $\varepsilon$ with zero mean and unit variance and are independent of $\{X_i\}_{i=1}^n$. Both the mean function $f$ and the distributions of $X$ and $\varepsilon$ are assumed unknown.
Model (2) has been extensively studied using residual-based and difference-based methods; see, among many others, Von Neumann [1941], Von Neumann [1942], Rice [1984], Gasser et al. [1986], Hall et al. [1990], Hall and Marron [1990], Thompson et al. [1991], Wang et al. [2008]. A related functional estimation problem has also been studied in semiparametric models [Robins et al., 2008, 2009]. Most of the previous studies focus on the case of fixed design, especially the equidistant design with $x_i = i/n$, $i \in [n]$, for which the minimax rate of estimating $V$ under an $\alpha$-Hölder smoothness constraint on $f$ is known to be $n^{-4\alpha} \vee n^{-1}$ (cf. Theorems 1 and 2 in Wang et al. [2008]).
In detail, let $I$ be a fixed (possibly infinite) interval on the real line. Define the $\alpha$-Hölder class on $I$ as follows:
(6)  $\mathcal{H}_\alpha(M, I) = \Big\{ g : I \to \mathbb{R} \,:\, \big| g^{(\lfloor \alpha \rfloor)}(x) - g^{(\lfloor \alpha \rfloor)}(y) \big| \leq M | x - y |^{\alpha - \lfloor \alpha \rfloor} \text{ for all } x, y \in I \Big\},$
where $\lfloor \alpha \rfloor$ is the largest integer strictly smaller than $\alpha$ and $M$ is a fixed positive constant. Denote the support of $X$ as $\operatorname{supp}(X)$.
Define the joint distribution class $\mathcal{P}_{\mathrm{cv}}$ (where “cv” stands for “constant variance”) with the following standard conditions:

(a) $f$ satisfies $f \in \mathcal{H}_\alpha(M, I)$.

(b) $X$ has density $p$, and there exists a fixed positive constant $c_1$ such that $p(x) \leq c_1$ for all $x \in \mathbb{R}$.

(c) There exist two fixed constants $c_2, \delta_0 > 0$ such that for any $\delta \in (0, \delta_0)$, there exists a set $A_\delta \subseteq \operatorname{supp}(X)$ such that $\lambda(A_\delta) \geq \delta$ and $p(x) \geq c_2$ for all $x \in A_\delta$, where $\lambda$ represents the Lebesgue measure on the real line.

(d) $\mathbb{E} \varepsilon^4 \leq c_3$ for some fixed positive constant $c_3$.
The lower bound in Condition (c) above is imposed on $p$ because the estimator constructed below solely requires a sufficient number of close pairs of design points. In addition, Condition (c) essentially requires the density $p$ to be “dense” on part of its support, and is strictly weaker than a uniform lower bound of $p$ over a fixed interval. Moreover, no smoothness condition is placed on the density of $X$.
The rest of the section is devoted to proving the following minimax rate:
(7)  $\inf_{\widehat{V}} \sup_{P \in \mathcal{P}_{\mathrm{cv}}} \mathbb{E}_P \big( \widehat{V} - V \big)^2 \asymp n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1},$
where $P$ denotes the joint distribution of $\{(X_i, Y_i)\}_{i=1}^n$, and $\widehat{V}$ ranges over all estimators of $V$.
2.1 Upper bound
The upper bound is achieved by a difference estimator based on U-statistics (with convention $0/0 = 0$):
(8)  $\widehat{V} = \frac{ \sum_{1 \leq i < j \leq n} \frac{1}{2} (Y_i - Y_j)^2 \, K_h(X_i - X_j) }{ \sum_{1 \leq i < j \leq n} K_h(X_i - X_j) }.$
Here, $K_h(\cdot) = h^{-1} K(\cdot / h)$, where $h$ is a bandwidth parameter satisfying $h \to 0$ as $n \to \infty$, and $K$ is a symmetric density kernel supported on $[-1, 1]$ that satisfies
(9)  $\underline{\kappa} \, \mathbf{1}\{ |u| \leq 1 \} \leq K(u) \leq \overline{\kappa} \, \mathbf{1}\{ |u| \leq 1 \}$
for two fixed constants $\underline{\kappa}, \overline{\kappa} > 0$; one example is the box kernel $K(u) = \frac{1}{2} \mathbf{1}\{ |u| \leq 1 \}$, which satisfies (9) with $\underline{\kappa} = \overline{\kappa} = \frac{1}{2}$.
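As a concrete illustration, here is a minimal sketch of the pairwise-difference estimator (8) with the box kernel; the function name and the vectorized form are our choices, not the paper's:

```python
import numpy as np

def diff_variance_estimator(x, y, h):
    """Pairwise-difference variance estimator in the spirit of (8) with a box
    kernel: average 0.5 * (Y_i - Y_j)^2 over pairs with |X_i - X_j| <= h.
    Convention 0/0 = 0: return 0.0 when no pair falls within the bandwidth."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.abs(x[:, None] - x[None, :])
    close = np.triu(dx <= h, k=1)          # pairs i < j with |X_i - X_j| <= h
    n_pairs = int(close.sum())
    if n_pairs == 0:
        return 0.0
    dy2 = (y[:, None] - y[None, :]) ** 2
    return float(0.5 * dy2[close].sum() / n_pairs)
```

Because $\mathbb{E}\,\tfrac12 (Y_i - Y_j)^2 = V + \tfrac12 \mathbb{E}(f(X_i) - f(X_j))^2$ on close pairs, the bias is driven by the smoothness of $f$ at scale $h$, with no smoothness assumption needed on the design density.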
The following error bound is derived via the exponential inequality for degenerate U-statistics due to Giné et al. [2000].
2.2 Lower bound
The derivation of the lower bound in (7) is much more involved. In particular, the construction in the fixed design setting (cf. Theorem 2 in Wang et al. [2008]) cannot be extended to the random design case, since the spike-type construction of $f$, with one spike located at each deterministic design point, leads to a suboptimal rate in the random design setting. To achieve a sharp rate, we have to exploit the randomness of $X_1, \dots, X_n$; this requires us to handle a highly convoluted alternative hypothesis that no longer leads to a product measure of $Y_1, \dots, Y_n$ given each realization of $X_1, \dots, X_n$ in Le Cam’s two-point method. This calls for a careful analysis of the locations of $X_1, \dots, X_n$.
We now sketch a proof of the component $n^{-\frac{8\alpha}{4\alpha+1}}$ in (7) for $\alpha < 1/4$, with a particular emphasis on where the difference from the fixed design setting arises. The proof can be roughly divided into two steps. In the first step, we construct a two-point testing problem with the null being a Gaussian ($H_0$) and the alternative a Gaussian location mixture ($H_1$). In the second step, we approximate the Gaussian location mixture ($H_1$) by a location mixture with compact support ($H_1'$), which, unlike the alternative in the first step, belongs to the considered model class.
We start by introducing the construction of $f$, $V$, $\varepsilon$, and $X$ under the null and the alternative in the first step. For each $n$, let $m \asymp n^{\frac{2}{4\alpha+1}}$,
and divide the unit interval $[0, 1]$ into $m$ intervals of length $1/m$, with $n$ large enough and the constant in the definition of $m$ chosen such that $m$ is a positive integer.

Choice of $f$: Under $H_0$, let $f \equiv 0$. Under $H_1$, let $f$ be a piecewise-trapezoidal function on the $m$ intervals. That is, for each $k \in [m]$, $f$ takes on a value of $\rho \theta_k$ on the upper base (the middle portion) of the $k$th interval and then linearly decreases to zero at the two endpoints $(k-1)/m$ and $k/m$, with $\theta_1, \dots, \theta_m$ i.i.d. standard normal variables and $\rho > 0$ a small tuning parameter.

Choice of $V$: Under $H_0$, let $V = 1 + \rho^2$. Under $H_1$, let $V = 1$.

Choice of $\varepsilon$: Under both $H_0$ and $H_1$, let $\varepsilon \sim N(0, 1)$.

Choice of $X$: Under both $H_0$ and $H_1$, let $X$ be uniformly distributed over the union of the upper bases of the $m$ trapezoids.
See Figure 1 for an illustration of the construction.
In contrast to the spike-type construction of $f$ in the fixed design setting, our construction is trapezoid-shaped, which guarantees a maximal variation in the mean to compensate for the difference in the variance under the null and alternative. This is unnecessary in the fixed design setting, since the point of maximal variation in the mean (the center of each spike) can be directly placed at each fixed design point, resulting in evenly spaced spikes in the construction of $f$.
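To make the shape concrete, here is a minimal sketch of a piecewise-trapezoidal mean on $[0, 1)$; the function name, the ramp fraction `frac`, and the height values are our illustrative choices, not the paper's exact construction:

```python
import numpy as np

def trapezoid_mean(x, heights, frac=0.5):
    """Piecewise-trapezoidal function on [0, 1): N intervals of length 1/N.
    On interval k, the function rises linearly from 0 at the endpoints to
    heights[k] on the middle `frac` portion (the trapezoid's upper base)."""
    x = np.asarray(x, dtype=float)
    heights = np.asarray(heights, dtype=float)
    N = len(heights)
    k = np.minimum((x * N).astype(int), N - 1)   # interval index of each x
    t = x * N - k                                 # position within interval, in [0, 1)
    ramp = (1.0 - frac) / 2.0                     # width of each linear ramp
    up = np.clip(t / ramp, 0.0, 1.0)              # rising edge
    down = np.clip((1.0 - t) / ramp, 0.0, 1.0)    # falling edge
    return heights[k] * np.minimum(up, down)
```

On the upper base of interval $k$ the function is exactly `heights[k]`, and it vanishes at every interval endpoint, so the pieces glue continuously.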
Under the above construction, conditioning on $X_1, \dots, X_n$, the responses $Y = (Y_1, \dots, Y_n)^\top$ are distributed as
$N_n\big( \mathbf{0}_n, (1 + \rho^2) \mathbf{I}_n \big)$ under $H_0$,
and
$N_n\big( \mathbf{0}_n, \mathbf{I}_n + \rho^2 \Gamma_J \big)$, with $(\Gamma_J)_{ij} = \mathbf{1}\{ J_i = J_j \}$, under $H_1$,
where $J = (J_1, \dots, J_n)$ is the location index sequence of $X_1, \dots, X_n$, defined by $J_i = k$ if and only if $X_i$ falls into the $k$th interval. Denote the joint distribution of $\{(X_i, Y_i)\}_{i=1}^n$ under $H_0$ and $H_1$ by $P_0$ and $P_1$, respectively. With the aid of Lemma 2 that will be stated in Section 5, one can then upper bound $\mathrm{TV}(P_0, P_1)$, which is smaller than a sufficiently small constant provided the constant in the definition of $\rho$ is chosen sufficiently small.
The second step of the proof aims to find a sequence of compactly supported variables $\{\theta_k'\}$ to replace the standard normal sequence $\{\theta_k\}$ in $H_1$, so that for each realization of $\{\theta_k'\}$, the corresponding mean function $f$ in the alternative is $\alpha$-Hölder smooth with a fixed constant. Then, denoting the distribution of such $\{(X_i, Y_i)\}_{i=1}^n$ as $P_1'$, one wishes to approximate the conditional distribution of $Y_1, \dots, Y_n$ given $X_1, \dots, X_n$ in $P_1$ by that in $P_1'$. Even with the aid of moment matching techniques already established in the literature, upper bounding $\mathrm{TV}(P_1, P_1')$ is still nontrivial. Specifically, unlike in the fixed design setting, with high probability the conditional distribution of $Y_1, \dots, Y_n$ given $X_1, \dots, X_n$ is no longer a product measure. This is because multiple $X_i$'s could fall into the same trapezoid in the construction of $f$. This can be handled relatively easily in the first step, since there we only have to analyze the pairwise correlation of $Y_i$ and $Y_j$ depending on whether $X_i$ and $X_j$ fall into the same trapezoid, but it is much less tractable in the second step. More specifically, in order to match moments, we now have to divide the $Y_i$'s into groups based on the memberships of $X_1, \dots, X_n$ among the trapezoids, which naturally requires us to monitor the locations of $X_1, \dots, X_n$, and in particular the number of $X_i$'s that fall into the same trapezoid. This is possible by observing that the memberships of $X_1, \dots, X_n$ follow a sparse multinomial distribution ($m$ bins, $n$ balls, with $m \gg n$), so that a result in Kolchin et al. [1978] can be applied. This allows us to show that, with high probability, the maximum number of $X_i$'s in each trapezoid is bounded by a fixed constant, which, along with Lemma 1 in Section 5, allows us to bound $\mathrm{TV}(P_1, P_1')$ by some sufficiently small constant. Then, by the triangle inequality, $\mathrm{TV}(P_0, P_1') \leq \mathrm{TV}(P_0, P_1) + \mathrm{TV}(P_1, P_1')$ is also small.
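The sparse-multinomial phenomenon invoked above is easy to simulate: with many more bins (trapezoids) than balls (design points), the maximum bin occupancy stays bounded by a small constant with high probability. A sketch with illustrative sizes (the proof's actual bin and ball counts differ):

```python
import numpy as np

def max_bin_occupancy(n_balls, n_bins, rng):
    """Throw n_balls independently and uniformly into n_bins;
    return the maximum number of balls landing in any single bin."""
    counts = np.bincount(rng.integers(0, n_bins, size=n_balls), minlength=n_bins)
    return int(counts.max())

rng = np.random.default_rng(0)
n = 1000
# Sparse regime: roughly n^{3/2} bins for n balls, repeated over 20 draws.
loads = [max_bin_occupancy(n, int(n ** 1.5), rng) for _ in range(20)]
```

A short calculation explains why: the chance that some bin receives at least $r$ balls is at most $\binom{n}{r} m^{1-r}$, which vanishes for fixed $r \geq 3$ once $m \gg n^{3/2}$.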
Details of the above derivation will be given in Section 5. The resulting lower bound is as follows.
Theorem 2.
Under (2) with random design, it holds that
$\inf_{\widehat{V}} \sup_{P \in \mathcal{P}_{\mathrm{cv}}} \mathbb{E}_P \big( \widehat{V} - V \big)^2 \geq c \big( n^{-\frac{8\alpha}{4\alpha+1}} \vee n^{-1} \big),$
where $c$ is some fixed positive constant that only depends on $\alpha$ and the constants in $\mathcal{P}_{\mathrm{cv}}$, and $\widehat{V}$ ranges over all estimators of $V$.
3 Heteroscedastic case
We now study the heteroscedastic model (1),
where $X_1, \dots, X_n$ are i.i.d. copies of $X$ on the real line, $f$ and $V$ are $\alpha$- and $\beta$-Hölder smooth on the fixed (possibly infinite) interval $I$, respectively, and $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. copies of $\varepsilon$ with zero mean and unit variance and are independent of $\{X_i\}_{i=1}^n$. As in Section 2, the smoothness indices $\alpha$ and $\beta$ are assumed known, while $f$, $V$, and the distributions of $X$ and $\varepsilon$ are unknown. For any estimator $\widehat{V}$, the estimation accuracy is measured both locally via
(11)  $\mathbb{E} \big( \widehat{V}(x_0) - V(x_0) \big)^2$
at a point $x_0$ in the support of $X$, and globally via
(12)  $\mathbb{E} \int \big( \widehat{V}(x) - V(x) \big)^2 \, \mathrm{d}P_X(x),$
with $P_X$ the distribution of $X$.
Model (1) has been studied in, for example, Müller and Stadtmüller [1987], Hall and Carroll [1989], Ruppert et al. [1997], Härdle and Tsybakov [1997], Fan and Yao [1998], and Wang et al. [2008], with a focus on the fixed design case. Theorems 1 and 2 in Wang et al. [2008] established a minimax rate of the order $n^{-4\alpha} \vee n^{-\frac{2\beta}{2\beta+1}}$ under the equidistant design $x_i = i/n$, when $f$ and $V$ are $\alpha$- and $\beta$-Hölder smooth on $[0, 1]$.
Define the joint distribution class $\mathcal{P}_{\mathrm{vf}}$ (where “vf” stands for “variance function”) with the following conditions:

(a) $(f, V)$ satisfies $f \in \mathcal{H}_\alpha(M, I)$ and $V \in \mathcal{H}_\beta(M, I)$.

(b) $X$ has density $p$, and there exists a fixed positive constant $c_1$ such that $p(x) \leq c_1$ for all $x \in \mathbb{R}$.

(c) There exist fixed positive constants $c_2$ and $\delta_0$ such that for any $x_0 \in \operatorname{supp}(X)$ and any $\delta \in (0, \delta_0)$, $\lambda\big( \{ x : p(x) \geq c_2 \} \cap [x_0 - \delta, x_0 + \delta] \big) \geq c_2 \delta$, where $\lambda$ is the Lebesgue measure on the real line.

(d) $\mathbb{E} \varepsilon^4 \leq c_3$ for some fixed positive constant $c_3$.
One can readily verify that any distribution in $\mathcal{P}_{\mathrm{vf}}$ with constant variance also belongs to $\mathcal{P}_{\mathrm{cv}}$, with the latter defined in the beginning of Section 2. Compared to $\mathcal{P}_{\mathrm{cv}}$, Condition (c) in $\mathcal{P}_{\mathrm{vf}}$ is posed on the marginal density and support of $X$, since in the variance function case we require a sufficient number of close pairs around each target point. We also note that, as in $\mathcal{P}_{\mathrm{cv}}$, no smoothness assumption is posed on the design density in $\mathcal{P}_{\mathrm{vf}}$.
The rest of the section is devoted to proving the following minimax rates:
(13)  $\inf_{\widehat{V}} \sup_{P \in \mathcal{P}_{\mathrm{vf}}} \mathbb{E}_P \big( \widehat{V}(x_0) - V(x_0) \big)^2 \;\asymp\; \inf_{\widehat{V}} \sup_{P \in \mathcal{P}_{\mathrm{vf}}} \mathbb{E}_P \int \big( \widehat{V}(x) - V(x) \big)^2 \, \mathrm{d}P_X(x) \;\asymp\; n^{-\frac{8\alpha\beta}{4\alpha\beta + 2\alpha + \beta}} \vee n^{-\frac{2\beta}{2\beta+1}},$
where $P$ denotes the joint distribution of $\{(X_i, Y_i)\}_{i=1}^n$, and $\widehat{V}$ ranges over all estimators of $V$.
3.1 Upper bound
We now propose an estimator of $V(x_0)$ for some fixed $x_0 \in \operatorname{supp}(X)$ by combining pairwise differences with local polynomial regression. We first introduce some notation. Let $\lfloor \beta \rfloor$ be the largest integer strictly smaller than $\beta$. For any $x_0$, define kernel-localized design quantities using two bandwidths, $h_1$ for the pairwise differences and $h_2$ for the localization around $x_0$. Define an $(\lfloor \beta \rfloor + 1) \times (\lfloor \beta \rfloor + 1)$ design matrix $B$ from these quantities, and define $\operatorname{adj}(B)$ as its adjugate, such that $B \operatorname{adj}(B) = \det(B) \, \mathbf{I}_{\lfloor \beta \rfloor + 1}$. For example, when $\lfloor \beta \rfloor = 1$, the adjugate of the $2 \times 2$ matrix $B$ is obtained by swapping its diagonal entries and negating its off-diagonal entries.
Following Fan [1993], we propose the following robust local polynomial estimator:
(14)  $\widehat{V}(x_0) = \sum_{1 \leq i < j \leq n} w_{ij}(x_0) \, \frac{(Y_i - Y_j)^2}{2} \cdot \mathbf{1}\{ \det(B) \geq \tau \},$
where $w_{ij}(x_0)$ are local polynomial weights built from $B$ and $\operatorname{adj}(B)$, and $\tau$ is some sufficiently small positive constant that decays to $0$ polynomially with $n$. Let $\{ w_i(x_0) \}_{i=1}^n$ denote the induced weights on the design points. Then, it holds that $w_i(x_0) = 0$ whenever $|X_i - x_0| > h_2$, $\sum_{i=1}^n w_i(x_0) = 1$, and
(15)  $\sum_{i=1}^n w_i(x_0) \, (X_i - x_0)^k = 0 \quad \text{for } k = 1, \dots, \lfloor \beta \rfloor.$
The last property (15) is referred to as the reproducing property of local polynomial estimators (cf. Proposition 1.12 in Tsybakov [2009]).
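The reproducing property (15) can be checked numerically. Below is a minimal sketch that forms the effective weights of a local polynomial fit at a point and verifies that they reproduce polynomials exactly; the function name and the box kernel are our illustrative choices, not the paper's exact construction:

```python
import numpy as np

def local_poly_weights(x, x0, h, degree):
    """Effective weights of local polynomial regression at x0 with a box
    kernel on [-1, 1]: the fitted value at x0 equals w @ y for any y."""
    u = (x - x0) / h
    k = 0.5 * (np.abs(u) <= 1)                           # box kernel weights
    U = np.vander(x - x0, degree + 1, increasing=True)   # columns (x - x0)^j
    B = U.T * k                                          # kernel-weighted design, U^T W
    # Solving (U^T W U) beta = U^T W y gives the local fit; the fitted value
    # at x0 is the intercept, so its weight vector is the first row below.
    return np.linalg.solve(B @ U, B)[0]
```

By construction, applying these weights to any polynomial of degree at most `degree` returns that polynomial's value at `x0` exactly, which is precisely the reproducing property: the weights sum to one and annihilate $(X_i - x_0)^k$ for $1 \leq k \leq$ `degree`.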