As a flexible modeling tool, smoothing splines provide a general framework for statistical analysis in a variety of fields; see W90 , W11 , G02 . The asymptotic studies on smoothing splines in the literature focus primarily on the estimation performance, and in particular the global convergence. However, in practice it is often of great interest to conduct asymptotic inference on the unknown functions. The procedures for inference developed in this article, together with their rigorously derived asymptotic properties, fill this long-standing gap in the smoothing spline literature.
As an illustration, consider two popular nonparametric regression models: (i) normal regression: for some unknown
; (ii) logistic regression:. The function is assumed to be smooth in both models. Our goal in this paper is to develop asymptotic theory for constructing pointwise confidence intervals and simultaneous confidence bands for , testing on the value of at any given point , and testing whether satisfies certain global properties such as linearity. Pointwise confidence intervals and tests on a local value are known as local inference. Simultaneous confidence bands and tests on a global property are known as global inference. To the best of our knowledge, there has been little systematic and rigorous theoretical study of asymptotic inference. This is partly because of the technical restrictions of the widely used equivalent kernel method. The functional Bahadur representation (FBR) developed in this paper makes several important contributions to this area. Our main contribution is a set of procedures for local and global inference for a univariate smooth function in a general class of nonparametric regression models that cover both the aforementioned cases. Moreover, we investigate issues relating to optimality and efficiency that have not been treated in the existing smoothing spline literature.
The equivalent kernel has long been used as a standard tool for handling the asymptotic properties of smoothing spline estimators, but this method is restricted to least square regression; see S84 , MG93 . Furthermore, the equivalent kernel only “approximates” the reproducing kernel function, and the approximation formula becomes extremely complicated when the penalty order increases or the design points are nonuniform. To analyze the smoothing spline estimate in a more effective way, we employ empirical process theory to develop a new technical tool, the functional Bahadur representation, which directly handles the “exact” reproducing kernel, and makes it possible to study asymptotic inference in a broader range of nonparametric models. An immediate consequence of the FBR is the asymptotic normality of the smoothing spline estimate. This naturally leads to the construction of pointwise asymptotic confidence intervals (CIs). The classical Bayesian CIs in the literature W83 , N88
are valid on average over the observed covariates. However, our CIs are proved to be theoretically valid at any point, and they even have shorter lengths than the Bayesian CIs. We next introduce a likelihood ratio method for testing the local value of a regression function. It is shown that the null limiting distribution is a scaled Chi-square with one degree of freedom, and that the scaling constant converges to one as the smoothness level of the regression function increases. Therefore, we have discovered an interesting Wilks phenomenon (meaning that the asymptotic null distribution is free of nuisance parameters) arising from the proposed nonparametric local testing.
Procedures for global inference are also useful. Simultaneous confidence bands (SCBs) accurately depict the global behavior of the regression function, and they have been extensively studied in the literature. However, most of the efforts were devoted to simple regression models with additive Gaussian errors based on kernel or local polynomial estimates; see Har89 , SL94 , CK03 , FZ00 , ZP10 . By incorporating the reproducing kernel Hilbert space (RKHS) theory into BR73 , we obtain an SCB applicable to general nonparametric regression models, and we demonstrate its theoretical validity based on strong approximation techniques. To the best of our knowledge, this is the first SCB ever developed for a general nonparametric regression model in smoothing spline settings. We further demonstrate that our SCB is optimal in the sense that the minimum width of the SCB achieves the lower bound established by GW08 . Model assessment is another important aspect of global inference. Fan et al. FZZ01 used local polynomial estimates for testing nonparametric regression models, namely the generalized likelihood ratio test (GLRT). Based on smoothing spline estimates, we propose an alternative method called the penalized likelihood ratio test (PLRT), and we identify its null limiting distribution as nearly Chi-square with diverging degrees of freedom. Therefore, the Wilks phenomenon holds for the global test as well. More importantly, we show that the PLRT achieves the minimax rate of testing in the sense of I93 . In comparison, other smoothing-spline-based tests such as the locally most powerful (LMP) test, the generalized cross validation (GCV) test and the generalized maximum likelihood ratio (GML) test (see CKWY88 , W90 , J96 , C94 , RG00 , LW02 ) either lead to complicated null distributions with nuisance parameters or are not known to be optimal.
As a by-product, we derive the asymptotic equivalence of the proposed procedures based on periodic and nonperiodic smoothing splines under mild conditions; see Remark 5.1. In general, our findings reveal an intrinsic connection between the two rather different basis structures, which in turn facilitates the implementation of inference.
Our paper is mainly devoted to theoretical studies. However, a few practical issues are noteworthy. GCV is currently used for the empirical tuning of the smoothing parameter, and it is known to result in biased estimates if the true function is spatially inhomogeneous with peaks and troughs. Moreover, the penalty order is prespecified rather than data-driven. Future research is needed to develop an efficient method for choosing a suitable smoothing parameter for bias reduction and an empirical method for quantifying the penalty order through data. We also note that some of our asymptotic procedures are not fully automatic since certain quantities need to be estimated; see Example 1. A large sample size may be necessary for the benefits of our asymptotic methods to become apparent. Finally, we want to mention that extensions to more complicated models such as multivariate smoothing spline models and semiparametric models are conceptually feasible by applying similar FBR techniques and likelihood-based approaches.
The rest of this paper is organized as follows. Section 2 introduces the basic notation, the model assumptions, and some preliminary RKHS results. Section 3 presents the FBR and the local asymptotic results. In Sections 4 and 5, several procedures for local and global inference together with their theoretical properties are formally discussed. In Section 6, we give three concrete examples to illustrate our theory. Numerical studies are also provided for both periodic and nonperiodic splines. The proofs are included in an online supplementary document SC12 .
2.1 Notation and assumptions
Suppose that the data , , are i.i.d. copies of , where
is the response variable,is the covariate variable and . Consider a general class of nonparametric regression models under the primary assumption
where is some unknown smooth function and is a known link function. This framework covers two subclasses of statistical interest. The first subclass assumes that the data are modeled by for a conditional distribution unknown up to
. Instead of assuming known distributions, the second subclass specifies the relation between the conditional mean and variance as, where is a known positive-valued function. The nonparametric estimation of in the second situation uses the quasi-likelihood as an objective function (see W74 ), where . Despite distinct modeling principles, the two subclasses have a large overlap since coincides with under many common combinations of ; see Table 2.1 of MN89 .
From now on, we focus on a smooth criterion function that covers the above two cases, that is, or . Throughout this paper, we define the functional parameter space as the th-order Sobolev space:
where is assumed to be known and larger than . With some abuse of notation, may also refer to the homogeneous subspace of . The space is also known as the class of periodic functions such that a function has the additional restrictions for . Let . Consider the penalized nonparametric estimate :
where is the roughness penalty and is the smoothing parameter, which converges to zero as . We use (rather than ) to simplify future expressions. The existence and uniqueness of are guaranteed by Theorem 2.9 of G02 when the null space is finite dimensional and is concave and continuous w.r.t. .
We next assume a set of model conditions. Let be the range of , which is obviously compact. Denote the first-, second- and third-order derivatives of w.r.t. by , and , respectively. We assume the following smoothness and tail conditions on :
(a) is three times continuously differentiable and concave w.r.t. . There exists a bounded open interval and positive constants and s.t.
(b) There exists a positive constant such that a.s.
(c) satisfies and a.s.
Assumption 2.1(a) implies the slow diverging rate of
where , and . Hence, Assumption 2.1(a) holds if , , , are all bounded over and
Assumption 2.1(c) follows from the definition of . The sub-exponential tail condition (6) and the boundedness condition (7) are very mild quasi-likelihood model assumptions (e.g., also assumed in MVG97 ). The assumption that and are both bounded over could be restrictive and can be removed in many cases, such as the binary logistic regression model, by applying empirical process arguments similar to those in Section 7 of MVG97 .
2.2 Reproducing kernel Hilbert space
The reproducing kernel defined on is known to have the following property:
Obviously, is symmetric with . We further introduce a positive definite self-adjoint operator such that
for any . Let . Then and , where denotes the identity operator.
Next, we assume that there exists a sequence of basis functions in the space that simultaneously diagonalizes the bilinear forms and
. Such eigenvalue/eigenfunction assumptions are typical in the smoothing spline literature, and they are critical to control the local behavior of the penalized estimates. Hereafter, we denote positive sequencesand as if they satisfy . If , we write . Let denote the sum over for convenience. Denote the sup-norm of as .
There exists a sequence of eigenfunctions satisfying , and a nondecreasing sequence of eigenvalues such that
where is the Kronecker’s delta. In particular, any admits a Fourier expansion with convergence in the -norm.
For any and , we have , and under Assumption 2.2.
For future theoretical derivations, we need to figure out the underlying eigensystem that implies Assumption 2.2. For example, when and , Assumption 2.2 is known to be satisfied if is chosen as the trigonometric polynomial basis specified in (62) of Example 6. For general with , Proposition 2.2 below says that Assumption 2.2 is still valid if is chosen as the (normalized) solution of the following equations:
where is the marginal density of the covariate . Proposition 2.2 can be viewed as a nontrivial extension of U88 , which assumes . The proof relies substantially on the ODE techniques developed in Birk1908 , Stone1926 . Let be the class of the th-order continuously differentiable functions over .
Finally, for later use we summarize the notation for Fréchet derivatives. Let , for . The Fréchet derivative of can be identified as
Note that and can be expressed as
The Fréchet derivative of is denoted . These derivatives can be explicitly written as [or ].
Define , and , where . Since for any , we have the following result:
, where is the identity operator on .
3 Functional Bahadur representation
In this section, we first develop the key technical tool of this paper: functional Bahadur representation, and we then present the local asymptotics of the smoothing spline estimate as a straightforward application. In fact, FBR provides a rigorous theoretical foundation for the procedures for inference developed in Sections 4 and 5.
3.1 Functional Bahadur representation
We first present a relationship between the norms and in Lemma 3.1 below, and we then derive a concentration inequality in Lemma 3.2 as the preliminary step for obtaining the FBR. For convenience, we denote as .
There exists a constant s.t. for any and . In particular, is not dependent on the choice of and . Hence, .
Define , where is specified in Lemma 3.1. Recall that denotes the domain of the full data variable . We now prove a concentration inequality on the empirical process defined, for any and as
where is a real-valued function (possibly depending on ) defined on .
Suppose that satisfies the following Lipschitz continuity condition:
where is specified in Lemma 3.1. Then we have
To obtain the FBR, we need to further assume a proper convergence rate for :
Simple (but not necessarily the weakest) sufficient conditions for Assumption 3.1 are provided in Proposition 3.3 below. Before stating this result, we introduce another norm on the space , that is, more commonly used in functional analysis. For any , define
When , is a type of Sobolev norm dominating defined in (8). Denote
Note that is known as the optimal order when we estimate .
Classical results on rates of convergence are obtained through either linearization techniques, for example, CO90 , or quadratic approximation devices, for example, GQ93 , G02 . However, the proof of Proposition 3.3 relies on empirical process techniques. Hence, it is not surprising that Proposition 3.3 requires a different set of conditions than those used in CO90 , GQ93 , G02 , although the derived convergence rates are the same and in all approaches the optimal rate is achieved when . For example, Cox and O’Sullivan CO90 assumed a weaker smoothness condition on the likelihood function but a more restrictive condition on , that is, for some .
Now we are ready to present the key technical tool: functional Bahadur representation, which is also of independent interest. Shang S10 developed a different form of Bahadur representation, which is of limited use in practice. This is due to the intractable form of the inverse operator , constructed based on a different type of Sobolev norm. However, by incorporating into the norm (8), we can show based on Proposition 2.3, and thus obtain a more refined version of the representation of S10 that naturally applies to our general setting for inference purposes.
3.2 Local asymptotic behavior
In this section, we obtain the pointwise asymptotics of as a direct application of the FBR. The equivalent kernel method may be used for this purpose, but it is restricted to regression, for example, S84 . However, the FBR-based proof applies to more general regression. Notably, our results reveal that several well-known global convergence properties continue to hold locally.
Theorem 3.5 ((General regression))
From Theorem 3.5, we immediately obtain the following result.
To illustrate Corollary 3.6 in detail, we consider regression in which (also ) has an explicit expression under the additional boundary conditions:
In fact, (24) is also the price we pay for obtaining the boundary results, that is, . However, (24) could be too strong in practice. Therefore, we provide an alternative set of conditions in (3.7) below, which can be implied by the so-called “exponential envelope condition” introduced in N95 . In Corollary 3.7 below, we consider two different cases: and .
Corollary 3.7 (( regression))
Suppose satisfies the boundary conditions (24). If , then we have, for any ,
If for some , then we have, for any ,
We note that in (25) the asymptotic bias is proportional to , and the asymptotic variance can be expressed as a weighted sum of squares of the (infinitely many) terms ; see (21). These observations are consistent with those in the polynomial spline setting insofar as the bias is proportional to , and the variance is a weighted sum of squares of ( finitely many) terms involving the normalized B-spline basis functions evaluated at ; see ZSW98 . Furthermore, (26) describes how to remove the asymptotic bias through undersmoothing, although the corresponding smoothing parameter yields suboptimal estimates in terms of the convergence rate.
The existing smoothing spline literature is mostly concerned with the global convergence properties of the estimates. For example, Nychka N95 and Rice and Rosenblatt RR83 derived global convergence rates in terms of the (integrated) mean squared error. Instead, Theorem 3.5 and Corollaries 3.6 and 3.7 mainly focus on local asymptotics, and they conclude that the well-known global results on the rates of convergence also hold in the local sense.
4 Local asymptotic inference
4.1 Pointwise confidence interval
We consider the confidence interval for some real-valued smooth function of at any fixed , denoted , for example, in logistic regression. Corollary 3.6 together with the Delta method immediately implies Proposition 4.1 on the pointwise CI where the asymptotic estimation bias is assumed to be removed by undersmoothing.
Proposition 4.1 ((Pointwise confidence interval))
From now on, we focus on the pointwise CI for and compare it with the classical Bayesian confidence intervals proposed by Wahba W83 and Nychka N88 . For simplicity, we consider , and under which Proposition 4.1 implies the following asymptotic 95% CI for :
) is the first rigorously proven pointwise CI for smoothing spline. It is well known that the Bayesian type CI only approximately achieves the 95% nominal level on average rather than pointwise. Specifically, its average coverage probability over the observed covariates isnot exactly 95% even asymptotically. Furthermore, the Bayesian type CI ignores the important issue of coverage uniformity across the design space, and thus it may not be reliable if only evaluated at peaks or troughs, as pointed out in N88 . However, the asymptotic CI (29) is proved to be valid at any point, and is even shorter than the Bayesian CIs proposed in W83 , N88 .
As an illustration, we perform a detailed comparison of the three CIs for the special case . We first derive the asymptotically equivalent versions of the Bayesian CIs. Wahba W83
proposed the following heuristic CI under a Bayesian framework:
where and , and showed that
see his equation (2.3) and the Appendix. Hence, we have
Therefore, Nychka’s Bayesian CI (32) is asymptotically equivalent to
In view of (31) and (36), we find that the Bayesian CIs of Wahba and Nychka are asymptotically 15.4% and 6.1%, respectively, wider than (29). Meanwhile, by (35) we find that (29) turns out to be a corrected version of Nychka’s CI (32) by removing the random bias term . The inclusion of in (32) might be problematic in that (i) it makes the pointwise limit distribution nonnormal and thus leads to biased coverage probability; and (ii) it introduces additional variance, which unnecessarily increases the length of the interval. By removing , we can achieve both pointwise consistency and a shorter length. Similar considerations apply when . Furthermore, the simulation results in Example 6 demonstrate the superior performance of our CI in both periodic and nonperiodic splines.
4.2 Local likelihood ratio test
In this section, we propose a likelihood ratio method for testing the value of at any . First, we show that the null limiting distribution is a scaled noncentral Chi-square with one degree of freedom. Second, by removing the estimation bias, we obtain a more useful central Chi-square limit distribution. We also note that as the smoothness order approaches infinity, the scaling constant eventually converges to one. Therefore, we have unveiled an interesting Wilks phenomenon arising from the proposed nonparametric local testing. A relevant study was conducted by Banerjee B07 , who considered a likelihood ratio test for monotone functions, but his estimation method and null limiting distribution are fundamentally different from ours.
For some prespecified point , we consider the following hypothesis:
The “constrained” penalized log-likelihood is defined as , where . We consider the likelihood ratio test (LRT) statistic defined as
where is the MLE of under the local restriction, that is,
Endowed with the norm , is a closed subset in , and thus a Hilbert space. Proposition 4.2 below says that also inherits the reproducing kernel and penalty operator from . The proof is trivial and thus omitted.
(a) Recall that is the reproducing kernel for under . The bivariate function