Specification testing in semi-parametric transformation models

In transformation regression models the response is transformed before a regression model is fitted to the covariates and the transformed response. We consider such a model in which the errors are independent of the covariates and the regression function is modelled nonparametrically. We suggest a goodness-of-fit test for a parametric transformation class based on a distance between a nonparametric transformation estimator and the parametric class. We present asymptotic theory under the null hypothesis that the semi-parametric model is valid and under local alternatives. A bootstrap algorithm is suggested in order to apply the test. We also consider relevant hypotheses to distinguish between large and small distances of the parametric transformation class to the `true' transformation.

01/09/2019


1 Introduction

It is very common in applications to transform data before investigating the functional dependence of variables by regression models. The aim of the transformation is to obtain a simpler model, e.g. one with a specific structure of the regression function, or a homoscedastic instead of a heteroscedastic model. Typically flexible parametric classes of transformations are considered, from which a suitable one is selected data-dependently. A classical example is the class of Box-Cox power transformations (see Box and Cox (1964)). For purely parametric transformation models see Carroll and Ruppert (1988) and references therein. Powell (1991) and Mu and He (2007) consider transformation quantile regression models. Nonparametric estimation of the transformation in the context of parametric regression models has been considered by Horowitz (1996) and Chen (2002), among others. Horowitz (2009) reviews estimation in transformation models with parametric regression in the cases where either the transformation or the error distribution or both are modeled nonparametrically. Linton et al. (2008) suggest a profile likelihood estimator for a parametric class of transformations, while the error distribution is estimated nonparametrically and the regression function semi-parametrically. Heuchenne et al. (2015) suggest an estimator for the error distribution in the same model. Neumeyer et al. (2016) consider profile likelihood estimation in heteroscedastic semi-parametric transformation regression models, i.e. the mean and variance function are modeled nonparametrically, while the transformation function is chosen from a parametric class. A completely nonparametric (homoscedastic) model is considered by Chiappori et al. (2015). Their approach was modified and corrected by Colling and Van Keilegom (2019). The version of the nonparametric transformation estimator considered in the latter paper was then applied by Colling and Van Keilegom (2018) to suggest a new estimator of the transformation parameter when the transformation is assumed to belong to a parametric class.

In general, asymptotic theory for nonparametric transformation estimators is sophisticated, and parametric transformation estimators show much better performance if the parametric model is true. A parametric transformation will thus lead to better estimates of the regression function. Moreover, parametric transformations are easier to interpret and allow for subsequent inference in the transformation model. For the latter purpose note that for transformation models with parametric transformation, lack-of-fit tests for the regression function as well as tests for significance of covariate components have been suggested by Colling and Van Keilegom (2016), Colling and Van Keilegom (2017), Allison et al. (2018) and Kloodt and Neumeyer (2017). Those tests cannot straightforwardly be generalized to nonparametric transformation models because known estimators in that model do not allow for uniform rates of convergence over the whole real line; see Chiappori et al. (2015) and Colling and Van Keilegom (2019).

However, before applying a transformation model with parametric transformation it would be appropriate to test the goodness-of-fit of the parametric transformation class. In the context of parametric quantile regression, Mu and He (2007) suggest such a goodness-of-fit test. In the context of nonparametric mean regression, Neumeyer et al. (2016) develop a goodness-of-fit test for the parametric transformation class based on an empirical independence process of pairs of residuals and covariates. The latter approach was modified by Hušková et al. (2018), who applied empirical characteristic functions. In a linear regression model with transformation of the response, Szydłowski (2017) suggests a goodness-of-fit test for the parametric transformation class that is based on a distance between the nonparametric transformation estimator considered by Chen (2002) and the parametric class. We will follow a similar approach but consider a nonparametric regression model. The aim of the transformations we consider is to induce independence between errors and covariates. The null hypothesis is that the unknown transformation belongs to a parametric class. Note that when applied to the special case of a class of transformations that contains the identity as its only element, our test indicates whether a classical homoscedastic regression model (without transformation) is appropriate or whether the response should first be transformed. Our test statistic is based on a minimum distance between a nonparametric transformation estimator and the parametric transformations. We present the asymptotic distribution of the test statistic under the null hypothesis of a parametric transformation and under local alternatives of n^{−1/2}-rate. Under the null hypothesis the limit distribution is that of a degenerate U-statistic. With a flexible parametric class, applying an appropriate transformation can reduce the dependence enormously, even if the ‘true’ transformation does not belong to the class. Thus, for the first time in the context of transformation goodness-of-fit tests, we suggest a test for so-called precise or relevant hypotheses. Here the null hypothesis is that the distance between the true transformation and the parametric class is large. If this hypothesis is rejected, then the model with the parametric transformation fits well enough to be considered for further inference. Under the new null hypothesis the test statistic is asymptotically normally distributed. Throughout we assume that the nonparametric transformation estimator fulfills an asymptotic linear expansion. It is then shown that the estimator considered by Colling and Van Keilegom (2019) fulfills this expansion and thus can be used for evaluating the test statistic.

The remainder of the paper is organized as follows. In Section 2 we present the model and the test statistic. Asymptotic distributions under the null hypothesis of a parametric transformation class and under local alternatives are presented in Section 3, which also contains a consistency result and asymptotic results under relevant hypotheses. Section 4 presents a bootstrap algorithm and a simulation study. Appendix A contains assumptions, while Appendix B treats a specific nonparametric transformation estimator and shows that it fulfills the required conditions. The proofs of the main results are given in Appendix C. A supplement contains a rigorous treatment of bootstrap asymptotics.

2 The model and test statistic

Assume we have observed (X_i, Y_i), i = 1, …, n, which are independent with the same distribution as (X, Y) and fulfill the transformation regression model

 h(Y)=g(X)+ε, (2.1)

where E[ε] = 0 holds and ε is independent of the covariate X, which is ℝ^{d_X}-valued, while the response Y is univariate. The regression function g will be modelled nonparametrically. The transformation h is strictly increasing. Throughout we assume that, given the joint distribution of (X, Y) and some identification conditions, there exists a unique transformation h such that this model is fulfilled. It then follows that the other model components are identified via g(x) = E[h(Y) | X = x] and ε = h(Y) − g(X). See Chiappori et al. (2015) for conditions under which the identifiability of h holds. In particular, conditions are required to fix location and scale, and we will assume throughout that

 h(0) = 0 and h(1) = 1. (2.2)

Now let {Λ_θ : θ ∈ Θ} be a class of strictly increasing parametric transformation functions Λ_θ, where Θ is a finite-dimensional parameter space. Our purpose is to test whether a semi-parametric transformation model holds, i.e.

 Λ_{θ_0}(Y) = g̃(X) + ε̃,

for some parameter θ_0 ∈ Θ, where ε̃ and X are independent. Due to the assumed uniqueness of the transformation one obtains h = h_0 under validity of the semi-parametric model, where

 h_0(⋅) = (Λ_{θ_0}(⋅) − Λ_{θ_0}(0)) / (Λ_{θ_0}(1) − Λ_{θ_0}(0)).

Thus we can write the null hypothesis as

 H_0: h ∈ { (Λ_θ(⋅) − Λ_θ(0)) / (Λ_θ(1) − Λ_θ(0)) : θ ∈ Θ }, (2.3)

which thanks to (2.2) can be formulated equivalently as

 H_0: h ∈ { (Λ_θ(⋅) − c_2) / c_1 : θ ∈ Θ, c_1 ∈ ℝ^+, c_2 ∈ ℝ }. (2.4)

Our test statistic will be based on the following L²-distance:

 d(Λ_θ, h) = min_{c_1 ∈ C_1, c_2 ∈ C_2} E[ w(Y) { h(Y) c_1 + c_2 − Λ_θ(Y) }² ], (2.5)

where w is a positive weight function with compact support Y_w. Its empirical counterpart is

 d_n(Λ_θ, ĥ) := min_{c_1 ∈ C_1, c_2 ∈ C_2} (1/n) ∑_{j=1}^n w(Y_j) { ĥ(Y_j) c_1 + c_2 − Λ_θ(Y_j) }²,

where ĥ denotes a nonparametric estimator of the true transformation h as discussed below, and C_1, C_2 are compact sets. Assumption 6 in Appendix A assures that these sets are large enough to contain the true values. The test statistic is defined as

 T_n = n min_{θ ∈ Θ} d_n(Λ_θ, ĥ) (2.6)

and the null hypothesis should be rejected for large values of the test statistic. We will derive the asymptotic distribution under the null hypothesis and local and fixed alternatives in Section 3 and suggest a bootstrap version of the tests in Section 4.
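For concreteness, the computation of T_n in (2.6) can be sketched numerically. For fixed θ the minimization over (c_1, c_2) is a weighted linear least-squares problem (regressing Λ_θ(Y_j) on ĥ(Y_j) and an intercept), so only the minimization over θ needs a numerical search. The sketch below is not the authors' implementation: it uses a hypothetical exponential-type parametric class `lam`, assumes a precomputed nonparametric estimate `h_hat`, and ignores the compactness constraints on C_1, C_2 for brevity.

```python
import numpy as np

def lam(y, theta):
    # Hypothetical strictly increasing parametric class (NOT from the paper):
    # Lambda_theta(y) = (exp(theta * y) - 1) / theta, theta != 0.
    return (np.exp(theta * y) - 1.0) / theta

def d_n(theta, y, h_hat, w):
    """Empirical distance d_n(Lambda_theta, h_hat); the inner minimization
    over (c1, c2) is profiled out by weighted least squares."""
    target = lam(y, theta)
    A = np.column_stack([h_hat, np.ones_like(h_hat)])  # columns: c1, c2
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], target * sw, rcond=None)
    resid = target - A @ coef
    return np.mean(w * resid ** 2)

def T_n(y, h_hat, w, theta_grid):
    # T_n = n * min_theta d_n(Lambda_theta, h_hat), cf. (2.6)
    return len(y) * min(d_n(th, y, h_hat, w) for th in theta_grid)
```

If ĥ equals the correctly normalized true transformation, the statistic vanishes at the true θ; in practice ĥ carries estimation error and T_n has the limit law derived in Section 3.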

Remark 2.1.

Colling and Van Keilegom (2019) consider the estimator

 θ̂ := argmin_{θ ∈ Θ} d_n(Λ_θ, ĥ)

for the parametric transformation (assuming validity of the null hypothesis) and observe that in simulations θ̂ outperforms the version without minimization over (c_1, c_2).

Nonparametric estimation of the transformation has been considered by Chiappori et al. (2015) and Colling and Van Keilegom (2019). For our main asymptotic results we need that ĥ has a linear expansion, not only under the null hypothesis, but also under fixed alternatives and the local alternatives defined in the next section. The linear expansion should have the form

 ĥ(y) − h(y) = (1/n) ∑_{i=1}^n ψ(Z_i, T(y)) + o_P(n^{−1/2}) uniformly in y ∈ Y_w. (2.7)

Here, ψ needs to fulfil Condition 8 in Appendix A, and we use the definitions (i = 1, …, n)

 Z_i = (U_i, X_i), U_i = T(Y_i), T(y) = (F_Y(y) − F_Y(0)) / (F_Y(1) − F_Y(0)), (2.8)

where F_Y denotes the distribution function of Y and is assumed to be strictly increasing on the support of Y. To ensure that T is well defined, the values 0 and 1 are w.l.o.g. assumed to belong to the support of Y, but can be replaced by arbitrary values in that support. The expansion (2.7) could also be formulated with a linear term in y instead of T(y). In Appendix B we reproduce the definition of the estimator ĥ that was suggested by Colling and Van Keilegom (2019) as a modification of the estimator by Chiappori et al. (2015). We give regularity assumptions under which the desired expansion holds; see Lemma B.2. Other nonparametric estimators of the transformation that fulfill the expansion could be applied as well.
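The standardization T in (2.8) depends on the data only through the distribution function F_Y, so it can be estimated by plugging in the empirical distribution function. A minimal sketch (assuming, as above, that 0 and 1 lie in the range of the sample; `sample` is a hypothetical i.i.d. sample of Y):

```python
import numpy as np

def T_hat(y, sample):
    """Empirical analogue of T(y) = (F_Y(y) - F_Y(0)) / (F_Y(1) - F_Y(0)),
    cf. (2.8), with F_Y replaced by the empirical distribution function."""
    F = lambda t: np.mean(sample <= t)
    return (F(y) - F(0.0)) / (F(1.0) - F(0.0))
```

By construction T_hat(0, ·) = 0 and T_hat(1, ·) = 1, mirroring the identification constraints (2.2).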

3 Asymptotic results

In this section we will derive the asymptotic distribution under the null hypothesis and under local and fixed alternatives. For the formulation of the local alternatives consider the null hypothesis as given in (2.4), i.e. h(⋅)c_1 + c_2 = Λ_{θ_0}(⋅) for some θ_0 ∈ Θ, c_1 ∈ ℝ^+, c_2 ∈ ℝ, and instead assume

 H_{1,n}: h(⋅)c_1 + c_2 = Λ_{θ_0}(⋅) + n^{−1/2} r(⋅) for some θ_0 ∈ Θ, c_1 ∈ ℝ^+, c_2 ∈ ℝ and some function r.

Due to the identifiability conditions (2.2) one obtains c_2 = Λ_{θ_0}(0) + n^{−1/2} r(0) and c_1 = Λ_{θ_0}(1) − Λ_{θ_0}(0) + n^{−1/2}(r(1) − r(0)). Assumption 5 yields boundedness of r, so that we can rewrite the local alternative as

 h(⋅) = (Λ_{θ_0}(⋅) − Λ_{θ_0}(0) + n^{−1/2}(r(⋅) − r(0))) / (Λ_{θ_0}(1) − Λ_{θ_0}(0) + n^{−1/2}(r(1) − r(0))) (3.1)
 = h_0(⋅) + n^{−1/2} r_0(⋅) + o(n^{−1/2}),

where h_0 is defined as above and

 r_0(⋅) = (r(⋅) − r(0) − h_0(⋅)(r(1) − r(0))) / (Λ_{θ_0}(1) − Λ_{θ_0}(0)).
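The second equality in (3.1) follows from a first-order expansion of the ratio; a sketch of the step:

```latex
% With a(\cdot) = \Lambda_{\theta_0}(\cdot) - \Lambda_{\theta_0}(0),
% b(\cdot) = r(\cdot) - r(0) and \epsilon_n = n^{-1/2}, expanding in \epsilon_n:
\frac{a(\cdot) + \epsilon_n b(\cdot)}{a(1) + \epsilon_n b(1)}
  = \frac{a(\cdot)}{a(1)}
  + \epsilon_n \, \frac{b(\cdot) - \frac{a(\cdot)}{a(1)} b(1)}{a(1)}
  + o(\epsilon_n)
  = h_0(\cdot) + \epsilon_n \, r_0(\cdot) + o(\epsilon_n),
% which recovers the formula for r_0 displayed above.
```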

Note that the null hypothesis is included in the local alternative by considering r ≡ 0, which gives r_0 ≡ 0. We assume the following data generating model under the local alternative H_{1,n}. Let the regression function g, the errors ε_i and the covariates X_i be independent of n, and define Y_i = h^{−1}(g(X_i) + ε_i) (i = 1, …, n), which under local alternatives depends on n through the transformation h. Throughout we use the notation (i = 1, …, n)

 S_i = h(Y_i) = g(X_i) + ε_i. (3.2)

Further, recall the definition of Z_i in (2.8). Note that the distribution of Z_i does not depend on n, even under local alternatives, because F_Y(y) = P(h(Y_i) ≤ h(y)) = F_S(h(y)), while F_Y(0) = F_S(0) due to (2.2), and similarly F_Y(1) = F_S(1); hence U_i = T(Y_i) depends on the data only through S_i.

To formulate our main result we need some more notation. For notational convenience, define Υ = C_1 × C_2 × Θ, which is assumed to be compact (see Assumption 1 in Appendix A). Then, note that

 T_n = min_{γ=(c_1,c_2,θ) ∈ Υ} ∑_{j=1}^n w(Y_j) { ĥ(Y_j) c_1 + c_2 − Λ_θ(Y_j) }².

Further, with Z_i from (2.8) and S_i from (3.2) define

 Λ̇_θ(y) = (∂/∂θ_k Λ_θ(y))_{k=1,…,d_Θ}
 R(s) = (s, 1, −Λ̇_{θ_0}(h_0^{−1}(s)))^t (3.3)
 Γ_0 = E[ w(h_0^{−1}(S_1)) R(S_1) R(S_1)^t ] (3.4)
 φ(z) = E[ w(h_0^{−1}(S_2)) ψ(Z_1, U_2) R(S_2) | Z_1 = z ] (3.5)
 ζ(z_1, z_2) = E[ w(h_0^{−1}(S_3)) { ψ(Z_1, U_3) − φ(Z_1)^t Γ_0^{−1} R(S_3) } { ψ(Z_2, U_3) − φ(Z_2)^t Γ_0^{−1} R(S_3) } | Z_1 = z_1, Z_2 = z_2 ] (3.6)
 r̄(s) = r_0(h_0^{−1}(s)) − E[ w(h_0^{−1}(S_1)) r_0(h_0^{−1}(S_1)) R(S_1) ]^t Γ_0^{−1} R(s) (3.7)
 ζ̃(z_1) = 2 E[ w(h_0^{−1}(S_2)) ψ(Z_1, U_2) r̄(S_2) | Z_1 = z_1 ] (3.8)

and let P_Z and F_Z denote the law and the distribution function, respectively, of Z_1.

Theorem 3.1.

Assume Conditions 1–8 given in Appendix A. Let λ_k, k ∈ ℕ, be the eigenvalues of the operator

 Kρ(z_1) := ∫ ρ(z_2) ζ(z_1, z_2) dF_Z(z_2)

with corresponding eigenfunctions ρ_k, k ∈ ℕ, which are orthonormal in the L²-space corresponding to the distribution P_Z. Let W_k, k ∈ ℕ, be independent and standard normally distributed random variables, and let W_0 be centred normally distributed (with variance and covariances determined by ζ̃) such that for all K ∈ ℕ the random vector (W_0, W_1, …, W_K) follows a multivariate normal distribution. Then, under the local alternative H_{1,n}, T_n converges in distribution to

 (Λ_{θ_0}(1) − Λ_{θ_0}(0))² ( ∑_{k=1}^∞ λ_k W_k² + W_0 + E[ w(h_0^{−1}(S_1)) r̄(S_1)² ] ).

In particular, under H_0 (i.e. for r ≡ 0), T_n converges in distribution to

 T = (Λ_{θ_0}(1) − Λ_{θ_0}(0))² ∑_{k=1}^∞ λ_k W_k².

The proof is given in Appendix C. An asymptotic level-α test should reject H_0 if T_n is larger than the (1 − α)-quantile of the distribution of T. As the distribution of T depends in a complicated way on unknown quantities, we will propose a bootstrap procedure in Section 4.

Remark 3.2.

Note that ζ(z_1, z_2) = E[I(z_1) I(z_2)] with

 I(z) := w(h_0^{−1}(S_1))^{1/2} ( ψ(z, U_1) − φ(z)^t Γ_0^{−1} R(S_1) ).

Thus, the operator K defined in Theorem 3.1 is positive semi-definite.

Next we consider fixed alternatives, i.e. transformations h that do not belong to the parametric class:

 H_1: d(h, Λ_θ) > 0 for all θ ∈ Θ.
Theorem 3.3.

Assume Conditions 1–4 and Assumption 1 in Appendix A, and let ĥ estimate h uniformly consistently on compact sets. Then, under H_1, P(T_n > q) → 1 for all q ∈ ℝ, that is, the proposed test is consistent.

The proof is given in Appendix C. The transformation model with a parametric transformation class might be useful in applications even if the model does not hold exactly. With a good choice of θ, applying the transformation Λ_θ can reduce the dependence between covariates and errors enormously. Estimating an appropriate θ is much easier than estimating the transformation nonparametrically. Consequently, one might prefer the semiparametric transformation model over a completely nonparametric one. It is then of interest how far away we are from the true model. Therefore, in the following we consider testing precise hypotheses (relevant hypotheses)

 H′_0: min_{θ∈Θ} d(h, Λ_θ) ≥ η and H′_1: min_{θ∈Θ} d(h, Λ_θ) < η.

If a suitable test rejects H′_0 for some small η (fixed beforehand by the experimenter), the model is considered “good enough” to work with, even if it does not hold exactly. To test those hypotheses we will use the same test statistic as before, but we have to standardize it differently. Assume H′_0 holds; then h is a transformation that does not belong to the parametric class, i.e. the former fixed alternative holds. Let

 M(γ) = M(c_1, c_2, θ) = E[ w(Y) ( h(Y) c_1 + c_2 − Λ_θ(Y) )² ],

and let

 γ_0 = (c_{1,0}, c_{2,0}, θ_0) := argmin_{(c_1,c_2,θ) ∈ Υ} M(c_1, c_2, θ).

Note that M(γ_0) ≤ M(γ) for all γ ∈ Υ. Assume that

 Γ′ = E[ w(Y_1) ·
  ( h(Y_1)²                 h(Y_1)             −h(Y_1) Λ̇_{θ_0}(Y_1)^t
    h(Y_1)                  1                  −Λ̇_{θ_0}(Y_1)^t
    −h(Y_1) Λ̇_{θ_0}(Y_1)   −Λ̇_{θ_0}(Y_1)     Γ′_{3,3} ) ] (3.9)

is positive definite, where Γ′_{3,3} = Λ̇_{θ_0}(Y_1) Λ̇_{θ_0}(Y_1)^t − ( h(Y_1) c_{1,0} + c_{2,0} − Λ_{θ_0}(Y_1) ) Λ̈_{θ_0}(Y_1) with

 Λ̈_θ(y) = (∂²/(∂θ_k ∂θ_ℓ) Λ_θ(y))_{k,ℓ=1,…,d_Θ}.

Theorem 3.4.

Assume Conditions 1–4 and the relevant assumptions of Appendix A, let Condition 7 hold with γ_0 as defined above, and let Γ′ be positive definite. Then

 n^{1/2} ( T_n/n − M(γ_0) ) →^D N(0, σ²)

for some asymptotic variance σ² > 0.

The proof is given in Appendix C. A consistent asymptotic level-α test rejects H′_0 if n^{1/2}(T_n/n − η)/σ̂ < u_α, where u_α is the α-quantile of the standard normal distribution and σ̂² is a consistent estimator of σ².

Remark 3.5.

Let m = m_n be an intermediate sequence, that is m_n → ∞ and m_n/n → 0, and define q = q_n := ⌊n/m_n⌋. Moreover, for s = 1, …, q denote by ĥ^{(s)} the nonparametric estimator of h based on the subsample (X_j, Y_j), j = (s−1)m+1, …, sm, so that

 H_{m,s} := sup_{y ∈ Y_w} | √m (ĥ^{(s)}(y) − h(y)) − (1/√m) ∑_{j=(s−1)m+1}^{sm} ψ(Z_j, T(y)) | = o_P(1)

(compare to (2.7)). Then, if max_{s=1,…,q} H_{m,s} = o_P(1), σ² can be estimated consistently by

 σ̂² := (1/q) ∑_{s=1}^q ( (2√m_n / n) ∑_{k=1}^n w(Y_k) (ĥ^{(s)}(Y_k) − ĥ(Y_k)) (ĥ(Y_k) ĉ_1 + ĉ_2 − Λ_{θ̂}(Y_k))
  + (1/√m_n) ∑_{j=(s−1)m_n+1}^{s m_n} ( w(Y_j) (ĥ(Y_j) ĉ_1 + ĉ_2 − Λ_{θ̂}(Y_j))²
  − (1/n) ∑_{i=1}^n w(Y_i) (ĥ(Y_i) ĉ_1 + ĉ_2 − Λ_{θ̂}(Y_i))² ) )².

One can show that such a sequence always exists.

4 A bootstrap version and simulations

Although Theorem 3.1 shows how the test statistic behaves asymptotically under H_0, it is hard to extract any information about how to choose appropriate critical values for a test that rejects for large values of T_n. The main reasons are that, first, the eigenvalues λ_k of the operator K defined in Theorem 3.1 are unknown; second, the function ζ defining K is unknown and has to be estimated as well; and third, even ψ (which would be needed to estimate ζ) mostly is unknown and rather complex (see e.g. Appendix B). Therefore, approximating the (1 − α)-quantile, say q_{1−α}, of the distribution of T in Theorem 3.1 in a direct way is difficult, and instead we suggest a smooth bootstrap algorithm to approximate q_{1−α}.

Algorithm 4.1.

Let (Y_1, X_1), …, (Y_n, X_n) denote the observed data, define

 h_θ(y) = (Λ_θ(y) − Λ_θ(0)) / (Λ_θ(1) − Λ_θ(0)) and g_θ(x) = E[h_θ(Y) | X = x],

and let θ̂ be a consistent estimator of θ_0, where θ_0 is defined as in Condition 6 under the null hypothesis and as in Section 3 under the alternative. Let ℓ and κ be smooth Lebesgue densities on ℝ and ℝ^{d_X}, respectively, where ℓ is strictly positive and κ has bounded support. Let (a_n) and (b_n) be positive bandwidth sequences converging to zero at appropriate rates. Denote by m the sample size of the bootstrap sample.


1. Calculate θ̂. Estimate the parametric residuals by ε̂_i := h_{θ̂}(Y_i) − ĝ(X_i) and denote centered versions by ε̃_i := ε̂_i − (1/n) ∑_{k=1}^n ε̂_k, i = 1, …, n.

2. Generate X*_j, j = 1, …, m, independently (given the original data) from the density

 f_{X*}(x) = (1/(n b_n^{d_X})) ∑_{i=1}^n κ((x − X_i)/b_n)

(which is a kernel density estimator for the density of X with kernel κ and bandwidth b_n). For j = 1, …, m define bootstrap observations as

 Y*_j = (h*)^{−1}(ĝ(X*_j) + ε*_j) for h*(⋅) = (Λ_{θ̂}(⋅) − Λ_{θ̂}(0)) / (Λ_{θ̂}(1) − Λ_{θ̂}(0)), (4.1)

where ε*_j is generated independently (given the original data) from the density

 (1/n) ∑_{i=1}^n (1/a_n) ℓ((ε̃_i − ⋅)/a_n)

(which is a kernel density estimator for the error density, based on the residuals ε̃_i, with kernel ℓ and bandwidth a_n).

3. Calculate the bootstrap estimate ĥ* for h* from the bootstrap sample (Y*_1, X*_1), …, (Y*_m, X*_m).

4. Calculate the bootstrap statistic T*_{n,m} analogously to T_n in (2.6), with ĥ* and the bootstrap data in place of ĥ and the original data.

5. Let B ∈ ℕ. Repeat steps (2)–(4) B times to obtain the bootstrap statistics T*_{n,m,1}, …, T*_{n,m,B}. Let q*_α denote the α-quantile of T*_{n,m} conditional on the original data. Estimate q*_α by

 q̂*_α = min{ z ∈ {T*_{n,m,1}, …, T*_{n,m,B}} : (1/B) ∑_{k=1}^B I{T*_{n,m,k} ≤ z} ≥ α }.
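Steps (1)–(2) of Algorithm 4.1 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the covariate is taken univariate (d_X = 1), `g_hat` and `h_star_inv` are assumed to be supplied, and Gaussian kernels stand in for both ℓ and κ purely for brevity, although the algorithm asks for a κ with bounded support (e.g. the Epanechnikov kernel).

```python
import numpy as np

def smooth_bootstrap_sample(X, eps_tilde, g_hat, h_star_inv, m, a_n, b_n, rng):
    """Draw one smooth bootstrap sample (Y*_1, X*_1), ..., (Y*_m, X*_m).

    Sampling from a kernel density estimate is done by resampling a data
    point uniformly and adding kernel-distributed noise scaled by the
    bandwidth, which is equivalent to sampling from the KDE itself."""
    # X*_j from the kernel density estimate of the covariate density
    X_star = X[rng.integers(0, len(X), size=m)] + b_n * rng.normal(size=m)
    # eps*_j from the kernel density estimate based on centred residuals
    eps_star = (eps_tilde[rng.integers(0, len(eps_tilde), size=m)]
                + a_n * rng.normal(size=m))
    # Y*_j = (h*)^{-1}(g_hat(X*_j) + eps*_j), cf. (4.1)
    return h_star_inv(g_hat(X_star) + eps_star), X_star
```

Repeating this draw B times and recomputing the test statistic on each sample yields the bootstrap statistics T*_{n,m,1}, …, T*_{n,m,B} used in step (5).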
Remark 4.2.
1. The strict positivity of ℓ ensures that, conditionally on the original data, the support of Y*_1 contains Y_w (from Condition 7) with probability converging to one. Thus, the bootstrap observations can be used for calculating ĥ* on Y_w as well.

2. To proceed as in Algorithm 4.1 it may be necessary to modify ĥ* so that all Y*_j belong to its domain. As long as these modifications do not have any influence on ĥ* on Y_w, the influence on the bootstrap statistics T*_{n,m} should be asymptotically negligible (which can be proven for the estimator by Colling and Van Keilegom (2019)).

The bootstrap algorithm should fulfil two properties. On the one hand, under the null hypothesis the algorithm has to provide, conditionally on the original data, consistent estimates of the quantiles of T_n, or to be precise of its asymptotic distribution from Theorem 3.1. To formalize this, let (Ω, A, P) denote the underlying probability space. Assume that it can be written as Ω = Ω_1 × Ω_2 and A = A_1 ⊗ A_2 for some measurable spaces (Ω_1, A_1) and (Ω_2, A_2). Further, assume that P is characterized as the product of a probability measure P_1 on A_1 and a Markov kernel

 P_{12}: Ω_1 × A_2 → [0, 1],

that is P = P_1 ⊗ P_{12}. While randomness with respect to the original data is modelled by P_1, randomness with respect to the bootstrap data, conditional on the original data, is modelled by P_{12}. Moreover, assume

 P_{12}(ω, A) = P(Ω_1 × A | (Y_1(ω), X_1(ω)), …, (Y_n(ω), X_n(ω))) for all ω ∈ Ω_1, A ∈ A_2.

With these notations in mind, it would be desirable to obtain

 P_1(ω ∈ Ω_1 : |P_{12}(ω, {T*_{n,m} ≤ q}) − P(T ≤ q)| > δ) = o(1) (4.2)

for all δ > 0 and all continuity points q of the distribution function of T. Here, the convention

 P_{12}(ω, {T*_{n,m} ≤ q}) = P_{12}(ω, {ω̃ ∈ Ω_2 : (ω, ω̃) ∈ {T*_{n,m} ≤ q}})

is used. On the other hand, to be consistent under H_1 the bootstrap quantiles have to stabilize, or at least converge to infinity at a slower rate than T_n. To be precise, it is needed that

 P_1(ω ∈ Ω_1 : limsup_{m→∞} P_{12}(ω, {T*_{n,m} ≥ T_n}) > δ) = o(1) (4.3)

for all δ > 0.

In the supplement we give conditions under which the bootstrap Algorithm 4.1 has the desired properties (4.2) and (4.3). In particular we need an expansion of ĥ* as bootstrap counterpart to (2.7). To formulate this, for any realisation ω ∈ Ω_1 of the original data define

 F_{Y*}(y) = P_{12}(ω, {Y*_1 ≤ y}), T*(y) = (F_{Y*}(y) − F_{Y*}(0)) / (F_{Y*}(1) − F_{Y*}(0)) and S* = h*(Y*).

Then for any compact set K ⊆ ℝ, any δ > 0 and

 A_{m,n,δ} = { sup_{y ∈ K} | ĥ*(y) − h*(y) − (1/m) ∑_{j=1}^m ψ*(S*_j, X*_j, T*(y)) | > δ/√m }

we need

 P_1(ω ∈ Ω_1 : ∀ δ > 0 : limsup_{m→∞} P_{12}(ω, A_{m,n,δ}) = 0) = 1 + o(1) (4.4)

for n → ∞, where ψ* fulfils some assumptions given in the supplement (see assumption (A8*) for details). In the supplement we also give conditions under which the expansion is valid for the transformation estimator of Colling and Van Keilegom (2019) (see Lemma S.8).

Simulations

Throughout this section, the sample sizes and simulation parameters specified below are chosen. Moreover, the null hypothesis of h belonging to the class of Yeo and Johnson (2000) transformations

 Λ_θ(Y) = { ((Y+1)^θ − 1)/θ, if Y ≥ 0, θ ≠ 0
  log(Y+1), if Y ≥ 0, θ = 0
  −((1−Y)^{2−θ} − 1)/(2−θ), if Y < 0, θ ≠ 2
  −log(1−Y), if Y < 0, θ = 2

with parameter θ ∈ Θ is tested. Under H_0 we generate data using the transformation from this class rescaled to match the identification constraints (2.2). Under the alternative we choose transformations with an inverse given by the following convex combination,

 h^{−1}(Y) = ( (1−c)(Λ^{−1}_{θ_0}(Y) − Λ^{−1}_{θ_0}(0)) + c(r(Y) − r(0)) ) / ( (1−c)(Λ^{−1}_{θ_0}(1) − Λ^{−1}_{θ_0}(0)) + c(r(1) − r(0)) ) (4.5)

for some θ_0 ∈ Θ, some strictly increasing function r and some c ∈ [0, 1]. In general it is not clear whether a growing factor c leads to a growing distance (2.5). Indeed, the opposite might be the case if r is somehow close to the class of transformation functions considered in the null hypothesis. Simulations were conducted for several functions r, among them transformations of Φ, where Φ denotes the cumulative distribution function of a standard normal distribution. A prefactor in the definition of r is introduced because the values of r are otherwise rather small compared to the values of Λ^{−1}_{θ_0}; that is, even when using the convex combination in (4.5), Λ^{−1}_{θ_0} would (except for c close to one) dominate the “alternative part” of the transformation function. Note that two of the considered functions r only differ with respect to a different standardization; therefore, if h is defined via (4.5) with one of them, the resulting function is close to the null hypothesis case.

For calculating the test statistic the weighting function w was set equal to one on Y_w. The nonparametric estimator ĥ of h was calculated as in Colling and Van Keilegom (2019) (see Appendix B for details) with the Epanechnikov kernel