# Identification in a Fully Nonparametric Transformation Model with Heteroscedasticity

The most general identification result to date in the context of nonparametric transformation models is proven. The result is constructive in the sense that it provides an explicit expression for the transformation function.


## 1 Introduction

The underlying question of this article can be formulated quite easily: Given some real valued random variable $Y$ and some $\mathbb{R}^{d_X}$-valued random variable $X$ fulfilling the heteroscedastic transformation model

$$h(Y)=g(X)+\sigma(X)\varepsilon \tag{1.1}$$

with some error term $\varepsilon$ independent of $X$ and fulfilling $E[\varepsilon]=0$ and $\operatorname{Var}(\varepsilon)=1$, are the model components $h$, $g$, $\sigma$ and the error distribution uniquely determined if the joint distribution of $(Y,X)$ is known? This uniqueness is called identification of a model.

Over the last years, transformation models have attracted more and more attention since they are often used to obtain desirable properties by first transforming the dependent random variable of a regression model. Applications of such transformations range from reducing skewness of the data to inducing additivity, homoscedasticity or even normality of the error terms. Box and Cox (1964), Bickel and Doksum (1981) and Zellner and Revankar (1969) already introduced some parametric classes of transformation functions. Horowitz (1996) proved for a linear regression function and homoscedastic errors that the model is identified when the transformation function is assumed to vanish at some fixed point and the regression parameter is standardized such that the first component, which is different from zero, is equal to one. Later, the ideas of Horowitz (1996) were extended by Ekeland et al. (2004) to general smooth regression functions $g$. The arguably most general identification results so far were provided by Chiappori et al. (2015) and Vanhems and Van Keilegom (2019), who considered general regression functions and homoscedastic errors as well, but allowed endogenous regressors. Linton et al. (2008) used similar ideas to obtain identifiability of a model with parametric transformation functions as a special case. As will be seen in Section 2, these approaches cannot be applied to the heteroscedastic model so that different methods are needed.

Despite their practical relevance (e.g. in duration models, see Khan et al. (2011)), results allowing heteroscedasticity are rare. Zhou et al. (2009) showed identifiability in a single-index model with a linear regression function and a known variance function. Wang and Wang (2018) applied this model to lung cancer data. Neumeyer et al. (2016) required identifiability implicitly in their assumptions.

In contrast to the approaches mentioned above, the aim here is to avoid any parametric assumption on $h$, $g$ or $\sigma$, which to the author's knowledge has not been done before. Note that the validity of the model is unaffected by linear transformations. This means that for arbitrary constants $a>0$ and $b\in\mathbb{R}$ equation (1.1) still holds when replacing $h$, $g$ and $\sigma$ by

$$\tilde h(y)=a\,h(y)+b,\qquad \tilde g(x)=a\,g(x)+b\qquad\text{and}\qquad \tilde\sigma(x)=a\,\sigma(x).$$

Of course, one could have chosen an arbitrary $a<0$ as well, but similar to existing results the transformation function will be restricted to be strictly increasing without loss of generality. Nevertheless, at least two conditions for fixing $a$ and $b$ are needed. Since these conditions determine the linear transformation, they are sometimes called location and scale constraints.
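The invariance under linear transformations can be checked numerically. The following is a minimal sketch under assumed, purely illustrative choices that are not taken from the paper: $h=\sinh$, $g(x)=x$, $\sigma(x)=e^{x}$, standard normal errors and the constants `a = 3`, `b = -2`. It recovers the error term once from $(h,g,\sigma)$ and once from the linearly transformed triple and compares the two.

```python
import math
import random

# Hypothetical model components (illustrative assumptions, not from the paper).
def h(y):
    return math.sinh(y)      # strictly increasing transformation

def g(x):
    return x                 # regression function

def sigma(x):
    return math.exp(x)       # strictly positive scale function

def residual(y, x, a=1.0, b=0.0):
    """Error term implied by the triple (a*h + b, a*g + b, a*sigma)."""
    return ((a * h(y) + b) - (a * g(x) + b)) / (a * sigma(x))

rng = random.Random(0)
pairs = []
for _ in range(1000):
    x = rng.uniform(0.0, 1.0)
    eps = rng.gauss(0.0, 1.0)
    y = math.asinh(g(x) + sigma(x) * eps)   # Y = h^{-1}(g(X) + sigma(X) * eps)
    pairs.append((y, x))

# Largest discrepancy between the error terms of the two model versions.
max_gap = max(abs(residual(y, x) - residual(y, x, a=3.0, b=-2.0))
              for y, x in pairs)
print(max_gap)
```

Both versions describe the same joint distribution of the data, which is why additional constraints are needed before uniqueness can be expected.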

The remainder of this article is organized as follows. First, some assumptions are listed before the main identification result for heteroscedastic transformation models is motivated and stated. Afterwards, a short conclusion in Section 3 is followed by the Appendix, which contains some results on uniqueness of solutions to differential equations and the proof of the main result.

## 2 The Idea and the Result

Before the identification result can be motivated, some assumptions and notation have to be introduced. First, basic assumptions concerning validity of model (1.1) and continuity of its model components are given.

(A1) Let $Y$ and $X$ be real valued and $\mathbb{R}^{d_X}$-valued random variables, respectively, with

$$h(Y)=g(X)+\sigma(X)\varepsilon$$

for some transformation, regression and variance functions $h$, $g$ and $\sigma$.

(A2) $\varepsilon$ is a centred random variable independent of $X$ with $E[\varepsilon]=0$ and $\operatorname{Var}(\varepsilon)=1$.

(A3) Let the density $f_\varepsilon$ of $\varepsilon$ be continuous and let $h$, $g$ and $\sigma$ from (A1) be continuously differentiable.

Moreover, a regularity assumption for the conditional distribution function of $Y$ given $X$ is needed.

(A4) The conditional cumulative distribution function $F_{Y|X}(y|x)$ is continuously differentiable with respect to $y$ and $x$. Let $v$ be a weight function with support $\operatorname{supp}(v)$ such that $\frac{\partial F_{Y|X}(y|x)}{\partial y}>0$ for all $y\in\mathbb{R}$ and $x\in\operatorname{supp}(v)$ and such that (with $g$ and $\sigma$ from (A1))

$$A:=\int v(x)\left(\frac{\sigma(x)\frac{\partial g(x)}{\partial x_1}-g(x)\frac{\partial\sigma(x)}{\partial x_1}}{\sigma(x)}\right)dx\qquad\text{and}\qquad B:=\int v(x)\,\frac{\frac{\partial\sigma(x)}{\partial x_1}}{\sigma(x)}\,dx$$

are well defined with $B\neq0$.

The assumption $B\neq0$ requires heteroscedasticity of the model. Note that the homoscedastic case was already treated by Chiappori et al. (2015). Later, it will be shown in Remark 2.2 that (A1)-(A3) and the first part of (A4) exclude the case that there exist a homoscedastic and a heteroscedastic version of model (1.1) at the same time. In the following, the functions $h$, $g$ and $\sigma$ from (A1) and (A3) are used to show their uniqueness and consequently identification of the model.

### 2.1 The Transformation Function as a Solution to an Initial Value Problem

Many of the homoscedastic identification approaches mentioned in the introduction are based on the same idea (see Ekeland et al. (2004), Horowitz (2009) and recently Chiappori et al. (2015)). Using the example of Chiappori et al. (2015), their method can be summarized in the following way: Let $F_{Y|X}$ be the conditional cumulative distribution function of $Y$ conditioned on $X$. Take the derivatives of $F_{Y|X}$ with respect to $y$ and some component of $x$, divide the first by the latter one and obtain the transformation function by integrating this quotient. After applying some identification constraints the transformation function is identified as it only depends on the joint distribution of $(Y,X)$. In heteroscedastic models, the reasoning has to be changed since the way in which the transformation function enters the conditional distribution function and its partial derivatives becomes more complex. The latter functions can be written as

$$\begin{aligned}F_{Y|X}(y|x)&=P(Y\le y\mid X=x)=F_\varepsilon\left(\frac{h(y)-g(x)}{\sigma(x)}\right),\\ \frac{\partial F_{Y|X}(y|x)}{\partial y}&=f_\varepsilon\left(\frac{h(y)-g(x)}{\sigma(x)}\right)\frac{h'(y)}{\sigma(x)}>0\end{aligned}\tag{1.2}$$

and

$$\frac{\partial F_{Y|X}(y|x)}{\partial x_i}=-f_\varepsilon\left(\frac{h(y)-g(x)}{\sigma(x)}\right)\frac{\sigma(x)\frac{\partial g(x)}{\partial x_i}+(h(y)-g(x))\frac{\partial\sigma(x)}{\partial x_i}}{\sigma(x)^2},\qquad i=1,\dots,d_X.$$

Here, $h'$ is an abbreviation for the derivative $\frac{\partial h(y)}{\partial y}$ and $F_\varepsilon$ denotes the cumulative distribution function of $\varepsilon$. Hence, even under the assumptions above, the transformation function cannot be obtained by simply integrating the quotient

$$\frac{\frac{\partial F_{Y|X}(y|x)}{\partial y}}{\frac{\partial F_{Y|X}(y|x)}{\partial x_i}}=-\frac{h'(y)\sigma(x)}{\sigma(x)\frac{\partial g(x)}{\partial x_i}+(h(y)-g(x))\frac{\partial\sigma(x)}{\partial x_i}},\tag{1.3}$$

since the denominator now also depends on the transformation function.
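The derivative formulas in (1.2), the partial derivative with respect to $x_i$ and the quotient (1.3) can be sanity-checked with finite differences. The sketch below assumes a hypothetical one-dimensional example that is not part of the paper ($h=\sinh$, $g(x)=x$, $\sigma(x)=e^{x}$, standard normal errors) and compares the closed-form expressions with central difference quotients of $F_{Y|X}$.

```python
import math

# Hypothetical model components and their derivatives (illustrative only).
def h(y): return math.sinh(y)
def dh(y): return math.cosh(y)
def g(x): return x
def dg(x): return 1.0
def sigma(x): return math.exp(x)
def dsigma(x): return math.exp(x)

def Phi(z):  # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):  # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def F(y, x):  # conditional distribution function F_{Y|X}(y|x)
    return Phi((h(y) - g(x)) / sigma(x))

def dF_dy(y, x):  # second line of (1.2)
    return phi((h(y) - g(x)) / sigma(x)) * dh(y) / sigma(x)

def dF_dx(y, x):  # partial derivative with respect to x
    z = (h(y) - g(x)) / sigma(x)
    return -phi(z) * (sigma(x) * dg(x) + (h(y) - g(x)) * dsigma(x)) / sigma(x) ** 2

def quotient(y, x):  # right-hand side of (1.3)
    return -dh(y) * sigma(x) / (sigma(x) * dg(x) + (h(y) - g(x)) * dsigma(x))

def central_diff(f, t, step=1e-5):
    return (f(t + step) - f(t - step)) / (2.0 * step)

y, x = 0.3, 0.5
num_dy = central_diff(lambda t: F(t, x), y)
num_dx = central_diff(lambda t: F(y, t), x)
print(num_dy - dF_dy(y, x), num_dx - dF_dx(y, x), num_dy / num_dx - quotient(y, x))
```

The error density cancels in the quotient, but the transformation function $h$ remains in the denominator, which is the obstacle described above.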

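Instead, we consider the reciprocal value of (1.3) for $i=1$ and name this $\lambda(y|x)$:

$$\lambda(y|x):=\frac{\frac{\partial F_{Y|X}(y|x)}{\partial x_1}}{\frac{\partial F_{Y|X}(y|x)}{\partial y}}=-\frac{\sigma(x)\frac{\partial g(x)}{\partial x_1}+(h(y)-g(x))\frac{\partial\sigma(x)}{\partial x_1}}{h'(y)\sigma(x)}.$$

Next, if $v$ is the weight function from (A4), $\lambda(y|x)$ can be integrated with respect to $x$ as follows to obtain

$$\lambda(y):=\int v(x)\lambda(y|x)\,dx=-\frac{A+Bh(y)}{h'(y)}\tag{1.4}$$

with $A$ and $B$ from (A4). Since assumption (A4) implies $h'(y)>0$ for all $y$ and consequently strict monotonicity of $h$, there exists exactly one root of $\lambda$ which will be called

$$y_0:=\lambda^{-1}(0)$$

in the following. Due to (1.4) it holds that $h(y_0)=-A/B$.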
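As an illustration of (1.4), the function $\lambda$ can be computed from the conditional distribution function alone. The sketch below is based on assumed example choices that do not come from the paper: $h=\sinh$, $g(x)=x$, $\sigma(x)=e^{x}$, standard normal errors and the weight function $v=1$ on $[0,1]$, for which a short calculation gives $A=1/2$ and $B=1$. The helper names `lam_cond`, `lam` and `bisect_root` are hypothetical.

```python
import math

def Phi(z):  # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F(y, x):
    # F_{Y|X}(y|x) for the assumed example h = sinh, g(x) = x, sigma(x) = exp(x)
    return Phi((math.sinh(y) - x) / math.exp(x))

def lam_cond(y, x, step=1e-5):
    """lambda(y|x): ratio of the x- and y-derivatives of F (finite differences)."""
    dF_dx = (F(y, x + step) - F(y, x - step)) / (2.0 * step)
    dF_dy = (F(y + step, x) - F(y - step, x)) / (2.0 * step)
    return dF_dx / dF_dy

def lam(y, n=200):
    """lambda(y): midpoint rule for the integral of v(x) * lambda(y|x), v = 1 on [0, 1]."""
    return sum(lam_cond(y, (k + 0.5) / n) for k in range(n)) / n

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection, assuming a sign change on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

A, B = 0.5, 1.0                      # values of A and B in the assumed example
y0 = bisect_root(lam, -2.0, 2.0)     # unique root of lambda
print(y0, math.sinh(y0), -A / B)
```

In this example the root satisfies $h(y_0)=-A/B$, as stated after (1.4), although only the distribution function entered the computation of $\lambda$.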

In the following, the problem of identifying model (1.1) is reduced to solving an ordinary differential equation uniquely. Afterwards, basic uniqueness theorems for initial value problems will imply the main identification result. To this end, rewrite equation (1.4) to obtain

$$h'(y)=-\frac{A+Bh(y)}{\lambda(y)}\tag{1.5}$$

for all $y\neq y_0$. This can indeed be understood as a differential equation, but an initial condition is needed to obtain an initial value problem. Here, the initial condition

$$h(y_1)=\alpha\tag{1.6}$$

for some $y_1>y_0$ and some $\alpha>-A/B$ is considered (remember that $h$ was assumed to be strictly increasing, so that $h(y_1)>h(y_0)=-A/B$). Theorem A.2 in the appendix yields uniqueness of any solution to this initial value problem on any compact interval $[y_1,k_2]$ with $k_2>y_1$. This identification result can be generalized to all $y\in\mathbb{R}$.
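The initial value problem consisting of (1.5) and (1.6) can also be solved numerically. The following sketch uses a classical fourth-order Runge-Kutta scheme and assumes a hypothetical example in which $\lambda(y)=-(A+B\sinh(y))/\cosh(y)$ with $A=1/2$ and $B=1$, so that the true transformation is $h=\sinh$; neither the example nor the helper `solve_ivp_rk4` is taken from the paper.

```python
import math

A, B = 0.5, 1.0   # coefficients of the assumed example model

def lam(y):
    # lambda(y) = -(A + B h(y)) / h'(y) for the assumed transformation h = sinh
    return -(A + B * math.sinh(y)) / math.cosh(y)

def rhs(y, h_val):
    # right-hand side of the differential equation (1.5)
    return -(A + B * h_val) / lam(y)

def solve_ivp_rk4(y1, alpha, y_end, steps=1000):
    """Integrate h' = rhs(y, h) from y1 to y_end with h(y1) = alpha (classical RK4)."""
    step = (y_end - y1) / steps
    y, h_val = y1, alpha
    for _ in range(steps):
        k1 = rhs(y, h_val)
        k2 = rhs(y + 0.5 * step, h_val + 0.5 * step * k1)
        k3 = rhs(y + 0.5 * step, h_val + 0.5 * step * k2)
        k4 = rhs(y + step, h_val + step * k3)
        h_val += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        y += step
    return h_val

# Initial condition (1.6) with y1 = 0 and alpha = sinh(0) = 0 > -A/B.
h_at_1 = solve_ivp_rk4(y1=0.0, alpha=0.0, y_end=1.0)
print(h_at_1, math.sinh(1.0))
```

The numerical solution agrees with $\sinh$ on the integration interval, which lies entirely to the right of the root of $\lambda$.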

### 2.2 Uniqueness of the Unknown Coefficients

The reasoning above is designed for fixed $A$ and $B$, that is, it remains to prove uniqueness of these coefficients. Moreover, it would be desirable to derive an explicit formula for the transformation function instead of only proving its uniqueness. This will be done in the remainder of this section.

First, the initial value problem, which corresponds to the equations (1.5) and (1.6), is solved by

$$h(y)=\frac{(A+B\alpha)\exp\left(-B\int_{y_1}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}\qquad\text{for all }y\in(y_0,\infty).\tag{1.7}$$

By straightforward calculations, it can be verified that (1.7) is indeed a solution to the initial value problem. Second, as was already mentioned in the introduction, model (1.1) is not only fulfilled for $h$, $g$ and $\sigma$, but also for any linear transformation of these functions. Therefore, to obtain uniqueness it is necessary to fix these linear transformations. This can be done by requiring so-called location and scale constraints, which fix the two constants of the linear transformation. While

$$h(y_0)=0\tag{1.8}$$

is chosen as the location constraint, the scale constraint is equal to the initial condition (1.6), that is, $\alpha$ is viewed as an arbitrary, but fixed positive number. Here, the location constraint was chosen such that equation (1.4) implies $A=0$. Nevertheless, other location constraints are conceivable as well, as can be seen in Remark 2.3.
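The explicit solution (1.7) lends itself to a direct numerical check. The sketch below evaluates the formula with a composite Simpson rule under assumed example values that are not from the paper ($A=1/2$, $B=1$, $\lambda(y)=-(A+B\sinh(y))/\cosh(y)$, $y_1=0$, $\alpha=0$), for which the true transformation is $h=\sinh$ and $y_0=\operatorname{arsinh}(-1/2)\approx-0.48$.

```python
import math

A, B = 0.5, 1.0   # coefficients of the assumed example model

def lam(u):
    # lambda for the assumed transformation h = sinh
    return -(A + B * math.sinh(u)) / math.cosh(u)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0

def h_from_lambda(y, y1, alpha):
    """Formula (1.7): transformation function reconstructed from lambda, A, B."""
    integral = simpson(lambda u: 1.0 / lam(u), y1, y)
    return ((A + B * alpha) * math.exp(-B * integral) - A) / B

y1, alpha = 0.0, 0.0                 # initial condition h(y1) = alpha
grid = (-0.3, 0.5, 1.0, 2.0)         # all points lie to the right of y0
values = [h_from_lambda(y, y1, alpha) for y in grid]
print(values)
```

Points to the left of $y_1$ are covered as well, as long as they stay to the right of $y_0$, since the integral is taken with sign.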

Consequently, equation (1.7) reduces to

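$$h(y)=\alpha\exp\left(-B\int_{y_1}^{y}\frac{1}{\lambda(u)}\,du\right)\qquad\text{for all }y\in(y_0,\infty).\tag{1.9}$$

If there exist two coefficients $B$ and $\tilde B$ such that the corresponding transformation functions $h$ and $\tilde h$ from (1.9) fulfil model (1.1), it would hold that

$$\tilde h(y)=\alpha\left(\frac{h(y)}{\alpha}\right)^{\tilde B/B}\qquad\text{for all }y\in(y_0,\infty).$$

Assume without loss of generality $\tilde B/B>1$. Then,

$$\tilde h'(y)=\frac{\tilde B}{B}\left(\frac{h(y)}{\alpha}\right)^{\frac{\tilde B}{B}-1}h'(y)\ \xrightarrow{\,y\searrow y_0\,}\ 0.$$

Therefore, continuous differentiability of $h$ and $\tilde h$ would imply $\tilde h'(y_0)=0$, which due to (1.2) would lead to a violation of (A4). Hence, $B$ is unique under (A1)-(A4), which finally leads to the main identification result. Note that the same argument is valid for transformation functions as in (1.7) since these are simply linearly transformed versions of (1.9).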
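The uniqueness argument for $B$ can be made tangible with a small numerical experiment. The sketch below assumes a hypothetical example that is not from the paper, in which $\lambda(y)=-\tanh(y)$, $y_0=0$, $y_1=1$, $\alpha=1$ and the true coefficient is $B=1$ (so that $h(y)=\sinh(y)/\sinh(1)$). It evaluates (1.9) for several candidate coefficients and inspects the difference quotient of the candidate at $y_0$.

```python
import math

def lam(u):
    # lambda for the assumed example: A = 0, B = 1, h proportional to sinh, y0 = 0
    return -math.tanh(u)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0

def h_candidate(y, b_cand, y1=1.0, alpha=1.0):
    """Formula (1.9) evaluated with a candidate coefficient b_cand."""
    integral = simpson(lambda u: 1.0 / lam(u), y1, y)
    return alpha * math.exp(-b_cand * integral)

def slope_near_root(b_cand, t):
    """Difference quotient (h(y0 + t) - h(y0)) / t with y0 = 0 and h(y0) = 0."""
    return h_candidate(t, b_cand) / t

slopes = {b_cand: [slope_near_root(b_cand, t) for t in (0.1, 0.01)]
          for b_cand in (0.5, 1.0, 2.0)}
print(slopes)
```

In this example only the true coefficient yields a difference quotient that stabilises at a positive finite value; a larger candidate drives it towards zero and a smaller one lets it grow, which matches the contradiction used in the argument above.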

###### Theorem 2.1

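Let $y_2<y_0$ and assume (A1)-(A4) and (1.6).

i) For each $A$ and $B$ such that $B\neq0$, the unique solution to (1.5) on $(y_0,\infty)$ is given by (1.7). It can be extended to a global unique solution to (1.5) by

$$h(y)=\begin{cases}\dfrac{(A+B\alpha)\exp\left(-B\int_{y_1}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}&y>y_0\\[2ex]-\dfrac{A}{B}&y=y_0\\[2ex]\dfrac{(A+B\alpha_2)\exp\left(-B\int_{y_2}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}&y<y_0\end{cases}\tag{1.10}$$

where $\alpha_2$ is uniquely determined by requiring continuous differentiability of $h$ in $y_0$ as

$$\alpha_2=-\lim_{t\to0}\frac{(A+B\alpha)\exp\left(B\left(\int_{y_2}^{y_0-t}\frac{1}{\lambda(u)}\,du-\int_{y_1}^{y_0+t}\frac{1}{\lambda(u)}\,du\right)\right)+A}{B}.\tag{1.11}$$

ii) If additionally (1.8) and $\alpha=1$ hold, one has

$$h(y)=\begin{cases}\exp\left(-B\int_{y_1}^{y}\frac{1}{\lambda(u)}\,du\right)&y>y_0\\[1ex]0&y=y_0\\[1ex]\alpha_2\exp\left(-B\int_{y_2}^{y}\frac{1}{\lambda(u)}\,du\right)&y<y_0\end{cases}\tag{1.12}$$

where $\alpha_2$ is uniquely determined by requiring continuous differentiability of $h$ in $y_0$ as

$$\alpha_2=-\lim_{t\to0}\exp\left(B\left(\int_{y_2}^{y_0-t}\frac{1}{\lambda(u)}\,du-\int_{y_1}^{y_0+t}\frac{1}{\lambda(u)}\,du\right)\right).\tag{1.13}$$

Moreover, $B$ is uniquely determined and it holds that

$$g(x)=E[h(Y)|X=x]\qquad\text{and}\qquad\sigma(x)=\sqrt{\operatorname{Var}(h(Y)|X=x)}.$$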

The proof can be found in Section B.
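To see how the second part of the theorem can be evaluated in practice, the following sketch computes $\alpha_2$ from (1.13) and the resulting global transformation function. It rests on an assumed example that is not from the paper: $\lambda(y)=-\tanh(y)$, $B=1$, $y_0=0$, $y_1=1$ and $y_2=-2$, for which the normalised transformation is $h(y)=\sinh(y)/\sinh(1)$. The limit in (1.13) is approximated by a small fixed $t$, which is an additional simplification.

```python
import math

B = 1.0
y0, y1, y2 = 0.0, 1.0, -2.0   # root of lambda and the two anchor points (assumed)

def inv_lam(u):
    # 1 / lambda(u) for the assumed example lambda(u) = -tanh(u)
    return -1.0 / math.tanh(u)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0

def alpha2(t=0.01):
    """Approximation of (1.13) for a small, fixed t > 0."""
    left = simpson(inv_lam, y2, y0 - t)
    right = simpson(inv_lam, y1, y0 + t)
    return -math.exp(B * (left - right))

def h(y, t=0.01):
    """Global transformation function under the constraints h(y0) = 0 and h(y1) = 1."""
    if y > y0:
        return math.exp(-B * simpson(inv_lam, y1, y))
    if y < y0:
        return alpha2(t) * math.exp(-B * simpson(inv_lam, y2, y))
    return 0.0

print(alpha2(), h(-0.5), h(0.5))
```

In this example $\alpha_2$ coincides with the value of the transformation function at $y_2$, so that the two branches are joined in a continuously differentiable way.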

Finally, two remarks are given dealing on the one hand with further generalizations and implications for future estimation and testing techniques and on the other hand with justifying alternative identification constraints.

###### Remark 2.2

i) If $\sigma$ is not constant, there are values $x$ such that $\frac{\partial\sigma(x)}{\partial x_i}\neq0$ for some $i$. Consequently, $y\mapsto\frac{\partial F_{Y|X}(y|x)}{\partial x_i}$ changes its sign for these $x$. If $\sigma$ is constant, that is, the error is homoscedastic, this is not the case. Hence, model (1.1) cannot be fulfilled for homoscedastic and heteroscedastic errors at the same time.

ii) The identification result can be generalized in many regards. For example, one could have used any other partial derivative $\frac{\partial}{\partial x_i}$ in (A4) as well. Moreover, $v$ can be chosen as a Dirac delta function and it is possible to consider error densities with bounded support. See Kloodt (2019) for a more detailed examination. Furthermore, it is conjectured that the result can be generalized to a conditional independence assumption in the presence of endogenous regressors, similarly to Chiappori et al. (2015).

###### Remark 2.3

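One could have used other scale and location constraints than (1.6) and (1.8). For example, consider for some real numbers $y_a<y_b$ and $\alpha_a<\alpha_b$ the conditions

$$h(y_a)=\alpha_a\qquad\text{and}\qquad h(y_b)=\alpha_b.\tag{1.14}$$

Assume there exist two transformation functions $T_1$ and $T_2$ such that model (1.1) and the constraints (1.14) are fulfilled. Then, the functions

$$\tilde T_1(y)=\frac{T_1(y)-T_1(y_0)}{T_1(y_1)-T_1(y_0)}\qquad\text{and}\qquad\tilde T_2(y)=\frac{T_2(y)-T_2(y_0)}{T_2(y_1)-T_2(y_0)}$$

fulfil model (1.1) and the constraints (1.6) (with $\alpha=1$) and (1.8). This leads to $\tilde T_1=\tilde T_2$ so that

$$T_1(y)=(\alpha_b-\alpha_a)\frac{\tilde T_1(y)-\tilde T_1(y_a)}{\tilde T_1(y_b)-\tilde T_1(y_a)}+\alpha_a=(\alpha_b-\alpha_a)\frac{\tilde T_2(y)-\tilde T_2(y_a)}{\tilde T_2(y_b)-\tilde T_2(y_a)}+\alpha_a=T_2(y)$$

for all $y\in\mathbb{R}$.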
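The change of constraints used in this remark amounts to an affine rescaling. A minimal sketch, assuming a hypothetical normalised transformation $\tilde T(y)=\sinh(y)/\sinh(1)$ and illustrative constraint values (none of which come from the paper), is:

```python
import math

def renormalise(T, y_a, y_b, alpha_a, alpha_b):
    """Affine rescaling of T such that the result equals alpha_a at y_a and alpha_b at y_b."""
    T_a, T_b = T(y_a), T(y_b)

    def T_new(y):
        return (alpha_b - alpha_a) * (T(y) - T_a) / (T_b - T_a) + alpha_a

    return T_new

def T_tilde(y):
    # hypothetical transformation normalised by T_tilde(0) = 0 and T_tilde(1) = 1
    return math.sinh(y) / math.sinh(1.0)

# Illustrative constraint values for (1.14).
T = renormalise(T_tilde, y_a=-1.0, y_b=2.0, alpha_a=5.0, alpha_b=7.0)
print(T(-1.0), T(2.0))
```

Since the rescaling is determined by the two constraint values alone, two normalised versions that coincide lead to the same rescaled function.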

A similar reasoning can be applied to show that identification constraints like

$$h(y_a)=\alpha_a,\qquad h'(y_a)=\alpha_b$$

for some $y_a,\alpha_a\in\mathbb{R}$ and $\alpha_b>0$ ensure uniqueness of the transformation function as well.

## 3 Conclusion and Outlook

The most general identification result to date in the theory of transformation models has been provided. In doing so, the techniques of Ekeland et al. (2004) and Chiappori et al. (2015) have been used to reduce the problem of identifiability to that of solving an ordinary differential equation. Most of the previous results are contained as special cases. The main contribution consists in allowing heteroscedastic errors, which justifies the common practice of assuming identifiability, as for example in the paper of Neumeyer et al. (2016).

Moreover, the result is constructive in the sense that it not only guarantees identification of the model, but even supplies an analytic expression of the transformation function depending on the joint cumulative distribution function of the data and some parameter $B$. This parameter is identified, too, and can be expressed as in Kloodt (2019) under the additional assumption of a twice continuously differentiable transformation function.

Due to the explicit character of equation (1.7), future research could consist in analysing the resulting plug-in estimator. This will be the topic of a subsequent paper. Furthermore, the presented results could be successively generalized as in Remark 2.2 or by allowing vanishing derivatives of $h$. Moreover, it would be desirable to develop conditions on the joint distribution function of $(Y,X)$ under which model (1.1) is fulfilled. In contrast to the thoughts on identifiability here, such a question addresses the solvability of (1.1), that is, the issue of existence of a solution instead of uniqueness.

## Appendix A Uniqueness of Solutions to Ordinary Differential Equations

In this section, two basic results about ordinary differential equations and uniqueness of possible solutions are given. Theorem A.2 is slightly modified compared to the version of Forster (1999, p. 102), so the proof is presented as well.

###### Lemma A.1

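(Gronwall's Inequality, see Grönwall (1919) or Bellman (1953) for details) Let $I=[a,b]$ be a compact interval. Let $u,v:I\to\mathbb{R}$ and $q:I\to[0,\infty)$ be continuous functions. Further, let

$$u(y)\le v(y)+\int_a^y q(z)u(z)\,dz$$

for all $y\in I$. Then, one has

$$u(y)\le v(y)+\int_a^y v(z)q(z)\exp\left(\int_z^y q(t)\,dt\right)dz\qquad\text{for all }y\in I.$$
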
###### Theorem A.2

(see Forster (1999, p. 102) for a related version) Let and be a set such that . Moreover, let be continuous with respect to both components and continuously differentiable with respect to the second component. Then, for all any solution of the initial value problem

 h′(y)=D(y,h(y)),h(a)=θ0

is unique.

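Proof: Let $h_1$ and $h_2$ be two solutions of the mentioned initial value problem. Since

$$K:=\{(y,\theta)\in[a,b]\times\mathbb{R}_+:y\in[a,b],\ \theta\in\{h_1(y),h_2(y)\}\}$$

is compact, there exists some $L>0$ such that $|D(y,\theta_1)-D(y,\theta_2)|\le L|\theta_1-\theta_2|$ for all $(y,\theta_1),(y,\theta_2)\in K$. Consider the distance $d(y):=|h_1(y)-h_2(y)|$. Then for all $y\in[a,b]$

$$\begin{aligned}d(y)&=|h_1(y)-h_1(a)-(h_2(y)-h_2(a))|\\&=\left|\int_a^y\big(D(z,h_1(z))-D(z,h_2(z))\big)\,dz\right|\\&\le\int_a^y|D(z,h_1(z))-D(z,h_2(z))|\,dz\\&\le L\int_a^y|h_1(z)-h_2(z)|\,dz\\&=L\int_a^y d(z)\,dz.\end{aligned}$$

Gronwall's Inequality leads to $d\equiv0$ (set $u=d$, $v\equiv0$ and $q\equiv L$).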
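For completeness, the last step can be written out as a short worked equation. It only combines the inequality for the distance between the two solutions with Lemma A.1, choosing the bound function identically zero and the weight function equal to the Lipschitz constant $L$; the notation $d$ for the distance is the one used in the proof.

```latex
% Lemma A.1 applied with u = d, v \equiv 0 and q \equiv L on I = [a,b]:
\begin{align*}
  d(y) &\le 0 + \int_a^y L\, d(z)\, dz
  && \text{(inequality from the proof)}\\
  \Longrightarrow\quad
  d(y) &\le 0 + \int_a^y 0 \cdot L \exp\Big(\int_z^y L\, dt\Big)\, dz = 0
  && \text{(Lemma A.1)}.
\end{align*}
% Since d(y) = |h_1(y) - h_2(y)| \ge 0, it follows that d \equiv 0,
% that is, h_1 = h_2 on [a,b].
```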

## Appendix B Proof of Theorem 2.1

Consider a compact interval $[k_1,k_2]\subseteq(y_0,\infty)$ and recall equation (1.7). Assumption (A4) ensures $B\neq0$. First, it is shown that $h$ as defined in (1.7) is the unique solution to (1.5) on $[k_1,k_2]$. For the moment assume $k_1=y_1$ and define

$$G=[y_1,k_2]\times[\alpha,\infty)\qquad\text{and}\qquad D:G\to\mathbb{R},\quad D(y,h)=-\frac{A+Bh}{\lambda(y)}.$$

With the choices $a=y_1$, $b=k_2$ and $\theta_0=\alpha$, Theorem A.2 ensures uniqueness of the solution to

$$h'(y)=-\frac{A+Bh(y)}{\lambda(y)}\qquad\text{for all }y\in[y_1,k_2].$$

By straightforward calculations, it can be verified that (1.7) is indeed a solution to this initial value problem. Since $\lambda(y)\neq0$ for all $y>y_0$, this solution holds for arbitrarily large $k_2$. Hence, by letting $k_2$ tend to infinity uniqueness of $h$ on $[y_1,\infty)$ is obtained.

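Now, consider an arbitrary value $\tilde y\in(y_0,y_1)$. Then, if the previous initial condition is replaced by

$$h(\tilde y)=\tilde\alpha$$

for some $\tilde\alpha$, the same reasoning as before can be used to show that the differential equation

$$h'(y)=-\frac{A+Bh(y)}{\lambda(y)}\qquad\text{for all }y\in[\tilde y,k_2]$$

is uniquely solved under this constraint by

$$\begin{aligned}h(y)&=\frac{(A+B\tilde\alpha)\exp\left(-B\int_{\tilde y}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}\\&=\frac{\frac{A+B\tilde\alpha}{A+B\alpha}\exp\left(-B\int_{\tilde y}^{y_1}\frac{1}{\lambda(u)}\,du\right)(A+Bh(y))-A}{B}\end{aligned}$$

for all $y\in[y_1,k_2]$, where the last equation follows from (1.7). To fulfil the previous scale constraint it is required that

$$\tilde\alpha=\frac{(A+\alpha B)\exp\left(B\int_{\tilde y}^{y_1}\frac{1}{\lambda(u)}\,du\right)-A}{B}.$$

Since this in turn results in expression (1.7) for all $y\in[\tilde y,k_2]$, $h$ is identified for all $y\ge\tilde y$. Choosing $\tilde y$ arbitrarily close to $y_0$ results in

$$h(y)=\frac{(A+B\alpha)\exp\left(-B\int_{y_1}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}\qquad\text{for all }y>y_0.$$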

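When proceeding analogously for $y<y_0$ with the initial condition

$$h(y_2)=\alpha'$$

for some $\alpha'$, one has

$$h(y)=\frac{(A+B\alpha')\exp\left(-B\int_{y_2}^{y}\frac{1}{\lambda(u)}\,du\right)-A}{B}\qquad\text{for all }y<y_0.$$

Recall $h'(y)>0$ for all $y\in\mathbb{R}$ and let $t>0$. Due to the continuous differentiability of $h$ in $y_0$, one has

$$\lim_{t\to0}\frac{\frac{h(y_0+t)-h(y_0)}{t}}{\frac{h(y_0-t)-h(y_0)}{-t}}=1.$$

On the other hand, it holds that

$$\begin{aligned}\frac{\frac{h(y_0+t)-h(y_0)}{t}}{\frac{h(y_0-t)-h(y_0)}{-t}}&=-\frac{(A+B\alpha)\exp\left(-B\int_{y_1}^{y_0+t}\frac{1}{\lambda(u)}\,du\right)}{(A+B\alpha')\exp\left(-B\int_{y_2}^{y_0-t}\frac{1}{\lambda(u)}\,du\right)}\\&=-\frac{A+B\alpha}{A+B\alpha'}\exp\left(B\left(\int_{y_2}^{y_0-t}\frac{1}{\lambda(u)}\,du-\int_{y_1}^{y_0+t}\frac{1}{\lambda(u)}\,du\right)\right),\end{aligned}$$

so that

$$\alpha'=-\lim_{t\to0}\frac{(A+B\alpha)\exp\left(B\left(\int_{y_2}^{y_0-t}\frac{1}{\lambda(u)}\,du-\int_{y_1}^{y_0+t}\frac{1}{\lambda(u)}\,du\right)\right)+A}{B}.$$

This leads to the uniqueness of solution (1.10), since uniqueness of $B$ was already shown in Section 2.2. Inserting $A=0$ and $\alpha=1$ yields the second part of the assertion, while identification of $g$ and $\sigma$ as the conditional mean and standard deviation follows from standard arguments.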

## Acknowledgements

This work was supported by the DFG (Research Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency).
Moreover, I would like to thank Natalie Neumeyer and Ingrid Van Keilegom for their very helpful suggestions and comments on the project.

## References

• Bellman (1953) R. Bellman. Stability theory of differential equations. Dover Publications, 1953.
• Bickel and Doksum (1981) P. J. Bickel and K. A. Doksum. An analysis of transformations revisited. Journal of the American Statistical Association, 76:296–311, 1981.
• Box and Cox (1964) G. E. P. Box and D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society. Series B, 26(2):211–252, 1964.
• Chiappori et al. (2015) P.-A. Chiappori, I. Komunjer, and D. Kristensen. Nonparametric identification and estimation of transformation models. Journal of Econometrics, 188(1):22–39, 2015.
• Ekeland et al. (2004) I. Ekeland, J. J. Heckman, and L. Nesheim. Identification and estimation of hedonic models. Journal of Political Economy, 112(1):60–109, 2004.
• Forster (1999) O. Forster. Analysis 2, volume 4. Vieweg, 1999.
• Grönwall (1919) T. H. Grönwall. Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Annals of Mathematics, 20(4):292–296, 1919.
• Horowitz (1996) J. L. Horowitz. Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica, 64(1):103–137, 1996.
• Horowitz (2009) J. L. Horowitz. Semiparametric and nonparametric methods in econometrics. Springer, 2009.
• Khan et al. (2011) S. Khan, Y. Shin, and E. Tamer. Heteroscedastic transformation models with covariate dependent censoring. Journal of Business & Economic Statistics, 29(1):40–48, 2011.
• Kloodt (2019) N. Kloodt. Nonparametric Transformation Models. PhD thesis, Universität Hamburg, 2019.
• Linton et al. (2008) O. Linton, S. Sperlich, and I. Van Keilegom. Estimation of a semiparametric transformation model. The Annals of Statistics, 36(2):686–718, 2008.
• Neumeyer et al. (2016) N. Neumeyer, H. Noh, and I. Van Keilegom. Heteroscedastic semiparametric transformation models: estimation and testing for validity. Statistica Sinica, 26:925–954, 2016.
• Vanhems and Van Keilegom (2019) A. Vanhems and I. Van Keilegom. Semiparametric transformation model with endogeneity: a control function approach. Econometric Theory, 2019. to appear.
• Wang and Wang (2018) Q. Wang and X. Wang. Analysis of censored data under heteroscedastic transformation regression models with unknown transformation function. The Canadian Journal of Statistics, 46(2):233–245, 2018.
• Zellner and Revankar (1969) A. Zellner and N. S. Revankar. Generalized production functions. Review of Economic Studies, 36(2):241–250, 1969.
• Zhou et al. (2009) X.-H. Zhou, H. Lin, and E. Johnson. Non-parametric heteroscedastic transformation regression models for skewed data with an application to health care costs. Journal of the Royal Statistical Society B, 70:1029–1047, 2009.