A common problem faced in applications is that one can only make indirect observations of a physical process. Consequently, important quantities of interest cannot be directly observed, but a suitable image under some transformation is typically available. These problems are called inverse problems in the literature. Loosely speaking, the goal is to recover a quantity (often a function) from a distorted version of an image , where
is some operator. Developing valid statistical inference procedures for these inverse problems is desirable, and in recent years several authors have worked on the construction of estimators, structural tests, and (pointwise and uniform) confidence bands for the unknown indirect regression function [see Mair and Ruymgaart (1996), Cavalier and Tsybakov (2002), Johnstone et al. (2004), Bissantz and Holzmann (2008), Cavalier (2008), Birke et al. (2010), Johnstone and Paul (2014), Marteau and Mathé (2014), and Proksch et al. (2015) among many others]. In this paper we consider an indirect regression model of the form
where is a predictor, is a random error and is a convolution operator, which will be specified later (along with the covariates ). Here is an unknown but square-integrable smooth function. We study a unified approach to testing certain model assumptions regarding the distribution function of the error in the indirect regression model (1.1).
Apart from specification of the operator , many statistical techniques used in applications for the estimation of depend on the error distribution. For example, when recovering astronomical images certain defects such as cosmic-ray hits are important to identify and remove [Section
of Adorf (1995)]. Here deviation values between observations from pixels and an initial reconstruction are calculated and compared with the standard deviation of the noise. A large deviation indicates the presence of a possible cosmic-ray hit, and observations from the affected pixels are discarded (or replaced by imputed values) in subsequent iterative reconstruction procedures that improve the quality of the final reconstructed image. Determining an unrealistic deviation depends on the structure of the noise distribution. More recently, Bertero et al. (2009) review maximum likelihood methods for reconstruction of distorted images, and, in their Section 5.2 on deconvolution using sparse representation, these authors note the popularity of assuming an additive Gaussian white noise model for transformed data. However, it is not known in advance whether this transformation is appropriate for a given image. If the transformation is inappropriate, then we can expect the Gaussian white noise model to also be inappropriate. The purpose of this paper is to help in answering some of these questions, which could be considered as goodness-of-fit hypotheses of specified error distributions.
Problems of this type have found considerable interest in direct regression models (this is the case where is an identity operator and only appears in (1.1)) [see Darling (1955), Sukhatme (1972) or Durbin (1973) for some early works or del Barrio et al. (2000) and Khmaladze and Koul (2004) for more recent references]. However, to the best of our knowledge the important case of testing distributional assumptions regarding the error structure of an indirect regression model of the form (1.1) has not been considered so far. We address this problem by proposing a test based on the empirical distribution function of the standardized residuals from an estimate of the regression function. The method rests on a projection principle introduced in the seminal papers of Khmaladze (1981, 1988). This projection is also called the Khmaladze transformation and has been well studied in the literature. As examples, we mention the work of Marzec and Marzec (1997), Stute et al. (1998), Khmaladze and Koul (2004, 2009), Haywood and Khmaladze (2008), Dette and Hetzler (2009), Koul and Song (2010), Müller et al. (2012), and Can et al. (2015), who use the Khmaladze transform to construct goodness-of-fit tests for various problems. The work most similar in spirit to ours is that of Koul et al. (2018), who consider a related problem in linear measurement error models.
We prefer the projection approach because there is a common asymptotic distribution describing the large sample behavior of the test statistics (without unknown parameters to be estimated), and the procedure is easily adapted to different problems. For a better understanding of projection principles as they relate to model checks, we refer the reader to the elaborate work of Bickel et al. (2006), who introduce a general framework for constructing tests of general semiparametric hypotheses that can be tailored to focus substantial power on important alternatives. These authors investigate a so-called score process obtained by a projection principle. Unfortunately, the resulting test statistics are generally not asymptotically distribution free, i.e. their asymptotic distributions generally depend on unknown parameters, and inference using them becomes more complicated. The Khmaladze transform is simpler to specify and easily employed in regression problems, since test statistics obtained from the transformation are asymptotically distribution free with (asymptotic) quantiles immediately available.
The article is organized as follows. A brief discussion of Sobolev spaces and their appearance in statistical deconvolution problems is given in Section 2. In this section we further propose an estimator of the indirect regression function and study its statistical properties. The proposed test statistic is introduced in Section 3. Finally, Section 4 concludes the article with a numerical study of the proposed testing procedure and an application. The technical details and proofs of our results can be found in Section 5.
2. Estimating smooth indirect regressions
Consider the model (1.1) with the operator specifying convolution between an unknown but smooth function and a known distortion function that characterizes , i.e.
Here the covariates are random and have support for some . The model errors are assumed to be independent with mean zero and common distribution function admitting a Lebesgue density function, which is denoted by throughout this paper. We also assume that are independent of the i.i.d. covariates .
Throughout this article we will assume that the indirect regression function from (1.1) is periodic and smooth in the sense that belongs to the subspace of periodic, weakly differentiable functions from the class of square integrable functions with support ; see Chapter 5 of Evans (2010) for definitions and additional discussion. For let be the set of multi-indices satisfying . To be precise, we will call a function weakly differentiable in of order when there is a collection of functions such that
for every infinitely differentiable function , with and , , vanishing at the boundary of and writing
The class of weakly differentiable functions from of order forms the Sobolev space
The periodic Sobolev space consists of those functions from that are periodic on and whose weak derivatives are also periodic on . An orthonormal basis for the space of square integrable functions is given by the Fourier basis . Here
is the common inner product between the vectors and . It follows that can be equivalently represented by
where denotes the Euclidean norm and
are the Fourier coefficients of [see Kühn et al. (2014) for further discussion]. The series in the equivalent representation of motivates replacing the degree of weak differentiability by a real-valued smoothness index . Throughout this article we work with the general indirect regression model space defined as
We will assume that , for some specified below, and that such that is positive-valued and integrates to so that is a convolution operator from into . In this case we can represent in terms of a Fourier series
where and are the Fourier coefficients of and , respectively. In particular we have
Studying the indirect regression model (1.1) requires that we account for the ill-posedness of the inverse problem. This phenomenon occurs because the ratio needs to be summable when . However, when estimated Fourier coefficients are used, does not asymptotically vanish (with increasing ) due to the stochastic noise from the errors in model (1.1). Consequently, the ratio is not necessarily summable, and the problem is therefore called ill-posed. The coefficients determine the rate at which the ratio grows, and, therefore, the ill-posedness of the inverse problem is governed by the rate of decay of the coefficients of the distortion function . We will assume that the inverse problem is mildly to moderately ill-posed in the sense of Fan (1991):
There are finite constants , and such that, for every , the Fourier coefficients of the function in (2.1) satisfy .
Under Assumption 1, whenever , for some , it follows that
from the celebrated convolution theorem for the Fourier transformation. This means that convolution of the indirect regression with the distortion function adds smoothness, and the resulting distorted regression function is now smoother than by exactly the degree of ill-posedness of the inverse problem. Note that Assumption 1 is milder than that of Fan (1991) in the sense that we allow the degree of ill-posedness and that the scaled Fourier coefficients can vanish. This covers the case of direct regression models where is the identity operator, that is . Further note that we do not have to invert the operator in order to investigate properties of the error distribution in the indirect regression model (1.1).
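The noise amplification behind this ill-posedness is easy to see numerically. In the sketch below (generic notation, since the paper's own symbols were lost in extraction; the decay rate s = 1.5 and the noise level are illustrative assumptions), a fixed amount of coefficient noise is divided by polynomially decaying distortion coefficients, and the high-frequency part of the noise is amplified by orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)
s = 1.5                                   # assumed degree of ill-posedness (illustrative)
k = np.arange(1, 201)
psi = k**(-s)                             # decaying Fourier coefficients of the distortion
noise = rng.normal(0.0, 0.01, k.size)     # stochastic error in estimated coefficients

amplified = noise / psi                   # naive inversion divides the noise by psi_k
low_freq = np.abs(amplified[:20]).mean()  # low frequencies: barely affected
high_freq = np.abs(amplified[-20:]).mean()  # high frequencies: blown up
```

This is precisely why the regularized (cut-off) estimators of the next subsection discard high frequencies instead of inverting them.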
Several techniques have been developed in the literature to derive series-type estimators (see, for example, Cavalier, 2008). A popular regularization method to employ is the so-called spectral cut-off method, where an indicator function is introduced in (2.3). For example, the indicator function (for some sequence converging to ) results in a biased version of :
The proposed estimator is obtained by replacing the coefficients with consistent estimators , which gives
as an estimator of . The sequence of smoothing parameters is chosen such that is consistently estimated. We can generalize this approach as follows.
Following Politis and Romano (1999) we consider a Fourier smoothing kernel , where is defined to be the Fourier transformation of some smoothing kernel function, say . The resulting estimate is then defined by
Another useful observation of Politis and Romano (1999) is that the function has Fourier coefficients . Throughout this paper we will choose as follows:
The Fourier smoothing kernel satisfies , for , , for , and .
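The shape required of the smoothing kernel, and its use in a cut-off-type estimator, can be sketched as follows. All concrete choices (the flat-top taper point, the signal, the distortion multipliers, the noise level and the smoothing parameter m) are ours for illustration, not the paper's:

```python
import numpy as np

def flat_top(t, c=0.5):
    """Fourier smoothing kernel of the shape required by Assumption 2:
    equal to 1 for |t| <= c, vanishing for |t| >= 1, with a linear taper
    in between (taper point c is an illustrative choice)."""
    return np.clip((1.0 - np.abs(t)) / (1.0 - c), 0.0, 1.0)

rng = np.random.default_rng(1)
n = 512
x = np.arange(n) / n
theta = np.sin(2*np.pi*x) + 0.5*np.cos(4*np.pi*x)      # smooth periodic test signal
freq = np.fft.fftfreq(n, d=1/n)                        # integer frequencies
psi_hat = 1.0 / (1.0 + np.abs(freq))                   # decaying distortion coefficients
g = np.real(np.fft.ifft(np.fft.fft(theta) * psi_hat))  # distorted regression
y = g + rng.normal(0.0, 0.1, n)                        # noisy indirect observations

m = 8                                                  # smoothing (cut-off) parameter
w = flat_top(freq / m)                                 # smoothing weights lambda(k/m)
theta_hat = np.real(np.fft.ifft(w * np.fft.fft(y) / psi_hat))
theta_naive = np.real(np.fft.ifft(np.fft.fft(y) / psi_hat))  # unregularized inversion

err_hat = np.sqrt(np.mean((theta_hat - theta)**2))
err_naive = np.sqrt(np.mean((theta_naive - theta)**2))
```

With the weights set to an indicator function, this reduces exactly to the spectral cut-off estimator discussed above; the flat-top taper merely smooths the transition.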
The random covariates from model (1.1) are assumed to be independent with distribution function . For simplicity we will assume that satisfies the following properties.
Let the covariate distribution function admit a positive Lebesgue density function satisfying , and that for some .
The boundedness assumptions taken for are common in nonparametric regression because these conditions guarantee good performance of nonparametric function estimators. The last condition ensures that the density function satisfies similar smoothness properties as the indirect regression function , which allows us to use a Fourier series technique to specify a good estimator of (see, for example, Politis and Romano, 1999).
What remains is to define the estimates of the Fourier coefficients required in the definition (2.5). Observing the representation
the covariate density function must be estimated. For this purpose we expand the density function into its Fourier series using the coefficients , with . Estimators of these coefficients are given by
From these estimators we then obtain an estimator of the unknown covariate density function , that is
with smoothing weights
Here (as before) the choice of defines the form of the smoothing weights . The sequence of smoothing parameters is specified later.
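The density estimation step just described can be sketched generically as follows; the function names, the default smoothing level m and the uniform example are illustrative, and the taper argument stands in for the Fourier smoothing kernel of Assumption 2:

```python
import numpy as np

def density_estimate(xs, grid, m=3, taper=None):
    """Sketch of a Fourier-series density estimator on [0, 1): empirical
    coefficients b_k = mean_j exp(-2*pi*i*k*X_j), damped by smoothing
    weights lambda(k/m) and summed back into a density value."""
    ks = np.arange(-m, m + 1)
    b = np.array([np.mean(np.exp(-2j * np.pi * k * xs)) for k in ks])
    w = np.ones(ks.size) if taper is None else np.array([taper(k / m) for k in ks])
    vals = sum(wk * bk * np.exp(2j * np.pi * k * grid)
               for k, bk, wk in zip(ks, b, w))
    return np.real(vals)

rng = np.random.default_rng(2)
xs = rng.uniform(0.0, 1.0, 2000)          # uniform covariates: true density is 1
fhat = density_estimate(xs, np.linspace(0.0, 1.0, 101))
```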
We now propose to estimate the Fourier coefficients of the distorted regression function by
where the smoothing weights are defined in (2.7).
The results of Lemma 2 in Section 5 show that the consistency of the estimated Fourier coefficients is heavily dependent on the consistency of the covariate density estimator . This fact motivates our choice of smoothing parameters as
and requiring that the covariate density function has a smoothness index in Assumption 3, where is the smoothness index of the function class to which belongs, is the degree of ill-posedness of the inverse problem and is the dimension of the covariates. Our first result establishes the uniform consistency of the estimator in (2.5) and a further technical metric space inclusion property that is useful for working with residual-based empirical processes.
Let for some and let Assumption 1 hold for some degree of ill-posedness . Let Assumption 2 hold for a Fourier smoothing kernel that satisfies . Further let Assumption 3 hold for and assume that the errors have a finite absolute moment of order . Choose the smoothing parameter as in (2.9). Then
where is the unit ball of the metric space .
3. Goodness-of-fit testing the error distribution
Here the location parameter is the mean of the errors and is equal to zero, but the scale parameter is unknown. The null hypothesis is given by
where is a specified density function of the standardized error distribution and is the unknown scale parameter. To simplify notation we write for the density function of the standardized errors () and () for the corresponding distribution function. With this notation the null hypothesis in (3.1) becomes for some . Equivalently, we can write for some by writing () for the error distribution function specified by the null hypothesis.
Following Müller et al. (2012), who consider a similar problem in the direct case, we propose to use the standardized residuals
where is a consistent estimator of the scale parameter . A nonparametric estimator of is given by the empirical distribution function of these standardized residuals,
The null hypothesis is then rejected if a given metric between the estimated standardized distribution function and is large enough. A popular metric in the literature is the supremum metric, and this leads to the Kolmogorov-Smirnov test statistic:
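As a concrete sketch (for the normal case, with generic names since the paper's symbols were lost in extraction), the empirical distribution function of the standardized residuals and the resulting Kolmogorov-Smirnov statistic can be computed as follows:

```python
import numpy as np
from math import erf, sqrt

def ks_statistic(residuals):
    """sqrt(n)-scaled Kolmogorov-Smirnov distance between the empirical
    distribution of the standardized residuals and the standard normal CDF.
    (Illustrative helper; notation is ours.)"""
    Phi = lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0)))   # standard normal CDF
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    F = np.array([Phi(v) for v in r])
    d_plus = np.max(np.arange(1, n + 1) / n - F)        # ECDF above the null CDF
    d_minus = np.max(F - np.arange(0, n) / n)           # ECDF below the null CDF
    return sqrt(n) * max(d_plus, d_minus)
```

A shift or misspecification of the error distribution inflates this distance, which is the basis of the rejection rule discussed next.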
Critical values for the Kolmogorov-Smirnov test statistic are then determined from asymptotic theory, but these can be difficult to work with in practice because they depend on . To avoid this problem, we will work with a different test statistic.
Our proposed test statistic will crucially depend on the estimator satisfying an asymptotic expansion, which is given in the following result.
Let the assumptions of Theorem 1 hold, with and assume that the Fourier smoothing kernel is radially symmetric. Let have a finite absolute moment of order or larger and a bounded Lebesgue density that is (uniformly) Hölder continuous with exponent . Finally, the function is assumed to be uniformly continuous and bounded. Then under the null hypothesis (3.1)
A direct consequence of Theorem 2 is that, under the null hypothesis (3.1), the stochastic process weakly converges in the space to a Gaussian process, which is also the weak limit of the stochastic process
This limit distribution can be easily simulated. However, it is clearly not distribution free because it depends on and specified in the null hypothesis.
In order to obtain a test statistic whose critical values are independent from the distribution specified in the null hypothesis, we use a particular projection of the residual-based empirical process by viewing this quantity as an (approximate) semimartingale with respect to its natural filtration. The projection is given by the Doob-Meyer decomposition of this semimartingale (see page 1012 of Khmaladze and Koul, 2004). For this purpose we will assume that has finite Fisher information for location and scale, i.e.
writing for the derivative of the Lebesgue density .
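The displayed conditions were lost in extraction; for a location-scale family with standardized density f, finite Fisher information for location and scale is conventionally expressed as (this is the standard formulation and may differ in notation from the original display)

```latex
\int_{-\infty}^{\infty} \left(\frac{f'(x)}{f(x)}\right)^{2} f(x)\,dx < \infty,
\qquad
\int_{-\infty}^{\infty} \left(1 + x\,\frac{f'(x)}{f(x)}\right)^{2} f(x)\,dx < \infty .
```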
The Khmaladze transformation produces a standard limiting distribution: a standard Brownian motion on , and as a consequence we can construct test statistics which are asymptotically distribution free, i.e. the corresponding critical values do not depend on specified by the null hypothesis.
To be precise, note that
has mean zero and variance equal to one. In order to introduce our test statistic we define the augmented score function
and the incomplete information matrix
Following Khmaladze and Koul (2009) the transformed empirical process of standardized residuals is given by
for some . We can rewrite in a more computationally friendly form, i.e.
Under the null hypothesis (3.1) weakly converges in the space to , writing for the standard Brownian motion.
In general, the incomplete information matrix does not have a simple form, and degenerates as . To avoid this degeneracy issue we proceed as in Stute et al. (1998), who recommend using the quantile from the empirical distribution function for , i.e. writing for the sample quantile function associated with . We propose to base a goodness-of-fit test for the hypothesis (3.1) on the supremum metric between and the constant :
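To sketch how the transformed process and the supremum statistic can be computed in practice for the normal case, consider the following implementation. It uses the normal score vector h(x) = (1, x, x² - 1); the integration grid, the small ridge added before inverting the incomplete information matrix and the 97.5% truncation are our own illustrative choices:

```python
import numpy as np

def khmaladze_stat(res, lo=-6.0, hi=6.0, ngrid=1200, trunc=0.975):
    """Illustrative sketch of a martingale-transform (Khmaladze-type)
    statistic for the hypothesis of standard normal errors."""
    res = np.asarray(res, dtype=float)
    n = res.size
    s = np.linspace(lo, hi, ngrid)
    ds = s[1] - s[0]
    f = np.exp(-s**2 / 2.0) / np.sqrt(2.0 * np.pi)       # null density on the grid
    H = np.vstack([np.ones_like(s), s, s**2 - 1.0])      # score on the grid
    h_res = np.vstack([np.ones(n), res, res**2 - 1.0])   # score at the residuals

    # incomplete information Gamma(s) = int_s^inf h h^T f, via reverse Riemann sums
    outer = np.einsum('is,js->ijs', H, H) * f * ds
    Gamma = np.flip(np.cumsum(np.flip(outer, axis=2), axis=2), axis=2)

    t_max = np.quantile(res, trunc)      # truncate: Gamma degenerates in the tail
    integ = np.zeros(n)                  # running compensator, one entry per residual
    best = 0.0
    for k in range(ngrid):
        if s[k] > t_max:
            break
        coef = np.linalg.solve(Gamma[:, :, k] + 1e-9 * np.eye(3), H[:, k])
        integ += f[k] * ds * (coef @ h_res) * (res >= s[k])
        W = np.sum((res <= s[k]) - integ) / np.sqrt(n)   # transformed process at s[k]
        best = max(best, abs(W))
    return best
```

The compensator subtracted inside the loop is the "computationally friendly form" of the transform: for each residual it integrates the score-projection term up to the current threshold, restricted to residuals not yet passed.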
The test statistic has an asymptotic distribution given by under the null hypothesis (3.1).
Our proposed goodness-of-fit test for the null hypothesis (3.1) is then defined by
Reject when , (3.5)
where is the upper -quantile of the distribution of . The value of may be obtained from formula (7) on page 34 of Shorack and Wellner (1986), i.e.
For a -level test, and is approximately .
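The critical value can be reproduced numerically from the classical alternating series for the distribution of the supremum of the absolute value of a standard Brownian motion on [0, 1], which is the series behind the Shorack and Wellner formula cited above:

```python
from math import exp, pi

def sup_abs_bm_cdf(x, terms=100):
    """P(sup_{0<=t<=1} |B(t)| <= x) for a standard Brownian motion B,
    computed from the classical alternating series."""
    if x <= 0.0:
        return 0.0
    return (4.0 / pi) * sum(
        (-1)**k / (2*k + 1) * exp(-(2*k + 1)**2 * pi**2 / (8.0 * x**2))
        for k in range(terms))

# upper 5%-quantile by bisection (roughly 2.24)
lo, hi = 0.1, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if sup_abs_bm_cdf(mid) < 0.95:
        lo = mid
    else:
        hi = mid
q95 = 0.5 * (lo + hi)
```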
4. Finite sample properties
We conclude the article with a numerical study of the previous results with two examples and an application of the proposed test. Throughout this section we consider a goodness-of-fit test for normally distributed errors in the indirect regression model (1.1), i.e.
Note that in this case a straightforward calculation shows that the augmented score function and the incomplete information matrix from (3.3) become particularly simple, that is and
writing and for the respective distribution and density functions of the standard normal distribution.
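Since the displayed formulas were stripped, we record the standard closed forms they presumably amount to: for the standard normal density one has φ'(x)/φ(x) = -x, so the augmented score and the entries of the incomplete information matrix reduce to expressions such as

```latex
h(x) = \bigl(1,\; x,\; x^{2}-1\bigr)^{\!\top},
\qquad
\int_{t}^{\infty} x\,\varphi(x)\,dx = \varphi(t),
\qquad
\int_{t}^{\infty} x^{2}\,\varphi(x)\,dx = t\,\varphi(t) + 1 - \Phi(t).
```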
4.1. Simulation study
In the first example we generate independent bivariate covariates with independent and identically distributed components and () as follows. The common distribution of and is characterized by the density function (), which is depicted in the left panel of Figure 1, where
One can easily verify that
is a probability density function and satisfies the requirements of Assumption 3 for any . The random sample of covariates is then generated from the distribution characterized by the non-trivial density function
using a standard probability integral transform approach. In the second example we use independently, uniformly distributed covariates in the unit square.
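The probability integral transform used here can be sketched generically as follows; the target density 2x on [0, 1] (with CDF x² and quantile function √u) is our illustrative example, not the density from the simulation study:

```python
import numpy as np

def sample_via_pit(quantile_fn, n, rng):
    """Probability integral transform: if U ~ Uniform(0, 1), then
    quantile_fn(U) follows the distribution with that quantile function."""
    return quantile_fn(rng.uniform(0.0, 1.0, n))

# illustrative target on [0, 1]: density 2x, CDF x**2, quantile function sqrt
rng = np.random.default_rng(3)
sample = sample_via_pit(np.sqrt, 5000, rng)   # mean of the target is 2/3
```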
The distortion function is taken as the product of two (normalized) Laplace density functions restricted to the interval , each with mean and scale . For greater transparency, the Fourier coefficients of the distortion function are
This choice indeed satisfies Assumption 1 with . When nonparametric smoothing is performed we work with the radially symmetric spectral cut-off kernel characterized by the Fourier coefficient function , , with smoothing parameter chosen by minimizing the leave-one-out cross-validated estimate of the mean squared prediction error (see, for example, Härdle and Marron, 1985). This choice is practical, simple to implement and performed well in our study.
The indirect regression function is given by
for . This is easily seen to belong to for any . Following the previous discussion, the distorted regression belongs to for any . In the middle and right panels of Figure 1 we display the indirect regression function and the distorted regression function .
We considered four scenarios: normally distributed errors with standard deviation ; Laplace distributed errors with scale parameter ; centered, skew-normal errors with scale parameter and skew parameter (standard deviation is ); Student’s distributed errors with degrees of freedom (standard deviation is ). The first scenario allows us to check the level of the proposed test statistic , and the other three scenarios allow for observing the simulated powers of the proposed test. Here we work with a -level test, and the quantile is then .
We perform simulation runs of samples of sizes , , and . Table 1 displays the results for the first example (when the covariates have the non-trivial distribution characterized by the density function ) and Table 2 displays the results for the second example (when the covariates are independently, uniformly distributed in the unit square ). Beginning with the first example, at the sample size the test rejected the null hypothesis in of the cases (near the desired ) but at the sample sizes and the test respectively rejected the null hypothesis in and in of the cases, which are both above the desired nominal level. However, at the sample size the test rejected the null hypothesis in of the cases, which is (again) near the desired nominal level of . We expect that this behavior is due to the data-driven smoothing parameter selection. Interestingly, in the second example the test is slightly conservative at all of the simulated sample sizes (e.g. rejecting of the cases at sample size ), but with sample size the test rejected the null hypothesis in of the cases (near the nominal level of ), which coincides with the first example.
Turning our attention now to the power of the test, in the first example, we can see that the test performs well for moderate and larger sample sizes. At the sample size the test respectively rejected the alternative error distributions Laplace, skew-normal and Student's in only , and of the cases, but at the sample size the test respectively rejected the alternative distributions in , and of the cases. In the second example, we can see that the power of the test dramatically improves with smaller sample sizes (rejecting the alternative distributions in , and of the cases at sample size ) with less improvement at larger sample sizes (rejecting the alternative distributions in , and of the cases at the sample size ). In conclusion, it appears that the proposed test statistic is an effective tool for testing the goodness-of-fit of a desired error distribution in indirect regression models.
4.2. An application to image reconstruction
Here we illustrate an application of the previous results using the HeLa dataset investigated in Bissantz et al. (2009) and more recently by Bissantz et al. (2016). These data compose an image of living HeLa cells obtained using a standard confocal laser scanning microscope and consist of intensity measurements (numbered values ) on pixels, giving a total of observations; see Figure 2. As noted on page
of Bissantz et al. (2009), these image data are (approximately) Poisson distributed. We therefore apply the Anscombe transformation to obtain approximately normally distributed data, and then apply the test (3.5) to check the assumption of normally distributed errors (at the level) from a reconstruction of this image using the previously studied results. We use the computing language R with the package OpenImageR, which allows for reading the image data and conducting our analysis.
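The Anscombe transformation itself is the simple variance-stabilizing map y → 2√(y + 3/8); a minimal sketch (the Poisson mean 30 below is an illustrative choice, and the analysis in the paper is carried out in R rather than Python):

```python
import numpy as np

def anscombe(counts):
    """Anscombe variance-stabilizing transform for (approximately) Poisson
    data: y -> 2*sqrt(y + 3/8), whose variance is close to 1 when the
    Poisson mean is not too small."""
    return 2.0 * np.sqrt(np.asarray(counts, dtype=float) + 3.0 / 8.0)

# illustrative check on simulated Poisson counts
rng = np.random.default_rng(4)
z = anscombe(rng.poisson(lam=30.0, size=20000))   # std of z should be near 1
```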
Since the total number of observations is quite large, we rather illustrate the test for normal errors using two smaller sections of the original HeLa image. To display the reconstructions of the smaller images (for visual comparison with the original data) we apply the inverse of the Anscombe transformation to the fitted values of each regression. In both examples, the pixels are mapped to midpoints of appropriate grids of the unit square . The first image we consider is pixels composing observations and is displayed in Figure 3 alongside its reconstructed version and a normal QQ-plot of the resulting standardized regression residuals (see Section 3). The second image we consider is pixels composing observations and is displayed in Figure 4 alongside its reconstructed version and a normal QQ-plot of the resulting standardized regression residuals. In both cases, as in Section 4.1, when nonparametric smoothing is applied the smoothing parameter is chosen by minimizing the leave-one-out cross-validated estimate of the mean squared prediction error.
Beginning with the first and smaller image, the martingale transform test statistic that assesses the goodness-of-fit of a normal distribution has value , which is smaller than , and the null hypothesis of normally distributed errors is not rejected. Inspecting the QQ-plot of these standardized residuals it appears that the assumption of normally distributed errors is appropriate, which confirms our previous finding. In this case, we can see the reconstruction very closely mirrors the original.
Turning now to the second and larger image, the value of the test statistic is , which is much larger than , and we reject the null hypothesis of normally distributed errors. The QQ-plot of the standardized residuals now appears to contain systematic deviation from normality, which confirms that the hypothesis of the normally distributed errors is inappropriate. Here we can see the reconstruction is now not as accurate as it was for the previous case. In conclusion, we can see the approach of using the proposed test statistic for assessing convenient forms of the error distribution is useful.
Acknowledgements This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Project C1) of the German Research Foundation (DFG) and in part by the Bundesministerium für Bildung und Forschung through the project “MED4D: Dynamic medical imaging: Modeling and analysis of medical data for improved diagnosis, supervision and drug development”. The authors would also like to thank Kathrin Bissantz for providing us the HeLa cells image used in our data example.
5. Technical details and proofs
In this section we give the technical details supporting our results. We have the following uniform convergence property for the density estimator .
(and note that whenever ) to see that
Using the representation and the fact that are the Fourier coefficients of the function we obtain
One calculates directly that
In addition, is bounded and therefore
To continue, let be a sequence of positive real numbers satisfying and partition into parts with associated centers () such that . The assertion (5.1) follows from the arguments above and by additionally showing that and , almost surely.
Combining (5.2) and (5.3) with Bernstein’s inequality (see, for example, Section 2.2.2 of van der Vaart and Wellner, 1996), one chooses a large enough positive constant (through the choice of the quantity ) such that
is summable in . Since , this occurs when and we have
We will now demonstrate that , almost surely. Let be arbitrary and write
where is a generic random variable with distribution characterized by the density function. The complex exponential functions are bounded in absolute value by , and it is easy to verify that . As above, use Bernstein's inequality, choosing a large enough positive constant (through the choice of the quantity ) to find that
is summable in . This occurs when , independent of . It follows that , almost surely.
Further, let be arbitrary. For any it follows that