Computed tomography (CT) is a noninvasive imaging technique, which is a key method for medical diagnoses. CT is based on measuring the intensity losses of X-rays sent through a body. From these measurements an attenuation profile can be recovered that provides an image of the body’s (unobservable) interior. The X-rays are linear and so the scanner rotates to create a two-dimensional slice. Insight into three-dimensional structures is obtained by considering multiple slices. Our investigation is limited to a statistical analysis of data gathered from a single slice. For this purpose we introduce the inverse regression model
are independent and identically distributed random variables with. Here is a given index set, with each index corresponding to an X-ray path and the design point characterizing this path with associated response . Consequently, can be written using coordinates as the distance from the origin and as the angle of inclination. The body’s (true) attenuation profile along the slice is represented by , a function supported on the unit disc. is a linear operator acting on and denotes the normalized Radon transform, i.e. for and ,
Details on the underlying physics and applications of CT can be found in Buzug (2008).
Image reconstruction in CT is a particular case of the broad class of linear inverse problems. An overview of the mathematical aspects of these problems and methods to solving them can be found in the monographs of Natterer (1986), Engl et al. (1996) and Helgason (2011). Other examples of linear inverse problems are the heat equation and convolution transforms (see Mair and Ruymgaart (1996), Saitoh (1997), and Cavalier (2008), among others). Additional statistical inverse problems include errors-in-variables models and the Berkson error model (see, for example, Bissantz et al. (2007), Carroll et al. (2007), Koul and Song (2008, 2009), Bertero et al. (2009), Kaipio and Somersalo (2010), Delaigle et al. (2014), and Kato and Sasaki (2017)). The Radon transform is usually discussed in the contexts of positron emission tomograpy (PET) and CT in medical imaging. In the case of PET, lines-of-sight are observed along which emissions have occurred. However, the positions of the emissions on these lines are unknown. Here the aim is to reconstruct the emission density (see Johnstone and Silverman (1990), Korostelev and Tsybakov (1993), and Cavalier (2000), among others). On the other hand, CT leads to the inverse regression (1.1) (see, for example, Cavalier (1999) and Kerkyacharian et al. (2010); Kerkyacharian et al. (2012)).
We contribute to this discussion by deriving the rate of uniform, strong consistency for a nonparametric estimator of the unknown function based on the popular spectral cutoff method. Further, we derive a functional central limit theorem for the empirical process of the resulting model residuals , i.e. we investigate the estimator
where the nonnegative weights sum to (see Section 3). Statistical applications of results of this type include validation of model assumptions. In the context of inverse regression models, to the best of our knowledge only one result is available: Bissantz et al. (2018), who study an inverse regression model characterized by a convolution transformation.
In direct regression problems, residual-based empirical processes arising from non- and semiparametric regression estimators have been considered by numerous authors (see Akritas and van Keilegom (2001), Neumeyer (2009), Müller et al. (2012), Colling and Van Keilegom (2016), and Zhang et al. (2018), among others). Dette et al. (2007)
consider tests for a parametric form of the variance function in a heteroscedastic nonparametric regression by comparing the empirical distribution function of standardized residuals calculated under a null model to that of an alternative model.Neumeyer and Keilegom (2010) work with a similar approach as the previous authors to propose tests for verifying convenient forms of the regression function. Khmaladze and Koul (2009)
introduce a popular distribution free approach to addressing goodness-of-fit problems for the errors from a nonparametric regression, where these authors introduce a transformation of the empirical distribution function of residuals that is useful for forming test statistics with convenient limit distributions. All of these approaches to validating model assumptions crucially rely on a technical asymptotic linearity property of the residual-based empirical distribution function. We show the estimator (1.3) shares this property as well, and the results of this article can be used immediately in approaches to validating model assumptions in the inverse regression model (1.1) that are in the same spirit as the previously mentioned works.
2. Estimation in the indirect regression model
In this section we give more details regarding the Radon transform model (1.1) and introduce an estimator of the function .
2.1. The Radon transform
Following Johnstone and Silverman (1990) let
denote the unit disc, which is the two dimensional domain of the investigated attenuation profile and is called brain space
for historical reasons. It is equipped with the uniform distribution, given in polar coordinates by
This means that no prior emphasis on any region of the scanned area is given. The detector space is defined as
with corresponding probability measure
The domain of the transformed image is , a parametrization of all lines (X-ray paths) crossing the unit disc. It is usally referred to as detector space. is a probability measure on adapted to the length of the line segments inside the disc. For analytic simplicity we allow the angles in and to be exactly and . This is possible since the below required smoothness of and entail periodicity with respect to the angular coordinates.
The Radon transform in (1.2) defines a linear operator from to . Identifying corresponding equivalence classes it can be shown that
is one-to-one, compact and permits a singular value decomposition (SVD). The SVD ofis vital for our subsequent investigations. To state it efficiently we introduce some definitions borrowed from Johnstone and Silverman (1990) and Born and Wolf (1970). Let
be and index set and define for the function
is the so called radial polynomial. Finally for we define
where denotes the ths Chebyshev polynomial of the second kind. For convenience of notation we also define and for . Both collections of functions,
form orthonormal bases of the spaces and respectively. With these notations the SVD of for some is given by
In the literature the functions are commonly referred to as Zernike polynomials, which play an important role in the analysis of optical systems, for instance in the modelling of refraction errors, c.f. Zernike (1934) and more recently Lakshminarayanan and Fleck (2011). We refer to Deans (1983) for more details on the cited SVD of the normalized Radon transform. Due to injectivity of the operator we can immediately access its inverse pointwise defined for some , as
The identities (2.7), (2.8) as well as -expansions in the respective spaces apply a priori almost everywhere. However if is sufficiently smooth they even hold uniformly. In order to specify the required regularity we define
the smoothness class. We assume throughout this paper that the regression function in model (1.1) is an element of (for some ). Controlling smoothness and thereby the complexity of the class of regression functions by related conditions is common in inverse problems. This is owed to their natural correspondence to singular value decompositions of operators and their suitability to prove minimax optimal rates (see for example Mair and Ruymgaart (1996), Cavalier and Tsybakov (2002), Bissantz and Holzmann (2013) or Blanchard and Mücke (2018)).
Suppose that with , then the following four identities hold everywhere:
Moreover the functions and are times continuously differentiable.
The equality of and its -expansion is vital when proving uniform bounds on the distance between and . In one dimensional convolution type problems this is usually dealt with by the Dirichlet conditions that directly apply to classes of smooth functions (see Nawab et al. (1996) pp. 197-198). It should also be noted that the series condition on the function in (2.9) implies regularity properties beyond mere smoothness. For instance, if it also entails periodicity of and its continuous derivatives in the angular component up to the order . This property follows by periodicity of the basis functions in the angle and is an analogue to periodicity of convergent Fourier series on bounded intervals. Notice that it fits naturally to the scanning regime, since any function transformed from Cartesian into spherical coordinates will comply to periodicity with respect to the angle.
As common in computed tomography we will assume a parallel scanning procedure, corresponding to a grid of design points on the detector space. Adopting our results to fan beam geometry, which underlies most modern scanners, is then mathematically simple.
We thus define a grid on the detector space , where for given each of the constituting rectangles has side length in -direction and in -direction. More formally, we define an index set
and decompose the detector space in rectangular boxes of the form
where . The design points are then defined as follows. The second coordinate of is given by
and the first coordinate is determined as the solution of the equation
Throughout this paper we consider the inverse regression model (1.1) with these design points. The non-uniform design in radial direction defined by (2.14) is motivated by a midpoint rule to numerically integrate over each box, with respect to the measure in (2.4). For asymptotic considerations, we assume that and that depends on as follows:
There exist constants , , such that for all .
Denoting the number of rows and columns in the grid of design points by and respectively is common in the literature and numerical programming. Notice that our Assumption 2.2 leaves room for the resolution optimal choice (see Natterer and Wübbelling (2001), p. 74). Sometimes we will use the notation , actually meaning that according to Assumption 2.2 and thereby and diverge. Note also that the index set depends on the sample size in model (1.1). Thus formally we consider a triangular array of independent, identically distributed and centred random variables , but we do not reflect this dependence on in our notation.
2.3. The spectral cutoff estimator
is an estimator of the inner product
and denotes the Lebesgue measure of the cell . Comparing (2.13) to our estimator in (2.15), we observe that the inner products have been replaced by the estimates (2.16). Furthermore the series has been truncated at , which represents the application of a regularized inverse. In the literature it is common to refer to either or as bandwidth
, since it is used to balance between bias and variance like the bandwidth in kernel density estimation (seeCavalier (2008)).
The choice of a bandwidth is a non-trivial problem. An optimal bandwidth with respect to some criterion such as the integrated mean squared error will depend on the unknown regression function . Several data driven selection criteria for the choice of have been proposed and examined in the literature. We refer to the monograph of Vogel (2002), where multiple techniques are gathered. More closely related to our case is the risk hull method by Cavalier and Golubev (2006)
in the white noise model and the smooth bootstrap examined byBissantz et al. (2018) in a different context.
It should be noticed that in practice a smooth dampening of high frequencies usually shows a better performance than the strict spectral cutoff. We can accommodate this by introducing a smooth version of the estimator in (2.15). For this purpose let denote a function with compact support and define
as an alternative estimator of . Note that the estimate in (2.15) is obtained for . All results presented in this paper remain valid for the estimator (2.18). However, for sake of brevity and a transparent presentation the subsequent discussion is restricted to the spectral cutoff estimator in (2.15).
3. The empirical process of residuals
In this section we investigate the asymptotic properties of the empirical residual process
where denotes the residual distribution function and
the th residual obtained from the estimate . The weights are defined in Section 2.3. We begin by showing a uniform convergence result for . For this purpose we derive uniform approximation rates for bias and variance and subsequently balance these two, to get optimal results. The proofs of the following results are complicated and therefore deferred to the Appendix.
Next we derive a uniform bound for the random error of the estimator . (1.1).
Balancing the two upper bounds for the deterministic and random part yields an optimal choice of the bandwidth. More precisely for the choice
By the same techniques uniform bounds can be deduced for the derivatives of our estimators.
Let the assumptions of Theorem 3.3 hold, let be of order and suppose for some . Additionally let , such that . Then
In order to prove the weak convergence of the process we consider the bracketing metric entropy of the subclass
entails that a function is smooth to a degree determined by . This implies that a finite-dimensional representation can be used as an adequate approximation of , in our case a truncated -expansion. We employ these considerations to derive the following result about the complexity of the class , which is of independent interest and is proven in Appendix B (see section B.4).
Let , then for any and sufficiently small
denotes the minimal number of -brackets with respect to needed to cover the smoothness class .
For the next step recall the definition of the estimated residuals in (3.1), as well as the estimate for the residual distribution function in (1.3). In order to prove a uniform CLT for we disentangle the dependencies of the terms in in the next result.
Under the assumptions of Theorem 3.6, the process
converges weakly to a mean zero Gaussian process with covariance function
Acknowledgements This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Project C1) of the German Research Foundation (DFG) and the Bundesministerium für Bildung und Forschung through the project “MED4D: Dynamic medical imaging: Modeling and analysis of medical data for improved diagnosis, supervision and drug development”.
Appendix A Proofs and technical details
Throughout our calculations will denote a positive constant, which may differ from line to line. The dependence of on other parameters will be highlighted in the specific context.
a.1. Proof of Lemma 3.1
We are now in a position to derive the decay rate of the bias postulated in Lemma 3.1. The decay rate naturally splits up into two parts. One accounts for the average approximation error of Radon coefficients with index smaller than and the other for the error due to frequency limitation of the estimator.
where the terms and are given by
For the term it follows that
where we have used that Proposition B.2 in Appendix B implies the estimate
in the first and the approximation result from Lemma A.1 in the second inequality. Similarly we have
In the last step we have used that complies to the smoothness condition of (see (2.9)) and thus the series converges. ∎
a.2. Proof of Lemma 3.2
We proceed deriving an upper bound for the maximum. For this purpose we introduce a truncation parameter and define the truncated error
We will now show that all of the errors with eventually equal their truncated versions almost surely. Via Markov’s inequality we conclude that
and therewith it follows that
Recalling that and that there exists some such that by Assumption 2.2, we derive
Summability is entailed by . The Borel-Cantelli Lemma implies that almost surely eventually all measurement errors and their truncated versions are equal. Thus we can confine ourselves to the maximum
where and are defined by
Using the inequality
which is a consequence of Proposition B.2, it follows that
wherewe exploit the decay rate in the last estimate . For the proof of this fact we recall the notation (A.3) and note that the condition implies
For the term we note that for a fixed constant
Due to truncation is bounded by and its variance by . Furthermore the weights are uniformly of order , since
Consequently the Bernstein inequality yields for the right side of (A.2) the upper bound
which is summable for sufficiently large . The Borel-Cantelli Lemma therefore implies that