# The empirical process of residuals from an inverse regression

In this paper we investigate an indirect regression model characterized by the Radon transform. This model is useful for the recovery of medical images obtained by computed tomography scans. The indirect regression function is estimated using a series estimator motivated by a spectral cut-off technique. Further, we investigate the empirical process of residuals from this regression and show that it satisfies a functional central limit theorem.


## 1. Introduction

Computed tomography (CT) is a noninvasive imaging technique and a key method for medical diagnoses. CT is based on measuring the intensity losses of X-rays sent through a body. From these measurements an attenuation profile can be recovered that provides an image of the body’s (unobservable) interior. The X-ray paths are straight lines, and the scanner rotates to create a two-dimensional slice. Insight into three-dimensional structures is obtained by considering multiple slices. Our investigation is limited to a statistical analysis of data gathered from a single slice. For this purpose we introduce the inverse regression model

 (1.1)  Y_k = Rg(z_k) + \varepsilon_k, \quad k \in K,

where the errors \varepsilon_k are independent and identically distributed random variables with E[\varepsilon_k] = 0. Here K is a given index set, with each index k corresponding to an X-ray path, the design point z_k characterizing this path and Y_k the associated response. Consequently, z_k can be written using coordinates (s, \phi), with s the distance of the path from the origin and \phi its angle of inclination. The body’s (true) attenuation profile along the slice is represented by g, a function supported on the unit disc. R is a linear operator acting on g and denotes the normalized Radon transform, i.e. for s \in [0,1] and \phi \in [0, 2\pi],

 (1.2)  Rg(s,\phi) := \tfrac{1}{2}(1-s^2)^{-1/2} \int_{-\sqrt{1-s^2}}^{\sqrt{1-s^2}} g\big(s\cos(\phi) - t\sin(\phi),\ s\sin(\phi) + t\cos(\phi)\big)\, dt.

Details on the underlying physics and applications of CT can be found in Buzug (2008).
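As a quick numerical illustration (our own sketch, not part of the original analysis), the normalized Radon transform (1.2) is simply the average of g along the chord of the unit disc at signed distance s and angle phi. The sketch below approximates it with a plain midpoint rule; the function name `radon` and the node count `nodes` are hypothetical choices:

```python
import math

def radon(g, s, phi, nodes=2000):
    # Normalized Radon transform (1.2): the prefactor (1/2)(1-s^2)^{-1/2}
    # turns the line integral over the chord into an average of g along it.
    half = math.sqrt(max(0.0, 1.0 - s * s))  # half-length of the chord
    if half == 0.0:
        return 0.0  # convention for the tangent line s = 1
    h = 2.0 * half / nodes
    total = 0.0
    for j in range(nodes):
        t = -half + (j + 0.5) * h  # midpoint-rule node on [-half, half]
        x = s * math.cos(phi) - t * math.sin(phi)
        y = s * math.sin(phi) + t * math.cos(phi)
        total += g(x, y)
    return 0.5 * total * h / half
```

Since Rg averages g along the chord, the transform of the constant function g ≡ 1 equals 1 for every line, which is a convenient sanity check of the normalization.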

Image reconstruction in CT is a particular case of the broad class of linear inverse problems. An overview of the mathematical aspects of these problems and methods for solving them can be found in the monographs of Natterer (1986), Engl et al. (1996) and Helgason (2011). Other examples of linear inverse problems are the heat equation and convolution transforms (see Mair and Ruymgaart (1996), Saitoh (1997), and Cavalier (2008), among others). Additional statistical inverse problems include errors-in-variables models and the Berkson error model (see, for example, Bissantz et al. (2007), Carroll et al. (2007), Koul and Song (2008, 2009), Bertero et al. (2009), Kaipio and Somersalo (2010), Delaigle et al. (2014), and Kato and Sasaki (2017)). The Radon transform is usually discussed in the contexts of positron emission tomography (PET) and CT in medical imaging. In the case of PET, lines-of-sight are observed along which emissions have occurred. However, the positions of the emissions on these lines are unknown. Here the aim is to reconstruct the emission density (see Johnstone and Silverman (1990), Korostelev and Tsybakov (1993), and Cavalier (2000), among others). On the other hand, CT leads to the inverse regression (1.1) (see, for example, Cavalier (1999) and Kerkyacharian et al. (2010); Kerkyacharian et al. (2012)).

We contribute to this discussion by deriving the rate of uniform, strong consistency for a nonparametric estimator of the unknown function g based on the popular spectral cutoff method. Further, we derive a functional central limit theorem for the empirical process of the resulting model residuals \hat\varepsilon_k, i.e. we investigate the estimator

 (1.3)  \hat F_n(t) = \sum_{k \in K} w_k\, 1\{\hat\varepsilon_k \le t\}, \quad t \in \mathbb{R},

where the nonnegative weights w_k sum to one (see Section 3). Statistical applications of results of this type include validation of model assumptions. In the context of inverse regression models, to the best of our knowledge only one result is available: Bissantz et al. (2018), who study an inverse regression model characterized by a convolution transformation.

In direct regression problems, residual-based empirical processes arising from non- and semiparametric regression estimators have been considered by numerous authors (see Akritas and van Keilegom (2001), Neumeyer (2009), Müller et al. (2012), Colling and Van Keilegom (2016), and Zhang et al. (2018), among others). Dette et al. (2007) consider tests for a parametric form of the variance function in a heteroscedastic nonparametric regression by comparing the empirical distribution function of standardized residuals calculated under a null model to that of an alternative model. Neumeyer and Van Keilegom (2010) use a similar approach to propose tests for verifying convenient forms of the regression function. Khmaladze and Koul (2009) introduce a popular distribution-free approach to goodness-of-fit problems for the errors from a nonparametric regression: they propose a transformation of the empirical distribution function of residuals that is useful for forming test statistics with convenient limit distributions. All of these approaches to validating model assumptions crucially rely on a technical asymptotic linearity property of the residual-based empirical distribution function. We show that the estimator (1.3) shares this property as well, and the results of this article can be used immediately in approaches to validating model assumptions in the inverse regression model (1.1) that are in the same spirit as the previously mentioned works.

We have organized the remaining parts of the paper as follows. Model (1.1) is further discussed and we introduce the estimator in Section 2. Our main results are given in Section 3. All of the proofs of our results and additional supporting technical details may be found in the appendices.

## 2. Estimation in the indirect regression model

In this section we give more details regarding the Radon transform model (1.1) and introduce an estimator of the function g.

Following Johnstone and Silverman (1990) let

 (2.1)  B := \{(r,\theta) : 0 \le r \le 1,\ 0 \le \theta \le 2\pi\}

denote the unit disc, which is the two-dimensional domain of the investigated attenuation profile and is called brain space for historical reasons. It is equipped with the uniform distribution, given in polar coordinates by

 (2.2)  d\mu(r,\theta) := \pi^{-1}\, r\, dr\, d\theta.

This means that no prior emphasis is given to any region of the scanned area. The domain of the transformed image is

 (2.3)  D := \{(s,\phi) : 0 \le s \le 1,\ 0 \le \phi \le 2\pi\},

a parametrization of all lines (X-ray paths) crossing the unit disc, which is usually referred to as detector space. It is equipped with the probability measure

 (2.4)  d\lambda(s,\phi) := 2\pi^{-2} \sqrt{1-s^2}\, ds\, d\phi,

which is adapted to the length of the line segments inside the disc. For analytic simplicity we allow the angles in B and D to be exactly 0 and 2\pi. This is possible since the smoothness of g and Rg required below entails periodicity with respect to the angular coordinates.

The Radon transform in (1.2) defines a linear operator from L^2(B,\mu) to L^2(D,\lambda). Identifying corresponding equivalence classes, it can be shown that R is one-to-one, compact and permits a singular value decomposition (SVD). The SVD of R is vital for our subsequent investigations. To state it efficiently we introduce some definitions borrowed from Johnstone and Silverman (1990) and Born and Wolf (1970). Let

 N := \{(l,m) : m \in \mathbb{N}_0,\ l = m, m-2, \ldots, -m\}

be an index set and define for (l,m) \in N the function

 (2.5)  \varphi_{(l,m)}(r,\theta) := \sqrt{m+1}\, R^{|l|}_m(r)\, \exp(il\theta),

where

 R^{|l|}_m(r) := \sum_{j=0}^{\frac{1}{2}(m-|l|)} \frac{(-1)^j\, (m-j)!}{j!\, \big(\frac{m+|l|}{2}-j\big)!\, \big(\frac{m-|l|}{2}-j\big)!}\, r^{m-2j}

is the so-called radial polynomial. Finally, for (l,m) \in N we define

 (2.6)  \psi_{(l,m)}(s,\phi) := U_m(s)\, \exp(il\phi),

where U_m denotes the m-th Chebyshev polynomial of the second kind. For convenience of notation we also set \varphi_{(l,m)} := 0 and \psi_{(l,m)} := 0 for (l,m) \notin N. Both collections of functions,

 {φ(l,m):(l,m)∈N}  and  {ψ(l,m):(l,m)∈N}

form orthonormal bases of the spaces L^2(B,\mu) and L^2(D,\lambda), respectively. With these notations the SVD of R applied to some g \in L^2(B,\mu) is given by

 (2.7)  Rg(s,\phi) = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \frac{1}{\sqrt{m+1}}\, \psi_{(l,m)}(s,\phi)\, \langle g, \varphi_{(l,m)} \rangle_{L^2(B,\mu)}.

In the literature the functions \varphi_{(l,m)} are commonly referred to as Zernike polynomials, which play an important role in the analysis of optical systems, for instance in the modelling of refraction errors, cf. Zernike (1934) and more recently Lakshminarayanan and Fleck (2011). We refer to Deans (1983) for more details on the cited SVD of the normalized Radon transform. Due to the injectivity of the operator R we can immediately access its inverse, pointwise defined for (r,\theta) \in B as
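To make the basis concrete, here is a short sketch (our own illustration, not code from the paper) evaluating the Zernike radial polynomial defined above; `radial` is a hypothetical helper name:

```python
from math import factorial

def radial(m, l, r):
    # Zernike radial polynomial R^{|l|}_m(r) as defined above; the sum runs
    # over j = 0, ..., (m - |l|)/2 and requires m - |l| to be even.
    l = abs(l)
    if (m - l) % 2 != 0:
        raise ValueError("m - |l| must be even for (l, m) in N")
    return sum(
        (-1) ** j
        * factorial(m - j)
        / (factorial(j) * factorial((m + l) // 2 - j) * factorial((m - l) // 2 - j))
        * r ** (m - 2 * j)
        for j in range((m - l) // 2 + 1)
    )
```

For example, the formula reproduces the familiar low-order cases R^0_0 = 1, R^1_1(r) = r and R^0_2(r) = 2r^2 - 1.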

 (2.8)  g(r,\theta) = R^{-1}[Rg](r,\theta) = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)}(r,\theta)\, \langle Rg, \psi_{(l,m)} \rangle_{L^2(D,\lambda)}.

The identities (2.7) and (2.8), as well as the L^2-expansions in the respective spaces, apply a priori only almost everywhere. However, if g is sufficiently smooth they even hold uniformly. In order to specify the required regularity we define

 (2.9)  O(v) := \Big\{ g \in L^2(B,\mu) \,\Big|\, g \text{ continuous},\ \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \big| \langle Rg, \psi_{(l,m)} \rangle_{L^2(D,\lambda)} \big|\, (m+1)^v < \infty \Big\},

the smoothness class. We assume throughout this paper that the regression function g in model (1.1) is an element of O(v) (for some v > 0). Controlling smoothness, and thereby the complexity of the class of regression functions, by related conditions is common in inverse problems. This is owed to their natural correspondence to singular value decompositions of operators and their suitability for proving minimax optimal rates (see for example Mair and Ruymgaart (1996), Cavalier and Tsybakov (2002), Bissantz and Holzmann (2013) or Blanchard and Mücke (2018)).

###### Proposition 2.1.

Suppose that g \in O(v) with v > 0; then the following four identities hold everywhere:

 (2.10)  g = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \varphi_{(l,m)}\, \langle g, \varphi_{(l,m)} \rangle_{L^2(B,\mu)}

 (2.11)  Rg = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \psi_{(l,m)}\, \langle Rg, \psi_{(l,m)} \rangle_{L^2(D,\lambda)}

 (2.12)  Rg = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \frac{1}{\sqrt{m+1}}\, \psi_{(l,m)}\, \langle g, \varphi_{(l,m)} \rangle_{L^2(B,\mu)}

 (2.13)  g = R^{-1}[Rg] = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)}\, \langle Rg, \psi_{(l,m)} \rangle_{L^2(D,\lambda)}.

Moreover, the functions g and Rg are continuously differentiable up to an order determined by v.

The equality of g and its L^2-expansion is vital when proving uniform bounds on the distance between \hat g and g. In one-dimensional convolution-type problems this is usually dealt with via the Dirichlet conditions, which directly apply to classes of smooth functions (see Nawab et al. (1996), pp. 197-198). It should also be noted that the series condition on the function g in (2.9) implies regularity properties beyond mere smoothness. For instance, it also entails periodicity of g and of its continuous derivatives in the angular component up to an order determined by v. This property follows from the periodicity of the basis functions in the angle and is an analogue of the periodicity of convergent Fourier series on bounded intervals. Notice that it fits naturally to the scanning regime, since any function transformed from Cartesian into spherical coordinates is periodic with respect to the angle.

### 2.2. Design

As is common in computed tomography we will assume a parallel scanning procedure, corresponding to a grid of design points on the detector space. Adapting our results to the fan beam geometry, which underlies most modern scanners, is then mathematically simple.

We thus define a grid on the detector space D, where for given p, q \in \mathbb{N} each of the pq constituting rectangles has side length 1/q in s-direction and 2\pi/p in \phi-direction. More formally, we define an index set

 K:={(k1,k2):0≤k1≤q−1,0≤k2≤p−1}

and decompose the detector space into rectangular boxes of the form

 B_k := [k_1/q, (k_1+1)/q] \times [2\pi k_2/p, 2\pi(k_2+1)/p],

where k = (k_1, k_2) \in K. The design points z_k = (z^1_{k_1}, z^2_{k_2}) are then defined as follows. The second coordinate of z_k is given by

 z^2_{k_2} := \frac{2\pi\big(k_2 + \tfrac{1}{2}\big)}{p}

and the first coordinate is determined as the solution of the equation

 (2.14)  \int_{k_1/q}^{(k_1+1)/q} \big(s - z^1_{k_1}\big) \sqrt{1-s^2}\, ds = 0.

Throughout this paper we consider the inverse regression model (1.1) with these design points. The non-uniform design in radial direction defined by (2.14) is motivated by a midpoint rule to numerically integrate over each box with respect to the measure \lambda in (2.4). For asymptotic considerations, we assume that n := pq \to \infty and that p depends on q as follows:
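Equation (2.14) can in fact be solved in closed form, since both \int \sqrt{1-s^2}\, ds and \int s\sqrt{1-s^2}\, ds have elementary antiderivatives; the radial design coordinate is the centroid of s over the cell under the weight \sqrt{1-s^2}. The sketch below (our own illustration, with the hypothetical name `design_s`) computes it:

```python
import math

def design_s(k1, q):
    # First coordinate z^1_{k1} of the design points: the unique solution of
    # (2.14), i.e. the centroid of s over [k1/q, (k1+1)/q] with respect to
    # the weight sqrt(1 - s^2).
    a, b = k1 / q, (k1 + 1) / q

    def anti0(s):  # antiderivative of sqrt(1 - s^2)
        return 0.5 * (s * math.sqrt(1.0 - s * s) + math.asin(s))

    def anti1(s):  # antiderivative of s * sqrt(1 - s^2)
        return -((1.0 - s * s) ** 1.5) / 3.0

    return (anti1(b) - anti1(a)) / (anti0(b) - anti0(a))
```

Because the weight \sqrt{1-s^2} is decreasing on each cell, the resulting design point always lies slightly below the cell midpoint, which is exactly the non-uniformity of the radial design discussed above.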

###### Assumption 2.2.

There exist constants c_1, c_2 with 0 < c_1 \le c_2 < \infty such that c_1 q \le p \le c_2 q for all q \in \mathbb{N}.

Denoting the number of rows and columns in the grid of design points by p and q, respectively, is common in the literature and in numerical programming. Notice that our Assumption 2.2 leaves room for the resolution-optimal choice (see Natterer and Wübbeling (2001), p. 74). Sometimes we will use the notation n \to \infty, actually meaning that q \to \infty, so that by Assumption 2.2 both p and n = pq diverge. Note also that the index set K depends on the sample size n in model (1.1). Thus, formally, we consider a triangular array of independent, identically distributed and centred random variables \varepsilon_k, but we do not reflect this dependence on n in our notation.

### 2.3. The spectral cutoff estimator

Motivated by the representation (2.13) we now define the cutoff estimator for the function g in model (1.1) by

 (2.15)  \hat g(r,\theta) = \sum_{m=0}^{t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)}(r,\theta)\, \hat R_{(l,m)}.

Here

 (2.16)  \hat R_{(l,m)} := \sum_{k \in K} w_k\, \overline{\psi_{(l,m)}(z_k)}\, Y_k

is an estimator of the inner product

 (2.17)  R_{(l,m)} := \langle Rg, \psi_{(l,m)} \rangle_{L^2(D,\lambda)}

and w_k := \lambda(B_k) denotes the measure of the cell B_k with respect to \lambda in (2.4). Comparing (2.13) to our estimator in (2.15), we observe that the inner products R_{(l,m)} have been replaced by the estimates (2.16). Furthermore, the series has been truncated at t_n, which represents the application of a regularized inverse. In the literature it is common to refer to either t_n or t_n^{-1} as bandwidth, since it is used to balance between bias and variance like the bandwidth in kernel density estimation (see Cavalier (2008)).

The choice of a bandwidth is a non-trivial problem. An optimal bandwidth with respect to some criterion, such as the integrated mean squared error, will depend on the unknown regression function g. Several data-driven selection criteria for the choice of t_n have been proposed and examined in the literature. We refer to the monograph of Vogel (2002), where multiple techniques are gathered. More closely related to our case are the risk hull method of Cavalier and Golubev (2006) in the white noise model and the smooth bootstrap examined by Bissantz et al. (2018) in a different context.

###### Remark 1.

It should be noticed that in practice a smooth dampening of high frequencies usually shows better performance than the strict spectral cutoff. We can accommodate this by introducing a smooth version of the estimator in (2.15). For this purpose let \Lambda denote a function with compact support and define

 (2.18)  \hat g_\Lambda(r,\theta) = \sum_{m=0}^{\infty} \Lambda\big(m\, t_n^{-1}\big) \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)}(r,\theta)\, \hat R_{(l,m)}

as an alternative estimator of g. Note that the estimate in (2.15) is obtained for \Lambda = 1_{[0,1]}. All results presented in this paper remain valid for the estimator (2.18). However, for the sake of brevity and a transparent presentation, the subsequent discussion is restricted to the spectral cutoff estimator in (2.15).
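To illustrate how the coefficient estimates (2.16) can be computed on the grid of Section 2.2, here is a minimal sketch under our own naming (not the authors' code): it uses the exact cell weights w_k = \lambda(B_k), the closed-form radial design points solving (2.14), and the Chebyshev recurrence for U_m.

```python
import cmath, math

def cheb_u(m, s):
    # Chebyshev polynomial of the second kind U_m via the three-term
    # recurrence U_{n+1}(s) = 2 s U_n(s) - U_{n-1}(s).
    if m == 0:
        return 1.0
    u_prev, u = 1.0, 2.0 * s
    for _ in range(m - 1):
        u_prev, u = u, 2.0 * s * u - u_prev
    return u

def coefficient_estimate(Y, l, m, p, q):
    # \hat R_{(l,m)} = sum_k w_k conj(psi_{(l,m)}(z_k)) Y_k, cf. (2.16),
    # with w_k = lambda(B_k) and the design points of Section 2.2.
    # Y maps grid indices (k1, k2) to observed responses.
    def anti0(s):  # antiderivative of sqrt(1 - s^2)
        return 0.5 * (s * math.sqrt(1.0 - s * s) + math.asin(s))

    total = 0.0 + 0.0j
    for k1 in range(q):
        a, b = k1 / q, (k1 + 1) / q
        mass_s = anti0(b) - anti0(a)  # radial part of the cell weight
        # closed-form solution of (2.14): weighted centroid of s on the cell
        z1 = (-((1.0 - b * b) ** 1.5) / 3.0 + ((1.0 - a * a) ** 1.5) / 3.0) / mass_s
        for k2 in range(p):
            z2 = 2.0 * math.pi * (k2 + 0.5) / p
            w = 2.0 / math.pi ** 2 * mass_s * (2.0 * math.pi / p)  # lambda(B_k)
            psi = cheb_u(m, z1) * cmath.exp(1j * l * z2)
            total += w * psi.conjugate() * Y[(k1, k2)]
    return total
```

As a sanity check, for noiseless data Y_k ≡ 1 = \psi_{(0,0)}(z_k) the estimate \hat R_{(0,0)} equals \sum_k w_k = \lambda(D) = 1, while \hat R_{(1,1)} vanishes because the angular sums over the equispaced \phi-grid cancel exactly.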

## 3. The empirical process of residuals

In this section we investigate the asymptotic properties of the empirical residual process

 \sqrt{n}\,\big(\hat F_n(t) - F(t)\big) := \sqrt{n} \sum_{k \in K} w_k \big( 1\{\hat\varepsilon_k \le t\} - F(t) \big), \quad t \in \mathbb{R},

where F denotes the residual distribution function and

 (3.1)  \hat\varepsilon_k := Y_k - R\hat g(z_k), \quad k \in K,

the k-th residual obtained from the estimate \hat g. The weights w_k are defined in Section 2.3. We begin by showing a uniform convergence result for \hat g. For this purpose we derive uniform approximation rates for bias and variance and subsequently balance these two to get optimal results. The proofs of the following results are complicated and therefore deferred to the Appendix.
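The estimator (1.3) itself is elementary; a minimal sketch (our own illustration, with hypothetical names) returns it as a function of t:

```python
def residual_edf(residuals, weights):
    # \hat F_n(t) = sum_k w_k 1{eps_k <= t}, cf. (1.3); the weights are
    # assumed nonnegative and to sum to one.
    pairs = list(zip(residuals, weights))

    def F(t):
        return sum(w for e, w in pairs if e <= t)

    return F
```

For example, with residuals (1, 2, 3) and weights (0.2, 0.3, 0.5) the step function takes the values 0, 0.5 and 1 at t = 0.5, 2 and 10, respectively.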

###### Lemma 3.1.

Suppose that Assumption 2.2 holds and that g \in O(v) for some v > 1. Then the estimator \hat g in (2.15) satisfies

 \big\| E\hat g(z) - g(z) \big\|_\infty = O\big( t_n^{-(v-1)} + t_n^{8}\, n^{-1} \big),

where \|h(z)\|_\infty := \sup_{z \in B} |h(z)| denotes the supremum norm.

Next we derive a uniform bound for the random error of the estimator \hat g in model (1.1).

###### Lemma 3.2.

Suppose that Assumption 2.2 holds and that E|\varepsilon|^\kappa < \infty for some \kappa > 3. Additionally, let the sequence t_n satisfy t_n \to \infty with t_n^4 \log(n)^{1/2} n^{-1/2} = O(1). Then the estimator \hat g in (2.15) satisfies

 \big\| \hat g(z) - E\hat g(z) \big\|_\infty = O\big( t_n^{4}\, \log(n)^{1/2}\, n^{-1/2} \big) \quad \text{a.s.}

Balancing the two upper bounds for the deterministic and the random part yields an optimal choice of the bandwidth. More precisely, the choice

 (3.2)  t_n := \Theta\Big( \big( \log(n)^{-1}\, n \big)^{\frac{1}{2(v+3)}} \Big)

balances the upper bound from Lemma 3.2 with the leading term of the bias from Lemma 3.1. Combining these results yields the first part of the following theorem.
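For the reader's convenience (a one-line derivation of our own, not in the original text), equating the leading bias term from Lemma 3.1 with the stochastic bound from Lemma 3.2 gives exactly the rate in (3.2):

```latex
t_n^{-(v-1)} \asymp t_n^{4}\,\log(n)^{1/2}\, n^{-1/2}
\;\Longleftrightarrow\;
t_n^{\,v+3} \asymp \big(\log(n)^{-1}\, n\big)^{1/2}
\;\Longleftrightarrow\;
t_n \asymp \big(\log(n)^{-1}\, n\big)^{\frac{1}{2(v+3)}}.
```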

###### Theorem 3.3.

Let Assumption 2.2 hold, suppose that g \in O(v) for some v > 1 and that E|\varepsilon|^\kappa < \infty for some \kappa > 3. Additionally, let t_n be chosen as in (3.2). Then

 (3.3)  \| g(z) - \hat g(z) \|_\infty = O\Big( \big(\log(n)\, n^{-1}\big)^{\frac{v-1}{2(v+3)}} \Big)

and for all 0 \le \tau < v

 (3.4)  \sum_{m=0}^{\infty} \sum_{l=-m}^{m} m^{\tau} \big| \langle R[g - \hat g], \psi_{(l,m)} \rangle_{L^2(D,\lambda)} \big| = O\Big( n^{-\frac{v-\tau}{2(v+3)}} \Big) \quad \text{a.s.}

By the same techniques uniform bounds can be deduced for the derivatives of our estimators.

###### Corollary 3.4.

Let the assumptions of Theorem 3.3 hold, let \alpha and \beta be nonnegative integers with k := \alpha + \beta, and suppose that g \in O(v) for some v > 2k+1. Then

 (3.5)  \Big\| \frac{\partial^\alpha}{\partial r^\alpha} \frac{\partial^\beta}{\partial \theta^\beta}\, g - \frac{\partial^\alpha}{\partial r^\alpha} \frac{\partial^\beta}{\partial \theta^\beta}\, \hat g \Big\|_\infty = O\Big( \log(n)^{1/2} n^{-1/2}\, t_n^{2k+4} + t_n^{-(v-(2k+1))} \Big) \quad \text{a.s.}

In order to prove the weak convergence of the process \sqrt{n}(\hat F_n - F) we consider the bracketing metric entropy of the subclass

 (3.6)  O(\tau,1,1) := \Big\{ g: B \to \mathbb{R} \,\Big|\, g \text{ continuous},\ \|g\|_\infty \le 1,\ \sum_{m=0}^{\infty} \sum_{l=-m}^{m} |R_{(l,m)}|\, (m+1)^{\tau} \le 1 \Big\},

for some \tau > 1. Theorem 3.3 implies that for all \tau < v the difference g - \hat g eventually lies in O(\tau,1,1). As we know from Proposition 2.1, the condition

 \sum_{m=0}^{\infty} \sum_{l=-m}^{m} (m+1)^{\tau} \big| \langle Rh, \psi_{(l,m)} \rangle_{L^2(D,\lambda)} \big| \le 1

entails that a function h is smooth to a degree determined by \tau. This implies that a finite-dimensional representation can be used as an adequate approximation of h, in our case a truncated L^2-expansion. We employ these considerations to derive the following result about the complexity of the class O(\tau,1,1), which is of independent interest and is proven in Appendix B (see Section B.4).

###### Proposition 3.5.

Let \tau > 1; then for any t \in (0, \tau - 1) and sufficiently small \epsilon > 0,

 (3.7)  \log\big( N_{[\,]}(\epsilon, O(\tau,1,1), \|\cdot\|_\infty) \big) \le C \Big( \frac{1}{\epsilon} \Big)^{\frac{2}{\tau - t}}.

Here N_{[\,]}(\epsilon, O(\tau,1,1), \|\cdot\|_\infty) denotes the minimal number of \epsilon-brackets with respect to \|\cdot\|_\infty needed to cover the smoothness class O(\tau,1,1).

For the next step recall the definition of the estimated residuals \hat\varepsilon_k in (3.1), as well as the estimate \hat F_n of the residual distribution function in (1.3). In order to prove a uniform CLT for \hat F_n we disentangle the dependencies of the terms in \hat F_n in the next result.

###### Theorem 3.6.

Assume that g \in O(v) for some v > 1, that E|\varepsilon|^\kappa < \infty for some \kappa > 3, that F admits a density f_\varepsilon that is Hölder continuous with some positive exponent, and that Assumption 2.2 holds. If the bandwidth satisfies (3.2), then

 (3.8)  \sup_{t \in \mathbb{R}} \Big| \sum_{k \in K} w_k \big[ 1\{\hat\varepsilon_k \le t\} - 1\{\varepsilon_k \le t\} - \varepsilon_k f_\varepsilon(t) \big] \Big| = o_P\big(n^{-1/2}\big).
###### Corollary 3.7.

Under the assumptions of Theorem 3.6, the process \sqrt{n}(\hat F_n - F) converges weakly to a mean zero Gaussian process with covariance function

 \Sigma(t,\tilde t) := F(t \wedge \tilde t) - F(t)F(\tilde t) + f_\varepsilon(t)\, E[\varepsilon 1\{\varepsilon \le \tilde t\}] + f_\varepsilon(\tilde t)\, E[\varepsilon 1\{\varepsilon \le t\}] + \sigma^2 f_\varepsilon(t) f_\varepsilon(\tilde t), \quad t, \tilde t \in \mathbb{R}.

**Acknowledgements.** This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Project C1) of the German Research Foundation (DFG) and the Bundesministerium für Bildung und Forschung through the project “MED4D: Dynamic medical imaging: Modeling and analysis of medical data for improved diagnosis, supervision and drug development”.

## Appendix A Proofs and technical details

Throughout our calculations C will denote a positive constant, which may differ from line to line. The dependence of C on other parameters will be highlighted in the specific context.

### A.1. Proof of Lemma 3.1

We begin with an auxiliary result, proven in Appendix B (see Section B.3), which provides an approximation rate in expectation for the coefficient estimates \hat R_{(l,m)}.

###### Proposition A.1.

Suppose that g \in O(v) for some v > 1 and that Assumption 2.2 holds. Then for all (l,m) \in N it follows that

 (A.1)  \big| E\hat R_{(l,m)} - R_{(l,m)} \big| \le C m^5 n^{-1},

where C is some constant depending on g and on c_2 (the constant from Assumption 2.2).

We are now in a position to derive the decay rate of the bias postulated in Lemma 3.1. The decay rate naturally splits into two parts: one accounts for the average approximation error of the Radon coefficients with index m \le t_n, and the other for the error due to the frequency limitation of the estimator.

The singular value decomposition of the normalized Radon transform in (2.12) and the definition of our estimator (in (2.15)) yield

 \| E\hat g - g \|_\infty = \Big\| \sum_{m=0}^{t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)} \big( E\hat R_{(l,m)} - R_{(l,m)} \big) - \sum_{m > t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)}\, R_{(l,m)} \Big\|_\infty \le A_1 + A_2,

where the terms A_1 and A_2 are given by

 A_1 := \sum_{m=0}^{t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \| \varphi_{(l,m)} \|_\infty\, \big| E\hat R_{(l,m)} - R_{(l,m)} \big|, \qquad A_2 := \sum_{m > t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \| \varphi_{(l,m)} \|_\infty\, | R_{(l,m)} |.

For the term A_1 it follows that

 A_1 \le \sum_{m=0}^{t_n} \sum_{l=-m}^{m} (m+1)\, \big| E\hat R_{(l,m)} - R_{(l,m)} \big| \le C n^{-1} \sum_{m=0}^{t_n} (2m+1)(m+1)\, m^5 = O\big( t_n^{8}\, n^{-1} \big),

where we have used that Proposition B.2 in Appendix B implies the estimate

 (A.2)  \| \varphi_{(l,m)} \|_\infty \le \sqrt{m+1}

in the first inequality and the approximation result from Proposition A.1 in the second. Similarly we have

 A_2 \le \sum_{m > t_n} \sum_{l=-m}^{m} (m+1)\, |R_{(l,m)}| \le \sum_{m > t_n} \sum_{l=-m}^{m} (m+1)^{v}\, t_n^{1-v}\, |R_{(l,m)}| \le t_n^{1-v} \sum_{m=0}^{\infty} \sum_{l=-m}^{m} (m+1)^{v}\, |R_{(l,m)}| = O\big( t_n^{1-v} \big).

In the last step we have used that g complies with the smoothness condition of O(v) (see (2.9)) and thus the series converges. ∎

### A.2. Proof of Lemma 3.2

We first rewrite \| \hat g(z) - E\hat g(z) \|_\infty employing (2.16) and (A.2):

 \| \hat g(z) - E\hat g(z) \|_\infty = \Big\| \sum_{m=0}^{t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)} \big( \hat R_{(l,m)} - E\hat R_{(l,m)} \big) \Big\|_\infty = \Big\| \sum_{m=0}^{t_n} \sum_{l=-m}^{m} \sqrt{m+1}\, \varphi_{(l,m)} \Big( \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, w_k \varepsilon_k \Big) \Big\|_\infty

 \le \sum_{m=0}^{t_n} \sum_{l=-m}^{m} (m+1) \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, w_k \varepsilon_k \Big| \le \sum_{m=0}^{t_n} 2(m+1)^3 \max_{(l,m) \in N,\, m \le t_n} \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \varepsilon_k \Big| \le C\, t_n^{4} \max_{(l,m) \in N,\, m \le t_n} \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \varepsilon_k \Big|.

We proceed by deriving an upper bound for the maximum. For this purpose we introduce a truncation parameter d_n := n^{1/2} \log(n)^{-1/2} and define the truncated error

 (A.3)  \varepsilon_k^{d_n} := \varepsilon_k\, 1\{ |\varepsilon_k| \le d_n \}.

We will now show that almost surely all of the errors \varepsilon_k, k \in K, eventually equal their truncated versions. Via Markov’s inequality we conclude that

 P(|\varepsilon_k| > d_n) \le E[|\varepsilon|^\kappa]\, d_n^{-\kappa},

and therewith it follows that

 \sum_{n} P\Big( \max_{k \in K} |\varepsilon_k| > d_n \Big) \le C \sum_{n} n\, d_n^{-\kappa}.

Recalling that d_n = n^{1/2} \log(n)^{-1/2} and that there exists some constant C_2 such that n \le C_2 q^2 by Assumption 2.2, we derive

 C \sum_{n=pq} n\, d_n^{-\kappa} = C \sum_{n=pq} n^{1-\kappa/2} \log(n)^{\kappa/2} \le C \sum_{q \ge 1} q^{2-\kappa} \log\big(C_2 q^2\big)^{\kappa/2} < \infty.

Summability is entailed by \kappa > 3. The Borel-Cantelli Lemma implies that almost surely eventually all measurement errors and their truncated versions are equal. Thus we can confine ourselves to the maximum

 (A.4)  \max_{(l,m) \in N,\, m \le t_n} \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \varepsilon_k^{d_n} \Big| \le B_1 + B_2,

where B_1 and B_2 are defined by

 B_1 := \max_{(l,m) \in N,\, m \le t_n} \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \big[ \varepsilon_k^{d_n} - E\varepsilon_k^{d_n} \big] \Big|, \qquad B_2 := \max_{(l,m) \in N,\, m \le t_n} \sum_{k \in K} \big| \overline{\psi_{(l,m)}(z_k)} \big|\, (m+1)^{-1} w_k\, \big| E\varepsilon_k^{d_n} \big|.

Using the inequality

 (A.5)  \| \psi_{(l,m)} \|_\infty \le m+1,

which is a consequence of Proposition B.2, it follows that

 B_2 \le \big| E\varepsilon^{d_n} \big| \sum_{k \in K} w_k = O\big(n^{-1/2}\big),

where we exploit the decay rate \big| E\varepsilon^{d_n} \big| = O(n^{-1/2}) in the last estimate. For the proof of this fact we recall the notation (A.3) and note that the condition E\varepsilon = 0 implies

 \big| E\varepsilon^{d_n} \big| = \big| E[\varepsilon\, 1\{|\varepsilon| > d_n\}] \big| \le E[|\varepsilon|^\kappa]\, d_n^{1-\kappa} = O\big(n^{-1/2}\big).

For the term B_1 we note that, for a fixed constant C^\star > 0,

 P\big( |B_1| > \log(n)^{1/2} n^{-1/2} C^\star \big) \le t_n^2 \max_{(l,m) \in N,\, m \le t_n} P\Big( \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \big[ \varepsilon_k^{d_n} - E\varepsilon_k^{d_n} \big] \Big| > \log(n)^{1/2} n^{-1/2} C^\star \Big).

Due to the truncation, \varepsilon_k^{d_n} - E\varepsilon_k^{d_n} is bounded by 2d_n and its variance by \sigma^2. Furthermore, the weights are uniformly of order n^{-1}, since

 \max_{k \in K} w_k = 2\pi^{-2} \max_{k \in K} \int_{2\pi k_2/p}^{2\pi(k_2+1)/p} \int_{k_1/q}^{(k_1+1)/q} \sqrt{1-s^2}\, ds\, d\phi \le 4(\pi p q)^{-1} = 4(\pi n)^{-1}.

Consequently the Bernstein inequality yields for the right-hand side of (A.2) the upper bound

 t_n^2 \exp\Big( - C C^\star\, \frac{\log(n)/n}{n^{-1} + \log(n)^{1/2}\, d_n\, n^{-3/2}} \Big) \le t_n^2 \exp\big( - C C^\star \log(n) \big) \le t_n^2\, n^{-C C^\star},

which is summable for sufficiently large C^\star. The Borel-Cantelli Lemma therefore implies that

 \max_{(l,m) \in N,\, m \le t_n} \Big| \sum_{k \in K} \overline{\psi_{(l,m)}(z_k)}\, (m+1)^{-1} w_k \big[ \varepsilon_k^{d_n} - E\varepsilon_k^{d_n} \big] \Big| = O\big( \log(n)^{1/2} n^{-1/2} \big) \quad \text{a.s.}

Combining these estimates we see that the left side of (A.4) is almost surely of order O(\log(n)^{1/2} n^{-1/2}). Consequently the right side of (A.2) is of order O(t_n^{4} \log(n)^{1/2} n^{-1/2}) almost surely, which proves the assertion. ∎

### A.3. Proof of Theorem 3.3

Combining Lemma 3.1 and Lemma 3.2 yields the first part of Theorem 3.3 when the truncation parameter t_n is chosen as in (3.2). For the proof of the second property we note the identity

 \langle R[g - \hat g], \psi_{(l,m)} \rangle_{L^2(D,\lambda)} = R_{(l,m)} - \hat R_{(l,m)}\, 1\{m \le t_n\},

which gives for the left-hand side of (3.4)

 \sum_{m=0}^{\infty} \sum_{l=-m}^{m} m^{\tau} \big| \langle R[g - \hat g], \psi_{(l,m)} \rangle_{L^2(D,\lambda)} \big| = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} m^{\tau} \big| R_{(l,m)} - \hat R_{(l,m)}\, 1\{m \le t_n\} \big| \le D_1 + D_2 + D_3.

The terms D_1, D_2 and D_3 are defined as follows:

 D_1 := \sum_{m=0}^{t_n} \sum_{l=-m}^{m} m^{\tau} \big| R_{(l,m)} - E\hat R_{(l,m)} \big|, \qquad D_2 := \sum_{m=0}^{t_n} \sum_{l=-m}^{m} m^{\tau} \big| \hat R_{(l,m)} - E\hat R_{(l,m)} \big|, \qquad D_3 := \sum_{m > t_n} \sum_{l=-m}^{m} m^{\tau} |R_{(l,m)}|.

By Proposition A.1 we obtain the upper bound

 D_1 \le \sum_{m=0}^{t_n} \sum_{l=-m}^{m} C m^{\tau+5} n^{-1} = O\big( t_n^{\tau+7}\, n^{-1} \big).

For the second sum on the right of (A.3) we use the estimate

 D_2 = \sum_{m=0}^{t_n} \sum_{l=-m}^{m} m^{\tau} (m+1) \Big[ (m+1)^{-1} \big| \hat R_{(l,m)} - E\hat R_{(l,m)} \big| \Big] \le C t_n^{\tau+3} \max_{(l,m) \in N,\, m \le t_n} \Big\{ (m+1)^{-1} \big| \hat R_{(l,m)} - E\hat R_{(l,m)} \big| \Big\} = O\big( n^{-1/2} \log(n)^{1/2}\, t_n^{\tau+3} \big) \quad \text{a.s.}

In the last equality we have used the following bound established in the proof of Lemma 3.2:

 max(l,m)∈