 # Wavelet Shrinkage in Nonparametric Regression Models with Positive Noise

Wavelet shrinkage estimators are widely applied in several fields of science for denoising data in the wavelet domain by reducing the magnitudes of empirical coefficients. In nonparametric regression, most shrinkage rules are derived from models composed of an unknown function with additive Gaussian noise. Although the Gaussian noise assumption is reasonable in several real data analyses, mainly for large sample sizes, it is not general. Data contaminated with positive noise can occur in practice, and nonparametric regression models with positive noise bring challenges from a wavelet shrinkage point of view. This work develops Bayesian shrinkage rules to estimate wavelet coefficients in a nonparametric regression framework with additive and strictly positive noise under exponential and lognormal distributions. Computational aspects are discussed, and simulation studies are performed to analyse the performance of the proposed shrinkage rules and compare them with standard techniques. An application to the Boston Marathon winning times dataset is also provided.


## 1 Introduction

Wavelet-based methods have been applied in several fields of statistics, such as time series modelling, functional data analysis, computational methods and nonparametric regression. Their success can be justified by several mathematical and computational reasons. In nonparametric regression, the application of this work, it is possible to expand an unknown square-integrable function in an orthogonal wavelet basis, which is composed of dilations and translations of a specified function usually called the wavelet function or mother wavelet. Examples are the Daubechies wavelets, which are usually indexed by their number of null moments and shown in Figure 1.1 for one (Haar or Daub1), two (Daub2), four (Daub4) and ten (Daub10) null moments. This wavelet representation allows the visualization of the data obtained from the unknown function by resolution levels and performs a multiresolution analysis via the application of the discrete wavelet transform. Further, the wavelet representation of a function is typically sparse, i.e., the coefficients of the expansion are mostly equal to zero or very close to zero at smooth regions of the represented function's domain. This property is important because, since wavelets are well localized in the space and frequency domains, the sparse representation allows the identification of the main features of the unknown function, such as peaks, discontinuities, maxima and minima, with a small number of nonzero coefficients. For a review of wavelet methods in statistics, see Vidakovic (1999) and Nason (2008). For a general overview of wavelets and their mathematical properties, see Daubechies (1992) and Mallat (1998).

Figure 1.1: Daubechies wavelet functions with N=1 (Haar wavelet), 2 (Daub2), 4 (Daub4) and 10 (Daub10) null moments.

Wavelet coefficients are essentially sparse at smooth locations of the unknown function but, in practice, after the application of the discrete wavelet transform to the data, one observes wavelet coefficients contaminated with random noise, called empirical wavelet coefficients, which are not sparse due to the noise effect. To denoise the empirical coefficients and estimate the wavelet coefficients of the function representation, thresholding and shrinkage methods are usually applied to the empirical coefficients, reducing their magnitudes. There are several nonlinear thresholding and shrinkage methods available in the literature, most of them based on the seminal works of Donoho (1993a,b), Donoho (1995a,b), Donoho and Johnstone (1994a,b) and Donoho and Johnstone (1995), which proposed the so-called soft and hard thresholding rules. Bayesian shrinkage procedures have also been successfully proposed for denoising empirical wavelet coefficients. These methods allow the incorporation of prior information regarding the coefficients, such as their sparsity, support, dispersion and extreme values, by means of a prior probability distribution. In this context, the proposed priors are usually composed of a mixture of a distribution highly concentrated around zero, to induce sparsity, and a symmetric distribution around zero. Prior distributions already proposed for the wavelet coefficients include mixtures of normals by Chipman et al. (1997), mixtures of a point mass at zero and the double exponential distribution by Vidakovic and Ruggeri (2001), the Bickel prior by Angelini and Vidakovic (2004), the double Weibull prior by Reményi and Vidakovic (2015), Dirichlet-Laplace priors by Bhattacharya et al. (2015) and, recently, logistic and beta priors by Sousa (2020) and Sousa et al. (2020) respectively. For a general overview of wavelet shrinkage and thresholding techniques, see Jansen (2001).

Despite the successful performance of the proposed thresholding and Bayesian shrinkage methods for denoising wavelet coefficients, most of them suppose that the data points from the underlying function are contaminated with additive normal random noise. Although this assumption can hold in practice and implies several good estimation properties, it is not general, mainly under small sample sizes, where the central limit theorem cannot be applied. Little attention has been given to wavelet denoising problems in nonparametric regression models under non-normal random noise or, specifically, additive strictly positive random noise. Neumann and von Sachs (1995) discuss normal approximations to the empirical wavelet coefficients for thresholding without the normality supposition of the noises and without the independent and identically distributed (iid) assumption. Leporini and Pesquet (2001) proposed the use of Besov priors on the wavelet coefficients to derive a Bayesian thresholding rule under a possibly resolution-level-dependent generalized normal distributed noise in the wavelet domain. Antoniadis et al. (2002) provided explicit Bayesian thresholding rules based on the maximum a posteriori (MAP) estimation procedure under an exponential power distribution prior on the wavelet coefficients, supposing exponential power and Cauchy distributions for the noise in the wavelet domain. Averkamp and Houdré (2003) analyzed ideal denoising in the sense of Donoho and Johnstone (1995) considering some classes of noise, including identically distributed noises symmetric around zero in the wavelet domain; thresholding under compactly supported noises in the wavelet domain is also discussed. Thus, the above cited works dealt with non-Gaussian noise, but none of them assumes positive noise in the original model. Further, the noise distribution assumptions are made directly in the wavelet domain, after the application of the discrete wavelet transform to the original data.

In this sense, this paper proposes Bayesian shrinkage procedures for denoising empirical wavelet coefficients in nonparametric regression models with strictly positive random noise contamination in the original data, assuming the additive noises to be independent and identically distributed exponential and lognormal. The adopted priors are mixtures of a point mass at zero with the logistic prior proposed by Sousa (2020) and the beta prior proposed by Sousa et al. (2020), both works under the classical Gaussian noise structure. Assuming additive and positive random noise in the original nonparametric model brings several challenges from an estimation point of view. First, the independence of the noises is lost after the wavelet transformation, i.e., the noises in the wavelet domain are possibly dependent. The consequence of this fact is that the wavelet coefficient estimation cannot be performed individually, as is usually done under the Gaussian noise assumption, but jointly through the joint posterior distribution of the wavelet coefficients vector, which requires computational methods, such as Markov chain Monte Carlo (MCMC), to sample from the joint posterior distribution. Further, the noises in the wavelet domain are not necessarily positive, being only linear combinations of positive noises. Finally, several statistical models with multiplicative positive noise have been proposed and dealt with by logarithmic transformations, but models with additive positive noise are not so common in the literature, although additive positive noise can be observed in a wide variety of real measurements. For example, arrival time measurements of radio waves typically contain positive errors due to possible delays in equipment detection. See Radnosrati et al. (2020) for an interesting study of the classical estimation theory of models with additive positive noise and a nice application involving global navigation satellite systems (GNSS) with positive noise in arrival times.

Thus, the main novelty of this work is to perform wavelet shrinkage under additive positive noise in the original nonparametric model. To do so, logistic and beta priors are put on the wavelet coefficients. The logistic prior is suitable for coefficients with support on the real line, and its scale hyperparameter has an easy and direct interpretation in terms of shrinkage, as can be seen in Sousa (2020). The beta prior (Sousa et al., 2020) is a good choice for bounded coefficients, and its well-known shape flexibility brings advantages in modelling. This paper is organized as follows: the considered statistical models are defined in Section 2 and their associated shrinkage rules, with computational aspects, are described in Section 3. Parameter and hyperparameter choices are discussed in Section 4. Simulation studies to assess the performance of the shrinkage rules and compare them with standard shrinkage/thresholding techniques are analysed in Section 5. A real data application involving winning times of the Boston Marathon is presented in Section 6. The paper concludes with final considerations in Section 7.

## 2 Statistical models

We consider data points $(x_i, y_i)$, $i=1,\dots,n$, from the nonparametric regression model

$$y_i=f(x_i)+e_i,\qquad i=1,\dots,n, \tag{2.1}$$

where $f$ is an unknown function and the $e_i$ are independent and identically distributed (iid) random noises such that $e_i>0$, $i=1,\dots,n$. The goal is to estimate $f$ without assumptions about its functional structure, i.e., the estimation procedure will take only the data points into account. In this work, we consider random noise with exponential and lognormal distributions, given by

• Exponential distributed noise:

$$h(e_i;\lambda)=\lambda\exp\{-\lambda e_i\}\,I_{(0,\infty)}(e_i),\qquad \lambda>0, \tag{2.2}$$

• Lognormal distributed noise:

$$h(e_i;\sigma)=\frac{1}{e_i\sigma\sqrt{2\pi}}\exp\left\{-\frac{\log^2(e_i)}{2\sigma^2}\right\}I_{(0,\infty)}(e_i),\qquad \sigma>0, \tag{2.3}$$

where $I_A(\cdot)$ is the usual indicator function on the set $A$ and $\log(\cdot)$ is the natural logarithm. We suppose both noise distribution parameters $\lambda$ and $\sigma$ to be known, although a brief discussion of the unknown case is provided in Section 4.

The unknown function $f$ can be represented by

$$f(x)=\sum_{j,k\in\mathbb{Z}}\theta_{j,k}\psi_{j,k}(x), \tag{2.4}$$

where $\{\psi_{j,k}(x)=2^{j/2}\psi(2^jx-k),\ j,k\in\mathbb{Z}\}$ is an orthonormal wavelet basis for $L^2(\mathbb{R})$ constructed by dilations and translations of a function $\psi$ called the wavelet or mother wavelet, and the $\theta_{j,k}$ are wavelet coefficients that describe features of $f$ at spatial location $k$ and scale or resolution level $j$. In this context, the data points can be viewed as an approximation of $f$ at the finest resolution level with additive and positive noise contamination. As an example, Figure 2.1 displays a Donoho-Johnstone (D-J) test function called Blocks, which will be defined in Section 5, and data points generated from this function with additive exponential distributed random noises.
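As a concrete illustration of model (2.1), the sketch below generates data from a hypothetical piecewise-constant signal (a stand-in for Blocks, whose exact heights and breakpoints are given in Donoho and Johnstone, 1995) with additive exponential noise; the grid, rate `lam` and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 64
x = np.arange(1, n + 1) / n

# Hypothetical piecewise-constant signal standing in for Blocks.
f = np.where(x < 0.3, 1.0, np.where(x < 0.7, 4.0, 2.0))

# Model (2.1) with iid strictly positive exponential noise: y_i = f(x_i) + e_i.
lam = 2.0                                         # illustrative rate parameter
y = f + rng.exponential(scale=1.0 / lam, size=n)  # every observation lies above f
```

Because the noise is strictly positive, every observed point lies above the true signal, which is exactly the asymmetry visible in Figure 2.1.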

The estimation process of $f$ is done by estimating the wavelet coefficients. In vector notation, model (2.1) can be written as

$$y=f+e, \tag{2.5}$$

where $y=(y_1,\dots,y_n)'$, $f=(f(x_1),\dots,f(x_n))'$ and $e=(e_1,\dots,e_n)'$. A discrete wavelet transform (DWT), typically represented by an orthonormal transformation matrix $W$, is applied to both sides of (2.5), yielding the following model in the wavelet domain,

$$d=\theta+\varepsilon, \tag{2.6}$$

where $d=Wy$ is the empirical coefficients vector, $\theta=Wf$ is the wavelet coefficients vector and $\varepsilon=We$ is the random noise vector. Although the matrix $W$ is a convenient representation of the DWT, fast algorithms that are more computationally efficient are applied to perform the DWT in practice; see Mallat (1998). When $e$ is assumed to be iid normal in model (2.1), as occurs in most nonparametric models studied in wavelet shrinkage research, the distribution of the noise in the wavelet domain remains normal: $\varepsilon$ is iid normal with the same scale parameter as the time domain noise. This property brings several estimation advantages, since estimating $\theta$ in this context is equivalent to estimating the location parameter of a normal distribution. Moreover, as the noises in the wavelet domain remain independent, the estimation of $\theta$ can be done coefficient by coefficient. When the $e_i$'s are positive, most of these advantages are lost. Actually, the $\varepsilon_i$'s are dependent and, being linear combinations of positive noises, not necessarily positive. Also, their distribution is not the same as that of their time domain counterparts. The main impact of these facts is that the estimation of $\theta$ cannot be performed individually, but must follow the joint posterior distribution of $\theta$.

The wavelet coefficients vector $\theta$ can be estimated by the application of a shrinkage rule $\delta$ to the empirical coefficients vector $d$. This procedure essentially performs denoising of the observed coefficients by reducing their magnitudes in order to estimate the wavelet coefficients. After the estimation $\hat{\theta}=\delta(d)$, $f$ is estimated by the inverse discrete wavelet transform (IDWT), $\hat{f}=W'\hat{\theta}$.
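The loss of positivity in the wavelet domain can be checked numerically. The sketch below builds an orthonormal Haar DWT matrix $W$ (a minimal recursive construction keeping only the finest-level details at each step, not the fast pyramid algorithm used in practice) and applies it to iid exponential noise; the transformed noise takes negative values.

```python
import numpy as np

def haar_matrix(n):
    # Orthonormal Haar DWT matrix for n a power of 2 (minimal recursive build).
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                    # coarse (scaling) rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-level detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
W = haar_matrix(8)
e = rng.exponential(scale=1.0, size=(1000, 8))  # iid positive noise, one row per sample
eps = e @ W.T                                   # wavelet-domain noise epsilon = W e
# Although every e_i > 0, the detail rows of W difference the noise,
# so eps contains negative entries.
```

Since $W$ is orthonormal, no information is lost, but the detail coefficients of positive noise are differences of positive values and therefore change sign freely.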

In this work, we apply a Bayesian shrinkage procedure assuming a prior distribution on each single wavelet coefficient $\theta$ (the subindices are dropped for simplicity). The priors have the general structure

$$\pi(\theta;\alpha,\eta)=\alpha\delta_0(\theta)+(1-\alpha)g(\theta;\eta), \tag{2.7}$$

for $\alpha\in(0,1)$, where $\delta_0(\theta)$ is the point mass function at zero and $g(\theta;\eta)$ is a probability distribution defined according to a hyperparameter vector $\eta$. The choice of $g(\cdot)$ can be made according to the support of $\theta$. We consider in this work two quite flexible distributions for $g(\cdot)$: the symmetric around zero logistic distribution proposed by Sousa (2020), given by

$$g(\theta;\tau)=\frac{\exp\{-\theta/\tau\}}{\tau\left(1+\exp\{-\theta/\tau\}\right)^2}I_{\mathbb{R}}(\theta),\qquad \tau>0, \tag{2.8}$$

and the beta distribution on the interval $[-m,m]$ proposed by Sousa et al. (2020), given by

$$g(\theta;a,b,m)=\frac{(\theta+m)^{a-1}(m-\theta)^{b-1}}{(2m)^{a+b-1}B(a,b)}I_{[-m,m]}(\theta),\qquad a,b,m>0, \tag{2.9}$$

where $B(a,b)$ is the beta function. Sousa (2020) and Sousa et al. (2020) developed shrinkage rules under logistic and beta priors respectively, both under the standard Gaussian noise framework. Figures 2.2 (a) and (b) show logistic and beta densities for several hyperparameter values respectively, the beta densities being considered on a fixed interval $[-m,m]$.

The logistic prior centered at zero is suitable in Bayesian wavelet shrinkage for real-valued wavelet coefficients, i.e., when $\theta\in\mathbb{R}$. Further, its hyperparameter $\tau$ plays an important role in determining the degree of shrinkage applied to the empirical coefficients, as described in Sousa (2020). The beta prior offers great flexibility in modelling bounded wavelet coefficients, i.e., when $\theta\in[-m,m]$, since it allows symmetric ($a=b$) and asymmetric ($a\neq b$) distributions around zero. As with the logistic prior, its hyperparameters $a$ and $b$ control the amount of shrinkage of the associated Bayesian rule. For $a=b$, bigger values of these hyperparameters increase the shrinkage level imposed by the rule, i.e., the associated rule tends to a severe reduction of the empirical coefficients' magnitudes. More details about beta priors on wavelet coefficients can be found in Sousa et al. (2020). Thus, logistic and beta priors are convenient choices for $g(\cdot)$ in (2.7), modelling several kinds of prior information about the wavelet coefficients to be estimated, such as their support, symmetry and sparsity.
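A draw from the spike-and-slab prior (2.7) with the logistic slab (2.8) can be sketched as follows; the function name and arguments are illustrative, and the logistic part uses inverse-CDF sampling.

```python
import numpy as np

def sample_mixture_prior(n, alpha, tau, rng):
    # Draw n values from pi(theta) = alpha * delta_0 + (1 - alpha) * Logistic(0, tau),
    # the spike-and-slab prior (2.7) with logistic slab (2.8).
    theta = np.zeros(n)
    slab = rng.random(n) >= alpha                # with prob alpha, theta stays exactly 0
    u = rng.random(slab.sum())                   # inverse-CDF draw for the logistic part:
    theta[slab] = tau * np.log(u / (1.0 - u))    # F^{-1}(u) = tau * log(u / (1 - u))
    return theta
```

For $\alpha$ close to 1 most draws are exactly zero, which is how the prior encodes the sparsity of the wavelet representation.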

## 3 Shrinkage rules and computational aspects

The general shrinkage rules associated with the models (2.1), (2.2), (2.3), (2.6) and (2.7) under the quadratic loss function are obtained by the posterior expected value, i.e., $\delta(d)=\mathbb{E}(\theta|d)$. Since it is infeasible to obtain the posterior expected value analytically, we use an adaptive Markov chain Monte Carlo (MCMC) method, described below, to generate samples from the joint posterior distribution of $\theta$ and estimate a particular wavelet coefficient by the sample mean,

$$\hat{\theta}_i=\delta_i(d)\approx\frac{1}{L}\sum_{l=1}^{L}\theta_i^{(l)}, \tag{3.1}$$

where $\theta_i^{(l)}$ is the $i$-th element of the $l$-th generated sample vector $\theta^{(l)}$, $l=1,\dots,L$.

The posterior sample generation process is performed using the robust adaptive Metropolis (RAM) algorithm proposed by Vihola (2012) and implemented computationally in the adaptMCMC R package by Scheidegger (2021). The algorithm estimates the shape of the target distribution and simultaneously coerces the mean acceptance rate of the process. At each iteration of the chain generation, a single shape matrix is adaptively updated. Let $S_0$ be a lower-triangular matrix with positive diagonal elements, $\{\eta_k\}_{k\geq 1}$ a sequence decaying to zero and $\gamma^*$ the target mean acceptance rate. The RAM algorithm works as follows for $k=1,2,\dots$:

1. Generate the proposal $\theta_k^*=\theta_{k-1}+S_{k-1}U_k$, where $U_k\sim N(0,I)$ and $I$ is the identity matrix.

2. Set $\theta_k=\theta_k^*$ with probability

$$\gamma_k=\min\left(1,\frac{\pi(\theta_k^*|d)}{\pi(\theta_{k-1}|d)}\right),$$

and $\theta_k=\theta_{k-1}$ otherwise.

3. Compute the lower-triangular matrix $S_k$ with positive diagonal elements satisfying the equation

$$S_kS_k^t=S_{k-1}\left(I+\eta_k(\gamma_k-\gamma^*)\frac{U_kU_k^t}{\|U_k\|^2}\right)S_{k-1}^t.$$

We applied the values of $\eta_k$ and $\gamma^*$ suggested by Vihola (2012) throughout the simulation studies and the application to obtain the posterior samples of the wavelet coefficients. The next subsections provide the posterior distributions that are considered as target distributions in the RAM algorithm.
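The three steps above can be sketched in a few lines; this is an illustrative reimplementation, not the adaptMCMC package the paper uses. The step-size sequence $\eta_k=\min(1,p\,k^{-2/3})$ and the target rate 0.234 are common defaults for RAM (the paper defers the exact choices to Vihola, 2012), and `log_post` stands for any unnormalized log target density.

```python
import numpy as np

def ram_sampler(log_post, theta0, n_iter, target_accept=0.234, rng=None):
    # Robust adaptive Metropolis (Vihola, 2012): adapt the lower-triangular
    # shape matrix S_k so the mean acceptance rate approaches the target.
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    p = theta.size
    S = np.eye(p)                                  # S_0
    lp = log_post(theta)
    chain = np.empty((n_iter, p))
    for k in range(1, n_iter + 1):
        u = rng.standard_normal(p)                 # U_k ~ N(0, I)
        prop = theta + S @ u                       # proposal theta*_k
        lp_prop = log_post(prop)
        gamma = np.exp(min(0.0, lp_prop - lp))     # acceptance probability gamma_k
        if rng.random() < gamma:
            theta, lp = prop, lp_prop              # accept; else keep theta_{k-1}
        eta = min(1.0, p * k ** (-2.0 / 3.0))      # step sizes decaying to zero
        M = S @ (np.eye(p) + eta * (gamma - target_accept)
                 * np.outer(u, u) / (u @ u)) @ S.T
        S = np.linalg.cholesky(M)                  # new lower-triangular S_k
        chain[k - 1] = theta
    return chain
```

The Cholesky factorization in step 3 is well defined because the bracketed rank-one update keeps the matrix positive definite for $\eta_k\leq 1$.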

### 3.1 Posterior distributions under exponential noise

Considering the model under exponential noise, (2.1) and (2.2), and the model after the DWT application (2.6), it is straightforward to obtain the likelihood function of the empirical coefficients by applying the Jacobian method to the transformation $d=Wy$. The likelihood function is given by

$$L(d|\theta)=|W|\lambda^n\exp\left\{-\lambda\sum_i\sum_jw_{ji}(d_j-\theta_j)\right\}\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right). \tag{3.2}$$

The posterior distribution of $\theta$ can be obtained by the well known relationship

$$\pi(\theta|d)\propto L(d|\theta)\pi(\theta). \tag{3.3}$$

Thus, applying (3.3) to (3.2) and the logistic prior model (2.7) and (2.8), we have the following posterior distribution of the wavelet coefficients given the empirical ones, under the logistic prior model and exponential noise in the original data,

$$\pi(\theta|d)\propto\prod_i\left[\alpha\delta_0(\theta_i)+(1-\alpha)\frac{\exp\{-\theta_i/\tau\}}{\tau\left(1+\exp\{-\theta_i/\tau\}\right)^2}\right]\times\exp\left\{-\lambda\sum_i\sum_jw_{ji}(d_j-\theta_j)\right\}\times\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right). \tag{3.4}$$

Analogously, we can obtain the posterior distribution of $\theta$ under the beta prior model and exponential noise in the original data by considering (2.9) instead of (2.8), given by

$$\pi(\theta|d)\propto\prod_i\left[\alpha\delta_0(\theta_i)+(1-\alpha)\frac{(\theta_i+m)^{a-1}(m-\theta_i)^{b-1}}{(2m)^{a+b-1}B(a,b)}\right]\times\exp\left\{-\lambda\sum_i\sum_jw_{ji}(d_j-\theta_j)\right\}\times\prod_iI_{[-m,m]}(\theta_i)\times\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right). \tag{3.5}$$
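For use inside an MCMC sampler, the log of the unnormalized posterior under exponential noise can be evaluated directly. The sketch below takes $\alpha=0$ (a purely continuous logistic slab), which is a simplification of the mixture prior (2.7) actually used in the paper, made to keep the target density continuous.

```python
import numpy as np

def log_post_exp_logistic(theta, d, W, lam, tau):
    # Unnormalized log posterior based on (3.4) with alpha = 0 (continuous
    # logistic slab only); a sketch of a target density for an MCMC sampler.
    resid = W.T @ (d - theta)        # reconstructed time-domain noise e = W'(d - theta)
    if np.any(resid <= 0.0):         # exponential noise must be strictly positive
        return -np.inf
    log_lik = -lam * resid.sum()     # log of exp{-lambda * sum_i e_i}, up to constants
    z = np.abs(theta) / tau          # logistic log-density, written stably via |z|
    log_prior = np.sum(-z - 2.0 * np.log1p(np.exp(-z)) - np.log(tau))
    return log_lik + log_prior
```

The indicator terms of (3.4) become the `-np.inf` branch: any $\theta$ whose implied time-domain residuals are not all positive has zero posterior density and is automatically rejected by the sampler.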

### 3.2 Posterior distributions under lognormal noise

The likelihood function of the empirical coefficients for the model under lognormal noise, (2.1) and (2.3), and the model after the DWT application (2.6) is obtained as described in Subsection 3.1 and is given by

$$L(d|\theta)=\frac{|W|}{(\sigma\sqrt{2\pi})^n}\prod_i\left[\sum_jw_{ji}(d_j-\theta_j)\right]^{-1}\exp\left\{-\frac{1}{2\sigma^2}\sum_i\log^2\left(\sum_jw_{ji}(d_j-\theta_j)\right)\right\}\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right). \tag{3.6}$$

Thus, the posterior distribution of $\theta$ under lognormal noise in the original data and the logistic prior model (2.7) and (2.8) is obtained by applying (3.3) to the likelihood function (3.6) and is given by

$$\pi(\theta|d)\propto\prod_i\left[\alpha\delta_0(\theta_i)+(1-\alpha)\frac{\exp\{-\theta_i/\tau\}}{\tau\left(1+\exp\{-\theta_i/\tau\}\right)^2}\right]\times\exp\left\{-\frac{1}{2\sigma^2}\sum_i\log^2\left(\sum_jw_{ji}(d_j-\theta_j)\right)\right\}\prod_i\left[\sum_jw_{ji}(d_j-\theta_j)\right]^{-1}\times\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right), \tag{3.7}$$

and the posterior distribution of $\theta$ under the beta prior model (2.7) and (2.9) is

$$\pi(\theta|d)\propto\prod_i\left[\alpha\delta_0(\theta_i)+(1-\alpha)\frac{(\theta_i+m)^{a-1}(m-\theta_i)^{b-1}}{(2m)^{a+b-1}B(a,b)}\right]\times\exp\left\{-\frac{1}{2\sigma^2}\sum_i\log^2\left(\sum_jw_{ji}(d_j-\theta_j)\right)\right\}\prod_i\left[\sum_jw_{ji}(d_j-\theta_j)\right]^{-1}\times\prod_iI_{[-m,m]}(\theta_i)\times\prod_iI_{(0,\infty)}\left(\sum_jw_{ji}(d_j-\theta_j)\right). \tag{3.8}$$

Therefore, the posterior distributions (3.4) and (3.5) of $\theta$ are the target distributions in the RAM algorithm under the logistic and beta prior models respectively, to be sampled in order to estimate the wavelet coefficients by the shrinkage rule (3.1) for original data contaminated by exponential noise. Similarly, the posterior distributions (3.7) and (3.8) are the targets under logistic and beta priors respectively for lognormal noise contaminated observations.

## 4 Parameter elicitation

The performance of the Bayesian procedure is closely related to a good choice or estimation of the parameters and hyperparameters involved in the models. The proposed shrinkage rules depend on the parameters $\lambda$ and $\sigma$ of the exponential and lognormal noise distributions respectively, which were considered known throughout the paper, on the weight $\alpha$ of the point mass function in the prior models and on the hyperparameters $\tau$ and $(a,b,m)$ of the logistic and beta priors respectively.

Angelini and Vidakovic (2004) proposed that the hyperparameters $\alpha$ and $m$ be dependent on the resolution level $j$ according to the expressions

$$\alpha=\alpha(j)=1-\frac{1}{(j-J_0+1)^r}, \tag{4.1}$$

$$m=m(j)=\max_k\{|d_{jk}|\}, \tag{4.2}$$

where $J_0\leq j\leq J-1$, $J_0$ is the primary resolution level, $J$ is the number of resolution levels and $r>0$. They also suggest a default value of $r$ that can be adopted in the absence of additional information.

The choices of the hyperparameters $\tau$ of the logistic prior and $(a,b)$ of the beta prior are discussed by Sousa (2020) and Sousa et al. (2020) respectively, where default values are also suggested. In fact, their values have a direct impact on the shrinkage level of the associated rule: a higher denoising level on the empirical coefficients requires higher values of $\tau$ and of $a$ and $b$. Moreover, these hyperparameters can be resolution-level dependent, such as $\tau=\tau(j)$, $a=a(j)$ and $b=b(j)$. Further discussion about how to choose the hyperparameters of a beta prior distribution can also be seen in Chaloner and Duncan (1983) and Duran and Booker (1988).

The noise distribution parameters $\lambda$ and $\sigma$ of the exponential and lognormal distributions respectively, although considered known, can be included in the Bayesian framework, independently of the wavelet coefficients, by attributing suitable priors to them, such as an inverse gamma prior for example. In this case, the general prior model (2.7) under exponential noise could be updated to

$$\pi(\theta,\lambda;\alpha,\eta,\zeta)=\pi(\theta;\alpha,\eta)\times\pi(\lambda;\zeta),$$

where $\pi(\lambda;\zeta)$ is the prior distribution of $\lambda$ and $\zeta$ is its hyperparameter vector. An analogous procedure can be done for the lognormal noise case.

## 5 Simulation studies

The performance of the proposed shrinkage rules was assessed in simulation studies and compared against standard shrinkage/thresholding techniques. The so-called Donoho-Johnstone (D-J) test functions (Donoho and Johnstone, 1995) were considered as the underlying functions to be estimated; they comprise four test functions called Bumps, Blocks, Doppler and Heavisine, defined on $[0,1]$ by:

• Bumps

$$f(x)=\sum_{l=1}^{11}h_lK\left(\frac{x-x_l}{w_l}\right),$$

where $K(t)=(1+|t|)^{-4}$ and $(h_l)$, $(x_l)$ and $(w_l)$ are the fixed heights, locations and widths given in Donoho and Johnstone (1995);

• Blocks

$$f(x)=\sum_{l=1}^{11}h_lK(x-x_l),$$

where $K(t)=(1+\mathrm{sgn}(t))/2$ and $(h_l)$ and $(x_l)$ are the fixed heights and locations given in Donoho and Johnstone (1995);

• Doppler

$$f(x)=\sqrt{x(1-x)}\,\sin\left(\frac{2.1\pi}{x+0.05}\right);$$

• Heavisine

$$f(x)=4\sin(4\pi x)-\mathrm{sgn}(x-0.3)-\mathrm{sgn}(0.72-x).$$
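The two fully specified test functions above can be coded directly (the Bumps and Blocks parameter vectors are left to Donoho and Johnstone, 1995):

```python
import numpy as np

def doppler(x):
    # Donoho-Johnstone Doppler: sqrt(x(1-x)) * sin(2.1*pi / (x + 0.05)).
    return np.sqrt(x * (1.0 - x)) * np.sin(2.1 * np.pi / (x + 0.05))

def heavisine(x):
    # Donoho-Johnstone Heavisine: 4*sin(4*pi*x) - sgn(x - 0.3) - sgn(0.72 - x).
    return 4.0 * np.sin(4.0 * np.pi * x) - np.sign(x - 0.3) - np.sign(0.72 - x)
```

Doppler oscillates faster as $x\to 0$ inside the envelope $\sqrt{x(1-x)}$, while Heavisine is a smooth sinusoid with jumps at $x=0.3$ and $x=0.72$: exactly the kinds of features wavelet methods are designed to capture.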

The functions are presented in Figure 5.1. The D-J functions have important features such as peaks, discontinuities, constant parts and oscillations to be captured by denoising methods, representing most of the signals that occur in practice.

Figure 5.1: Donoho-Johnstone test functions used as underlying signals in the simulation studies.

For a particular test function, data were generated by adding exponential and lognormal noises to the function points according to two signal-to-noise ratio (SNR) values, SNR = 3 and SNR = 9, and two sample sizes. Each scenario of underlying function, SNR and sample size was replicated $M$ times and the averaged mean square error (AMSE) was calculated as the performance measure, given by

$$\mathrm{AMSE}=\frac{1}{Mn}\sum_{m=1}^{M}\sum_{i=1}^{n}\left[\hat{f}^{(m)}(x_i)-f(x_i)\right]^2,$$

where $\hat{f}^{(m)}(x_i)$ is the estimate of the function at the point $x_i$ in the $m$-th replication, $m=1,\dots,M$. For each replication, samples of the posterior distributions (3.4), (3.5), (3.7) and (3.8) were obtained by the RAM algorithm and the associated shrinkage rules were calculated by (3.1). The performance of the shrinkage rules under the logistic (LOGISTIC) and beta (BETA) priors was compared against four extensively used shrinkage and thresholding methods: universal thresholding (UNIV) proposed by Donoho and Johnstone (1994), cross validation (CV) proposed by Nason (1996), false discovery rate (FDR) proposed by Abramovich and Benjamini (1996) and the Stein unbiased risk estimator (SURE) proposed by Donoho and Johnstone (1995).
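The AMSE criterion above is a plain average of squared errors over replications and design points; a minimal sketch:

```python
import numpy as np

def amse(f_hat, f_true):
    # Averaged mean square error over M replications and n design points:
    # AMSE = (1 / (M * n)) * sum_m sum_i (f_hat[m, i] - f_true[i])**2.
    f_hat = np.asarray(f_hat, dtype=float)    # shape (M, n): one row per replication
    f_true = np.asarray(f_true, dtype=float)  # shape (n,): true function values
    return float(np.mean((f_hat - f_true) ** 2))
```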

### 5.1 Simulation under exponential noise

Table 5.1 shows the AMSEs of the shrinkage and thresholding rules for data simulated under exponential noise. The proposed shrinkage rules performed very well in terms of AMSE in almost all scenarios. The shrinkage rule under the logistic prior was the best estimator in most of the considered scenarios, being the best estimator overall. The shrinkage rule under the beta prior was the best for the Bumps function with SNR = 3 and for the Blocks function with SNR = 9 in particular scenarios. Even when the beta shrinkage rule was not the best one, its performance was generally close to that of the logistic rule, being the second best estimator. Moreover, the proposed rules worked much better than the standard rules in some of the cases; for example, for the Bumps function with SNR = 9, the AMSEs of the logistic and beta rules were 0.787 and 1.140 respectively, while the third best estimator in that scenario was SURE, with AMSE = 6.287, almost 8 times the AMSE of the logistic rule. Only for the Heavisine function did the proposed rules fail to be the best, losing to the UNIV and CV methods, but even in these cases their performances were close to those methods. Finally, the good behavior of the rules for a low signal-to-noise ratio, i.e., SNR = 3, should be noted, which is evidence of good performance for high-noise datasets.

Figure 5.2 presents the estimates obtained by the shrinkage rule under the logistic prior for $n=64$ and SNR = 9. The main features of each test function were captured by the estimates, such as the spikes of Bumps, the piecewise constant regions of Blocks, the oscillations of Doppler and the discontinuity point of the Heavisine function. Boxplots of the estimators' MSEs are also provided in Figure 5.3 and show low variation for the proposed shrinkage rules' MSEs.

Figure 5.2: Estimates of the D-J test functions by the shrinkage rule under logistic prior in the simulation study for n=64, SNR = 9 and simulated points under exponential noise.

Figure 5.3: Boxplots of the mean square errors (MSE) of the shrinkage and thresholding rules in the simulation study for n=64, SNR = 9 and simulated points under exponential noise. The associated rules are: 1-UNIV, 2-CV, 3-FDR, 4-SURE, 5-LOGISTIC and 6-BETA.

### 5.2 Simulation under lognormal noise

The results obtained for data simulated under lognormal noise are available in Table 5.2. In general, the shrinkage rule under the logistic prior had the best performance in terms of AMSE, beating the other estimators in practically all scenarios with SNR = 9. The rule under the beta prior also presented good performance, with AMSEs close to those of the logistic rule, and was the best for the Blocks function with SNR = 9 in one scenario. Further, the beta rule worked better than the logistic one in scenarios with a low signal-to-noise ratio, SNR = 3.

Although the logistic rule was the best in general, it should be observed that the standard rules behaved better under lognormal noise than under exponential noise. For example, considering data with SNR = 3, SURE was the best for the Bumps and Doppler underlying functions, while UNIV was the best one for Blocks and Heavisine. Under exponential noise, these rules were dominated by the proposed estimators for these same functions and scenarios.

Figure 5.4 shows the estimates of the D-J functions by the shrinkage rule under the logistic prior for $n=64$ and SNR = 9. As occurred in the exponential noise context, the estimates captured well the main characteristics of the test functions. Boxplots of the MSEs are shown in Figure 5.5, where it is possible to note low MSE variation for the proposed shrinkage rules.

Figure 5.4: Estimates of the D-J test functions by the shrinkage rule under logistic prior in the simulation study for n=64, SNR = 9 and simulated points under lognormal noise.

Figure 5.5: Boxplots of the mean square errors (MSE) of the shrinkage and thresholding rules in the simulation study for n=64, SNR = 9 and simulated points under lognormal noise. The associated rules are: 1-UNIV, 2-CV, 3-FDR, 4-SURE, 5-LOGISTIC and 6-BETA.

## 6 Real data application

The Boston Marathon is one of the most important marathons in the world. It has occurred yearly since 1897, over a course of 42.195 km between the cities of Hopkinton and Boston, in the US state of Massachusetts. As mentioned in the introduction, arrival times are classical examples of measurements contaminated by positive noise, due to possible delays of detection by instruments.

In this sense, we applied the proposed shrinkage rule with the logistic prior under the exponential noise assumption for denoising the winning times (in minutes) of the Boston Marathon Men's Open Division from 1953 to 2016. The data are publicly available at the Boston Athletic Association (BAA) webpage https://www.baa.org/races/boston-marathon/results/champions. We used a DWT with the Daub10 basis, and the prior hyperparameters were adopted according to (4.1) and the default choices discussed in Section 4.

Figure 6.1 shows the original and denoised data obtained by the shrinkage rule with the logistic prior under exponential noise. As expected, the denoised winning times are less than or equal to the measured ones, depending on the shrinkage level. Given the good precision of the measured times for this competition, the application of a high shrinkage level rule was not necessary. The empirical wavelet coefficients (represented by vertical bars) by resolution level and the differences between them and the estimated coefficients, $d_i-\hat{\theta}_i$, are shown in Figures 6.2 (a) and (b) respectively. It is possible to note that, although the residuals in the original data are positive, which can be seen in Figure 6.3 (a), their counterparts in the wavelet domain are not necessarily positive, i.e., there are estimated coefficients bigger than their respective empirical ones.

Finally, Figure 6.3 (b) presents the histogram (with area equal to 1) of the residuals in the time domain, i.e., $y_i-\hat{f}(x_i)$, with a superposed exponential density curve for the maximum likelihood estimate $\hat{\lambda}$. In fact, the one-sample Kolmogorov-Smirnov test for the exponential distribution with rate $\hat{\lambda}$ applied to the residuals provided a p-value of 0.7057, not rejecting the null hypothesis at the usual significance levels. Thus, the exponential noise assumption seems to be reasonable for this dataset.

Figure 6.1: Original and denoised winning times of Boston Marathon Men's Open Division between 1953-2016. Denoising was performed by the proposed shrinkage rule with logistic prior under exponential noise model. (a) Empirical wavelet coefficients. (b) Differences between observed and denoised data (residuals).
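The goodness-of-fit check above can be reproduced with a numpy-only sketch of the one-sample Kolmogorov-Smirnov statistic; the residuals below are synthetic stand-ins for $y_i-\hat{f}(x_i)$, and note that estimating $\lambda$ from the same residuals makes the nominal KS p-value only approximate.

```python
import numpy as np

def ks_stat_exponential(residuals, lam):
    # One-sample Kolmogorov-Smirnov statistic of the residuals against
    # an Exp(lam) distribution: sup_x |F_n(x) - F(x)|.
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    cdf = 1.0 - np.exp(-lam * r)                   # exponential cdf at sorted points
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(42)
resid = rng.exponential(scale=2.0, size=200)       # synthetic stand-in residuals
lam_hat = 1.0 / resid.mean()                       # ML estimate of the exponential rate
ks = ks_stat_exponential(resid, lam_hat)           # small value => good exponential fit
```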

## 7 Final considerations

We proposed Bayesian wavelet shrinkage rules to estimate wavelet coefficients in nonparametric models with exponential and lognormal additive noise. The adopted priors for the wavelet coefficients were mixtures of a point mass function at zero with logistic and beta distributions. Under the standard Gaussian noise assumption, the distribution is preserved in the wavelet domain, i.e., the noises after the discrete wavelet transform application to the original data remain iid Gaussian, which allows a coefficient-by-coefficient estimation process. Under the positive noise model, this feature is lost: noises in the wavelet domain are not necessarily positive and are dependent. The main impact is that shrinkage is performed jointly on the empirical coefficients vector, which required the application of a robust adaptive MCMC algorithm to calculate posterior expectations, since these are the shrinkage rules under the quadratic loss assumption.

The performance of the proposed shrinkage rules in terms of averaged mean square error (AMSE) was better than that of standard shrinkage and thresholding techniques in most of the scenarios of the simulation studies. Although the rules are computationally more expensive than the classical methods, their performance in the simulation studies indicates them as promising shrinkage rules for denoising data contaminated with positive noise.

The behaviour of the shrinkage rules under other noise distributions with positive support and the impact of the choice of wavelet basis when performing the DWT are suggested as important questions for future work.

## Acknowledgements

This work was supported by a CAPES (Coordination of Superior Level Staff Improvement, Brazil) fellowship.

## References

•  Abramovich, F. and Benjamini, Y. (1996) Adaptive thresholding of wavelet coefficients. Comp. Stat. Data Anal., 22, 351–361.
•  Angelini, C. and Vidakovic, B. (2004) Gamma-minimax wavelet shrinkage: a robust incorporation of information about energy of a signal in denoising applications. Statistica Sinica, 14, 103–125.
•  Antoniadis, A., Leporini, D. and Pesquet, J.C. (2002) Wavelet thresholding for some classes of non-Gaussian noise. Statistica Neerlandica, 56, 4, 434–453.
•  Averkamp, R. and Houdré, C. (2003) Wavelet thresholding for non-necessarily Gaussian noise: idealism. The Annals of Statistics, 31, 1, 110–151.
•  Bhattacharya, A., Pati, D., Pillai, N. and Dunson, D. (2015) Dirichlet-Laplace priors for optimal shrinkage. J. Am. Statist. Ass., 110, 1479–1490.
•  Chaloner, K.M. and Duncan, G.T. (1983) Assessment of a beta prior distribution: PM elicitation. J. Royal Stat. Soc., Series D, 32, 174–180.
•  Chipman, H., Kolaczyk, E. and McCulloch, R. (1997) Adaptive Bayesian wavelet shrinkage. J. Am. Statist. Ass., 92, 1413–1421.
•  Daubechies, I. (1992) Ten Lectures on Wavelets. SIAM, Philadelphia.
•  Donoho, D. L. (1993a) Nonlinear wavelet methods of recovery for signals, densities, and spectra from indirect and noisy data. Proceedings of Symposia in Applied Mathematics, volume 47, American Mathematical Society, Providence: RI.
•  Donoho, D. L. (1993b) Unconditional bases are optimal bases for data compression and statistical estimation. Appl. Comput. Harmonic Anal., 1, 100–115.
•  Donoho, D. L. (1995a) De-noising by soft-thresholding. IEEE Trans. Inf. Th., 41, 613–627.
•  Donoho, D. L. (1995b) Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. App. Comp. Harm. Anal., 2, 101–126.
•  Donoho, D. L. and Johnstone, I. M. (1994a) Ideal denoising in an orthonormal basis chosen from a library of bases. Compt. Rend. Acad. Sci. Paris A, 319, 1317–1322.
•  Donoho, D. L. and Johnstone, I. M. (1994b) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
•  Donoho, D. L. and Johnstone, I. M. (1995) Adapting to unknown smoothness via wavelet shrinkage. J. Am. Statist. Ass., 90, 1200–1224.
•  Duran, B.S. and Booker, J.M. (1988) A Bayes sensitivity analysis when using the beta distribution as prior. IEEE Trans. on Reliability, 37, no. 2.
•  Jansen, M. (2001) Noise reduction by wavelet thresholding. Springer, New York.
•  Leporini, D. and Pesquet, J.C. (2001) Bayesian wavelet denoising: Besov priors and non-Gaussian noises. Signal Processing, 81, 55–67.
•  Mallat, S. G. (1998) A Wavelet Tour of Signal Processing. Academic Press, San Diego.
•  Nason, G. P. (1996) Wavelet Shrinkage Using Cross-validation. J. R. Statist. Soc. B, 58, 463–479.
•  Nason, G. P. (2008) Wavelet Methods in Statistics with R. Springer.
•  Neumann, M.H. and von Sachs, R. (1995) Wavelet Thresholding: Beyond the Gaussian I.I.D. Situation. Wavelets and Statistics. Springer.
•  Radnosrati, K., Hendeby, G. and Gustafsson, F. (2020) Exploring Positive Noise in Estimation Theory. IEEE Transactions on Signal Processing, 68.
•  Reményi, N. and Vidakovic, B. (2015) Wavelet shrinkage with double Weibull prior. Communications in Statistics: Simulation and Computation, 44, 1, 88–104.
•  Scheidegger, A. (2021) adaptMCMC: Implementation of a Generic Adaptive Monte Carlo Markov Chain. R package version 1.4.
•  Sousa, A.R.S. (2020) Bayesian wavelet shrinkage with logistic prior. Communications in Statistics: Simulation and Computation, DOI: 10.1080/03610918.2020.1747076.
•  Sousa, A.R.S., Garcia, N.L. and Vidakovic, B. (2020) Bayesian wavelet shrinkage with beta prior. Computational Statistics, 1–23.
•  Vidakovic, B. (1998) Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. J. Am. Statist. Ass., 93, 173–179.
•  Vidakovic, B. (1999) Statistical Modeling by Wavelets. Wiley, New York.
•  Vidakovic, B. and Ruggeri, F. (2001) BAMS Method: Theory and Simulations. Sankhya: The Indian Journal of Statistics, Series B, 63, 234–249.
•  Vihola, M. (2012) Robust adaptive Metropolis algorithm with coerced acceptance rate. Stat. Comput., 22, 997–1008.