Bayesian Wavelet Shrinkage with Beta Priors

07/15/2019 ∙ by Alex Rodrigo dos Santos Sousa, et al. ∙ 0

We present a Bayesian approach for wavelet shrinkage in the context of non-parametric curve estimation with the use of the beta distribution with symmetric support around zero as the prior distribution for the location parameter in the wavelet domain in models with additive Gaussian errors. Explicit formulas of shrinkage rules for particular cases are obtained, statistical properties such as bias, classical and Bayesian risk of the rules are analyzed and performance of the proposed rules is assessed in simulations studies involving standard test functions. Application to Spike Sorting real data set is provided.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Wavelet-based methods are widely applied in a range of fields, such as mathematics, signal and image processing, geophysics, and many others. In statistics, applications of wavelets arise mainly in the areas of non-parametric regression, density estimation, functional data analysis and stochastic processes. These methods basically utilize the possibility of representing functions that belong to certain functional spaces as expansions in wavelet bases, similar to others expansions such as splines or Fourier (among others) for example. However, wavelet expansions have characteristics that make them quite useful from the point of view of function representation: they are localized in both time and scale in an adaptive way. Their coefficients are typically sparse, they can be obtained by fast computational algorithms, and the magnitudes of coefficients are linked with the smoothness properties of the functions they represent. These properties enable time/frequency data analysis, bring computational advantages, and allow for statistical data modeling at different resolution scales.

Grossmann and Morlet (1984), Daubechies (1988, 1992) and Mallat (1998) are standard early references about wavelets in general and Ogden (1997), Vidakovic (1999), Percival and Walden (2000), Nason (2008), Antoniadis (1997), Morettin (1999) and Morettin et al (2017) are well-cited references about wavelet techniques in statistics.

Wavelet shrinkage methods are used to estimate the coefficients associated with the representation of the function in the wavelet domain by reducing the magnitude of the observed (empirical) coefficients that are obtained by the wavelet transform of the the original data. There are in fact several shrinkage techniques available in the literature. The main works in this area are of Donoho and Johnstone (1994, 1995, 1998), but also Donoho et al. (1995, 1996a, 1996b), Johnstone e Silverman (1997), Vidakovic (1998, 1999) and Antoniadis et al. (2002) can be cited. For more details of shrinkage methods, see Vidakovic (1999) and Jansen (2012).

It is well known that optimal rules in Bayesian models are shrinkage rules, except for pathological cases. Bayesian approach offers theoretical paradigm for statistical formalization of wavelet shrinkage. In addition to their controllable shrinkage properties, Bayesian methods have also been extensively studied for the possibility of incorporating, by means of a prior probabilistic distributions, prior information about the regression smoothnes, neighboring coefficients and other parameters to be estimated. Bayesian models in the wavelet domain have showed to be capable of incorporating prior information about the unknown regression function such as smoothness, periodicity, self-similarity, degree of “noisiness” (SNR), and, for some particular bases (e.g., Haar), monotonicity.

In this sense, the choice of the prior distribution in the statistical model describing wavelet coefficients is extremely important to achieve meaningful results.

Several priors models in the wavelet domain were proposed since 1990s, see Chipman et all (1997), Abramovich, Sapatinas and Silverman (1998), Abramovich and Sapatinas (1999), Vidakovic (1998), Vidakovic and Ruggeri (2001), Angelini and Vidakovic (2004), Johnstone and Silverman (2004, 2005), among others.

The standard statistical models in which shrinkage techniques are applied assume Gaussian additive errors. These models are important not only from their applicability to a range of different problems, but also from the mathematical point of view since the Gaussian additive errors remain both Gaussian and additive after the wavelet transformation.

In this paper we propose and explore a beta distribution symmetric around zero as a prior distribution for the location parameter in a Gaussian model on wavelet coefficients. As traditionally done in this kind of analysis the prior is in fact a distribution contaminated at 0. This added point mass at zero to the spread part of the prior facilitates thresholding. The flexibility of the beta distribution, as a spread part of the prior, is readily controlled by convenient choice of its parameters. Moreover, we show that there is an interesting relationship between the prior (hyper) parameters and the degree of wavelet shrinkage, which is useful in practical applications.

In this paper we aim to incorporate prior belief on the boundedness of the energy of the signal (the

-norm of the regression function). The prior information on the energy bound often exists in real life problems and it can be modeled by the assumption that the location parameter space is bounded, which in our context relates to estimating a bounded normal mean. Estimation of a bounded normal mean has been considered in Miyasawa (1953), Bickel (1981), Casella and Strawderman (1981), and Vidakovic and DasGupta (1996). In the context of Bayesian modeling, if the structure of the prior can be supported by the analysis of the empirical distribution of the wavelet coefficients, the precise elicitation of the prior hyperparameters cannot be done without some kind of guidance, especially in the Empirical Bayes spirit. Of course, when prior knowledge on the signal-to-noise ratio

is available, then any 0-contaminated symmetric and unimodal distribution supported on the bounded set, say

, could be a possible candidate for the prior. If the problem is rescaled so that the size of the noise (its variance) is 1, then

can be taken as

This paper is organized as follows: Section 2 defines the considered model in time and wavelet domain and the proposed prior distribution, Section 3 presents the shrinkage rule, shows statistical properties such as variance, bias and risks and explicit formulas of two particular cases of the shrinkage rule under beta prior and of an extension the shrinkage rule under triangular prior. Section 4 is dedicated to prior elicitation. To verify the strength of the proposed approach simulation studies are performed in Section 5, and the shrinkage rule is applied in a Spike Sorting real data set in Section 6. Section 7 provides conclusions.

2 The Model

2.1 The Symmetric Around Zero Beta Distribution

In Statistics, the beta distribution is extensively used to model variables in the domain. This distribution is extremely flexible in shape controlled by a convenient choices of its parameters. In our framework, it is convenient to use its version shifted and rescaled to the interval as well as choosing its parameters to keep it symmetric about 0. Therefore, we propose the use of beta distribution with symmetric support around zero as a prior distribution for the wavelet coefficients. Its density function is


where is the standard beta function, and are the parameters of the distribution, and is an indicator function equal to 1 for its argument in the interval and 0 else.

For , the density function (2.1) is unimodal around zero and as increases, the density becomes more concentrated around zero. This is an important feature for wavelet shrinkage methods, since high values of imply higher levels of shrinkage of the empirical coefficients, which results in sparse estimated coefficients. Figure 2.1 shows the beta density function (2.1) for some selected values of and .

(a) Beta density function for .
(b) Beta density function for .
Figure 2.1: Beta density function symmetric around zero for some values of and .

2.2 Beta Distribution as Prior for the Wavelet Coefficients

We start with the nonparametric regression problem of the form


where and ,

, are zero mean independent normal random variables with unknown variance

. In vector notation, we have


where , and . The goal is to estimate the unknown function

. After applying a discrete wavelet transform (DWT) on (2.3), given by the orthogonal matrix

, we obtain the following model, in the wavelet domain,


where , and .

Due to the independence of the random errors and the orthogonality of the transform, the model in the wavelet domain remains additive and the errors are i.i.d. normal.

Because the strong decorrelating property of wavelets we can study one coefficient at a time. Of course, wavelet coefficients are “almost decorrelated,” but never fully decorrelated, except for special cases, and strategies for involvement of neighboring coefficients in the shrinkage policy are common in the literature. However, these strategies increase the complexity of the shrinkage for very modest improvements, and still favored approach, especially among the practitioners, is to consider shrinkage coefficient-by-coefficient.

For the th component of the vector , we have a simple model


where is the empirical wavelet coefficient, is the coefficient to be estimated and is the normal random error with unknown variance . For simplicity of notation, we suppress the subindices of , and . Note that, according to the model (2.5), and then, the problem of estimating a function becomes a normal mean estimation problem in the wavelet domain for each coefficient.

To complete the Bayesian model, we propose the following prior distribution for ,


where , is the point mass function at zero and is the beta distribution (2.1) in . The proposed prior distribution has , and as hyperparameters and their choices are directly related to the degree of shrinkage of the empirical coefficients. It will be shown that as or (or both of them) increase, the degree of shinkage increases as well.

3 The Shrinkage Rule

The shrinkage rule for Bayesian estimation of the wavelet coefficient of model (2.5

) depends on the choice of location of the posterior (mean, mode, or median) and the loss function. Under square error loss function

, it is well known that the Bayes rule is the posterior expected value of , i.e, minimized Bayes risk. The Proposition 3.1 gives an expression of the shrinkage rule under a mixture prior consisting of a point mass at zero and a density function with support in .

Proposition 3.1.

If the prior distribution of is of the form , where is a density function with support in , then the shrinkage rule under the quadratic loss function is given by


where is the standard normal density function.


If is the likelihood function, we have that

Figure 3.1 presents some shrinkage rules for as beta distribution (2.1) with hyperparameters , and as well as their variances. It can be seen that the length of the interval in which the rule shrinks to zero increases as the hyperparameter increases, since high values of result in more concentrated beta distributions around zero. A typical feature of these rules is that as increases, gets closer to and as decreases, gets closer to , i.e, and . These asymptotic characteristics are reasonable since there is the assumption that the coefficients to be estimated belong to the range , so that empirical coefficients outside this range occur due to the presence of noise.

(a) Shrinkage rules
(b) Variances
Figure 3.1: Shrinkage rules and their variances under beta prior distribution with hyperparameters and .

Figure 3.2 shows the bias and classical risks respectively for the shrinkage rules under beta prior distribution with hyperparameters and showed in Figure 2.1. Observe that, as expected, the rules have smaller variances and biases for values of near zero, reaching minimum values in both graphs when the wavelet coefficient is zero. It is also noted that as hyperparameter increases, the bias of the estimator increases and the variance decreases. The classical risk decreases as tends to zero and that for high values of , the risk is larger for rules with large values of . These features are justified by the fact that the degree of shrinkage increases as the hyperparameter increases, so if the value of the wavelet coefficient is far from zero, such rules with larger values of tend to underestimate than rules with small values of .

(a) Bias
(b) Classical Risks
Figure 3.2: Bias and classical risks of the shrinkage rules under beta prior distribution with hyperparameters and .

Table 3.1 shows Bayes risks in terms of the hyperparameter of the shrinkage rules considered. As expected, Bayes risk decreases as increases.

a 0.5 1 2 5
r 0.226 0.188 0.137 0.008
Table 3.1: Bayes risks of the shrinkage rules under beta prior distribution with hyperparameters and .

3.1 Particular Cases

In this section we show two particular cases of the shrinkage rule under beta prior and an extension of the beta prior, the triangular prior. Angelini and Vidakovic (2004) studied in details the shrinkage rule under uniform prior, an interesting particular case of beta distribution with the hyperparameter . The proofs of Propositions 3.2, 3.3 and 3.4 use the Lemma 3.1, whose proofs are in the Appendix.

Lemma 3.1.

Let and

be the standard normal density and cumulative distribution functions respectively and

. Then,

3.1.1 Shrinkage Rule for

Proposition 3.2.

The shrinkage rule under prior distribution of the form , where is the beta distribution in with the hyperparameter is



, , , , , , and .


In the Proposition 3.1, we do as the beta distribution (2.1) with . Then,

where , , , and are the integrals of the numerator and denominator, respectively.

where , , . Similarly,

where , e .

Figure 3.3 shows the shrinkage rules (3.2), bias, variance and classical risk for the hyperparameters and for .

Figure 3.3: Shrinkage rules, bias, variances and classical risks of these rules under beta prior for and . Green color for , red color for and black color for .

As expected, the rule presents a higher degree of shrinkage as the hyperparameter increases. Similarly to the previous section results, the bias increases and the classical risk increases and the variance decreases as increases. Observe that both bias, variance and classical risk increase as moves away from zero and approaches .

Table 3.2 presents the Bayes risks of the rules (3.2) for the hyperparameters and for .

0.8 0.9 0.99
r 0.241 0.137 0.016
Table 3.2: Bayes risks of the shrinkage rules under beta prior with the hyperparameters and

The Bayes risk decreases as increases, since increasing implies a greater degree of shrinkage of empirical coefficients.

3.1.2 Shrinkage Rule for

Proposition 3.3.

The shrinkage rule under prior distribution of the form , where is the beta distribution on with the hyperparameter can be approximated by







In Proposition 3.1, we take as the beta distribution (2.1) with . Then,

where , e are the integrals of the numerator and denominator respectively. In fact, we approximate these integrals using second order Taylor series.

The following approximation of the integral in the numerator can be obtained around a value :


In the same way, the approximation of the integral in the denominator around is



Figure 3.4 shows the Bayes rule and the approximation given in Proposition 3.3 for the case and .

Figure 3.4: The shrinkage rule and the approximated rule (3.3) for , e .

3.1.3 Shrinkage Rule for Triangular Prior

We present the triangular prior distribution for the wavelet coefficients as an extension of the beta distribution. In fact, the triangular distribution in

is the convolution of two uniform distributions in


The triangular distribution in the interval symmetric around zero has density function given by


The following proposition provides an explicit formula for the shrinkage rule under triangular prior.

Proposition 3.4.

The shrinkage rule under prior distribution of the form , where is the triangular distribution over , is





Applying Proposition 3.1 we get

For , the integral in the denominator, we have that

We calculate e separately, i.e,

For , the integral in the numerator,

In this way, we obtain as

Finally, substituting the integrals and into (???), we have

4 Default Prior Hyperparameters

Methods and criteria for determination of the involved parameters and hyperparameters to estimate the coefficients are important in Bayesian procedures. In the framework of Bayesian shrinkage with beta prior, the choices of the parameter of the random error distribution and the hyperparameters , and of the beta prior distribution of the wavelet coefficient are required. We present the methods and criteria already available in the literature for such choices and used in simulation and application studies.

Based on the fact that much of the noise information present in the data can be obtained on the finest resolution scale, for the robust estimation, Donoho and Johnstone (1994) suggested robust estimator


Angelini et al. (2004) suggested the hyperparameters and be dependent on the level of resolution according to the expressions




where , is the primary resolution level and . They also suggested that in the absence of additional information, can be adopted.

Chaloner et al. (1983) proposed a method for choosing the hyperparameters of the beta distribution as a priori distribution for the probability of success in each Bernoulli assay of the binomial distribution. Duran and Booker (1988) propose the percentile method. For

and fixed, is chosen so that




Thus, the choice of is made by determining the probability of occurrence of a particular event . This procedure is interesting because of the greater facility of subjective determination of a

probability, that is, it is simpler to cognitively assign a probability to a certain event than to directly assign a value to the parameter of a probability distribution.

5 Simulation Studies

Simulation studies were done to evaluate the performance of the shrinkage rules under beta prior distribution for the particular cases in which the hyperparameter assumes the fixed values and triangular distribution and to compare them with the performances of some available shrinkage methods in the literature, namely, Universal Soft Thresholding (UNIV), False Discovery Rate (FDR), Cross Validation (CV), and Bayesian shrinkage rule under normal prior (BNormal). We also considered the shrinkage rule under Bickel prior, seggested by Angelini and Vidakovic (2004). They proved that the shrinkage rule under this prior is approximately -minimax for the class of all symmetric unimodal priors bounded on , . When increases, the weak limit of the least Bickel (1981) proved that the least favorable prior in is approximately (in sense of weak distributional limits) . Applying this result in our context, we have that the Bickel shrinkage is induced by prior

The corresponding Bayes rule does not have a simple analytical form and needs to be numerically computed. The hyperparameters and were selected according to the proposals described in Section 4.

To perform the simulation, the rules were applied in the Donoho-Johnstone (DJ) test functions. These functions, shown in Figure 5.1, are widely used in the literature for comparison of wavelet-based methods. These are four functions, called Bumps, Blocks, Doppler and Heavisine, which represent some characteristics of curves in real problems.

Figure 5.1: Donoho-Johnstone (DJ) test functions.

For each test function, three sample sizes were selected, and , and for each point, a normal error with zero mean and variance was added, where was selected according to three signal to noise ratio (SNR), 3, 5 and 7.

We used the mean squared error (MSE), as performance measure of the shrinkage rules. For each function, the process was repeated times and a comparison measure, the average of the obtained MSEs, , was calculated for each rule as shown in Tables 5.1 and 5.2. Boxplots of the MSEs of the shrinkage rules for and SNR=5 ares shown in Figure 5.2.

Signal n Method SNR=3 SNR=5 SNR=7 Signal n Method SNR=3 SNR=5 SNR=7
Bumps 512 UNIV 11.073 5.168 3.035 Blocks 512 UNIV 6.898 3.667 2.258
CV 11.482 9.441 6.329 CV 2.546 1.248 0.840
FDR 9.300 4.369 2.642 FDR 5.855 2.913 1.742
BNormal 7.723 5.169 4.429 BNormal 2.178 1.040 0.662
3.165 1.307 0.730 3.230 1.432 0.934
3.005 1.236 0.699 3.154 1.327 0.769
2.887 1.189 0.670 2.865 1.306 0.870
2.827 1.158 0.657 2.713 1.187 0.669
Triang 2.840 1.160 0.658 Triang 2.763 1.273 0.815
Bickel 2.827 1.157 0.656 Bickel 2.765 1.222 1.656
1024 UNIV 7.571 3.570 2.133 1024 UNIV 4.851 2.484 1.548
CV 2.934 1.933 1.727 CV 1.789 0.840 0.536
FDR 5.593 2.536 1.474 FDR 3.899 1.882 1.129
BNormal 4.514 1.172 0.684 BNormal 1.616 0.679 0.400
1.990 0.794 0.423 1.794 0.750 0.483
1.900 0.755 0.404 1.687 0.711 0.437
1.834 0.726 0.387 1.623 0.830 0.403
1.799 0.724 0.448 1.571 0.649 0.379
Triang 1.798 0.711 0.384 Triang 1.568 0.832 0.404
Bickel 1.800 0.712 0.380 Bickel 1.571 1.641 0.383
2048 UNIV 5.056 2.349 1.394 2048 UNIV 3.412 1.768 1.104
CV 1.609 0.738 0.482 CV 1.298 0.588 0.354
FDR 3.567 1.587 0.918 FDR 2.682 1.286 0.767
Bnormal 1.285 0.489 0.287 Bnormal 1.175 0.443 0.242
1.333 0.579 0.356 1.331 0.755 2.257
1.276 0.557 0.325 1.283 0.652 1.916
1.233 0.539 0.353 1.445 0.617 1.656
1.220 0.755 1.101 1.182 0.589 2.463
Triang 1.215 0.523 0.324 Triang 1.424 0.626 1.579
Bickel 1.213 0.520 0.311 Bickel 1.560 0.632 1.825
Table 5.1: AMSE of the shrinkage/thresholding rules in the simulation study for Bumps and Blocks DJ test functions.
Signal n Method SNR=3 SNR=5 SNR=7 Signal n Method SNR=3 SNR=5 SNR=7
Doppler 512 UNIV 2.663 1.406 0.885 Heavisine 512 UNIV 0.569 0.403 0.302
CV 1.285 0.642 0.446 CV 0.508 0.278 0.175
FDR 2.543 1.255 0.759 FDR 0.597 0.435 0.309
BNormal 1.170 0.528 0.316 BNormal 0.475 0.244 0.165
1.161 0.575 0.319 0.734 0.542 0.458
1.124 0.558 0.311 0.786 0.543 0.445
1.103 0.530 0.301 0.831 0.547 0.442
1.108 0.515 0.289 0.971 0.670 0.726
Triang 1.089 0.523 0.296 Triang 0.839 0.566 0.439
Bickel 1.094 0.518 0.287