Bayesian Lasso: Concentration and MCMC Diagnosis

02/22/2018 · Daoud Ounaissi et al., University Lille 1: Sciences and Technologies

Using the posterior distribution of the Bayesian LASSO, we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the l1 and l2 norms and exhibits three regimes. We derive the concentration of the Bayesian LASSO and present an MCMC convergence diagnosis.

Keywords: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function


1 Introduction

Let $m$ and $n$ be two positive integers, and let $A$ be an $m \times n$ matrix with real entries. The Bayesian LASSO

(1)  $f(x \mid y) = \dfrac{1}{Z(y)} \exp\Big(-\tfrac{1}{2}\|y - Ax\|^2 - \|x\|_1\Big), \qquad x \in \mathbb{R}^n,$

is the posterior distribution typically used in the linear regression model $y = Ax + w$. Here

(2)  $Z(y) = \displaystyle\int_{\mathbb{R}^n} \exp\Big(-\tfrac{1}{2}\|y - Ax\|^2 - \|x\|_1\Big)\, dx$

is the partition function, and $\|\cdot\|$ and $\|\cdot\|_1$ are respectively the Euclidean and the $l_1$ norms. The vector $y \in \mathbb{R}^m$ contains the observations, $x \in \mathbb{R}^n$ is the unknown signal to recover, $w$ is a standard Gaussian noise, and $A$ is a known matrix which maps the signal domain $\mathbb{R}^n$ into the observation domain $\mathbb{R}^m$. If we suppose that $x$ is drawn from the Laplace distribution, i.e. the distribution proportional to

(3)  $\exp(-\|x\|_1),$

then the posterior distribution of $x$ given $y$ is (1). The mode

(4)  $\operatorname{lasso}(y) = \arg\min_{x \in \mathbb{R}^n} \Big\{\tfrac{1}{2}\|y - Ax\|^2 + \|x\|_1\Big\}$

of $f(\cdot \mid y)$ was first introduced in Tibshirani1996 and called LASSO. It is also called the Basis Pursuit De-Noising method Chen. In this work we adopt the term LASSO and keep it for the rest of the article.

In general the LASSO is not a singleton, i.e. the mode of the distribution is not unique. In this case the LASSO is a set and we will denote by lasso any element of this set. A large number of theoretical results have been provided for the LASSO; see Daubechies2004, DDN, Fort, Mendoza, Pereyra and the references therein. The most popular algorithms for finding the LASSO are the LARS algorithm Efron and the ISTA and FISTA algorithms; see e.g. Beck and the review article Parikh.
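To make the algorithmic discussion concrete, here is a minimal ISTA sketch for the objective (4) as written above, with the penalty weight equal to one; the toy matrix, observation and iteration count are illustrative choices rather than values from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinatewise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + ||x||_1 by ISTA (illustrative sketch)."""
    # Step size 1/L with L the Lipschitz constant of the smooth part.
    L = np.linalg.norm(A, 2) ** 2
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step)
    return x

# Toy example with a random 5 x 10 design matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
y = rng.standard_normal(5)
print(ista_lasso(A, y))
```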

The aim of this work is to study the geometry of the Bayesian LASSO and to derive an MCMC convergence diagnosis.

2 Polar integration

Using polar coordinates $x = rs$, with $r > 0$ and $s$ on the unit sphere, the partition function (2) becomes

(5)  $Z(y) = \displaystyle\int_{S} \int_0^{+\infty} \exp\Big(-\tfrac{1}{2}\|y - rAs\|^2 - r\|s\|_1\Big)\, r^{n-1}\, dr\, \sigma(ds),$

where $\|\cdot\|_{\ast}$ denotes one of the $l_1$ or $l_2$ norms in $\mathbb{R}^n$, $\sigma$ denotes the surface measure on the unit sphere $S$ of the norm $\|\cdot\|_{\ast}$, and the inner integral equals $e^{-\|y\|^2/2}\, J\big(a(s), b(s)\big)$ with

(6)  $J(a, b) = \displaystyle\int_0^{+\infty} \exp\Big(-\tfrac{a^2 r^2}{2} - b r\Big)\, r^{n-1}\, dr,$

where

$a(s) = \|As\|, \qquad b(s) = \|s\|_1 - \langle As, y\rangle.$

We express the integral (6) using the parabolic cylinder function. We also give a concentration inequality and a geometric interpretation of the partition function.
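As a sanity check on such formulas, the partition function (2) can also be estimated numerically in low dimension. The sketch below uses the identity $Z(y) = 2^n\, \mathbb{E}\big[\exp(-\tfrac12\|y - AX\|^2)\big]$ for $X$ with i.i.d. standard Laplace coordinates; the dimensions, the matrix and the sample size are placeholders.

```python
import numpy as np

def partition_function_mc(A, y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of Z(y) = integral of exp(-||y - A x||^2 / 2 - ||x||_1) dx.

    Uses the identity Z(y) = 2^n * E[exp(-||y - A X||^2 / 2)]
    with X having i.i.d. standard Laplace coordinates (density exp(-|t|)/2).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.laplace(loc=0.0, scale=1.0, size=(n_samples, n))
    residuals = X @ A.T - y               # shape (n_samples, m)
    weights = np.exp(-0.5 * np.sum(residuals**2, axis=1))
    return (2.0 ** n) * weights.mean()

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
y = rng.standard_normal(2)
print(partition_function_mc(A, y))
```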

3 Parabolic cylinder function and partition function

We extend the function to

(7)

This extension is homogeneous of order .

If , then , and moreover if , then

If , then we will express using the parabolic cylinder function. We recall that for the parabolic cylinder function is given by

(8)

when Temme . We also recall the integral representation of Erdélyi Erdélyi for the parabolic cylinder function

where is the function.
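The parabolic cylinder function can be evaluated numerically, so the representation above is easy to check. The sketch below compares scipy's $D_{-p}(z)$ with a direct quadrature of the right-hand side; the test values of $p$ and $z$ are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, pbdv

def D(nu, z):
    """Parabolic cylinder function D_nu(z) via scipy (pbdv returns (value, derivative))."""
    return pbdv(nu, z)[0]

def D_via_integral(p, z):
    """Erdelyi's representation: D_{-p}(z) = e^{-z^2/4}/Gamma(p) * int_0^inf e^{-zt - t^2/2} t^{p-1} dt."""
    integrand = lambda t: np.exp(-z * t - 0.5 * t**2) * t ** (p - 1)
    val, _ = quad(integrand, 0.0, np.inf)
    return np.exp(-z**2 / 4.0) / gamma(p) * val

p, z = 3.0, 1.7   # arbitrary test values with p > 0
print(D(-p, z), D_via_integral(p, z))   # the two numbers should agree
```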

Proposition 3.1

The variable

(9)

will play an important role. It depends only on and the function is bounded below by

Now we can state the following result.

Proposition 3.2

We have for

If , then and

Corollary 3.3

If then is bounded below by , where is the norm of the operator . The partition function

is convex and decreasing.

Proof 3.4

It suffices to remark that

4 Geometric interpretation of the partition function

First we represent for in the form

(10)

The function is log-concave and integrable on . Observe that is a norm on the null space of . A general result Ball tells us that

is a quasi-norm on . The unit ball of is defined by

Its contour is equal to

We summarize our results in the following proposition.

Proposition 4.1

1) For each , the longest segment contained in is obtained for the solution of the equation

2) The ball

and its contour is equal to

3) The volume is
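For a concrete view of Ball's construction, the sketch below evaluates the quasi-norm in the form I assume here, namely $x \mapsto \big(\int_0^{+\infty} f(rx)\, r^{n-1}\, dr\big)^{-1/n}$, for a toy two-dimensional log-concave density of the form (1), and checks positive homogeneity; the matrix and the test point are illustrative.

```python
import numpy as np
from scipy.integrate import quad

# Toy two-dimensional log-concave density of the same form as (1), unnormalized.
A = np.array([[1.0, 0.5]])           # a 1 x 2 matrix, so the null space is one-dimensional
y = np.array([0.0])

def f(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * np.sum((y - A @ x) ** 2) - np.sum(np.abs(x)))

def ball_quasi_norm(x, n=2):
    """Ball's construction (assumed convention): ||x|| = ( int_0^inf f(r x) r^{n-1} dr )^{-1/n}."""
    integral, _ = quad(lambda r: f(r * np.asarray(x)) * r ** (n - 1), 0.0, np.inf)
    return integral ** (-1.0 / n)

x = np.array([0.3, -0.8])
# Positive homogeneity: the two printed values should match.
print(ball_quasi_norm(x), ball_quasi_norm(2 * x) / 2)
```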

5 Necessary and sufficient condition to have lasso equal zero

Now we can give the necessary and sufficient condition to have lasso equal to zero.

Proposition 5.1

The following assertions are equivalent.

1) .

2) for all .

3)
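For the objective (4) with penalty weight one, the standard optimality (KKT) condition states that the zero vector is a lasso solution exactly when every component of $A^{T}y$ lies in $[-1, 1]$; the check below uses this characterization, stated here for the normalization assumed above.

```python
import numpy as np

def zero_is_lasso(A, y, lam=1.0):
    """KKT test: 0 minimizes 0.5*||y - A x||^2 + lam*||x||_1  iff  ||A^T y||_inf <= lam."""
    return np.max(np.abs(A.T @ y)) <= lam

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
print(zero_is_lasso(A, 0.05 * rng.standard_normal(4)))   # small y: expected True
print(zero_is_lasso(A, 10.0 * rng.standard_normal(4)))   # large y: expected False
```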

6 Concentration around the lasso

6.1 The case lasso null

The polar coordinate formula tells us that we can draw a vector from by drawing its angle uniformly and then simulating its distance to the origin from

(11)
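A minimal sketch of this two-step sampler, assuming the radial density in (11) is proportional to $\exp(-\tfrac12\|y - rAs\|^2 - r\|s\|_1)\, r^{n-1}$ along a fixed direction $s$ (the radial factor of (1)); the direction is drawn uniformly on the Euclidean sphere and the radius by inverse-CDF on a grid. The problem instance is a placeholder.

```python
import numpy as np

def sample_direction(n, rng):
    """Uniform direction on the Euclidean unit sphere."""
    s = rng.standard_normal(n)
    return s / np.linalg.norm(s)

def sample_radius(A, y, s, rng, r_max=20.0, grid=4000):
    """Sample r from the density prop. to exp(-0.5*||y - r A s||^2 - r*||s||_1) * r^(n-1)
    (assumed form of (11)) by inverse-CDF on a grid."""
    n = A.shape[1]
    r = np.linspace(1e-9, r_max, grid)
    log_dens = (-0.5 * np.sum((y[None, :] - np.outer(r, A @ s)) ** 2, axis=1)
                - r * np.sum(np.abs(s)) + (n - 1) * np.log(r))
    dens = np.exp(log_dens - log_dens.max())
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return np.interp(rng.random(), cdf, r)

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
y = np.zeros(2)
s = sample_direction(3, rng)
print(s, sample_radius(A, y, s, rng))
```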

Now let’s estimate for

the probability

where denotes the Lebesgue measure of . We introduce for each pair , the function

(12)

In the following

The function is increasing (because ). The function f is convex and attains its minimum at the point that solves the equation

The positive root is given by

(13)

On the one hand

On the other hand, by using the convexity of , we have for all ,

because . We deduce for ,

where is the upper incomplete gamma function. Finally we get the following result.

Proposition 6.1

We have for all ,

(14)

Using the following estimate Natalini

we get for ,

Therefore the quantity

In summary: if is drawn from the density , then with a probability at least equal to .
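The upper incomplete gamma function appearing in this bound is straightforward to evaluate numerically; the sketch below defines $\Gamma(a, x)$ through scipy's regularized version and tabulates the regularized tail $\Gamma(n, t)/\Gamma(n)$ for a few illustrative values (it does not reproduce the exact argument of the bound).

```python
from scipy.special import gamma, gammaincc

def upper_incomplete_gamma(a, x):
    """Gamma(a, x) = int_x^inf t^{a-1} e^{-t} dt, via the regularized Q(a, x) = Gamma(a, x)/Gamma(a)."""
    return gammaincc(a, x) * gamma(a)

n = 5
for t in [1.0, 5.0, 10.0, 20.0]:
    # Regularized tail Gamma(n, t)/Gamma(n) and the unregularized value.
    print(t, gammaincc(n, t), upper_incomplete_gamma(n, t))
```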

In Figure 1 we plot for , , and the density for a fixed value of .

Figure 1: for .

We notice that the mode is very close to the value of (13) for the same fixed .

6.2 The general case

We take the vector . We will study the concentration of around . The variable of interest is . The law of has density

The change of variables formula gives for each norm

By definition for any vector , the convex function reaches its minimum at the point . Therefore is increasing.

The function

(15)

is strictly convex. Its critical point is the solution of the equation

By a proof similar to that of Proposition 6.1 we have the following result:

Proposition 6.2

If is drawn from the density , and , then for all ,

(16)

7 Applications

7.1 The contour in the case ,

Let be a matrix of order . Its null space is . We have that contains

This intersection is a symmetric segment denoted .

To determine the other points of the set , we will directly calculate . A simple calculation gives

and

Finally we have the following proposition.

Proposition 7.1

1) If , then

where is the distribution function of the normal law.

2) If and , then

3) If , and , then the function

defined on is convex and decreasing, where .

4) We have for

the ball

is contained in the unit disk for the norm . The contour is defined by the equation

The norm of the linear operator is defined by

the function is constant on

If then

If with , then

In both cases,

is part of the contour. The other points of the contour are deduced from the equation

Each pair generates four points of of the form where

We plot in Figure 2 the contour of for different choices of the matrix . We notice that the area of is a decreasing function of the norm of the matrix .

Figure 2: Contours of for different matrices , and .
Remark 7.2

The numerics show that explodes for large values of , which means that is close to the null space of . To eliminate that explosion we need to estimate the tail of the Gaussian density. Using the Gordon estimate Gordon

we have the following approximation

(17)
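Gordon's classical bound on the Gaussian tail (Mills' ratio) states that $\frac{x}{x^2+1}\varphi(x) \le 1 - \Phi(x) \le \frac{\varphi(x)}{x}$ for $x > 0$; the sketch below checks it numerically, as an illustration of the kind of tail estimate invoked here.

```python
from scipy.stats import norm

def gordon_bounds(x):
    """Gordon's bounds on the Gaussian tail 1 - Phi(x) for x > 0."""
    phi = norm.pdf(x)
    lower = x / (x**2 + 1.0) * phi
    upper = phi / x
    return lower, norm.sf(x), upper

for x in [1.0, 2.0, 4.0, 8.0]:
    lo, tail, hi = gordon_bounds(x)
    print(f"x={x}: {lo:.3e} <= {tail:.3e} <= {hi:.3e}")
```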

8 MCMC diagnosis

Here we take , , and for simplicity we consider . We sample from the distribution (1) using the Hastings-Metropolis algorithm and propose the test as a criterion for convergence. Here . We recall that if is drawn from the target distribution , then with probability at least equal to . Table 1 gives the values of the probability . Note that for the criterion is satisfied with a large probability.

Value        2       2.5     3       3.5     4       4.5     5
Probability  0.6672  0.9446  0.9924  0.9991  0.9999  1.0000  1.0000
Table 1: Values of the probability for .

8.1 Independent sampler (IS)

The proposal distribution

The ratio

It is known that the MCMC with the target distribution and the proposal distribution is uniformly ergodic Mengersen:

Here and then . Figure 3(a) shows respectively the plots of and .
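For illustration, here is a minimal Metropolis-Hastings sketch of the independent sampler targeting (1) with an i.i.d. standard Laplace proposal; the problem instance, the chain length and the monitored statistic $\|x\|_2$ are illustrative stand-ins for the paper's exact setup and criterion.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 7
A = rng.standard_normal((m, n))
y = A @ np.concatenate([np.ones(2), np.zeros(n - 2)]) + rng.standard_normal(m)

def log_target(x):
    """Log of the unnormalized Bayesian LASSO posterior (1)."""
    return -0.5 * np.sum((y - A @ x) ** 2) - np.sum(np.abs(x))

def independent_sampler(n_iter=20_000):
    """Metropolis-Hastings with an i.i.d. standard Laplace proposal (independent sampler)."""
    x = np.zeros(n)
    radii = np.empty(n_iter)
    for k in range(n_iter):
        prop = rng.laplace(size=n)
        # Log acceptance ratio; the Laplace proposal log-density cancels against the prior term.
        log_alpha = (log_target(prop) + np.sum(np.abs(prop))) - (log_target(x) + np.sum(np.abs(x)))
        if np.log(rng.random()) < log_alpha:
            x = prop
        radii[k] = np.linalg.norm(x)   # statistic monitored as a convergence diagnostic (illustrative)
    return radii

print(independent_sampler()[-5:])
```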

8.2 Random-walk (RW) Metropolis algorithm

We do not know if the target distribution satisfies the curvature condition in Roberts, Section 6. Here we propose to analyse the convergence of the random-walk Metropolis algorithm using the criterion . Figure 3(b) shows respectively the plots of and .
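A matching random-walk Metropolis sketch, with Gaussian increments of variance 0.5 as in the conclusion; again the problem instance and the monitored statistic are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 7
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

def log_target(x):
    """Log of the unnormalized Bayesian LASSO posterior (1)."""
    return -0.5 * np.sum((y - A @ x) ** 2) - np.sum(np.abs(x))

def random_walk_metropolis(n_iter=20_000, step_var=0.5):
    """Random-walk Metropolis: symmetric Gaussian increments of variance step_var."""
    x = np.zeros(n)
    radii = np.empty(n_iter)
    for k in range(n_iter):
        prop = x + np.sqrt(step_var) * rng.standard_normal(n)
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        radii[k] = np.linalg.norm(x)   # same monitored statistic as for the independent sampler
    return radii

print(random_walk_metropolis()[-5:])
```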

Figure 3 shows that, contrary to the independent sampler algorithm, the random walk (RW) algorithm satisfies the criterion early. More precisely:

  • The independent sampler (IS) algorithm begins to satisfy the criterion at iteration .

  • The RW algorithm begins to satisfy the criterion at iteration, but the IS algorithm never satisfies the criterion .

We finally compare the IS and RW algorithms using the fact that . The best algorithm will furnish the best approximation of the integral . Table 2 gives the estimators and . It follows that and . We conclude that the random walk algorithm wins for both criteria against the independent sampler algorithm.

-0.0005 -0.0037 0.0016 0.0164 0.0050 0.0021 -0.0058
0.0005 -0.0019 -0.0002 0.0012 -0.0005 0.0031 -0.0011
Table 2: and estimators using iterations.
Figure 3: (a) Test of convergence of the MCMC algorithm with the Laplace (independent sampler) proposal distribution. (b) Test of convergence of the MCMC algorithm with the random-walk proposal distribution. iterations.

9 Conclusion

We studied the geometry of the Bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived an MCMC convergence diagnosis for the Hastings-Metropolis algorithm. We showed that the random-walk MCMC with variance 0.5 wins against the independent sampler with a Laplace proposal distribution.

References

  • (1) K. Ball, Logarithmically concave functions and sections of convex sets in $\mathbb{R}^n$, Studia Math. 88 (1988) 69-84.
  • (2) A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (1) (2009) 183-202.
  • (3) E.T. Copson, Asymptotic Expansions, Cambridge University Press, 1965.
  • (4) S. Chen, D. L. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1) (1998) 33-61.
  • (5) I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (11) (2004) 1413–1457.
  • (6) A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis-Hastings and simulated annealing algorithms around LASSO estimator, Math. Comput. Simulation (2015).
  • (7) A. Erdélyi, Higher Transcendental Functions, California Institute of Technology, Bateman Manuscript Project, vol. 2, p. 119 (1953).
  • (8) B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 32 (2) (2004) 407-499.
  • (9) G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for bayesian variable selection, arXiv:1312.5658v3 [math.ST] (2015).
  • (10) R.D. Gordon, Values of Mills' ratio of area to bounding ordinate of the normal probability integral for large values of the argument, Ann. Math. Statist. 12 (1941) 364-366.
  • (11) B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geom. Dedicata 112 (1) (2005) 169-182.
  • (12) M. Mendoza, A. Allegra, T.P. Coleman, Bayesian Lasso Posterior Sampling Via Parallelized Measure Transport, arXiv:1801.02106v1 [stat.CO] (2018).
  • (13) K. Mengersen, R.L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist. 24 (1) (1996) 101-121.
  • (14) N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (3) (2014) 123-231.
  • (15) M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. (2015) 1-16.

  • (16) N.M. Temme, Numerical and asymptotic aspects of parabolic cylinder functions, J. Comput. Appl. Math. 121 (2000) 221-246.
  • (17) G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996) 95-110.

  • (18) R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol. 58 (1) (1996) 267-288.
  • (19) R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (2013) 1456-1490.