1 Introduction
Let $n$ and $p$ be two positive integers, and let $A$ be an $n \times p$ matrix with real entries. The Bayesian LASSO
(1) $\pi(x \,|\, y) = \frac{1}{Z} \exp\big(-\tfrac12\|y - Ax\|_2^2 - \lambda\|x\|_1\big), \quad x \in \mathbb{R}^p,$
is a posterior distribution typically used in linear regression. Here
(2) $Z = \int_{\mathbb{R}^p} \exp\big(-\tfrac12\|y - Ax\|_2^2 - \lambda\|x\|_1\big)\,dx$
is the partition function, and $\|\cdot\|_2$, $\|\cdot\|_1$ are respectively the Euclidean and the $\ell_1$ norms. The vector $y \in \mathbb{R}^n$ collects the observations, $x \in \mathbb{R}^p$ is the unknown signal to recover, the noise is standard Gaussian, and $A$ is a known matrix which maps the signal domain $\mathbb{R}^p$ into the observation domain $\mathbb{R}^n$. If we suppose that $x$ is drawn from the Laplace distribution, i.e. the distribution proportional to
(3) $\exp(-\lambda\|x\|_1),$
then the posterior of $x$ given $y$ is the distribution (1). The mode
(4) $\mathrm{lasso}(y) = \arg\min_{x \in \mathbb{R}^p}\big\{\tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1\big\}$
of $\pi(\cdot\,|\,y)$ was first introduced in Tibshirani1996 and called LASSO. It is also called the Basis Pursuit DeNoising method Chen . In this work we adopt the term LASSO and keep it for the rest of the article.
In general the LASSO is not a singleton, i.e. the mode of the distribution is not unique. In this case LASSO is a set, and we denote by lasso any element of this set. A large number of theoretical results have been provided for LASSO; see Daubechies2004 , DDN , Fort , Mendoza , Pereyra and the references therein. The most popular algorithms for finding LASSO are the LARS algorithm Efron , and the ISTA and FISTA algorithms; see e.g. Beck and the review article Parikh .
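As a concrete complement to the algorithms cited above, here is a minimal sketch of ISTA (iterative soft-thresholding) for computing a lasso point; the matrix $A$, observation $y$ and parameter $\lambda$ below are illustrative choices of ours, not data from the article.

```python
import numpy as np

def ista(A, y, lam, n_iter=5000):
    """Iterative soft-thresholding for min_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic part
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

# illustrative data (not from the article)
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.1, 0.1]])
y = np.array([1.0, -0.5, 0.2])
lam = 0.1
x_hat = ista(A, y, lam)
```

For $\lambda$ larger than $\|A^\top y\|_\infty$ the iteration stays at $0$, in agreement with the zero-lasso condition discussed in Section 5.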
The aim of this work is to study the geometry of the Bayesian LASSO and to derive an MCMC convergence diagnosis.
2 Polar integration
Using polar coordinates $x = r\theta$, $r > 0$, $\|\theta\| = 1$, the partition function (2) becomes
(5) $Z = \int_{S} J(\theta)\,\sigma(d\theta),$
where $\|\cdot\|$ denotes one of the $\ell_1$ or $\ell_2$ norms on $\mathbb{R}^p$, $\sigma$ denotes the surface measure on the unit sphere $S$ of the norm $\|\cdot\|$, and
(6) $J(\theta) = \int_0^\infty r^{p-1} \exp(-f(r\theta))\,dr,$
where
$f(r\theta) = \tfrac{r^2}{2}\|A\theta\|_2^2 + r\big(\lambda\|\theta\|_1 - \langle y, A\theta\rangle\big) + \tfrac12\|y\|_2^2.$
We express the partition function (6) using the parabolic cylinder function. We also give a concentration inequality and a geometric interpretation of the partition function $Z$.
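As a numerical sanity check of this polar representation (in dimension $p = 2$, with the Euclidean norm, and with illustrative $A$, $y$, $\lambda$ of our choosing), one can compare the direct integral defining $Z$ with its polar form:

```python
import numpy as np
from scipy import integrate

# illustrative problem data (not from the article)
A = np.array([[1.0, 0.5], [0.0, 1.0]])
y = np.array([0.3, -0.2])
lam = 1.0

def f(x):
    """f(x) = 0.5*||y - A x||_2^2 + lam*||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# direct computation of Z over R^2 (truncated to a large box)
Z_direct, _ = integrate.dblquad(
    lambda x2, x1: np.exp(-f(np.array([x1, x2]))), -10, 10, -10, 10)

# polar computation: Z = int_S int_0^inf r^{p-1} exp(-f(r theta)) dr dsigma(theta)
def J(phi):
    theta = np.array([np.cos(phi), np.sin(phi)])
    val, _ = integrate.quad(lambda r: r * np.exp(-f(r * theta)), 0, np.inf)
    return val

Z_polar, _ = integrate.quad(J, 0, 2 * np.pi, limit=200)
```

The two quadratures agree up to the truncation of the box, which supports the polar formula in the Euclidean case.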
3 Parabolic cylinder function and partition function
We extend the function $J$ to
(7) $J(a,b) = \int_0^\infty r^{p-1} \exp\big(-\tfrac{a}{2}r^2 - b r\big)\,dr, \qquad a \ge 0,\ b \in \mathbb{R},$
where $a = \|A\theta\|_2^2$ and $b = \lambda\|\theta\|_1 - \langle y, A\theta\rangle$, so that $J(\theta) = e^{-\|y\|_2^2/2}\,J(a,b)$.
This extension is homogeneous of order $-p$, in the sense that $J(t^2 a, t b) = t^{-p} J(a,b)$ for all $t > 0$.
If $a = 0$ and $b > 0$, then $J(0,b) = \Gamma(p)\,b^{-p}$, and moreover if $b = 0$ and $a > 0$, then $J(a,0) = 2^{p/2-1}\,\Gamma(p/2)\,a^{-p/2}$.
If $a > 0$, then we will express $J$ using the parabolic cylinder function. We recall the integral representation of Erdélyi Erdélyi for the parabolic cylinder function: for $\nu > 0$,
$D_{-\nu}(z) = \frac{e^{-z^2/4}}{\Gamma(\nu)} \int_0^\infty t^{\nu-1} e^{-zt - t^2/2}\,dt,$
where $\Gamma$ is the gamma function (see also Temme ). The change of variable $t = \sqrt{a}\,r$ in (7) then gives
(8) $J(a,b) = \Gamma(p)\,a^{-p/2}\, e^{b^2/(4a)}\, D_{-p}\big(b/\sqrt{a}\big).$
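This integral representation can be checked numerically against `scipy.special.pbdv`, which evaluates $D_\nu$; the test points $(\nu, z)$ below are arbitrary:

```python
import numpy as np
from scipy import integrate, special

def D_minus(nu, z):
    """Erdelyi's integral representation of D_{-nu}(z), valid for nu > 0."""
    val, _ = integrate.quad(
        lambda t: t ** (nu - 1) * np.exp(-z * t - t ** 2 / 2), 0, np.inf)
    return np.exp(-z ** 2 / 4) / special.gamma(nu) * val

# compare with scipy's parabolic cylinder function D_v at a few arbitrary points
checks = [(nu, z, D_minus(nu, z), special.pbdv(-nu, z)[0])
          for nu, z in [(1.0, 0.5), (2.5, -0.3), (3.0, 2.0)]]
```

`special.pbdv(v, z)` returns the pair $(D_v(z), D_v'(z))$; only the first component is used here.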
Proposition 3.1
The variable
(9) $m(\theta) = \frac{b}{\sqrt{a}} = \frac{\lambda\|\theta\|_1 - \langle y, A\theta\rangle}{\|A\theta\|_2}$
will play an important role. It depends only on $\theta$, $y$ and $\lambda$, and the function $\theta \mapsto m(\theta)$ is bounded below by $-\|y\|_2$, because $\langle y, A\theta\rangle \le \|y\|_2\,\|A\theta\|_2$.
Now we can state the following result.
Proposition 3.2
We have, for $\|A\theta\|_2 > 0$,
$J(\theta) = e^{-\|y\|_2^2/2}\,\Gamma(p)\,\|A\theta\|_2^{-p}\, e^{m(\theta)^2/4}\, D_{-p}\big(m(\theta)\big).$
If $A\theta = 0$, then $f(r\theta) = \tfrac12\|y\|_2^2 + \lambda r\|\theta\|_1$ and $J(\theta) = e^{-\|y\|_2^2/2}\,\Gamma(p)\,(\lambda\|\theta\|_1)^{-p}$.
Corollary 3.3
If $\mathrm{lasso}(y) = 0$, then $m(\theta)$ is bounded below by $\big(\lambda\|\theta\|_1 - \langle y, A\theta\rangle\big)/\|A\| \ge 0$, where $\|A\|$ is the norm of the operator $A$. The partition function
$\lambda \mapsto Z(\lambda)$
is convex and decreasing.
Proof 3.4
It suffices to remark that
$\frac{\partial Z}{\partial \lambda} = -\int_{\mathbb{R}^p} \|x\|_1\, e^{-f(x)}\,dx < 0, \qquad \frac{\partial^2 Z}{\partial \lambda^2} = \int_{\mathbb{R}^p} \|x\|_1^2\, e^{-f(x)}\,dx > 0.$
4 Geometric interpretation of the partition function
First we represent $Z$, for the Euclidean norm, in the form
(10) $Z = \int_S \|\theta\|_f^{-p}\,\sigma(d\theta), \qquad \|x\|_f := \Big(\int_0^\infty r^{p-1} e^{-f(rx)}\,dr\Big)^{-1/p}.$
The function $e^{-f}$ is log-concave and integrable on $\mathbb{R}^p$. Observe that $x \mapsto f(x) - \tfrac12\|y\|_2^2 = \lambda\|x\|_1$ is a norm on the null space of $A$. A general result Ball tells us that
$\|\cdot\|_f$
is a quasi-norm on $\mathbb{R}^p$. The unit ball of $\|\cdot\|_f$ is defined by
$B_f = \{x \in \mathbb{R}^p : \|x\|_f \le 1\}.$
Its contour is equal to
$\partial B_f = \{x \in \mathbb{R}^p : \|x\|_f = 1\}.$
We summarize our results in the following proposition.
Proposition 4.1
1) For each $\theta \in S$, the longest segment $[0, r\theta]$ contained in $B_f$ holds for $r = 1/\|\theta\|_f$, i.e. $r$ is the solution of the equation $r^p = J(\theta)$.
2) The ball
$B_f = \Big\{x \in \mathbb{R}^p : \int_0^\infty r^{p-1} e^{-f(rx)}\,dr \ge 1\Big\},$
and its contour is equal to
$\partial B_f = \Big\{x \in \mathbb{R}^p : \int_0^\infty r^{p-1} e^{-f(rx)}\,dr = 1\Big\}.$
3) The volume is
$\mathrm{vol}(B_f) = \frac{Z}{p}.$
5 Necessary and sufficient condition to have lasso equal zero
Now we can give the necessary and sufficient condition to have $\mathrm{lasso}(y) = 0$.
Proposition 5.1
The following assertions are equivalent.
1) $\mathrm{lasso}(y) = 0$.
2) $|\langle A_j, y\rangle| \le \lambda$ for all $j = 1, \ldots, p$, where $A_j$ denotes the $j$-th column of $A$.
3) $\|A^\top y\|_\infty \le \lambda$.
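A quick numerical illustration of this zero-lasso condition on an arbitrary $2 \times 2$ example of ours: brute-force minimization of $f$ over a grid returns $0$ exactly when $\lambda$ clears the classical threshold $\|A^\top y\|_\infty$.

```python
import numpy as np

# illustrative data (not from the article)
A = np.array([[1.0, 0.2], [0.3, 1.0]])
y = np.array([0.8, -0.4])

grid = np.linspace(-2, 2, 401)          # brute-force grid over [-2, 2]^2 (contains 0 exactly)
X1, X2 = np.meshgrid(grid, grid)

def minimizer(lam):
    """Grid argmin of f(x) = 0.5*||y - A x||^2 + lam*||x||_1."""
    vals = (0.5 * ((y[0] - A[0, 0] * X1 - A[0, 1] * X2) ** 2
                   + (y[1] - A[1, 0] * X1 - A[1, 1] * X2) ** 2)
            + lam * (np.abs(X1) + np.abs(X2)))
    i = np.unravel_index(np.argmin(vals), vals.shape)
    return np.array([X1[i], X2[i]])

thresh = np.max(np.abs(A.T @ y))        # ||A^T y||_inf
x_above = minimizer(1.1 * thresh)       # lambda above the threshold
x_below = minimizer(0.5 * thresh)       # lambda below the threshold
```

With $\lambda$ above the threshold the grid minimizer is exactly $0$; below it, the minimizer moves away from the origin.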
6 Concentration around the lasso
6.1 The case lasso null
The polar coordinate formula tells us that we can draw a vector $x = r\theta$ from $\pi(\cdot\,|\,y)$ by drawing its angle $\theta$ from the distribution proportional to $J(\theta)\,\sigma(d\theta)$, and then simulating its distance to the origin $r$ from the density
(11) $r \mapsto \frac{r^{p-1}\, e^{-f(r\theta)}}{J(\theta)}, \quad r > 0.$
Now let us estimate, for $t > 0$,
the probability
$\pi\big(\{x : \|x\| \ge t\} \,\big|\, y\big),$
where $|\cdot|$ denotes the Lebesgue measure. We introduce, for each pair $(a, b)$, the function
(12) $g(r) = \frac{a}{2} r^2 + b r - (p-1)\ln r, \quad r > 0.$
In the following, $a = \|A\theta\|_2^2$ and $b = \lambda\|\theta\|_1 - \langle y, A\theta\rangle$.
The function $r \mapsto \frac{a}{2}r^2 + br$ is increasing (because $b \ge 0$ when $\mathrm{lasso}(y) = 0$). The function $g$ is convex and attains its minimum at the point $r_*$, solution of the equation
$a r^2 + b r = p - 1.$
The positive root is given by
(13) $r_* = \frac{-b + \sqrt{b^2 + 4a(p-1)}}{2a}.$
On one hand,
on the other hand, by using the convexity of $g$, we have for all $r \ge r_*$,
We deduce, for $t \ge r_*$,
where $\Gamma(\cdot,\cdot)$ is the upper incomplete gamma function. Finally we get the following result.
Proposition 6.1
We have, for all $t \ge r_*$,
(14)
Using the following estimate Natalini for the upper incomplete gamma function
we get, for $t \ge r_*$,
Therefore the quantity
In summary: if $x$ is drawn from the density $\pi(\cdot\,|\,y)$, then with a probability at least equal to .
In Figure 1 we plot the density (11) for a fixed value of $\theta$.
We notice that the mode is very close to the value $r_*$ of (13) for the same fixed $\theta$.
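The mode computation behind (13) is easy to verify numerically: for the radial density $r \mapsto r^{p-1} e^{-a r^2/2 - b r}$, the maximizer solves $a r^2 + b r = p - 1$. A check with arbitrary values of $a$, $b$, $p$ (ours, not the article's):

```python
import numpy as np
from scipy import optimize

a, b, p = 2.0, 0.7, 5                  # illustrative values, with b >= 0

def neg_log_density(r):
    """-log of the radial density r^(p-1) * exp(-a r^2/2 - b r), up to a constant."""
    return -((p - 1) * np.log(r) - a * r ** 2 / 2 - b * r)

# closed-form positive root of a r^2 + b r = p - 1
r_star = (-b + np.sqrt(b ** 2 + 4 * a * (p - 1))) / (2 * a)

# numerical maximizer of the density
res = optimize.minimize_scalar(neg_log_density, bounds=(1e-6, 10), method="bounded")
```

The numerical maximizer `res.x` coincides with the closed-form root, which is the statement pictured in Figure 1.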
6.2 The general case
We take the vector . We will study the concentration of around . The variable of interest is . The law of has for density
The change of variables formula gives for each norm
By definition for any vector , the convex function reaches its minimum at the point . Therefore is increasing.
The function
(15) 
is strictly convex. Its critical point is solution of the equation
By a similar proof to that of propostion (6.1) we have the following result;
Proposition 6.2
If $x$ is drawn from the density $\pi(\cdot\,|\,y)$ and $\ell \in \mathrm{LASSO}$, then for all $t \ge t_*$,
(16)
7 Applications
7.1 The contour in the case $n = 1$, $p = 2$
Let $A = (a_1, a_2)$ be a matrix of order $1 \times 2$. Its nullspace is $\ker(A) = \{x \in \mathbb{R}^2 : a_1 x_1 + a_2 x_2 = 0\}$. We have that $\partial B_f$ contains
$\partial B_f \cap \ker(A)$.
This intersection is a symmetric segment.
To determine the other points of the set $\partial B_f$, we will directly calculate $\|x\|_f$. A simple calculation gives
and
Finally we have the following proposition.
Proposition 7.1
1) If , then
where $\Phi$ is the distribution function of the normal law.
2) If and , then
3) If , and , then the function
defined on is convex and decreasing, where .
4) We have for
the ball
is contained in the unit disk for the norm . The contour is defined by the equation
The norm of the linear operator is defined by
the function is constant on
If then
If with , then
In both cases,
is part of the contour. The other points of the contour are deduced from the equation
Each pair $(x_1, x_2)$ generates four points of $\partial B_f$, of the form $(\pm x_1, \pm x_2)$, where
We plot in Figure 2 the contour of $B_f$ for different choices of the matrix $A$. We notice that the surface of $B_f$ is a decreasing function of the norm of the matrix $A$.
Remark 7.2
The numerics show that the factor $e^{m(\theta)^2/4}$ explodes for large values of $m(\theta)$, which happens when $\theta$ is close to the nullspace of $A$. To eliminate that explosion we need to estimate the tail of the Gaussian density. Using the Gordon estimate Gordon
$\frac{z}{1+z^2}\, e^{-z^2/2} \;\le\; \int_z^\infty e^{-t^2/2}\,dt \;\le\; \frac{1}{z}\, e^{-z^2/2}, \qquad z > 0,$
we have the following approximation
(17)
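Gordon's Mills-ratio bound is easy to check directly; in the unnormalized form $\frac{z}{1+z^2} e^{-z^2/2} \le \int_z^\infty e^{-t^2/2}\,dt \le \frac{1}{z} e^{-z^2/2}$, $z > 0$, a numerical verification reads:

```python
import numpy as np
from scipy import special

def upper_tail(z):
    """Gaussian tail integral int_z^inf exp(-t^2/2) dt, via the complementary error function."""
    return np.sqrt(np.pi / 2) * special.erfc(z / np.sqrt(2))

zs = np.array([0.5, 1.0, 2.0, 5.0, 10.0])   # arbitrary test points
tails = upper_tail(zs)
lowers = zs / (1 + zs ** 2) * np.exp(-zs ** 2 / 2)
uppers = np.exp(-zs ** 2 / 2) / zs
```

Both bounds become sharp as $z \to \infty$, which is exactly the regime where the explosion of the partition-function factor must be tamed.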
8 MCMC diagnosis
Here we take , , and for simplicity we consider . We sample from the distribution (1) using the Hastings-Metropolis algorithm and propose the test as a criterion for the convergence. Here . We recall that if $x$ is drawn from the target distribution $\pi(\cdot\,|\,y)$, then the test holds with a probability at least equal to . Table 2 gives the values of the probability . Note that for the largest values in Table 2 the criterion is satisfied with a large probability.
2 | 2.5 | 3 | 3.5 | 4 | 4.5 | 5
0.6672 | 0.9446 | 0.9924 | 0.9991 | 0.9999 | 1.0000 | 1.0000
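A minimal sketch of a random-walk Hastings-Metropolis sampler for (1); the dimensions, matrix, noise level and the monitored statistic $\|x\|_1$ below are illustrative stand-ins of ours (the article's $n$, $p$, $\lambda$, $A$, $Y$ and exact convergence criterion are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 5, 10, 1.0                 # illustrative sizes, not the article's
A = rng.normal(size=(n, p)) / np.sqrt(n)
# synthetic observations from a sparse signal plus Gaussian noise
y = A @ np.concatenate([np.ones(2), np.zeros(p - 2)]) + 0.1 * rng.normal(size=n)

def f(x):
    """Negative log of the unnormalized target (1)."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def rw_metropolis(n_iter=20000, step=0.5):
    x = np.zeros(p)
    l1_trace = np.empty(n_iter)
    for k in range(n_iter):
        prop = x + step * rng.normal(size=p)        # Gaussian random-walk proposal
        if np.log(rng.uniform()) < f(x) - f(prop):  # Metropolis acceptance rule
            x = prop
        l1_trace[k] = np.sum(np.abs(x))             # monitored statistic
    return l1_trace

trace = rw_metropolis()
```

One then compares the running trace of $\|x\|_1$ with a threshold of the type reported in Table 2 to decide from which iteration on the criterion is satisfied.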
8.1 Independent sampler (IS)
The proposal distribution is the Laplace distribution (3).
The ratio
It is known that the MCMC with the target distribution $\pi(\cdot\,|\,y)$ and the proposal distribution (3) is uniformly ergodic Mengersen :
Here , and then . Figure 4(a) shows the plots of and .
8.2 Random-walk (RW) Metropolis algorithm
We do not know if the target distribution satisfies the curvature condition of Roberts , Section 6. Here we propose to analyse the convergence of the random-walk Metropolis algorithm using the criterion . Figure 4(b) shows the plots of and .
Figure 4 shows that, contrary to the independent sampler algorithm, the random-walk (RW) algorithm satisfies the criterion early. More precisely:
- the independent sampler (IS) algorithm begins to satisfy the criterion at iteration ;
- the RW algorithm begins to satisfy the criterion at iteration , but the IS algorithm never satisfies the criterion .
We finally compare the IS and RW algorithms using the fact that . The best algorithm will furnish the best approximation of the integral . Table 3 gives the estimators and . It follows that and . We conclude that the random-walk algorithm wins for both criteria against the independent sampler algorithm.
0.0005  0.0037  0.0016  0.0164  0.0050  0.0021  0.0058  
0.0005  0.0019  0.0002  0.0012  0.0005  0.0031  0.0011 
9 Conclusion
We studied the geometry of the Bayesian LASSO using polar coordinates and calculated its partition function. We obtained a concentration inequality and derived an MCMC convergence diagnosis for the Hastings-Metropolis algorithm. We showed that the random-walk MCMC with variance 0.5 wins against the independent sampler with Laplace proposal distribution.
References
 (1) K. Ball, Logarithmically concave functions and sections of convex sets in $\mathbb{R}^n$, Stud. Math. 88 (1988) 69-84.
 (2) A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci. 2 (1) (2009) 183-202.
 (3) E.T. Copson, Asymptotic Expansions, Cambridge University Press, 1965.
 (4) S. Chen, D. L. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1) (1998) 33-61.
 (5) I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (11) (2004) 1413–1457.
 (6) A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis-Hastings and simulated annealing algorithms around LASSO estimator, Math. Comput. Simulation (2015).
 (7) A. Erdélyi, Higher Transcendental Functions, California Institute of Technology, Bateman Manuscript Project, vol. 2, p. 119 (1953).
 (8) B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 32 (2004) 407-499.
 (9) G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arXiv:1312.5658v3 [math.ST] (2015).
 (10) R.D. Gordon, Values of Mills’ ratio of area to bounding ordinate of the normal probability integral for large values of the argument, Annals of Mathematical Statistics, vol. 12 364–366 (1941).
 (11) B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geom. Dedicata 112 (1) (2005) 169-182.
 (12) M. Mendoza, A. Allegra, T.P. Coleman, Bayesian Lasso Posterior Sampling Via Parallelized Measure Transport, arXiv:1801.02106v1 [stat.CO] (2018).
 (13) K. Mengersen, R.L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist. 24 (1996) 101-121.
 (14) N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (3) (2014) 123-231.
 (15) M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. (2015) 1-16.
 (16) N.M. Temme, Numerical and asymptotic aspects of parabolic cylinder functions, Journal of Computational and Applied Mathematics 121 (2000) 221-246.
 (17) G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996) 95-110.
 (18) R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol. 58 (1) (1996) 267-288.
 (19) R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (2013) 1456-1490.