
Concomitant Lasso with Repetitions (CLaR): beyond averaging multiple realizations of heteroscedastic noise
Sparsity promoting norms are frequently used in high dimensional regression. A limitation of Lassotype estimators is that the regularization parameter depends on the noise level which varies between datasets and experiments. Estimators such as the concomitant Lasso address this dependence by jointly estimating the noise level and the regression coefficients. As sample sizes are often limited in high dimensional regimes, simplified heteroscedastic models are customary. However, in many experimental applications , data is obtained by averaging multiple measurements. This helps reducing the noise variance, yet it dramatically reduces sample sizes, preventing refined noise modeling. In this work, we propose an estimator that can cope with complex heteroscedastic noise structures by using nonaveraged measurements and a concomitant formulation. The resulting optimization problem is convex, so thanks to smoothing theory, it is amenable to stateoftheart proximal coordinate descent techniques that can leverage the expected sparsity of the solutions. Practical benefits are demonstrated on simulations and on neuroimaging applications.
02/07/2019 ∙ by Quentin Bertrand, et al. ∙ 18 ∙ shareread it

Dual Extrapolation for Sparse Generalized Linear Models
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
07/12/2019 ∙ by Mathurin Massias, et al. ∙ 11 ∙ shareread it

Learning step sizes for unfolded sparse coding
Sparse coding is typically solved by iterative optimization techniques, such as the Iterative ShrinkageThresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most largescale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with stateoftheart networks when the solutions are sparse enough.
05/27/2019 ∙ by Pierre Ablin, et al. ∙ 8 ∙ shareread it

Generalized Concomitant MultiTask Lasso for sparse multimodal regression
In high dimension, it is customary to consider Lassotype estimators to enforce sparsity. For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, yet the latter is generally unknown in practice. A possible remedy is to consider estimators, such as the Concomitant/Scaled Lasso, which jointly optimize over the regression coefficients as well as over the noise level, making the choice of the regularization independent of the noise level. However, when data from different sources are pooled to increase sample size, or when dealing with multimodal datasets, noise levels typically differ and new dedicated estimators are needed. In this work we provide new statistical and computational solutions to deal with such heteroscedastic regression models, with an emphasis on functional brain imaging with combined magneto and electroencephalographic (M/EEG) signals. Adopting the formulation of Concomitant Lassotype estimators, we propose a jointly convex formulation to estimate both the regression coefficients and the (square root of the) noise covariance. When our framework is instantiated to decorrelated noise, it leads to an efficient algorithm whose computational cost is not higher than for the Lasso and Concomitant Lasso, while addressing more complex noise structures. Numerical experiments demonstrate that our estimator yields improved prediction and support identification while correctly estimating the noise (square root) covariance. Results on multimodal neuroimaging problems with M/EEG data are also reported.
05/27/2017 ∙ by Mathurin Massias, et al. ∙ 0 ∙ shareread it

From safe screening rules to working sets for faster Lassotype solvers
Convex sparsitypromoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few nonzero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems. For the Lasso estimator a WS is a set of features, while for a Group Lasso it refers to a set of groups. In practice, WS are generally small in this context so the associated feature Gram matrix can fit in memory. Here we show that the GaussSouthwell rule (a greedy strategy for block coordinate descent techniques) leads to fast solvers in this case. Combined with a working set strategy based on an aggressive use of socalled Gap Safe screening rules, we propose a solver achieving stateoftheart performance on sparse learning problems. Results are presented on Lasso and multitask Lasso estimators.
03/21/2017 ∙ by Mathurin Massias, et al. ∙ 0 ∙ shareread it
Mathurin Massias
is this you? claim profile