
Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals
Convolutional dictionary learning (CDL) estimates shift invariant basis adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As estimated patterns can be positioned anywhere in signals or images, optimization techniques face the difficulty of working in extremely high dimensions with millions of pixels or time samples, contrarily to standard patchbased dictionary learning. To address this optimization problem, this work proposes a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server. This algorithm can be used to distribute the computation on a number of workers which scales linearly with the encoded signal's size. Experiments confirm the scaling properties which allows us to learn patterns on large scales images from the Hubble Space Telescope.
01/26/2019 ∙ by Thomas Moreau, et al. ∙ 52 ∙ shareread it

Beyond Pham's algorithm for joint diagonalization
The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible. This problem naturally appears in several statistical learning tasks such as blind signal separation. We consider the diagonalization criterion studied in a seminal paper by Pham (2001), and propose a new quasiNewton method for its optimization. Through numerical experiments on simulated and real datasets, we show that the proposed method outperforms Pham's algorithm. An open source Python package is released.
11/28/2018 ∙ by Pierre Ablin, et al. ∙ 28 ∙ shareread it

Concomitant Lasso with Repetitions (CLaR): beyond averaging multiple realizations of heteroscedastic noise
Sparsity promoting norms are frequently used in high dimensional regression. A limitation of Lassotype estimators is that the regularization parameter depends on the noise level which varies between datasets and experiments. Estimators such as the concomitant Lasso address this dependence by jointly estimating the noise level and the regression coefficients. As sample sizes are often limited in high dimensional regimes, simplified heteroscedastic models are customary. However, in many experimental applications , data is obtained by averaging multiple measurements. This helps reducing the noise variance, yet it dramatically reduces sample sizes, preventing refined noise modeling. In this work, we propose an estimator that can cope with complex heteroscedastic noise structures by using nonaveraged measurements and a concomitant formulation. The resulting optimization problem is convex, so thanks to smoothing theory, it is amenable to stateoftheart proximal coordinate descent techniques that can leverage the expected sparsity of the solutions. Practical benefits are demonstrated on simulations and on neuroimaging applications.
02/07/2019 ∙ by Quentin Bertrand, et al. ∙ 18 ∙ shareread it

Dual Extrapolation for Sparse Generalized Linear Models
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
07/12/2019 ∙ by Mathurin Massias, et al. ∙ 11 ∙ shareread it

Learning step sizes for unfolded sparse coding
Sparse coding is typically solved by iterative optimization techniques, such as the Iterative ShrinkageThresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most largescale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with stateoftheart networks when the solutions are sparse enough.
05/27/2019 ∙ by Pierre Ablin, et al. ∙ 8 ∙ shareread it

Group level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates
Magnetoencephalography (MEG) and electroencephalography (EEG) are noninvasive modalities that measure the weak electromagnetic fields generated by neural activity. Inferring the location of the current sources that generated these magnetic fields is an illposed inverse problem known as source imaging. When considering a group study, a baseline approach consists in carrying out the estimation of these sources independently for each subject. The illposedness of each problem is typically addressed using sparsity promoting regularizations. A straightforward way to define a common pattern for these sources is then to average them. A more advanced alternative relies on a joint localization of sources for all subjects taken together, by enforcing some similarity across all estimated sources. An important advantage of this approach is that it consists in a single estimation in which all measurements are pooled together, making the inverse problem better posed. Such a joint estimation poses however a few challenges, notably the selection of a valid regularizer that can quantify such spatial similarities. We propose in this work a new procedure that can do so while taking into account the geometrical structure of the cortex. We call this procedure Minimum Wasserstein Estimates (MWE). The benefits of this model are twofold. First, joint inference allows to pool together the data of different brain geometries, accumulating more spatial information. Second, MWE are defined through Optimal Transport (OT) metrics which provide a tool to model spatial proximity between cortical sources of different subjects, hence not enforcing identical source location in the group. These benefits allow MWE to be more accurate than standard MEG source localization techniques. To support these claims, we perform source localization on realistic MEG simulations based on forward operators derived from MRI scans. On a visual task dataset, we demonstrate how MWE infer neural patterns similar to functional Magnetic Resonance Imaging (fMRI) maps.
02/13/2019 ∙ by Hicham Janati, et al. ∙ 6 ∙ shareread it

Accelerating likelihood optimization for ICA on real signals
We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the the problem constrained to white signals and the unconstrained problem. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as quasiNewton methods, for which the Hessian approximation is cheap to compute. These algorithms are very fast on simulated data where the linear mixture assumption really holds. However, on real signals, we observe that their rate of convergence can be severely impaired. In this paper, we investigate the origins of this behavior, and show that the recently proposed Preconditioned ICA for Real Data (Picard) algorithm overcomes this issue on both constrained and unconstrained problems.
06/25/2018 ∙ by Pierre Ablin, et al. ∙ 4 ∙ shareread it

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals
Frequencyspecific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high level visual processing or motor control. While alpha waves (812 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternated minimization and a greedy coordinate descent solver that leads to stateoftheart running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of nonsinusoidal mushaped patterns, along with their topographic maps related to the somatosensory cortex.
05/24/2018 ∙ by Tom Dupré La Tour, et al. ∙ 2 ∙ shareread it

A QuasiNewton algorithm on the orthogonal manifold for NMF with transform learning
Nonnegative matrix factorization (NMF) is a popular method for audio spectral unmixing. While NMF is traditionally applied to offtheshelf timefrequency representations based on the shorttime Fourier or Cosine transforms, the ability to learn transforms from raw data attracts increasing attention. However, this adds an important computational overhead. When assumed orthogonal (like the Fourier or Cosine transforms), learning the transform yields a nonconvex optimization problem on the orthogonal matrix manifold. In this paper, we derive a quasiNewton method on the manifold using sparse approximations of the Hessian. Experiments on synthetic and real audio data show that the proposed algorithm outperforms stateoftheart firstorder and coordinatedescent methods by orders of magnitude. A Python package for fast TLNMF is released online at https://github.com/pierreablin/tlnmf.
11/06/2018 ∙ by Pierre Ablin, et al. ∙ 2 ∙ shareread it

Manifoldregression to predict from MEG/EEG brain signals without source modeling
Magnetoencephalography and electroencephalography (M/EEG) can reveal neuronal dynamics noninvasively in realtime and are therefore appreciated methods in medicine and neuroscience. Recent advances in modeling brainbehavior relationships have highlighted the effectiveness of Riemannian geometry for summarizing the spatially correlated timeseries from M/EEG in terms of their covariance. However, after artefactsuppression, M/EEG data is often rank deficient which limits the application of Riemannian concepts. In this article, we focus on the task of regression with rankreduced covariance matrices. We study two Riemannian approaches that vectorize the M/EEG covariance betweensensors through projection into a tangent space. The Wasserstein distance readily applies to rankreduced data but lacks affineinvariance. This can be overcome by finding a common subspace in which the covariance matrices are full rank, enabling the affineinvariant geometric distance. We investigated the implications of these two approaches in synthetic generative models, which allowed us to control estimation bias of a linear model for prediction. We show that Wasserstein and geometric distances allow perfect outofsample prediction on the generative models. We then evaluated the methods on real data with regard to their effectiveness in predicting age from M/EEG covariance matrices. The findings suggest that the datadriven Riemannian methods outperform different sensorspace estimators and that they get close to the performance of biophysicsdriven sourcelocalization model that requires MRI acquisitions and tedious data processing. Our study suggests that the proposed Riemannian methods can serve as fundamental buildingblocks for automated largescale analysis of M/EEG.
06/04/2019 ∙ by David Sabbagh, et al. ∙ 2 ∙ shareread it

Faster ICA under orthogonal constraint
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multichannel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of nonGaussian independent sources. The problem can be seen as a likelihood maximization problem. We introduce PicardO, a preconditioned LBFGS strategy over the set of orthogonal matrices, which can quickly separate both super and subGaussian signals. It returns the same set of sources as the widely used FastICA algorithm. Through numerical experiments, we show that our method is faster and more robust than FastICA on real data.
11/29/2017 ∙ by Pierre Ablin, et al. ∙ 0 ∙ shareread it
Alexandre Gramfort
is this you? claim profile
Researcher in machine learning and signal processing at Inria, Mining Engineer at French Ministry of Industry (Engineer of the State Technical Corps), Scientific Consultant at CEA from 20122017, Assistant Professor at Telecom ParisTech from 20122017, Research Fellow at Massachusetts General Hospital from 20102011, Research Fellow at HArvard University from 20102011, Research fellow at INRIA from 20092010, Research fellow at Neurospin, CEA Saclay from 20092010