
Beyond Pham's algorithm for joint diagonalization
The approximate joint diagonalization of a set of matrices consists of finding a basis in which these matrices are as diagonal as possible. This problem naturally appears in several statistical learning tasks, such as blind signal separation. We consider the diagonalization criterion studied in a seminal paper by Pham (2001), and propose a new quasi-Newton method for its optimization. Through numerical experiments on simulated and real datasets, we show that the proposed method outperforms Pham's algorithm. An open-source Python package is released.
11/28/2018 ∙ by Pierre Ablin, et al.
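For context, Pham's criterion for a set of positive-definite matrices C_i and an unmixing matrix B compares the log-determinant of the diagonal part of B C_i B^T with that of the full matrix; it is non-negative and vanishes exactly when every transformed matrix is diagonal. A minimal NumPy sketch (the function name is illustrative, not the released package's API):

```python
import numpy as np

def pham_criterion(B, Cs):
    """Pham's joint-diagonalization criterion (illustrative helper).

    For positive-definite matrices C_i, measures how far the congruence
    transforms B C_i B^T are from diagonal; by Hadamard's inequality it is
    non-negative, and zero iff every B C_i B^T is exactly diagonal.
    """
    loss = 0.0
    for C in Cs:
        D = B @ C @ B.T
        # log det of the diagonal part minus log det of the full matrix
        loss += np.sum(np.log(np.diag(D))) - np.linalg.slogdet(D)[1]
    return loss

rng = np.random.default_rng(0)
# Build commuting SPD matrices: shared eigenvectors, random eigenvalues
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Cs = [Q @ np.diag(rng.uniform(1, 2, 4)) @ Q.T for _ in range(3)]
# The shared eigenbasis diagonalizes all of them, so the criterion vanishes
print(pham_criterion(Q.T, Cs))        # ~0 (up to floating point)
print(pham_criterion(np.eye(4), Cs))  # > 0: identity does not diagonalize
```

The optimization problem the paper addresses is the minimization of this loss over invertible matrices B.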

Learning step sizes for unfolded sparse coding
Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding and learning the weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step-size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with state-of-the-art networks when the solutions are sparse enough.
05/27/2019 ∙ by Pierre Ablin, et al.
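As a reference point for the unfolding discussed above, plain ISTA with the classical constant step 1/L can be sketched as follows (a generic textbook implementation, not the paper's learned-step network; names and parameters are illustrative):

```python
import numpy as np

def ista(A, b, lam, n_iter=200):
    """Plain ISTA for the Lasso: min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    Uses the classical constant step 1/L, with L the Lipschitz constant
    of the gradient; the paper studies larger, adapted steps that exploit
    the sparsity of the iterates, and learns them with a network.
    """
    L = np.linalg.norm(A, ord=2) ** 2  # largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)       # gradient of the smooth part
        z = x - grad / L
        # soft-thresholding: the proximal operator of lam/L * ||.||_1
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[:3] = [1.0, -2.0, 1.5]          # a 3-sparse ground truth
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
print(np.count_nonzero(x_hat))         # far fewer than 50: iterates are sparse
```

The soft-thresholding step zeroes coordinates exactly, which is the sparsity structure the learned step sizes exploit.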

Accelerating likelihood optimization for ICA on real signals
We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the problem constrained to white signals and the unconstrained problem. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as quasi-Newton methods, for which the Hessian approximation is cheap to compute. These algorithms are very fast on simulated data, where the linear mixture assumption really holds. However, on real signals, we observe that their rate of convergence can be severely impaired. In this paper, we investigate the origins of this behavior and show that the recently proposed Preconditioned ICA for Real Data (Picard) algorithm overcomes this issue on both constrained and unconstrained problems.
06/25/2018 ∙ by Pierre Ablin, et al.
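A bare-bones baseline for the likelihood maximization discussed above is relative gradient descent with a fixed tanh score, sketched below; the quasi-Newton methods studied in the paper replace this plain gradient step with a direction based on a cheap Hessian approximation. All names are illustrative, and super-Gaussian sources are assumed:

```python
import numpy as np

def ica_loss(W, X):
    """ICA negative log-likelihood with a log-cosh source model."""
    Y = W @ X
    return -np.linalg.slogdet(W)[1] + np.mean(np.sum(np.log(np.cosh(Y)), axis=0))

def ica_relative_gradient(X, n_iter=500, step=0.1):
    """Maximum-likelihood ICA by relative gradient descent.

    The relative gradient of the loss is E[psi(Y) Y^T] - I with psi = tanh;
    this slow first-order baseline is what the paper's quasi-Newton
    preconditioning accelerates.
    """
    n, T = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        Y = W @ X
        G = np.tanh(Y) @ Y.T / T - np.eye(n)
        W -= step * G @ W              # multiplicative (relative) update
    return W

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))        # independent super-Gaussian sources
A = rng.standard_normal((2, 2))        # mixing matrix
X = A @ S
W = ica_relative_gradient(X)
print(ica_loss(W, X) < ica_loss(np.eye(2), X))  # the likelihood improved
```

At a separating matrix the relative gradient vanishes, i.e. E[tanh(Y) Y^T] becomes diagonal.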

A Quasi-Newton algorithm on the orthogonal manifold for NMF with transform learning
Nonnegative matrix factorization (NMF) is a popular method for audio spectral unmixing. While NMF is traditionally applied to off-the-shelf time-frequency representations based on the short-time Fourier or Cosine transforms, the ability to learn transforms from raw data attracts increasing attention. However, this adds an important computational overhead. When assumed orthogonal (like the Fourier or Cosine transforms), learning the transform yields a non-convex optimization problem on the orthogonal matrix manifold. In this paper, we derive a quasi-Newton method on the manifold using sparse approximations of the Hessian. Experiments on synthetic and real audio data show that the proposed algorithm outperforms state-of-the-art first-order and coordinate-descent methods by orders of magnitude. A Python package for fast TL-NMF is released online at https://github.com/pierreablin/tlnmf.
11/06/2018 ∙ by Pierre Ablin, et al.
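For readers unfamiliar with the factorization step, a minimal NMF baseline with Lee-Seung multiplicative updates is sketched below. It handles only the classical Frobenius-loss factorization, not the transform learning or the quasi-Newton manifold optimization that the paper contributes (names are illustrative):

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-10):
    """Baseline NMF via Lee-Seung multiplicative updates for
    ||V - W H||_F^2. The updates preserve non-negativity because they
    multiply by ratios of non-negative quantities."""
    rng = np.random.default_rng(0)
    W = rng.uniform(size=(V.shape[0], rank))
    H = rng.uniform(size=(rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.uniform(size=(30, 3)) @ rng.uniform(size=(3, 20))  # exact rank 3
W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)  # small: the rank-3 structure is recovered
```

In TL-NMF, V itself is produced by an orthogonal transform of the raw signal, and that transform is optimized jointly with W and H, which is where the manifold quasi-Newton method comes in.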

Manifold-regression to predict from MEG/EEG brain signals without source modeling
Magnetoencephalography and electroencephalography (M/EEG) can reveal neuronal dynamics non-invasively in real time and are therefore appreciated methods in medicine and neuroscience. Recent advances in modeling brain-behavior relationships have highlighted the effectiveness of Riemannian geometry for summarizing the spatially correlated time series from M/EEG in terms of their covariance. However, after artefact suppression, M/EEG data is often rank-deficient, which limits the application of Riemannian concepts. In this article, we focus on the task of regression with rank-reduced covariance matrices. We study two Riemannian approaches that vectorize the M/EEG between-sensors covariance through projection into a tangent space. The Wasserstein distance readily applies to rank-reduced data but lacks affine invariance. This can be overcome by finding a common subspace in which the covariance matrices are full rank, enabling the affine-invariant geometric distance. We investigated the implications of these two approaches in synthetic generative models, which allowed us to control the estimation bias of a linear model for prediction. We show that the Wasserstein and geometric distances allow perfect out-of-sample prediction on the generative models. We then evaluated the methods on real data with regard to their effectiveness in predicting age from M/EEG covariance matrices. The findings suggest that the data-driven Riemannian methods outperform different sensor-space estimators and that they come close to the performance of a biophysics-driven source-localization model that requires MRI acquisitions and tedious data processing. Our study suggests that the proposed Riemannian methods can serve as fundamental building blocks for automated large-scale analysis of M/EEG.
06/04/2019 ∙ by David Sabbagh, et al.
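The tangent-space vectorization step can be sketched for the simplest, full-rank case: map each covariance through the matrix logarithm and keep the upper triangle as features for a linear model. This log-Euclidean sketch assumes full-rank matrices and omits the paper's Wasserstein and affine-invariant treatment of rank-reduced data (function names are illustrative):

```python
import numpy as np

def logm_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix,
    computed via its eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def tangent_vec(C):
    """Vectorize an SPD covariance through the log map at the identity
    (log-Euclidean tangent space): take the matrix log, then keep the
    upper triangle as a feature vector for a linear model."""
    L = logm_spd(C)
    return L[np.triu_indices_from(L)]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))      # 5 "sensors", 200 time samples
C = X @ X.T / 200                      # a full-rank empirical covariance
v = tangent_vec(C)                     # 15 features for, e.g., ridge regression
print(v.shape)
```

In the paper's setting the matrices are rank-reduced after artefact suppression, so either the Wasserstein distance or a common full-rank subspace is needed before this kind of vectorization applies.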

Faster ICA under orthogonal constraint
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multichannel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of non-Gaussian independent sources. The problem can be cast as a likelihood maximization problem. We introduce Picard-O, a preconditioned L-BFGS strategy over the set of orthogonal matrices, which can quickly separate both super- and sub-Gaussian signals. It returns the same set of sources as the widely used FastICA algorithm. Through numerical experiments, we show that our method is faster and more robust than FastICA on real data.
11/29/2017 ∙ by Pierre Ablin, et al.

Faster independent component analysis by preconditioning with Hessian approximations
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multichannel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian independent sources. The maximization of the corresponding likelihood is a challenging problem if it has to be completed quickly and accurately on large sets of real data. We introduce the Preconditioned ICA for Real Data (Picard) algorithm, which is a relative L-BFGS algorithm preconditioned with sparse Hessian approximations. Extensive numerical comparisons to several algorithms of the same class demonstrate the superior performance of the proposed technique, especially on real data, for which the ICA model does not necessarily hold.
06/25/2017 ∙ by Pierre Ablin, et al.

EM algorithms for ICA
Independent component analysis (ICA) is a widespread data exploration technique, where observed signals are assumed to be linear mixtures of independent components. From a machine learning point of view, it amounts to a matrix factorization problem under statistical independence constraints. Infomax is one of the first and most used algorithms for inference of the latent parameters. It maximizes a log-likelihood function which is non-convex and decomposes as a sum over signal samples. We introduce a new majorization-minimization framework for the optimization of the loss function. We show that this approach is equivalent to an Expectation-Maximization (EM) algorithm using Gaussian scale mixtures. Inspired by the literature around EM algorithms, we derive an online algorithm for the streaming setting and an incremental algorithm for the finite-sum setting. These algorithms do not rely on any critical hyper-parameter like a step size, nor do they require a line-search technique. The finite-sum algorithm also enjoys the guarantee of decreasing the loss function at each iteration. Experiments show that they outperform the state of the art on large-scale problems, where one iteration of a full-batch algorithm is a computational burden.
05/25/2018 ∙ by Pierre Ablin, et al.
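The link between majorization-minimization and EM can be illustrated on a single score function: for a super-Gaussian density arising from a Gaussian scale mixture, the negative log-density f is such that f(√u) is concave in u, so the quadratic tangent at any point y0 majorizes f everywhere, with curvature given by the E-step weight ψ(y0)/y0. A small numerical check with f = log cosh, a standard super-Gaussian choice (the paper's exact densities may differ):

```python
import numpy as np

# f(y) = log cosh(y) plays the role of -log p(y); its score is psi = tanh.
f = lambda y: np.log(np.cosh(y))
y0 = 1.3
u0 = np.tanh(y0) / y0                        # E-step weight (inverse latent scale)
# Quadratic surrogate: tangent of f(sqrt(u)) at u = y0**2, viewed in y
surrogate = lambda y: f(y0) + 0.5 * u0 * (y ** 2 - y0 ** 2)

ys = np.linspace(-5.0, 5.0, 1001)
print(bool(np.all(surrogate(ys) >= f(ys) - 1e-12)))  # True: it majorizes f
print(bool(np.isclose(surrogate(y0), f(y0))))        # True: tight at y0
```

Minimizing such quadratic surrogates in the unmixing matrix is what yields EM-style updates with no step size to tune, matching the abstract's claim that the algorithms need neither a step-size hyper-parameter nor a line search.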