
Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals
Convolutional dictionary learning (CDL) estimates shift-invariant bases adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As the estimated patterns can be positioned anywhere in signals or images, optimization techniques face the difficulty of working in extremely high dimensions, with millions of pixels or time samples, in contrast to standard patch-based dictionary learning. To address this optimization problem, this work proposes a distributed and asynchronous algorithm employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server. This algorithm can be used to distribute the computation over a number of workers that scales linearly with the size of the encoded signal. Experiments confirm these scaling properties, which allow us to learn patterns on large-scale images from the Hubble Space Telescope.
01/26/2019 ∙ by Thomas Moreau, et al.
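To make the coordinate-descent core of the method concrete, here is a minimal single-worker sketch of greedy coordinate descent for a 1-D convolutional Lasso. The function names, the dense recomputation of the correlations at every step, and the stopping rule are illustrative simplifications for the sketch, not the DiCoDiLe implementation.

```python
import numpy as np

def soft_threshold(x, mu):
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def greedy_cd_csc(x, D, reg, n_iter=200):
    """Greedy coordinate descent for a 1-D convolutional Lasso
    0.5 * ||x - sum_k d_k * z_k||_2^2 + reg * ||z||_1 (single-worker sketch)."""
    n_atoms, atom_len = D.shape
    n_times_valid = len(x) - atom_len + 1
    z = np.zeros((n_atoms, n_times_valid))
    norms = (D ** 2).sum(axis=1)              # ||d_k||^2, used in the CD step
    residual = x.copy()
    for _ in range(n_iter):
        # correlation of the residual with every atom at every position
        # (recomputed densely here; an efficient solver maintains it incrementally)
        corr = np.array([np.correlate(residual, d, mode='valid') for d in D])
        # optimal value of each coordinate if it were updated alone
        z_new = soft_threshold(z * norms[:, None] + corr, reg) / norms[:, None]
        diff = np.abs(z_new - z)
        k, t = np.unravel_index(np.argmax(diff), diff.shape)
        if diff[k, t] < 1e-10:
            break                              # no coordinate improves the cost
        # the update only perturbs the residual on the support of atom k
        residual[t:t + atom_len] -= (z_new[k, t] - z[k, t]) * D[k]
        z[k, t] = z_new[k, t]
    return z
```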

Learning step sizes for unfolded sparse coding
Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding ISTA and learning its weights with neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step-size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that, for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with state-of-the-art networks when the solutions are sparse enough.
05/27/2019 ∙ by Pierre Ablin, et al.
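As an illustration of the object being learned, here is a minimal ISTA sketch in which the per-iteration step sizes are passed as a list: the classical algorithm uses the constant 1/L shown in the usage lines, whereas the proposed architecture would learn the entries of that list. This is a hedged sketch with illustrative names, not the paper's implementation.

```python
import numpy as np

def soft_threshold(x, mu):
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def ista(D, x, reg, step_sizes, n_iter=100):
    """ISTA for the Lasso 0.5*||x - D z||^2 + reg*||z||_1,
    with one (possibly learned) step size per iteration."""
    z = np.zeros(D.shape[1])
    for t in range(n_iter):
        step = step_sizes[min(t, len(step_sizes) - 1)]
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - step * grad, reg * step)
    return z

# classical choice: a constant step 1/L, with L the Lipschitz constant of the gradient
D = np.random.randn(30, 100)
x = np.random.randn(30)
L = np.linalg.norm(D, ord=2) ** 2
z_ista = ista(D, x, reg=0.1, step_sizes=[1.0 / L])
```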

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals
Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternating minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.
05/24/2018 ∙ by Tom Dupré La Tour, et al.
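The spatio-temporal atoms described above, a spatial pattern over channels combined with a temporal waveform, can be illustrated with a small reconstruction helper. The rank-1 (outer-product) form, the names, and the shapes below are assumptions of this sketch rather than the authors' code.

```python
import numpy as np

def reconstruct(z, u, v):
    """Multivariate CSC model sketch: X_hat[p, t] = sum_k u_k[p] * (z_k * v_k)[t],
    where u_k is a spatial pattern (one value per channel), v_k a temporal
    waveform, and z_k the sparse activations of atom k."""
    n_atoms, n_times_valid = z.shape
    n_channels = u.shape[1]
    n_times = n_times_valid + v.shape[1] - 1
    X_hat = np.zeros((n_channels, n_times))
    for k in range(n_atoms):
        temporal = np.convolve(z[k], v[k])     # activations convolved with the waveform
        X_hat += np.outer(u[k], temporal)      # spatial map spreads it over channels
    return X_hat
```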

Understanding the Learned Iterative Soft Thresholding Algorithm with matrix factorization
Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor and Le Cun (2010), which showed empirically that one could achieve high-quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ_1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.
06/02/2017 ∙ by Thomas Moreau, et al.
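A minimal sketch of the parametrization under study: a LISTA-style network whose layers apply a learned proximal splitting, initialized here so that it reproduces plain ISTA. All names and the initialization are illustrative assumptions, not the original implementation.

```python
import numpy as np

def soft_threshold(x, mu):
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def lista_forward(x, W_e, W_g, thetas):
    """LISTA-style network: each layer applies z <- soft_threshold(W_g z + W_e x, theta_t).
    With W_e = D.T / L, W_g = I - D.T D / L and theta_t = reg / L this is exactly ISTA;
    learning (W_e, W_g, theta_t) is what yields the acceleration studied in the paper."""
    z = np.zeros(W_g.shape[0])
    for theta in thetas:
        z = soft_threshold(W_g @ z + W_e @ x, theta)
    return z

# untrained initialization that reproduces plain ISTA
D = np.random.randn(20, 50)
L = np.linalg.norm(D, ord=2) ** 2
W_e = D.T / L
W_g = np.eye(50) - D.T @ D / L
z = lista_forward(np.random.randn(20), W_e, W_g, thetas=[0.1 / L] * 16)
```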

Distributed Convolutional Sparse Coding
We consider the problem of building shift-invariant representations for long signals in the context of distributed processing. We propose an asynchronous algorithm based on coordinate descent, called DICOD, to efficiently solve the ℓ_1-minimization problems involved in convolutional sparse coding. This algorithm leverages the weak temporal dependency of the convolution to reduce inter-process communication to a few local messages. We prove that this algorithm converges to the optimal solution and that it scales with super-linear speedup, up to a certain limit. These properties are illustrated with numerical experiments, and our algorithm is compared to the state-of-the-art methods used for convolutional sparse coding.
05/29/2017 ∙ by Thomas Moreau, et al.
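The "few local messages" property comes from the fact that updating one coordinate only perturbs the residual over the support of the corresponding atom, so only workers owning adjacent segments of the signal can be affected. The small helper below illustrates that bookkeeping; it is purely illustrative, assumes segments longer than the atoms, and is not the DICOD code.

```python
def owner(t, n_workers, n_times_valid):
    """Map a coordinate position t to the worker owning that segment of the signal."""
    seg = n_times_valid // n_workers
    return min(t // seg, n_workers - 1)

def neighbors_to_notify(t, atom_len, n_workers, n_times_valid):
    """After updating the coordinate at position t, list the neighbouring workers
    whose coordinates overlap the perturbed residual window
    [t - atom_len + 1, t + atom_len - 1] and must receive a (k, t, dz) message.
    Only adjacent segments can be affected, which keeps communication local."""
    w = owner(t, n_workers, n_times_valid)
    to_notify = []
    if owner(max(t - atom_len + 1, 0), n_workers, n_times_valid) < w:
        to_notify.append(w - 1)
    if owner(min(t + atom_len - 1, n_times_valid - 1), n_workers, n_times_valid) > w:
        to_notify.append(w + 1)
    return to_notify
```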

Post Training in Deep Learning with Last Kernel
One of the main challenges of deep learning methods is the choice of an appropriate training strategy. In particular, additional steps, such as unsupervised pre-training, have been shown to greatly improve the performance of deep structures. In this article, we propose an extra training step, called post-training, which only optimizes the last layer of the network. We show that this procedure can be analyzed in the context of kernel theory, with the first layers computing an embedding of the data and the last layer fitting a statistical model that solves the task based on this embedding. This step makes sure that the embedding, or representation, of the data is used in the best possible way for the considered task. This idea is then tested on multiple architectures with various data sets, showing that it consistently provides a boost in performance.
11/14/2016 ∙ by Thomas Moreau, et al.
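With a squared loss and ℓ_2 regularization, optimizing only the last layer on top of a frozen embedding reduces to a ridge regression, which the following hedged sketch makes explicit. The random-feature "network" standing in for the trained layers and all names are purely illustrative.

```python
import numpy as np

def post_training_last_layer(embed, X, y, weight_decay=1e-3):
    """Post-training sketch: keep the learned representation `embed` (all layers
    but the last) frozen and refit only the last linear layer.  With a squared
    loss and l2 regularization this is a ridge regression on the embeddings,
    solvable in closed form."""
    Phi = embed(X)                                    # (n_samples, n_features)
    A = Phi.T @ Phi + weight_decay * np.eye(Phi.shape[1])
    W = np.linalg.solve(A, Phi.T @ y)                 # new last-layer weights
    return W

# toy usage: a fixed random-feature map stands in for the trained layers
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.normal(size=(200, 1))
P = rng.normal(size=(10, 64))
embed = lambda X: np.tanh(X @ P)
W_last = post_training_last_layer(embed, X, y)
```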

Understanding Trainable Sparse Coding via Matrix Factorization
Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA), that are optimal in the class of first-order methods for non-smooth, convex functions. However, these methods do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor and Le Cun (2010), which showed empirically that one could achieve high-quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ_1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.
09/01/2016 ∙ by Thomas Moreau, et al.
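For reference, here is a minimal sketch of the non-adaptive accelerated baseline mentioned above, FISTA, for the Lasso; the adaptive, LISTA-style acceleration studied in the paper replaces its fixed proximal splitting with learned parameters. This is a sketch with illustrative names, not the paper's code.

```python
import numpy as np

def soft_threshold(x, mu):
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def fista(D, x, reg, n_iter=100):
    """FISTA (ISTA with Nesterov momentum) for the Lasso
    0.5*||x - D z||^2 + reg*||z||_1, the non-adaptive baseline."""
    L = np.linalg.norm(D, ord=2) ** 2       # Lipschitz constant of the gradient
    z = y = np.zeros(D.shape[1])
    t = 1.0
    for _ in range(n_iter):
        z_next = soft_threshold(y - D.T @ (D @ y - x) / L, reg / L)
        t_next = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        y = z_next + (t - 1) / t_next * (z_next - z)   # momentum extrapolation
        z, t = z_next, t_next
    return z
```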