
Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data
The use of complex models with many parameters is challenging with high-dimensional, small-sample problems: indeed, they face rapid overfitting. Such situations are common when data collection is expensive, as in neuroscience, biology, or geology. Dedicated regularization can be crafted to tame overfitting, typically via structured penalties, but rich penalties require mathematical expertise and entail large computational costs. Stochastic regularizers such as dropout are easier to implement: they prevent overfitting through random perturbations. Used inside a stochastic optimizer, they come with little additional cost. We propose a structured stochastic regularization that relies on feature grouping. Using a fast clustering algorithm, we define a family of groups of features that capture feature covariations. We then randomly select these groups inside a stochastic gradient descent loop. This procedure acts as a structured regularizer for high-dimensional correlated data without additional computational cost, and it has a denoising effect. We demonstrate the performance of our approach for logistic regression, both on a sample-limited face image dataset with varying additive noise and on a typical high-dimensional learning problem, brain image classification.
07/31/2018 ∙ by Sergul Aydore, et al.
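The group-selection loop described above can be sketched as follows. This is a minimal, hypothetical illustration (function names, hyperparameters, and the plain full-batch logistic updates are our assumptions, not the paper's implementation): a family of pre-computed feature clusterings is given, and each gradient step sees the data through one randomly drawn grouping.

```python
import numpy as np

def group_project(X, labels):
    """Replace each feature by the mean of its cluster, for one grouping."""
    k = labels.max() + 1
    counts = np.maximum(np.bincount(labels, minlength=k), 1)
    sums = np.zeros((X.shape[0], k))
    for j, lab in enumerate(labels):
        sums[:, lab] += X[:, j]
    return (sums / counts)[:, labels]  # back-project onto original features

def sgd_logistic_grouped(X, y, groupings, lr=0.5, epochs=200, seed=0):
    """Logistic regression where each gradient step sees a randomly drawn
    feature grouping, acting as a structured stochastic regularizer."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        labels = groupings[rng.integers(len(groupings))]
        Xg = group_project(X, labels)            # grouped view of the data
        grad = Xg.T @ (1 / (1 + np.exp(-(Xg @ w))) - y) / n
        w -= lr * grad
    return w
```

Because each grouping averages correlated features together, the random perturbation respects the data's correlation structure, unlike plain dropout, which perturbs features independently.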

Approximate message-passing for convex optimization with non-separable penalties
We introduce an iterative optimization scheme for convex objectives consisting of a linear loss and a non-separable penalty, based on the expectation-consistent approximation and the vector approximate message-passing (VAMP) algorithm. Specifically, the penalties we consider are convex on a linear transformation of the variable to be determined, a notable example being total variation (TV). We describe the connection between message-passing algorithms, typically used for approximate inference, and proximal methods for optimization, and show that our scheme is, like VAMP, similar in nature to Peaceman-Rachford splitting, with the important difference that step sizes are set adaptively. Finally, we benchmark the performance of our VAMP-like iteration on problems where TV penalties are useful, namely classification in task fMRI and reconstruction in tomography, and show faster convergence than that of state-of-the-art approaches such as FISTA and ADMM in most settings.
09/17/2018 ∙ by Andre Manoel, et al.
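To illustrate the splitting the scheme is related to, here is a textbook Peaceman-Rachford iteration on a small lasso-type problem. This is a baseline sketch with a separable penalty and a fixed step size, not the paper's adaptive, VAMP-like iteration:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def peaceman_rachford_lasso(A, b, lam, gamma=1.0, iters=500):
    """Peaceman-Rachford splitting for min_x 0.5*||Ax-b||^2 + lam*||x||_1,
    alternating reflected proximal steps on the two terms."""
    n = A.shape[1]
    M = np.linalg.inv(np.eye(n) + gamma * A.T @ A)  # prox of the quadratic
    Atb = A.T @ b
    z = np.zeros(n)
    for _ in range(iters):
        x = M @ (z + gamma * Atb)        # prox of the data-fit term
        r = 2 * x - z                    # first reflection
        y = soft(r, gamma * lam)         # prox of the l1 penalty
        z = 2 * y - r                    # second reflection
    return y
```

The fixed step size gamma is the part the paper's scheme replaces with adaptively set step sizes.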

Group-level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates
Magnetoencephalography (MEG) and electroencephalography (EEG) are non-invasive modalities that measure the weak electromagnetic fields generated by neural activity. Inferring the location of the current sources that generated these fields is an ill-posed inverse problem known as source imaging. When considering a group study, a baseline approach consists of carrying out the estimation of these sources independently for each subject. The ill-posedness of each problem is typically addressed using sparsity-promoting regularizations. A straightforward way to define a common pattern for these sources is then to average them. A more advanced alternative relies on a joint localization of sources for all subjects taken together, by enforcing some similarity across all estimated sources. An important advantage of this approach is that it consists of a single estimation in which all measurements are pooled together, making the inverse problem better posed. Such a joint estimation, however, poses a few challenges, notably the selection of a valid regularizer that can quantify such spatial similarities. We propose in this work a new procedure that can do so while taking into account the geometrical structure of the cortex. We call this procedure Minimum Wasserstein Estimates (MWE). The benefits of this model are twofold. First, joint inference makes it possible to pool together the data of different brain geometries, accumulating more spatial information. Second, MWE are defined through Optimal Transport (OT) metrics, which provide a tool to model spatial proximity between cortical sources of different subjects, hence not enforcing identical source locations in the group. These benefits allow MWE to be more accurate than standard MEG source localization techniques. To support these claims, we perform source localization on realistic MEG simulations based on forward operators derived from MRI scans.
On a visual task dataset, we demonstrate how MWE infer neural patterns similar to functional Magnetic Resonance Imaging (fMRI) maps.
02/13/2019 ∙ by Hicham Janati, et al.
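The OT metric at the heart of MWE can be illustrated with a minimal entropic-regularized (Sinkhorn) computation. This toy sketch on a 1-D grid is our own illustration, not the paper's implementation, which works on cortical geometry:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropic-regularized OT cost between histograms a and b with
    ground cost C, via Sinkhorn's alternating scaling iterations."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return (P * C).sum()
```

Unlike a Euclidean comparison of source maps, the cost grows with how far mass must travel, which is what lets an OT-based penalty tolerate slightly shifted sources across subjects.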

Extracting Universal Representations of Cognition across Brain-Imaging Studies
The size of publicly available data in cognitive neuroimaging has increased greatly in recent years, thanks to strong research and community efforts. Exploiting this wealth of data demands new methods to turn the heterogeneous cognitive information held in different task-fMRI studies into common, universal cognitive models. In this paper, we pool data from large fMRI repositories to predict psychological conditions from statistical brain maps across different studies and subjects. We leverage advances in deep learning, intermediate representations, and multi-task learning to learn universal, interpretable, low-dimensional representations of brain images, usable for predicting psychological stimuli in all input studies. The method improves decoding performance for 80% of the studies, enabling information to flow from every study to the others: it notably gives a strong performance boost when decoding studies of small size. The trained low-dimensional representation (task-optimized networks) is interpretable as a set of basis cognitive dimensions relevant to meaningful categories of cognitive stimuli. Our approach opens new ways of extracting information from brain maps, overcoming the low power of typical fMRI studies.
09/17/2018 ∙ by Arthur Mensch, et al.
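The multi-task idea of a shared low-dimensional representation with per-study classifiers can be sketched in a toy form. All names, the linear embedding, and the plain-gradient training here are our illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def train_shared_decoder(studies, d=2, lr=0.5, epochs=500, seed=0):
    """Jointly train a shared low-dimensional embedding W and one
    logistic head per study on a list of (X, y) binary-label studies."""
    rng = np.random.default_rng(seed)
    p = studies[0][0].shape[1]
    W = 0.1 * rng.normal(size=(p, d))               # shared representation
    heads = [0.1 * rng.normal(size=d) for _ in studies]
    for _ in range(epochs):
        for s, (X, y) in enumerate(studies):
            H = X @ W                               # shared embedding
            err = 1 / (1 + np.exp(-(H @ heads[s]))) - y
            g_head = H.T @ err / len(y)
            g_W = X.T @ np.outer(err, heads[s]) / len(y)
            heads[s] -= lr * g_head                 # study-specific update
            W -= lr * g_W                           # update shared by all studies
    return W, heads
```

Because every study's gradient passes through W, small studies benefit from the representation shaped by the large ones, which is the transfer effect the abstract describes.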

Optimizing deep video representation to match brain activity
The comparison of observed brain activity with the statistics generated by artificial intelligence systems is useful to probe brain functional organization under ecological conditions. Here we study fMRI activity in ten subjects watching color natural movies and compute deep representations of these movies with an architecture that relies on optical flow and image content. The association of activity in visual areas with the different layers of the deep architecture displays complexity-related contrasts across visual areas and reveals a striking foveal/peripheral dichotomy.
09/07/2018 ∙ by Hugo Richard, et al.
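Associating layer activations with voxel activity is typically cast as an encoding model. A minimal closed-form ridge-regression sketch (our simplification; the paper's optical-flow and image-content feature extraction is not shown):

```python
import numpy as np

def ridge_encoding(F, Y, alpha=1.0):
    """Fit a ridge-regression encoding model mapping deep-network
    features F (time x features) to voxel activity Y (time x voxels),
    returning the (features x voxels) weight matrix."""
    k = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(k), F.T @ Y)
```

Fitting one such model per layer and comparing predictive accuracy across visual areas is the standard way such layer-to-area contrasts are quantified.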

Learning Neural Representations of Human Cognition across Many fMRI Studies
Cognitive neuroscience is enjoying a rapid increase in extensive public brain-imaging datasets. This opens the door to large-scale statistical models. Finding a unified perspective for all available data calls for scalable and automated solutions to an old challenge: how to aggregate heterogeneous information on brain function into a universal cognitive system that relates mental operations/cognitive processes/psychological tasks to brain networks? We cast this challenge as a machine-learning approach to predict conditions from statistical brain maps across different studies. For this, we leverage multi-task learning and multi-scale dimension reduction to learn low-dimensional representations of brain images that carry cognitive information and can be robustly associated with psychological stimuli. Our multi-dataset classification model achieves the best prediction performance on several large reference datasets compared to models without cognitive-aware low-dimensional representations; it brings a substantial performance boost to the analysis of small datasets, and can be introspected to identify universal template cognitive concepts.
10/31/2017 ∙ by Arthur Mensch, et al.

Stochastic Subsampling for Factorizing Huge Matrices
We present a matrix-factorization algorithm that scales to input matrices with huge numbers of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with convergence guarantees to reach a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB) and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speedups compared to state-of-the-art algorithms.
01/19/2017 ∙ by Arthur Mensch, et al.
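The streamed, row-subsampled update can be sketched in a heavily simplified toy form. Everything here (no sparsity or non-negativity penalties, plain gradient steps, per-step column normalization) is our assumption for illustration, not the paper's algorithm:

```python
import numpy as np

def subsampled_mf(X, k=5, steps=2000, frac=0.5, lr=0.01, seed=0):
    """Stream columns of X and, at each step, update only a random subset
    of the rows of the dictionary D, so each update costs a fraction of a
    full-row update."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = rng.normal(size=(m, k))
    for _ in range(steps):
        j = rng.integers(n)                              # one streamed column
        rows = rng.choice(m, int(frac * m), replace=False)
        code, *_ = np.linalg.lstsq(D[rows], X[rows, j], rcond=None)
        resid = D[rows] @ code - X[rows, j]
        D[rows] -= lr * np.outer(resid, code)            # subsampled update
        D /= np.linalg.norm(D, axis=0)                   # keep atoms normalized
    return D
```

The point of the subsampling is that the per-step cost scales with frac * m rather than m, while the dictionary still converges toward the data subspace over many steps.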

Deriving reproducible biomarkers from multi-site resting-state data: an autism-based example
Resting-state functional Magnetic Resonance Imaging (R-fMRI) holds the promise of revealing functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multi-faceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These R-fMRI pipelines build participant-specific connectomes from functionally defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or from different, unseen ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction. First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large R-fMRI datasets outperform reference atlases in the classification tasks.
11/18/2016 ∙ by Alexandre Abraham, et al.
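The connectome step of such pipelines can be sketched as follows (a minimal version: the brain-area definition and classifier choices the paper benchmarks are left out):

```python
import numpy as np

def connectome_features(ts):
    """Build a participant's connectome from region time series
    (time x regions) and vectorize its upper triangle into one
    feature vector for that participant."""
    C = np.corrcoef(ts, rowvar=False)       # region-by-region correlation
    iu = np.triu_indices_from(C, k=1)       # skip the trivial diagonal
    return C[iu]
```

Stacking one such vector per participant yields the feature matrix that is then handed to a linear classifier trained to separate controls from patients.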

Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals
In this work, we revisit fast dimension reduction approaches, such as random projections and random sampling. Our goal is to summarize the data to decrease the computational costs and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, such as with images. We focus on this setting and investigate feature clustering schemes for data reduction that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address this by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have a quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly available data that illustrate the computational efficiency and the denoising properties of the resulting dimension reduction scheme.
09/15/2016 ∙ by Andrés Hoyos-Idrobo, et al.
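One round of nearest-neighbor agglomeration can be sketched as follows. This quadratic-cost toy is our illustration of the idea, not ReNA itself, which restricts neighbors to a sparse structural graph (e.g., the image grid) to stay linear-time:

```python
import numpy as np

def one_nn_merge(X):
    """Link each feature (column of X) to its nearest column and average
    the connected groups of the resulting 1-nearest-neighbor graph."""
    p = X.shape[1]
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)                        # nearest neighbor per feature
    parent = list(range(p))                      # union-find over the 1-NN graph
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(p):
        parent[find(i)] = find(nn[i])
    roots = [find(i) for i in range(p)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    labels = np.array([ids[r] for r in roots])
    reduced = np.zeros((X.shape[0], len(ids)))
    for j in range(p):
        reduced[:, labels[j]] += X[:, j]
    return reduced / np.bincount(labels), labels
```

Averaging correlated neighboring features also attenuates independent noise, which is the denoising effect the abstract refers to.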

Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
Decoding, i.e., prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyperparameters. This paper is a review of cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI and MEG) and simulations. Theory and experiments outline that the popular "leave-one-out" strategy leads to unstable and biased estimates, and that a repeated random splits method should be preferred. Experiments outline the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Finally, we show how cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.
06/16/2016 ∙ by Gaël Varoquaux, et al.
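The recommended repeated-random-splits strategy is easy to implement. A minimal sketch (the `fit`/`predict` pair is a stand-in interface for any decoder, not one of the paper's):

```python
import numpy as np

def repeated_split_scores(X, y, fit, predict, n_splits=50, test_frac=0.2, seed=0):
    """Accuracy over repeated random train/test splits; the spread of the
    returned scores gives a sense of the cross-validation error bars."""
    rng = np.random.default_rng(seed)
    n_test = int(test_frac * len(y))
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))           # fresh random split each time
        test, train = perm[:n_test], perm[n_test:]
        model = fit(X[train], y[train])
        scores.append(float((predict(model, X[test]) == y[test]).mean()))
    return np.array(scores)
```

Compared with leave-one-out, each score here is computed on a test set of many samples, and averaging over many independent splits stabilizes the estimate.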

Dictionary Learning for Massive Matrix Factorization
Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective for data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both of them. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets, which could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data and on matrix completion problems for recommender systems, where we obtain significant speedups compared to state-of-the-art coordinate descent methods.
05/03/2016 ∙ by Arthur Mensch, et al.