Fast clustering for scalable statistical analysis on structured images

11/16/2015
by Bertrand Thirion, et al.

The use of brain images as markers for diseases or behavioral differences is challenged by small effect sizes and the ensuing lack of statistical power, an issue that has incited researchers to rely more systematically on large cohorts. Coupled with increases in resolution, this leads to very large datasets. A striking example in brain imaging is the Human Connectome Project: 20 terabytes of data and growing. The resulting data deluge poses severe challenges for the tractability of some processing steps (discriminant analysis, multivariate models) because of their memory demands. In this work, we revisit dimension reduction approaches, such as random projections, with the aim of replacing costly function evaluations with cheaper ones while decreasing memory requirements. Specifically, we investigate alternative schemes, based on fast clustering, that are well suited to signals exhibiting a strong spatial structure, such as anatomical and functional brain images. Our contribution is twofold: i) we propose a linear-time clustering scheme that bypasses the percolation issues inherent in fast clustering algorithms and thus provides compressions nearly as good as traditional quadratic-complexity variance-minimizing clustering schemes; ii) we show that cluster-based compression can have the virtuous effect of removing high-frequency noise, actually improving subsequent estimation steps. As a consequence, the proposed approach yields very accurate models on several large-scale problems, with impressive gains in computational efficiency, making it possible to analyze large datasets.
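
To make the idea of cluster-based compression concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes a single pass of nearest-neighbor grouping on a 4-connected pixel grid, followed by averaging pixel values within each cluster. The function names (`grid_edges`, `nearest_neighbor_labels`, `compress`, `demo`) and the synthetic block-constant images are illustrative assumptions, not taken from the paper.

```python
import random

def grid_edges(height, width):
    """List pairs of neighboring pixel indices on a 4-connected 2D grid."""
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:
                edges.append((i, i + 1))
            if r + 1 < height:
                edges.append((i, i + width))
    return edges

def nearest_neighbor_labels(columns, edges):
    """Link each pixel to its most similar grid neighbor, then label the
    connected components of those links. columns[i] holds the values of
    pixel i across all images. Cost grows linearly with the number of edges."""
    n_pixels = len(columns)
    best = [(float("inf"), -1)] * n_pixels
    for i, j in edges:
        d = sum((a - b) ** 2 for a, b in zip(columns[i], columns[j]))
        if d < best[i][0]:
            best[i] = (d, j)
        if d < best[j][0]:
            best[j] = (d, i)
    parent = list(range(n_pixels))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (_, j) in enumerate(best):
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_pixels)]

def compress(image, labels, n_clusters):
    """Reduce one image to its per-cluster mean values."""
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for value, k in zip(image, labels):
        sums[k] += value
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]

def mse(images_a, images_b):
    """Mean squared difference over all pixels of all images."""
    total = sum((a - b) ** 2
                for img_a, img_b in zip(images_a, images_b)
                for a, b in zip(img_a, img_b))
    return total / (len(images_a) * len(images_a[0]))

def demo(seed=0, height=16, width=16, block=4, n_images=20, noise=0.5):
    """Synthetic test: block-constant images plus independent Gaussian noise."""
    rng = random.Random(seed)
    clean, noisy = [], []
    for _ in range(n_images):
        amplitude = {}
        image = []
        for r in range(height):
            for c in range(width):
                key = (r // block, c // block)
                if key not in amplitude:
                    amplitude[key] = rng.uniform(-3.0, 3.0)
                image.append(amplitude[key])
        clean.append(image)
        noisy.append([v + rng.gauss(0.0, noise) for v in image])
    columns = [list(col) for col in zip(*noisy)]
    labels = nearest_neighbor_labels(columns, grid_edges(height, width))
    n_clusters = max(labels) + 1
    reduced = [compress(image, labels, n_clusters) for image in noisy]
    restored = [[z[k] for k in labels] for z in reduced]
    return height * width, n_clusters, mse(noisy, clean), mse(restored, clean)

if __name__ == "__main__":
    n_pixels, n_clusters, err_noisy, err_restored = demo()
    print(n_pixels, "pixels ->", n_clusters, "clusters")
    print("error before:", err_noisy, "after:", err_restored)
```

Each image is stored with one value per cluster instead of one per pixel, and on this synthetic, spatially structured data the within-cluster averaging also reduces the noise. That mirrors the denoising effect described in the abstract, although only in a toy setting.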

