Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals

09/15/2016
by Andrés Hoyos-Idrobo, et al.
In this work, we revisit fast dimension reduction approaches, such as random projections and random sampling. Our goal is to summarize the data to decrease the computational cost and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, as images do. We focus on this setting and investigate feature clustering schemes for data reduction that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address this by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly available data that illustrate the computational efficiency and the denoising properties of the resulting dimension reduction scheme.
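
To make the idea of recursive nearest-neighbor agglomeration concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `rena_sketch`, the edge-list representation of the feature neighborhood, the use of Euclidean distance between cluster means, and the rule of merging the closest pairs first when the target count is about to be reached are all choices made for this example. It is also written for readability and makes no attempt to reach the linear running time claimed in the abstract.

```python
from math import dist


def rena_sketch(features, edges, n_clusters):
    """Cluster features by repeatedly merging nearest connected neighbors.

    features   : list of per-feature vectors (one value per sample).
    edges      : iterable of (i, j) pairs saying which features are neighbors
                 (for an image, adjacent pixels). Assumed input format.
    n_clusters : number of clusters to stop at.
    Returns a list giving the cluster label of each original feature.
    """
    labels = list(range(len(features)))
    centers = [list(f) for f in features]   # current cluster means
    sizes = [1] * len(features)             # features per cluster
    edges = {(min(i, j), max(i, j)) for i, j in edges if i != j}

    while len(centers) > n_clusters and edges:
        # 1. Each cluster selects its closest connected neighbor.
        best = {}
        for i, j in edges:
            d = dist(centers[i], centers[j])
            for a, b in ((i, j), (j, i)):
                if a not in best or d < best[a][0]:
                    best[a] = (d, b)
        nn_edges = sorted({(d, min(a, b), max(a, b))
                           for a, (d, b) in best.items()})

        # 2. Merge along nearest-neighbor edges (union-find), closest
        #    pairs first, stopping once n_clusters is reached.
        parent = list(range(len(centers)))

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        remaining = len(centers)
        for _, a, b in nn_edges:
            if remaining == n_clusters:
                break
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra
                remaining -= 1

        # 3. Reduce: one size-weighted mean per merged cluster, and a
        #    reduced neighborhood graph between the new clusters.
        roots = sorted({find(a) for a in range(len(centers))})
        new_id = {r: k for k, r in enumerate(roots)}
        sums = [[0.0] * len(centers[0]) for _ in roots]
        new_sizes = [0] * len(roots)
        for a, center in enumerate(centers):
            k = new_id[find(a)]
            new_sizes[k] += sizes[a]
            for s, value in enumerate(center):
                sums[k][s] += sizes[a] * value
        labels = [new_id[find(old)] for old in labels]
        reduced = {(new_id[find(i)], new_id[find(j)]) for i, j in edges}
        edges = {(min(p, q), max(p, q)) for p, q in reduced if p != q}
        centers = [[v / new_sizes[k] for v in row]
                   for k, row in enumerate(sums)]
        sizes = new_sizes
    return labels


# Toy usage: six features on a chain, observed on two samples.
toy_features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                [10.0, 10.1], [10.1, 10.0], [10.2, 10.1]]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
toy_labels = rena_sketch(toy_features, toy_edges, n_clusters=2)
print(toy_labels)
```

Because every cluster merges only with its single nearest neighbor in each round, many small merges happen in parallel and the number of clusters drops quickly, which is the intuition behind avoiding one giant cluster. Averaging the features that share a label then gives the reduced representation discussed in the abstract.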

research
11/16/2015

Fast clustering for scalable statistical analysis on structured images

The use of brain images as markers for diseases or behavioral difference...
research
08/07/2015

Dimension reduction for model-based clustering

We introduce a dimension reduction method for visualizing the clustering...
research
02/17/2022

Dimension Reduction via Supervised Clustering of Regression Coefficients: A Review

The development and use of dimension reduction methods is prevalent in m...
research
11/07/2012

Randomized Dimension Reduction on Massive Data

Scalability of statistical estimators is of increasing importance in mod...
research
09/04/2017

Persistent homology for low-complexity models

We show that recent results on randomized dimension reduction schemes th...
research
11/28/2018

orthoDr: semiparametric dimension reduction via orthogonality constrained optimization

orthoDr is a package in R that solves dimension reduction problems using...
research
04/20/2023

Ellipsoid fitting with the Cayley transform

We introduce an algorithm, Cayley transform ellipsoid fitting (CTEF), th...
