Large-scale subspace clustering using sketching and validation

by Panagiotis A. Traganitis, et al.

The massive volume of data generated and communicated today poses major processing challenges. Although subspace clustering (SC) methods can successfully classify nonlinearly separable objects in various settings, they incur prohibitively high computational complexity on large-scale data. Inspired by the random sampling and consensus (RANSAC) approach to robust regression, the present paper introduces a randomized scheme for SC, termed sketching and validation (SkeVa-)SC, tailored for large-scale data. At the heart of SkeVa-SC lies a randomized scheme that approximates the underlying probability density function of the observed data via kernel smoothing arguments. Sparsity in data representations is also exploited to reduce the computational burden of SC while maintaining high clustering accuracy. Performance analysis as well as extensive numerical tests on synthetic and real data corroborate the potential of SkeVa-SC and its competitive performance relative to state-of-the-art scalable SC approaches.

Keywords: Subspace clustering, big data, kernel smoothing, randomization, sketching, validation, sparsity.
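
The sketch-and-validate idea described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: it repeatedly draws a small random subset (a "sketch") of the data, scores it on a second random subset using an unnormalized Gaussian kernel density surrogate, and keeps the best-scoring draw. The function names `kernel_score` and `sketch_and_validate`, the Gaussian kernel, the scoring rule, and all parameters are hypothetical choices made for illustration. The subsequent clustering of the chosen sketch and the sparsity-based steps mentioned in the abstract are omitted.

```python
import math
import random


def kernel_score(sketch, validation, bandwidth):
    """Average Gaussian-kernel similarity between validation and sketch points.

    This is an unnormalized kernel density surrogate (assumed for illustration):
    the normalizing constant is the same for every draw, so it is dropped.
    """
    total = 0.0
    for v in validation:
        for s in sketch:
            sq_dist = sum((a - b) ** 2 for a, b in zip(v, s))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(sketch) * len(validation))


def sketch_and_validate(data, sketch_size, validation_size, num_draws, bandwidth, rng):
    """Return the indices and score of the best-scoring random sketch.

    Hypothetical helper: each draw samples a sketch and a validation set
    without replacement, and the draw with the highest score is kept.
    """
    best_idx, best_score = None, -math.inf
    for _ in range(num_draws):
        sketch_idx = rng.sample(range(len(data)), sketch_size)
        val_idx = rng.sample(range(len(data)), validation_size)
        score = kernel_score(
            [data[i] for i in sketch_idx],
            [data[i] for i in val_idx],
            bandwidth,
        )
        if score > best_score:
            best_idx, best_score = sketch_idx, score
    return best_idx, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: points on two lines (1-D subspaces) in 3-D space.
    line_a = [[t, 2.0 * t, 0.0] for t in (rng.gauss(0, 1) for _ in range(50))]
    line_b = [[0.0, t, -t] for t in (rng.gauss(0, 1) for _ in range(50))]
    idx, score = sketch_and_validate(line_a + line_b, 10, 15, 5, 1.0, rng)
    print(sorted(idx), score)
```

In a full pipeline, a subspace clustering method would then run only on the selected sketch, which is where the computational savings would come from; how the remaining points are assigned to clusters is left open here.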





Sketched Subspace Clustering

The immense amount of daily generated and communicated data presents uni...

Probabilistically Sampled and Spectrally Clustered Plant Genotypes using Phenotypic Characteristics

Clustering genotypes based upon their phenotypic characteristics is used...

Sketch and Validate for Big Data Clustering

In response to the need for learning tools tuned to big data analytics, ...

Large-Scale Sparse Subspace Clustering Using Landmarks

Subspace clustering methods based on expressing each data point as a lin...

A Design-Based Perspective on Synthetic Control Methods

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Con...

Large-scale Multi-view Subspace Clustering in Linear Time

A plethora of multi-view subspace clustering (MVSC) methods have been pr...

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the obs...