Large-scale subspace clustering using sketching and validation

10/06/2015
by Panagiotis A. Traganitis, et al.

The massive amounts of data generated and communicated nowadays pose major processing challenges. Although subspace clustering (SC) methods can successfully classify nonlinearly separable objects in various settings, they incur prohibitively high computational complexity on large-scale data. Inspired by the random sample consensus (RANSAC) approach to robust regression, the present paper introduces a randomized SC scheme tailored to large-scale data, termed sketching and validation SC (SkeVa-SC). At the heart of SkeVa-SC lies a randomized scheme that approximates the underlying probability density function of the observed data via kernel smoothing. Sparsity in data representations is also exploited to reduce the computational burden of SC while achieving high clustering accuracy. Performance analysis and extensive numerical tests on synthetic and real data corroborate the potential of SkeVa-SC and its competitive performance relative to state-of-the-art scalable SC approaches.

Keywords: Subspace clustering, big data, kernel smoothing, randomization, sketching, validation, sparsity.
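The abstract describes SkeVa-SC only at a high level. The following is a minimal, hypothetical Python sketch of the generic RANSAC-style "sketch, then validate" pattern combined with kernel smoothing, not the authors' algorithm. It repeatedly draws a small random sketch of the data, builds a Gaussian kernel density estimate from it, and keeps the sketch whose estimate assigns the highest average log-density to a disjoint random validation subset. The Gaussian kernel, the fixed `bandwidth`, the held-out log-likelihood score, the toy data, and the names `gaussian_kde` and `sketch_and_validate` are all assumptions for illustration; the actual subspace clustering step that SkeVa-SC performs is omitted.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_kde(sketch: Sequence[Point], bandwidth: float) -> Callable[[Point], float]:
    """Gaussian kernel density estimate (isotropic bandwidth) built from `sketch`."""
    d = len(sketch[0])
    # Normalizer of an isotropic d-dimensional Gaussian kernel, averaged over the sketch.
    norm = len(sketch) * (math.sqrt(2.0 * math.pi) * bandwidth) ** d

    def density(x: Point) -> float:
        total = 0.0
        for s in sketch:
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return density


def sketch_and_validate(
    data: Sequence[Point],
    sketch_size: int,
    val_size: int,
    n_draws: int,
    bandwidth: float,
    seed: int = 0,
) -> Tuple[List[Point], float]:
    """Hypothetical RANSAC-style loop: keep the random sketch whose KDE gives
    the highest mean log-density on a disjoint random validation subset."""
    rng = random.Random(seed)
    best_sketch: List[Point] = []
    best_score = -math.inf
    for _ in range(n_draws):
        # Draw sketch and validation indices together so the two sets are disjoint.
        idx = rng.sample(range(len(data)), sketch_size + val_size)
        sketch = [data[i] for i in idx[:sketch_size]]
        validation = [data[i] for i in idx[sketch_size:]]
        kde = gaussian_kde(sketch, bandwidth)
        # Floor the density so a far-away validation point cannot produce log(0).
        score = sum(math.log(max(kde(v), 1e-300)) for v in validation) / val_size
        if score > best_score:
            best_sketch, best_score = sketch, score
    return best_sketch, best_score


# Toy data: 2-D points near two lines through the origin (two 1-D subspaces).
gen = random.Random(42)
data: List[Point] = []
for _ in range(200):
    t = gen.uniform(-1.0, 1.0)
    ux, uy = (1.0, 0.0) if gen.random() < 0.5 else (0.6, 0.8)
    data.append((t * ux + gen.gauss(0.0, 0.02), t * uy + gen.gauss(0.0, 0.02)))

best, score = sketch_and_validate(data, sketch_size=20, val_size=50, n_draws=10, bandwidth=0.1)
print(len(best), round(score, 3))
```

Each draw only touches `sketch_size + val_size` points, which is the source of the savings this kind of randomized scheme targets; held-out log-density is just one simple validation criterion, and `bandwidth` acts as the kernel smoothing parameter.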
