Subspace Clustering of Very Sparse High-Dimensional Data

01/25/2019
by   Hankui Peng, et al.

In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.
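To make the representation described above concrete, the following is a minimal sketch of a binary bag-of-words encoding for a handful of short texts. The product names are invented for illustration and are not taken from the paper's Amazon dataset, and the sketch covers only the vectorial representation, not the proposed clustering algorithm. All variable names are hypothetical.

```python
# Illustrative only: toy product names, not data from the paper.
texts = [
    "wireless optical mouse",
    "stainless steel water bottle",
    "usb wireless keyboard",
    "insulated water bottle",
]

# One dimension per unique term in the corpus.
vocab = sorted({term for text in texts for term in text.split()})
index = {term: j for j, term in enumerate(vocab)}

# Binary bag-of-words stored sparsely: for each text, keep only the
# column indices of the terms it contains (every nonzero entry is 1).
rows = [sorted({index[term] for term in text.split()}) for text in texts]

n_docs, n_terms = len(texts), len(vocab)
nonzeros = sum(len(row) for row in rows)
density = nonzeros / (n_docs * n_terms)

print(f"{n_docs} texts x {n_terms} terms, {nonzeros} nonzeros, density {density:.3f}")
```

Even in this toy corpus most entries are zero. With a realistic vocabulary of many thousands of terms and only a few words per text, the fraction of nonzero entries becomes far smaller, which is the regime the abstract refers to.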

research
03/05/2012

Sparse Subspace Clustering: Algorithm, Theory, and Applications

In many real-world problems, we are dealing with collections of high-dim...
research
03/15/2018

Fast Subspace Clustering Based on the Kronecker Product

Subspace clustering is a useful technique for many computer vision appli...
research
01/31/2020

Enhancement of Short Text Clustering by Iterative Classification

Short text clustering is a challenging task due to the lack of signal co...
research
03/05/2012

Subspace clustering of high-dimensional data: a predictive approach

In several application domains, high-dimensional observations are collec...
research
10/25/2012

A Biomimetic Approach Based on Immune Systems for Classification of Unstructured Data

In this paper we present the results of unstructured data clustering in ...
research
10/09/2017

Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data

The emergence of high-dimensional data in various areas has brought new ...
research
11/03/2020

Kernel Two-Dimensional Ridge Regression for Subspace Clustering

Subspace clustering methods have been widely studied recently. When the ...
