Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

11/07/2018
by Minh Tuan Doan et al.

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms have limitations in finding clusters in non-disjoint subspaces and in scaling to large datasets, which impinge on their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is carried out using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on real-life data, including ten genomic datasets and a car parking occupancy dataset.
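The abstract does not give the algorithm's details, so the following is only a rough sketch of how the search for higher-dimensional clusters can be cast as frequent pattern mining. Each point becomes a transaction whose items are the low-dimensional base clusters it belongs to, and a small FP-tree (FP-growth) miner finds sets of base clusters shared by at least `min_pts` points. The binning rule (equal-width bins on features scaled to [0, 1]), the `min_pts` threshold, and the names `to_transactions`, `fp_growth` and `subspace_clusters` are hypothetical assumptions for illustration, not the authors' method.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return the header table and item supports."""
    support = defaultdict(int)
    for items, cnt in transactions:
        for it in items:
            support[it] += cnt
    frequent = {it: s for it, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node carrying that item
    for items, cnt in transactions:
        # Insert frequent items in descending support order so common prefixes share nodes.
        path = sorted((it for it in items if it in frequent), key=lambda it: (-frequent[it], it))
        node = root
        for it in path:
            if it not in node.children:
                node.children[it] = Node(it, node)
                header[it].append(node.children[it])
            node = node.children[it]
            node.count += cnt
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    header, frequent = build_tree(transactions, min_support)
    results = {}
    for item, sup in frequent.items():
        itemset = suffix + (item,)
        results[frozenset(itemset)] = sup
        # Conditional pattern base: the prefix path above each node holding `item`.
        cond = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:  # stop at the root, whose item is None
                path.append(p.item)
                p = p.parent
            if path:
                cond.append((path, node.count))
        results.update(fp_growth(cond, min_support, itemset))
    return results


def to_transactions(data, n_bins, min_pts):
    """Map each point to the dense 1-D base clusters it falls into.

    Hypothetical assumption: features are scaled to [0, 1], and a base cluster is
    an equal-width bin (dimension, bin index) holding at least min_pts points.
    """
    n_dims = len(data[0])
    labels = [[min(int(row[d] * n_bins), n_bins - 1) for row in data] for d in range(n_dims)]
    counts = defaultdict(int)
    for d in range(n_dims):
        for b in labels[d]:
            counts[(d, b)] += 1
    dense = {key for key, c in counts.items() if c >= min_pts}
    return [[(d, labels[d][i]) for d in range(n_dims) if (d, labels[d][i]) in dense]
            for i in range(len(data))]


def subspace_clusters(data, n_bins=4, min_pts=3):
    """Frequent sets of two or more base clusters become higher-dimensional clusters."""
    txns = [set(t) for t in to_transactions(data, n_bins, min_pts)]
    patterns = fp_growth([(t, 1) for t in txns], min_pts)
    clusters = []
    for itemset in patterns:
        if len(itemset) < 2:
            continue  # single base clusters are the low-dimensional starting points
        # A point lies in one bin per dimension, so a frequent itemset spans distinct dimensions.
        dims = tuple(sorted(d for d, _ in itemset))
        members = [i for i, t in enumerate(txns) if itemset <= t]
        clusters.append((dims, members))
    return clusters


if __name__ == "__main__":
    # Toy data scaled to [0, 1]: points 0-2 agree in dimensions 0 and 1 only.
    data = [
        [0.10, 0.12, 0.90],
        [0.15, 0.10, 0.05],
        [0.12, 0.20, 0.55],
        [0.90, 0.60, 0.30],
        [0.55, 0.95, 0.80],
        [0.80, 0.40, 0.35],
    ]
    print(subspace_clusters(data, n_bins=4, min_pts=3))  # [((0, 1), [0, 1, 2])]
```

On the toy data, points 0 to 2 share a dense bin in dimensions 0 and 1 but scatter in dimension 2, so the sketch reports a single cluster in subspace (0, 1). The FP-tree is what keeps this tractable in the sketch: transactions that share base clusters share tree prefixes, so candidate higher-dimensional subspaces are never enumerated explicitly.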


research
08/01/2013

Learning Robust Subspace Clustering

We propose a low-rank transformation-learning framework to robustify sub...
research
08/28/2018

Probabilistic Sparse Subspace Clustering Using Delayed Association

Discovering and clustering subspaces in high-dimensional data is a funda...
research
10/09/2017

Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data

The emergence of high-dimensional data in various areas has brought new ...
research
10/27/2021

Mining frequency-based sequential trajectory co-clusters

Co-clustering is a specific type of clustering that addresses the proble...
research
11/13/2020

Efficient Subspace Search in Data Streams

In the real world, data streams are ubiquitous – think of network traffi...
research
06/26/2017

Efficient Manifold and Subspace Approximations with Spherelets

Data lying in a high-dimensional ambient space are commonly thought to h...
research
11/18/2019

Subspace Shapes: Enhancing High-Dimensional Subspace Structures via Ambient Occlusion Shading

We test the hypothesis whether transforming a data matrix into a 3D shad...
