IPD: An Incremental Prototype-based DBSCAN for large-scale data with cluster representatives

02/16/2022
by Jayasree Saha, et al.

DBSCAN is a fundamental density-based clustering technique that identifies clusters of arbitrary shape. However, it becomes infeasible on big data. Centroid-based clustering, on the other hand, is important for detecting patterns in a dataset, since unprocessed data points can be labeled by their nearest centroid; however, it cannot detect non-spherical clusters. For large data, it is not feasible to compute and store the labels of every sample. Instead, labels can be computed as and when the information is required. This can be accomplished when clustering acts as a tool to identify cluster representatives and each query is served by assigning the cluster label of its nearest representative. In this paper, we propose an Incremental Prototype-based DBSCAN (IPD) algorithm, which is designed to identify arbitrary-shaped clusters in large-scale data. Additionally, it chooses a set of representatives for each cluster.
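The query step described in the abstract, labeling a point on demand by its nearest cluster representative, can be illustrated with a minimal sketch. This is not the IPD algorithm itself: it assumes the representatives and their cluster labels are already available, and the function name, the Euclidean metric, and the sample data are all hypothetical choices made for illustration.

```python
import math

def nearest_representative_label(query, representatives, rep_labels):
    """Return the cluster label of the representative closest to `query`.

    Assumption: Euclidean distance; the abstract does not fix a metric.
    """
    # Index of the representative with the smallest distance to the query.
    best = min(
        range(len(representatives)),
        key=lambda i: math.dist(query, representatives[i]),
    )
    return rep_labels[best]

# Hypothetical representatives: two per cluster, so a cluster is not
# summarized by a single centroid.
representatives = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
rep_labels = [0, 0, 1, 1]

# Labels are computed only when a query arrives, never stored for all points.
queries = [(0.4, 0.1), (10.6, 9.8)]
labels = [
    nearest_representative_label(q, representatives, rep_labels)
    for q in queries
]
```

Keeping several representatives per cluster is what would let this lookup follow non-spherical cluster shapes, which a single centroid per cluster cannot do.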


