
Clustering of Big Data with Mixed Features

11/11/2020
by Joshua Tobin, et al.

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means and hence are sensitive to initialisation, detect only spherical clusters, and require the unknown number of clusters to be specified a priori. We develop a new clustering algorithm for large data of mixed type that improves the applicability and efficiency of the peak-finding (density-peaks) technique. The improvements are threefold: (1) the new algorithm is applicable to mixed data; (2) it can detect outliers and clusters of relatively low density; and (3) it can determine the correct number of clusters. The computational complexity of the algorithm is greatly reduced by applying a fast k-nearest-neighbors method and by scaling down to component sets. We present experimental results that verify the algorithm works well in practice.

Keywords: Clustering; Big Data; Mixed Attribute; Density Peaks; Nearest-Neighbor Graph; Conductance.
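
The abstract names the ingredients (a distance for mixed attributes, k-nearest-neighbor densities, density peaks) without giving formulas. The sketch below is a minimal illustration under our own assumptions, not the authors' algorithm. It uses a Gower-style distance (numeric features scaled by their range, categorical features scored 0/1 on mismatch) and defines the density rho_i as the inverse mean distance to the k nearest neighbors. It computes delta_i as the distance to the nearest denser point, as in standard density-peaks clustering, and starts a new cluster at every point whose delta exceeds a fixed `delta_threshold`. The names `gower_distance`, `density_peaks` and `delta_threshold` and the threshold rule are hypothetical. The brute-force O(n^2) distance matrix also omits the fast kNN search and component-set reduction that the abstract credits for the speed-up.

```python
def gower_distance(x, y, is_categorical, ranges):
    """Gower-style distance between two mixed-type records (assumed scheme).

    Numeric features contribute |x - y| / range; categorical features
    contribute 0 on a match and 1 on a mismatch. The mean over all
    features is returned, so the distance lies in [0, 1].
    """
    total = 0.0
    for xi, yi, cat, r in zip(x, y, is_categorical, ranges):
        if cat:
            total += 0.0 if xi == yi else 1.0
        elif r > 0:
            total += abs(xi - yi) / r
    return total / len(x)


def density_peaks(records, is_categorical, k=3, delta_threshold=0.3):
    """Hypothetical density-peaks clustering on mixed data.

    Returns (labels, n_clusters). Uses a brute-force O(n^2) distance
    matrix and assumes len(records) > k.
    """
    n = len(records)
    # Numeric columns are scaled by their range; categorical columns get 0.0
    # because their range is never used.
    ranges = []
    for j, cat in enumerate(is_categorical):
        col = [r[j] for r in records]
        ranges.append(0.0 if cat else max(col) - min(col))

    dist = [[gower_distance(records[i], records[j], is_categorical, ranges)
             for j in range(n)] for i in range(n)]

    # kNN density: inverse of the mean distance to the k nearest neighbors.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # delta_i: distance to the nearest point of higher density.
    # Points are visited densest first; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # the global peak has no denser point
            continue
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i] = dist[i][j]
        parent[i] = j

    # Hypothetical rule: the global peak and every point whose delta exceeds
    # delta_threshold start a new cluster; all other points inherit the
    # label of their nearest denser point (already labelled, by visit order).
    labels = [-1] * n
    n_clusters = 0
    for pos, i in enumerate(order):
        if pos == 0 or delta[i] > delta_threshold:
            labels[i] = n_clusters
            n_clusters += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, n_clusters


# Usage: one numeric and one categorical attribute, two separated groups.
records = [
    (1.0, "red"), (1.2, "red"), (0.9, "red"), (1.1, "red"),
    (8.0, "blue"), (8.3, "blue"), (7.9, "blue"), (8.1, "blue"),
]
labels, n_clusters = density_peaks(records, is_categorical=[False, True], k=3)
print(n_clusters, labels)  # prints: 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

Lowering `delta_threshold` makes the sketch split clusters more readily. The abstract states that the authors' algorithm decides the number of clusters itself but does not give the rule, so the fixed threshold here is only a stand-in.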
