
Clustering of Big Data with Mixed Features

by Joshua Tobin, et al.

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require the number of clusters, which is unknown, to be specified a priori. We develop a new clustering algorithm for large data of mixed type, aiming to improve the applicability and efficiency of the peak-finding technique. The improvements are threefold: (1) the new algorithm is applicable to mixed data; (2) the algorithm can detect outliers and clusters of relatively lower density; (3) the algorithm can determine the correct number of clusters. The computational complexity of the algorithm is greatly reduced by applying a fast k-nearest neighbors method and by scaling down to component sets. We present experimental results to verify that our algorithm works well in practice.

Keywords: Clustering; Big Data; Mixed Attribute; Density Peaks; Nearest-Neighbor Graph; Conductance.
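
To make the peak-finding idea on mixed data concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a Gower-style distance for mixed features, estimates density from the mean distance to the k nearest neighbours, and uses a brute-force distance matrix instead of a fast k-nearest neighbors method. The names `mixed_distance`, `knn_density_peaks`, and `top_peaks` are hypothetical, and the number of peaks is passed in by hand rather than decided automatically.

```python
def mixed_distance(a, b, num_idx, cat_idx, ranges):
    """Gower-style distance (an assumption, not taken from the paper):
    range-scaled absolute difference for numeric features, 0/1 mismatch
    for categorical features, averaged over all features."""
    total = 0.0
    for j in num_idx:
        if ranges[j] > 0:
            total += abs(a[j] - b[j]) / ranges[j]
    for j in cat_idx:
        if a[j] != b[j]:
            total += 1.0
    return total / (len(num_idx) + len(cat_idx))


def knn_density_peaks(points, num_idx, cat_idx, k):
    """Return (density, delta) for every point.

    density[i]: inverse of the mean distance to the k nearest neighbours.
    delta[i]:   distance to the nearest point of higher density
                (for the densest point, its largest distance to any point).
    Brute force, O(n^2) distances; needs at least two points.
    """
    n = len(points)
    ranges = {
        j: max(p[j] for p in points) - min(p[j] for p in points)
        for j in num_idx
    }
    dist = [
        [mixed_distance(points[i], points[j], num_idx, cat_idx, ranges)
         for j in range(n)]
        for i in range(n)
    ]

    density = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        # Small constant avoids division by zero for duplicate points.
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # Rank by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        delta[i] = min(dist[i][j] for j in order[:pos])
    return density, delta


def top_peaks(density, delta, n_peaks):
    """Indices of the n_peaks points with the largest density * delta."""
    gamma = [r * d for r, d in zip(density, delta)]
    return sorted(range(len(gamma)), key=lambda i: -gamma[i])[:n_peaks]


# Toy mixed data: one numeric and one categorical feature per point.
points = [
    (0.0, "red"), (0.1, "red"), (0.15, "red"), (0.2, "red"), (0.4, "red"),
    (10.0, "blue"), (10.1, "blue"), (10.15, "blue"), (10.2, "blue"), (10.4, "blue"),
]
density, delta = knn_density_peaks(points, num_idx=[0], cat_idx=[1], k=2)
peaks = top_peaks(density, delta, n_peaks=2)
print([points[i] for i in peaks])
```

Points that are both dense and far from any denser point score highest, so on this toy data the two selected peaks fall in different groups. A full clustering step would then assign every remaining point to the cluster of its nearest higher-density neighbour.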


IPD: An Incremental Prototype based DBSCAN for large-scale data with cluster representatives

DBSCAN is a fundamental density-based clustering technique that identifi...

An Improved Probability Propagation Algorithm for Density Peak Clustering Based on Natural Nearest Neighborhood

Clustering by fast search and find of density peaks (DPC) (Science, 2014) ...

Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a ...

Big-Data Clustering: K-Means or K-Indicators?

The K-means algorithm is arguably the most popular data clustering metho...

Recovering the number of clusters in data sets with noise features using feature rescaling factors

In this paper we introduce three methods for re-scaling data sets aiming...

Revealing Cluster Structures Based on Mixed Sampling Frequencies

This paper proposes a new nonparametric mixed data sampling (MIDAS) mode...