Selection of the Number of Clusters in Functional Data Analysis

05/02/2019
by Adriano Zanin Zambom, et al.

Identifying the number K of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures for selecting the optimal K. The main idea is to combine two test statistics, one measuring the lack of parallelism between curves and one measuring the mean distance between them, and to use the combination to compute criteria such as the within- and between-cluster sums of squares. Simulations in challenging scenarios suggest that procedures using this measure detect the correct number of clusters more frequently than existing methods in the literature. The proposed method is illustrated on several real datasets.
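The abstract does not give the two test statistics, so the sketch below is only a rough illustration of the general idea under stated assumptions, not the authors' method. It assumes all curves are sampled on a common grid and uses hypothetical stand-ins: "lack of parallelism" is taken as the variance of the pointwise difference between two curves (zero when they differ only by a constant shift), and "mean distance" as the squared mean of that difference. The names `dissimilarity`, `sums_of_squares`, `calinski_harabasz`, `select_k` and the weight `w` are illustrative. The Calinski-Harabasz index is one example of a criterion built from within- and between-cluster sums of squares, and the candidate partition for each K is assumed to come from some clustering procedure run beforehand.

```python
from itertools import combinations


def dissimilarity(x, y, w=1.0):
    """Hypothetical combined squared dissimilarity between two curves on a common grid."""
    d = [a - b for a, b in zip(x, y)]
    m = sum(d) / len(d)
    # Stand-in for lack of parallelism: variance of the pointwise difference,
    # which is zero when y is x shifted by a constant.
    lack_of_parallelism = sum((v - m) ** 2 for v in d) / len(d)
    # Stand-in for mean distance: squared mean of the pointwise difference.
    mean_distance = m ** 2
    return lack_of_parallelism + w * mean_distance


def sums_of_squares(curves, labels, w=1.0):
    """Within- and between-cluster sums of squares from pairwise dissimilarities."""
    n = len(curves)
    # Total sum of squares in pairwise form: (1/n) * sum over pairs i < j.
    total = sum(dissimilarity(curves[i], curves[j], w)
                for i, j in combinations(range(n), 2)) / n
    within = 0.0
    for k in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == k]
        # Singleton clusters contribute zero (no pairs).
        within += sum(dissimilarity(curves[i], curves[j], w)
                      for i, j in combinations(members, 2)) / len(members)
    return within, total - within


def calinski_harabasz(curves, labels, w=1.0):
    """Ratio of between to within dispersion; requires 2 <= K < n and within > 0."""
    n, k = len(curves), len(set(labels))
    within, between = sums_of_squares(curves, labels, w)
    return (between / (k - 1)) / (within / (n - k))


def select_k(curves, partitions, w=1.0):
    """Pick the K whose candidate partition maximizes the index.

    `partitions` maps each candidate K to a list of cluster labels.
    """
    return max(partitions, key=lambda k: calinski_harabasz(curves, partitions[k], w))


if __name__ == "__main__":
    group_a = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    group_b = [[v + 10 for v in c] for c in group_a]  # same shapes, shifted up by 10
    curves = group_a + group_b
    partitions = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
    print(select_k(curves, partitions))
```

In this toy example the two groups differ by a constant shift, so they are separated by the mean-distance term. With `w=1.0`, the combined stand-in equals the mean squared pointwise distance between curves, because the variance of the difference plus its squared mean equals the mean of its square. Other values of `w` reweight level differences relative to shape differences.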
