The effect of measurement error on clustering algorithms

05/24/2020
by   Paulina Pankowska, et al.

Clustering is a popular set of techniques used to separate data into interesting groups for further analysis. Many data sources on which clustering is performed are well known to contain random and systematic measurement errors. Such errors may adversely affect clustering. While several techniques have been developed to deal with this problem, little is known about the effectiveness of these solutions. Moreover, no work to date has examined the effect of systematic errors on clustering solutions. In this paper, we perform a Monte Carlo study to investigate the sensitivity of two common clustering algorithms, GMMs with merging and DBSCAN, to random and systematic error. We find that measurement error is particularly problematic when it is systematic and when it affects all variables in the dataset. For the conditions considered here, we also find that the partition-based GMM with merged components is less sensitive to measurement error than the density-based DBSCAN procedure.
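To make the distinction between random and systematic error concrete, the following is a minimal sketch of a Monte Carlo comparison of this kind. It is not the authors' simulation design: the cluster centres, noise levels, bias size, share of affected records, DBSCAN settings, and the use of the plain Rand index as the agreement measure are all assumptions chosen for illustration. The helper names (`add_random_error`, `add_systematic_error`, `run_study`) are hypothetical, only a simplified pure-Python DBSCAN is included, and the GMM with merging is left out.

```python
import random

def add_random_error(points, sd, rng):
    """Random error: zero-mean Gaussian noise on every coordinate."""
    return [tuple(x + rng.gauss(0.0, sd) for x in p) for p in points]

def add_systematic_error(points, bias, affected):
    """Systematic error: a constant shift on the records indexed in `affected`."""
    return [tuple(x + bias for x in p) if i in affected else p
            for i, p in enumerate(points)]

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)

    def neighbors(i):
        return [j for j in range(n)
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # only core points expand the cluster
                queue.extend(nbj)
    return labels

def rand_index(a, b):
    """Share of point pairs on which two labelings agree (noise counts as a group)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def run_study(reps=20, n_per_cluster=30, seed=0):
    """Mean Rand index against the true grouping, per error condition."""
    rng = random.Random(seed)
    truth = [0] * n_per_cluster + [1] * n_per_cluster
    totals = {"clean": 0.0, "random": 0.0, "systematic": 0.0}
    for _ in range(reps):
        # Two assumed Gaussian clusters centred at (0, 0) and (5, 5).
        clean = [(rng.gauss(c, 0.5), rng.gauss(c, 0.5))
                 for c in (0.0, 5.0) for _k in range(n_per_cluster)]
        # Assumed: one third of the records carry the systematic shift.
        affected = set(rng.sample(range(len(clean)), len(clean) // 3))
        data = {
            "clean": clean,
            "random": add_random_error(clean, 1.0, rng),
            "systematic": add_systematic_error(clean, 2.5, affected),
        }
        for name, pts in data.items():
            totals[name] += rand_index(truth, dbscan(pts, eps=1.0, min_pts=4))
    return {name: total / reps for name, total in totals.items()}

if __name__ == "__main__":
    for name, score in run_study().items():
        print(f"{name:>10}: mean Rand index = {score:.3f}")
```

Each replication clusters the same simulated records three times (clean, with random error, with systematic error), so differences in the mean Rand index reflect the error condition rather than the sample. The numbers it prints depend entirely on the assumed settings and say nothing about the paper's reported results.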

