The effect of measurement error on clustering algorithms

05/24/2020
by Paulina Pankowska, et al.

Clustering is a popular set of techniques for separating data into interesting groups for further analysis. Many data sources on which clustering is performed are well known to contain random and systematic measurement errors, and such errors may adversely affect clustering. While several techniques have been developed to deal with this problem, little is known about their effectiveness. Moreover, no work to date has examined the effect of systematic errors on clustering solutions. In this paper, we perform a Monte Carlo study to investigate the sensitivity of two common clustering algorithms, Gaussian mixture models (GMMs) with merging and DBSCAN, to random and systematic error. We find that measurement error is particularly problematic when it is systematic and when it affects all variables in the dataset. For the conditions considered here, we also find that the partition-based GMM with merged components is less sensitive to measurement error than the density-based DBSCAN procedure.
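
To make the idea of such a Monte Carlo study concrete, the following is a minimal sketch, not the authors' actual design. It assumes three well-separated two-dimensional Gaussian clusters, models random error as zero-mean Gaussian noise on every variable, and models systematic error as a constant shift applied to a random share of observations. Only a plain DBSCAN is included (GMM with merging is omitted for brevity), and recovery of the true groups is scored with the adjusted Rand index. All function names, parameter values, and the choice of score are illustrative assumptions.

```python
import math
import random
from collections import Counter


def make_clusters(n_per_cluster, centers, sd, rng):
    """Draw 2-D points around each center; return (points, true_labels)."""
    points, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(k)
    return points, labels


def add_random_error(points, sd, rng):
    """Random error (assumed form): zero-mean Gaussian noise on every variable."""
    return [(x + rng.gauss(0, sd), y + rng.gauss(0, sd)) for x, y in points]


def add_systematic_error(points, bias, share, rng):
    """Systematic error (assumed form): constant shift on a random share of rows."""
    return [(x + bias, y + bias) if rng.random() < share else (x, y)
            for x, y in points]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same observations."""
    total_pairs = math.comb(len(a), 2)
    sum_ij = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial and identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def simulate(n_reps=10, seed=1):
    """Average ARI of DBSCAN per error condition over n_reps replications."""
    rng = random.Random(seed)
    scores = {"clean": [], "random": [], "systematic": []}
    for _ in range(n_reps):
        pts, truth = make_clusters(30, [(0, 0), (6, 0), (0, 6)], 0.5, rng)
        conditions = {
            "clean": pts,
            "random": add_random_error(pts, 0.5, rng),
            "systematic": add_systematic_error(pts, 2.0, 0.3, rng),
        }
        for name, data in conditions.items():
            # Noise points (-1) are scored as one extra group.
            scores[name].append(adjusted_rand_index(truth, dbscan(data, 0.6, 5)))
    return {name: sum(v) / len(v) for name, v in scores.items()}


if __name__ == "__main__":
    for condition, ari in simulate().items():
        print(f"{condition:>10}: mean ARI = {ari:.3f}")
```

In this sketch the DBSCAN settings are held fixed across conditions, so any drop in the mean score reflects the added error rather than retuning. A fuller study would also vary the error size, the share of affected variables, and the clustering method.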

research
10/13/2016

Removal of Batch Effects using Distribution-Matching Residual Networks

Sources of variability in experimentally derived data include measuremen...
research
07/02/2020

Random errors are not politically neutral

Errors are inevitable in the implementation of any complex process. Here...
research
04/03/2019

Measurement error induced by locational uncertainty when estimating discrete choice models with a distance as a regressor

Spatial microeconometric studies typically suffer from various forms of ...
research
11/04/2017

Merging error analysis of name disambiguation based on author similarity

Falsely identifying different authors as one is called merging error in ...
research
07/18/2023

Unbiased centroiding of point targets close to the Cramer Rao limit

This paper focuses on the achievable accuracy of center-of-gravity (CoG)...
research
12/26/2018

Parameter identification in elasto-plasticity: distance between parameters and impact of measurement errors

A special aspect of parameter identification in finite-strain elasto-pla...