The effect of measurement error on clustering algorithms

by Paulina Pankowska, et al.

Clustering is a popular set of techniques used to separate data into interesting groups for further analysis. Many data sources on which clustering is performed are well known to contain random and systematic measurement errors, and such errors may adversely affect clustering. While several techniques have been developed to deal with this problem, little is known about their effectiveness. Moreover, no work to date has examined the effect of systematic errors on clustering solutions. In this paper, we perform a Monte Carlo study to investigate the sensitivity of two common clustering algorithms, Gaussian mixture models (GMMs) with merging and DBSCAN, to random and systematic error. We find that measurement error is particularly problematic when it is systematic and when it affects all variables in the dataset. For the conditions considered here, we also find that the partition-based GMM with merged components is less sensitive to measurement error than the density-based DBSCAN procedure.
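To make the two error types concrete, the following is a minimal sketch of a single simulation condition, not the study's actual design. It assumes two well-separated two-dimensional Gaussian clusters, contaminates them with either zero-mean random noise or a constant shift applied to a subset of records, and scores a small hand-written DBSCAN against the true labels with the plain Rand index. All function names, the cluster centers, the noise levels, and the `eps` and `min_samples` values are illustrative assumptions; the GMM-with-merging arm is omitted for brevity.

```python
import math
import random

def make_clusters(n_per_cluster, centers, sd, rng):
    """Draw 2-D points around each center; returns (points, true_labels)."""
    points, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(k)
    return points, labels

def add_random_error(points, sd, rng):
    """Random error: zero-mean Gaussian noise on every variable of every point."""
    return [(x + rng.gauss(0, sd), y + rng.gauss(0, sd)) for x, y in points]

def add_systematic_error(points, shift, fraction, rng):
    """Systematic error: a constant bias on all variables for a random subset."""
    return [(x + shift, y + shift) if rng.random() < fraction else (x, y)
            for x, y in points]

def dbscan(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN; label -1 marks noise points."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only from core points
    return labels

def rand_index(a, b):
    """Share of point pairs on which two labelings agree.

    Noise (-1) is treated as one extra group, which is a simplification.
    """
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

# One replication with illustrative (assumed) settings.
rng = random.Random(0)
clean, truth = make_clusters(60, [(0.0, 0.0), (6.0, 6.0)], 0.5, rng)
conditions = {
    "clean": clean,
    "random": add_random_error(clean, 1.0, rng),
    "systematic": add_systematic_error(clean, 3.0, 0.3, rng),
}
scores = {name: rand_index(truth, dbscan(pts, eps=0.8, min_samples=5))
          for name, pts in conditions.items()}
for name, score in scores.items():
    print(f"{name:>10}: Rand index = {score:.3f}")
```

A Monte Carlo study would repeat this over many seeds and error settings and average the scores per condition. A single run like this one only shows the mechanics and should not be read as evidence for or against either algorithm.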




