Decorrelated Clustering with Data Selection Bias

06/29/2020
by Xiao Wang, et al.

Most existing clustering algorithms are proposed without considering selection bias in the data. In many real applications, however, one cannot guarantee that the data is unbiased. Selection bias may introduce unexpected correlations between features, and ignoring those correlations will hurt the performance of clustering algorithms. Therefore, removing the unexpected correlations induced by selection bias is extremely important yet largely unexplored for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights that balance the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, so that the reweighted k-means clusters on the inherent data distribution without the influence of unexpected correlations. Moreover, we derive updating rules to effectively infer the parameters in DCKM. Extensive experimental results on real-world datasets demonstrate that our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
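
The two ingredients named in the abstract, a decorrelation penalty on sample weights and a reweighted k-means, can be illustrated with a minimal sketch. This is not the authors' DCKM objective or their updating rules, which the abstract does not spell out. It assumes the weights are already given, measures the remaining feature correlation as the squared off-diagonal entries of the weighted covariance, and runs a plain weighted Lloyd iteration. The function names `decorrelation_loss` and `weighted_kmeans` are hypothetical.

```python
import numpy as np


def decorrelation_loss(X, w):
    """Sum of squared off-diagonal entries of the weighted covariance of X.

    Assumption: this is one simple way to score how correlated the features
    remain under sample weights w; it is not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    mean = w @ X                          # weighted feature means, shape (d,)
    Xc = X - mean                         # center each feature
    cov = (Xc * w[:, None]).T @ Xc        # weighted covariance, shape (d, d)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum())


def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd-style k-means where each sample contributes with weight w[i]."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct samples.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            # Skip empty (or zero-weight) clusters and keep the old center.
            if w[mask].sum() > 0:
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    # Final assignment against the last set of centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers
```

In this sketch the weights are an input. In the method described above they are learned jointly with the clustering, so a full implementation would alternate between updating the weights to reduce the decorrelation term and updating the cluster assignments and centers.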

research
08/22/2022

Selection Collider Bias in Large Language Models

In this paper we motivate the causal mechanisms behind sample selection ...
research
09/30/2022

Exploiting Selection Bias on Underspecified Tasks in Large Language Models

In this paper we motivate the causal mechanisms behind sample selection ...
research
05/25/2023

A Robust Classifier Under Missing-Not-At-Random Sample Selection Bias

The shift between the training and testing distributions is commonly due...
research
09/14/2023

On Prediction Feature Assignment in the Heckman Selection Model

Under missing-not-at-random (MNAR) sample selection bias, the performanc...
research
08/23/2021

BiaSwap: Removing dataset bias with bias-tailored swapping augmentation

Deep neural networks often make decisions based on the spurious correlat...
research
08/20/2021

Parameters not identifiable or distinguishable from data, including correlation between Gaussian observations

It is shown that some theoretically identifiable parameters cannot be id...
research
08/21/2023

Spurious Correlations and Where to Find Them

Spurious correlations occur when a model learns unreliable features from...
