On the bias of H-scores for comparing biclusters, and how to correct it

07/24/2019
by Jacopo Di Iorio, et al.

In the last two decades, several biclustering methods have been developed as unsupervised learning techniques that simultaneously cluster the rows and columns of a data matrix. These algorithms play a central role in contemporary machine learning and in many applications, e.g., in computational biology and bioinformatics. The H-score is the evaluation score underlying the seminal biclustering algorithm by Cheng and Church, as well as many subsequent biclustering methods. In this paper, we characterize a potentially troublesome bias in this score that can distort biclustering results. We show, both analytically and by simulation, that the average H-score increases with the number of rows/columns in a bicluster. This makes the H-score, and hence all algorithms based on it, biased towards small clusters. Building on our analytical proof, we provide a straightforward way to correct this bias, allowing users to compare biclusters accurately.
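The H-score of Cheng and Church is the mean squared residue of a submatrix: each entry a_ij is compared with its row mean a_iJ, column mean a_Ij and overall mean a_IJ via the residue a_ij - a_iJ - a_Ij + a_IJ. The sketch below is a minimal illustration, not the authors' code, and assumes the bicluster is pure i.i.d. noise with variance sigma**2. Under that assumption the standard two-way ANOVA result gives E[H] = (n-1)(m-1) sigma**2 / (nm), which grows with the number of rows n and columns m. The function names and the rescaling by nm / ((n-1)(m-1)) are hypothetical choices for illustration and may differ from the correction derived in the paper.

```python
import random


def h_score(block):
    """Mean squared residue (Cheng and Church H-score) of a bicluster.

    `block` is a list of equal-length rows holding the submatrix entries.
    """
    n, m = len(block), len(block[0])
    row_means = [sum(row) / m for row in block]                            # a_iJ
    col_means = [sum(block[i][j] for i in range(n)) / n for j in range(m)]  # a_Ij
    overall = sum(row_means) / n                                            # a_IJ
    total = 0.0
    for i in range(n):
        for j in range(m):
            residue = block[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n * m)


def expected_h_noise(n, m, sigma=1.0):
    """E[H] for an n x m block of i.i.d. noise with variance sigma**2.

    The residual sum of squares of an additive (row + column) fit has
    (n - 1)(m - 1) degrees of freedom, so E[H] = (n-1)(m-1) sigma^2 / (nm).
    """
    return (n - 1) * (m - 1) * sigma ** 2 / (n * m)


def corrected_h_score(block):
    """Hypothetical size correction: rescale H so that its expectation under
    the i.i.d. noise assumption is sigma**2 for every block shape.
    Requires at least 2 rows and 2 columns."""
    n, m = len(block), len(block[0])
    return h_score(block) * n * m / ((n - 1) * (m - 1))


def mean_h_on_noise(n, m, trials=1000, seed=0):
    """Monte Carlo estimate of E[H] on standard Gaussian noise blocks."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        block = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
        scores.append(h_score(block))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Simulated vs. theoretical mean H-score for square noise blocks.
    for size in (2, 3, 5, 10):
        sim = mean_h_on_noise(size, size)
        theory = expected_h_noise(size, size)
        print(f"{size}x{size}: simulated E[H]={sim:.3f}, theory={theory:.3f}")
```

On noise alone the raw H-score rises from 0.25 for a 2x2 block towards 1 for large blocks, so an algorithm that minimizes H is pulled towards small biclusters; the rescaled score removes this shape dependence only under the stated noise model.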
