Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification

09/01/2021
by Chihao Zhang, et al.

Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels in an ad hoc way to improve the accuracy of multi-class classification, a principled approach that guides label combination by an optimality criterion is lacking. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion of outcome "information" conditional on outcome prediction, to guide practitioners on how to combine ambiguous outcome labels. ITCA balances the trade-off between prediction accuracy (how well predicted labels agree with actual labels) and prediction resolution (how many labels are predictable). To find the optimal label combination indicated by ITCA, we develop two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies can be used with any machine-learning classification algorithm. Coupled with a classification algorithm and a search strategy, ITCA has two uses: improving prediction accuracy and identifying ambiguous labels. We first verify that, with both search strategies, ITCA finds the correct label combinations with high accuracy on synthetic and real data. We then demonstrate the effectiveness of ITCA in diverse applications, including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification.
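
To make the accuracy-versus-resolution trade-off concrete, here is a minimal Python sketch of a greedy label-combination search. It rests on assumptions that are not in the abstract: the criterion is taken to be an entropy-weighted class-wise accuracy, sum over classes k of (-p_k log p_k) * acc_k, which is a stand-in and may differ from the paper's exact definition of ITCA; and label combinations are scored by collapsing a fixed confusion matrix instead of retraining a classifier for each candidate. All function names are hypothetical.

```python
import math
from itertools import combinations


def merge_confusion(conf, groups):
    """Collapse a confusion matrix (rows = true, cols = predicted) by label groups."""
    k = len(groups)
    merged = [[0] * k for _ in range(k)]
    for a, group_a in enumerate(groups):
        for b, group_b in enumerate(groups):
            merged[a][b] = sum(conf[i][j] for i in group_a for j in group_b)
    return merged


def entropy_weighted_accuracy(conf):
    """Assumed criterion: sum_k (-p_k log p_k) * acc_k (not taken from the abstract)."""
    total = sum(sum(row) for row in conf)
    score = 0.0
    for k, row in enumerate(conf):
        n_k = sum(row)
        if n_k == 0:
            continue  # an empty class contributes nothing
        p_k = n_k / total      # class proportion (drives the resolution term)
        acc_k = row[k] / n_k   # class-wise accuracy
        score += -p_k * math.log(p_k) * acc_k
    return score


def greedy_combine(conf):
    """Merge the best pair of label groups until the score stops improving."""
    groups = [(i,) for i in range(len(conf))]
    best = entropy_weighted_accuracy(merge_confusion(conf, groups))
    while len(groups) > 2:
        candidates = []
        for a, b in combinations(range(len(groups)), 2):
            trial = [g for idx, g in enumerate(groups) if idx not in (a, b)]
            trial.append(groups[a] + groups[b])
            score = entropy_weighted_accuracy(merge_confusion(conf, trial))
            candidates.append((score, trial))
        top_score, top_groups = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no merge improves the criterion
        best, groups = top_score, top_groups
    return groups, best
```

Under these assumptions, merging two labels removes their mutual confusion (raising accuracy) but lowers the entropy term (reducing resolution), so a merge is accepted only when the first effect outweighs the second. A faithful implementation would retrain and re-evaluate the classifier for every candidate combination.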


