Calibrate: Interactive Analysis of Probabilistic Model Output

07/27/2022
by Peter Xenopoulos, et al.

Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, such as accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than its predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one whose output probabilities reflect those of the true distribution. Model calibration is often analyzed visually through static reliability diagrams; however, this traditional calibration visualization can suffer from a variety of drawbacks due to the strong aggregation it requires. Furthermore, count-based approaches are insufficient for analyzing model calibration. We present Calibrate, an interactive reliability diagram that addresses these issues. Calibrate constructs a reliability diagram that is resistant to the drawbacks of traditional approaches, and it allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.
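For context, the static reliability diagram that the abstract critiques bins predictions by their predicted probability and plots the observed event frequency per bin against the mean predicted probability, so deviations from the diagonal indicate miscalibration. The sketch below illustrates that baseline with scikit-learn; it is not the Calibrate tool itself, and the data and variable names (y_true, y_prob) are placeholder assumptions.

# Minimal sketch of a static (binned) reliability diagram, the baseline Calibrate improves on.
# Assumes binary labels y_true and predicted probabilities y_prob; synthetic data used here.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, size=2000)                        # placeholder predicted probabilities
y_true = (rng.uniform(0.0, 1.0, size=2000) < y_prob).astype(int) # labels consistent with a calibrated model

# Aggregate predictions into 10 equal-width bins; this aggregation is the
# source of the drawbacks the paper discusses.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10, strategy="uniform")

plt.plot([0, 1], [0, 1], linestyle="--", label="Perfect calibration")
plt.plot(prob_pred, prob_true, marker="o", label="Model")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed frequency")
plt.legend()
plt.show()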

