What classifiers know what they don't?

by   Mohamed Ishmael Belghazi, et al.

Being uncertain when facing the unknown is key to intelligent decision making. However, machine learning algorithms lack reliable estimates about their predictive uncertainty. This leads to wrong and overly-confident decisions when encountering classes unseen during training. Despite the importance of equipping classifiers with uncertainty estimates ready for the real world, prior work has focused on small datasets and little or no class discrepancy between training and testing data. To close this gap, we introduce UIMNET: a realistic, ImageNet-scale test-bed to evaluate predictive uncertainty estimates for deep image classifiers. Our benchmark provides implementations of eight state-of-the-art algorithms, six uncertainty measures, four in-domain metrics, three out-domain metrics, and a fully automated pipeline to train, calibrate, ensemble, select, and evaluate models. Our test-bed is open-source and all of our results are reproducible from a fixed commit in our repository. Adding new datasets, algorithms, measures, or metrics is a matter of a few lines of code-in so hoping that UIMNET becomes a stepping stone towards realistic, rigorous, and reproducible research in uncertainty estimation. Our results show that ensembles of ERM classifiers as well as single MIMO classifiers are the two best alternatives currently available to measure uncertainty about both in-domain and out-domain classes.


page 1

page 2

page 3

page 4


Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning

Uncertainty estimation and ensembling methods go hand-in-hand. Uncertain...

Layer Ensembles: A Single-Pass Uncertainty Estimation in Deep Learning for Segmentation

Uncertainty estimation in deep learning has become a leading research fi...

Can uncertainty boost the reliability of AI-based diagnostic methods in digital pathology?

Deep learning (DL) has shown great potential in digital pathology applic...

Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness

Reliable and robust evaluation methods are a necessary first step toward...

MOB-ESP and other Improvements in Probability Estimation

A key prerequisite to optimal reasoning under uncertainty in intelligent...

Rashomon Capacity: A Metric for Predictive Multiplicity in Probabilistic Classification

Predictive multiplicity occurs when classification models with nearly in...

Please sign up or login with your details

Forgot password? Click here to reset