Evaluating State-of-the-Art Classification Models Against Bayes Optimality

06/07/2021
by Ryan Theisen, et al.

Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the Bayes error, which measures the optimal classification error theoretically achievable for a given data distribution. While generally an intractable quantity, we show that we can compute the exact Bayes error of generative models learned using normalizing flows. Our technique relies on a fundamental result, which states that the Bayes error is invariant under invertible transformations. Therefore, we can compute the exact Bayes error of the learned flow models by computing it for their Gaussian base distributions, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some, but not all, cases these models are capable of obtaining accuracy very near optimal. Finally, we use our method to evaluate the intrinsic "hardness" of standard benchmark datasets, and of classes within those datasets.
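To illustrate the quantity being computed: for a known base distribution, the Bayes error is the expected probability mass of the less likely class at each point, and by the invariance result above, this value carries over unchanged through the flow. The sketch below estimates it by plain Monte Carlo for a pair of 1-D Gaussian class conditionals with equal priors; this is only an illustration, not the paper's Holmes-Diaconis-Ross integration procedure, and the means, variance, and sample count are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Two 1-D Gaussian class-conditionals with equal priors (illustrative values).
mu0, mu1, sigma = 0.0, 2.0, 1.0
n = 200_000

# Sample from the mixture: pick a class label, then draw x from that class.
y = rng.integers(0, 2, size=n)
x = rng.normal(np.where(y == 0, mu0, mu1), sigma)

# Bayes error = E_x[1 - max_k P(y = k | x)], i.e. the expected posterior
# mass of the minority class under the optimal classifier.
p0 = norm.pdf(x, mu0, sigma)
p1 = norm.pdf(x, mu1, sigma)
post = np.stack([p0, p1]) / (p0 + p1)
mc_bayes_error = np.mean(1.0 - post.max(axis=0))

# Closed form for this equal-variance Gaussian pair: Phi(-|mu1 - mu0| / (2 * sigma)).
exact = norm.cdf(-abs(mu1 - mu0) / (2 * sigma))
```

Because the Bayes error is invariant under invertible transformations, pushing both class conditionals through the same normalizing flow would leave `exact` unchanged, which is what makes the computation in the base (Gaussian) space sufficient.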


