Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness

08/08/2023
by Michael W. Spratling, et al.

Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, the evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance, as they tend to rely on limited types of test data and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier for samples from classes it was not trained on. Conversely, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates benchmarking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark, it is found that current deep neural networks, including those trained with methods believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure because they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future. Code is available at: <https://codeberg.org/mwspratling/RobustnessEvaluation>
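To make the idea of a single metric applied across many data types concrete, the sketch below (not the paper's actual protocol; the toy classifier, thresholds, and data sets are all assumptions for illustration) scores one classifier on clean test data, data from unknown classes, and perturbed data, counting a prediction as correct if it matches the known-class label or abstains on samples from unknown classes.

```python
# A minimal sketch, assuming a unified notion of correctness: predict the true
# label for known-class samples, abstain (None) for unknown-class samples.
# The classifier, threshold, and data sets below are hypothetical, not the
# paper's benchmark.
import numpy as np

def toy_classifier(x, threshold=0.5):
    """Hypothetical two-class classifier that abstains (returns None) when its
    maximum softmax score falls below `threshold`."""
    logits = np.stack([x.sum(axis=-1), -x.sum(axis=-1)], axis=-1)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    return [p if c else None for p, c in zip(preds, confident)]

def unified_accuracy(predictions, labels):
    """Single metric applied to every data type: a prediction is correct if it
    equals the known-class label, or abstains (None) when the label is None
    (i.e. the sample comes from a class the model was never trained on)."""
    return float(np.mean([p == y for p, y in zip(predictions, labels)]))

# Hypothetical test sets: labels are class indices for known classes and None
# for samples from unknown classes.
rng = np.random.default_rng(0)
test_sets = {
    "clean":         (rng.normal(0.0, 1.0, (100, 8)), rng.integers(0, 2, 100).tolist()),
    "unknown_class": (rng.normal(3.0, 1.0, (100, 8)), [None] * 100),
    "perturbed":     (rng.normal(0.0, 2.0, (100, 8)), rng.integers(0, 2, 100).tolist()),
}

for name, (x, y) in test_sets.items():
    preds = toy_classifier(x)
    print(f"{name:>14s}: unified accuracy = {unified_accuracy(preds, y):.3f}")
```

Because the same metric is computed for every data type, the per-type scores can be compared directly or aggregated (e.g. by taking the minimum) to expose the kind of worst-case weakness the abstract describes.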

