A Principled Evaluation Protocol for Comparative Investigation of the Effectiveness of DNN Classification Models on Similar-but-non-identical Datasets

09/05/2022
by Esla Timothy Anzaku, et al.

Deep Neural Network (DNN) models are increasingly evaluated on new replication test datasets, which have been carefully created to resemble older, popular benchmark datasets. Contrary to expectations, however, DNN classification models show significant, consistent, and largely unexplained degradation in accuracy on these replication test datasets. The prevailing evaluation approach assesses the accuracy of a model using all the datapoints available in the respective test datasets; we argue that doing so prevents us from adequately capturing the behavior of DNN models and from forming realistic expectations about their accuracy. We therefore propose a principled evaluation protocol suited to comparative investigations of the accuracy of a DNN model on multiple test datasets, leveraging subsets of datapoints that can be selected using different criteria, including uncertainty-related information. Using this protocol, we determined the accuracy of 564 DNN models on both (1) the CIFAR-10 and ImageNet datasets and (2) their replication datasets. Our experimental results indicate that the accuracy degradation observed between established benchmark datasets and their replications is consistently lower than the degradation reported in published works (that is, models perform better on the replication test datasets than those works suggest); these published works rely on conventional evaluation approaches that do not utilize uncertainty-related information.
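
The idea of comparing accuracy on selected subsets rather than on all datapoints can be illustrated with a minimal sketch. It is not the authors' protocol: the abstract does not say which selection criterion is used, so the sketch assumes the maximum predicted class probability as a stand-in for "uncertainty-related information". All function names (`subset_accuracy`, `accuracy_gap`) and the toy predictions are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Dataset = Tuple[List[Sequence[float]], List[int]]  # (predicted probabilities, true labels)


def confidence(probs: Sequence[float]) -> float:
    """Assumed certainty score: the largest predicted class probability."""
    return max(probs)


def predicted_label(probs: Sequence[float]) -> int:
    """Index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def subset_accuracy(
    probs: List[Sequence[float]], labels: List[int], min_confidence: float
) -> Tuple[float, int]:
    """Accuracy over the datapoints whose confidence is >= min_confidence.

    Returns (accuracy, subset_size); accuracy is NaN if the subset is empty.
    """
    kept = [(p, y) for p, y in zip(probs, labels) if confidence(p) >= min_confidence]
    if not kept:
        return float("nan"), 0
    correct = sum(predicted_label(p) == y for p, y in kept)
    return correct / len(kept), len(kept)


def accuracy_gap(original: Dataset, replication: Dataset, min_confidence: float) -> float:
    """Accuracy on the original subset minus accuracy on the replication subset."""
    acc_original, _ = subset_accuracy(original[0], original[1], min_confidence)
    acc_replication, _ = subset_accuracy(replication[0], replication[1], min_confidence)
    return acc_original - acc_replication


# Toy two-class predictions, made up for illustration only (not results from the paper).
original: Dataset = (
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.55, 0.45]],
    [0, 1, 0, 1],
)
replication: Dataset = (
    [[0.95, 0.05], [0.3, 0.7], [0.52, 0.48], [0.45, 0.55]],
    [0, 1, 1, 0],
)

gap_all = accuracy_gap(original, replication, min_confidence=0.0)        # all datapoints
gap_confident = accuracy_gap(original, replication, min_confidence=0.7)  # confident subset only
```

In this toy setting, setting `min_confidence` to 0.0 reproduces the conventional all-datapoints comparison, while a higher threshold restricts both test sets to the predictions the model is more certain about. Applying the same threshold to both datasets is a design choice of the sketch, not something the abstract specifies.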
