Are Labels Necessary for Classifier Accuracy Evaluation?

07/06/2020
by   Weijian Deng, et al.

To calculate the accuracy of a model on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground-truth labels. While standard use cases satisfy this requirement, many real-world scenarios involve unlabeled test data, rendering common model evaluation methods infeasible. We investigate this important and under-explored problem, Automatic model Evaluation (AutoEval). Specifically, given a labeled training set and a model, we aim to estimate the model's accuracy on unlabeled test datasets. We construct a meta-dataset: a dataset composed of datasets generated from the original training set via various image transformations such as rotation, background substitution, and foreground scaling. Because the classification accuracy of the model on each sample (a dataset) is known from the original labels, our task can be solved via regression. Using feature statistics to represent the distribution of a sample dataset, we can train regression models (e.g., a regression neural network) to predict model performance. Using the synthetic meta-dataset for training and real-world datasets for testing, we report reasonable and promising estimates of model accuracy. We also provide insights into the application scope, limitations, and future directions of AutoEval.
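The pipeline the abstract describes — represent each sample dataset by its feature statistics, then regress from those statistics to known accuracy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `dataset_statistics` helper, the simulated feature drift, and the synthetic accuracy labels are all assumptions made for the example, standing in for real classifier features and measured accuracies on transformed datasets.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def dataset_statistics(features):
    """Represent a sample dataset by simple first- and second-order
    feature statistics (per-dimension mean and std)."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

# Simulated meta-dataset: each "sample" is a transformed dataset whose
# classifier accuracy is known. Here we fake both: features drift away
# from the training distribution, and accuracy degrades with the drift.
feat_dim = 16
stats, accs = [], []
for _ in range(200):
    shift = rng.uniform(0, 2)
    feats = rng.normal(loc=shift, size=(500, feat_dim))
    stats.append(dataset_statistics(feats))
    accs.append(max(0.0, 1.0 - shift / 2))  # synthetic accuracy label

# Regress from dataset-level statistics to accuracy.
reg = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
reg.fit(np.array(stats), np.array(accs))

# Estimate accuracy on an unseen, unlabeled dataset from its statistics alone.
test_feats = rng.normal(loc=0.5, size=(500, feat_dim))
pred = reg.predict(np.array([dataset_statistics(test_feats)]))[0]
```

In the paper's actual setting, the statistics would be computed from the classifier's learned feature embeddings of each transformed dataset, and the regression targets would be the model's measured accuracies on those datasets.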
