Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

09/13/2021
by Fait Poms, et al.

For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs. We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate on is particularly challenging. Our key insight is that simultaneous calibration and importance sampling enables accurate estimates even in the low-sample regime (< 300 samples). Critically, we also derive an accurate single-trial estimator of the variance of our method and demonstrate that this estimator is empirically accurate at low sample counts, enabling a practitioner to know how well they can trust a given low-sample estimate. When validating state-of-the-art semi-supervised models on ImageNet and iNaturalist2017, our method achieves the same estimates of model performance with up to 10x fewer labels than competing approaches. In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
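To make the idea of importance-sampled F-score estimation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the classifier scores are already roughly calibrated probabilities (the paper performs calibration jointly with sampling, which is omitted here), and it uses a proposal proportional to the square root of the score mixed with a uniform floor, which is an illustrative choice. The names `estimate_f1`, `label_fn`, and `eps` are hypothetical.

```python
import math
import random


def estimate_f1(scores, preds, label_fn, n_labels, rng, eps=0.1):
    """Estimate F1 on an unlabeled pool from only `n_labels` label queries.

    scores   : per-item classifier scores in [0, 1] (assumed roughly calibrated)
    preds    : per-item binary predictions (0 or 1), known for the whole pool
    label_fn : hypothetical oracle; label_fn(i) returns the true 0/1 label of item i
    eps      : share of proposal mass spread uniformly so every item has q_i > 0
    """
    n_items = len(scores)
    roots = [math.sqrt(s) for s in scores]
    total = sum(roots)
    if total > 0:
        # Illustrative proposal: favor items that are likely positives.
        q = [(1 - eps) * r / total + eps / n_items for r in roots]
    else:
        q = [1.0 / n_items] * n_items

    # Sample with replacement from the proposal and query labels.
    sampled = rng.choices(range(n_items), weights=q, k=n_labels)

    tp_hat = 0.0   # estimate of the number of true positives in the pool
    pos_hat = 0.0  # estimate of the number of actual positives in the pool
    for i in sampled:
        y = label_fn(i)
        w = 1.0 / (n_labels * q[i])  # importance weight
        tp_hat += w * y * preds[i]
        pos_hat += w * y

    pred_pos = sum(preds)  # known exactly, no labels needed
    denom = pred_pos + pos_hat
    if denom <= 0:
        return 0.0
    # F1 = 2*TP / (predicted positives + actual positives); clamp to [0, 1].
    return min(1.0, 2.0 * tp_hat / denom)
```

The estimate is a ratio of two importance-weighted sums, so it is only approximately unbiased at small sample counts. The single-trial variance estimator described in the abstract is not reproduced in this sketch.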


