When and Why Test-Time Augmentation Works

by   Divya Shanmugam, et al.

Test-time augmentation (TTA)—the aggregation of predictions across transformed versions of a test input—is a common practice in image classification. In this paper, we present theoretical and experimental analyses that shed light on 1) when test time augmentation is likely to be helpful and 2) when to use various test-time augmentation policies. A key finding is that even when TTA produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Our analysis suggests that the nature and amount of training data, the model architecture, and the augmentation policy all matter. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.


page 5

page 6

page 8


Improved Text Classification via Test-Time Augmentation

Test-time augmentation – the aggregation of predictions across transform...

Adaptive Test-Time Augmentation for Low-Power CPU

Convolutional Neural Networks (ConvNets) are trained offline using the f...

Test-time augmentation with uncertainty estimation for deep learning-based medical image segmentation

Data augmentation has been widely used for training deep learning system...

Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation

Test-time data augmentation—averaging the predictions of a machine learn...

Boosting Anomaly Detection Using Unsupervised Diverse Test-Time Augmentation

Anomaly detection is a well-known task that involves the identification ...

MT-Net Submission to the Waymo 3D Detection Leaderboard

In this technical report, we introduce our submission to the Waymo 3D De...

Test-time Collective Prediction

An increasingly common setting in machine learning involves multiple par...