On the Reproducibility of Neural Network Predictions

by   Srinadh Bhojanapalli, et al.

Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness can cause churn – for the same input, disagreements between predictions of the two models independently trained by the same algorithm, contributing to the `reproducibility challenges' in modern machine learning. In this paper, we study this problem of churn, identify factors that cause it, and propose two simple means of mitigating it. We first demonstrate that churn is indeed an issue, even for standard image classification tasks (CIFAR and ImageNet), and study the role of the different sources of training randomness that cause churn. By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction. First, we propose using minimum entropy regularizers to increase prediction confidences. Second, we present a novel variant of co-distillation approach <cit.> to increase model agreement and reduce churn. We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.



page 1

page 2

page 3

page 4


Post-synaptic potential regularization has potential

Improving generalization is one of the main challenges for training deep...

Locally Adaptive Label Smoothing for Predictive Churn

Training modern neural networks is an inherently noisy process that can ...

Occlusions for Effective Data Augmentation in Image Classification

Deep networks for visual recognition are known to leverage "easy to reco...

Anti-Distillation: Improving reproducibility of deep networks

Deep networks have been revolutionary in improving performance of machin...

Fluctuations, Bias, Variance Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

From the sampling of data to the initialisation of parameters, randomnes...

ResMLP: Feedforward networks for image classification with data-efficient training

We present ResMLP, an architecture built entirely upon multi-layer perce...

Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments

Retraining modern deep learning systems can lead to variations in model ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.