
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments

by Christopher Hidey, et al.

Retraining modern deep learning systems can lead to variations in model performance even when a model is trained on the same data with the same hyper-parameters, simply by using a different random seed. We call this phenomenon model jitter. The issue is often exacerbated in production settings, where models are retrained on noisy data. In this work we tackle the problem of stable retraining with a focus on conversational semantic parsers. We first quantify the model jitter problem by introducing a model agreement metric and showing how agreement varies with dataset noise and model size. We then demonstrate the effectiveness of various jitter reduction techniques such as ensembling and distillation. Lastly, we discuss the practical trade-offs between these techniques and show that co-distillation provides a sweet spot for jitter reduction in semantic parsing systems, at only a modest increase in resource usage.
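The abstract does not define the model agreement metric precisely; a minimal sketch, assuming agreement is measured as the fraction of evaluation inputs on which two retraining runs (same data and hyper-parameters, different seeds) produce exactly the same output parse:

```python
def model_agreement(preds_a, preds_b):
    """Fraction of inputs where two retrained models emit identical predictions."""
    assert len(preds_a) == len(preds_b), "both runs must be scored on the same eval set"
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


# Hypothetical example: parses from two retraining runs that differ only in seed.
run_1 = ["(call Alice)", "(text Bob hi)", "(play jazz)", "(call Alice)"]
run_2 = ["(call Alice)", "(text Bob hi)", "(play rock)", "(call Alice)"]
print(model_agreement(run_1, run_2))  # 0.75
```

Exact-match is one natural choice for semantic parsing, where outputs are discrete programs; softer variants (e.g. token-level overlap) are possible but not implied by the abstract.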
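Co-distillation (also known as mutual or online distillation) trains two models simultaneously, with each model's loss augmented by a term pulling it toward the other's soft predictions. The abstract does not give the exact formulation used; a minimal NumPy sketch of one model's loss, with a hypothetical mixing weight `alpha`, might look like:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p_target, logits):
    """Mean cross-entropy of softmax(logits) against a target distribution."""
    return -(p_target * np.log(softmax(logits) + 1e-12)).sum(axis=-1).mean()

def co_distillation_loss(logits_a, logits_b, labels_onehot, alpha=0.5):
    """Loss for model A: supervised term plus distillation toward model B.

    Model B is trained with the symmetric loss. In practice B's logits are
    treated as constants here (no gradient flows through them).
    """
    supervised = cross_entropy(labels_onehot, logits_a)
    distill = cross_entropy(softmax(logits_b), logits_a)  # B's soft targets
    return (1 - alpha) * supervised + alpha * distill


# Hypothetical toy example: one 3-way decision, models mostly in agreement.
logits_a = np.array([[2.0, 0.5, 0.1]])
logits_b = np.array([[1.8, 0.7, 0.2]])
labels = np.array([[1.0, 0.0, 0.0]])
loss = co_distillation_loss(logits_a, logits_b, labels)
```

Because the two models regularize each other throughout training, their predictions tend to converge, which is why this setup reduces run-to-run disagreement at roughly twice the training cost of a single model rather than the N-fold cost of a full ensemble.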



