Long Short-Term Sample Distillation

03/02/2020
by Liang Jiang et al.

In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher–student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Building on this notion, in this paper we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide later training updates to a neural network, while efficiently completing in a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs earlier to provide steady guidance and to guarantee a sufficient teacher–student difference, while the short-term teacher yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, so that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
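The core idea described above can be illustrated as a blending of three supervision signals per sample. The sketch below is a minimal illustration, not the paper's exact formulation: the function name `lstsd_target` and the mixing weights `alpha`, `beta`, `gamma` are hypothetical, and the teacher signals are assumed to be stored per-sample softmax outputs from earlier snapshots of the same model.

```python
import numpy as np

def lstsd_target(one_hot, long_term_probs, short_term_probs,
                 alpha=0.6, beta=0.2, gamma=0.2):
    """Blend the ground-truth label with per-sample teacher signals.

    one_hot          -- ground-truth label distribution for one sample
    long_term_probs  -- softmax output recorded by a snapshot from several
                        epochs ago (the long-term teacher)
    short_term_probs -- softmax output from a recent snapshot (the
                        short-term teacher)
    alpha/beta/gamma -- illustrative mixing weights (must sum to 1)
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * one_hot + beta * long_term_probs + gamma * short_term_probs

# Example: the blended target remains a valid probability distribution;
# the student is trained against it instead of the raw one-hot label.
target = lstsd_target(np.array([1.0, 0.0, 0.0]),
                      np.array([0.7, 0.2, 0.1]),
                      np.array([0.8, 0.1, 0.1]))
```

Because each sample's teacher signals come from snapshots taken at different points of that sample's own training history, the effective set of teachers is diverse across the dataset, which is the property the abstract highlights.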


Related research

- Distilling Knowledge for Short-to-Long Term Trajectory Prediction (05/15/2023)
- Exploring Knowledge Distillation of a Deep Neural Network for Multi-Script Identification (02/20/2021)
- Review helps learn better: Temporal Supervised Knowledge Distillation (07/03/2023)
- Snapshot Distillation: Teacher-Student Optimization in One Generation (12/01/2018)
- TempFuser: Learning Tactical and Agile Flight Maneuvers in Aerial Dogfights using a Long Short-Term Temporal Fusion Transformer (08/07/2023)
- Long-Term Vehicle Localization by Recursive Knowledge Distillation (04/07/2019)
- Combining Slow and Fast: Complementary Filtering for Dynamics Learning (02/27/2023)
