Imitation networks: Few-shot learning of neural networks from scratch

02/08/2018
by Akisato Kimura, et al.

In this paper, we propose imitation networks, a simple but effective method for training neural networks from a limited amount of training data. Our approach inherits the idea of knowledge distillation, which transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic the predictions of reference estimators that are far more robust to overfitting than the network we want to train. Unlike almost all previous work on knowledge distillation, which requires a large amount of labeled training data, the proposed method needs only a small amount of training data; instead, we introduce pseudo training examples that are optimized as part of the model parameters. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms all the baselines, including naive training of the target model and standard knowledge distillation.
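To make the core idea concrete, the sketch below shows one possible way to combine distillation from a fixed reference model with pseudo examples that are treated as trainable parameters. It is not the authors' implementation: the names (`student`, `reference_model`, `x_real`, `y_real`), the loss weighting, and the choice to update the pseudo inputs by gradient descent on the same imitation loss are all assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code): knowledge distillation with
# learnable pseudo examples. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_imitation(student, reference_model, x_real, y_real,
                    n_pseudo=64, in_dim=2, steps=1000, lr=1e-2):
    # Pseudo inputs are registered as extra parameters and optimized
    # jointly with the student's weights.
    x_pseudo = nn.Parameter(torch.randn(n_pseudo, in_dim))
    opt = torch.optim.Adam(list(student.parameters()) + [x_pseudo], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        # Supervised loss on the few labeled real examples.
        ce = F.cross_entropy(student(x_real), y_real)
        # Imitation loss: match the frozen reference's soft predictions
        # on the current pseudo examples.
        with torch.no_grad():
            soft_targets = F.softmax(reference_model(x_pseudo), dim=1)
        kd = F.kl_div(F.log_softmax(student(x_pseudo), dim=1),
                      soft_targets, reduction="batchmean")
        (ce + kd).backward()
        opt.step()
    return student, x_pseudo.detach()
```

Here the reference model is kept frozen and its soft predictions are recomputed on the current pseudo inputs at every step; the paper's actual criterion for placing the pseudo examples may differ, but the mechanism described in the abstract, optimizing pseudo training examples as part of the model parameters, is what the sketch illustrates.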

