An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation

02/28/2020
by Makoto Takamoto, et al.

Compressing deep neural network (DNN) models has become an important and necessary technique for real-world applications, such as deploying those models on mobile devices. Knowledge distillation is one of the most popular methods for model compression, and many studies have been devoted to developing this technique. However, those studies have mainly focused on classification problems, and very few attempts have been made on regression problems, although there are many applications of DNNs to regression problems. In this paper, we propose a new formalism of knowledge distillation for regression problems. First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in the training samples using the teacher model's predictions. Second, we consider a multi-task network with two outputs: one estimates the training labels, which are in general contaminated by noisy annotations, and the other estimates the teacher model's output, which is expected to correct the noisy labels owing to the memorization effect. With this multi-task network, training of the student model's feature extractor becomes more effective, and it allows us to obtain a better student model than one trained from scratch. We performed a comprehensive evaluation with a simple toy model (a sinusoidal function) and two open datasets (MPIIGaze and Multi-PIE). Our results show consistent improvements in accuracy regardless of the annotation error level in the datasets.
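To make the described setup concrete, below is a minimal PyTorch-style sketch of a two-head student network and a teacher-guided outlier rejection loss. The abstract does not give the actual loss formula or architecture details, so everything here is an assumption for illustration: the class and function names (TwoHeadStudent, teacher_outlier_rejection_loss, distillation_step), the margin-based rejection rule, and the weighting factor alpha are all hypothetical, not the authors' implementation.

```python
# Hedged sketch (assumptions, not the paper's code): a student with a shared
# feature extractor and two regression heads, plus a loss that down-weights
# samples whose labels disagree strongly with the teacher's prediction.

import torch
import torch.nn as nn


class TwoHeadStudent(nn.Module):
    """Multi-task student: shared backbone with one head fit to the (possibly
    noisy) training labels and one head fit to the teacher model's output."""

    def __init__(self, in_dim: int, hidden_dim: int = 64, out_dim: int = 1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.label_head = nn.Linear(hidden_dim, out_dim)    # estimates training labels
        self.teacher_head = nn.Linear(hidden_dim, out_dim)  # mimics the teacher output

    def forward(self, x):
        h = self.backbone(x)
        return self.label_head(h), self.teacher_head(h)


def teacher_outlier_rejection_loss(pred, label, teacher_pred, margin=1.0):
    """Illustrative rejection rule (an assumption): treat a sample as an
    annotation outlier when its label is farther than `margin` from the
    teacher prediction, and exclude it from the regression loss."""
    keep = ((label - teacher_pred).abs() < margin).float()  # 1 = trust the label
    per_sample = (pred - label) ** 2
    # Average over retained samples only; avoid division by zero if all rejected.
    return (per_sample * keep).sum() / keep.sum().clamp(min=1)


def distillation_step(student, teacher, x, y, optimizer, alpha=0.5):
    """One training step: outlier-rejected loss on the label head plus a plain
    MSE distillation term on the teacher-mimicking head."""
    with torch.no_grad():
        t = teacher(x)                      # teacher predictions, no gradients
    y_hat, t_hat = student(x)
    loss = (teacher_outlier_rejection_loss(y_hat, y, t)
            + alpha * nn.functional.mse_loss(t_hat, t))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because both heads share the backbone, gradients from the teacher-mimicking head also shape the feature extractor, which is one plausible reading of why the multi-task setup helps the student beyond training from scratch.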


Related research

- 07/23/2019: Similarity-Preserving Knowledge Distillation
  Knowledge distillation is a widely applicable technique for training a s...
- 09/15/2020: Noisy Self-Knowledge Distillation for Text Summarization
  In this paper we apply self-knowledge distillation to text summarization...
- 11/13/2018: Private Model Compression via Knowledge Distillation
  The soaring demand for intelligent mobile applications calls for deployi...
- 06/14/2022: SoTeacher: A Student-oriented Teacher Network Training Framework for Knowledge Distillation
  How to train an ideal teacher for knowledge distillation is still an ope...
- 05/26/2023: ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression
  Noise suppression (NS) models have been widely applied to enhance speech...
- 08/22/2019: Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement
  Generic Image recognition is a fundamental and fairly important visual p...
- 02/20/2022: Cross-Task Knowledge Distillation in Multi-Task Recommendation
  Multi-task learning (MTL) has been widely used in recommender systems, w...
