Triplet Loss for Knowledge Distillation

04/17/2020
by Hideki Oki, et al.

In recent years, deep learning has spread rapidly, and deeper, larger models have been proposed. However, the computational cost becomes enormous as models grow, and various techniques for compressing model size have been proposed to maintain performance while reducing that cost. One such method is knowledge distillation (KD), a technique for transferring the knowledge of a deep or ensemble model with many parameters (the teacher model) to a smaller, shallower model (the student model). Since the purpose of knowledge distillation is to increase the similarity between the teacher model and the student model, we propose introducing the concept of metric learning into knowledge distillation, making the student model closer to the teacher model by using pairs or triplets of training samples. Metric learning builds models whose outputs are similar for similar samples: it aims to reduce the distance between outputs for similar samples and to increase the distance between outputs for dissimilar ones. This ability to reduce the difference between similar outputs can be applied in knowledge distillation to reduce the difference between the outputs of the teacher model and the student model. At the same time, since the teacher's outputs for different objects usually differ, the student model needs to distinguish them; we expect metric learning to sharpen the difference between such outputs and thereby improve the performance of the student model. We have performed experiments comparing the proposed method with state-of-the-art knowledge distillation methods.
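To make the idea concrete, the snippet below is a minimal sketch, not the authors' exact formulation: an assumed PyTorch implementation of a triplet-style distillation objective in which the student's output for a sample (anchor) is pulled toward the teacher's output for the same sample (positive) and pushed away from the teacher's output for a dissimilar sample (negative). The function names, the margin, and the weighting factor alpha are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of triplet-style knowledge distillation (assumed formulation).
# Anchor   = student output for a sample
# Positive = teacher output for the same sample
# Negative = teacher output for a dissimilar sample (e.g. a different class)
import torch
import torch.nn.functional as F

def triplet_distillation_loss(student_out, teacher_pos, teacher_neg, margin=1.0):
    # Distance between student and teacher on the same input (should shrink).
    d_pos = F.pairwise_distance(student_out, teacher_pos)
    # Distance between student and teacher on a dissimilar input (should grow).
    d_neg = F.pairwise_distance(student_out, teacher_neg)
    # Standard triplet hinge: push d_pos below d_neg by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()

def total_loss(student_logits, teacher_logits, teacher_neg_logits, labels,
               alpha=0.5, margin=1.0):
    # Combine ordinary cross-entropy on ground-truth labels with the
    # triplet-style distillation term; alpha is an assumed trade-off weight.
    ce = F.cross_entropy(student_logits, labels)
    tri = triplet_distillation_loss(student_logits, teacher_logits,
                                    teacher_neg_logits, margin)
    return (1 - alpha) * ce + alpha * tri
```

In practice, the negative could be the teacher's output for a sample of a different class drawn from the same mini-batch, so the student is simultaneously matched to the teacher on each sample and encouraged to keep the teacher's separation between dissimilar samples.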

