Visualizing the embedding space to explain the effect of knowledge distillation

10/09/2021
by Hyun Seung Lee, et al.

Recent research has found that knowledge distillation can be effective both in reducing the size of a network and in improving generalization. For example, a large, pre-trained teacher network has been shown to bootstrap a student model that eventually outperforms the teacher in a limited-label setting. Despite these advances, it remains relatively unclear why the method works, that is, what the resulting student model does 'better'. To address this issue, we use two non-linear, low-dimensional embedding methods (t-SNE and IVIS) to visualize the representation spaces of different layers in a network. We perform an extensive set of experiments with different architecture parameters and distillation methods. The resulting visualizations and metrics clearly show that, compared with its non-distilled counterpart, a distilled network finds a more compact representation space that supports higher accuracy already in earlier layers.
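The kind of analysis described above can be approximated with standard tools. The following is a minimal, hypothetical sketch, not the paper's actual setup: it assumes a PyTorch student model and a labeled data loader, combines a standard soft-target distillation loss (after Hinton et al., 2015) with cross-entropy, and captures one layer's activations via a forward hook before projecting them to 2D with scikit-learn's t-SNE for plotting. The layer reference `student.layer3`, the temperature, and all other hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: soft-target distillation loss plus a t-SNE view of one
# layer's representation space. Names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F
from sklearn.manifold import TSNE


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target distillation: KL divergence between the softened teacher and
    student distributions, mixed with cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


@torch.no_grad()
def layer_embedding_2d(model, layer, loader, device="cpu", max_batches=20):
    """Collect activations of `layer` with a forward hook and embed them in 2D
    with t-SNE, so the representation space of that layer can be visualized."""
    feats, labels = [], []
    handle = layer.register_forward_hook(
        lambda module, inp, out: feats.append(out.flatten(1).cpu())
    )
    model.eval()
    for i, (x, y) in enumerate(loader):
        if i >= max_batches:
            break
        model(x.to(device))
        labels.append(y)
    handle.remove()
    X = torch.cat(feats).numpy()
    y = torch.cat(labels).numpy()
    emb = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(X)
    return emb, y


# Example usage (assumes a trained `student` and a `val_loader` exist):
#   emb, y = layer_embedding_2d(student, student.layer3, val_loader)
#   # Scatter-plot `emb` colored by `y` to compare distilled vs. non-distilled
#   # students; per the abstract, distilled models tend to show more compact,
#   # class-separated clusters already at earlier layers.
```

Measures such as silhouette score or k-NN accuracy on the 2D embedding could serve as the "metrics" mentioned above; the choice of metric here is an assumption, not the paper's.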
