Weight Averaging Improves Knowledge Distillation under Domain Shift

09/20/2023
by Valeriy Berezovskiy, et al.

Knowledge distillation (KD) is a powerful model compression technique broadly used in practical deep learning applications. It focuses on training a small student network to mimic a larger teacher network. While KD is widely known to improve student generalization in the i.i.d. setting, its performance under domain shift, i.e., the performance of student networks on data from domains unseen during training, has received little attention in the literature. In this paper we take a step towards bridging the research fields of knowledge distillation and domain generalization. We show that weight averaging techniques proposed in the domain generalization literature, such as SWAD and SMA, also improve the performance of knowledge distillation under domain shift. In addition, we propose a simple weight averaging strategy that does not require evaluation on validation data during training, and we show that it performs on par with SWAD and SMA when applied to KD. We name our final distillation approach Weight-Averaged Knowledge Distillation (WAKD).
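As a concrete illustration of the idea, the sketch below combines a standard softened-logit distillation loss with a running equal average of the student's weights maintained throughout training, so no validation data is consulted to decide when or what to average. This is a minimal sketch, not the authors' exact WAKD recipe: `teacher`, `student`, `loader`, the temperature `T`, and the mixing weight `alpha` are hypothetical placeholders, and PyTorch's `AveragedModel` utility stands in for the averaging scheme.

```python
# Minimal sketch of knowledge distillation with a running weight average of
# the student; assumptions (not the paper's exact recipe) are noted inline.
import torch
import torch.nn.functional as F
from torch.optim.swa_utils import AveragedModel, update_bn


def distill_with_weight_averaging(teacher, student, loader,
                                  epochs=10, lr=1e-3, T=4.0, alpha=0.5,
                                  device="cpu"):
    teacher, student = teacher.to(device).eval(), student.to(device).train()
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)

    # Running equal average of all student checkpoints seen during training.
    # No validation set is used to decide when averaging starts or stops.
    averaged_student = AveragedModel(student)

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                teacher_logits = teacher(x)
            student_logits = student(x)

            # Classic KD objective: cross-entropy on hard labels plus
            # KL divergence between temperature-softened logits.
            ce_loss = F.cross_entropy(student_logits, y)
            kd_loss = F.kl_div(
                F.log_softmax(student_logits / T, dim=1),
                F.softmax(teacher_logits / T, dim=1),
                reduction="batchmean",
            ) * (T * T)
            loss = alpha * ce_loss + (1.0 - alpha) * kd_loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Fold the current student weights into the running average.
            averaged_student.update_parameters(student)

    # Recompute BatchNorm statistics for the averaged weights, if any.
    update_bn(loader, averaged_student, device=device)
    return averaged_student  # evaluate this model on unseen domains
```

The only change relative to plain KD training is that the averaged copy of the student, rather than the last raw checkpoint, is kept for evaluation on unseen domains.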
