Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

10/27/2021
by Gongfan Fang, et al.

Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain. Prior KD approaches, despite their gratifying results, have largely relied on the premise that in-domain data is available to carry out the knowledge transfer. Such an assumption, unfortunately, is violated in many practical settings, since the original training data, or even the data domain, is often unreachable due to privacy or copyright restrictions. In this paper, we tackle an ambitious task, termed out-of-domain knowledge distillation (OOD-KD), which allows us to conduct KD using only OOD data that can be readily obtained at very low cost. Admittedly, OOD-KD is by nature a highly challenging task due to the agnostic domain gap. To this end, we introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD. The key insight behind MosaicKD is that samples from different domains share common local patterns, even though their global semantics may vary significantly; these shared local patterns can, in turn, be re-assembled, analogous to mosaic tiling, to approximate the in-domain data and thereby alleviate the domain discrepancy. In MosaicKD, this is achieved through a four-player min-max game in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner, partially under the guidance of a pre-trained teacher. We validate MosaicKD on classification and semantic segmentation tasks across various benchmarks and demonstrate that it yields results much superior to state-of-the-art counterparts on OOD data. Our code is available at <https://github.com/zju-vipa/MosaicKD>.
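
To make the four-player game more concrete, the sketch below outlines one training round in PyTorch. It is a minimal illustration under stated assumptions rather than the authors' implementation (the official code is at the repository linked above): the patch-level discrimination described in the paper is reduced to a plain real/fake discriminator, and the function name `mosaickd_step`, the loss weights `alpha` and `beta`, the latent size `z_dim`, the temperature, and the pseudo-label confidence term are hypothetical choices made for illustration.

```python
import torch
import torch.nn.functional as F


def mosaickd_step(generator, discriminator, student, teacher,
                  ood_images, opt_g, opt_d, opt_s,
                  z_dim=100, alpha=1.0, beta=1.0, temperature=4.0):
    """One round of the four-player game: generator vs. (discriminator, student),
    with a frozen, eval-mode teacher providing soft targets."""
    device = ood_images.device
    batch = ood_images.size(0)

    # 1) Discriminator: separate real OOD samples from synthesized ones.
    #    (The paper discriminates local patches; a plain discriminator is used here.)
    z = torch.randn(batch, z_dim, device=device)
    fake = generator(z).detach()
    d_real = discriminator(ood_images)
    d_fake = discriminator(fake)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator: synthesize samples that (a) look locally realistic,
    #    (b) the teacher classifies confidently, and (c) the student still
    #    disagrees with the teacher on -- the adversarial part of the game.
    z = torch.randn(batch, z_dim, device=device)
    fake = generator(z)
    d_out = discriminator(fake)
    t_logits = teacher(fake)
    s_logits = student(fake)
    loss_local = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    loss_conf = F.cross_entropy(t_logits, t_logits.argmax(dim=1))  # pseudo-label confidence
    kl_ts = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                     F.softmax(t_logits / temperature, dim=1),
                     reduction="batchmean")
    loss_g = loss_local + alpha * loss_conf - beta * kl_ts
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # 3) Student: imitate the teacher on the synthesized (mosaicked) samples.
    with torch.no_grad():
        fake = generator(torch.randn(batch, z_dim, device=device))
        t_logits = teacher(fake)
    s_logits = student(fake)
    loss_s = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                      F.softmax(t_logits / temperature, dim=1),
                      reduction="batchmean") * temperature ** 2
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()

    return loss_d.item(), loss_g.item(), loss_s.item()
```

The generator is pulled in three directions at once: toward locally realistic OOD patterns, toward samples the teacher recognizes confidently, and toward regions where the student still lags behind the teacher, which is what drives the mosaic-like re-assembly of local patterns into useful transfer data.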

Related research

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers (02/19/2023)
Knowledge distillation has been shown to be a powerful model compression...

Dual Discriminator Adversarial Distillation for Data-free Model Compression (04/12/2021)
Knowledge distillation has been widely used to produce portable and effi...

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation (05/10/2021)
Knowledge distillation (KD) has recently emerged as an efficacious schem...

Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation (01/20/2021)
Despite pre-trained language models such as BERT have achieved appealing...

MoMA: Momentum Contrastive Learning with Multi-head Attention-based Knowledge Distillation for Histopathology Image Analysis (08/31/2023)
There is no doubt that advanced artificial intelligence models and high ...

SoccerKDNet: A Knowledge Distillation Framework for Action Recognition in Soccer Videos (07/15/2023)
Classifying player actions from soccer videos is a challenging problem, ...

Respecting Transfer Gap in Knowledge Distillation (10/23/2022)
Knowledge distillation (KD) is essentially a process of transferring a t...
