On-Device Domain Generalization

09/15/2022
by Kaiyang Zhou, et al.

We present a systematic study of domain generalization (DG) for tiny neural networks, a problem that is critical to on-device machine learning applications but has been overlooked in the literature, where research has focused on large models only. Tiny neural networks have far fewer parameters and lower complexity, and thus should not be trained in the same way as their large counterparts for DG applications. We find that knowledge distillation is a strong candidate for solving the problem: it outperforms state-of-the-art DG methods, which were developed with large models, by a large margin. Moreover, we observe that the teacher-student performance gap on test data with domain shift is larger than that on in-distribution data. To improve DG for tiny neural networks without increasing the deployment cost, we propose a simple idea called out-of-distribution knowledge distillation (OKD), which teaches the student how the teacher handles (synthetic) out-of-distribution data and proves to be a promising framework for solving the problem. We also contribute a scalable method for creating DG datasets, called DOmain Shift in COntext (DOSCO), which can be applied to broad data at scale with little human effort. Code and models are released at <https://github.com/KaiyangZhou/on-device-dg>.
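
The abstract only sketches the OKD idea, so the snippet below is a minimal illustration, assuming OKD follows the standard softened-KL distillation recipe applied to synthetically shifted inputs. The toy teacher/student models, the noise-based stand-in for synthetic out-of-distribution data, and the `temperature` and `alpha` values are illustrative assumptions, not the released implementation (see the linked repository for that).

```python
# Sketch of out-of-distribution knowledge distillation (OKD):
# supervise the tiny student on in-distribution (ID) data, and make it
# mimic a frozen teacher's predictions on synthetically shifted (OOD) data.
import torch
import torch.nn as nn
import torch.nn.functional as F

def okd_loss(student_id_logits, labels, student_ood_logits, teacher_ood_logits,
             temperature=4.0, alpha=1.0):
    # Supervised cross-entropy on the in-distribution batch.
    ce = F.cross_entropy(student_id_logits, labels)
    # Match softened student/teacher distributions on the synthetic OOD batch.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_ood_logits / t, dim=1),
        F.softmax(teacher_ood_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
    return ce + alpha * kd

if __name__ == "__main__":
    # Toy stand-ins: a "large" teacher and a tiny student on 3x32x32 inputs.
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                            nn.ReLU(), nn.Linear(256, 10))
    student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32),
                            nn.ReLU(), nn.Linear(32, 10))

    x = torch.randn(8, 3, 32, 32)               # in-distribution batch
    y = torch.randint(0, 10, (8,))
    # Hypothetical "synthetic OOD" perturbation; the paper's actual
    # OOD data generation may differ.
    x_ood = x + 0.5 * torch.randn_like(x)

    with torch.no_grad():                        # teacher stays frozen
        teacher_ood = teacher(x_ood)
    loss = okd_loss(student(x), y, student(x_ood), teacher_ood)
    loss.backward()
    print(f"OKD loss: {loss.item():.4f}")
```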

Related research

09/20/2023 - Weight Averaging Improves Knowledge Distillation under Domain Shift
Knowledge distillation (KD) is a powerful model compression technique br...

07/21/2023 - Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
Knowledge distillation aims to learn a lightweight student network from ...

07/03/2022 - PrUE: Distilling Knowledge from Sparse Teacher Networks
Although deep neural networks have enjoyed remarkable success across a w...

09/12/2022 - Switchable Online Knowledge Distillation
Online Knowledge Distillation (OKD) improves the involved models by reci...

09/21/2022 - Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation
Data-free Knowledge Distillation (DFKD) has attracted attention recently...

04/17/2018 - Neural Compatibility Modeling with Attentive Knowledge Distillation
Recently, the booming fashion sector and its huge potential benefits hav...

03/31/2023 - Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization
In real-world applications, a machine learning model is required to hand...
