Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation

by Weimin Wu, et al.

Linear ensemble strategies, i.e., averaging ensembles, have been proposed to improve performance in unsupervised domain adaptation (UDA) tasks. However, a typical UDA task is challenged by dynamically changing factors in the unlabeled target domain, such as variable weather, views, and background. Most previous ensemble strategies ignore this dynamic and uncontrollable challenge, and therefore suffer from limited feature representations and performance bottlenecks. To enhance the model's adaptability between domains and reduce the computational cost of deploying the ensemble model, we propose a novel framework, Instance-aware Model Ensemble With Distillation (IMED), which fuses multiple UDA component models adaptively according to different instances and distills these components into a small model. The core idea of IMED is a dynamic instance-aware ensemble strategy: for each instance, a nonlinear fusion subnetwork is learned that fuses the extracted features and predicted labels of multiple component models. The nonlinear fusion helps the ensemble model handle dynamically changing factors. After learning a large-capacity ensemble model with good adaptability to different changing factors, we use the ensemble as a teacher to guide the learning of a compact student model by knowledge distillation. Furthermore, we provide a theoretical analysis of the validity of IMED for UDA. Extensive experiments on various UDA benchmark datasets, e.g., Office-31, Office-Home, and VisDA-2017, show the superiority of the IMED-based model over state-of-the-art methods under comparable computation cost.
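The two ideas in the abstract, per-instance fusion of component models and distillation into a student, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The gating form (a softmax over a `tanh` of the concatenated features), the function names `instance_aware_fusion` and `distillation_loss`, the weight matrix `W_gate`, and the temperature value are all assumptions made for illustration; the abstract does not specify the fusion subnetwork or the distillation loss.

```python
import numpy as np

def softmax(z, axis=-1, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = z / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def instance_aware_fusion(features, logits, W_gate):
    """Fuse M component models with per-instance weights (hypothetical gate).

    features: (M, N, D) features from M component models for N instances
    logits:   (M, N, C) class logits from the same models
    W_gate:   (M * D, M) gate weights (assumed form, learned in practice)
    Returns fused logits (N, C) and the per-instance weights (N, M).
    """
    M, N, D = features.shape
    # Concatenate each instance's features across all component models.
    concat = np.transpose(features, (1, 0, 2)).reshape(N, M * D)
    # Nonlinear gate: one weight per component model, per instance.
    gate = softmax(np.tanh(concat) @ W_gate)
    # Weighted combination of the component predictions.
    fused = np.einsum("nm,mnc->nc", gate, logits)
    return fused, gate

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature=temperature)
    q = softmax(student_logits, temperature=temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# Toy usage: 3 component models, 4 instances, 5-dim features, 6 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4, 5))
logits = rng.normal(size=(3, 4, 6))
W_gate = rng.normal(size=(15, 3))
teacher_logits, weights = instance_aware_fusion(features, logits, W_gate)
student_logits = rng.normal(size=(4, 6))
loss = distillation_loss(teacher_logits, student_logits)
```

When `W_gate` is all zeros, the gate assigns equal weight to every component model, so the sketch reduces to the averaging ensemble that the abstract contrasts against; a non-trivial gate lets the weights differ from instance to instance.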




Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation

Knowledge distillation has widely been used for model compression and do...

One-Class Knowledge Distillation for Face Presentation Attack Detection

Face presentation attack detection (PAD) has been extensively studied by...

Student Become Decathlon Master in Retinal Vessel Segmentation via Dual-teacher Multi-target Domain Adaptation

Unsupervised domain adaptation has been proposed recently to tackle the ...

Cosine Model Watermarking Against Ensemble Distillation

Many model watermarking methods have been developed to prevent valuable ...

Knowledge Adaptation: Teaching to Adapt

Domain adaptation is crucial in many real-world applications where the d...

Rethinking Ensemble-Distillation for Semantic Segmentation Based Unsupervised Domain Adaptation

Recent researches on unsupervised domain adaptation (UDA) have demonstra...
