Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

by   Deng Li, et al.

Recently, large-scale pre-trained models have shown their advantages in many tasks. However, due to the huge computational complexity and storage requirements, it is challenging to apply the large-scale model to real scenes. A common solution is knowledge distillation which regards the large-scale model as a teacher model and helps to train a small student model to obtain a competitive performance. Cross-task Knowledge distillation expands the application scenarios of the large-scale pre-trained model. Existing knowledge distillation works focus on directly mimicking the final prediction or the intermediate layers of the teacher model, which represent the global-level characteristics and are task-specific. To alleviate the constraint of different label spaces, capturing invariant intrinsic local object characteristics (such as the shape characteristics of the leg and tail of the cattle and horse) plays a key role. Considering the complexity and variability of real scene tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios. First, to better transfer the generalized knowledge in the teacher model in cross-task scenarios, we propose a prototype learning module to learn from the essential feature representation of objects in the teacher model. Secondly, for diverse downstream tasks, we propose a task-adaptive feature augmentation module to enhance the features of the student model with the learned generalization prototype features and guide the training of the student model to improve its generalization ability. The experimental results on various visual tasks demonstrate the effectiveness of our approach for large-scale model cross-task knowledge distillation scenes.


page 1

page 2

page 3

page 6

page 8

page 9

page 11


Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge v...

Learning to Teach with Student Feedback

Knowledge distillation (KD) has gained much attention due to its effecti...

Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification

Knowledge distillation is a potential solution for model compression. Th...

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Deep learning has contributed greatly to many successes in artificial in...

MixKD: Towards Efficient Distillation of Large-scale Language Models

Large-scale language models have recently demonstrated impressive empiri...

A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training

Modern Natural Language Generation (NLG) models come with massive comput...

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

Knowledge distillation is an effective technique that transfers knowledg...

Please sign up or login with your details

Forgot password? Click here to reset