Improved Knowledge Distillation via Adversarial Collaboration

11/29/2021
by Zhiqiang Liu, et al.

Knowledge distillation has become an important approach for obtaining a compact yet effective model. To achieve this goal, a small student model is trained to exploit the knowledge of a large, well-trained teacher model. However, due to the capacity gap between the teacher and the student, the student's performance rarely reaches the level of the teacher. To address this issue, existing methods reduce the difficulty of the teacher's knowledge through a proxy. We argue that these proxy-based methods overlook the knowledge lost from the teacher, which may cause the student to hit a capacity bottleneck. In this paper, we alleviate the capacity gap problem from a new perspective, with the aim of avoiding knowledge loss. Instead of sacrificing part of the teacher's knowledge, we propose to build a more powerful student via adversarial collaborative learning. To this end, we propose an Adversarial Collaborative Knowledge Distillation (ACKD) method that effectively improves the performance of knowledge distillation. Specifically, we construct the student model with multiple auxiliary learners. Meanwhile, we devise an adversarial collaborative module (ACM) that introduces an attention mechanism and adversarial learning to enhance the capacity of the student. Extensive experiments on four classification tasks show the superiority of the proposed ACKD.
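For context, below is a minimal PyTorch sketch of the vanilla soft-label distillation objective that ACKD builds on (a student trained against both ground-truth labels and the teacher's softened outputs). The abstract does not describe the auxiliary learners or the adversarial collaborative module (ACM) in enough detail to reproduce, so they are omitted; the names kd_loss, train_step, temperature, and alpha are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of standard (Hinton-style) knowledge distillation, assuming a
# frozen teacher and a smaller student producing class logits of the same shape.
# ACKD's auxiliary learners and ACM are intentionally not modeled here.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with temperature-softened KL to the teacher."""
    # Soft targets: the student mimics the teacher's softened class distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients match the hard-label term
    # Hard targets: ordinary supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def train_step(student, teacher, images, targets, optimizer):
    """One distillation step: the teacher is kept fixed, only the student updates."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = kd_loss(student_logits, teacher_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this baseline the teacher's full output distribution is distilled directly; proxy-based methods simplify that signal, whereas ACKD instead strengthens the student side (auxiliary learners plus the ACM) so no teacher knowledge has to be discarded.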


Related research

04/24/2023  Improving Knowledge Distillation Via Transferring Learning Ability
Existing knowledge distillation methods generally use a teacher-student ...

05/20/2023  Lifting the Curse of Capacity Gap in Distilling Language Models
Pretrained language models (LMs) have shown compelling performance on va...

03/16/2023  Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
Previous Knowledge Distillation based efficient image retrieval methods ...

06/11/2022  Reducing Capacity Gap in Knowledge Distillation with Review Mechanism for Crowd Counting
The lightweight crowd counting models, in particular knowledge distillat...

07/11/2023  The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
In the context of label-efficient learning on video data, the distillati...

05/29/2022  AutoDisc: Automatic Distillation Schedule for Large Language Model Compression
Driven by the teacher-student paradigm, knowledge distillation is one of...

06/16/2023  Coaching a Teachable Student
We propose a novel knowledge distillation framework for effectively teac...
