Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

11/15/2022
by Yu Wang, et al.

DETR is a novel end-to-end object detector built on the transformer architecture, and it significantly outperforms classic detectors when the model size is scaled up. In this paper, we focus on compressing DETR with knowledge distillation. While knowledge distillation has been well studied for classic detectors, there is little research on how to make it work effectively on DETR. We first provide experimental and theoretical analysis showing that the main challenge in DETR distillation is the lack of consistent distillation points. Distillation points are the inputs corresponding to the predictions that the student mimics, and reliable distillation requires sufficient distillation points that are consistent between teacher and student. Based on this observation, we propose a general knowledge distillation paradigm for DETR (KD-DETR) with consistent distillation point sampling. Specifically, we decouple the detection and distillation tasks by introducing a set of specialized object queries to construct the distillation points. Within this paradigm, we further propose a general-to-specific distillation point sampling strategy to explore the extensibility of KD-DETR. Extensive experiments on different DETR architectures with various backbone and transformer-layer scales validate the effectiveness and generality of KD-DETR. KD-DETR boosts DAB-DETR with ResNet-18 and ResNet-50 backbones to 41.4% and 45.7% mAP, respectively, 5.2% and 3.5% above the baselines; the ResNet-50 student even surpasses its teacher by 2.2%.
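To make the paradigm concrete, the sketch below illustrates the core idea of consistent distillation points: the same set of specialized object queries is fed to both teacher and student, so query i in the student corresponds to query i in the teacher and their predictions can be compared directly. This is a minimal illustrative sketch, not the authors' implementation; the `teacher`/`student` models, their `decode` method, the `query_dim` attribute, and the loss weights are all assumed for illustration.

```python
# Minimal sketch of distillation with consistent distillation points,
# assuming hypothetical DETR-style `teacher` and `student` models whose
# decoders accept an external set of object queries and return per-query
# class logits and box predictions. All names here are illustrative.
import torch
import torch.nn.functional as F

def kd_detr_loss(teacher, student, images, num_distill_points=300):
    # Sample a shared set of object queries: these serve as the
    # "distillation points". Feeding the SAME queries to teacher and
    # student is what makes their predictions consistent and comparable.
    embed_dim = student.query_dim  # assumed attribute
    distill_queries = torch.randn(
        num_distill_points, embed_dim, device=images.device
    )

    with torch.no_grad():
        t_logits, t_boxes = teacher.decode(images, distill_queries)
    s_logits, s_boxes = student.decode(images, distill_queries)

    # Because the queries are shared, no bipartite matching is needed to
    # pair student predictions with teacher targets.
    cls_loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    box_loss = F.l1_loss(s_boxes, t_boxes)
    return cls_loss + box_loss
```

In the paper's general-to-specific sampling strategy, the distillation points would not only be drawn at random (the "general" part of the query space) but would also include more targeted queries; the sketch above shows only random sampling for brevity.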


