Distilling Object Detectors with Task Adaptive Regularization

06/23/2020
by Ruoyu Sun, et al.

Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices. Knowledge distillation, which aims to train a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to task-specific priors. The intuition is that simply distilling all information from teacher to student is not advisable; instead, we should only borrow priors from the teacher model where the student cannot perform well. Toward this goal, we propose a region proposal sharing mechanism to exchange region responses between the teacher and student models. Based on this, we adaptively transfer knowledge at three levels, i.e., the feature backbone, the classification head, and the bounding box regression head, according to which model performs more reasonably. Furthermore, considering that minimizing the distillation loss and the detection loss simultaneously introduces an optimization dilemma, we propose a distillation decay strategy that improves model generalization by gradually reducing the distillation penalty. Experiments on widely used detection benchmarks demonstrate the effectiveness of our method. In particular, using Faster R-CNN with FPN as an instantiation, we achieve 39.0% mAP with ResNet-50 on the COCO dataset, surpassing the 36.3% baseline by 2.7 points and even exceeding the teacher model's 38.5% mAP.
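The two mechanisms described above can be sketched concretely: a task-adaptive mask that applies the distillation term only where the teacher outperforms the student, and a decay schedule that gradually reduces the distillation penalty. The snippet below is a minimal illustrative sketch only; the per-region criterion, the linear decay form, and all function names are assumptions not specified in this abstract.

```python
import torch

def adaptive_distill_mask(teacher_loss: torch.Tensor,
                          student_loss: torch.Tensor) -> torch.Tensor:
    """Per-proposal mask: distill only where the teacher does better than the
    student (illustrative criterion; the paper's exact rule may differ)."""
    return (teacher_loss < student_loss).float()

def distillation_weight(step: int, total_steps: int, init: float = 1.0) -> float:
    """Distillation-decay schedule: linearly fade the distillation penalty to
    zero so the detection loss dominates late in training (assumed linear form)."""
    return init * (1.0 - min(step / max(total_steps, 1), 1.0))

def total_loss(det_loss: torch.Tensor,
               teacher_loss: torch.Tensor,
               student_loss: torch.Tensor,
               distill_loss: torch.Tensor,
               step: int, total_steps: int) -> torch.Tensor:
    # Weight each proposal's distillation term by the adaptive mask, then
    # scale the whole penalty by the decaying coefficient.
    mask = adaptive_distill_mask(teacher_loss, student_loss)
    gamma = distillation_weight(step, total_steps)
    return det_loss + gamma * (mask * distill_loss).mean()
```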
