Efficient Feature Distillation for Zero-shot Detection

03/21/2023
by   Zhuoming Liu, et al.

Large-scale vision-language models (e.g., CLIP) have been leveraged by various methods to detect unseen objects. However, most of these works require additional captions or images for training, which is not feasible in the zero-shot detection setting. Distillation-based methods, by contrast, need no extra data, but they have their own limitations. Specifically, existing work creates distillation regions that are biased toward the base categories, which limits the distillation of novel-category information and harms distillation efficiency. Furthermore, directly using the raw features from CLIP for distillation neglects the domain gap between CLIP's training data and the detection datasets, which makes it difficult to learn the mapping from an image region to the vision-language feature space, an essential component for detecting unseen objects. As a result, existing distillation-based methods require an excessively long training schedule. To solve these problems, we propose Efficient feature distillation for Zero-Shot Detection (EZSD). First, EZSD adapts CLIP's feature space to the target detection domain by re-normalizing CLIP to bridge the domain gap. Second, EZSD uses CLIP to generate distillation proposals that contain potential novel instances, so that distillation is not overly biased toward the base categories. Finally, EZSD takes advantage of semantic meaning for regression to further improve model performance. As a result, EZSD achieves state-of-the-art performance on the COCO zero-shot benchmark with a much shorter training schedule and outperforms previous work by 4% in a similar setting with 1/10 of the training time.
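To make the two core ideas concrete, here is a minimal numpy sketch of (a) an L1 feature-distillation loss between a detector's region embeddings and CLIP teacher features, the common choice in CLIP-distillation detectors, and (b) a statistics-matching re-normalization of the teacher features as a hypothetical stand-in for the paper's domain adaptation step. The function names (`renormalize`, `distill_loss`) and the exact normalization scheme are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def renormalize(teacher_feats, domain_mean, domain_std, eps=1e-6):
    """Shift CLIP teacher features toward detection-domain statistics.

    Hypothetical stand-in for EZSD's re-normalization: standardize each
    feature dimension, then rescale to the target domain's mean/std.
    """
    z = (teacher_feats - teacher_feats.mean(axis=0)) / (teacher_feats.std(axis=0) + eps)
    return z * domain_std + domain_mean

def distill_loss(student_feats, teacher_feats):
    """Mean L1 distance between student and teacher region embeddings."""
    return np.abs(student_feats - teacher_feats).mean()

# Toy example: 4 region proposals with 8-dim embeddings.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 8))                      # detector region features
teacher = rng.normal(loc=2.0, scale=3.0, size=(4, 8))  # raw CLIP features (shifted domain)

adapted = renormalize(teacher, domain_mean=student.mean(axis=0),
                      domain_std=student.std(axis=0))
loss_raw = distill_loss(student, teacher)
loss_adapted = distill_loss(student, adapted)
```

After re-normalization, the teacher features share the student's per-dimension mean and standard deviation, which is one simple way to model "bridging the domain gap" before distillation; the paper's actual mechanism may differ.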


Related research

09/24/2021 · ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
Real-world object sampling produces long-tailed distributions requiring ...

04/28/2021 · Zero-Shot Detection via Vision and Language Knowledge Distillation
Zero-shot image classification has made promising progress by training t...

03/20/2022 · Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation
Open-vocabulary object detection aims to detect novel object categories ...

03/27/2019 · Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration
Zero-shot learning (ZSL) for image classification focuses on recognizing...

06/19/2023 · Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation
We study universal zero-shot segmentation in this work to achieve panopt...

09/15/2023 · Audio-free Prompt Tuning for Language-Audio Models
Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associat...

09/10/2023 · Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels
In this paper, we investigate the task of zero-shot human-object interac...
