ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation

09/24/2021
by   Johnathan Xie, et al.

Real-world object sampling produces long-tailed distributions, so rare object types require exponentially more images. Zero-shot detection, which aims to detect unseen objects, is one direction to address this problem. A dataset such as COCO is extensively annotated across many images but covers only a small number of categories, and annotating all object classes across a diverse domain is expensive and challenging. To advance zero-shot detection, we develop a vision-language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head of a one-stage detector like YOLOv5. With this method, we are able to train an object detector that achieves state-of-the-art accuracy on the COCO zero-shot detection splits with fewer model parameters. During inference, our model can be adapted to detect any number of object classes without additional training. We also find that the improvements provided by scaling our method are consistent across various YOLOv5 scales. Furthermore, we develop a self-training method that provides a significant score improvement without needing extra images or labels.
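
The claim that the detector can be adapted to any number of classes at inference follows from classifying each box in a shared embedding space rather than with a fixed-size class layer. The sketch below illustrates only that idea and is not the authors' implementation: the function names (`l2_normalize`, `cosine_similarity`, `classify_box`), the temperature value, and the toy 3-dimensional vectors are all assumptions made for illustration. In a real system the text embeddings would come from a text encoder such as CLIP's and the box embedding from the detector's semantic prediction head.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def classify_box(box_embedding, text_embeddings, temperature=0.01):
    """Pick the class whose text embedding is closest to the box embedding.

    `text_embeddings` maps class name -> embedding vector. The temperature
    is a hypothetical choice, not a value taken from the paper.
    """
    names = list(text_embeddings)
    logits = [
        cosine_similarity(box_embedding, text_embeddings[name]) / temperature
        for name in names
    ]
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Toy stand-ins for encoder outputs (hypothetical values).
text_embeddings = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
label, probs = classify_box([0.9, 0.1, 0.0], text_embeddings)

# Adding a class needs only a new text embedding, no retraining.
text_embeddings["zebra"] = [0.0, 0.0, 1.0]
new_label, _ = classify_box([0.1, 0.0, 0.95], text_embeddings)
```

Because the class list is just a dictionary of text embeddings, extending the detector to a new category in this sketch amounts to adding one entry, which mirrors the "without additional training" property described in the abstract.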


research
04/28/2021

Zero-Shot Detection via Vision and Language Knowledge Distillation

Zero-shot image classification has made promising progress by training t...
research
03/21/2023

Efficient Feature Distillation for Zero-shot Detection

The large-scale vision-language models (e.g., CLIP) are leveraged by dif...
research
03/23/2023

Three ways to improve feature alignment for open vocabulary detection

The core problem in zero-shot open vocabulary detection is how to align ...
research
04/06/2023

DoUnseen: Zero-Shot Object Detection for Robotic Grasping

How can we segment varying numbers of objects where each specific object...
research
06/18/2022

VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection

Visual Relationship Detection (VRD) impels a computer vision model to 's...
research
06/12/2023

Augmenting Zero-Shot Detection Training with Image Labels

Zero-shot detection (ZSD), i.e., detection on classes not seen during tr...
research
05/26/2023

Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models

We focus on the challenge of out-of-distribution (OOD) detection in deep...
