Learning to Detect and Segment for Open Vocabulary Object Detection

12/23/2022
by PetsTime, et al.

Open vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained models, which help recognize novel objects given only their semantic categories. Prior works mainly focus on transferring knowledge to object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design that better generalizes box regression and mask segmentation to the open vocabulary setting. The core idea is to conditionally parameterize the network heads on the semantic embedding, so that the model is guided by class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads: the dynamically aggregated head and the dynamically generated head. The former is instantiated with a set of static heads that are conditionally aggregated; these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. With such a conditional design, the detection model is bridged by the semantic embedding to offer strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to state-of-the-art open vocabulary object detection methods with very minor overhead, e.g., it surpasses a RegionClip model by 3.0 detection AP on novel categories, with only 1.1
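
The two-stream idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the class name `CondHeadSketch`, the linear heads, the softmax gating, the linear parameter generator, all dimensions, and the additive fusion of the two streams are assumptions made only for illustration. It shows one box-regression head whose output depends on a class's semantic embedding in two ways: by mixing a fixed set of expert heads, and by generating head weights directly from the embedding.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class CondHeadSketch:
    """Toy conditional box head (hypothetical; shapes and fusion are assumptions)."""

    def __init__(self, feat_dim, embed_dim, num_experts, out_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_dim = feat_dim
        self.out_dim = out_dim
        # Static expert heads: one linear map per expert, feature -> box deltas.
        self.experts = rng.normal(scale=0.1, size=(num_experts, out_dim, feat_dim))
        # Gating matrix: semantic embedding -> one score per expert.
        self.gate = rng.normal(scale=0.1, size=(num_experts, embed_dim))
        # Generator matrix: semantic embedding -> flattened head weights.
        self.generator = rng.normal(scale=0.1, size=(out_dim * feat_dim, embed_dim))

    def expert_weights(self, embedding):
        """Mixing weights over the experts, conditioned on the embedding."""
        return softmax(self.gate @ embedding)

    def aggregated_head(self, feature, embedding):
        """Stream 1: weighted combination of the static expert predictions."""
        weights = self.expert_weights(embedding)   # (num_experts,)
        per_expert = self.experts @ feature        # (num_experts, out_dim)
        return weights @ per_expert                # (out_dim,)

    def generated_head(self, feature, embedding):
        """Stream 2: head weights produced directly from the embedding."""
        w = (self.generator @ embedding).reshape(self.out_dim, self.feat_dim)
        return w @ feature                         # (out_dim,)

    def predict(self, feature, embedding):
        # Assumption: the two streams are fused by simple addition.
        return (self.aggregated_head(feature, embedding)
                + self.generated_head(feature, embedding))


# Usage: the same region feature yields different box deltas per class embedding.
head = CondHeadSketch(feat_dim=8, embed_dim=5, num_experts=3)
rng = np.random.default_rng(1)
feature = rng.normal(size=8)   # stand-in for a pooled region feature
cat_emb = rng.normal(size=5)   # stand-in for a text embedding of one class
dog_emb = rng.normal(size=5)   # stand-in for a text embedding of another class
box_cat = head.predict(feature, cat_emb)
box_dog = head.predict(feature, dog_emb)
```

Because both streams are driven by the embedding rather than by a per-class parameter table, a category unseen during training still produces a class-specific head as long as its embedding is available, which is the generalization property the abstract describes.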

research
06/22/2022

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

Open-vocabulary object detection (OVD) aims to scale up vocabulary size ...
research
09/30/2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

We present F-VLM, a simple open-vocabulary object detection method built...
research
05/26/2023

OpenVIS: Open-vocabulary Video Instance Segmentation

We propose and study a new computer vision task named open-vocabulary vi...
research
03/10/2023

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Open-vocabulary object detection aims to provide object detectors traine...
research
07/07/2022

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

Existing open-vocabulary object detectors typically enlarge their vocabu...
research
02/27/2023

Aligning Bag of Regions for Open-Vocabulary Object Detection

Pre-trained vision-language models (VLMs) learn to align vision and lang...
research
01/23/2023

OvarNet: Towards Open-vocabulary Object Attribute Recognition

In this paper, we consider the problem of simultaneously detecting objec...
