Multi-modal Queried Object Detection in the Wild

05/30/2023
by   Yifan Xu, et al.

We introduce MQ-Det, an efficient architecture and pre-training strategy designed to use both textual descriptions, which offer open-set generalization, and visual exemplars, which offer rich description granularity, as category queries. We call this setting Multi-modal Queried object Detection; it targets real-world detection with both open-vocabulary categories and varying granularity. MQ-Det incorporates vision queries into existing, well-established language-queried-only detectors. We propose a plug-and-play gated class-scalable perceiver module on top of the frozen detector to augment category text with class-wise visual information. To address the learning inertia problem caused by the frozen detector, we propose a vision-conditioned masked language prediction strategy. MQ-Det's simple yet effective architecture and training strategy are compatible with most language-queried object detectors, which makes it broadly applicable. Experimental results demonstrate that multi-modal queries substantially boost open-world detection. For instance, MQ-Det improves the state-of-the-art open-set detector GLIP by +7.8% AP on the LVIS benchmark via multi-modal queries without any downstream finetuning, and by +6.3% AP on average across 13 few-shot downstream tasks, with merely an additional 3% modulating time required by GLIP. Code is available at https://github.com/YifanXu74/MQ-Det.
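
To make the idea of gating visual information into a category's text query more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name gated_augment, the single-head dot-product attention, and the tanh gate are all assumptions chosen for illustration. The sketch shows a text embedding attending over a few visual exemplar embeddings of the same class, with the attended result added back through a scalar gate. With the gate at zero, the output equals the original text embedding, so a frozen language-queried detector would see unchanged inputs at the start of training.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_augment(text_emb, vision_embs, gate):
    """Hypothetical helper: augment one category's text embedding
    with its visual exemplars via gated cross-attention.

    text_emb    : list[float], the category text embedding (query)
    vision_embs : list[list[float]], exemplar embeddings (keys and values)
    gate        : float, learnable scalar in a real model (assumption)
    """
    d = len(text_emb)
    scale = 1.0 / math.sqrt(d)
    # Attention weights of the text query over the visual exemplars.
    weights = softmax([dot(text_emb, v) * scale for v in vision_embs])
    # Weighted sum of exemplar embeddings.
    attended = [sum(w * v[j] for w, v in zip(weights, vision_embs))
                for j in range(d)]
    # tanh(0) == 0, so a zero gate leaves the text embedding untouched.
    g = math.tanh(gate)
    return [t + g * a for t, a in zip(text_emb, attended)]

text = [1.0, 0.0]
exemplars = [[1.0, 0.0], [0.0, 1.0]]
unchanged = gated_augment(text, exemplars, gate=0.0)
augmented = gated_augment(text, exemplars, gate=1.0)
```

Because each category is processed independently in this sketch, adding or removing categories does not change the computation for the others, which is one plausible reading of "class-scalable".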

research
08/30/2023

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

In this paper, we for the first time explore helpful multi-modal context...
research
06/08/2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection

The goal of this paper is open-vocabulary object detection (OVOD) – ...
research
04/12/2022

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

In this paper, we study the challenging instance-wise vision-language ta...
research
11/24/2022

Delving into Out-of-Distribution Detection with Vision-Language Representations

Recognizing out-of-distribution (OOD) samples is critical for machine le...
research
05/23/2023

DetGPT: Detect What You Need via Reasoning

In recent years, the field of computer vision has seen significant advan...
research
11/18/2022

Detect Only What You Specify : Object Detection with Linguistic Target

Object detection is a computer vision task of predicting a set of boundi...
research
07/17/2023

Unified Open-Vocabulary Dense Visual Prediction

In recent years, open-vocabulary (OV) dense visual prediction (such as O...
