Exposing the Troublemakers in Described Object Detection

07/24/2023
by Chi Xie, et al.

Detecting objects based on language descriptions is a popular task that includes Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called Described Object Detection (DOD), which expands the category names of OVD into flexible language expressions and removes REC's limitation of grounding only an object that is assumed to exist in the image. We establish the research foundation for DOD by constructing a Description Detection Dataset (D³), which features flexible language expressions and annotates every described object without omission. By evaluating previous SOTA methods on D³, we identify several troublemakers that cause current REC, OVD, and bi-functional methods to fail. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods are constrained by long and complex descriptions. Recent bi-functional methods also perform poorly on DOD because they use separate training procedures and inference strategies for the REC and OVD tasks. Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube.
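Why REC-style outputs break on DOD can be made concrete with a small sketch. This is not the paper's model, baseline, or evaluation code: the function names, the candidate format (a box plus a per-description score), and the 0.5 threshold are hypothetical choices made only for illustration. A REC-style contract always returns exactly one box, so it misses extra targets and cannot reject a description that matches nothing; a DOD-style contract returns zero or more scored boxes.

```python
from typing import List, Tuple

# Hypothetical types for illustration: a box is (x1, y1, x2, y2), and each
# candidate pairs a box with a score for how well it matches one description.
Box = Tuple[float, float, float, float]
Candidate = Tuple[Box, float]


def rec_style_output(candidates: List[Candidate]) -> List[Box]:
    """REC-style contract: assume the referent exists and return exactly one box."""
    if not candidates:
        return []
    best_box, _ = max(candidates, key=lambda c: c[1])
    return [best_box]


def dod_style_output(candidates: List[Candidate],
                     threshold: float = 0.5) -> List[Candidate]:
    """DOD-style contract (sketch): keep every box whose score clears an assumed
    threshold, sorted by score so the result can be ranked; may be empty."""
    kept = [c for c in candidates if c[1] >= threshold]
    return sorted(kept, key=lambda c: c[1], reverse=True)


# A description that matches two objects in the image (multi-target case).
multi = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.8), ((5, 5, 8, 8), 0.1)]
# A description that matches nothing in the image (negative case).
negative = [((0, 0, 10, 10), 0.2), ((20, 20, 30, 30), 0.1)]

print(rec_style_output(multi))      # one box: the second target is missed
print(dod_style_output(multi))      # both targets, with their scores
print(rec_style_output(negative))   # still one box: a false positive
print(dod_style_output(negative))   # empty: the description is rejected
```

The fixed threshold is only the simplest way to show rejection; a detector evaluated with ranking metrics would typically emit all scored boxes and rely on well-calibrated confidence scores instead.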
