Open-Vocabulary Object Detection via Scene Graph Discovery

07/07/2023
by Hengcan Shi, et al.

In recent years, open-vocabulary (OV) object detection has attracted increasing research attention. Unlike traditional detection, which only recognizes objects from a fixed set of categories, OV detection aims to detect objects from an open category set. Previous works often leverage vision-language (VL) training data (e.g., referring grounding data) to recognize OV objects. However, they only use pairs of nouns and individual objects from VL data, whereas these data usually contain much richer information, such as scene graphs, which are also crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection. First, we present a scene-graph-based decoder (SGDecoder) with sparse scene-graph-guided attention (SSGA), which captures scene graphs and leverages them to discover OV objects. Second, we propose scene-graph-based prediction (SGPred), in which a scene-graph-based offset regression (SGOR) mechanism enables mutual enhancement between scene graph extraction and object localization. Third, we design a cross-modal learning mechanism in SGPred that uses scene graphs as bridges to improve the consistency between cross-modal embeddings for OV object classification. Experiments on COCO and LVIS demonstrate the effectiveness of our approach. Moreover, we show that our model can perform OV scene graph detection, a task that previous OV scene graph generation methods cannot tackle.
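To make the idea of classifying objects "in an open category set" concrete: many OV detectors score a region's visual embedding against text embeddings of category names, so new categories can be added at inference time simply by embedding their names. The sketch below shows only that generic scoring scheme (cosine similarity followed by a temperature-scaled softmax). It is an illustrative assumption, not SGDN's method, and it omits the scene-graph bridging described in the abstract. The function names, the temperature of 0.01, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (nu * nv)


def classify_region(region_emb, text_embs, temperature=0.01):
    """Score one region against an arbitrary (open) set of category names.

    Hypothetical helper: region_emb is a visual embedding, text_embs maps
    each category name to its text embedding. Returns the best label and a
    softmax distribution over all supplied categories.
    """
    labels = list(text_embs)
    logits = [cosine(region_emb, text_embs[name]) / temperature for name in labels]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    # Toy, made-up embeddings; a real system would obtain them from a
    # detector's region head and a VL text encoder.
    text_embs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "zebra": [0.0, 0.0, 1.0]}
    best, _ = classify_region([0.1, 0.2, 0.9], text_embs)
    print(best)  # prints: zebra
```

Because the category list is just a dictionary of text embeddings, adding a key such as "giraffe" makes it a candidate label without changing any other part of the scoring.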

research
09/18/2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

Point cloud-based open-vocabulary 3D object detection aims to detect 3D ...
research
07/17/2023

Unified Open-Vocabulary Dense Visual Prediction

In recent years, open-vocabulary (OV) dense visual prediction (such as O...
research
04/03/2021

Mutual Graph Learning for Camouflaged Object Detection

Automatically detecting/segmenting object(s) that blend in with their su...
research
07/31/2017

Scene Graph Generation from Objects, Phrases and Region Captions

Object detection, scene graph generation and region captioning, which ar...
research
09/14/2023

GRID: Scene-Graph-based Instruction-driven Robotic Task Planning

Recent works have shown that Large Language Models (LLMs) can promote gr...
research
04/17/2019

Graph based Dynamic Segmentation of Generic Objects in 3D

We propose a novel 3D segmentation method for RGBD stream data to deal w...
research
10/06/2020

Scene Graph Modification Based on Natural Language Commands

Structured representations like graphs and parse trees play a crucial ro...
