Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

03/10/2022
by   Yang Jiao, et al.
0

Recently, one-stage visual grounders attract high attention due to the comparable accuracy but significantly higher efficiency than two-stage grounders. However, inter-object relation modeling has not been well studied for one-stage grounders. Inter-object relationship modeling, though important, is not necessarily performed among all the objects within the image, as only a part of them are related to the text query and may confuse the model. We call these objects "suspected objects". However, exploring relationships among these suspected objects in the one-stage visual grounding paradigm is non-trivial due to two core problems: (1) no object proposals are available as the basis on which to select suspected objects and perform relationship modeling; (2) compared with those irrelevant to the text query, suspected objects are more confusing, as they may share similar semantics, be entangled with certain relationships, etc, and thereby more easily mislead the model's prediction. To address the above issues, this paper proposes a Suspected Object Graph (SOG) approach to encourage the correct referred object selection among the suspected ones in the one-stage visual grounding. Suspected objects are dynamically selected from a learned activation map as nodes to adapt to the current discrimination ability of the model during training. Afterward, on top of the suspected objects, a Keyword-aware Node Representation module (KNR) and an Exploration by Random Connection strategy (ERC) are concurrently proposed within the SOG to help the model rethink its initial prediction. Extensive ablation studies and comparison with state-of-the-art approaches on prevalent visual grounding benchmarks demonstrate the effectiveness of our proposed method.

READ FULL TEXT

page 2

page 7

page 11

research
05/12/2021

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching

The prevailing framework for matching multimodal inputs is based on a tw...
research
09/03/2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

The prevailing framework for solving referring expression grounding is b...
research
11/03/2022

Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing

This paper presents a framework for jointly grounding objects that follo...
research
01/09/2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding ...
research
08/03/2020

Improving One-stage Visual Grounding by Recursive Sub-query Construction

We improve one-stage visual grounding by addressing current limitations ...
research
04/13/2022

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

3D visual grounding aims to locate the referred target object in 3D poin...
research
12/10/2021

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

Building embodied intelligent agents that can interact with 3D indoor en...

Please sign up or login with your details

Forgot password? Click here to reset