A Fast and Accurate One-Stage Approach to Visual Grounding

08/18/2019
by   Zhengyuan Yang, et al.
1

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. The performances of existing propose-and-rank two-stage methods are capped by the quality of the region candidates they propose in the first stage --- if none of the candidates could cover the ground truth region, there is no hope in the second stage to rank the right region to the top. To avoid this caveat, we propose a one-stage model that enables end-to-end joint optimization. The main idea is as straightforward as fusing a text query's embedding into the YOLOv3 object detector, augmented by spatial features so as to account for spatial mentions in the query. Despite being simple, this one-stage approach shows great potential in terms of both accuracy and speed for both phrase localization and referring expression comprehension, according to our experiments. Given these results along with careful investigations into some popular region proposals, we advocate for visual grounding a paradigm shift from the conventional two-stage methods to the one-stage framework.

READ FULL TEXT

page 1

page 7

page 8

research
12/09/2018

Real-Time Referring Expression Comprehension by Single-Stage Grounding Network

In this paper, we propose a novel end-to-end model, namely Single-Stage ...
research
09/11/2020

AttnGrounder: Talking to Cars with Attention

We propose Attention Grounder (AttnGrounder), a single-stage end-to-end ...
research
08/03/2020

Improving One-stage Visual Grounding by Recursive Sub-query Construction

We improve one-stage visual grounding by addressing current limitations ...
research
10/20/2020

Shortlisting Rules and Incentives in an End-to-End Model for Participatory Budgeting

We introduce an end-to-end model of participatory budgeting grounded in ...
research
08/11/2022

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to ...
research
09/03/2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

The prevailing framework for solving referring expression grounding is b...
research
03/19/2020

Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding

We propose a new spatial memory module and a spatial reasoner for the Vi...

Please sign up or login with your details

Forgot password? Click here to reset