Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

06/08/2021
by Mingjie Sun, et al.

In this paper, we tackle the weakly-supervised referring expression grounding task: localizing a referent object in an image according to a query sentence when the mapping between image regions and queries is not available during training. Traditional methods pick out the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, with the reconstruction difference serving as the loss for back-propagation. These existing methods, however, conduct both the matching and the reconstruction only approximately, because they ignore the fact that the matching correctness is unknown. To overcome this limitation, we design a discriminative triad as the basis of our solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the model three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite its simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy when evaluated on RefCOCO (39.21 4.17
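The traditional match-then-reconstruct pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' triad-level method or any published implementation. It assumes that regions and the query are already embedded as vectors of the same length, uses cosine similarity for matching, and replaces the sentence decoder with a simple squared-error comparison. The function names `cosine`, `softmax`, and `match_and_reconstruct` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_and_reconstruct(region_feats, query_feat):
    """Return (index of best-matching region, reconstruction loss).

    Assumption: no region-level labels are used. The only training
    signal is how well the attended region feature reproduces the
    query embedding (a stand-in for decoding the query sentence).
    """
    # Matching: score every candidate region against the query.
    scores = [cosine(r, query_feat) for r in region_feats]
    weights = softmax(scores)

    # Soft selection: attention-weighted mix of region features.
    dim = len(query_feat)
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]

    # Reconstruction loss: mean squared error against the query.
    loss = sum((a - q) ** 2 for a, q in zip(attended, query_feat)) / dim

    # At inference time, the highest-scoring region is the prediction.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, loss


regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [0.0, 1.0]
best_index, recon_loss = match_and_reconstruct(regions, query)
```

Because the loss is computed from a selection whose correctness is never checked, a wrong match can still yield a low loss, which is the weakness the abstract attributes to existing methods.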


