Focusing On Targets For Improving Weakly Supervised Visual Grounding

02/22/2023
by Viet-Quoc Pham, et al.

Weakly supervised visual grounding aims to predict the region in an image that corresponds to a specific linguistic query, where the mapping between the target object and query is unknown in the training stage. The state-of-the-art method uses a vision language pre-training model to acquire heatmaps from Grad-CAM, which matches every query word with an image region, and uses the combined heatmap to rank the region proposals. In this paper, we propose two simple but efficient methods for improving this approach. First, we propose a target-aware cropping approach to encourage the model to learn both object and scene level semantic representations. Second, we apply dependency parsing to extract words related to the target object, and then put emphasis on these words in the heatmap combination. Our method surpasses the previous SOTA methods on RefCOCO, RefCOCO+, and RefCOCOg by a notable margin.
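
The heatmap combination and proposal ranking described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the per-word heatmaps are plain nested lists standing in for Grad-CAM output, the target-word flags are supplied by hand rather than produced by a dependency parser, the `emphasis` factor of 2.0 is arbitrary, and scoring a proposal by the mean heat inside its box is one plausible choice among several. The function names (`word_weights`, `combine_heatmaps`, `box_score`, `rank_proposals`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]      # rows x cols of non-negative activations
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in heatmap cells, end-exclusive


def word_weights(is_target_word: Sequence[bool], emphasis: float = 2.0) -> List[float]:
    """Give target-related words a larger weight (the emphasis value is an assumption)."""
    return [emphasis if flag else 1.0 for flag in is_target_word]


def combine_heatmaps(heatmaps: Sequence[Heatmap], weights: Sequence[float]) -> Heatmap:
    """Weighted average of per-word heatmaps; all heatmaps must share one shape."""
    if not heatmaps or len(heatmaps) != len(weights):
        raise ValueError("need exactly one weight per heatmap")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for heatmap, weight in zip(heatmaps, weights):
        for r in range(rows):
            for c in range(cols):
                combined[r][c] += (weight / total) * heatmap[r][c]
    return combined


def box_score(heatmap: Heatmap, box: Box) -> float:
    """Mean heat inside the box (an assumed scoring rule, not the paper's)."""
    x0, y0, x1, y1 = box
    cells = [heatmap[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    return sum(cells) / len(cells) if cells else 0.0


def rank_proposals(heatmap: Heatmap, boxes: Sequence[Box]) -> List[int]:
    """Return proposal indices ordered from highest to lowest score."""
    return sorted(range(len(boxes)), key=lambda i: box_score(heatmap, boxes[i]), reverse=True)


if __name__ == "__main__":
    # Toy query with two words: "dog" (target) and "man" (context).
    dog = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    man = [[0.0, 0.0, 1.5, 1.5], [0.0, 0.0, 1.5, 1.5]]
    boxes = [(0, 0, 2, 2), (2, 0, 4, 2)]  # left half, right half

    uniform = combine_heatmaps([dog, man], [1.0, 1.0])
    emphasized = combine_heatmaps([dog, man], word_weights([True, False]))
    print(rank_proposals(uniform, boxes))
    print(rank_proposals(emphasized, boxes))
```

In this toy case the context word has the stronger activation, so uniform weighting ranks the right-hand box first, while emphasizing the target word moves the left-hand box to the top. That is the kind of correction the target-word emphasis is meant to provide.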

research
09/05/2019

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding

Weakly supervised referring expression grounding (REG) aims at localizin...
research
08/28/2019

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Weakly supervised referring expression grounding aims at localizing the ...
research
06/08/2021

Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

In this paper, we are tackling the weakly-supervised referring expressio...
research
07/18/2023

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

3D visual grounding involves finding a target object in a 3D scene that ...
research
08/03/2022

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

Recently, increasing efforts have been focused on Weakly Supervised Scen...
research
03/16/2023

LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

Humans excel at acquiring knowledge through observation. For example, we...
research
03/19/2020

Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding

We propose a new spatial memory module and a spatial reasoner for the Vi...
