Deconfounded Visual Grounding

12/31/2021
by   Jianqiang Huang, et al.

We focus on the confounding bias between language and location in the visual grounding pipeline, which we find to be the major bottleneck for visual reasoning. For example, the grounding process often reduces to a trivial language-location association without visual reasoning, e.g., grounding any language query containing "sheep" to nearly central regions, because most queries about sheep have ground-truth locations at the image center. First, we frame the visual grounding pipeline as a causal graph, which shows the causal relations among the image, the query, the target location, and the underlying confounder. The causal graph tells us how to break the grounding bottleneck: deconfounded visual grounding. Second, to tackle the challenge that the confounder is generally unobserved, we propose a confounder-agnostic approach called Referring Expression Deconfounder (RED) to remove the confounding bias. Third, we implement RED as a simple language attention, which can be applied to any grounding method. On popular benchmarks, RED improves various state-of-the-art grounding methods by a significant margin. Code will soon be available at: https://github.com/JianqiangH/Deconfounded_VG.
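To make the notion of confounding bias concrete, the following is a minimal toy sketch and not the RED method itself. It assumes a single binary confounder that is fully observed (the abstract stresses that the real confounder is generally unobserved), and all probabilities and names (`P_G`, `P_X_GIVEN_G`, `P_CENTER_GIVEN_X1_G`) are made up for illustration. It contrasts the observational quantity P(center | query contains "sheep") with the deconfounded quantity P(center | do(query contains "sheep")) obtained by standard backdoor adjustment.

```python
# Toy illustration of confounding bias and backdoor adjustment.
# Assumption: one binary confounder G that we can observe; every number
# below is hypothetical and chosen only to make the effect visible.

# Prior over the confounder G.
P_G = {0: 0.5, 1: 0.5}

# P(X = 1 | G): probability that the query contains "sheep" given G.
P_X_GIVEN_G = {0: 0.1, 1: 0.9}

# P(L = center | X = 1, G): probability that the target is at the image
# center, given a "sheep" query and the confounder value.
P_CENTER_GIVEN_X1_G = {0: 0.2, 1: 0.8}


def observational_center_prob():
    """P(L=center | X=1) = sum_g P(L=center | X=1, g) * P(g | X=1)."""
    p_x1 = sum(P_X_GIVEN_G[g] * P_G[g] for g in P_G)
    return sum(
        P_CENTER_GIVEN_X1_G[g] * (P_X_GIVEN_G[g] * P_G[g] / p_x1)
        for g in P_G
    )


def interventional_center_prob():
    """P(L=center | do(X=1)) = sum_g P(L=center | X=1, g) * P(g)."""
    return sum(P_CENTER_GIVEN_X1_G[g] * P_G[g] for g in P_G)


observational = observational_center_prob()
interventional = interventional_center_prob()
print(f"P(center | sheep)     = {observational:.2f}")
print(f"P(center | do(sheep)) = {interventional:.2f}")
```

In this toy setting the observational estimate is inflated because the confounder makes "sheep" queries and central targets co-occur, whereas the adjusted estimate weights each confounder value by its prior rather than by how often it accompanies the query.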


