VQA Therapy: Exploring Answer Differences by Visually Grounding Answers

08/21/2023
by Chongyan Chen, et al.

Visual question answering is the task of predicting the answer to a question about an image. Since different people can provide different answers to the same visual question, we aim to better understand why by using answer groundings. We introduce the first dataset that visually grounds each unique answer to each visual question, which we call VQAAnswerTherapy. We then propose two novel problems: predicting whether a visual question has a single answer grounding, and localizing all answer groundings. We benchmark modern algorithms on these problems to show where they succeed and where they struggle. The dataset and evaluation server are publicly available at https://vizwiz.org/tasks-and-datasets/vqa-answer-therapy/.
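The abstract does not say how "a single answer grounding" is decided. The sketch below is a minimal illustration under stated assumptions, not the authors' definition. It assumes each unique answer's grounding is a binary segmentation mask. It also hypothetically treats groundings as the same region when every pair's intersection over union (IoU) meets a threshold. The names `mask_iou` and `has_single_grounding` and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two same-shaped masks (nonzero = inside)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat them as identical
    return float(np.logical_and(a, b).sum() / union)


def has_single_grounding(masks, threshold=0.9):
    """Hypothetical rule: True if all answer groundings pairwise overlap with IoU >= threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if mask_iou(masks[i], masks[j]) < threshold:
                return False
    return True


# Toy 4x4 masks: two answers grounded to the left half vs. the right half.
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
right = np.zeros((4, 4), dtype=bool)
right[:, 2:] = True

print(has_single_grounding([left, left.copy()]))  # True: identical groundings
print(has_single_grounding([left, right]))        # False: disjoint groundings
```

A pairwise check is used here so that the rule applies to any number of answers. Whether the real benchmark uses IoU, a different overlap measure, or human judgment is not stated in the abstract.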

Grounding Answers for Visual Questions Asked by Visually Impaired People (02/04/2022)
Visual question answering is the task of answering questions about image...

Why Does a Visual Question Have Different Answers? (08/12/2019)
Visual question answering is the task of returning the answer to a quest...

Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs (06/28/2021)
With the expressed goal of improving system transparency and visual grou...

Visual Question: Predicting If a Crowd Will Agree on the Answer (08/29/2016)
Visual question answering (VQA) systems are emerging from a desire to em...

Measuring Faithful and Plausible Visual Grounding in VQA (05/24/2023)
Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) sys...

Sentence Attention Blocks for Answer Grounding (09/20/2023)
Answer grounding is the task of locating relevant visual evidence for th...