VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

03/05/2023
by   Kang Chen, et al.

The ideal form of Visual Question Answering (VQA) requires understanding, grounding, and reasoning in the joint space of vision and language, and serves as a proxy for the AI task of scene understanding. However, most existing VQA benchmarks only require picking an answer from a pre-defined set of options and pay little attention to text. We present a new challenge with a dataset that contains 23,781 questions based on 10,124 image-text pairs. Specifically, the task requires the model to align multimedia representations of the same entity in order to perform multi-hop reasoning between image and text, and finally to answer the question in natural language. The aim of this challenge is to develop and benchmark models capable of multimedia entity alignment, multi-step reasoning, and open-ended answer generation.


