MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

12/20/2021
by Revanth Gangi Reddy, et al.

Recently, there has been increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to picking the answer from a pre-defined set of options. In addition, images in the real world, especially in news, contain objects that are co-referential with the text, and the two modalities carry complementary information. In this paper, we present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text. Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question. In addition, we introduce a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, to automatically augment data that can provide weak supervision for this task. We evaluate both pipeline-based and end-to-end pretraining-based multimedia QA models on our benchmark and show that they achieve promising performance but lag considerably behind human performance, leaving substantial room for future work on this challenging new task.


