MAQA: A Multimodal QA Benchmark for Negation

01/09/2023
by Judith Yue Li, et al.

Multimodal learning can benefit from the representation power of pretrained Large Language Models (LLMs). However, state-of-the-art transformer-based LLMs often ignore negation in natural language, and no existing benchmark quantitatively evaluates whether multimodal transformers inherit this weakness. In this study, we present a new multimodal question answering (QA) benchmark adapted from labeled music videos in AudioSet (Gemmeke et al., 2017), with the goal of systematically evaluating whether multimodal transformers can perform complex reasoning to recognize new concepts as negations of previously learned concepts. We show that with a standard fine-tuning approach, multimodal transformers remain incapable of correctly interpreting negation, irrespective of model size. However, our experiments demonstrate that augmenting the original training task distribution with negated QA examples allows the model to reliably reason with negation. To this end, we describe a novel data generation procedure that prompts the 540B-parameter PaLM model to automatically generate negated QA examples as compositions of easily accessible video tags. The generated examples contain more natural linguistic patterns, and the gains over a template-based task augmentation approach are significant.
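The abstract contrasts two ways of composing negated QA examples from video tags: a template-based baseline and LLM prompting. A minimal sketch of that idea is below; the templates, function names, and tag strings are illustrative assumptions, not the paper's actual implementation, and the prompt shown is only the kind of input one might send to a model such as PaLM, not its API.

```python
# Hypothetical sketch: composing negated QA examples from video tags.
# Templates and names are illustrative, not from the MAQA paper.

def template_negated_qa(present_tags, absent_tag):
    """Template-based baseline: build a rigid negated yes/no QA pair.

    present_tags: tags labeled on the video (e.g. from AudioSet).
    absent_tag:   the concept being negated in the question.
    """
    question = f"Is this a video of {present_tags[0]} that is not {absent_tag}?"
    # Answer is "yes" exactly when the negated concept is absent from the video.
    answer = "no" if absent_tag in present_tags else "yes"
    return question, answer


def build_llm_prompt(present_tags, absent_tag):
    """Build a prompt asking an LLM to phrase the negation naturally,
    instead of filling a fixed template."""
    tag_list = ", ".join(present_tags)
    return (
        "Write a natural-sounding yes/no question about a music video "
        f"that contains {tag_list} but does NOT contain {absent_tag}.\n"
        "Question:"
    )


q, a = template_negated_qa(["acoustic guitar", "singing"], "drums")
print(q)  # rigid template output
print(a)
print(build_llm_prompt(["acoustic guitar", "singing"], "drums"))
```

The template output is grammatical but stilted, which motivates the paper's finding that LLM-generated negated examples, with more natural linguistic patterns, yield larger gains than template-based augmentation.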


