Edited Media Understanding: Reasoning About Implications of Manipulated Images

12/08/2020
by Jeff Da, et al.

Multimodal disinformation, from "deepfakes" to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless, such as a filtered vacation photo. The difference between this example and harmful edits that spread disinformation is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems. We present the task of Edited Media Understanding, requiring models to answer open-ended questions that capture the intent and implications of an image edit. We introduce a dataset for our task, EMU, with 48k question-answer pairs written in rich natural language. We evaluate a wide variety of vision-and-language models for our task, and introduce a new model, PELICAN, which builds upon recent progress in pretrained multimodal representations. Our model obtains promising results on our dataset, with humans rating its answers as accurate 40.35% of the time. At the same time, there is still much work to be done: humans prefer human-annotated captions 93.56% of the time, and we provide analysis that highlights areas for further progress.


