What is More Likely to Happen Next? Video-and-Language Future Event Prediction

10/15/2020
by Jie Lei, et al.

Given a video with aligned dialogue, people can often infer what is more likely to happen next. Making such predictions requires not only a deep understanding of the rich dynamics underlying the video and dialogue, but also a significant amount of commonsense knowledge. In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions. To support research in this direction, we collect a new dataset, named Video-and-Language Event Prediction (VLEP), with 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. In order to promote the collection of non-trivial challenging examples, we employ an adversarial human-and-model-in-the-loop data collection procedure. We also present a strong baseline incorporating information from video, dialogue, and commonsense knowledge. Experiments show that each type of information is useful for this challenging task, and that compared to the high human performance on VLEP, our model provides a good starting point but leaves large room for future work. Our dataset and code are available at: https://github.com/jayleicn/VideoLanguageFuturePred


