TGIF: A New Dataset and Benchmark on Animated GIF Description

04/10/2016
by Yuncheng Li, et al.

With the recent popularity of animated GIFs on social media, there is a need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high-quality dataset, we developed a series of novel quality controls to validate free-form text input from crowdworkers. We show that there is an unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned from our animated GIF description dataset can be helpful for automatic movie description.
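The abstract does not spell out how the baselines are implemented, but the simplest of the three, nearest neighbor, can be sketched in a few lines: caption a query GIF with the description of the training GIF whose visual features are most similar. The sketch below is an illustration only; the pooled-feature representation, function names, and toy data are assumptions, not the authors' implementation.

```python
import numpy as np

def nearest_neighbor_caption(query_feat, train_feats, train_captions):
    """Caption a query GIF with the caption of its nearest training GIF.

    Similarity is cosine similarity between pooled visual features
    (e.g., CNN frame features averaged over time -- an assumption here,
    not necessarily the representation used in the paper).
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    best = int(np.argmax(t @ q))  # index of the most similar training GIF
    return train_captions[best]

# Toy usage with random stand-in features.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(3, 128))  # 3 training GIFs, 128-d features each
train_captions = [
    "a dog runs across the grass",
    "a man waves at the camera",
    "a cat jumps onto a table",
]
query_feat = rng.normal(size=128)
print(nearest_neighbor_caption(query_feat, train_feats, train_captions))
```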


