A Dataset for Movie Description

01/12/2015
by Anna Rohrbach, et al.

Descriptive video service (DVS) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset that contains transcribed DVS temporally aligned to full-length HD movies. We also collected the aligned movie scripts that have been used in prior work, and we compare the two sources of descriptions. In total, the Movie Description dataset contains a parallel corpus of over 54,000 sentences and video snippets from 72 HD movies. We characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing DVS to scripts, we find that DVS is far more visual and describes precisely what is shown, rather than what should happen according to scripts written before movie production.
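
To make the idea of a temporally aligned parallel corpus concrete, here is a minimal sketch of how one sentence and video snippet pair could be represented and queried by time. The record layout, the field names, the `descriptions_overlapping` helper, and the sample entries are all assumptions made for illustration; they do not reflect the dataset's actual file format or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedDescription:
    """One sentence paired with the movie interval it describes (hypothetical schema)."""
    movie: str      # movie identifier (illustrative)
    start_s: float  # snippet start, in seconds from the beginning of the movie
    end_s: float    # snippet end, in seconds
    sentence: str   # transcribed DVS sentence or aligned script sentence
    source: str     # "dvs" or "script"


def descriptions_overlapping(corpus, movie, t0, t1):
    """Return entries of `movie` whose interval overlaps (t0, t1), ordered by start time."""
    hits = [
        d for d in corpus
        if d.movie == movie and d.start_s < t1 and d.end_s > t0
    ]
    return sorted(hits, key=lambda d: d.start_s)


# Made-up entries for illustration only; they are not taken from the dataset.
corpus = [
    AlignedDescription("movie_a", 12.0, 15.5, "Someone opens the door.", "dvs"),
    AlignedDescription("movie_a", 14.0, 18.0, "She enters the room.", "script"),
    AlignedDescription("movie_b", 3.0, 6.0, "A car pulls up.", "dvs"),
]

for d in descriptions_overlapping(corpus, "movie_a", 15.0, 16.0):
    print(f"[{d.source}] {d.start_s:.1f}-{d.end_s:.1f}s: {d.sentence}")
```

Keeping the `source` field on each record is one simple way to hold DVS and script sentences for the same movie side by side, which is the kind of comparison the abstract describes.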


