M-VAD Names: a Dataset for Video Captioning with Naming

03/04/2019
by   Stefano Pini, et al.

Current movie captioning architectures cannot mention characters by their proper names and instead replace them with a generic "someone" tag. The lack of movie description datasets with visual annotations of characters plays a relevant role in this limitation. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, named M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures that replace the "someone" tags with proper character names in existing video captions. The evaluation is further extended by testing this application on videos outside the M-VAD Names dataset.

research
09/19/2018

MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description

Learning visual feature representations for video analysis is a daunting...
research
05/20/2023

Movie101: A New Movie Understanding Benchmark

To help the visually impaired enjoy movies, automatic movie narrating sy...
research
07/21/2020

MovieNet: A Holistic Dataset for Movie Understanding

Recent years have seen remarkable advances in visual understanding. Howe...
research
03/21/2022

Audio visual character profiles for detecting background characters in entertainment media

An essential goal of computational media intelligence is to support unde...
research
07/29/2020

Enriching Video Captions With Contextual Text

Understanding video content and generating caption with context is an im...
research
10/20/2022

MBTI Personality Prediction for Fictional Characters Using Movie Scripts

An NLP model that understands stories should be able to understand the c...
research
09/26/2016

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

Learning a joint language-visual embedding has a number of very appealin...
