Identity-Aware Multi-Sentence Video Description

08/22/2020
by Jae Sung Park, et al.

Standard video and movie description tasks abstract away from person identities and thus fail to link identities across sentences. We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires re-identifying persons locally within a set of consecutive clips. We introduce an auxiliary task, Fill-in the Identity, which aims to predict persons' IDs consistently within a set of clips when the video descriptions are given. Our proposed approach to this task leverages a Transformer architecture that allows for coherent joint prediction of multiple IDs. Key components include a gender-aware textual representation and an additional gender prediction objective in the main model. This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description: we first generate multi-sentence video descriptions, and then apply our Fill-in the Identity model to establish links between the predicted person entities. To tackle both tasks, we augment the Large Scale Movie Description Challenge (LSMDC) benchmark with new annotations suited to our problem statement. Experiments show that our Fill-in the Identity model outperforms several baselines and recent works, and allows us to generate descriptions with locally re-identified people.
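The output side of the Fill-in the Identity task can be pictured as assigning a local ID to every anonymized person mention across a set of consecutive clip descriptions. The sketch below is not the authors' Transformer model; it only illustrates the input/output format under assumed conventions. The `[PERSON]` placeholder, the integer IDs, the `P1`/`P2` labels, and the helper names `fill_identities` and `canonicalize` are all hypothetical. Because IDs are only meaningful locally within one set of clips, two predictions that differ only by a relabeling describe the same linking, so `canonicalize` renumbers IDs in order of first appearance before they are compared or filled in.

```python
from typing import Dict, List

# Hypothetical placeholder token standing in for an anonymized person mention.
BLANK = "[PERSON]"


def canonicalize(ids: List[List[int]]) -> List[List[int]]:
    """Relabel local IDs in order of first appearance across the clip set.

    Predictions that differ only by a permutation of labels map to the same result.
    """
    mapping: Dict[int, int] = {}
    result: List[List[int]] = []
    for clip_ids in ids:
        row = []
        for pid in clip_ids:
            if pid not in mapping:
                mapping[pid] = len(mapping) + 1  # next unused local label
            row.append(mapping[pid])
        result.append(row)
    return result


def fill_identities(sentences: List[str], predicted_ids: List[List[int]]) -> List[str]:
    """Replace each BLANK in sentence i with the IDs predicted for it, left to right."""
    filled = []
    for sentence, ids in zip(sentences, predicted_ids):
        parts = sentence.split(BLANK)
        if len(parts) - 1 != len(ids):
            raise ValueError("number of IDs must match number of blanks")
        out = parts[0]
        for pid, rest in zip(ids, parts[1:]):
            out += f"P{pid}" + rest
        filled.append(out)
    return filled


if __name__ == "__main__":
    sentences = ["[PERSON] opens the door.", "[PERSON] hugs [PERSON]."]
    raw_prediction = [[7], [3, 7]]  # hypothetical model output: arbitrary local labels
    print(fill_identities(sentences, canonicalize(raw_prediction)))
    # ['P1 opens the door.', 'P2 hugs P1.']
```

Here the person who opens the door in the first clip is linked to the person being hugged in the second, which is the kind of cross-sentence link that identity-agnostic descriptions cannot express.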
