Move Forward and Tell: A Progressive Generator of Video Descriptions

07/26/2018
by Yilei Xiong, et al.

We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous work on video captioning usually focuses on short video clips: it typically treats an entire video as a whole and generates the caption conditioned on a single embedding. In contrast, we consider videos with rich temporal structures and aim to generate paragraph descriptions that preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach that produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences for them in a coherent manner. In particular, clip selection and sentence generation are performed jointly and progressively, driven by a recurrent network: what to describe next depends on what has been said before. The recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrates the capability of generating high-quality paragraph descriptions for videos. Compared to the descriptions produced by other methods, those produced by our method are often more relevant, more coherent, and more concise.
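The abstract names self-critical sequence training (SCST) with sentence-level and paragraph-level rewards but does not spell out the objective. The sketch below is a rough illustration only. It assumes the standard SCST formulation, in which a sampled sequence is scored relative to the model's own greedy-decoded output as a baseline. It also combines the two reward levels through a hypothetical weight `alpha`. The function names `mixed_reward` and `scst_loss`, the averaging of per-sentence scores, and `alpha` itself are assumptions for illustration, not the paper's definitions. Log-probabilities are plain floats here. In a real model they would be differentiable tensors.

```python
from typing import Sequence


def mixed_reward(sentence_scores: Sequence[float], paragraph_score: float,
                 alpha: float = 0.5) -> float:
    """Hypothetical reward: mean per-sentence score plus a weighted paragraph score.

    The averaging and the weight `alpha` are illustrative assumptions,
    not the paper's exact formulation.
    """
    if not sentence_scores:
        raise ValueError("need at least one sentence score")
    sentence_level = sum(sentence_scores) / len(sentence_scores)
    return sentence_level + alpha * paragraph_score


def scst_loss(sampled_log_probs: Sequence[float], sampled_reward: float,
              greedy_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled sequence.

    The greedy-decoded output's reward serves as the baseline. Minimizing this
    loss raises the log-probability of samples that beat the baseline and
    lowers it for samples that do worse.
    """
    advantage = sampled_reward - greedy_reward
    return -advantage * sum(sampled_log_probs)


if __name__ == "__main__":
    # Hypothetical scores for a sampled paragraph and a greedy-decoded one.
    r_sample = mixed_reward([0.4, 0.6], paragraph_score=0.2)
    r_greedy = mixed_reward([0.3, 0.5], paragraph_score=0.1)
    # Hypothetical per-token log-probabilities of the sampled paragraph.
    loss = scst_loss([-1.0, -2.0], r_sample, r_greedy)
    print(f"sample={r_sample:.2f} greedy={r_greedy:.2f} loss={loss:.2f}")
```

Subtracting the greedy baseline rewards the model only for sampled descriptions that outperform its own test-time decoding, which is the usual motivation for SCST. The actual weighting of sentence-level and paragraph-level rewards should be taken from the full paper.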
