Video Storytelling

07/25/2018
by Junnan Li, et al.

Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focused on generating a single-sentence description of visual content, recent works have studied paragraph generation. In this work, we introduce the problem of video storytelling, which aims to generate coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address these challenges. First, we propose a context-aware framework for multimodal embedding learning, in which we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from both past and future. Second, we propose a Narrator model to discover the underlying storyline. The Narrator is formulated as a reinforcement learning agent that is trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we collected to enable this study. We compare our method with multiple state-of-the-art baselines and show that it achieves better performance in terms of both quantitative measures and a user study.
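
To make the idea of a residual bidirectional recurrent network concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no equations, so the plain tanh cell, the additive residual connection (output = input + forward state + backward state), the equal input and hidden dimensions, and all names (`rnn_pass`, `residual_birnn`, `init_params`) are assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_pass(seq, w_in, w_rec):
    """Run a plain tanh RNN over seq; return the hidden state at every step."""
    h = [0.0] * len(w_rec)
    states = []
    for x in seq:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def residual_birnn(seq, params):
    """Assumed residual form: each output is the input clip feature plus
    the forward (past) and backward (future) hidden states at that step."""
    fwd = rnn_pass(seq, params["w_in_f"], params["w_rec_f"])
    # Process the reversed sequence, then flip back to align with time.
    bwd = rnn_pass(seq[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    return [
        [x + f + b for x, f, b in zip(xs, fs, bs)]
        for xs, fs, bs in zip(seq, fwd, bwd)
    ]


def init_params(dim, seed=0, scale=0.1):
    """Hypothetical random initialisation; a real model would learn these."""
    rng = random.Random(seed)
    names = ("w_in_f", "w_rec_f", "w_in_b", "w_rec_b")
    return {
        name: [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
        for name in names
    }


# Five hypothetical clip features of dimension 4.
clip_features = [[0.1 * t + 0.01 * i for i in range(4)] for t in range(5)]
context_features = residual_birnn(clip_features, init_params(dim=4))
```

In this sketch, the residual term lets each clip keep its own features while the two recurrent passes add information from earlier and later clips. If all weights are zero, the output reduces to the input sequence.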

research
10/14/2019

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation

In this work, we introduce a new problem, named as story-preserving lon...
research
03/27/2021

Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review

Research in the area of Vision and Language encompasses challenging topi...
research
01/31/2018

Learning Video-Story Composition via Recurrent Neural Network

In this paper, we propose a learning-based method to compose a video-sto...
research
07/29/2021

Video Generation from Text Employing Latent Path Construction for Temporal Modeling

Video generation is one of the most challenging tasks in Machine Learnin...
research
09/13/2021

Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

Have you ever looked at a painting and wondered what is the story behind...
research
05/08/2018

A Memory Network Approach for Story-based Temporal Summarization of 360° Videos

We address the problem of story-based temporal summarization of long 360...
research
10/13/2022

Re3: Generating Longer Stories With Recursive Reprompting and Revision

We consider the problem of automatically generating longer stories of ov...