Video captioning with recurrent networks based on frame- and video-level features and visual content classification

12/09/2015
by Rakshith Shetty, et al.

In this paper, we describe the system we used to generate textual descriptions of short video clips with recurrent neural networks (RNNs) while participating in the Large Scale Movie Description Challenge 2015 at ICCV 2015. Our work builds on static image captioning systems with RNN-based language models and extends this framework to videos by utilizing both static image features and video-specific features. In addition, we study the usefulness of visual content classifiers as a source of additional information for caption generation. Our experimental results show that using keyframe-based features, dense trajectory video features, and content classifier outputs together gives better performance than using any one of them individually.


