Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning

11/11/2015
by   Pingbo Pan, et al.
0

Recently, deep learning approach, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure with deep ConvNets for video representation becomes a fundamental problem for video content analysis. In this paper, we propose a new approach, namely Hierarchical Recurrent Neural Encoder (HRNE), to exploit temporal information of videos. Compared to recent video representation inference approaches, this paper makes the following three contributions. First, our HRNE is able to efficiently exploit video temporal structure in a longer range by reducing the length of input information flow, and compositing multiple consecutive inputs at a higher level. Second, computation operations are significantly lessened while attaining more non-linearity. Third, HRNE is able to uncover temporal transitions between frame chunks with different granularities, i.e., it can model the temporal transitions between frames as well as the transitions between segments. We apply the new method to video captioning where temporal information plays a crucial role. Experiments demonstrate that our method outperforms the state-of-the-art on video captioning benchmarks. Notably, even using a single network with only RGB stream as input, HRNE beats all the recent systems which combine multiple inputs, such as RGB ConvNet plus 3D ConvNet.

READ FULL TEXT
research
11/28/2016

Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

Despite the recent success of neural networks in image feature learning,...
research
06/05/2015

Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

Recent studies have demonstrated the power of recurrent neural networks ...
research
11/28/2016

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

The use of Recurrent Neural Networks for video captioning has recently g...
research
04/28/2019

Hierarchical Recurrent Neural Network for Video Summarization

Exploiting the temporal dependency among video frames or subshots is ver...
research
02/22/2022

Exploiting long-term temporal dynamics for video captioning

Automatically describing videos with natural language is a fundamental c...
research
08/13/2018

Fast Video Shot Transition Localization with Deep Structured Models

Detection of video shot transition is a crucial pre-processing step in v...
research
06/02/2016

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

Visual storytelling aims to generate human-level narrative language (i.e...

Please sign up or login with your details

Forgot password? Click here to reset