Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

11/28/2016
by Linchao Zhu, et al.

Despite the recent success of neural networks in image feature learning, a major problem in the video domain is the lack of sufficient labeled data for learning to model temporal information. In this paper, we propose an unsupervised temporal modeling method that learns from untrimmed videos. The speed of motion varies constantly, e.g., a man may run quickly or slowly. We therefore train a Multirate Visual Recurrent Model (MVRM) by encoding the frames of a clip sampled at different intervals. This learning process makes the learned model more robust to variation in motion speed. Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context and reconstruct the two temporal transitions, i.e., the present→past transition and the present→future transition, which reflect the temporal information from different views. The proposed method exploits the two transitions simultaneously through a bidirectional reconstruction that consists of a backward reconstruction and a forward reconstruction. We apply the proposed method to two challenging video tasks, i.e., complex event detection and video captioning, in which it achieves state-of-the-art performance. Notably, our method generates the best single feature for event detection, with a relative improvement of 10.4%, and achieves the best performance in video captioning across all evaluation metrics on the YouTube2Text dataset.
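
The sampling scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_clip`, `multirate_clips`, `temporal_context`), the centering of clips on a frame index, the clamping at video boundaries, and the `offset` parameter are all assumptions made for illustration. The sketch only shows how one clip could be sampled at several frame intervals and how its past and future neighboring clips could be selected as reconstruction targets.

```python
from typing import Dict, List, Tuple


def sample_clip(center: int, num_frames: int, interval: int, video_len: int) -> List[int]:
    """Return frame indices for a clip around `center`, `interval` frames apart.

    Hypothetical helper: indices that fall outside the video are clamped
    to the valid range [0, video_len - 1].
    """
    start = center - (num_frames // 2) * interval
    indices = [start + i * interval for i in range(num_frames)]
    return [min(max(idx, 0), video_len - 1) for idx in indices]


def multirate_clips(center: int, num_frames: int, intervals: List[int],
                    video_len: int) -> Dict[int, List[int]]:
    """Sample the same clip position at several frame intervals (rates)."""
    return {r: sample_clip(center, num_frames, r, video_len) for r in intervals}


def temporal_context(center: int, offset: int, num_frames: int, interval: int,
                     video_len: int) -> Tuple[List[int], List[int]]:
    """Return (past, future) neighboring clips, `offset` frames from `center`.

    In a bidirectional setup, the past clip would be the target of the
    backward reconstruction and the future clip the target of the forward one.
    """
    past = sample_clip(center - offset, num_frames, interval, video_len)
    future = sample_clip(center + offset, num_frames, interval, video_len)
    return past, future


if __name__ == "__main__":
    # One clip position, encoded at three assumed intervals.
    for rate, frames in multirate_clips(10, 5, [1, 2, 4], 100).items():
        print(rate, frames)
    # Past and future context clips for a clip centered at frame 50.
    print(temporal_context(50, 10, 5, 1, 100))
```

A larger interval covers a longer time span with the same number of frames, which is one simple way to expose a recurrent encoder to the same motion at different apparent speeds.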
