End-to-End Learning of Motion Representation for Video Understanding

04/02/2018
by Lijie Fan, et al.

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to form an end-to-end architecture. This makes our method more efficient than current multi-stage approaches, because it avoids the need to pre-compute and store features on disk. Finally, the parameters of TVNet can be further fine-tuned by end-to-end training, which enables TVNet to learn richer, task-specific patterns beyond exact optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach. TVNet achieves higher accuracy than all compared methods, while being competitive with the fastest counterpart in terms of feature extraction time.
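The abstract's central idea, treating each TV-L1 iteration as one network layer, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single scale and no image warping (the data term is linearized around zero flow), and it uses the usual duality-based TV-L1 iteration: a pointwise thresholding step, a primal update, and a dual update. The function names (`forward_grad`, `divergence`, `dual_step`, `tvl1_unrolled`) are hypothetical, and the values of `tau`, `lam`, `theta` and the 50-layer depth are illustrative assumptions rather than settings taken from the paper. The fixed difference operators used here are the kind of computation that an end-to-end variant could replace with learnable convolution filters; that mapping is an assumption of this sketch.

```python
import numpy as np


def forward_grad(f):
    """Forward differences; the last column/row gets zero gradient (Neumann boundary)."""
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, :-1] = f[:, 1:] - f[:, :-1]
    gy[:-1, :] = f[1:, :] - f[:-1, :]
    return gx, gy


def divergence(px, py):
    """Backward-difference divergence, the negative adjoint of forward_grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy


def dual_step(p_x, p_y, u, tau, theta):
    """Projected dual update for one flow component (hypothetical helper)."""
    gx, gy = forward_grad(u)
    scale = tau / theta
    norm = 1.0 + scale * np.sqrt(gx ** 2 + gy ** 2)
    return (p_x + scale * gx) / norm, (p_y + scale * gy) / norm


def tvl1_unrolled(i0, i1, n_layers=50, tau=0.25, lam=0.15, theta=0.3, eps=1e-12):
    """Hypothetical single-scale TV-L1 solver; each loop pass acts as one 'layer'."""
    i0 = np.asarray(i0, dtype=np.float64)
    i1 = np.asarray(i1, dtype=np.float64)
    iy, ix = np.gradient(i1)                      # central-difference image gradient
    grad_sq = np.maximum(ix ** 2 + iy ** 2, eps)  # floor avoids division by zero
    rho_c = i1 - i0                               # data residual at u = 0 (no warping)

    u1 = np.zeros_like(i0)
    u2 = np.zeros_like(i0)
    p11, p12, p21, p22 = (np.zeros_like(i0) for _ in range(4))

    for _ in range(n_layers):
        # 1) Pointwise thresholding of the linearized L1 data term.
        rho = rho_c + ix * u1 + iy * u2
        lt = lam * theta * grad_sq
        step = np.where(rho < -lt, lam * theta,
                        np.where(rho > lt, -lam * theta, -rho / grad_sq))
        v1 = u1 + step * ix
        v2 = u2 + step * iy
        # 2) Primal update: u = v + theta * div(p) smooths v in the TV sense.
        u1 = v1 + theta * divergence(p11, p12)
        u2 = v2 + theta * divergence(p21, p22)
        # 3) Dual update for each flow component.
        p11, p12 = dual_step(p11, p12, u1, tau, theta)
        p21, p22 = dual_step(p21, p22, u2, tau, theta)
    return u1, u2


# Usage: a Gaussian blob shifted one pixel to the right between frames.
yy, xx = np.mgrid[0:48, 0:48]
frame0 = np.exp(-((xx - 24) ** 2 + (yy - 24) ** 2) / (2 * 6.0 ** 2))
frame1 = np.exp(-((xx - 25) ** 2 + (yy - 24) ** 2) / (2 * 6.0 ** 2))
u1, u2 = tvl1_unrolled(frame0, frame1)
print("mean horizontal flow:", u1.mean())
```

With two identical frames the sketch returns an all-zero flow field, and for the shifted blob the horizontal component is positive, matching the direction of motion. Raising `n_layers` corresponds to unrolling more iterations, that is, a deeper network.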



research
03/05/2021

Unsupervised Motion Representation Enhanced Network for Action Recognition

Learning reliable motion representation between consecutive frames, such...
research
09/20/2017

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow

This paper proposes an end-to-end trainable network, SegFlow, for simult...
research
10/02/2018

Representation Flow for Action Recognition

In this paper, we propose a convolutional layer inspired by optical flow...
research
07/26/2018

Conditional Prior Networks for Optical Flow

Classical computation of optical flow involves generic priors (regulariz...
research
09/01/2021

An End-to-End learnable Flow Regularized Model for Brain Tumor Segmentation

Many segmentation tasks for biomedical images can be modeled as the mini...
research
06/13/2019

Hallucinating Bag-of-Words and Fisher Vector IDT terms for CNN-based Action Recognition

In this paper, we revive the use of old-fashioned handcrafted video repr...
research
01/29/2019

Visual Rhythm Prediction with Feature-Aligning Network

In this paper, we propose a data-driven visual rhythm prediction method,...
