Unsupervised Motion Representation Enhanced Network for Action Recognition

03/05/2021
by Xiaohang Yang, et al.

Learning reliable motion representations between consecutive frames, such as optical flow, has proven highly beneficial for video understanding. However, the TV-L1 method, an effective optical flow solver, is time-consuming, and caching the extracted optical flow is expensive in storage. To fill this gap, we propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator. UF-TSN estimates motion cues from adjacent frames in a coarse-to-fine manner. At each level it focuses on small displacements by extracting a feature pyramid and warping the features of one frame toward the other according to the flow estimated at the previous level. Because action datasets lack labeled motion, we constrain the flow prediction with multi-scale photometric consistency and edge-aware smoothness. Compared with state-of-the-art unsupervised motion representation learning methods, our model achieves better accuracy while maintaining efficiency, and it is competitive with some supervised or more complicated approaches.
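The two unsupervised constraints named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the Charbonnier penalty, the exponential edge weighting, the `alpha` and `smooth_weight` values, the bilinear backward warp, and all function names here are assumptions chosen for illustration. The code works on grayscale frames and only evaluates the loss values; a real model would compute the same quantities in a differentiable framework.

```python
import numpy as np

def warp(image, flow):
    """Backward-warp a grayscale image (H, W) by a flow field (H, W, 2), bilinearly."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions in the source image, clamped to the image border.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

def charbonnier(x, eps=1e-3):
    """Smooth approximation of |x| (an assumed choice of robust penalty)."""
    return np.sqrt(x * x + eps * eps)

def photometric_loss(frame1, frame2, flow):
    """Penalize differences between frame1 and frame2 warped back by the flow."""
    return charbonnier(frame1 - warp(frame2, flow)).mean()

def edge_aware_smoothness(frame, flow, alpha=10.0):
    """Penalize flow gradients, down-weighted where the image itself has edges."""
    img_dx = np.abs(np.diff(frame, axis=1))
    img_dy = np.abs(np.diff(frame, axis=0))
    flow_dx = np.abs(np.diff(flow, axis=1)).sum(axis=-1)
    flow_dy = np.abs(np.diff(flow, axis=0)).sum(axis=-1)
    return ((flow_dx * np.exp(-alpha * img_dx)).mean()
            + (flow_dy * np.exp(-alpha * img_dy)).mean())

def downsample(a):
    """Halve spatial resolution with 2x2 average pooling."""
    h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
    a = a[:h, :w]
    return 0.25 * (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2])

def multiscale_loss(frame1, frame2, flows, smooth_weight=0.1):
    """Sum both terms over pyramid levels; flows[0] is the finest level."""
    total = 0.0
    for flow in flows:
        total += photometric_loss(frame1, frame2, flow)
        total += smooth_weight * edge_aware_smoothness(frame1, flow)
        # Move to the next, coarser level of the image pyramid.
        frame1, frame2 = downsample(frame1), downsample(frame2)
    return total
```

In this sketch, a flow that correctly explains the motion between two frames yields a lower photometric loss than a zero flow, and a spatially constant flow has zero smoothness penalty. Those two properties are what allow the flow estimator to be trained without ground-truth motion labels.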

