Learning Discriminative Motion Features Through Detection

12/11/2018
by   Gedas Bertasius, et al.
0

Despite huge success in the image domain, modern detection models such as Faster R-CNN have not been used nearly as much for video analysis. This is arguably due to the fact that detection models are designed to operate on single frames and as a result do not have a mechanism for learning motion representations directly from video. We propose a learning procedure that allows detection models such as Faster R-CNN to learn motion features directly from the RGB video data while being optimized with respect to a pose estimation task. Given a pair of video frames---Frame A and Frame B---we force our model to predict human pose in Frame A using the features from Frame B. We do so by leveraging deformable convolutions across space and time. Our network learns to spatially sample features from Frame B in order to maximize pose detection accuracy in Frame A. This naturally encourages our network to learn motion offsets encoding the spatial correspondences between the two frames. We refer to these motion offsets as DiMoFs (Discriminative Motion Features). In our experiments we show that our training scheme helps learn effective motion cues, which can be used to estimate and localize salient human motion. Furthermore, we demonstrate that as a byproduct, our model also learns features that lead to improved pose detection in still-images, and better keypoint tracking. Finally, we show how to leverage our learned model for the tasks of spatiotemporal action localization and fine-grained action recognition.

READ FULL TEXT
research
09/03/2021

Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

Human pose is a useful feature for fine-grained sports action understand...
research
04/26/2018

Deep Keyframe Detection in Human Action Videos

Detecting representative frames in videos based on human actions is quit...
research
12/04/2016

Online Localization and Prediction of Actions and Interactions

This paper proposes a person-centric and online approach to the challeng...
research
10/21/2014

Compositional Structure Learning for Action Understanding

The focus of the action understanding literature has predominately been ...
research
04/05/2019

Paying More Attention to Motion: Attention Distillation for Learning Video Representations

We address the challenging problem of learning motion representations us...
research
08/03/2017

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Understanding human activity and being able to explain it in detail surp...
research
06/24/2021

Generalized One-Class Learning Using Pairs of Complementary Classifiers

One-class learning is the classic problem of fitting a model to the data...

Please sign up or login with your details

Forgot password? Click here to reset