Hierarchical Contrastive Motion Learning for Video Action Recognition

07/20/2020
by   Xitong Yang, et al.
4

One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames. Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network. This hierarchical design bridges the semantic gap between low-level motion cues and high-level recognition tasks, and promotes the fusion of appearance and motion information at multiple levels. At each level, an explicit motion self-supervision is provided via contrastive learning to enforce the motion features at the current level to predict the future ones at the previous level. Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and evolve more discriminative for action recognition. Our motion learning module is lightweight and flexible to be embedded into various backbone networks. Extensive experiments on four benchmarks show that the proposed approach consistently achieves superior results.

READ FULL TEXT
research
10/12/2020

MS^2L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition

In this paper, we address self-supervised representation learning from h...
research
03/19/2018

Featureless: Bypassing feature extraction in action categorization

This method introduces an efficient manner of learning action categories...
research
04/20/2023

Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis

How can we teach a computer to recognize 10,000 different actions? Deep ...
research
12/21/2022

MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

Learning effective motion features is an essential pursuit of video repr...
research
06/05/2022

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

The success of deep learning comes from its ability to capture the hiera...
research
04/03/2023

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

Current state-of-the-art approaches for few-shot action recognition achi...
research
01/01/2023

Hierarchical Explanations for Video Action Recognition

We propose Hierarchical ProtoPNet: an interpretable network that explain...

Please sign up or login with your details

Forgot password? Click here to reset