Video Representation Learning by Recognizing Temporal Transformations

07/21/2020
by Simon Jenni, et al.

We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics. Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require accurately distinguishing the motion of objects. We promote accurate learning of motion without human annotation by training a neural network to discriminate a video sequence from its temporally transformed versions. To learn to distinguish non-trivial motions, the design of the transformations is based on two principles: 1) to define clusters of motions based on time warps of different magnitudes; 2) to ensure that the discrimination is feasible only by observing and analyzing as many image frames as possible. Thus, we introduce the following transformations: forward-backward playback, random frame skipping, and uniform frame skipping. Our experiments show that networks trained with the proposed method yield representations with improved transfer performance for action recognition on UCF101 and HMDB51.
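
To make the three transformations named in the abstract concrete, the following is a minimal sketch of how frame indices for each could be sampled and paired with a class label for the discrimination task. It is an illustration under stated assumptions, not the authors' implementation: the function names, the skip values, the four-way label scheme, and the choice of a single random turning point for forward-backward playback are all hypothetical.

```python
import random


def uniform_skip(num_frames, clip_len, skip, rng):
    """Uniform frame skipping: indices with a constant stride `skip`."""
    span = (clip_len - 1) * skip + 1  # source frames covered by the clip
    if span > num_frames:
        raise ValueError("video too short for this clip length and skip")
    start = rng.randint(0, num_frames - span)
    return [start + i * skip for i in range(clip_len)]


def random_skip(num_frames, clip_len, max_skip, rng):
    """Random frame skipping: each step advances by a random 1..max_skip."""
    steps = [rng.randint(1, max_skip) for _ in range(clip_len - 1)]
    span = sum(steps) + 1
    if span > num_frames:
        raise ValueError("video too short for the sampled skips")
    idx = [rng.randint(0, num_frames - span)]
    for step in steps:
        idx.append(idx[-1] + step)
    return idx


def forward_backward(num_frames, clip_len, rng):
    """Forward-backward playback: play forward, then reverse once.

    Assumption: one turning point, chosen at random inside the clip.
    """
    if clip_len < 3 or clip_len > num_frames:
        raise ValueError("need 3 <= clip_len <= num_frames")
    pivot = rng.randint(1, clip_len - 2)        # position where playback reverses
    back_steps = clip_len - 1 - pivot           # frames played in reverse
    lowest_start = max(0, back_steps - pivot)   # keeps reversed indices >= 0
    start = rng.randint(lowest_start, num_frames - 1 - pivot)
    forward = [start + i for i in range(pivot + 1)]
    backward = [forward[-1] - i for i in range(1, back_steps + 1)]
    return forward + backward


def make_example(num_frames, clip_len, rng):
    """Pick a transformation at random; return (frame indices, class label).

    Hypothetical labels: 0 original, 1 uniform skip, 2 random skip,
    3 forward-backward.
    """
    label = rng.randrange(4)
    if label == 0:
        idx = uniform_skip(num_frames, clip_len, 1, rng)
    elif label == 1:
        idx = uniform_skip(num_frames, clip_len, 4, rng)
    elif label == 2:
        idx = random_skip(num_frames, clip_len, 4, rng)
    else:
        idx = forward_backward(num_frames, clip_len, rng)
    return idx, label
```

A network would then be trained to predict `label` from the frames gathered at `idx`. In this sketch, a single pair of frames is rarely enough to tell the classes apart (random skipping needs several strides to differ from a uniform one), which loosely mirrors the second design principle of requiring many frames to be observed.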


research
12/07/2021

Time-Equivariant Contrastive Video Representation Learning

We introduce a novel self-supervised contrastive learning method to lear...
research
02/20/2021

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Self-supervised tasks have been utilized to build useful representations...
research
05/08/2015

Learning image representations tied to ego-motion

Understanding how images of objects and scenes behave in response to spe...
research
11/28/2018

Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations

To alleviate the expensive cost of data collection and annotation, many ...
research
05/04/2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

Recognizing transformation types applied to a video clip (RecogTrans) is...
research
03/20/2023

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

We propose a self-supervised method for learning motion-focused video re...
research
04/20/2021

MGSampler: An Explainable Sampling Strategy for Video Action Recognition

Frame sampling is a fundamental problem in video action recognition due ...
