Memory-augmented Dense Predictive Coding for Video Representation Learning

08/03/2020
by Tengda Han, et al.

The objective of this paper is self-supervised learning from video, in particular learning representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for this task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, which allows multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representations on four downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance relative to other approaches, while using orders of magnitude less training data.
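To illustrate the convex-combination idea in contribution (i), here is a minimal Python sketch. It assumes the attention weights are obtained by a softmax over one score per memory slot. The function names `softmax` and `predict_from_memory`, the toy memory values, and the omission of the learned layer that would map the context representation to those scores are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: returns non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_memory(logits: Sequence[float],
                        memory: Sequence[Sequence[float]]) -> List[float]:
    """Return sum_k p_k * memory[k], where p = softmax(logits).

    Assumption: `logits` holds one score per memory slot and would, in a real
    model, be produced from the context representation by a learned layer
    (omitted here). Because p is non-negative and sums to 1, the prediction
    is a convex combination of the memory slots.
    """
    if len(logits) != len(memory):
        raise ValueError("need exactly one logit per memory slot")
    weights = softmax(logits)
    dim = len(memory[0])
    prediction = [0.0] * dim
    for w, slot in zip(weights, memory):
        for j in range(dim):
            prediction[j] += w * slot[j]
    return prediction


# Usage: three hypothetical 2-D memory slots and hypothetical slot scores.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pred = predict_from_memory([2.0, 0.5, -1.0], memory)
print([round(x, 3) for x in pred])
```

Since every prediction stays inside the convex hull of the memory slots, spreading weight over several slots expresses several plausible futures at once, while putting nearly all weight on one slot commits to a single hypothesis.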

