Motion-Augmented Self-Training for Video Recognition at Smaller Scale

05/04/2021
by Kirill Gavrilyuk, et al.

The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion than appearance, we strive to train our network using optical flow, but avoid its computation during inference. We propose the first motion-augmented self-training regime, which we call MotionFit. We start with supervised training of a motion model on a small, labeled video collection. With the motion model we generate pseudo-labels for a large unlabeled video collection, which enables us to transfer knowledge by learning to predict these pseudo-labels with an appearance model. Moreover, we introduce a multi-clip loss as a simple yet efficient way to improve the quality of the pseudo-labeling, even without additional auxiliary tasks. We also take into consideration the temporal granularity of videos during self-training of the appearance model, which was missed in previous works. As a result, we obtain a strong motion-augmented representation model suited for video downstream tasks like action recognition and clip retrieval. On small-scale video datasets, MotionFit outperforms alternatives for knowledge transfer by 5 learning by 9
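
The pseudo-labeling step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that the motion model emits class logits per unlabeled video, that pseudo-labels are hard argmax labels, and that the appearance model is trained with plain cross-entropy against them. The abstract specifies none of these choices. All logits and names below (`motion_logits`, `appearance_logits`, `pseudo_label`) are hypothetical, and the multi-clip loss is not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits):
    """Return (class index, confidence) of the teacher's top prediction.

    Assumption: hard argmax labels; the source does not say how
    pseudo-labels are derived from the motion model.
    """
    probs = softmax(teacher_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def cross_entropy(student_logits, target):
    """Cross-entropy of the student's prediction against a hard label."""
    return -math.log(softmax(student_logits)[target])


# Hypothetical motion-model (optical flow) logits for three unlabeled videos.
motion_logits = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5], [-0.5, 0.2, 1.5]]
# Hypothetical appearance-model (RGB) logits for the same three videos.
appearance_logits = [[1.0, 0.5, 0.0], [0.2, 0.1, 0.3], [0.0, 0.0, 2.0]]

# Step 1: the motion model labels the unlabeled videos.
labels = [pseudo_label(logits)[0] for logits in motion_logits]

# Step 2: the appearance model is scored against those pseudo-labels;
# in real training this loss would be minimized by gradient descent.
loss = sum(
    cross_entropy(student, target)
    for student, target in zip(appearance_logits, labels)
) / len(labels)

print(labels, loss)
```

Because only the appearance model is trained on the pseudo-labels, optical flow is needed when the labels are generated but not at inference time, which matches the motivation given in the abstract.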

research
04/01/2021

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

We present a multiview pseudo-labeling approach to video learning, a nov...
research
11/27/2017

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

Learning to represent and generate videos from unlabeled data is a very ...
research
12/17/2021

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

Semi-supervised action recognition is a challenging but important task d...
research
12/01/2019

Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition

Static image action recognition, which aims to recognize action based on...
research
08/03/2020

Memory-augmented Dense Predictive Coding for Video Representation Learning

The objective of this paper is self-supervised learning from video, in p...
research
08/03/2020

Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition

Human action recognition is regarded as a key cornerstone in domains suc...
research
03/31/2023

Procedure-Aware Pretraining for Instructional Video Understanding

Our goal is to learn a video representation that is useful for downstrea...
