Masked Feature Prediction for Self-Supervised Visual Pre-Training

12/16/2021
by Chen Wei, et al.

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. Our approach first randomly masks out a portion of the input sequence and then predicts the feature of the masked regions. We study five different types of features and find Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. We observe that the local contrast normalization in HOG is essential for good results, which is in line with earlier work using HOG for visual recognition. Our approach can learn abundant visual knowledge and drive large-scale Transformer-based models. Without using extra model weights or supervision, MaskFeat pre-trained on unlabeled videos achieves unprecedented results of 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700, 38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat further generalizes to image input, which can be interpreted as a video with a single frame, and obtains competitive results on ImageNet.
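
The mask-then-predict recipe described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it works on a single grayscale image stored as nested lists, and the 8x8 cell size, 9 orientation bins, 40% mask ratio, per-cell L2 normalization (used here as a simple stand-in for HOG's local contrast normalization), and all function names are assumptions chosen for illustration. A real model would predict the targets from the visible patches; an all-zero prediction stands in for it here.

```python
import math
import random


def hog_cell(image, y0, x0, cell=8, bins=9, eps=1e-6):
    """Orientation histogram for one cell, L2-normalized.

    The per-cell L2 normalization is an assumed, simplified stand-in
    for HOG's local contrast normalization.
    """
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Central differences, with indices clamped at the borders.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            # Unsigned orientation in [0, pi), voted by gradient magnitude.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist)) + eps
    return [v / norm for v in hist]


def hog_targets(image, cell=8, bins=9):
    """One HOG vector per non-overlapping cell, in row-major order."""
    h, w = len(image), len(image[0])
    return [hog_cell(image, y, x, cell, bins)
            for y in range(0, h - cell + 1, cell)
            for x in range(0, w - cell + 1, cell)]


def random_mask(num_patches, ratio, rng):
    """Sorted indices of the patches hidden from the model."""
    k = max(1, round(num_patches * ratio))
    return sorted(rng.sample(range(num_patches), k))


def masked_l2_loss(pred, target, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


rng = random.Random(0)
# Toy 16x16 image with a vertical edge: dark left half, bright right half.
image = [[0.0 if x < 8 else 1.0 for x in range(16)] for _ in range(16)]
targets = hog_targets(image)                   # 4 cells, 9 bins each
masked = random_mask(len(targets), 0.4, rng)   # hide 40% of the cells
# Placeholder prediction; a trained model would output these vectors.
pred = [[0.0] * 9 for _ in targets]
loss = masked_l2_loss(pred, targets, masked)
```

Restricting the loss to masked patches is the design choice that matters in this sketch: the visible patches are given as input, so scoring them would reward copying rather than prediction.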

research
04/27/2022

Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

Transformer-based models attain excellent results and generalize well wh...
research
08/19/2023

Scalable Video Object Segmentation with Simplified Framework

The current popular methods for video object segmentation (VOS) implemen...
research
09/02/2023

Self-Supervised Video Transformers for Isolated Sign Language Recognition

This paper presents an in-depth analysis of various self-supervision met...
research
06/17/2021

An Evaluation of Self-Supervised Pre-Training for Skin-Lesion Analysis

Self-supervised pre-training appears as an advantageous alternative to s...
research
06/15/2022

Masked Frequency Modeling for Self-Supervised Visual Pre-Training

We present Masked Frequency Modeling (MFM), a unified frequency-domain-b...
research
09/20/2023

Weak Supervision for Label Efficient Visual Bug Detection

As video games evolve into expansive, detailed worlds, visual quality be...