Predicting masked tokens in stochastic locations improves masked image modeling

07/31/2023
by Amir Bar, et al.

Self-supervised learning is a promising paradigm in deep learning that enables learning from unlabeled data by constructing pretext tasks that require learning useful representations. In natural language processing, the dominant pretext task has been masked language modeling (MLM), while in computer vision there exists an equivalent called masked image modeling (MIM). However, MIM is challenging because it requires predicting semantic content in accurate locations. E.g., given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose FlexPredict, a stochastic model that addresses this challenge by incorporating location uncertainty into the model. Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties. Our approach improves downstream performance on a range of tasks, e.g., compared to MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and by 2.5% for semi-supervised video segmentation using ViT-L.
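
To make the idea of "stochastic masked token positions" concrete, the following is a minimal sketch and not the FlexPredict formulation, which the abstract does not specify. It assumes fixed sine/cosine positional embeddings and assumes the simplest possible noise model: zero-mean Gaussian noise added to the positions of masked tokens only. The function names, the `sigma` parameter, and the embedding dimension are all hypothetical.

```python
import math
import random


def sincos_position(index, dim):
    """Fixed 1-D sine/cosine positional embedding for one token index."""
    return [
        math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(index / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def stochastic_mask_positions(num_tokens, masked, dim=8, sigma=0.25, seed=0):
    """Return position embeddings with Gaussian noise on masked tokens only.

    Hypothetical helper (an assumption, not the paper's method): visible
    tokens keep their exact positions, while each masked token's position
    embedding is perturbed element-wise by N(0, sigma^2) noise.
    """
    rng = random.Random(seed)
    masked = set(masked)
    positions = []
    for idx in range(num_tokens):
        emb = sincos_position(idx, dim)
        if idx in masked:
            emb = [v + rng.gauss(0.0, sigma) for v in emb]
        positions.append(emb)
    return positions


if __name__ == "__main__":
    noisy = stochastic_mask_positions(num_tokens=6, masked=[2, 4])
    clean = [sincos_position(i, 8) for i in range(6)]
    for i, (p, c) in enumerate(zip(noisy, clean)):
        # Distance is zero for visible tokens and positive for masked ones.
        print(f"token {i}: position shift = {math.dist(p, c):.3f}")
```

In this sketch, a predictor that receives the perturbed positions cannot rely on knowing exactly where a masked token sits, which is the intuition the abstract gives for learning features that are robust to location uncertainty.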

