Selfie: Self-supervised Pretraining for Image Embedding

06/07/2019
by Trieu H. Trinh, et al.

We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other "distractor" patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches. The pretraining architecture includes a network of convolutional blocks to process patches followed by an attention pooling network to summarize the content of unmasked patches before predicting masked ones. During finetuning, we reuse the convolutional weights found by pretraining. We evaluate our method on three benchmarks (CIFAR-10, ImageNet 32 x 32, and ImageNet 224 x 224) with varying amounts of labeled data, from 5% to 100% of the training set. Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to the standard supervised training of the same network. Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, an improvement of 11.1 points in absolute accuracy. Our pretraining method also improves ResNet-50 training stability, especially in the low-data regime, by significantly lowering the standard deviation of test accuracies across datasets.
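To make the objective concrete, below is a minimal PyTorch sketch of the patch-selection loss the abstract describes: embed the visible patches and the candidate patches with a shared convolutional encoder, attention-pool the visible context, add a location embedding for the masked position, and classify which candidate fills the mask. The class name SelfieSketch, the layer sizes, and the arguments mask_pos and target_idx are illustrative assumptions, not the authors' implementation (the paper reuses ResNet-50 convolutional blocks as the patch processor).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfieSketch(nn.Module):
    """Hedged sketch of a Selfie-style contrastive patch-selection loss."""

    def __init__(self, d_model=128, num_positions=64):
        super().__init__()
        # Toy stand-in for the "network of convolutional blocks" that
        # embeds each image patch (the paper uses ResNet-50 blocks).
        self.patch_net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # "Attention pooling network": a learned query attends over the
        # embeddings of the unmasked patches to summarize the context.
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        # Location embedding tells the model which masked position to fill.
        self.pos_embed = nn.Embedding(num_positions, d_model)

    def forward(self, unmasked, candidates, mask_pos, target_idx):
        # unmasked:   (B, N, 3, P, P) visible patches
        # candidates: (B, K, 3, P, P) the true patch plus K-1 distractors
        B, N = unmasked.shape[:2]
        K = candidates.shape[1]
        u = self.patch_net(unmasked.flatten(0, 1)).view(B, N, -1)
        c = self.patch_net(candidates.flatten(0, 1)).view(B, K, -1)
        # Summarize the visible context into one vector, then add the
        # embedding of the masked location.
        pooled, _ = self.attn(self.query.expand(B, -1, -1), u, u)
        v = pooled.squeeze(1) + self.pos_embed(mask_pos)
        # Score each candidate by dot product with the context vector and
        # classify which one fills the mask (no pixel regression needed).
        logits = torch.einsum('bd,bkd->bk', v, c)
        return F.cross_entropy(logits, target_idx)
```

A usage example under the same assumptions (one masked location per image, three distractors, the correct candidate at index 0):

```python
model = SelfieSketch()
loss = model(
    unmasked=torch.randn(2, 8, 3, 32, 32),    # 8 visible 32x32 patches
    candidates=torch.randn(2, 4, 3, 32, 32),  # 1 target + 3 distractors
    mask_pos=torch.tensor([5, 2]),            # index of each masked location
    target_idx=torch.tensor([0, 0]),          # correct candidate per example
)
loss.backward()
```

Because the distractors come from the same image, the classifier cannot rely on global statistics such as color histograms; it has to model spatial content, which is what makes the learned convolutional weights useful when they are reused for finetuning.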

