DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency

06/06/2023
by Yike Yuan, et al.

In this paper, we propose DenseDINO, a simple yet effective transformer framework for self-supervised learning of dense visual representations. To exploit the spatial information that dense prediction tasks require but that existing self-supervised transformers neglect, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO adds extra input tokens, called reference tokens, to match point-level features with the position prior. With the reference tokens, the model can maintain spatial consistency and handle complex multi-object scenes, and thus generalizes better to dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large-margin gain (+7.2) under the linear probing protocol for segmentation.
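
The abstract does not say how points are chosen or how reference tokens are built, so the following is only a minimal sketch of the geometric idea behind point-level consistency across views: the same image point has different relative positions in two crops. All names here (`overlap`, `to_view_coords`, `sample_reference_points`) are hypothetical, crops are assumed to be axis-aligned boxes `(x0, y0, x1, y1)`, and flips and other augmentations are ignored.

```python
import random

def overlap(box_a, box_b):
    """Return the intersection of two (x0, y0, x1, y1) boxes, or None if empty."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def to_view_coords(point, box):
    """Map an image-space point to normalized coordinates inside a crop."""
    x, y = point
    x0, y0, x1, y1 = box
    return ((x - x0) / (x1 - x0), (y - y0) / (y1 - y0))

def sample_reference_points(box_a, box_b, n, rng):
    """Sample n image points in the overlap and express each in both views.

    Assumption: points are drawn uniformly from the overlap region; the
    source does not specify the sampling scheme.
    """
    inter = overlap(box_a, box_b)
    if inter is None:
        return []
    pairs = []
    for _ in range(n):
        p = (rng.uniform(inter[0], inter[2]), rng.uniform(inter[1], inter[3]))
        pairs.append((to_view_coords(p, box_a), to_view_coords(p, box_b)))
    return pairs

# Two overlapping crops of the same image (hypothetical values).
view_a = (0.0, 0.0, 160.0, 160.0)
view_b = (80.0, 40.0, 224.0, 200.0)
pairs = sample_reference_points(view_a, view_b, 4, random.Random(0))
```

Each pair gives the position of one shared point in each view. In a setup like the one the abstract describes, such positions could serve as the position prior for a reference token in each view, with the two resulting point-level features encouraged to agree.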
