ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

05/24/2023
by   Jingfeng Yao, et al.
0

Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining. However, they have not yet conquered the problem of image matting. We hypothesize that image matting could also be boosted by ViTs and present a new efficient and robust ViT-based matting system, named ViTMatte. Our method utilizes (i) a hybrid attention mechanism combined with a convolution neck to help ViTs achieve an excellent performance-computation trade-off in matting tasks. (ii) Additionally, we introduce the detail capture module, which just consists of simple lightweight convolutions to complement the detailed information required by matting. To the best of our knowledge, ViTMatte is the first work to unleash the potential of ViT on image matting with concise adaptation. It inherits many superior properties from ViT to matting, including various pretraining strategies, concise architecture design, and flexible inference strategies. We evaluate ViTMatte on Composition-1k and Distinctions-646, the most commonly used benchmark for image matting, our method achieves state-of-the-art performance and outperforms prior matting works by a large margin.

READ FULL TEXT

page 1

page 4

page 6

page 8

page 11

research
09/16/2022

ConvFormer: Closing the Gap Between CNN and Vision Transformers

Vision transformers have shown excellent performance in computer vision ...
research
07/21/2022

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

Vision transformer (ViT) recently has drawn great attention in computer ...
research
03/31/2023

LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers

Vision Transformers have been incredibly effective when tackling compute...
research
04/13/2021

Co-Scale Conv-Attentional Image Transformers

In this paper, we present Co-scale conv-attentional image Transformers (...
research
10/19/2022

A Unified View of Masked Image Modeling

Masked image modeling has demonstrated great potential to eliminate the ...
research
10/20/2022

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Click-based interactive image segmentation aims at extracting objects wi...
research
12/15/2022

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Vision transformers (ViTs) have achieved impressive results on various c...

Please sign up or login with your details

Forgot password? Click here to reset