Interactive Image Segmentation with Cross-Modality Vision Transformers

07/05/2023
by Kun Li et al.

Interactive image segmentation aims to separate a target object from the background under manual guidance, taking multimodal data such as images, clicks, scribbles, and bounding boxes as input. Recently, vision transformers have achieved great success in several downstream visual tasks, and a few efforts have been made to bring this powerful architecture to the interactive segmentation task. However, previous works neglect the relations between the two modalities and directly mimic the way purely visual information is processed with self-attention. In this paper, we propose a simple yet effective network for click-based interactive segmentation with cross-modality vision transformers. Cross-modality transformers exploit mutual information to better guide the learning process. Experiments on several benchmarks show that the proposed method achieves superior performance compared with previous state-of-the-art models. The stability of our method in terms of avoiding failure cases shows its potential as a practical annotation tool. The code and pretrained models will be released at https://github.com/lik1996/iCMFormer.
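The abstract does not spell out the architecture, so the following is only a rough illustration of what "cross-modality" attention means, in contrast to plain self-attention over visual tokens. It is a minimal pure-Python sketch that assumes image patch tokens act as queries while click-encoding tokens act as keys and values, followed by a residual add. The names `cross_attention` and `cross_modality_block` are hypothetical, the learned query/key/value projections are omitted (identity) for brevity, and this is not iCMFormer's actual implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where the queries come from one
    modality (image tokens) and keys/values from another (click tokens).
    Assumption: identity projections instead of learned W_q, W_k, W_v."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Attention weights of this image token over all click tokens.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Weighted sum of click-token values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def cross_modality_block(image_tokens: List[Vector],
                         click_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical block: image tokens attend to click tokens,
    and the attended features are added back residually."""
    attended = cross_attention(image_tokens, click_tokens, click_tokens)
    return [[x + a for x, a in zip(row, att)]
            for row, att in zip(image_tokens, attended)]


if __name__ == "__main__":
    print(cross_modality_block([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
    # prints [[2.0, 0.0], [1.0, 1.0]]
```

With a single click token every image token receives the same update; with several clicks, each image token weights the clicks by similarity, which is the kind of image-to-click interaction that self-attention over visual tokens alone does not model explicitly.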


