VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation

06/11/2023
by Xu Zhang, et al.

Integrating diverse visual prompts such as clicks, scribbles, and boxes into interactive image segmentation could make user interaction both easier and more efficient. Most existing studies focus on a single type of visual prompt and simply concatenate prompts and images as input for segmentation prediction, which yields an inefficient prompt representation and weak interaction between prompt and image. This paper proposes a simple yet effective Visual Prompt Unified Transformer (VPUFormer), which introduces a concise unified prompt representation with deeper interaction to boost segmentation performance. Specifically, we design a Prompt-unified Encoder (PuE) that uses Gaussian mapping to generate a unified one-dimensional vector for click, box, and scribble prompts; this vector captures users' intentions well and provides a denser representation of user prompts. In addition, we present a Prompt-to-Pixel Contrastive Loss (P2CL) that leverages user feedback to gradually refine candidate semantic features, aiming to pull image semantic features toward the features that are similar to the user prompt while pushing away those that are dissimilar to it, thereby correcting results that deviate from expectations. On this basis, our approach injects prompt representations as queries into Dual-cross Merging Attention (DMA) blocks to perform a deeper interaction between image and query inputs. Extensive experiments on seven challenging datasets demonstrate that the proposed VPUFormer with PuE, DMA, and P2CL achieves consistent improvements, yielding state-of-the-art segmentation performance. Our code will be made publicly available at https://github.com/XuZhang1211/VPUFormer.
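
To make the pull/push idea behind a prompt-to-pixel contrastive objective concrete, the following is a minimal NumPy sketch of a generic InfoNCE-style loss. It is an illustration under stated assumptions, not the paper's P2CL formulation: the function name, the `temperature` parameter, the cosine-similarity scoring, and the boolean `positive_mask` (marking pixels that should agree with the prompt) are all hypothetical choices made for this example.

```python
import numpy as np


def prompt_to_pixel_contrastive_loss(pixel_feats, prompt_feat, positive_mask,
                                     temperature=0.1):
    """Generic InfoNCE-style sketch (an assumption, not the paper's exact P2CL).

    pixel_feats:   (N, D) array of per-pixel semantic features.
    prompt_feat:   (D,) feature vector representing the user prompt.
    positive_mask: (N,) boolean array, True where a pixel should match the prompt.
    """
    # L2-normalise so that dot products are cosine similarities.
    pixels = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    prompt = prompt_feat / np.linalg.norm(prompt_feat)

    # Similarity of every pixel to the prompt, scaled by the temperature.
    logits = pixels @ prompt / temperature  # shape (N,)

    # Numerically stable log-softmax over all pixels.
    shifted = logits - logits.max()
    log_prob = shifted - np.log(np.exp(shifted).sum())

    # Minimising this raises the similarity of positive pixels to the prompt
    # relative to all other pixels, which pushes the dissimilar ones away.
    return -log_prob[positive_mask].mean()


# Toy data: two pixels close to the prompt direction, two far from it.
prompt = np.array([1.0, 0.0])
feats = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]])
mask = np.array([True, True, False, False])

loss_consistent = prompt_to_pixel_contrastive_loss(feats, prompt, mask)
loss_inconsistent = prompt_to_pixel_contrastive_loss(feats, prompt, ~mask)
```

In this toy setup the loss is lower when the pixels marked as positive are the ones whose features already point along the prompt, and higher when the labels are flipped, which is the behaviour a contrastive correction signal of this kind relies on.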


