DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

03/21/2023
by Weijia Wu, et al.

Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be generated freely with a generative model (e.g., DALL-E, Stable Diffusion). In this paper, we show that it is possible to automatically obtain accurate semantic masks for synthetic images generated by the off-the-shelf Stable Diffusion model, which uses only text-image pairs during training. Our approach, called DiffuMask, exploits the cross-attention map between text and image, which makes it natural and seamless to extend text-driven image synthesis to semantic mask generation. DiffuMask uses text-guided cross-attention information to localize class/word-specific regions, which are combined with practical techniques to create a novel high-resolution and class-discriminative pixel-wise mask. The method substantially reduces data collection and annotation costs. Experiments demonstrate that existing segmentation methods trained on DiffuMask's synthetic data can achieve performance competitive with their counterparts trained on real data (VOC 2012, Cityscapes). For some classes (e.g., bird), DiffuMask presents promising performance, close to the state-of-the-art result of real data (within 3% mIoU gap). Moreover, in the open-vocabulary segmentation (zero-shot) setting, DiffuMask achieves a new SOTA result on the Unseen classes of VOC 2012. The project website can be found at https://weijiawu.github.io/DiffusionMask/.
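
The general idea of turning cross-attention maps into a mask can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-word attention maps have already been extracted from the diffusion model as 2D arrays at several resolutions, and it swaps the paper's "practical techniques" for plain averaging, min-max normalization, and a fixed threshold. The function names `upsample_nearest` and `attention_to_mask`, the output size, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def upsample_nearest(attn, size):
    """Resize a square 2D map to (size, size) by nearest-neighbour lookup."""
    h, w = attn.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return attn[np.ix_(rows, cols)]


def attention_to_mask(attn_maps, size=64, threshold=0.5):
    """Fuse per-word attention maps of different resolutions into a binary mask.

    attn_maps: list of 2D arrays, one per attention layer (assumed input).
    threshold: fixed cut-off on the normalized map (an illustrative assumption).
    """
    # Bring every map to a common resolution, then average them.
    resized = [upsample_nearest(np.asarray(a, dtype=np.float64), size)
               for a in attn_maps]
    mean_map = np.mean(resized, axis=0)

    # Min-max normalize to [0, 1]; the epsilon guards against a constant map.
    lo, hi = mean_map.min(), mean_map.max()
    norm = (mean_map - lo) / (hi - lo + 1e-8)

    # Pixels with high attention for the class word are marked as foreground.
    return norm >= threshold
```

A fixed threshold is the simplest possible binarization and tends to produce coarse boundaries, which is presumably why the abstract mentions additional techniques for obtaining high-resolution, class-discriminative masks.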

research
08/11/2023

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Current deep networks are very data-hungry and benefit from training on ...
research
09/04/2023

Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion

Although recent advancements in diffusion models enabled high-fidelity a...
research
08/13/2023

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

Despite the rapid advancement of unsupervised learning in visual represe...
research
07/01/2023

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

The Segment Anything Model (SAM) is a recently proposed prompt-based seg...
research
10/25/2022

Synthetic Data Supervised Salient Object Detection

Although deep salient object detection (SOD) has achieved remarkable pro...
research
10/10/2022

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

Large-scale diffusion neural networks represent a substantial milestone ...
research
04/19/2022

Dual-Domain Image Synthesis using Segmentation-Guided GAN

We introduce a segmentation-guided approach to synthesise images that in...
