Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion

09/04/2023
by Ryota Yoshihashi, et al.

Although recent advances in diffusion models have enabled high-fidelity and diverse image generation, training discriminative models still depends largely on massive collections of real images and their manual annotation. Here, we present a training method for semantic segmentation that relies on neither real images nor manual annotation. The proposed method, attn2mask, uses images generated by a text-to-image diffusion model together with the model's internal text-to-image cross-attention, which serves as supervisory pseudo-masks. Since the text-to-image generator is trained with image-caption pairs but without pixel-wise labels, attn2mask can be regarded overall as a weakly supervised segmentation method. Experiments show that attn2mask achieves promising results on PASCAL VOC despite using no real training data for segmentation at all, and that it is also useful for scaling segmentation up to a scenario with more classes, namely ImageNet segmentation. It also adapts to new domains through LoRA-based fine-tuning, which enables transfer to a distant domain, namely Cityscapes.
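To make the idea of cross-attention as a pseudo-mask concrete, the following is a minimal sketch, not the authors' procedure. It assumes that one 2D attention map per class word has already been extracted from the generator, and it uses min-max normalization followed by a fixed threshold, both of which are illustrative assumptions rather than details given in the abstract. The function names and the threshold value of 0.5 are hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]


def normalize(attn: Grid) -> Grid:
    """Min-max normalize a 2D attention map to the range [0, 1]."""
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no localization signal.
        return [[0.0 for _ in row] for row in attn]
    return [[(v - lo) / span for v in row] for row in attn]


def attention_to_pseudo_mask(
    attn_by_class: Dict[int, Grid], threshold: float = 0.5
) -> List[List[int]]:
    """Label each pixel with the class whose normalized attention is highest.

    Pixels where no class exceeds `threshold` are labeled 0 (background).
    The threshold rule is an assumption made for this illustration.
    """
    normed = {c: normalize(a) for c, a in attn_by_class.items()}
    first = next(iter(normed.values()))
    height, width = len(first), len(first[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_class, best_score = 0, threshold
            for c, a in normed.items():
                if a[y][x] > best_score:
                    best_class, best_score = c, a[y][x]
            mask[y][x] = best_class
    return mask


# Synthetic 2x2 attention maps for two class words (not real model output).
toy_attention = {
    1: [[0.1, 0.9], [0.2, 0.8]],
    2: [[0.7, 0.1], [0.1, 0.1]],
}
print(attention_to_pseudo_mask(toy_attention))  # [[2, 1], [0, 1]]
```

In this toy example the bottom-left pixel falls to background because neither class word attends to it strongly. Masks built this way could then stand in for manual labels when training a segmentation model on the generated images.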

