Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

08/13/2023
by David Junhao Zhang, et al.

Despite rapid advances in unsupervised visual representation learning, training still requires large-scale datasets that demand costly data collection and raise data-privacy concerns. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for image recognition. Although promising, unsupervised learning on diffusion-generated images remains underexplored. To address this, we first uncover that the cross-attention layers of diffusion models inherently provide annotation-free attention masks that align generated images with their corresponding text inputs. We then investigate the problems of three prevalent unsupervised learning techniques (i.e., contrastive learning, masked modeling, and vision-language pretraining) and introduce customized solutions that fully exploit these free attention masks. Extensive experiments validate our approach, showing consistent improvements over baseline models across various downstream tasks, including image classification, detection, segmentation, and image-text retrieval. With our method, it is possible to close the performance gap between unsupervised pretraining on synthetic data and on real-world data.
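The core idea of the abstract, extracting an annotation-free mask for a text token from cross-attention weights, can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: it assumes cross-attention arrays of shape (heads, pixels, text_tokens) have already been captured during generation, averages them over heads and layers, selects the column for the token of interest, and normalizes it into a spatial mask.

```python
import numpy as np

def free_attention_mask(attn_maps, token_idx, spatial_size):
    """Build an annotation-free spatial mask for one text token.

    attn_maps: list of cross-attention arrays, each (heads, H*W, n_tokens),
               as captured from a diffusion model's cross-attention layers
               (names and shapes here are illustrative assumptions).
    token_idx: index of the text token to extract a mask for.
    spatial_size: (H, W) of the attention resolution.
    """
    # Average attention over heads within each layer, then over layers.
    avg = np.mean([a.mean(axis=0) for a in attn_maps], axis=0)  # (H*W, n_tokens)
    mask = avg[:, token_idx]                                    # (H*W,)
    # Min-max normalize to [0, 1] so the mask can weight pixels or patches.
    mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
    return mask.reshape(spatial_size)

# Toy example: 2 layers, 8 heads, a 16x16 attention grid, 5 text tokens.
rng = np.random.default_rng(0)
maps = [rng.random((8, 256, 5)) for _ in range(2)]
m = free_attention_mask(maps, token_idx=2, spatial_size=(16, 16))
print(m.shape)  # (16, 16)
```

In practice such masks could then weight positive pairs in contrastive learning, guide which patches to mask in masked modeling, or supervise region-text alignment, which is the direction the abstract describes.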


Related research

06/01/2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
We investigate the potential of learning visual representations using sy...

03/21/2023
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
Collecting and annotating images with pixel-wise labels is time-consumin...

03/22/2023
CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
Contrastive Language-Image Pre-training, benefiting from large-scale unl...

09/23/2018
Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition without using an...

09/27/2022
FreeSeg: Free Mask from Interpretable Contrastive Language-Image Pretraining for Semantic Segmentation
Fully supervised semantic segmentation learns from dense masks, which re...

01/23/2023
RainDiffusion: When Unsupervised Learning Meets Diffusion Models for Real-world Image Deraining
What will happen when unsupervised learning meets diffusion models for r...

06/23/2023
DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology
We present DiffInfinite, a hierarchical diffusion model that generates a...
