StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

06/01/2023
by Yonglong Tian, et al.

We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We specifically consider Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match or beat the real-image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. With solely synthetic images, the representations learned by StableRep surpass the performance of representations learned by SimCLR and CLIP using the same set of text prompts and corresponding real images, on large-scale datasets. When we further add language supervision, StableRep trained with 20M synthetic images achieves better accuracy than CLIP trained with 50M real images.
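The multi-positive idea in point (2) can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal NumPy version of one common way to write such a loss, assuming each image already has an embedding vector and an integer id for the prompt that generated it. The function name `multi_positive_contrastive_loss`, the `temperature` value, and the toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Cross-entropy between a uniform target over same-prompt images
    and the softmax over pairwise similarities (illustrative sketch only)."""
    z = np.asarray(embeddings, dtype=np.float64)
    ids = np.asarray(prompt_ids)

    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]

    # Pairwise similarity logits; an image is never compared with itself.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)

    # Numerically stable log-softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other images generated from the same prompt.
    positives = (ids[:, None] == ids[None, :]) & ~self_mask
    counts = positives.sum(axis=1)
    if np.any(counts == 0):
        raise ValueError("every image needs at least one same-prompt positive")

    # Target distribution is uniform over each anchor's positives.
    target = positives / counts[:, None]

    # Mask before multiplying to avoid 0 * (-inf) on the diagonal.
    per_anchor = -(target * np.where(positives, log_q, 0.0)).sum(axis=1)
    return float(per_anchor.mean())


# Toy batch: two prompts, two images each (hypothetical embeddings).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_grouped = multi_positive_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = multi_positive_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(loss_grouped, loss_mismatched)
```

In this toy batch, the loss is lower when the prompt ids agree with the clusters in embedding space than when they are mismatched, which is the behavior a multi-positive objective is meant to reward.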


research
06/01/2023

Diffusion Self-Guidance for Controllable Image Generation

Large-scale generative models are capable of producing high-quality imag...
research
08/13/2023

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

Despite the rapid advancement of unsupervised learning in visual represe...
research
02/18/2023

Closed-Loop Transcription via Convolutional Sparse Coding

Autoencoding has achieved great empirical success as a framework for lea...
research
07/07/2022

Back to the Basics: Revisiting Out-of-Distribution Detection Baselines

We study simple methods for out-of-distribution (OOD) image detection th...
research
07/01/2022

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

Existing text recognition methods usually need large-scale training data...
research
11/29/2022

Procedural Image Programs for Representation Learning

Learning image representations using synthetic data allows training neur...
research
08/21/2018

Text-to-image Synthesis via Symmetrical Distillation Networks

Text-to-image synthesis aims to automatically generate images according ...
