Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation

09/12/2023
by Yunhao Ge, et al.

We propose a new paradigm to automatically generate training data with accurate labels at scale using text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion). The proposed approach decouples training data generation into foreground object generation and contextually coherent background generation. To generate foreground objects, we employ a straightforward textual template, incorporating the object class name as the input prompt. This is fed into a text-to-image synthesis framework, producing diverse foreground images set against isolated backgrounds. A foreground-background segmentation algorithm is then used to generate foreground object masks. To generate context images, we begin by creating language descriptions of the context by applying an image captioning method to a small set of images representing the desired context. These textual descriptions are then transformed into a diverse array of context images via a text-to-image synthesis framework. Finally, we composite these with the foreground object masks produced in the first step, using a cut-and-paste method, to form the training data. We demonstrate the advantages of our approach on five object detection and segmentation datasets, including Pascal VOC and COCO. We find that detectors trained solely on synthetic data produced by our method achieve performance comparable to those trained on real data (Fig. 1). Moreover, a combination of real and synthetic data yields even better results. Further analysis indicates that the synthetic data distribution complements the real data distribution effectively. Additionally, we highlight the compositional nature of our data generation approach in out-of-distribution and zero-shot data generation scenarios. We open-source our code at https://github.com/gyhandy/Text2Image-for-Detection
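The abstract describes a four-step pipeline: foreground generation from a class-name prompt, mask extraction, context generation via captioning plus text-to-image synthesis, and cut-and-paste composition. The sketch below shows one way to wire these steps together. It is a minimal illustration, not the released implementation: the model checkpoints (`runwayml/stable-diffusion-v1-5`, `Salesforce/blip-image-captioning-base`), the prompt template, and the near-white mask heuristic are assumptions for demonstration; the paper's actual foreground-background segmentation algorithm would replace that heuristic.

```python
# Minimal sketch of the decoupled synthetic-data pipeline, assuming the
# `diffusers`, `transformers`, and Pillow libraries. Model names, prompts,
# and the mask heuristic are illustrative, not the authors' exact setup.
import random
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def generate_foreground(class_name: str) -> Image.Image:
    # Step 1: a simple textual template with the object class name as prompt,
    # asking for an isolated background.
    prompt = f"a photo of a {class_name} on a plain white background"
    return t2i(prompt).images[0]

def foreground_mask(img: Image.Image) -> Image.Image:
    # Step 2: stand-in for a foreground-background segmentation algorithm.
    # Here anything that is not near-white counts as foreground, which only
    # works because the prompt requested an isolated background.
    gray = img.convert("L")
    return gray.point(lambda p: 255 if p < 240 else 0)

def describe_context(context_img: Image.Image) -> str:
    # Step 3a: caption one of a small set of in-context images to obtain a
    # language description of the desired background scene.
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    captioner = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base").to(device)
    inputs = processor(context_img, return_tensors="pt").to(device)
    out = captioner.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

def compose(background: Image.Image, fg: Image.Image, mask: Image.Image):
    # Step 4: cut-and-paste the masked foreground onto a context image
    # generated from the caption (Step 3b: background = t2i(caption)).
    # The paste box doubles as the detection label.
    bg = background.copy()
    scale = random.uniform(0.3, 0.6)
    w, h = int(fg.width * scale), int(fg.height * scale)
    fg_s, m_s = fg.resize((w, h)), mask.resize((w, h))
    x = random.randint(0, bg.width - w)
    y = random.randint(0, bg.height - h)
    bg.paste(fg_s, (x, y), m_s)
    return bg, (x, y, x + w, y + h)  # composite image + bounding-box label
```

Paired with the returned paste box as a bounding-box label (or the scaled mask as an instance mask), such composites could be written directly into a COCO-style annotation file for detector training.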


