Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?

07/15/2023
by Jialu Gao, et al.

Pre-trained text-to-image generative models can produce diverse, semantically rich, and realistic images from natural language descriptions. Compared with language, images usually convey information in greater detail and with less ambiguity. In this study, we propose Learning from the Void (LfVoid), a method that leverages the power of pre-trained text-to-image models and advanced image editing techniques to guide robot learning. Given natural language instructions, LfVoid can edit the original observations to obtain goal images, such as "wiping" a stain off a table. Subsequently, LfVoid trains an ensembled goal discriminator on the generated images to provide reward signals for a reinforcement learning agent, guiding it to achieve the goal. The ability of LfVoid to learn with zero in-domain training on expert demonstrations or true goal observations (the void) is attributed to the utilization of knowledge from web-scale generative models. We evaluate LfVoid across three simulated tasks and validate its feasibility in the corresponding real-world scenarios. In addition, we offer insights into the key considerations for the effective integration of visual generative models into robot learning workflows. We posit that our work represents an initial step towards the broader application of pre-trained visual generative models in the robotics field. Our project page: https://lfvoid-rl.github.io/.
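To make the discriminator-based reward concrete, the sketch below shows one way an ensemble of binary goal classifiers could turn generated goal images into a dense reward for an RL agent. This is an illustrative assumption in the spirit of the abstract, not the authors' implementation: the network architecture, ensemble size, and reward definition (mean goal probability across the ensemble) are all placeholders.

```python
# Minimal sketch: an ensemble of goal discriminators whose mean "goal probability"
# serves as the reward signal. Architecture and hyperparameters are assumptions.
import torch
import torch.nn as nn

class GoalDiscriminator(nn.Module):
    """Binary classifier: does an observation look like the (generated) goal image?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, obs):            # obs: (B, 3, H, W) images in [0, 1]
        return self.net(obs)           # logits of shape (B, 1)

def ensemble_reward(discriminators, obs):
    """Average goal probability over the ensemble, used as the RL reward."""
    with torch.no_grad():
        probs = [torch.sigmoid(d(obs)) for d in discriminators]
    return torch.stack(probs).mean(dim=0).squeeze(-1)   # shape (B,)

if __name__ == "__main__":
    ensemble = [GoalDiscriminator() for _ in range(5)]  # hypothetical ensemble size
    obs = torch.rand(2, 3, 64, 64)                      # stand-in for camera observations
    print(ensemble_reward(ensemble, obs))
```

In this sketch each discriminator would be trained to separate generated goal images from ordinary observations; averaging several independently trained classifiers is one common way to smooth out the noise of any single learned reward model.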


Related research

11/01/2022 — Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions
The adoption of pre-trained language models to generate action plans for...

09/19/2023 — Guide Your Agent with Adaptive Multimodal Rewards
Developing an agent capable of adapting to unseen environments remains a...

06/13/2023 — Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
Text-to-image generative models have attracted rising attention for flex...

11/21/2022 — Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Benefiting from large-scale datasets and pre-trained models, the field o...

10/02/2019 — Unsupervised Doodling and Painting with Improved SPIRAL
We investigate using reinforcement learning agents as generative models ...

06/08/2023 — Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
The advent of large pre-trained models has brought about a paradigm shif...

11/17/2022 — Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
Natural language often contains ambiguities that can lead to misinterpre...
