DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

10/05/2022
by   Ivan Kapelyukh, et al.

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene in three steps: it first infers a text description of those objects, then generates an image representing a natural, human-like arrangement of them, and finally physically arranges the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without any further data collection or training. Encouraging real-world results, evaluated with human studies, show that this is an exciting direction for the future of web-scale robot learning algorithms. We also propose a list of recommendations to the text-to-image community, to align further development of these models with applications in robotics. Videos are available at: https://www.robot-learning.uk/dall-e-bot
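
The three stages named in the abstract (describe the objects, generate a goal arrangement, move the objects to match it) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`DetectedObject`, `build_prompt`, `plan_moves`, `rearrange`, `fake_generator`), the prompt wording, and the 2D pose format are assumptions made for illustration, and the image generator is replaced by a stub that returns fixed goal positions rather than calling DALL-E.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical pose format: (x, y) position on a tabletop, in metres.
Pose = Tuple[float, float]


@dataclass
class DetectedObject:
    label: str  # object name from some perception step, e.g. "fork"
    pose: Pose  # current position in the scene


def build_prompt(objects: List[DetectedObject]) -> str:
    """Stage 1: turn the detected object labels into a text description."""
    labels = ", ".join(obj.label for obj in objects)
    # Assumed prompt wording, chosen only for this sketch.
    return f"A top-down photo of a neatly arranged table with {labels}"


def plan_moves(
    objects: List[DetectedObject], goal_poses: Dict[str, Pose]
) -> List[Tuple[str, Pose, Pose]]:
    """Stage 3: pair each object with its goal pose and list the moves needed."""
    moves = []
    for obj in objects:
        goal = goal_poses.get(obj.label)
        # Skip objects with no goal or that are already in place.
        if goal is not None and goal != obj.pose:
            moves.append((obj.label, obj.pose, goal))
    return moves


def rearrange(
    objects: List[DetectedObject],
    generate_goal: Callable[[str], Dict[str, Pose]],
) -> List[Tuple[str, Pose, Pose]]:
    """Run the three stages; `generate_goal` stands in for stage 2."""
    prompt = build_prompt(objects)
    goal_poses = generate_goal(prompt)  # stage 2: text -> goal arrangement
    return plan_moves(objects, goal_poses)


def fake_generator(prompt: str) -> Dict[str, Pose]:
    """Stub for the image-generation stage: returns fixed goal poses."""
    return {"fork": (0.3, 0.5), "plate": (0.5, 0.5)}


scene = [DetectedObject("fork", (0.1, 0.2)), DetectedObject("plate", (0.5, 0.5))]
moves = rearrange(scene, fake_generator)
```

In a real system the stub would be replaced by a text-to-image model plus a step that extracts object poses from the generated image, and each planned move would be handed to a pick-and-place controller; both are outside the scope of this sketch.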

research
02/22/2023

Scaling Robot Learning with Semantically Imagined Experience

Recent advances in robot learning have shown promise in enabling robots ...
research
03/30/2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Large-scale text-to-image diffusion models achieve unprecedented success...
research
09/12/2022

Leveraging Large Language Models for Robot 3D Scene Understanding

Semantic 3D scene understanding is a problem of critical importance in r...
research
11/04/2021

A System for General In-Hand Object Re-Orientation

In-hand object reorientation has been a challenging problem in robotics ...
research
10/16/2022

LAION-5B: An open large-scale dataset for training next generation image-text models

Groundbreaking language-vision architectures like CLIP and DALL-E proved...
research
01/20/2023

Robot Skill Learning Via Classical Robotics-Based Generated Datasets: Advantages, Disadvantages, and Future Improvement

Why do we not profit from our long-existing classical robotics knowledge...
research
06/09/2022

Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding

Semantic 3D scene understanding is a problem of critical importance in r...
