VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

06/17/2022
by   Kaizhi Zheng, et al.

Benefiting from the flexibility and compositionality of language, humans naturally use it to command embodied agents to perform complex tasks such as navigation and object manipulation. In this work, we aim to fill the gap in the last mile of embodied agents: object manipulation by following human guidance, e.g., "move the red mug next to the box while keeping it upright." To this end, we introduce the Automatic Manipulation Solver (AMSolver) simulator and build the Vision-and-Language Manipulation benchmark (VLMbench) on top of it, containing varied language instructions over categorized robotic manipulation tasks. Specifically, modular rule-based task templates automatically generate robot demonstrations paired with language instructions, covering diverse object shapes and appearances, action types, and motion constraints. We also develop 6D-CLIPort, a keypoint-based model that processes multi-view observations and language input and outputs a sequence of 6-degree-of-freedom (DoF) actions. We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.
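The abstract describes 6D-CLIPort as consuming multi-view observations together with a language instruction and emitting 6-DoF actions. Below is a minimal, self-contained sketch of that interface in PyTorch. It is not the authors' architecture: every module, dimension, and the late-fusion scheme (concatenating pooled per-view features with a CLIP-style text embedding) is an illustrative assumption.

```python
# A minimal sketch (not the paper's implementation) of a language-conditioned
# 6-DoF action predictor in the spirit of 6D-CLIPort. All names, shapes, and
# the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class LanguageConditioned6DoFPolicy(nn.Module):
    """Maps multi-view RGB observations and an instruction embedding to a
    6-DoF action: (x, y, z) translation and (roll, pitch, yaw) rotation."""

    def __init__(self, num_views: int = 3, text_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Shared per-view visual encoder (stand-in for a real backbone).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse the concatenated view features with the language embedding.
        self.fuse = nn.Sequential(
            nn.Linear(64 * num_views + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),  # x, y, z, roll, pitch, yaw
        )

    def forward(self, views: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # views: (B, V, 3, H, W); text_emb: (B, text_dim), e.g. from CLIP.
        b, v = views.shape[:2]
        feats = self.visual(views.flatten(0, 1)).view(b, -1)  # (B, V * 64)
        return self.fuse(torch.cat([feats, text_emb], dim=-1))

# Smoke test with random stand-in inputs.
policy = LanguageConditioned6DoFPolicy()
action = policy(torch.randn(2, 3, 3, 128, 128), torch.randn(2, 512))
print(action.shape)  # torch.Size([2, 6])
```

In the paper's setting the text embedding would come from a CLIP-style encoder and the action head would be applied per keypoint step; here random tensors stand in for both, purely to show the input/output contract.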


Related research

06/20/2023
RM-PRT: Realistic Robotic Manipulation Simulator and Benchmark with Progressive Reasoning Tasks
Recently, the advent of pre-trained large-scale language models (LLMs) l...

08/02/2023
LEMMA: Learning Language-Conditioned Multi-Robot Manipulation
Complex manipulation tasks often require robots with complementary capab...

01/19/2021
A Modular Vision Language Navigation and Manipulation Framework for Long Horizon Compositional Tasks in Indoor Environment
In this paper we propose a new framework - MoViLan (Modular Vision and L...

12/06/2021
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
General-purpose robots coexisting with humans in their environment must ...

06/17/2023
MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation
Given a natural language, a general robot has to comprehend the instruct...

08/28/2021
DASH: Modularized Human Manipulation Simulation with Vision and Language for Embodied AI
Creating virtual humans with embodied, human-like perceptual and actuati...

09/02/2023
Developmental Scaffolding with Large Language Models
Exploration and self-observation are key mechanisms of infant sensor...
