Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions

09/29/2020
by Peter A. Jansen, et al.

The recently proposed ALFRED challenge task aims for a virtual robotic agent to complete complex multi-step everyday tasks in a virtual home environment from high-level natural language directives, such as "put a hot piece of bread on a plate". Currently, the best-performing models are able to complete less than 5% of these tasks successfully. In this work we focus on modeling the translation problem of converting natural language directives into detailed multi-step sequences of actions that accomplish those goals in the virtual environment. We empirically demonstrate that it is possible to generate gold multi-step plans from language directives alone, without any visual input, in 26% of unseen cases. When a small amount of visual information is incorporated, namely the starting location in the virtual environment, our best-performing GPT-2 model successfully generates gold command sequences in 58% of cases. Our results suggest that contextualized language models may provide strong visual semantic planning modules for grounded virtual agents.
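To make the directive-to-plan "translation" framing concrete, below is a minimal sketch of how a high-level directive, optionally prefixed with the agent's starting location, could be fed to a GPT-2 language model to generate a multi-step command sequence. This is not the paper's released code: the prompt format, field names, and the use of an off-the-shelf Hugging Face "gpt2" checkpoint (rather than a model fine-tuned on ALFRED plans) are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above) of directive -> command-sequence
# generation with GPT-2, using the Hugging Face transformers library.
from typing import Optional

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # in practice, fine-tuned on ALFRED plan data


def generate_plan(directive: str, start_location: Optional[str] = None) -> str:
    """Generate a multi-step command sequence for a high-level directive."""
    # Optionally condition on the starting location, the small amount of
    # visual information mentioned in the abstract.
    prompt = f"Directive: {directive}\n"
    if start_location is not None:
        prompt += f"Start: {start_location}\n"
    prompt += "Plan:"

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,  # greedy decoding for a deterministic plan
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens and return only the generated command sequence.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)


print(generate_plan("put a hot piece of bread on a plate", start_location="kitchen"))
```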


Related research

12/08/2022 · LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
This study focuses on embodied agents that can follow natural language i...

07/04/2023 · Embodied Task Planning with Large Language Models
Equipping embodied agents with commonsense is important for robots to su...

12/03/2019 · ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
We present ALFRED (Action Learning From Realistic Environments and Direc...

10/08/2020 · ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Given a simple request (e.g., Put a washed apple in the kitchen fridge),...

01/18/2022 · Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Can world knowledge learned by large language models (LLMs) be used to a...

10/04/2021 · Skill Induction and Planning with Latent Language
We present a framework for learning hierarchical policies from demonstra...

09/18/2023 · SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models
In this work, we introduce SMART-LLM, an innovative framework designed f...
