A Picture is Worth a Thousand Words: Language Models Plan from Pixels

03/16/2023
by Anthony Z. Liu, et al.

Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments. In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments. Prior PLM-based approaches to planning either assume that observations are available as text (e.g., provided by a captioning model), reason about plans from the instruction alone, or incorporate information about the visual environment only in limited ways (such as through a pre-trained affordance function). In contrast, we show that PLMs can plan accurately even when observations are directly encoded as input prompts for the PLM. This simple approach outperforms prior approaches in experiments on the ALFWorld and VirtualHome benchmarks.
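
To make "observations directly encoded as input prompts" concrete, here is a minimal sketch of one way it could work. It is an illustration under stated assumptions, not the authors' implementation: each observation is assumed to be a fixed-size feature vector, and a plain linear projection (an assumption, not confirmed by the abstract) maps it into the same embedding space as the instruction tokens. The names `project` and `build_prompt` are hypothetical.

```python
# Illustrative sketch only: the linear projection and every name here are
# assumptions, not the method described in the paper.

def project(features, weights):
    """Map one observation feature vector into the embedding space.

    `weights` has one row per embedding dimension and one column per
    feature (a linear layer without bias).
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def build_prompt(obs_features, instruction_embeddings, weights):
    """Prepend projected observations to the instruction token embeddings.

    The result is a single sequence of embedding vectors that a language
    model accepting embedding inputs could consume as its prompt.
    """
    visual_tokens = [project(f, weights) for f in obs_features]
    return visual_tokens + instruction_embeddings


# Toy sizes: 4-dim observation features, 3-dim embeddings.
weights = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
observations = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
instruction_embeddings = [[0.1, 0.2, 0.3] for _ in range(5)]

prompt = build_prompt(observations, instruction_embeddings, weights)
print(len(prompt), len(prompt[0]))
```

The point of the sketch is that no captioning step appears anywhere: the observation reaches the model as prompt vectors rather than as generated text.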
