Learning to Model the World with Language

07/31/2023
by Jessy Lin, et al.

To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.
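
To make the future-prediction idea concrete, here is a minimal sketch of such an objective; it is not Dynalang's implementation. It assumes a toy setting in which each time step supplies one image feature vector, one text token, one action, and one reward. A single recurrent state is updated from these inputs and is then asked to predict the next step's image features, token, and reward. Every name here (`ToyWorldModel`, `prediction_loss`, the dimensions, the untrained random weights) is hypothetical and chosen for illustration only. The sketch also predicts raw image features rather than learned representations, and it covers only the prediction loss, not training or acting from imagined rollouts.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyWorldModel:
    """Hypothetical toy model: one recurrent state, three prediction heads."""

    def __init__(self, image_dim, vocab_size, num_actions, state_dim, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.num_actions = num_actions
        self.state_dim = state_dim
        in_dim = state_dim + image_dim + vocab_size + num_actions
        self.W_state = rand_matrix(state_dim, in_dim, rng)
        self.W_image = rand_matrix(image_dim, state_dim, rng)
        self.W_token = rand_matrix(vocab_size, state_dim, rng)
        self.W_reward = rand_matrix(1, state_dim, rng)

    def step(self, state, image, token, action):
        """Fold the current image, text token, and action into the state."""
        x = (state + image
             + one_hot(token, self.vocab_size)
             + one_hot(action, self.num_actions))
        return [math.tanh(v) for v in matvec(self.W_state, x)]

    def predict(self, state):
        """Predict next image features, next-token probabilities, next reward."""
        image_hat = matvec(self.W_image, state)
        token_probs = softmax(matvec(self.W_token, state))
        reward_hat = matvec(self.W_reward, state)[0]
        return image_hat, token_probs, reward_hat


def prediction_loss(model, episode):
    """Average future-prediction loss over consecutive steps of one episode."""
    state = [0.0] * model.state_dim
    total = 0.0
    for current, nxt in zip(episode, episode[1:]):
        state = model.step(state, current["image"], current["token"],
                           current["action"])
        image_hat, token_probs, reward_hat = model.predict(state)
        image_loss = sum((a - b) ** 2 for a, b in zip(image_hat, nxt["image"]))
        token_loss = -math.log(token_probs[nxt["token"]])  # cross-entropy
        reward_loss = (reward_hat - nxt["reward"]) ** 2
        total += image_loss + token_loss + reward_loss
    return total / (len(episode) - 1)


# Made-up three-step episode: 2-d image features, 4-word vocabulary, 2 actions.
episode = [
    {"image": [0.1, 0.2], "token": 1, "action": 0, "reward": 0.0},
    {"image": [0.2, 0.1], "token": 3, "action": 1, "reward": 0.0},
    {"image": [0.3, 0.0], "token": 2, "action": 0, "reward": 1.0},
]
model = ToyWorldModel(image_dim=2, vocab_size=4, num_actions=2, state_dim=3)
loss = prediction_loss(model, episode)
```

Because the token and reward terms sit in the same loss as the image term, past language is pushed to help predict what is seen and rewarded next, which is the sense in which the abstract describes language understanding and future prediction as one objective. A single sum is the simplest way to show that coupling; the weighting of the terms is an assumption of this sketch.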


