PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

06/01/2021
by Rowan Zellers, et al.

We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language model, giving us a unified model of linguistic form and grounded meaning. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation or natural language. Experimental results show that our model effectively learns world dynamics, along with how to communicate them. It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%. Likewise, its natural language summaries of physical interactions are also judged by humans as more accurate than LM alternatives. We present comprehensive analysis showing room for future work.


research
11/01/2018

Understanding Learning Dynamics Of Language Models with SVCCA

Recent work has demonstrated that neural language models encode linguist...
research
11/13/2021

Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning

In natural language processing, most models try to learn semantic repres...
research
10/15/2020

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

Natural language rationales could provide intuitive, higher-level explan...
research
10/20/2021

SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

Existing work in language grounding typically study single environments....
research
03/01/2023

Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control

Recent progress in large language models (LLMs) has demonstrated the abi...
research
02/16/2023

What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis

Even in our increasingly text-intensive times, the primary site of langu...
research
10/11/2016

From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

We present a model of visually-grounded language learning based on stack...
