PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

06/01/2021
by Rowan Zellers, et al.

We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language model, giving us a unified model of linguistic form and grounded meaning. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation or natural language. Experimental results show that our model effectively learns world dynamics, along with how to communicate them. It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%. Likewise, its natural language summaries of physical interactions are also judged by humans as more accurate than LM alternatives. We present comprehensive analysis showing room for future work.
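To make the factorization concrete, below is a minimal, hypothetical PyTorch sketch of the interface the abstract describes: a physical dynamics model that maps an object state and an action to a next state, kept separate from the language model that encodes the input sentence and decodes a summary. Class names, dimensions, and the linear projections are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the two-component factorization: a dynamics model
# over (object state, action) -> next state, plus projections to and from a
# pretrained language model's embedding space. Dimensions are assumptions.

class PhysicalDynamicsModel(nn.Module):
    """Predicts each object's next symbolic state after an action."""
    def __init__(self, state_dim=256, action_dim=64, hidden_dim=512):
        super().__init__()
        self.transition = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, object_state, action):
        # e.g. a glass cup plus a "throw" action should land near the
        # representation of a broken cup; a plastic cup should stay intact.
        return self.transition(torch.cat([object_state, action], dim=-1))


class PIGLeTSketch(nn.Module):
    """Uses the dynamics model as the interface between text and world state."""
    def __init__(self, lm_dim=768, state_dim=256, action_dim=64):
        super().__init__()
        self.dynamics = PhysicalDynamicsModel(state_dim, action_dim)
        self.sentence_to_action = nn.Linear(lm_dim, action_dim)  # LM encoding -> action
        self.state_to_lm = nn.Linear(state_dim, lm_dim)          # predicted state -> LM space

    def forward(self, sentence_encoding, object_state):
        # sentence_encoding: output of a pretrained language-model encoder
        action = self.sentence_to_action(sentence_encoding)
        next_state = self.dynamics(object_state, action)
        # next_state can be read out as a literal symbolic representation, or
        # projected back into the language model to generate a text summary.
        return next_state, self.state_to_lm(next_state)


# Usage sketch with random tensors standing in for real encodings.
model = PIGLeTSketch()
sentence = torch.randn(1, 768)   # encoding of "The robot throws the glass cup"
cup_state = torch.randn(1, 256)  # embedded state of the cup
next_state, summary_features = model(sentence, cup_state)
```

The point of the separation is that the dynamics model can be pretrained purely through interaction in the 3D environment, and the language model only has to learn to translate between text and that simulated world state.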

Related Research

11/01/2018

Understanding Learning Dynamics Of Language Models with SVCCA

Recent work has demonstrated that neural language models encode linguist...
11/13/2021

Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning

In natural language processing, most models try to learn semantic repres...
10/15/2020

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

Natural language rationales could provide intuitive, higher-level explan...
10/20/2021

SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

Existing work in language grounding typically studies single environments...
06/20/2017

Grounded Language Learning in a Simulated 3D World

We are increasingly surrounded by artificially intelligent technology th...
10/11/2016

From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

We present a model of visually-grounded language learning based on stack...
05/05/2021

TANGO: Commonsense Generalization in Predicting Tool Interactions for Mobile Manipulators

Robots assisting us in factories or homes must learn to make use of obje...