Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

by   Yen-Ling Kuo, et al.

Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of neural networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in a recently-released dataset, gSCAN, which explicitly measures how well a robotic agent is able to interpret novel ideas grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: that the compositional structure of networks should reflect the compositional structure of the problem domain they address, while allowing all other parameters and properties to be learned end-to-end with weak supervision. We build a general-purpose mechanism that enables robots to generalize their language understanding to compositional domains. Crucially, our base network has the same state-of-the-art performance as prior work, 97 execution accuracy, while at the same time generalizing its knowledge when prior work does not; for example, achieving 95 adjective-noun compositions where previous work has 55 Robust language understanding without dramatic failures and without corner causes is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.



There are no comments yet.


page 1

page 2

page 3

page 4


A Benchmark for Systematic Generalization in Grounded Language Understanding

Human language users easily interpret expressions that describe unfamili...

Systematic Generalization: What Is Required and Can It Be Learned?

Numerous models for grounded language understanding have been recently p...

Think before you act: A simple baseline for compositional generalization

Contrarily to humans who have the ability to recombine familiar expressi...

ShapeWorld - A new test methodology for multimodal language understanding

We introduce a novel framework for evaluating multimodal deep learning m...

Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks

Systematic compositionality is the ability to recombine meaningful units...

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

We analyze the grounded SCAN (gSCAN) benchmark, which was recently propo...

Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking

While neural language models often perform surprisingly well on natural ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.