Compositional Learning of Visually-Grounded Concepts Using Reinforcement

09/08/2023
by Zijun Lin, et al.

Deep reinforcement learning (RL) agents must be trained over millions of episodes to solve instruction-grounded navigation tasks reasonably well, and their ability to generalize to novel combinations of instructions remains unclear. Children, by contrast, can decompose language-based instructions and navigate to the referred object even if they have never seen that combination of queries before. We therefore created three 3D environments to investigate how deep RL agents learn and compose color-shape combinatorial instructions to solve novel combinations in a spatial navigation task. First, we explore whether agents can perform compositional learning and whether they can leverage frozen text encoders (e.g., CLIP, BERT) to learn word combinations in fewer episodes. Next, we demonstrate that agents pretrained on the shape or color concepts separately need 20 times fewer training episodes to solve unseen combinations of instructions. Lastly, we show that agents pretrained on both concept and compositional learning achieve significantly higher reward when evaluated zero-shot on novel color-shape1-shape2 visual object combinations. Overall, our results highlight the foundations needed to increase an agent's proficiency in composing word groups through reinforcement learning and its ability to generalize zero-shot to new combinations.
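To make the notion of "novel combinations" concrete, here is a minimal sketch of one way to hold out color-shape pairs so that every individual word is still seen during training. It is an illustration only, not the authors' actual split: the function `split_combinations`, the diagonal hold-out rule, and the color and shape vocabularies are all assumptions introduced here.

```python
from itertools import product


def split_combinations(colors, shapes):
    """Hold out one 'diagonal' of color-shape pairs for evaluation.

    Hypothetical helper: with at least two colors and two shapes, every
    color and every shape still appears in training, so a held-out
    instruction is novel only as a combination.
    """
    all_pairs = list(product(colors, shapes))
    # Pair the i-th color with the (i mod len(shapes))-th shape and hold it out.
    held_out = {(c, shapes[i % len(shapes)]) for i, c in enumerate(colors)}
    train = [p for p in all_pairs if p not in held_out]
    test = [p for p in all_pairs if p in held_out]
    return train, test


# Assumed vocabulary; the paper's actual colors and shapes may differ.
colors = ["red", "green", "blue"]
shapes = ["cube", "sphere", "capsule"]
train, test = split_combinations(colors, shapes)
```

Under this scheme an agent trained on `train` has seen "red" and "cube" separately but never the instruction "red cube", which is the kind of unseen combination the abstract refers to.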


