Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following

10/21/2019
by Geoffrey Cideron, et al.

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. These properties make it a natural fit to guide the training of interactive agents, because it could ease recurrent challenges in Reinforcement Learning (RL) such as sample complexity, generalization, and multi-tasking. Yet it remains an open problem to relate language and RL even in simple instruction-following scenarios. Current methods rely on expert demonstrations, auxiliary losses, or inductive biases in neural architectures. In this paper, we propose an orthogonal approach called Textual Hindsight Experience Replay (THER) that extends the Hindsight Experience Replay (HER) approach to the language setting. Whenever the agent does not fulfill its instruction, THER learns to output a new directive that matches the agent's trajectory, and it relabels the episode with a positive reward. To do so, THER learns to map a state to an instruction by using past successful trajectories, which removes the need for external expert interventions to relabel episodes, as in vanilla HER. We observe that this simple idea also initiates a learning synergy between language acquisition and policy learning on instruction-following tasks in the BabyAI environment.
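The relabeling loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Episode`, `ToyInstructionGenerator`, and `store_with_hindsight` are hypothetical, and the learned state-to-instruction model is replaced here by a simple frequency lookup over final states. Keeping the original failed episode in the buffer alongside the relabeled copy is also an assumption, borrowed from the usual HER setup.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass
class Episode:
    instruction: str
    transitions: List[Tuple[Hashable, int]]  # (state, action) pairs
    final_state: Hashable
    success: bool


class ToyInstructionGenerator:
    """Hypothetical stand-in for the learned state-to-instruction mapping.

    It memorizes which instruction was most often fulfilled in each final
    state. A real model would generalize to unseen states; this one cannot.
    """

    def __init__(self) -> None:
        self._counts: Dict[Hashable, Counter] = defaultdict(Counter)

    def fit(self, final_state: Hashable, instruction: str) -> None:
        self._counts[final_state][instruction] += 1

    def predict(self, final_state: Hashable) -> Optional[str]:
        counts = self._counts.get(final_state)
        if not counts:
            return None  # no successful trajectory has ended here yet
        return counts.most_common(1)[0][0]


def store_with_hindsight(
    episode: Episode,
    generator: ToyInstructionGenerator,
    replay_buffer: List[Tuple[str, List[Tuple[Hashable, int]], float]],
) -> None:
    """Store an episode and, on failure, a relabeled copy with reward 1."""
    if episode.success:
        # Successful trajectories supervise the state-to-instruction mapping.
        generator.fit(episode.final_state, episode.instruction)
        replay_buffer.append((episode.instruction, episode.transitions, 1.0))
        return

    # Assumption: the failed episode is kept with zero reward, as in HER.
    replay_buffer.append((episode.instruction, episode.transitions, 0.0))

    # Relabel with an instruction that matches what the agent actually did.
    new_instruction = generator.predict(episode.final_state)
    if new_instruction is not None:
        replay_buffer.append((new_instruction, episode.transitions, 1.0))
```

In this sketch, a failed episode that ends in a state previously reached by a successful one is stored twice: once under its original instruction with zero reward, and once under the generated instruction with a positive reward. Failed episodes that end in a state never seen in a success are stored without relabeling, which reflects the point that the mapping can only be learned once some successful trajectories exist.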


