Improving Policy Learning via Language Dynamics Distillation

09/30/2022
by Victor Zhong, et al.

Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations, namely inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample efficiency and generalization across environments, and that dynamics modelling with expert demonstrations is more effective than with non-experts.
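The sketch below illustrates one plausible reading of the two-stage recipe described above, written in PyTorch. It is a minimal illustration and not the authors' implementation: it assumes flat vector observations, a discrete action space, and a pre-embedded language description, and every class and function name (DynamicsModel, Policy, pretrain_dynamics, rl_finetune_loss, distill_coef) is an assumption introduced here. Stage 1 pretrains a dynamics model on demonstrations paired with language; stage 2 reuses its encoder for an actor-critic policy and keeps an auxiliary next-observation prediction term so that RL fine-tuning retains what was learned about how language relates to dynamics.

```python
# Minimal sketch of a Language Dynamics Distillation-style recipe (assumed
# simplifications: vector observations, discrete actions, pre-embedded
# language). Names are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicsModel(nn.Module):
    """Predicts the next observation from the current observation and the
    language description (demonstrations need not contain action labels)."""

    def __init__(self, obs_dim, lang_dim, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.next_obs_head = nn.Linear(hidden, obs_dim)

    def forward(self, obs, lang):
        h = self.encoder(torch.cat([obs, lang], dim=-1))
        return self.next_obs_head(h), h


def pretrain_dynamics(model, demo_loader, epochs=10, lr=1e-4):
    """Stage 1: fit the dynamics model on demonstration transitions paired
    with language descriptions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, lang, next_obs in demo_loader:
            pred_next, _ = model(obs, lang)
            loss = F.mse_loss(pred_next, next_obs)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


class Policy(nn.Module):
    """Actor-critic heads on top of the pretrained, language-aware encoder."""

    def __init__(self, dynamics_model, num_actions, hidden=256):
        super().__init__()
        self.encoder = dynamics_model.encoder          # pretrained weights
        self.dynamics_head = dynamics_model.next_obs_head
        self.actor = nn.Linear(hidden, num_actions)
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs, lang):
        h = self.encoder(torch.cat([obs, lang], dim=-1))
        return self.actor(h), self.critic(h), self.dynamics_head(h)


def rl_finetune_loss(policy, batch, distill_coef=0.1):
    """Stage 2: a simple actor-critic loss plus an auxiliary next-observation
    prediction term, so RL fine-tuning retains the language-dynamics
    knowledge acquired during pretraining."""
    logits, value, pred_next = policy(batch["obs"], batch["lang"])
    dist = torch.distributions.Categorical(logits=logits)
    advantage = batch["return"] - value.squeeze(-1)
    policy_loss = -(dist.log_prob(batch["action"]) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    distill_loss = F.mse_loss(pred_next, batch["next_obs"])
    return policy_loss + 0.5 * value_loss + distill_coef * distill_loss
```

A PPO- or IMPALA-style update could replace the simple policy-gradient term above; the design point the abstract emphasizes is that the dynamics-prediction objective stays active during fine-tuning, rather than the pretrained weights serving only as an initialization. The weighting distill_coef between the RL and auxiliary terms is a hypothetical hyperparameter that would need tuning per environment.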

Related research

Residual Reinforcement Learning from Demonstrations (06/15/2021)
Residual reinforcement learning (RL) has been proposed as a way to solve...

RTFM: Generalising to Novel Environment Dynamics via Reading (10/18/2019)
Obtaining policies that can generalise to new environments in reinforcem...

Language Reward Modulation for Pretraining Reinforcement Learning (08/23/2023)
Using learned reward functions (LRFs) as a means to solve sparse-reward...

Semantic Exploration from Language Abstractions and Pretrained Representations (04/08/2022)
Continuous first-person 3D environments pose unique exploration challeng...

Deep Transfer in Reinforcement Learning by Language Grounding (08/01/2017)
In this paper, we explore the utilization of natural language to drive t...

Guide Your Agent with Adaptive Multimodal Rewards (09/19/2023)
Developing an agent capable of adapting to unseen environments remains a...

Fixed β-VAE Encoding for Curious Exploration in Complex 3D Environments (05/18/2021)
Curiosity is a general method for augmenting an environment reward with...
