Learning to Generalize for Sequential Decision Making

10/05/2020
by   Xusen Yin, et al.
11

We consider problems of making sequences of decisions to accomplish tasks, interacting via the medium of language. These problems are often tackled with reinforcement learning approaches. We find that these models do not generalize well when applied to novel task domains. However, the large amount of computation necessary to adequately train and explore the search space of sequential decision making, under a reinforcement learning paradigm, precludes the inclusion of large contextualized language models, which might otherwise enable the desired generalization ability. We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model. Together, these methodologies enable the introduction of contextualized language models into the sequential decision making problem space. We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation. Our models exceed teacher performance on various held-out decision problems, by up to 7 problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/03/2020

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning – Extended Version

Learning-based approaches for solving large sequential decision making p...
research
03/19/2022

Teachable Reinforcement Learning via Advice Distillation

Training automated agents to complete complex tasks in interactive envir...
research
07/21/2023

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

Large language models (LLMs) are being applied as actors for sequential ...
research
12/04/2022

Learning Automata-Based Task Knowledge Representation from Large-Scale Generative Language Models

Automata-based representations play an important role in control and pla...
research
06/12/2020

Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments

Recent investigations into sum-product-max networks (SPMN) that generali...
research
09/11/2019

Correlation Priors for Reinforcement Learning

Many decision-making problems naturally exhibit pronounced structures in...
research
04/14/2023

Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning

We consider the problem of synthetically generating data that can closel...

Please sign up or login with your details

Forgot password? Click here to reset