Generalized Hidden Parameter MDPs: Transferable Model-based RL in a Handful of Trials

02/08/2020
by   Christian F. Perez, et al.

There is broad interest in creating RL agents that can solve many (related) tasks and adapt to new tasks and environments after initial training. Model-based RL leverages learned surrogate models that describe the dynamics and rewards of individual tasks, such that planning in a good surrogate can lead to good control of the true system. Rather than solving each task individually from scratch, hierarchical models can exploit the fact that tasks are often related by (unobserved) causal factors of variation in order to achieve efficient generalization. For example, learning how the mass of an item affects the force required to lift it can generalize to previously unobserved masses. We propose Generalized Hidden Parameter MDPs (GHP-MDPs), which describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks. The GHP-MDP augments model-based RL with latent variables that capture these hidden parameters, facilitating transfer across tasks. We also explore a variant of the model that incorporates explicit latent structure mirroring the causal factors of variation across tasks (for instance, agent properties, environmental factors, and goals). We experimentally demonstrate state-of-the-art performance and sample efficiency on a new, challenging MuJoCo task using reward and dynamics latent spaces, while beating a previous state-of-the-art baseline with >10× less data. Using test-time inference of the latent variables, our approach generalizes in a single episode to novel combinations of dynamics and reward, and to novel rewards.
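
To make the idea of hidden parameters concrete, here is a minimal illustrative sketch in Python. It is not the paper's method. It uses a toy 1-D point mass whose dynamics depend on a hidden `mass` and whose reward depends on a hidden `goal`. The task, the constant `DT`, and every function name are assumptions made for illustration. Simple least squares and grid search stand in for the learned latent-variable model and the test-time inference described in the abstract.

```python
import random

DT = 0.1  # integration step; an assumed constant for this toy task


def step(v, force, mass):
    """True dynamics of one task: 1-D point mass. `mass` is a hidden parameter."""
    return v + DT * force / mass


def reward(v, goal):
    """True reward of one task. `goal` is a second hidden parameter."""
    return -(v - goal) ** 2


def collect(mass, goal, n=5, seed=0):
    """Gather a handful of (v, force, v_next, r) transitions with random forces."""
    rng = random.Random(seed)
    v, data = 0.0, []
    for _ in range(n):
        force = rng.uniform(-1.0, 1.0)
        v_next = step(v, force, mass)
        data.append((v, force, v_next, reward(v_next, goal)))
        v = v_next
    return data


def infer_mass(data):
    """Least-squares fit of v_next - v = (DT * force) / mass, solved for mass."""
    num = sum(DT * f * (vn - v) for v, f, vn, _ in data)
    den = sum((DT * f) ** 2 for _, f, _, _ in data)
    return den / num


def infer_goal(data, candidates=None):
    """Grid search for the goal that best explains the observed rewards."""
    if candidates is None:
        candidates = [i / 10 for i in range(-20, 21)]

    def sq_error(g):
        return sum((r + (vn - g) ** 2) ** 2 for _, _, vn, r in data)

    return min(candidates, key=sq_error)


def greedy_force(v, mass_hat, goal_hat):
    """One-step plan in the inferred model: the force predicted to reach goal_hat."""
    return (goal_hat - v) * mass_hat / DT


# A "new task": the agent never observes mass or goal directly.
data = collect(mass=2.0, goal=0.5)
mass_hat, goal_hat = infer_mass(data), infer_goal(data)
print(mass_hat, goal_hat, greedy_force(0.0, mass_hat, goal_hat))
```

The point of the sketch is the separation of concerns: the transition and reward functions are shared across tasks, only the low-dimensional hidden parameters change, and a few transitions from a new task are enough to estimate them and plan.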

research
12/08/2018

Efficient transfer learning and online adaptation with latent variable models for continuous control

Traditional model-based RL relies on hand-specified or learned models of...
research
08/15/2013

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations

Control applications often feature tasks with similar, but not identical...
research
10/17/2019

Single Episode Policy Transfer in Reinforcement Learning

Transfer and adaptation to new unknown environmental dynamics is a key c...
research
07/08/2015

Spotlight the Negatives: A Generalized Discriminative Latent Model

Discriminative latent variable models (LVM) are frequently applied to va...
research
10/09/2019

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an...
research
05/14/2020

Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning

Model-based reinforcement learning (RL) enjoys several benefits, such as...
research
08/04/2020

Learning Transition Models with Time-delayed Causal Relations

This paper introduces an algorithm for discovering implicit and delayed ...
