Fast Adaptation via Policy-Dynamics Value Functions

07/06/2020
by Roberta Raileanu, et al.

Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward in a space of policies and environments. An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned. Then, a value function conditioned on both embeddings is trained. At test time, a few actions are sufficient to infer the environment embedding, enabling a policy to be selected by maximizing the learned value function (which requires no additional environment interaction). We show that our method can rapidly adapt to new dynamics on a set of MuJoCo domains. Code available at https://github.com/rraileanu/policy-dynamics-value-functions.
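To make the test-time procedure concrete, the sketch below (PyTorch) illustrates the selection step described in the abstract: infer an environment embedding from a few observed transitions, then score each policy embedding of the trained ensemble with the learned value function and pick the maximizer. All module names, network sizes, and the mean-pooled encoder are illustrative assumptions rather than the authors' implementation; see the linked repository for the actual code.

```python
# Minimal sketch of PD-VF-style policy selection at test time. Component
# names (EnvEncoder, PDValueNet, select_policy) and architectures are
# hypothetical stand-ins for the learned pieces described in the abstract.
import torch
import torch.nn as nn

class EnvEncoder(nn.Module):
    """Maps a short trajectory of transition features to an environment
    (dynamics) embedding."""
    def __init__(self, transition_dim, env_emb_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(transition_dim, 64), nn.ReLU(),
                                 nn.Linear(64, env_emb_dim))

    def forward(self, transitions):            # [num_steps, transition_dim]
        return self.net(transitions).mean(0)   # mean-pool over steps -> [env_emb_dim]

class PDValueNet(nn.Module):
    """Estimates cumulative reward V(policy_emb, env_emb)."""
    def __init__(self, policy_emb_dim, env_emb_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(policy_emb_dim + env_emb_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, policy_emb, env_emb):
        return self.net(torch.cat([policy_emb, env_emb], dim=-1))

def select_policy(value_net, env_encoder, policy_embs, transitions):
    """Infer the dynamics embedding from a few test-time transitions, then
    return the index of the policy embedding that maximizes the learned
    value function. No further environment interaction is needed here."""
    with torch.no_grad():
        env_emb = env_encoder(transitions)                    # [env_emb_dim]
        env_emb = env_emb.expand(policy_embs.size(0), -1)     # broadcast per policy
        values = value_net(policy_embs, env_emb).squeeze(-1)  # [num_policies]
    return torch.argmax(values).item(), values

# Toy usage with random tensors standing in for learned embeddings and data.
if __name__ == "__main__":
    enc = EnvEncoder(transition_dim=12, env_emb_dim=8)
    vf = PDValueNet(policy_emb_dim=8, env_emb_dim=8)
    policy_embs = torch.randn(5, 8)         # embeddings of the trained policy ensemble
    test_transitions = torch.randn(10, 12)  # a few steps collected in the new environment
    idx, values = select_policy(vf, enc, policy_embs, test_transitions)
    print("selected policy:", idx)
```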


Related research

07/04/2022 · General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
Learning to evaluate and improve policies is a core problem of Reinforce...

04/06/2022 · PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
Deep Reinforcement Learning (DRL) has been a promising solution to many ...

07/26/2019 · Environment Probing Interaction Policies
A key challenge in reinforcement learning (RL) is environment generaliza...

03/15/2019 · Adaptive Variance for Changing Sparse-Reward Environments
Robots that are trained to perform a task in a fixed environment often f...

09/28/2021 · Not Only Domain Randomization: Universal Policy with Embedding System Identification
Domain randomization (DR) cannot provide optimal policies for adapting t...

02/02/2023 · Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Probabilistic dynamics model ensemble is widely used in existing model-b...

10/18/2019 · RTFM: Generalising to Novel Environment Dynamics via Reading
Obtaining policies that can generalise to new environments in reinforcem...
