Robust Reinforcement Learning for Continuous Control with Model Misspecification

06/18/2019
by Daniel J. Mankowitz, et al.

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst-case, entropy-regularized, expected return objective, and we derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm and a comparison of this approach with domain randomization. Performance videos can be found online at https://sites.google.com/view/robust-rl.
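To make the worst-case and soft-robust backups described in the abstract concrete, the following is a minimal tabular sketch, not the paper's MPO-based implementation. It assumes a small finite MDP, a finite uncertainty set of candidate transition models, a worst case taken independently per state-action pair, a uniform reference policy for the entropy (KL) regularizer, and uniform weights over models for the soft-robust average. All names (`soft_backup`, `robust_soft_value_iteration`, `REWARDS`, `MODELS`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def soft_backup(q_values, temperature):
    """Entropy-regularized state value, assuming a uniform reference policy.

    Computes temperature * log(mean(exp(q / temperature))) in a numerically
    stable way by subtracting the maximum before exponentiating.
    """
    m = max(q_values)
    mean_exp = sum(math.exp((q - m) / temperature) for q in q_values) / len(q_values)
    return m + temperature * math.log(mean_exp)


def robust_soft_value_iteration(rewards, models, gamma=0.9, temperature=0.1,
                                mode="robust", iters=200):
    """Repeatedly apply a robust or soft-robust entropy-regularized backup.

    rewards[s][a]        : immediate reward
    models[k][s][a][s2]  : probability of s2 under candidate model k
    mode="robust"        : minimum over models of the expected next value
    mode="soft-robust"   : uniform average over models (an assumption here)
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    values = [0.0] * n_states
    for _ in range(iters):
        new_values = []
        for s in range(n_states):
            q_values = []
            for a in range(n_actions):
                # Expected next-state value under each candidate model.
                expected = [
                    sum(p * values[s2] for s2, p in enumerate(model[s][a]))
                    for model in models
                ]
                if mode == "robust":
                    next_value = min(expected)
                elif mode == "soft-robust":
                    next_value = sum(expected) / len(expected)
                else:
                    raise ValueError(f"unknown mode: {mode}")
                q_values.append(rewards[s][a] + gamma * next_value)
            new_values.append(soft_backup(q_values, temperature))
        values = new_values
    return values


# Hypothetical two-state, two-action problem with a nominal and a perturbed model.
REWARDS = [[1.0, 0.0], [0.0, 0.5]]
NOMINAL = [[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]]
PERTURBED = [[[0.6, 0.4], [0.4, 0.6]], [[0.5, 0.5], [0.5, 0.5]]]
MODELS = [NOMINAL, PERTURBED]

if __name__ == "__main__":
    print("robust     :", robust_soft_value_iteration(REWARDS, MODELS, mode="robust"))
    print("soft-robust:", robust_soft_value_iteration(REWARDS, MODELS, mode="soft-robust"))
```

Because the minimum over models never exceeds their average, the robust values in this sketch are never larger than the soft-robust values, which mirrors the abstract's description of the soft-robust objective as the less conservative of the two.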

research
09/07/2019

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Maximum entropy deep reinforcement learning (RL) methods have been demon...
research
02/23/2019

Distributionally Robust Reinforcement Learning

Generalization to unknown/uncertain environments of reinforcement learni...
research
10/21/2022

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

One key challenge for multi-task Reinforcement learning (RL) in practice...
research
06/04/2021

Robustifying Reinforcement Learning Policies with ℒ_1 Adaptive Control

A reinforcement learning (RL) policy trained in a nominal environment co...
research
01/26/2023

Policy Optimization with Robustness Certificates

We present a policy optimization framework in which the learned policy c...
research
03/23/2022

Your Policy Regularizer is Secretly an Adversary

Policy regularization methods such as maximum entropy regularization are...
research
11/01/2019

Generalized Speedy Q-learning

In this paper, we derive a generalization of the Speedy Q-learning (SQL)...
