Differentiable Trust Region Layers for Deep Reinforcement Learning

01/22/2021
by Fabian Otto, et al.

Trust region methods are a popular tool in reinforcement learning because they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Because of these approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, these layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that these projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices.
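
To make the idea of a per-state trust region concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It only assumes a Gaussian policy with diagonal covariance, and the function names `kl_diag_gaussians` and `project_mean` and the bound `eps` are hypothetical. The first function evaluates the standard closed-form KL divergence between two diagonal Gaussians for a single state. The second rescales a mean update so that its squared Euclidean distance to the old mean stays within `eps`. That is a simplified stand-in: it ignores the covariance and does not reproduce the closed-form projections the abstract refers to.

```python
import math


def kl_diag_gaussians(mu_p, std_p, mu_q, std_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        # Per-dimension term of the Gaussian KL divergence.
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def project_mean(mu_new, mu_old, eps):
    """Illustrative assumption: keep ||mu - mu_old||^2 <= eps for one state.

    If the proposed mean already satisfies the bound it is returned unchanged;
    otherwise the update direction is kept and its length is shrunk so the
    squared distance equals eps.
    """
    dist = sum((a - b) ** 2 for a, b in zip(mu_new, mu_old))
    if dist <= eps:
        return list(mu_new)
    scale = math.sqrt(eps / dist)
    return [b + scale * (a - b) for a, b in zip(mu_new, mu_old)]


if __name__ == "__main__":
    old_mean, new_mean = [0.0, 0.0], [3.0, 4.0]
    projected = project_mean(new_mean, old_mean, eps=1.0)
    print("projected mean:", projected)
    print("KL(new || old) after projection:",
          kl_diag_gaussians(projected, [1.0, 1.0], old_mean, [1.0, 1.0]))
```

Because the bound is checked for each state separately, an update that is small on average can still be scaled back in the individual states where it would move the policy too far.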

research
06/25/2023

Provably Convergent Policy Optimization via Metric-aware Trust Region Methods

Trust-region methods based on Kullback-Leibler divergence are pervasivel...
research
03/19/2019

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep re...
research
04/11/2022

Computing a Sparse Projection into a Box

We describe a procedure to compute a projection of w ∈ℝ^n into the inter...
research
10/24/2022

Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

Policies produced by deep reinforcement learning are typically character...
research
09/23/2020

Revisiting Design Choices in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular deep policy gradient alg...
research
01/18/2019

On-Policy Trust Region Policy Optimisation with Replay Buffers

Building upon the recent success of deep reinforcement learning methods,...
research
05/20/2020

Mirror Descent Policy Optimization

We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...
