Quantile Regression Deep Reinforcement Learning

06/27/2019
by Oliver Richter, et al.

Policy-gradient-based reinforcement learning algorithms coupled with neural networks have shown success in learning complex policies in the model-free, continuous-action-space control setting. However, explicitly parameterized policies are limited by the scope of the chosen parametric probability distribution. We show that, as an alternative to the likelihood-based policy gradient, a related objective can be optimized through advantage-weighted quantile regression. Our approach models the policy implicitly in the network, which gives the agent the freedom to approximate any distribution in each action dimension rather than restricting it to the commonly used unimodal Gaussian parameterization. This broader spectrum of policies makes our algorithm suitable for problems where Gaussian policies cannot fit the optimal policy. Moreover, our results on the MuJoCo physics simulator benchmarks are comparable or superior to those of state-of-the-art on-policy methods.
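To make the phrase "advantage-weighted quantile regression" concrete, here is a minimal sketch of one way such a loss could look. It is an illustration built only from the abstract, not the authors' implementation. The function names, the array shapes, and the choice to multiply the standard pinball loss directly by the advantage are all assumptions. How negative advantages should be handled is not stated in the abstract, so the sketch leaves that to the caller.

```python
import numpy as np


def pinball_loss(pred, target, tau):
    """Standard quantile (pinball) loss.

    For a fixed tau in (0, 1), the value of `pred` that minimizes the
    expected loss is the tau-quantile of the distribution of `target`.
    """
    diff = target - pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)


def advantage_weighted_quantile_loss(pred_actions, taken_actions, taus, advantages):
    """Hypothetical advantage-weighted quantile regression loss.

    pred_actions:  quantile-function outputs for the sampled taus (assumed
                   to come from a network that takes the state and tau)
    taken_actions: actions actually executed in the environment
    taus:          quantile levels in (0, 1) used for each prediction
    advantages:    advantage estimates used as per-sample weights
                   (assumption: passed in as-is, without clipping)
    """
    per_sample = pinball_loss(pred_actions, taken_actions, taus)
    return float(np.mean(advantages * per_sample))


# Toy usage with made-up numbers (no environment or network involved).
pred = np.array([0.0, 1.0])
taken = np.array([1.0, 0.0])
taus = np.array([0.9, 0.9])
adv = np.array([2.0, 0.0])
loss = advantage_weighted_quantile_loss(pred, taken, taus, adv)
```

Under this reading, a sample with a large advantage pulls the predicted quantiles toward the action that was taken, while the quantile levels themselves leave the shape of the action distribution unconstrained, which is what allows multimodal policies in each action dimension.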

research
05/12/2023

Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
01/27/2022

Quantile-Based Policy Optimization for Reinforcement Learning

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
02/08/2022

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

We propose a new policy parameterization for representing 3D rotations d...
research
05/31/2019

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Standard reinforcement learning methods aim to master one way of solving...
research
08/02/2022

Implicit Two-Tower Policies

We present a new class of structured reinforcement learning policy-archi...
research
02/22/2022

Reward-Free Policy Space Compression for Reinforcement Learning

In reinforcement learning, we encode the potential behaviors of an agent...
research
04/28/2019

Learning walk and trot from the same objective using different types of exploration

In quadruped gait learning, policy search methods that scale high dimens...
