Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution

11/03/2021
by   Irving G. B. Petrazzini, et al.

Reinforcement learning methods for continuous control tasks have evolved in recent years, generating a family of policy gradient methods that rely primarily on a Gaussian distribution for modeling a stochastic policy. However, the Gaussian distribution has infinite support, whereas real-world applications usually have a bounded action space. This mismatch causes an estimation bias that can be eliminated if the Beta distribution is used for the policy instead, as it has finite support. In this work, we investigate how this Beta policy performs when it is trained by the Proximal Policy Optimization (PPO) algorithm on two continuous control tasks from OpenAI Gym. For both tasks, the Beta policy is superior to the Gaussian policy in terms of the agent's final expected reward, and it also shows more stable training and faster convergence. For the CarRacing environment with high-dimensional image input, the agent's success rate was improved by 63
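To make the bounded-support argument concrete, the following is a minimal sketch of a Beta policy head for a single action dimension, not the authors' implementation. The function names, the `1 + softplus` parameterization (which keeps both shape parameters above 1 so the density is unimodal), and the clamping constant are assumptions made for illustration. The last helper computes how much probability mass a Gaussian policy places outside the action bounds, which is the source of the bias the abstract describes.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha, raw_beta):
    # Assumption: map unconstrained network outputs to shape
    # parameters strictly greater than 1 (unimodal density).
    return 1.0 + softplus(raw_alpha), 1.0 + softplus(raw_beta)


def beta_log_prob(x, alpha, beta):
    # Log density of Beta(alpha, beta) at x in (0, 1).
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x) - log_norm


def sample_action(alpha, beta, low, high, rng):
    # Sample x in (0, 1), then rescale affinely to [low, high].
    x = rng.betavariate(alpha, beta)
    x = min(max(x, 1e-6), 1.0 - 1e-6)  # guard the logs against 0 and 1
    action = low + (high - low) * x
    # Change of variables: the affine rescaling divides the density
    # by (high - low), so its log is subtracted.
    log_prob = beta_log_prob(x, alpha, beta) - math.log(high - low)
    return action, log_prob


def gaussian_out_of_bounds_mass(mean, std, low, high):
    # Probability mass a Gaussian policy assigns outside [low, high].
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))

    return cdf(low) + (1.0 - cdf(high))
```

In a PPO update, the returned `log_prob` would feed the probability ratio between the new and old policies in the same way a Gaussian log-probability does; only the distribution changes. Every sampled action lies inside `[low, high]` by construction, so no clipping is needed before it is sent to the environment.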


