Robust Policy Optimization in Deep Reinforcement Learning

12/14/2022
by   Md Masudur Rahman, et al.
0

The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, parameterized distribution of action distribution allows easy control of exploration, resulting from the variance of the representing distribution. Entropy can play an essential role in policy optimization by selecting the stochastic policy, which eventually helps better explore the environment in reinforcement learning (RL). However, the stochasticity often reduces as the training progresses; thus, the policy becomes less exploratory. Additionally, certain parametric distributions might only work for some environments and require extensive hyperparameter tuning. This paper aims to mitigate these issues. In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution. We hypothesize that our method encourages high-entropy actions and provides a way to represent the action space better. We further provide empirical evidence to verify our hypothesis. We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym. We observed that in many settings, RPO increases the policy entropy early in training and then maintains a certain level of entropy throughout the training period. Eventually, our agent RPO shows consistently improved performance compared to PPO and other techniques: entropy regularization, different distributions, and data augmentation. Furthermore, in several settings, our method stays robust in performance, while other baseline mechanisms fail to improve and even worsen the performance.

READ FULL TEXT

page 11

page 15

page 16

page 17

research
12/02/2019

On-policy Reinforcement Learning with Entropy Regularization

Entropy regularization is an imported idea in reinforcement learning, wi...
research
10/20/2020

Iterative Amortized Policy Optimization

Policy networks are a central feature of deep reinforcement learning (RL...
research
12/11/2019

Marginalized State Distribution Entropy Regularization in Policy Optimization

Entropy regularization is used to get improved optimization performance ...
research
10/13/2022

Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning

This paper proposes an advantage estimation approach based on data augme...
research
09/20/2022

A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

In various control task domains, existing controllers provide a baseline...
research
11/03/2021

Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution

Reinforcement learning methods for continuous control tasks have evolved...
research
08/19/2022

Entropy Augmented Reinforcement Learning

Deep reinforcement learning has gained a lot of success with the presenc...

Please sign up or login with your details

Forgot password? Click here to reset