Proximal Policy Optimization via Enhanced Exploration Efficiency

11/11/2020
by Junwei Zhang, et al.

The proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. However, its performance still depends on its exploration ability. In classical reinforcement learning, some schemes make exploration more thorough and better balanced against data exploitation, but their algorithmic complexity prevents them from being applied in complex environments. Focusing on continuous control tasks with dense rewards, this paper analyzes the assumption underlying the original Gaussian action exploration mechanism in PPO and clarifies how exploration ability affects performance. To address the exploration problem, the paper then designs an exploration enhancement mechanism based on uncertainty estimation. We apply this exploration enhancement theory to PPO and propose the proximal policy optimization algorithm with an intrinsic exploration module (IEM-PPO), which can be used in complex environments. In the experiments, we evaluate our method on multiple tasks in the MuJoCo physics simulator and compare IEM-PPO with a curiosity-driven exploration algorithm (ICM-PPO) and the original PPO algorithm. The experimental results demonstrate that IEM-PPO requires longer training time but achieves better sample efficiency and cumulative reward, and that it is stable and robust.
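
The abstract refers to two ingredients: the Gaussian action exploration that PPO uses in continuous control, and an exploration bonus derived from uncertainty estimation. The Python sketch below illustrates both under stated assumptions. The sampling and log-density functions follow the common diagonal-Gaussian policy parameterization (a mean plus a learned log standard deviation per action dimension). The uncertainty term is a hypothetical stand-in, not the paper's intrinsic exploration module: it takes the variance across an ensemble of next-state predictions as the uncertainty estimate, and the names `ensemble_uncertainty`, `shaped_reward`, and the weight `beta` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def sample_gaussian_action(mean: Sequence[float], log_std: Sequence[float],
                           rng: random.Random) -> List[float]:
    """Sample a ~ N(mean, diag(exp(log_std))^2), the usual PPO exploration rule."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]


def gaussian_log_prob(action: Sequence[float], mean: Sequence[float],
                      log_std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian; PPO uses it in its probability ratio."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def ensemble_uncertainty(predictions: Sequence[Sequence[float]]) -> float:
    """Hypothetical uncertainty estimate (an assumption, not the paper's IEM):
    mean per-dimension variance across an ensemble of next-state predictions."""
    k = len(predictions)
    dims = len(predictions[0])
    total_var = 0.0
    for d in range(dims):
        values = [p[d] for p in predictions]
        mu = sum(values) / k
        total_var += sum((v - mu) ** 2 for v in values) / k
    return total_var / dims


def shaped_reward(extrinsic: float, uncertainty: float, beta: float = 0.01) -> float:
    """Extrinsic reward plus a scaled exploration bonus; beta is an assumed weight."""
    return extrinsic + beta * uncertainty


if __name__ == "__main__":
    rng = random.Random(0)
    mean, log_std = [0.0, 0.5], [-1.0, -1.0]
    action = sample_gaussian_action(mean, log_std, rng)
    print("action:", action)
    print("log-prob:", gaussian_log_prob(action, mean, log_std))
    bonus = ensemble_uncertainty([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])
    print("shaped reward:", shaped_reward(1.0, bonus, beta=0.1))
```

With a very small `log_std`, sampled actions collapse onto the mean, which shows how Gaussian exploration depends entirely on the learned variance. Adding the bonus to the reward, rather than changing the action distribution, leaves PPO's update rule untouched; keeping `beta` small keeps the dense extrinsic reward dominant.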
