Joint action loss for proximal policy optimization

01/26/2023
by Xiulei Song, et al.

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient algorithm that has been successfully applied to complex computer games such as Dota 2 and Honor of Kings. In these environments, an agent makes compound actions consisting of multiple sub-actions. PPO uses clipping to restrict policy updates; although clipping is simple and effective, it does not use samples efficiently. For compound actions, most PPO implementations consider the joint probability (density) of the sub-actions, which means that if the ratio of a sample (a state and compound-action pair) falls outside the clip range, the sample produces zero gradient. Instead, we calculate the loss for each sub-action separately, which is less prone to clipping during updates and thereby makes better use of samples. Further, we propose a multi-action mixed loss that combines the joint and separate probabilities. We perform experiments in Gym-μRTS and MuJoCo. Our hybrid model improves performance by more than 50% in different MuJoCo environments compared to OpenAI's PPO benchmark results, and in Gym-μRTS the sub-action loss outperforms the standard PPO approach, especially when the clip range is large. Our findings suggest that this method can better balance sample efficiency against sample quality.
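
As a rough illustration (not the paper's code), the sketch below contrasts in PyTorch the two surrogate losses the abstract describes. It assumes `logps_new` and `logps_old` hold per-sub-action log-probabilities of shape `(batch, n_sub)`; the `alpha`-weighted blend in `mixed_loss` is a hypothetical stand-in for the proposed multi-action mixed loss, whose exact form is given in the full paper.

```python
import torch

def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    # Standard PPO clipped surrogate: when the ratio leaves
    # [1 - eps, 1 + eps] and the clipped term is the minimum,
    # the sample contributes zero gradient.
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

def joint_loss(logps_new, logps_old, advantage, clip_eps=0.2):
    # Joint probability of a compound action = product of sub-action
    # probabilities, i.e. the sum of their log-probabilities. A single
    # out-of-range joint ratio zeroes the gradient for the whole sample.
    joint_ratio = torch.exp((logps_new - logps_old).sum(dim=-1))
    return ppo_clip_loss(joint_ratio, advantage, clip_eps)

def sub_action_loss(logps_new, logps_old, advantage, clip_eps=0.2):
    # One ratio per sub-action: clipping one sub-action leaves the
    # gradients from the other sub-actions of the same sample intact.
    ratios = torch.exp(logps_new - logps_old)   # (batch, n_sub)
    adv = advantage.unsqueeze(-1)               # broadcast over sub-actions
    clipped = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratios * adv, clipped * adv).mean()

def mixed_loss(logps_new, logps_old, advantage, clip_eps=0.2, alpha=0.5):
    # Hypothetical blend of the joint and per-sub-action losses.
    return (alpha * joint_loss(logps_new, logps_old, advantage, clip_eps)
            + (1 - alpha) * sub_action_loss(logps_new, logps_old,
                                            advantage, clip_eps))
```

Because the per-sub-action ratios are clipped independently, a sample keeps contributing gradient through its unclipped sub-actions, which is the source of the improved sample use the abstract claims.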
