Phasic Policy Gradient

09/09/2020
by Karl Cobbe, et al.

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework that modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. Prior methods must choose between a shared network and separate networks to represent the policy and value function. Separate networks avoid interference between the two objectives, while a shared network allows useful features to be shared. PPG achieves the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be optimized more aggressively, with a higher level of sample reuse. We find that PPG significantly improves sample efficiency over PPO on the challenging Procgen Benchmark.
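The two-phase structure described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function name `run_phasic_training`, its parameters, and the callbacks `policy_step` and `aux_step` are hypothetical placeholders, and the iteration counts in the usage example are arbitrary. It only shows how a phase that advances training might alternate with a phase that replays the stored data several times.

```python
from typing import Callable, List


def run_phasic_training(
    num_phases: int,
    policy_iters_per_phase: int,
    aux_epochs: int,
    policy_step: Callable[[], object],
    aux_step: Callable[[List[object]], None],
) -> None:
    """Alternate a policy phase and an auxiliary phase (illustrative only)."""
    for _ in range(num_phases):
        buffer: List[object] = []
        # Phase 1 (assumed form): ordinary on-policy updates that advance
        # training; each rollout is kept for later reuse.
        for _ in range(policy_iters_per_phase):
            buffer.append(policy_step())
        # Phase 2 (assumed form): replay the stored rollouts several times,
        # standing in for feature distillation and the extra value-function
        # optimization with higher sample reuse.
        for _ in range(aux_epochs):
            aux_step(buffer)


# Usage: record the order of calls with stand-in callbacks.
log: List[str] = []
run_phasic_training(
    num_phases=2,
    policy_iters_per_phase=3,
    aux_epochs=2,
    policy_step=lambda: log.append("policy") or "rollout",
    aux_step=lambda buf: log.append(f"aux({len(buf)})"),
)
print(log)
```

In a real agent the callbacks would wrap the actual rollout collection and gradient updates; here they only log the call order, so the alternation of phases is visible without any learning code.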

research
02/02/2023

Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning

This paper investigates the use of prior computation to estimate the val...
research
06/24/2021

Mix and Mask Actor-Critic Methods

Shared feature spaces for actor-critic methods aims to capture generaliz...
research
02/20/2021

Decoupling Value and Policy for Generalization in Reinforcement Learning

Standard deep reinforcement learning algorithms use a shared representat...
research
10/18/2022

Rethinking Value Function Learning for Generalization in Reinforcement Learning

We focus on the problem of training RL agents on multiple training envir...
research
02/01/2023

Distillation Policy Optimization

On-policy algorithms are supposed to be stable, however, sample-intensiv...
research
06/20/2022

DNA: Proximal Policy Optimization with a Dual Network Architecture

This paper explores the problem of simultaneously learning a value funct...
research
07/05/2021

Hybrid and dynamic policy gradient optimization for bipedal robot locomotion

Controlling a non-statically bipedal robot is challenging due to the com...
