DNA: Proximal Policy Optimization with a Dual Network Architecture

06/20/2022
by Mathew Aitchison, et al.

This paper explores the problem of simultaneously learning a value function and a policy in deep actor-critic reinforcement learning models. We find that the common practice of learning these functions jointly is sub-optimal, due to an order-of-magnitude difference in noise levels between the two tasks. Instead, we show that learning these tasks independently, but with a constrained distillation phase, significantly improves performance. Furthermore, we find that the policy gradient noise level can be decreased by using a lower-variance return estimate, whereas the value learning noise level decreases with a lower-bias estimate. Together, these insights inform an extension to Proximal Policy Optimization that we call Dual Network Architecture (DNA), which significantly outperforms its predecessor. DNA also exceeds the performance of the popular Rainbow DQN algorithm on four of the five environments tested, even under more difficult stochastic control settings.
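The variance/bias trade-off between the two return estimates can be illustrated with standard TD(lambda) returns, where a smaller lambda leans on the bootstrapped value estimate (lower variance, more bias) and a larger lambda leans on observed rewards (lower bias, more variance). The following is a minimal sketch, not the authors' implementation: the function name `lambda_returns`, the variable names `policy_lam` and `value_lam`, and all numeric values are assumptions chosen for illustration, and episode terminations inside the rollout are ignored for brevity.

```python
def lambda_returns(rewards, values, bootstrap_value, gamma, lam):
    """Compute TD(lambda) returns with the backward recursion
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).

    rewards[t] and values[t] refer to step t of a single rollout;
    bootstrap_value is the value estimate for the state after the last step.
    Assumes no episode boundary occurs inside the rollout.
    """
    returns = [0.0] * len(rewards)
    next_return = bootstrap_value  # G_{t+1}, initialised at the rollout end
    next_value = bootstrap_value   # V(s_{t+1})
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * next_value + lam * next_return
        )
        returns[t] = next_return
        next_value = values[t]
    return returns


# Illustrative rollout (made-up numbers).
rewards = [1.0, 0.0, 2.0, 1.0]
values = [2.5, 2.0, 2.2, 0.9]  # hypothetical value-network outputs
bootstrap_value = 0.0
gamma = 0.99

# Hypothetical settings: a smaller lambda for the policy's advantage
# estimate (lower variance) and a larger one for the value target
# (lower bias).
policy_lam = 0.8
value_lam = 0.95

policy_returns = lambda_returns(rewards, values, bootstrap_value, gamma, policy_lam)
value_targets = lambda_returns(rewards, values, bootstrap_value, gamma, value_lam)

# Advantages for the policy update; value_targets would be the regression
# targets for a separately trained value network.
advantages = [g - v for g, v in zip(policy_returns, values)]
```

Setting `lam=0.0` reduces the recursion to one-step TD targets, and `lam=1.0` reduces it to the discounted sum of rewards plus the final bootstrap, which are the two ends of the trade-off described above.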

research
12/11/2019

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants...
research
09/09/2020

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...
research
05/17/2021

Controlling an Inverted Pendulum with Policy Gradient Methods-A Tutorial

This paper provides the details of implementing two important policy gra...
research
04/05/2019

Multi-Preference Actor Critic

Policy gradient algorithms typically combine discounted future rewards w...
research
10/18/2022

Rethinking Value Function Learning for Generalization in Reinforcement Learning

We focus on the problem of training RL agents on multiple training envir...
research
05/29/2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Multi-step learning applies lookahead over multiple time steps and has p...
