VA-learning as a more efficient alternative to Q-learning

05/29/2023
by Yunhao Tang, et al.

In reinforcement learning, the advantage function is critical for policy improvement, but it is often extracted from a learned Q-function. A natural question is: why not learn the advantage function directly? In this work, we introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping, without explicit reference to Q-functions. VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning. Thanks to the direct learning of the advantage function and value function, VA-learning improves sample efficiency over Q-learning both in tabular implementations and in deep RL agents on Atari-57 games. We also identify a close connection between VA-learning and the dueling architecture, which partially explains why a simple architectural change to DQN agents tends to improve performance.
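For context, the conventional route that the abstract contrasts with can be sketched in a few lines. This is not the VA-learning update, which the abstract does not spell out. It only illustrates the standard identities A(s, a) = Q(s, a) - V(s), with V(s) = sum_a pi(a|s) Q(s, a), and the mean-subtracted recombination commonly used in dueling architectures. The function names, the example Q-values, and the policy are hypothetical and chosen purely for illustration.

```python
from typing import List


def advantages_from_q(q_values: List[float], policy: List[float]) -> List[float]:
    """Extract A(s, a) = Q(s, a) - V(s) for one state.

    V(s) is taken as the policy-weighted average of Q(s, .),
    i.e. V(s) = sum_a pi(a|s) * Q(s, a).
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    value = sum(p * q for p, q in zip(policy, q_values))
    return [q - value for q in q_values]


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling-style recombination: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# Hypothetical Q(s, .) for a single state with three actions.
q_row = [1.0, 3.0, 2.0]
# Hypothetical policy pi(.|s) over the same three actions.
pi = [0.25, 0.5, 0.25]

adv = advantages_from_q(q_row, pi)
# Advantages average to zero under the policy that defines V(s).
weighted_adv = sum(p * a for p, a in zip(pi, adv))

# Recombine a separate value and advantage estimate into Q-values.
q_rebuilt = dueling_q(2.0, [-1.0, 1.0, 0.0])
```

In this sketch the advantage is a by-product of the Q-values. The abstract's proposal is to learn the value and advantage functions directly instead of deriving them this way.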


