On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

06/14/2021
by   Yiming Zhang, et al.

We develop theory and algorithms for average-reward on-policy Reinforcement Learning (RL). We first consider bounding the difference of the long-term average reward for two policies. We show that previous work based on the discounted return (Schulman et al., 2015; Achiam et al., 2017) results in a non-meaningful bound in the average-reward setting. By addressing the average-reward criterion directly, we then derive a novel bound which depends on the average divergence between the two policies and Kemeny's constant. Based on this bound, we develop an iterative procedure which produces a sequence of monotonically improved policies for the average-reward criterion. This iterative procedure can then be combined with classic Deep Reinforcement Learning (DRL) methods, resulting in practical DRL algorithms that target the long-run average-reward criterion. In particular, we demonstrate that Average-Reward TRPO (ATRPO), which adapts the on-policy TRPO algorithm to the average-reward criterion, significantly outperforms TRPO in the most challenging MuJoCo environments.
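To make the two quantities named in the abstract concrete, the sketch below computes the long-run average reward and Kemeny's constant for a small Markov chain. It is an illustration only, not the authors' algorithm. It assumes a fixed policy has already been folded into a transition matrix `P` and a per-state expected reward vector `r`, and that the chain is ergodic. The function names and the toy numbers are hypothetical. Kemeny's constant is computed here under the convention that the mean first passage time from a state to itself is zero; under the other common convention the value is larger by one.

```python
import numpy as np


def stationary_distribution(P):
    """Solve d^T P = d^T with sum(d) = 1 for an ergodic chain (least squares)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def average_reward(P, r):
    """Long-run average reward: stationary distribution dotted with rewards."""
    return float(stationary_distribution(P) @ r)


def kemeny_constant(P):
    """Kemeny's constant via the fundamental matrix Z = (I - P + 1 d^T)^{-1}.

    Assumed convention: first passage time from a state to itself is 0,
    which gives K = trace(Z) - 1.
    """
    n = P.shape[0]
    d = stationary_distribution(P)
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), d))
    return float(np.trace(Z) - 1.0)


# Hypothetical two-state chain induced by some fixed policy.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
r = np.array([1.0, 0.0])  # expected reward in each state

rho = average_reward(P, r)
kappa = kemeny_constant(P)
```

For this two-state chain the stationary distribution is (0.75, 0.25), so the average reward is 0.75, and Kemeny's constant equals 1 / (0.1 + 0.3) = 2.5. A slowly mixing chain has a larger Kemeny's constant, which is the kind of dependence the abstract's bound refers to.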


