Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

02/02/2023
by Yashaswini Murthy, et al.

Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., policy iteration in which both policy improvement and policy evaluation are performed approximately. In applications where the average reward objective is the meaningful performance metric, discounted reward formulations are often used with the discount factor close to 1, which is equivalent to making the expected horizon very large. However, the corresponding theoretical error bounds scale with the square of the horizon. Thus, even after dividing the total reward by the length of the horizon, the corresponding performance bounds for average reward problems go to infinity. Therefore, an open problem has been to obtain meaningful performance bounds for approximate PI and RL algorithms in the average-reward setting. In this paper, we solve this open problem by obtaining the first non-trivial error bounds for average-reward MDPs that go to zero in the limit as the policy evaluation and policy improvement errors go to zero.
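
To illustrate the horizon-squared scaling mentioned above, the following sketch uses the classical discounted-reward bound for approximate PI (as in Bertsekas and Tsitsiklis, Neuro-Dynamic Programming); it is not taken from this paper. The symbols are assumptions introduced here: \gamma is the discount factor, \epsilon and \delta are sup-norm bounds on the policy evaluation and policy improvement errors, H = 1/(1-\gamma) is the expected horizon, J^{\pi_k} is the value of the k-th policy, and J^* is the optimal value.

```latex
% Classical discounted-reward bound for approximate policy iteration
% (assumed here for illustration; not the result of this paper):
\limsup_{k \to \infty} \left\| J^{\pi_k} - J^* \right\|_\infty
  \;\le\; \frac{\delta + 2\gamma\epsilon}{(1-\gamma)^2}
  \;=\; H^2 \, (\delta + 2\gamma\epsilon),
  \qquad H = \frac{1}{1-\gamma}.

% Dividing by the horizon H to obtain a per-step (average-reward-like) quantity:
\frac{1}{H} \cdot H^2 \, (\delta + 2\gamma\epsilon)
  \;=\; H \, (\delta + 2\gamma\epsilon)
  \;\longrightarrow\; \infty
  \quad \text{as } \gamma \to 1, \text{ for fixed } \epsilon, \delta > 0.
```

Under these assumptions, the normalized bound still grows linearly in H, which is why taking the discount factor to 1 does not by itself give a useful average-reward guarantee.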


research
05/22/2023

Offline Primal-Dual Reinforcement Learning for Linear MDPs

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy ...
research
10/17/2020

Approximate information state for approximate planning and reinforcement learning in partially observed systems

We propose a theoretical framework for approximate planning and learning...
research
06/14/2021

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

We develop theory and algorithms for average-reward on-policy Reinforcem...
research
02/02/2023

Average-Constrained Policy Optimization

Reinforcement Learning (RL) with constraints is becoming an increasingly...
research
04/19/2023

Bridging RL Theory and Practice with the Effective Horizon

Deep reinforcement learning (RL) works impressively in some environments...
research
01/09/2023

Minimax Weight Learning for Absorbing MDPs

Reinforcement learning policy evaluation problems are often modeled as f...
research
03/17/2023

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Many model-based reinforcement learning (RL) algorithms can be viewed as...
