Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

09/05/2023 ∙ by Qinbo Bai, et al.
In this paper, we consider an infinite-horizon average-reward Markov Decision Process (MDP). Unlike existing works in this setting, our approach uses a general policy gradient-based algorithm and does not assume a linear MDP structure. We propose a policy gradient-based algorithm and establish its global convergence. We then prove that the proposed algorithm achieves Õ(T^{3/4}) regret. To the best of our knowledge, this is the first regret bound for a general parameterized policy gradient algorithm in the average-reward setting.
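To illustrate the kind of method the abstract describes, the sketch below shows a generic softmax-parameterized policy gradient update for an average-reward MDP. This is a minimal illustrative sketch only, not the algorithm analyzed in the paper: the toy 2-state MDP (`P`, `R`), the step sizes, and the use of `r - rho` as an advantage proxy are all assumptions made here for demonstration.

```python
import numpy as np

# Illustrative sketch of average-reward policy gradient (NOT the paper's algorithm).
# The 2-state, 2-action MDP below is a made-up toy example.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])  # P[s, a] = next-state distribution
R = np.array([[1.0, 0.0], [0.0, 2.0]])      # R[s, a] = expected reward

theta = np.zeros((n_states, n_actions))     # softmax policy parameters
rho = 0.0                                   # running average-reward estimate

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
alpha, beta = 0.05, 0.01                    # step sizes (assumed, untuned)
for t in range(20000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    r = R[s, a]
    s_next = rng.choice(n_states, p=P[s, a])
    # Score-function gradient of log pi(a|s) for the softmax parameterization.
    grad_log = -pi
    grad_log[a] += 1.0
    # Actor update: use (r - rho) as a crude differential-reward signal.
    theta[s] += alpha * (r - rho) * grad_log
    # Track the long-run average reward with a slower step size.
    rho += beta * (r - rho)
    s = s_next
```

After training, `rho` approximates the long-run average reward of the learned policy; the regret analysis in the paper bounds the cumulative gap between the optimal average reward and the rewards actually collected while learning.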

Related research:

∙ 07/23/2020 — Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. "We develop several new algorithms for learning Markov Decision Processes..."

∙ 02/14/2012 — Efficient Inference in Markov Control Problems. "Markov control algorithms that perform smooth, non-greedy updates of the..."

∙ 11/26/2018 — A Policy Gradient Method with Variance Reduction for Uplift Modeling. "Uplift modeling aims to directly model the incremental impact of a treat..."

∙ 06/03/2011 — Infinite-Horizon Policy-Gradient Estimation. "Gradient-based approaches to direct policy search in reinforcement learn..."

∙ 06/03/2011 — Experiments with Infinite-Horizon, Policy-Gradient Estimation. "In this paper, we present algorithms that perform gradient ascent of the..."

∙ 01/19/2022 — Critic Algorithms using Cooperative Networks. "An algorithm is proposed for policy evaluation in Markov Decision Proces..."

∙ 12/21/2021 — Learning in Random Utility Models Via Online Decision Problems. "This paper studies the Random Utility Model (RUM) in environments where ..."
