Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

01/28/2023
by Wesley A. Suttle, et al.

Many existing reinforcement learning (RL) methods are built on stochastic gradient iterations whose stability hinges on the hypothesis that the data-generating process mixes exponentially fast, with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated in large state spaces and in settings with sparse rewards. Moreover, the mixing time is generally unknown, so a step size that depends on it cannot be set in practice. In this work, we propose an RL methodology attuned to the mixing time: it employs a multi-level Monte Carlo estimator for the critic, the actor, and the average reward within an actor-critic (AC) algorithm. This method, which we call Multi-level Actor-Critic (MAC), is developed specifically for infinite-horizon average-reward settings. Its parameter selection neither relies on oracle knowledge of the mixing time nor assumes exponentially fast mixing, so it is readily applicable to problems with slower mixing. Nonetheless, it achieves a convergence rate comparable to that of state-of-the-art AC algorithms. We show experimentally that relaxing these technical conditions for stability translates into superior performance in practice on RL problems with sparse rewards.
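The abstract does not spell out the estimator, so the following is only a minimal sketch of a generic multi-level Monte Carlo (MLMC) averaging estimator of the kind used with Markovian data; the paper's exact construction, truncation rule, and how it feeds the critic, actor, and average-reward updates may differ. The sketch assumes a level J drawn with P(J = j) = 2^-j, a hypothetical truncation cap `t_max`, and per-step values (for example rewards) read from one trajectory. The functions `sample_level`, `mlmc_from_level`, `mlmc_estimate`, and the toy `make_sticky_chain` are illustrative names, not the authors' code. Averaged over J, the correction terms telescope, so for a fixed trajectory the expected estimate equals the mean of its first 2^floor(log2 t_max) values, while only about log2 t_max samples are drawn on average.

```python
import random
from typing import Callable, List, Sequence


def sample_level(rng: random.Random) -> int:
    """Draw J with P(J = j) = 2**-j for j = 1, 2, ... (geometric, p = 1/2)."""
    j = 1
    while rng.random() < 0.5:
        j += 1
    return j


def mlmc_from_level(samples: Sequence[float], j: int, t_max: int) -> float:
    """Combine one trajectory's samples at level j (illustrative sketch).

    Needs at least 2**j consecutive samples when 2**j <= t_max;
    otherwise only samples[0] is used.
    """
    g0 = samples[0]
    n = 2 ** j
    if n > t_max:
        return g0  # truncated level: fall back to the single-sample estimate
    g_fine = sum(samples[:n]) / n                 # mean of first 2^j samples
    g_coarse = sum(samples[: n // 2]) / (n // 2)  # mean of first 2^(j-1) samples
    return g0 + n * (g_fine - g_coarse)


def mlmc_estimate(
    rollout: Callable[[int], List[float]], t_max: int, rng: random.Random
) -> float:
    """One MLMC estimate; rollout(n) returns n consecutive per-step values."""
    j = sample_level(rng)
    n = 2 ** j if 2 ** j <= t_max else 1  # only g0 is needed past t_max
    return mlmc_from_level(rollout(n), j, t_max)


def make_sticky_chain(eps: float, rng: random.Random) -> Callable[[int], List[float]]:
    """Hypothetical slow-mixing two-state chain: reward = state, flip w.p. eps."""
    state = 0

    def rollout(n: int) -> List[float]:
        nonlocal state
        rewards = []
        for _ in range(n):
            rewards.append(float(state))
            if rng.random() < eps:
                state = 1 - state
        return rewards

    return rollout


if __name__ == "__main__":
    chain = make_sticky_chain(eps=0.01, rng=random.Random(1))
    level_rng = random.Random(2)
    estimates = [mlmc_estimate(chain, t_max=256, rng=level_rng) for _ in range(2000)]
    print(sum(estimates) / len(estimates))  # rough estimate of the average reward
```

The 2**j weight keeps the estimate unbiased for the long-horizon average but inflates its variance; the `t_max` cap bounds both the longest rollout and that weight. Note that no parameter in the sketch depends on the chain's mixing time.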

Related research

02/07/2019
The Actor-Advisor: Policy Gradient With Off-Policy Advice
Actor-critic algorithms learn an explicit policy (actor), and an accompa...

10/14/2022
Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
Providing densely shaped reward functions for RL algorithms is often exc...

02/11/2023
UGAE: A Novel Approach to Non-exponential Discounting
The discounting mechanism in Reinforcement Learning determines the relat...

02/28/2022
Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation
We study the convergence of the actor-critic algorithm with nonlinear fu...

06/11/2020
Distributed Reinforcement Learning in Multi-Agent Networked Systems
We study distributed reinforcement learning (RL) for a network of agents...

06/08/2022
Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning
For those seeking healthcare advice online, AI based dialogue agents cap...

01/18/2020
Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods
Actor critic methods with sparse rewards in model-based deep reinforceme...