Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

06/28/2023
by Zihan Zhang, et al.

We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves O(S^5 A^2 sp(h^*) √T) regret after T steps, where S × A is the size of the state-action space and sp(h^*) is the span of the optimal bias function. Our results are the first to achieve the optimal dependence on T for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an ϵ-optimal policy using O(SA sp^2(h^*)/ϵ^2 + S^2 A sp(h^*)/ϵ) samples, whereas the minimax lower bound is Ω(SA sp(h^*)/ϵ^2). Our results are based on two new techniques that are unique to the average-reward setting: 1) better discounted approximation by value-difference estimation; 2) efficient construction of a confidence region for the optimal bias function with space complexity O(SA).
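The bounds above are stated in terms of the optimal gain ρ^* (the best long-run average reward), the span sp(h^*) of the optimal bias function, and the regret after T steps. As a minimal sketch, assuming a small and fully known MDP, relative value iteration can compute ρ^* and sp(h^*) directly. This is a standard model-based planning routine used here only to make these quantities concrete; it is not the model-free algorithm proposed in the paper. The transition table `P`, the rewards `R`, and the helpers `bellman`, `relative_value_iteration`, `span`, and `regret` are all hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical 2-state MDP, for illustration only.
# P[s][a] lists (next_state, probability); R[s][a] is the reward for action a in state s.
P = [
    [[(0, 1.0)], [(0, 0.5), (1, 0.5)]],  # state 0: wait, or try to move to state 1
    [[(1, 1.0)]],                        # state 1: absorbing, pays reward 1 each step
]
R = [
    [0.0, 0.0],
    [1.0],
]


def bellman(h: List[float]) -> List[float]:
    """Average-reward Bellman operator: (T h)(s) = max_a [r(s,a) + sum_s' P(s'|s,a) h(s')]."""
    return [
        max(R[s][a] + sum(p * h[t] for t, p in P[s][a]) for a in range(len(P[s])))
        for s in range(len(P))
    ]


def relative_value_iteration(tol: float = 1e-10, max_iter: int = 10_000) -> Tuple[float, List[float]]:
    """Return (gain estimate, bias estimate normalized so that h[0] == 0)."""
    h = [0.0] * len(P)
    for _ in range(max_iter):
        th = bellman(h)
        diff = [new - old for new, old in zip(th, h)]
        h = [v - th[0] for v in th]  # renormalize against reference state 0
        if max(diff) - min(diff) < tol:
            break
    # The optimal gain lies between min(diff) and max(diff); take the midpoint.
    gain = (max(diff) + min(diff)) / 2
    return gain, h


def span(h: List[float]) -> float:
    """sp(h) = max_s h(s) - min_s h(s)."""
    return max(h) - min(h)


def regret(rewards: List[float], rho_star: float) -> float:
    """Usual definition of regret after T = len(rewards) steps: T * rho* minus collected reward."""
    return len(rewards) * rho_star - sum(rewards)


rho, h = relative_value_iteration()
print(f"gain ~ {rho:.6f}, span ~ {span(h):.6f}")
```

For this MDP the Bellman equation can be checked by hand: in state 0 the optimal equation 1 + h^*(0) = 0.5 h^*(0) + 0.5 h^*(1) gives h^*(1) - h^*(0) = 2, so ρ^* = 1 and sp(h^*) = 2, which is what the script prints (up to the tolerance). A larger span means the bias differs more across states, which is why the regret and sample-complexity bounds above scale with sp(h^*).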

