Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

10/15/2022
by   Zihan Zhang, et al.

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with a constraint on the number of batches. In the multi-batch reinforcement learning framework, the agent must commit to a schedule of policy updates before learning begins, which makes the framework particularly suitable for scenarios where changing the policy adaptively is very costly. Given a finite-horizon MDP with S states, A actions and planning horizon H, we design a computationally efficient algorithm that achieves a near-optimal regret of Õ(√(SAH^3Kln(1/δ))) in K episodes using O(H+log_2log_2(K)) batches with confidence parameter δ, where Õ(·) hides logarithmic terms in (S,A,H,K). To the best of our knowledge, this is the first Õ(√(SAH^3K)) regret bound with O(H+log_2log_2(K)) batch complexity. We also show that to achieve Õ(poly(S,A,H)√(K)) regret, the number of batches must be at least Ω(H/log_A(K)+log_2log_2(K)), which matches our upper bound up to logarithmic terms. Our technical contributions are twofold: 1) a near-optimal design scheme to explore over the unlearned states; 2) a computationally efficient algorithm to explore certain directions with an approximated transition model.
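To give some intuition for where a log_2log_2(K) term in the batch count can come from, here is a minimal sketch of a schedule that is fixed before learning starts. It assumes the grid t_i ≈ K^(1 - 2^(-i)), which is commonly used in the batched bandit literature; the abstract does not state the paper's actual schedule, so the grid and the function name `batch_end_points` are illustrative assumptions, not the authors' method. The sketch also ignores the additive H term.

```python
import math
from typing import List


def batch_end_points(K: int) -> List[int]:
    """Return strictly increasing batch end points, the last one equal to K.

    Assumption (not taken from the paper): end points follow the grid
    t_i = ceil(K ** (1 - 2 ** -i)), so about log2(log2(K)) batches
    are enough to cover all K episodes.
    """
    if K < 1:
        raise ValueError("K must be a positive integer")
    if K == 1:
        return [1]

    # Number of grid points; at least one even for very small K.
    M = max(1, math.ceil(math.log2(math.log2(K))))

    ends: List[int] = []
    for i in range(1, M + 1):
        t = min(K, math.ceil(K ** (1.0 - 2.0 ** (-i))))
        # Skip duplicates so that every batch holds at least one episode.
        if not ends or t > ends[-1]:
            ends.append(t)

    # The final batch always runs to the last episode.
    if ends[-1] < K:
        ends.append(K)
    return ends
```

Calling `batch_end_points(K)` once, before any episode is played, yields the full update schedule: the policy stays fixed inside each batch and changes only at the listed episode indices. Under this assumed grid the schedule has at most ⌈log_2log_2(K)⌉ + 1 batches.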


research
03/15/2014

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision...
research
06/01/2022

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization s...
research
05/07/2020

Reinforcement Learning with Feedback Graphs

We study episodic reinforcement learning in Markov decision processes wh...
research
02/22/2022

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

In today's economy, it becomes important for Internet platforms to consi...
research
02/24/2023

Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs

In many real-life reinforcement learning (RL) problems, deploying new po...
research
11/01/2021

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Multi-agent reinforcement learning (MARL) problems are challenging due t...
research
08/11/2021

Gap-Dependent Unsupervised Exploration for Reinforcement Learning

For the problem of task-agnostic reinforcement learning (RL), an agent f...
