Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

10/29/2015
by Christoph Dann, et al.

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde{O}\left(\frac{|\mathcal{S}|^2 |\mathcal{A}| H^2}{\epsilon^2} \ln\frac{1}{\delta}\right)$ and a lower PAC bound $\tilde{\Omega}\left(\frac{|\mathcal{S}| |\mathcal{A}| H^2}{\epsilon^2} \ln\frac{1}{\delta + c}\right)$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal{S}|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$.
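To make the gap between the two bounds concrete, the following is a minimal sketch that evaluates only their leading terms. It assumes the confidence factors are ln(1/δ) and ln(1/(δ + c)), drops all constants and poly-logarithmic factors hidden by the tilde notation, and treats c as an unspecified constant that defaults to 0 purely for illustration. The function and parameter names are hypothetical and are not taken from the paper.

```python
import math


def upper_bound_scale(num_states, num_actions, horizon, epsilon, delta):
    """Leading term |S|^2 |A| H^2 / eps^2 * ln(1/delta) of the upper bound.

    Constants and poly-log factors are omitted, so the value only
    indicates how the bound scales, not an actual episode count.
    """
    return (num_states ** 2 * num_actions * horizon ** 2
            / epsilon ** 2 * math.log(1.0 / delta))


def lower_bound_scale(num_states, num_actions, horizon, epsilon, delta, c=0.0):
    """Leading term |S| |A| H^2 / eps^2 * ln(1/(delta + c)) of the lower bound.

    `c` stands in for the paper's constant; 0.0 is an illustrative
    assumption. The term is only meaningful when delta + c < 1.
    """
    if not 0.0 < delta + c < 1.0:
        raise ValueError("need 0 < delta + c < 1")
    return (num_states * num_actions * horizon ** 2
            / epsilon ** 2 * math.log(1.0 / (delta + c)))


def bound_gap(num_states, num_actions, horizon, epsilon, delta, c=0.0):
    """Ratio of the two leading terms; equals |S| when c == 0."""
    upper = upper_bound_scale(num_states, num_actions, horizon, epsilon, delta)
    lower = lower_bound_scale(num_states, num_actions, horizon, epsilon, delta, c)
    return upper / lower
```

With c = 0 the ratio returned by `bound_gap` is exactly the number of states, which is the "additional linear dependency" the abstract refers to. Doubling `horizon` multiplies both leading terms by four, reflecting the H^2 dependency.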


