On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

06/18/2021
by Chenjun Xiao, et al.

We study the fundamental question of the sample complexity of learning a good policy in finite Markov decision processes (MDPs) when the data available for learning is obtained by following a logging policy that must be chosen without knowledge of the underlying MDP. Our main results show that the sample complexity, the minimum number of transitions necessary and sufficient to obtain a good policy, is an exponential function of the relevant quantities when the planning horizon H is finite. In particular, we prove that the sample complexity of obtaining ϵ-optimal policies is at least Ω(A^min(S-1, H+1)) for γ-discounted problems, where S is the number of states, A is the number of actions, and H is the effective horizon defined as H = ⌊ln(1/ϵ)/ln(1/γ)⌋; and it is at least Ω(A^min(S-1, H)/ϵ^2) for finite-horizon problems, where H is the planning horizon of the problem. This lower bound is essentially matched by an upper bound. For the average-reward setting we show that there is no algorithm finding ϵ-optimal policies with a finite amount of data.
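
To make the scaling concrete, here is a minimal Python sketch (not from the paper) that evaluates the effective horizon H and the exponential factor A^min(S-1, H+1) appearing in the discounted lower bound. The function names and example values are illustrative assumptions, and the Ω(·) notation hides constants that the snippet ignores.

    import math

    def effective_horizon(epsilon: float, gamma: float) -> int:
        # H = floor(ln(1/epsilon) / ln(1/gamma)), the effective horizon of a
        # gamma-discounted problem at accuracy epsilon, as defined above.
        return math.floor(math.log(1.0 / epsilon) / math.log(1.0 / gamma))

    def discounted_lower_bound_factor(num_states: int, num_actions: int,
                                      epsilon: float, gamma: float) -> int:
        # Evaluates A^min(S-1, H+1), the exponential factor in the stated
        # Omega(.) lower bound for the discounted setting (constants omitted).
        H = effective_horizon(epsilon, gamma)
        return num_actions ** min(num_states - 1, H + 1)

    # Hypothetical example: S = 10 states, A = 4 actions, epsilon = 0.1, gamma = 0.9
    # gives H = floor(ln(10) / ln(10/9)) = 21, so the bound scales as 4^9.
    print(effective_horizon(0.1, 0.9))                     # 21
    print(discounted_lower_bound_factor(10, 4, 0.1, 0.9))  # 262144

Even for this small hypothetical instance the factor already exceeds a quarter of a million transitions, which illustrates why the bound is described as exponential in the relevant quantities.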
