Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

12/14/2020
by   Andrea Zanette, et al.
0

Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the value of a target policy. For both tasks we derive exponential information-theoretic lower bounds in discounted infinite horizon MDPs with a linear function representation for the action value function even if 1) realizability holds, 2) the batch algorithm observes the exact reward and transition functions, and 3) the batch algorithm is given the best a priori data distribution for the problem class. Furthermore, if the dataset does not come from policy rollouts then the lower bounds hold even if all policies admit a linear representation. If the objective is to find a near-optimal policy, we discover that these hard instances are easily solved by an online algorithm, showing that there exist RL problems where batch RL is exponentially harder than online RL even under the most favorable batch data distribution. In other words, online exploration is critical to enable sample efficient RL with function approximation. A second corollary is the exponential separation between finite and infinite horizon batch problems under our assumptions. On a technical level, this work helps formalize the issue known as deadly triad and explains that the bootstrapping problem is potentially more severe than the extrapolation issue for RL because unlike the latter, bootstrapping cannot be mitigated by adding more samples.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/03/2022

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL)...
research
10/07/2019

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Modern deep learning methods provide an effective means to learn good re...
research
02/14/2022

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

Deployment efficiency is an important criterion for many real-world appl...
research
05/29/2022

Provable Benefits of Representational Transfer in Reinforcement Learning

We study the problem of representational transfer in RL, where an agent ...
research
05/01/2019

Information-Theoretic Considerations in Batch Reinforcement Learning

Value-function approximation methods that operate in batch mode have fou...
research
01/30/2023

STEEL: Singularity-aware Reinforcement Learning

Batch reinforcement learning (RL) aims at finding an optimal policy in a...
research
02/16/2022

Branching Reinforcement Learning

In this paper, we propose a novel Branching Reinforcement Learning (Bran...

Please sign up or login with your details

Forgot password? Click here to reset