A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

11/02/2020
by Philip Amortila, et al.

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that, once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.
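As a rough illustration of the abstract's key premise, linear realizability of the optimal value function in one-dimensional features, the sketch below builds a small two-state discounted MDP and a one-dimensional feature under which V* is exactly linear. The specific MDP, rewards, and feature choice are assumptions made for illustration only; they are not the paper's hard instance.

import numpy as np

# Hypothetical 2-state discounted MDP (illustration only, NOT the paper's construction).
gamma = 0.9
R = np.array([0.0, 1.0])             # reward collected in each state
P = np.array([[0.0, 1.0],            # state 0 moves to state 1
              [0.0, 1.0]])           # state 1 is absorbing

# With a single action per state, the optimal value function solves V = R + gamma * P V.
V = np.linalg.solve(np.eye(2) - gamma * P, R)

# A one-dimensional feature phi and scalar theta with V*(s) = theta * phi(s):
phi = V / np.linalg.norm(V)          # any nonzero rescaling of V* works as a 1-d feature
theta = np.linalg.norm(V)
assert np.allclose(theta * phi, V)
print("V* =", V, "phi =", phi, "theta =", theta)

Here theta * phi recovers V* exactly, so the value function is linearly realizable in one dimension; the note's point is that even under such realizability (and good feature coverage), batch learning can be impossible in the discounted setting.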


