Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

03/09/2020
by Tengyang Xie, et al.

We prove performance guarantees of two algorithms for approximating Q* in batch reinforcement learning. Compared to classical iterative methods such as Fitted Q-Iteration, whose performance loss incurs quadratic dependence on horizon, these methods estimate (some forms of) the Bellman error and enjoy linear-in-horizon error propagation, a property established for the first time for algorithms that rely solely on batch data and output stationary policies. One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous "double sampling" difficulty in Bellman error estimation, and does not use any squared losses. Our analyses reveal its distinct characteristics and potential advantages compared to classical algorithms.
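
The "double sampling" difficulty mentioned in the abstract can be illustrated with a small numerical sketch. This is not either of the paper's algorithms; it only shows the generic issue. The toy transition (reward 0, next-state value 0 or 10 with equal probability), the discount 0.9, and all function names are assumptions made up for illustration. For a fixed state-action pair whose Q-value exactly equals the expected target, the true squared Bellman error is zero, yet the single-sample squared loss is inflated by the variance of the target. Multiplying residuals from two independent next-state draws removes that bias, but batch data typically provides only one draw per transition.

```python
import random


def sample_target(rng, gamma=0.9):
    """Draw one target r + gamma * V(s') for a fixed (s, a).

    Hypothetical toy setup: reward is 0 and the next-state value is
    0 or 10 with equal probability, so the expected target is 4.5.
    """
    next_value = 10.0 if rng.random() < 0.5 else 0.0
    return 0.0 + gamma * next_value


def estimate_bellman_error(q_sa, n=100_000, seed=0):
    """Return (single-sample, double-sample) estimates of the squared Bellman error."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n):
        t1 = sample_target(rng)  # first independent draw of the target
        t2 = sample_target(rng)  # second independent draw (unavailable in batch data)
        single_total += (q_sa - t1) ** 2          # biased: adds Var(target)
        double_total += (q_sa - t1) * (q_sa - t2)  # unbiased for (q_sa - E[target])^2
    return single_total / n, double_total / n


# Q(s, a) equals the expected target, so the true squared Bellman error is 0.
Q_SA = 4.5
single_est, double_est = estimate_bellman_error(Q_SA)
print("single-sample estimate:", single_est)
print("double-sample estimate:", double_est)
```

In this toy setup the single-sample estimate settles near the target variance (20.25) rather than zero, while the double-sample estimate stays close to zero. The abstract's importance-weighting correction addresses the same bias without needing the second draw.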


