On overfitting and asymptotic bias in batch reinforcement learning with partial observability

09/22/2017
by Vincent Francois-Lavet, et al.

This paper addresses reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data). We show theoretically that a smaller state representation decreases the risk of overfitting, although it may increase the asymptotic bias. Our analysis expresses the quality of a state representation by bounding L1 error terms of the associated belief states. The theoretical results are illustrated empirically for the case where the state representation is a truncated history of observations. Finally, we discuss and empirically illustrate how using function approximators and adapting the discount factor may improve the tradeoff between asymptotic bias and overfitting.
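As a rough illustration of the truncated-history representation mentioned in the abstract, the sketch below maps a stream of observations to states made of the last `h` observations. The function name `truncated_history_states`, the parameter `h`, and the toy observation sequence are assumptions made for this example, and actions are left out for brevity; this is not the authors' implementation.

```python
from collections import deque
from typing import Hashable, Iterable, List, Tuple


def truncated_history_states(
    observations: Iterable[Hashable], h: int
) -> List[Tuple[Hashable, ...]]:
    """Return, for each time step, the tuple of the last h observations.

    Hypothetical helper: early steps yield shorter tuples because
    fewer than h observations have been seen so far.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    window = deque(maxlen=h)  # keeps only the h most recent observations
    states = []
    for obs in observations:
        window.append(obs)
        states.append(tuple(window))
    return states


if __name__ == "__main__":
    # Toy observation stream (made up for illustration).
    observations = ["a", "b", "a", "a", "b"]
    for h in (1, 2, 3):
        states = truncated_history_states(observations, h)
        # A longer history yields more distinct states from the same data,
        # so each state is visited less often.
        print(h, len(set(states)), states[-1])
```

With the same amount of data, a larger `h` spreads the samples over more distinct states, which is the intuition behind the higher overfitting risk of richer state representations; a smaller `h` merges histories that may call for different actions, which is where asymptotic bias can enter.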
