Structural Return Maximization for Reinforcement Learning

05/12/2014
by Joshua Joseph, et al.

Batch Reinforcement Learning (RL) algorithms attempt to choose a policy from a designer-provided class of policies given a fixed set of training data. Choosing the policy which maximizes an estimate of return often leads to over-fitting when only limited data is available, due to the size of the policy class in relation to the amount of data available. In this work, we focus on learning policy classes that are appropriately sized to the amount of data available. We accomplish this by using the principle of Structural Risk Minimization, from Statistical Learning Theory, which uses Rademacher complexity to identify a policy class that maximizes a bound on the return of the best policy in the chosen policy class, given the available data. Unlike similar batch RL approaches, our bound on return requires only extremely weak assumptions on the true system.
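The selection principle described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a finite set of candidate policy classes and a hypothetical table `returns[p][i]` holding an estimated return for policy `p` on episode `i` of the batch (how those estimates are produced is left open). The penalty is a Monte Carlo estimate of the empirical Rademacher complexity, and the confidence term of a full bound is omitted. The function names `empirical_rademacher` and `select_policy_class` are illustrative only.

```python
import random


def empirical_rademacher(returns, n_draws=200, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite policy class. returns[p][i] is the (assumed) estimated return
    of policy p on episode i."""
    rng = random.Random(seed)
    n = len(returns[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of random signs (Rademacher variables).
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Supremum over the class of the sign-weighted average return.
        total += max(
            sum(s * r for s, r in zip(sigma, row)) / n for row in returns
        )
    return total / n_draws


def select_policy_class(classes, n_draws=200, seed=0):
    """Pick the class maximizing: best empirical mean return minus
    twice the estimated Rademacher complexity (confidence term omitted)."""
    best_idx, best_score = None, float("-inf")
    for idx, returns in enumerate(classes):
        best_mean = max(sum(row) / len(row) for row in returns)
        penalty = 2.0 * empirical_rademacher(returns, n_draws, seed)
        score = best_mean - penalty
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score
```

In this sketch a larger class can only raise the best empirical return, but it also raises the complexity penalty, so the selected class is the one where that trade-off is most favorable for the amount of data at hand.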

research
05/29/2019

On the Generalization Gap in Reparameterizable Reinforcement Learning

Understanding generalization in reinforcement learning (RL) is a signifi...
research
04/19/2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem,...
research
07/16/2020

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to...
research
02/08/2020

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned pol...
research
03/25/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

This paper considers batch Reinforcement Learning (RL) with general valu...
research
10/06/2021

Mismatched No More: Joint Model-Policy Optimization for Model-Based RL

Many model-based reinforcement learning (RL) methods follow a similar te...
research
01/30/2017

Reinforcement Learning Algorithm Selection

This paper formalises the problem of online algorithm selection in the c...
