Oracle Inequalities for Model Selection in Offline Reinforcement Learning

11/03/2022
by Jonathan N. Lee, et al.

In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between approximation and estimation error of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors. The algorithm, ModBE, takes as input a collection of candidate model classes and a generic base offline RL algorithm. By successively eliminating model classes using a novel one-sided generalization test, ModBE returns a policy with regret scaling with the complexity of the minimally complete model class. In addition to its theoretical guarantees, it is conceptually simple and computationally efficient, amounting to solving a series of square loss regression problems and then comparing relative square loss between classes. We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
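The "fit by square loss regression, then compare relative square loss between classes" idea can be illustrated with a small sketch. This is not ModBE itself: it is a supervised regression analogue on synthetic data, with nested polynomial classes standing in for the nested model classes and ordinary squared error standing in for squared Bellman error. The function names (`poly_features`, `select_class`), the data-generating process, and the tolerance `(d + 1) / n_val` are all assumptions made for illustration; the paper's actual one-sided generalization test and its thresholds are not reproduced here.

```python
import numpy as np


def poly_features(x, degree):
    # Columns 1, x, ..., x^degree; the classes are nested as degree grows.
    return np.vander(x, degree + 1, increasing=True)


def fit_least_squares(x, y, degree):
    # Square loss regression within the polynomial class of the given degree.
    w, *_ = np.linalg.lstsq(poly_features(x, degree), y, rcond=None)
    return w


def square_loss(w, x, y, degree):
    return float(np.mean((poly_features(x, degree) @ w - y) ** 2))


def select_class(x_fit, y_fit, x_val, y_val, degrees, tolerance):
    """Return the smallest class that no larger class beats by more than
    tolerance(larger_degree) on held-out square loss, plus all losses.

    The comparison is one-sided: a class is eliminated only when a larger
    class is clearly better, never for being slightly worse.
    """
    weights = [fit_least_squares(x_fit, y_fit, d) for d in degrees]
    losses = [square_loss(w, x_val, y_val, d) for w, d in zip(weights, degrees)]
    for k, d in enumerate(degrees):
        beaten = any(
            losses[k] - losses[j] > tolerance(degrees[j])
            for j in range(k + 1, len(degrees))
        )
        if not beaten:
            return d, losses
    return degrees[-1], losses  # fallback; the largest class is never beaten


# Hypothetical synthetic data: a quadratic target with small Gaussian noise.
rng = np.random.default_rng(0)
n_fit, n_val = 200, 200
x = rng.uniform(-1.0, 1.0, size=n_fit + n_val)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(n_fit + n_val)

degrees = [0, 1, 2, 3, 4, 5]
selected, losses = select_class(
    x[:n_fit], y[:n_fit], x[n_fit:], y[n_fit:],
    degrees,
    tolerance=lambda d: (d + 1) / n_val,  # assumed complexity-over-n slack
)
print("selected degree:", selected)
print("held-out losses:", [round(v, 4) for v in losses])
```

In this toy setup the constant and linear classes leave a large approximation error, so a larger class beats them by far more than the slack, while classes beyond the quadratic improve held-out loss only marginally. The selection therefore settles on the smallest class that is rich enough to fit the target, which mirrors the approximation versus estimation trade-off described above.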
