Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

07/13/2021
by Avishek Ghosh, et al.

We address the problem of model selection for finite-horizon episodic Reinforcement Learning (RL), where the transition kernel P^* belongs to a family of models 𝒫^* with finite metric entropy. In the model selection framework, instead of 𝒫^*, we are given M nested families of transition kernels 𝒫_1 ⊂ 𝒫_2 ⊂ … ⊂ 𝒫_M. We propose and analyze a novel algorithm, Adaptive Reinforcement Learning (General) (ARL-GEN), that adapts to the smallest such family in which the true transition kernel P^* lies. ARL-GEN uses the Upper Confidence Reinforcement Learning (UCRL) algorithm with value-targeted regression as a black box and places a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains, with high probability, a regret of 𝒪(d_ℰ^* H^2 + √(d_ℰ^* 𝕄^* H^2 T)), where H is the horizon length, T is the total number of steps, d_ℰ^* is the Eluder dimension, and 𝕄^* is the metric entropy corresponding to 𝒫^*. This regret scaling matches that of an oracle that knows 𝒫^* in advance. We show that the cost of model selection for ARL-GEN is an additive term in the regret with only a weak dependence on T. Subsequently, we remove the separability assumption and consider linear mixture MDPs, where the transition kernel P^* admits a linear function approximation. Exploiting this low-rank structure, we propose novel adaptive algorithms for model selection and obtain regret that is order-wise identical to that of an oracle with knowledge of the true model class.
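
To get a feel for how the stated regret expression scales, the following minimal sketch evaluates d_ℰ^* H^2 + √(d_ℰ^* 𝕄^* H^2 T) for a few values of T. It is an illustration only: the function name `regret_bound` and all numerical inputs are hypothetical, and the constants (and any logarithmic factors) hidden by the 𝒪 notation are assumed to be 1, so the outputs are not actual regret values.

```python
import math


def regret_bound(d_eluder: float, metric_entropy: float,
                 horizon: int, total_steps: int) -> float:
    """Evaluate d_E* H^2 + sqrt(d_E* M* H^2 T) with all hidden constants set to 1.

    This is only the order-wise expression quoted in the abstract; it is
    an assumption-laden illustration, not the paper's exact bound.
    """
    # Term that does not grow with T.
    constant_term = d_eluder * horizon ** 2
    # Leading term, growing like sqrt(T).
    leading_term = math.sqrt(d_eluder * metric_entropy * horizon ** 2 * total_steps)
    return constant_term + leading_term


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    for T in (10_000, 40_000, 160_000):
        value = regret_bound(d_eluder=10, metric_entropy=50, horizon=20, total_steps=T)
        print(f"T={T:>7}: bound expression = {value:,.1f}")
```

In this expression, quadrupling T doubles the square-root term while the first term stays fixed, which is the √T growth that the abstract says matches an oracle that knows 𝒫^* in advance.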
