Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

07/13/2021 · by Avishek Ghosh, et al.

We address the problem of model selection for the finite-horizon episodic Reinforcement Learning (RL) problem, where the transition kernel P^* belongs to a family of models 𝒫^* with finite metric entropy. In the model selection framework, instead of 𝒫^*, we are given M nested families of transition kernels 𝒫_1 ⊂ 𝒫_2 ⊂ … ⊂ 𝒫_M. We propose and analyze a novel algorithm, namely Adaptive Reinforcement Learning (General) (ARL-GEN), that adapts to the smallest such family containing the true transition kernel P^*. ARL-GEN uses the Upper Confidence Reinforcement Learning with Value Targeted Regression (UCRL-VTR) algorithm as a blackbox and puts a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains a regret of 𝒪(d_ℰ^* H^2 + √(d_ℰ^* 𝕄^* H^2 T)) with high probability, where H is the horizon length, T is the total number of steps, d_ℰ^* is the Eluder dimension, and 𝕄^* is the metric entropy corresponding to 𝒫^*. Note that this regret scaling matches that of an oracle that knows 𝒫^* in advance. We show that the cost of model selection for ARL-GEN is an additive term in the regret with only a weak dependence on T. Subsequently, we remove the separability assumption and consider the setup of linear mixture MDPs, where the transition kernel P^* admits a linear function approximation. Exploiting this low-rank structure, we propose novel adaptive algorithms for model selection and obtain (order-wise) regret identical to that of an oracle with knowledge of the true model class.
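The epoch-wise model selection mechanism described above can be sketched in a few lines. The following is a minimal illustrative Python simulation, not the paper's algorithm: the fit-loss statistic, the threshold, and the loss decay rates are hypothetical stand-ins, and the UCRL-VTR base learner run within the chosen class is omitted. The point is the mechanism: at the start of each doubling epoch, test the nested classes in increasing order and commit to the smallest one whose empirical fit is good enough; under a separability gap, the selection eventually locks onto the class containing the true kernel.

```python
import math

def select_model_class(losses, threshold):
    """Return the index of the smallest nested class whose empirical
    fit loss clears the test (hypothetical statistic); fall back to
    the largest class if none does."""
    for k, loss in enumerate(losses):
        if loss <= threshold:
            return k
    return len(losses) - 1

def epoch_losses(n, true_class, M):
    """Hypothetical fit losses after n samples: classes too small to
    contain P* retain a constant bias (the separability gap), while
    classes containing P* see their loss decay like 1/sqrt(n)."""
    return [0.5 if k < true_class else 1.0 / math.sqrt(n) for k in range(M)]

def run_model_selection(num_epochs, true_class, M, threshold=0.25):
    """Doubling epochs: re-run the selection test at the start of each
    epoch (in the paper, the chosen class is then handed to the
    UCRL-VTR blackbox for the rest of the epoch)."""
    history = []
    for e in range(num_epochs):
        n = 2 ** e  # samples available at the start of epoch e
        history.append(select_model_class(epoch_losses(n, true_class, M), threshold))
    return history

print(run_model_selection(num_epochs=6, true_class=2, M=4))
# → [3, 3, 3, 3, 2, 2]: conservative (largest class) early on,
# then locked onto the true class once enough data has accrued.
```

Because misidentification is confined to the early epochs, its regret contribution is an additive term with only weak dependence on T, which is the intuition behind the oracle-matching rate.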



