Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

by Avishek Ghosh, et al.

We address the problem of model selection for finite-horizon episodic Reinforcement Learning (RL), where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \ldots \subset \mathcal{P}_M$. We propose and analyze a novel algorithm, Adaptive Reinforcement Learning (General) (ARL-GEN), that adapts to the smallest such family containing the true transition kernel $P^*$. ARL-GEN uses the Upper Confidence Reinforcement Learning (UCRL) algorithm with value-targeted regression as a black box and places a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains a regret of $\mathcal{O}(d_{\mathcal{E}}^* H^2 + \sqrt{d_{\mathcal{E}}^* \mathbb{M}^* H^2 T})$ with high probability, where $H$ is the horizon length, $T$ is the total number of steps, $d_{\mathcal{E}}^*$ is the Eluder dimension, and $\mathbb{M}^*$ is the metric entropy corresponding to $\mathcal{P}^*$. This regret scaling matches that of an oracle that knows $\mathcal{P}^*$ in advance. We show that the cost of model selection for ARL-GEN is an additive term in the regret with a weak dependence on $T$. Subsequently, we remove the separability assumption and consider linear mixture MDPs, where the transition kernel $P^*$ admits a linear function approximation. With this low-rank structure, we propose novel adaptive algorithms for model selection and obtain regret that is order-wise identical to that of an oracle with knowledge of the true model class.
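The abstract does not spell out how the model selection module picks a family at the start of an epoch. As a minimal sketch of the general idea of adapting to the smallest adequate class among nested families, the Python snippet below scans the classes from smallest to largest and returns the first one whose fit error falls below a threshold. The function name `select_smallest_class`, the per-class `fit_errors`, and the `threshold` are all hypothetical and introduced only for illustration; they are not taken from the paper.

```python
from typing import Sequence


def select_smallest_class(fit_errors: Sequence[float], threshold: float) -> int:
    """Return the index of the smallest nested class that fits well enough.

    fit_errors[m] is a hypothetical goodness-of-fit statistic for class m,
    with classes ordered from smallest (index 0) to largest. Both the
    statistic and the threshold are illustrative assumptions.
    """
    if not fit_errors:
        raise ValueError("need at least one model class")
    for m, err in enumerate(fit_errors):
        # Nested classes: the first class under the threshold is the smallest one.
        if err <= threshold:
            return m
    # No class passes the test: fall back to the largest class.
    return len(fit_errors) - 1


if __name__ == "__main__":
    # Four nested classes; the third is the first to fit within the threshold.
    print(select_smallest_class([0.9, 0.4, 0.05, 0.04], threshold=0.1))
```

Scanning in increasing order relies on the nesting: once a class is rich enough to contain the true kernel, every larger class is too, so the first class that passes is the smallest candidate.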




