Model Selection in Reinforcement Learning with General Function Approximations

07/06/2022
by   Avishek Ghosh, et al.

We consider model selection for classic Reinforcement Learning (RL) environments – Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) – under general function approximations. In the model selection framework, we do not know the function classes, denoted by ℱ and ℳ, in which the true models – the reward-generating function for MABs and the transition kernel for MDPs – lie, respectively. Instead, we are given M nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest function class (among the M nested classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., ℱ and ℳ) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret with a weak (logarithmic) dependence on the learning horizon T.
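
The idea of adapting to the smallest nested class that contains the true model can be illustrated with a toy sketch. This is not the algorithm from the paper: the finite candidate classes, the squared-loss test, the fixed threshold, and all names below (`min_empirical_loss`, `select_smallest_class`, `true_reward`) are assumptions made purely for illustration. The sketch assumes the classes are separated enough that a class missing the true function fits visibly worse than one containing it.

```python
import random

def min_empirical_loss(function_class, data):
    """Smallest average squared error achieved by any function in the class."""
    return min(
        sum((f(x) - y) ** 2 for x, y in data) / len(data)
        for f in function_class
    )

def select_smallest_class(nested_classes, data, threshold):
    """Index of the first (smallest) class whose best fit is within the threshold."""
    for index, function_class in enumerate(nested_classes):
        if min_empirical_loss(function_class, data) <= threshold:
            return index
    return len(nested_classes) - 1  # nothing passed: fall back to the largest class

# Hypothetical nested classes: linear, then quadratic, then cubic candidates.
coefficients = [0.0, 0.5, 1.0]
class_1 = [lambda x, a=a: a * x for a in coefficients]
class_2 = [lambda x, a=a, b=b: a * x + b * x**2
           for a in coefficients for b in coefficients]
class_3 = [lambda x, a=a, b=b, c=c: a * x + b * x**2 + c * x**3
           for a in coefficients for b in coefficients for c in coefficients]
nested_classes = [class_1, class_2, class_3]

def true_reward(x):
    """Assumed ground truth: lies in class_2 (and class_3) but not in class_1."""
    return x + 0.5 * x**2

# Simulated noisy observations of the reward (noise std 0.1, fixed seed).
rng = random.Random(0)
data = []
for _ in range(500):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, true_reward(x) + rng.gauss(0.0, 0.1)))

# Threshold chosen by hand to sit between the noise level and the misfit of class_1.
threshold = 0.03
selected = select_smallest_class(nested_classes, data, threshold)
print("selected class index:", selected)
```

In this toy setup the threshold is set by hand to lie between the noise variance and the error of the best linear candidate. The separability assumption in the abstract plays a comparable role, guaranteeing a gap between classes that contain the true model and those that do not.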

research
07/13/2021

Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

We address the problem of model selection for the finite horizon episodi...
research
10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...
research
11/03/2022

Oracle Inequalities for Model Selection in Offline Reinforcement Learning

In offline reinforcement learning (RL), a learner leverages prior logged...
research
10/07/2021

A Model Selection Approach for Corruption Robust Reinforcement Learning

We develop a model selection approach to tackle reinforcement learning w...
research
05/23/2022

Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

Recent studies have shown that episodic reinforcement learning (RL) is n...
research
02/12/2021

Pareto Optimal Model Selection in Linear Bandits

We study a model selection problem in the linear bandit setting, where t...
research
01/29/2021

Sequential prediction under log-loss and misspecification

We consider the question of sequential prediction under the log-loss in ...
