
Online Model Selection for Reinforcement Learning with Function Approximation

11/19/2020
by Jonathan N. Lee, et al.

Deep reinforcement learning has achieved impressive successes, yet it often requires a very large amount of interaction data. This is perhaps unsurprising: complex function approximators often require more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner's goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove that the meta-algorithm adapts to the optimal complexity with Õ(L^{5/6} T^{2/3}) regret, compared to the optimal candidate's Õ(√T) regret, where T is the number of episodes and L is the number of algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible enough to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.
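
To make the two rates in the abstract concrete, the following minimal sketch compares L^{5/6} T^{2/3} against √T numerically. It is only an illustration of the stated orders of growth: constants, logarithmic factors, and the dimension and horizon terms are all dropped, and the function names are hypothetical, not taken from the paper. Under those assumptions the ratio of the two rates simplifies to L^{5/6} T^{1/6}.

```python
def meta_regret_rate(num_episodes: int, num_algorithms: int) -> float:
    """Order of the meta-algorithm's regret, L^(5/6) * T^(2/3).

    Assumption: constants, log factors, dimension and horizon are ignored.
    """
    return num_algorithms ** (5 / 6) * num_episodes ** (2 / 3)


def best_candidate_regret_rate(num_episodes: int) -> float:
    """Order of the best candidate's regret, sqrt(T), under the same simplification."""
    return num_episodes ** 0.5


def adaptation_overhead(num_episodes: int, num_algorithms: int) -> float:
    """Ratio of the two rates; algebraically equal to L^(5/6) * T^(1/6)."""
    return meta_regret_rate(num_episodes, num_algorithms) / best_candidate_regret_rate(
        num_episodes
    )


if __name__ == "__main__":
    num_algorithms = 4  # hypothetical number of candidate algorithms
    for num_episodes in (10**3, 10**4, 10**5, 10**6):
        per_episode = meta_regret_rate(num_episodes, num_algorithms) / num_episodes
        overhead = adaptation_overhead(num_episodes, num_algorithms)
        print(
            f"T={num_episodes:>8}  "
            f"meta rate per episode={per_episode:.4f}  "
            f"overhead vs sqrt(T)={overhead:.2f}"
        )
```

Both rates are sublinear in T, so the per-episode rate shrinks as T grows, while the overhead relative to √T grows slowly, on the order of T^{1/6}.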

