Regret Balancing for Bandit and RL Model Selection

06/09/2020
by Yasin Abbasi-Yadkori, et al.

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by estimating the regret of each algorithm and playing the algorithms so that all empirical regrets remain of the same order, the overall regret balancing strategy achieves a regret close to that of the optimal base algorithm. Our strategy requires an upper bound on the optimal base regret as input, and its performance depends on the tightness of that upper bound. We show that this prior knowledge is necessary to achieve near-optimal regret. Further, we show that any near-optimal model selection strategy implicitly performs a form of regret balancing.
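The balancing idea can be illustrated with a minimal sketch. The names below (`BaseLearner`, `regret_balancing`, `optimal_mean`) are illustrative, not the paper's implementation: the base learners are stylized stubs with fixed mean rewards, and `optimal_mean` stands in for the prior knowledge (an upper bound on the optimal base's performance) that the paper shows is required. Each round, the strategy plays the learner with the smallest empirical regret estimate, which keeps all empirical regrets of the same order.

```python
import random


class BaseLearner:
    """Stylized stand-in for a base bandit algorithm (e.g. UCB on a subset of arms)."""

    def __init__(self, mean_reward):
        self.mean_reward = mean_reward  # expected reward when this learner is played
        self.total_reward = 0.0
        self.plays = 0

    def play(self):
        # Noisy reward draw; the noise scale is an arbitrary choice for the sketch.
        reward = self.mean_reward + random.gauss(0, 0.1)
        self.total_reward += reward
        self.plays += 1
        return reward


def regret_balancing(learners, horizon, optimal_mean):
    """Each round, play the learner whose empirical regret estimate is smallest,
    so that all empirical regrets stay of the same order (the balancing condition).

    optimal_mean plays the role of the prior knowledge the strategy needs:
    here, an assumed upper bound on the optimal achievable per-round reward."""
    for _ in range(horizon):
        # Empirical regret estimate: (plays * optimal reward) - accumulated reward.
        chosen = min(learners, key=lambda l: l.plays * optimal_mean - l.total_reward)
        chosen.play()
    return learners
```

In this sketch a learner that accrues regret slowly is selected more often, since its regret estimate takes longer to catch up to the others, which is the mechanism by which the balancing strategy tracks the best base algorithm.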

Related research:

- Regret Bound Balancing and Elimination for Model Selection in Bandits and RL (12/24/2020)
- Data-Driven Regret Balancing for Online Model Selection in Bandits (06/05/2023)
- Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes (07/13/2021)
- Multi-device, Multi-tenant Model Selection with GP-EI (03/17/2018)
- Model Selection in Contextual Stochastic Bandit Problems (03/03/2020)
- Anytime Model Selection in Linear Bandits (07/24/2023)
- Rate-adaptive model selection over a collection of black-box contextual bandit algorithms (06/05/2020)
