Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

12/24/2020
by   Aldo Pacchiano, et al.

We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base algorithm comes with a candidate regret bound that may or may not hold during all rounds. In each round, our approach plays a base algorithm to keep the candidate regret bounds of all remaining base algorithms balanced, and eliminates algorithms that violate their candidate bound. We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor. This factor is reasonably small in several applications, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and LinUCB applied to linear bandits with different confidence parameters. We further show that, under a suitable gap assumption, this factor only scales with the number of base algorithms and not their complexity when the number of rounds is large enough. Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.
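
The balance-and-eliminate loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `balance_and_eliminate`, the constant `c`, the form of the confidence width, and the toy base learners are all assumptions made for illustration. Each base learner is modeled as a callable that returns a reward in [0, 1] when played, and each candidate regret bound as a function of that learner's own play count.

```python
import math


def balance_and_eliminate(learners, bounds, horizon, delta=0.05, c=1.0):
    """Toy regret bound balancing with elimination (illustrative only).

    learners: list of callables; calling one plays it and returns a reward in [0, 1].
    bounds:   list of callables n -> candidate regret bound after n plays.
    Returns the set of surviving learner indices and the play counts.
    """
    m = len(learners)
    active = set(range(m))
    plays = [0] * m
    total = [0.0] * m

    def width(j):
        # Assumed confidence width; the constants are not taken from the source.
        return c * math.sqrt(math.log(m * horizon / delta) / plays[j])

    for _ in range(horizon):
        # Balancing step: play the active learner whose candidate bound,
        # evaluated at its own play count, is currently the smallest.
        i = min(sorted(active), key=lambda j: bounds[j](plays[j]))
        total[i] += learners[i]()
        plays[i] += 1

        # Elimination step: drop a learner if even its optimistic average
        # (reward + bound / plays + width) falls below the best pessimistic one.
        played = [j for j in sorted(active) if plays[j] > 0]
        best_lower = max(total[j] / plays[j] - width(j) for j in played)
        for j in played:
            upper = total[j] / plays[j] + bounds[j](plays[j]) / plays[j] + width(j)
            if upper < best_lower and len(active) > 1:
                active.discard(j)
    return active, plays


# Hypothetical example: learner 0 claims a tight bound but earns little,
# learner 1 claims a looser bound and earns much more.
survivors, counts = balance_and_eliminate(
    learners=[lambda: 0.1, lambda: 0.9],
    bounds=[lambda n: 0.5 * math.sqrt(n), lambda n: 2.0 * math.sqrt(n)],
    horizon=5000,
)
print(survivors, counts)
```

In this toy run the learner with the overly optimistic bound is played more often at first, because balancing favors the smaller candidate bound, and is removed once its observed rewards contradict that bound.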


research
06/29/2022

Best of Both Worlds Model Selection

We study the problem of model selection in bandit scenarios in the prese...
research
06/09/2020

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learn...
research
06/09/2021

Parameter and Feature Selection in Stochastic Linear Bandits

We study two model selection settings in stochastic linear bandits (LB)....
research
06/05/2023

Data-Driven Regret Balancing for Online Model Selection in Bandits

We consider model selection for sequential decision making in stochastic...
research
06/11/2021

Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

We study the problem of model selection for contextual bandits, in which...
research
03/03/2020

Model Selection in Contextual Stochastic Bandit Problems

We study model selection in stochastic bandit problems. Our approach rel...
research
06/05/2020

Rate-adaptive model selection over a collection of black-box contextual bandit algorithms

We consider the model selection task in the stochastic contextual bandit...
