Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general-purpose algorithms with computational requirements no worse than those of model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, our algorithm attains optimal realizability-based regret bounds for that class, up to a logarithmic dependence on the number of classes, under either of two conditions: the time horizon is large enough, or an assumption that helps detect misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when the realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds, which further establishes the importance of model selection in contextual bandits.
