Online Model Selection: a Rested Bandit Formulation

12/07/2020
by   Leonardo Cella, et al.

Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein an arm's expected loss decreases with the number of times the arm has been played. The shape of the expected loss function is shared across arms and is assumed to be known up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, in which we compare against the policy that always plays the arm with the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases, with a rate of convergence that depends closely on the postulated functional form of the expected losses. Unlike previous model selection efforts in the bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function, so as to identify the best arm as quickly as possible. We complement our analysis with a lower bound, indicating strengths and limitations of the proposed solution.
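To make the setting concrete, here is a minimal sketch of an arm elimination procedure for a rested bandit in this spirit. Everything below is an illustrative assumption, not the paper's algorithm: losses follow the hypothetical parametric shape ell_i(n) = a_i + b_i / sqrt(n) after n plays of arm i, the unknown per-arm parameters (a_i, b_i) are fit by least squares, and arms whose lower confidence bound on the end-of-game loss a_i exceeds the best upper bound are eliminated.

```python
import math
import random

def fit_intercept(n, sx, sy, sxx, sxy):
    """Least-squares intercept a_hat for the model loss = a + b * x,
    where x = 1/sqrt(number of pulls).  a_hat estimates the arm's
    expected loss at the end of the game (x -> 0)."""
    denom = n * sxx - sx * sx
    b_hat = (n * sxy - sx * sy) / denom
    return (sy - b_hat * sx) / n

def identify_best_arm(true_params, horizon=20000, noise=0.05, seed=0):
    """Toy successive-elimination loop: round-robin over surviving arms,
    fit each arm's asymptotic loss, drop arms that are confidently worse.
    The parametric form and confidence radius are illustrative choices."""
    rng = random.Random(seed)
    k = len(true_params)
    pulls = [0] * k
    # Running sums for the per-arm least-squares fit of (x, loss) pairs.
    sx = [0.0] * k; sy = [0.0] * k; sxx = [0.0] * k; sxy = [0.0] * k
    active = set(range(k))
    t = 0
    while t < horizon and len(active) > 1:
        for i in list(active):              # round-robin over survivors
            a, b = true_params[i]
            pulls[i] += 1
            t += 1
            x = 1.0 / math.sqrt(pulls[i])
            loss = a + b * x + rng.gauss(0.0, noise)  # rested: loss shrinks
            sx[i] += x; sy[i] += loss; sxx[i] += x * x; sxy[i] += x * loss
        if min(pulls[i] for i in active) >= 30:   # wait for a stable fit
            a_hat = {i: fit_intercept(pulls[i], sx[i], sy[i], sxx[i], sxy[i])
                     for i in active}
            # Heuristic confidence radius; a real analysis would derive it
            # from the estimator's variance under the postulated model.
            rad = {i: 3.0 * noise / math.sqrt(pulls[i]) for i in active}
            best_ucb = min(a_hat[i] + rad[i] for i in active)
            active = {i for i in active if a_hat[i] - rad[i] <= best_ucb}
    a_hat = {i: fit_intercept(pulls[i], sx[i], sy[i], sxx[i], sxy[i])
             for i in active}
    return min(active, key=a_hat.get)
```

Note that the arm with the smallest end-of-game loss (smallest a_i) need not be the arm that looks best early on, which is exactly why the elimination rule targets the fitted intercept rather than the raw empirical mean.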


