Trading Off Resource Budgets for Improved Regret Bounds

10/11/2022
by Damon Falck et al.

In this work we consider a variant of adversarial online learning where in each round one picks B out of N arms and incurs cost equal to the minimum of the costs of the chosen arms. We propose an algorithm for this problem called Follow the Perturbed Multiple Leaders (FPML), which we show (by adapting the techniques of Kalai and Vempala [2005]) achieves expected regret 𝒪(T^{1/(B+1)} ln(N)^{B/(B+1)}) over time horizon T relative to the single best arm in hindsight. This introduces a trade-off between the budget B and the single-best-arm regret, and we proceed to investigate several applications of this trade-off. First, we observe that algorithms which use standard regret minimizers as subroutines can sometimes be adapted by replacing those subroutines with FPML, and we use this to generalize existing algorithms for Online Submodular Function Maximization [Streeter and Golovin, 2008] in both the full-feedback and semi-bandit-feedback settings. Next, we empirically evaluate our new algorithms on an online black-box hyperparameter optimization problem. Finally, we show how FPML can lead to new algorithms for Linear Programming which require stronger oracles but make fewer oracle calls.
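As a concrete reading of the selection rule, here is a minimal Python sketch of one plausible FPML loop under full feedback. The function name fpml_total_cost, the uniform perturbation on [0, 1/ε] (borrowed from Kalai and Vempala's Follow the Perturbed Leader), and the cost encoding are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def fpml_total_cost(costs, B, epsilon, rng=None):
    """Follow the Perturbed Multiple Leaders (FPML): a minimal sketch.

    Each round, perturb every arm's cumulative cost and "follow" the
    B leaders, i.e. the B arms with the smallest perturbed cumulative
    cost; the cost incurred is the minimum cost among those B arms.

    costs   : (T, N) array of adversarial per-round arm costs in [0, 1].
    B       : per-round budget of arms to pick (1 <= B <= N).
    epsilon : perturbation scale; uniform noise on [0, 1/epsilon] as in
              Kalai and Vempala [2005]. Its tuning is assumed here, not
              taken from the paper.
    """
    rng = rng or np.random.default_rng(0)
    T, N = costs.shape
    cumulative = np.zeros(N)            # cumulative cost seen per arm
    total = 0.0
    for t in range(T):
        perturbed = cumulative + rng.uniform(0.0, 1.0 / epsilon, size=N)
        leaders = np.argpartition(perturbed, B - 1)[:B]  # B smallest
        total += costs[t, leaders].min()  # pay the best of the B picks
        cumulative += costs[t]            # full feedback: all costs revealed
    return total
```

With B = 1 this reduces to a standard Follow the Perturbed Leader loop; larger B trades budget for the improved T^{1/(B+1)} dependence in the regret bound stated above.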


Related research

04/11/2023
BanditQ: Fair Multi-Armed Bandits with Guaranteed Rewards per Arm
Classic no-regret online prediction algorithms, including variants of th...

07/07/2019
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments
Influence maximization, item recommendation, adaptive routing and dynami...

05/27/2021
An Online Learning Approach to Optimizing Time-Varying Costs of AoI
We consider systems that require timely monitoring of sources over a com...

11/16/2022
Dueling Bandits: From Two-dueling to Multi-dueling
We study a general multi-dueling bandit problem, where an agent compares...

05/15/2022
Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
Motivated by applications to online learning in sparse estimation and Ba...

08/12/2015
No Regret Bound for Extreme Bandits
Algorithms for hyperparameter optimization abound, all of which work wel...

10/09/2018
Bridging the gap between regret minimization and best arm identification, with application to A/B tests
State of the art online learning procedures focus either on selecting th...
