On Regret with Multiple Best Arms

06/26/2020
by Yinglun Zhu, et al.

We study the regret minimization problem in the multi-armed bandit setting when multiple best/near-optimal arms exist. We consider the case where the number of arms/actions is comparable to, or much larger than, the time horizon, and we make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that automatically adapt to the unknown hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms in which the action space is enormous and information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and derive a theoretical bound on its regret. We then prove a lower bound for our problem setting, which indicates that (1) no algorithm can be optimal simultaneously over all hardness levels, and (2) our algorithm achieves an adaptive rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over all hardness levels. Experimental results confirm our theoretical guarantees and show the advantages of our algorithms over the previous state of the art.
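The abstract does not spell out the algorithms themselves. As a rough illustration of why many best arms make an enormous action space tractable, the sketch below uses a standard subsampling idea (assumed here, not taken from the paper): if m of the n arms are best, a uniformly random subset of about (n/m)·log(1/δ) arms contains at least one of them with probability at least 1 - δ, so a standard algorithm such as UCB1 can be run on that subset alone. Unlike the adaptive algorithms described above, this sketch needs the number of best arms m as an input. The helper names `subsample_size`, `ucb1_counts`, and `subsample_then_ucb1` are hypothetical, and rewards are assumed to be Bernoulli.

```python
import math
import random


def subsample_size(n_arms: int, m_best: int, delta: float) -> int:
    """Smallest k with (1 - m/n)^k <= delta (assumes m_best >= 1).

    A uniformly random subset of k distinct arms misses all m best arms
    with probability at most (1 - m/n)^k, hence at most delta.
    """
    if m_best >= n_arms:
        return 1
    return math.ceil(math.log(delta) / math.log(1.0 - m_best / n_arms))


def ucb1_counts(means, horizon, rng):
    """Run UCB1 on Bernoulli arms with the given means; return pull counts."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            # empirical mean plus exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return counts


def subsample_then_ucb1(all_means, m_guess, horizon, delta, seed=0):
    """Hypothetical helper: subsample arms, run UCB1 on the subset, and
    return pseudo-regret against the best arm of the *full* instance."""
    rng = random.Random(seed)
    n = len(all_means)
    k = min(n, subsample_size(n, m_guess, delta))
    subset = rng.sample(range(n), k)
    counts = ucb1_counts([all_means[i] for i in subset], horizon, rng)
    best = max(all_means)
    return sum(c * (best - all_means[i]) for c, i in zip(counts, subset))
```

For example, with 10,000 arms of which 100 have mean 0.9 and the rest 0.5, `subsample_size(10000, 100, 0.01)` returns 459, so UCB1 only has to explore 459 arms instead of 10,000. Guessing m too large shrinks the subset and risks missing every best arm, while guessing it too small wastes exploration; this trade-off is why adapting to the unknown number of best arms matters in the setting above.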
