The Countable-armed Bandit with Vanishing Arms

10/23/2021
by   Anand Kalvit, et al.

We consider a bandit problem with countably many arms, partitioned into finitely many "types," each characterized by a unique mean reward. A "non-stationary" distribution governs the relative abundance of each arm-type in the population of arms, aka the "arm-reservoir." This non-stationarity is attributable to a probabilistic leakage of "optimal" arms from the reservoir over time, which we refer to as the "vanishing arms" phenomenon; this induces a time-varying (potentially "endogenous," policy-dependent) distribution over the reservoir. The objective is minimization of the expected cumulative regret. We characterize necessary and sufficient conditions for achievability of sub-linear regret in terms of a critical vanishing rate of optimal arms. We also discuss two reservoir distribution-oblivious algorithms that are long-run-average optimal whenever sub-linear regret is statistically achievable. Numerical experiments highlight a distinctive characteristic of this problem related to ex ante knowledge of the "gap" parameter (the difference between the top two mean rewards): in contrast to the stationary bandit formulation, regret in our setting may suffer substantial inflation under adaptive exploration-based (gap-oblivious) algorithms such as UCB vis-à-vis their non-adaptive forced exploration-based (gap-aware) counterparts like ETC.
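The UCB-versus-ETC contrast above can be made concrete with a minimal, self-contained sketch. The snippet below compares gap-oblivious UCB1 against gap-aware explore-then-commit (ETC) on a stylized two-armed Gaussian bandit with gap 0.1, as a stand-in for the two-type setting; the implementations, noise level, and exploration budget are illustrative assumptions, not the paper's algorithms or experimental setup.

```python
import math
import random

def ucb1(means, horizon, sigma=0.5, seed=0):
    """Gap-oblivious UCB1 with Gaussian rewards; returns cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    best, regret = max(means), 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # play each arm once to initialize
        else:
            # empirical mean plus confidence bonus (adaptive exploration)
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        sums[a] += rng.gauss(means[a], sigma)
        counts[a] += 1
        regret += best - means[a]
    return regret

def etc(means, horizon, m, sigma=0.5, seed=0):
    """Gap-aware explore-then-commit: sample each arm m times (forced
    exploration), then commit to the empirical best for the remainder."""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    best, regret = max(means), 0.0
    for t in range(horizon):
        if t < k * m:
            a = t % k                              # round-robin exploration
            sums[a] += rng.gauss(means[a], sigma)
        else:
            # equal sample counts, so comparing sums = comparing empirical means
            a = max(range(k), key=lambda i: sums[i])
        regret += best - means[a]
    return regret

if __name__ == "__main__":
    means = [0.6, 0.5]   # gap = 0.1 between the two "types"
    horizon = 5000
    # gap-aware exploration budget (standard ETC tuning, here an assumption)
    gap = means[0] - means[1]
    m = max(1, int(4.0 / gap**2 * math.log(max(math.e, horizon * gap**2 / 4.0))))
    print("UCB1 regret:", ucb1(means, horizon))
    print("ETC regret: ", etc(means, horizon, m))
```

In the stationary two-armed case both policies achieve sub-linear regret; the paper's point is that under vanishing optimal arms, the adaptive (gap-oblivious) strategy can suffer a substantial inflation relative to the forced-exploration (gap-aware) one, which this stationary toy does not exhibit.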


Related research

- 05/22/2021: From Finite to Countable-Armed Bandits. We consider a stochastic bandit problem with countably many arms that be...
- 05/13/2014: Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards. In a multi-armed bandit (MAB) problem a gambler needs to choose at each ...
- 02/29/2020: Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests. A contextual bandit problem is studied in a highly non-stationary enviro...
- 05/16/2020: Learning and Optimization with Seasonal Patterns. Seasonality is a common form of non-stationary patterns in the business ...
- 01/18/2023: Complexity Analysis of a Countable-armed Bandit Problem. We consider a stochastic multi-armed bandit (MAB) problem motivated by ...
- 05/20/2022: Actively Tracking the Optimal Arm in Non-Stationary Environments with Mandatory Probing. We study a novel multi-armed bandit (MAB) setting which mandates the age...
- 02/15/2018: Bandit Learning with Positive Externalities. Many platforms are characterized by the fact that future user arrivals a...
