Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits

09/12/2017
by Huasen Wu, et al.

In this paper, we propose and study opportunistic bandits - a new variant of bandits in which the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Intuitively, then, we should explore more when the load is low and exploit more when the load is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm that adaptively balances the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero whenever the load level is below a certain threshold. Finally, experiments on both synthetic data and real-world traces show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and Thompson Sampling (TS), under large load fluctuations.
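A minimal sketch of the idea in Python may help. It implements a load-adaptive UCB index in which the exploration bonus is scaled by (1 - load), so the algorithm explores aggressively in low-load rounds and exploits in high-load rounds. The function name ada_ucb_pull, the (1 - load) scaling, and the constant alpha are illustrative assumptions; the paper's exact AdaUCB index differs in its details.

```python
import math
import random

def ada_ucb_pull(means, counts, t, load, alpha=2.0):
    """Pick an arm using a load-adaptive UCB index (illustrative sketch).

    means:  empirical mean reward of each arm
    counts: number of times each arm has been pulled
    t:      current round (1-indexed)
    load:   normalized load level in [0, 1]
    """
    def index(i):
        if counts[i] == 0:
            return float("inf")  # pull each arm at least once
        bonus = math.sqrt(alpha * math.log(t) / counts[i])
        # Low load -> large bonus -> more exploration; high load -> exploit.
        return means[i] + (1.0 - load) * bonus
    return max(range(len(means)), key=index)

# Toy usage: two Bernoulli arms under an alternating load pattern.
means, counts = [0.0, 0.0], [0, 0]
true_means = [0.4, 0.6]
for t in range(1, 1001):
    load = 0.9 if t % 2 else 0.1  # hypothetical high/low load schedule
    a = ada_ucb_pull(means, counts, t, load)
    r = 1.0 if random.random() < true_means[a] else 0.0
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]  # incremental mean update
```

Under this scaling, rounds with load near 1 effectively reduce the index to the greedy choice, which is how the sketch concentrates exploration cost in the cheap (low-load) periods.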


Related research:

AdaLinUCB: Opportunistic Learning for Contextual Bandits (02/20/2019)
In this paper, we propose and study opportunistic contextual bandits - a...

EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits (10/07/2021)
Contextual multi-armed bandits have been studied for decades and adapted...

Opportunistic Episodic Reinforcement Learning (10/24/2022)
In this paper, we propose and study opportunistic reinforcement learning...

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation (04/02/2020)
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on...

On Exploration, Exploitation and Learning in Adaptive Importance Sampling (10/31/2018)
We study adaptive importance sampling (AIS) as an online learning proble...

Bandit Learning Through Biased Maximum Likelihood Estimation (07/02/2019)
We propose BMLE, a new family of bandit algorithms, that are formulated ...

Norm-Agnostic Linear Bandits (05/03/2022)
Linear bandits have a wide variety of applications including recommendat...
