Robust Bandit Learning with Imperfect Context

02/09/2021
by   Jianyi Yang, et al.
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), the context information available before arm selection can only be obtained via prediction, which is subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds relative to an oracle that knows the true context. Our results show that, as time goes on, MaxMinUCB and MinWD both perform asymptotically as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection and run synthetic simulations to validate our theoretical analysis.
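To make the MaxMinUCB idea concrete, here is a minimal, hypothetical sketch under a linear-reward assumption (the abstract does not specify the reward model): each round, the learner receives an uncertainty set of candidate contexts, picks the arm whose worst-case UCB over that set is largest, and updates its per-arm estimates once the true context is revealed. All names and parameters below are illustrative, not the paper's actual implementation.

```python
import numpy as np


class MaxMinUCBSketch:
    """Illustrative sketch of a MaxMinUCB-style rule (linear rewards assumed):
    maximize, over arms, the minimum UCB taken over a set of candidate contexts."""

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # exploration weight (hypothetical tuning parameter)
        # Per-arm ridge-regression statistics, as in LinUCB-style estimators.
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]  # Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # response vectors

    def select(self, candidate_contexts):
        """candidate_contexts: iterable of plausible context vectors
        (the uncertainty set around the imperfect/predicted context)."""
        best_arm, best_val = 0, -np.inf
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]  # ridge estimate of arm a's parameter
            # Worst-case (minimum) UCB for arm a over the uncertainty set.
            worst = min(
                float(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
                for x in candidate_contexts
            )
            if worst > best_val:
                best_arm, best_val = a, worst
        return best_arm

    def update(self, arm, true_context, reward):
        # The true context is revealed at the end of the round and is used
        # for the regression update, matching the setting in the abstract.
        self.A[arm] += np.outer(true_context, true_context)
        self.b[arm] += reward * true_context
```

A round then looks like: build an uncertainty set around the predicted context, call `select`, play the arm, observe the reward, and call `update` with the revealed true context.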

