A One-Size-Fits-All Solution to Conservative Bandit Problems

12/14/2020
by Yihan Du, et al.

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's cumulative reward must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB), and conservative contextual combinatorial bandits (CCCB). Unlike previous works, which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees (T-independent additive regret instead of T-dependent) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures learning performance with both the expected reward and its variability. For this extended problem, we provide a novel algorithm with O(1/T) normalized additive regret (T-independent in the cumulative form) and validate this result through empirical evaluation.
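To make the sample-path constraint concrete, the sketch below illustrates the generic conservative mechanism the abstract describes: before each round, a worst-case budget check decides whether the learner can safely follow its base bandit algorithm or must fall back to the baseline arm. This is only an illustrative sketch of the general idea, not the paper's actual One-Size-Fits-All algorithm; names such as base_policy, pull, baseline_arm, and alpha are placeholder assumptions.

# Illustrative sketch of a generic conservative wrapper (not the paper's algorithm).
# Sample-path constraint: the *actually received* cumulative reward must stay
# at least (1 - alpha) times the cumulative baseline reward at every round t:
#     sum_{s<=t} r_s  >=  (1 - alpha) * sum_{s<=t} b_s
# Assumptions (placeholders): rewards lie in [0, 1], base_policy is any bandit
# learner exposing select()/update(), and the baseline arm has known reward b.

def conservative_round(base_policy, pull, baseline_arm, b, alpha,
                       cum_reward, cum_baseline):
    """Play one round while guarding the sample-path constraint."""
    # Worst case for this round: the exploratory arm returns reward 0.
    worst_case = cum_reward + 0.0
    required = (1 - alpha) * (cum_baseline + b)

    if worst_case >= required:
        arm = base_policy.select()   # safe to explore with the base learner
    else:
        arm = baseline_arm           # fall back to the conservative baseline arm

    r = pull(arm)                    # actually received (sample-path) reward
    base_policy.update(arm, r)
    return arm, r, cum_reward + r, cum_baseline + b

The One-Size-Fits-All solution presumably instantiates such a check with different base learners for CMAB, CLB, and CCCB; the sketch above only captures the shared structure implied by the abstract.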


