Risk-Aware Algorithms for Adversarial Contextual Bandits

10/17/2016
by Wen Sun, et al.

In this work we consider adversarial contextual bandits with risk constraints. At each round, nature prepares a context, a cost for each arm, and a risk for each arm. The learner leverages the context to pull an arm and then observes the cost and risk associated with the pulled arm. In addition to minimizing the cumulative cost, the learner must satisfy a long-term risk constraint: the average cumulative risk over all pulled arms should not exceed a pre-defined threshold. To address this problem, we first study the full-information setting, in which the learner receives an adversarial convex loss and a convex constraint at each round. We develop a meta-algorithm based on online mirror descent for the full-information setting and then extend it, via expert advice, to the contextual bandit setting with risk constraints. Our algorithms achieve near-optimal regret in terms of the total cost while maintaining sublinear growth of the cumulative risk-constraint violation.
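The paper's full method combines online mirror descent with expert advice; as a simplified illustration of the underlying primal-dual idea in the full-information setting, the sketch below runs projected gradient steps on a Lagrangian with linear losses and risks over the probability simplex. Function names, step sizes, and the synthetic setup are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def primal_dual_ocd(losses, risks, threshold, eta=0.1, mu=0.1):
    """Illustrative primal-dual online updates (not the paper's exact algorithm).

    losses, risks: (T, K) arrays of per-round, per-arm costs and risks.
    threshold: per-round risk budget; constraint is risks[t] @ x <= threshold.
    Returns the cumulative cost and cumulative constraint violation.
    """
    T, K = losses.shape
    x = np.full(K, 1.0 / K)   # distribution over the K arms
    lam = 0.0                 # dual variable for the risk constraint
    total_cost, total_violation = 0.0, 0.0
    for t in range(T):
        total_cost += losses[t] @ x
        g = risks[t] @ x - threshold       # constraint value, want g <= 0
        total_violation += max(g, 0.0)
        # primal step: gradient of the Lagrangian f_t(x) + lam * g_t(x)
        grad_x = losses[t] + lam * risks[t]
        x = project_simplex(x - eta * grad_x)
        # dual step: ascent on the constraint, kept nonnegative
        lam = max(0.0, lam + mu * g)
    return total_cost, total_violation
```

The dual variable `lam` grows while the risk budget is exceeded, which in turn penalizes risky arms in the primal step; this is the mechanism that keeps the cumulative constraint violation sublinear in analyses of this style.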


