Conditionally Risk-Averse Contextual Bandits

10/24/2022
by   Mónika Farsang, et al.

We want to apply contextual bandits to scenarios where average-case statistical guarantees are inadequate. Happily, we discover that composing the reduction to online regression with expectile loss is analytically tractable, computationally convenient, and empirically effective. The result is the first risk-averse contextual bandit algorithm with an online regret guarantee. We state our precise regret guarantee and conduct experiments on diverse scenarios in dynamic pricing, inventory management, and self-tuning software, including results from a production exascale cloud data processing system.
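For readers unfamiliar with expectile loss, the following is a minimal sketch of the standard asymmetric squared loss and of a fixed-point computation of the empirical expectile. It is an illustration only, not the paper's algorithm: the function names `expectile_loss` and `expectile`, the parameter `q`, and the convention that under-predictions are weighted by `q` are all assumptions, since the abstract does not state the parameterization.

```python
def expectile_loss(y, pred, q):
    """Asymmetric squared loss (assumed convention).

    Under-predictions (y >= pred) are weighted by q,
    over-predictions (y < pred) by 1 - q. With q = 0.5 this is
    half the ordinary squared loss.
    """
    residual = y - pred
    weight = q if residual >= 0 else 1.0 - q
    return weight * residual ** 2


def expectile(samples, q, iters=100):
    """Empirical q-expectile: the minimizer of the average expectile loss.

    Uses the fixed-point form of the first-order condition: the
    minimizer is a weighted mean whose weights depend on which side
    of the minimizer each sample falls.
    """
    m = sum(samples) / len(samples)  # start from the ordinary mean
    for _ in range(iters):
        weights = [q if y >= m else 1.0 - q for y in samples]
        m = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return m


rewards = [0.0, 1.0]
mean_like = expectile(rewards, 0.5)  # q = 0.5 recovers the mean
pessimistic = expectile(rewards, 0.2)  # q < 0.5 leans toward low rewards
```

Under this convention, choosing `q` below 0.5 for a reward signal penalizes over-prediction more heavily, so the fitted value sits below the mean. That is one way to read "risk-averse" here, though the paper's full text should be consulted for its exact definition.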


