Conditionally Risk-Averse Contextual Bandits

10/24/2022

We wish to apply contextual bandits to scenarios where average-case statistical guarantees are inadequate. Happily, we find that composing a reduction to online regression with the expectile loss is analytically tractable, computationally convenient, and empirically effective. The result is the first risk-averse contextual bandit algorithm with an online regret guarantee. We state our precise regret guarantee and report experiments in diverse scenarios from dynamic pricing, inventory management, and self-tuning software, including results from a production exascale cloud data processing system.
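As a rough illustration of the expectile loss mentioned above, the sketch below defines the standard asymmetric squared loss |tau - 1{y < q}| (y - q)^2, computes a batch expectile by fixed-point iteration, and trains a linear model online with plain SGD on that loss. The names `expectile_loss`, `batch_expectile`, and `OnlineExpectileRegressor`, the learning rate, and the SGD update are illustrative assumptions, not the paper's algorithm. The abstract does not specify the regressor or how actions are chosen from its predictions, and which tail a given tau emphasizes depends on whether the target is a reward or a cost.

```python
from typing import List, Sequence


def expectile_loss(y: float, q: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1{y < q}| * (y - q)**2."""
    u = y - q
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def batch_expectile(samples: Sequence[float], tau: float, iters: int = 100) -> float:
    """Minimizer of the mean expectile loss over `samples`.

    First-order condition: sum_i w_i * (y_i - q) = 0, so q is a weighted
    mean whose weights depend on which side of q each sample lies.
    We iterate that weighted mean until it stops changing.
    """
    q = sum(samples) / len(samples)  # start from the ordinary mean
    for _ in range(iters):
        w = [tau if y >= q else 1.0 - tau for y in samples]
        q_new = sum(wi * y for wi, y in zip(w, samples)) / sum(w)
        if q_new == q:
            break
        q = q_new
    return q


class OnlineExpectileRegressor:
    """Hypothetical linear model fit online by SGD on the expectile loss."""

    def __init__(self, dim: int, tau: float, lr: float = 0.05) -> None:
        self.w: List[float] = [0.0] * dim
        self.tau = tau
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> None:
        u = y - self.predict(x)
        weight = self.tau if u >= 0 else 1.0 - self.tau
        # Gradient of weight * u**2 with respect to w is -2 * weight * u * x.
        for i, xi in enumerate(x):
            self.w[i] += self.lr * 2.0 * weight * u * xi
```

For example, with samples [0, 10] and tau = 0.2 the first-order condition 0.2 (10 - q) = 0.8 q gives q = 2, well below the mean of 5, while tau = 0.5 recovers the mean. The online regressor, fed the same two outcomes repeatedly with a constant feature, hovers near that value; SGD keeps it from settling exactly.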
