Near-Optimal MNL Bandits Under Risk Criteria

09/26/2020
by Guangyu Xi, et al.

We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and prove that they suffer near-optimal regret. As a complement, we also conduct experiments with both synthetic and real data to show the empirical performance of our proposed algorithms.
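The abstract does not include code, but the two building blocks it names are standard: the multinomial logit (MNL) choice model, which turns an offered assortment into purchase probabilities, and a risk criterion such as conditional value-at-risk (CVaR) applied to the resulting revenue. The sketch below illustrates both under made-up preference weights, revenues, and a CVaR level; it is not the paper's algorithm, only the objects the algorithm optimizes.

import numpy as np

def mnl_choice_probs(v, assortment):
    # MNL choice probabilities: item i in the assortment is chosen with
    # probability v_i / (1 + sum_j v_j); the no-purchase option has
    # implicit weight 1.
    weights = v[assortment]
    denom = 1.0 + weights.sum()
    return weights / denom, 1.0 / denom

def empirical_cvar(samples, alpha=0.1):
    # CVaR_alpha of revenue: the mean of the worst alpha-fraction of
    # outcomes (a risk-averse summary of the lower tail).
    sorted_samples = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(samples))))
    return sorted_samples[:k].mean()

# Hypothetical instance: 4 items with preference weights v and revenues r.
rng = np.random.default_rng(0)
v = np.array([0.8, 0.5, 1.2, 0.3])
r = np.array([4.0, 6.0, 3.0, 8.0])
assortment = np.array([0, 2, 3])

probs, p0 = mnl_choice_probs(v, assortment)
outcomes = np.concatenate(([0.0], r[assortment]))   # 0 = no purchase
dist = np.concatenate(([p0], probs))
samples = rng.choice(outcomes, size=10_000, p=dist)  # simulated revenues

print("expected revenue:", dist @ outcomes)
print("CVaR_0.1 of revenue:", empirical_cvar(samples, alpha=0.1))

A risk-aware bandit algorithm would rank assortments by the CVaR (or Sharpe ratio, or entropy risk) of this revenue distribution rather than by its mean, which is what distinguishes the setting from ordinary MNL bandits.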


