
Near-Optimal MNL Bandits Under Risk Criteria

by Guangyu Xi et al.

We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropic risk, and prove that they attain near-optimal regret. As a complement, we also conduct experiments on both synthetic and real data to demonstrate the empirical performance of the proposed algorithms.
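To make the three named risk criteria concrete, here is a minimal sketch of how each could be estimated from empirical revenue samples. This is illustrative only: the function names, the sample-based estimators, and the parameter choices (`alpha`, `theta`) are assumptions for exposition, not the paper's definitions or algorithms.

```python
import math

def cvar(samples, alpha=0.1):
    # Conditional value-at-risk: mean of the worst alpha-fraction of revenues.
    s = sorted(samples)
    k = max(1, math.ceil(alpha * len(s)))
    return sum(s[:k]) / k

def sharpe_ratio(samples):
    # Ratio of mean revenue to its standard deviation (population variance).
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu / math.sqrt(var) if var > 0 else float("inf")

def entropic_risk(samples, theta=1.0):
    # Entropic risk of a reward X: -(1/theta) * log E[exp(-theta * X)].
    # Recovers the plain mean as theta -> 0; smaller theta = less risk-averse.
    m = sum(math.exp(-theta * x) for x in samples) / len(samples)
    return -math.log(m) / theta
```

In an MNL bandit, the revenue distribution of an assortment depends on the multinomial-logit choice probabilities, so in practice these criteria would be applied to that induced distribution rather than to raw i.i.d. samples; the sketch only shows the functionals themselves.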



