Generalized Risk-Aversion in Stochastic Multi-Armed Bandits

05/05/2014
by Alexander Zimin, et al.

We consider the problem of minimizing regret in the stochastic multi-armed bandit setting, where the measure of goodness of an arm is not the mean return but some general function of the mean and the variance. We characterize the conditions under which learning is possible and present examples for which no natural algorithm can achieve sublinear regret.
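
To make the setting concrete, here is a minimal sketch of how the best arm can change once arms are scored by a function of mean and variance rather than by the mean alone. The criterion `mean_variance`, the penalty weight `rho`, and the reward samples are all hypothetical choices for illustration and are not taken from the paper, which treats general functions of the two moments.

```python
import statistics


def mean_variance(mean, variance, rho=1.0):
    """Hypothetical risk-averse criterion: mean penalized by rho * variance."""
    return mean - rho * variance


def best_arm(samples_per_arm, criterion):
    """Return the index of the arm with the highest score under `criterion`.

    Each arm is scored on its empirical mean and (population) variance.
    """
    scores = []
    for samples in samples_per_arm:
        m = statistics.mean(samples)
        v = statistics.pvariance(samples, mu=m)
        scores.append(criterion(m, v))
    return max(range(len(scores)), key=lambda i: scores[i])


# Illustrative, made-up reward samples (not data from the paper).
# Arm 0: mean 1.0, variance 4.0. Arm 1: mean 0.8, variance 0.01.
arms = [
    [-1.0, 3.0, -1.0, 3.0],
    [0.7, 0.9, 0.7, 0.9],
]

# Risk-neutral scoring ignores the variance entirely.
risk_neutral_choice = best_arm(arms, criterion=lambda m, v: m)
# Risk-averse scoring trades mean against variance.
risk_averse_choice = best_arm(arms, criterion=mean_variance)

print(risk_neutral_choice, risk_averse_choice)
```

Under the mean alone, arm 0 is preferred; under the assumed mean-variance criterion, its high variance makes arm 1 the better choice. Regret in this setting would be measured against the arm that is optimal for the chosen criterion.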

research
10/11/2019

Batched Multi-Armed Bandits with Optimal Regret

We present a simple and efficient algorithm for the batched stochastic m...
research
06/24/2022

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for ...
research
01/05/2018

Nonparametric Stochastic Contextual Bandits

We analyze the K-armed bandit problem where the reward for each arm is a...
research
02/12/2018

Multi-Armed Bandits on Unit Interval Graphs

An online learning problem with side information on the similarity and d...
research
05/09/2021

Stochastic Multi-Armed Bandits with Control Variates

This paper studies a new variant of the stochastic multi-armed bandits p...
research
06/04/2018

A General Approach to Multi-Armed Bandits Under Risk Criteria

Different risk-related criteria have received recent interest in learnin...
research
05/21/2017

Instrument-Armed Bandits

We extend the classic multi-armed bandit (MAB) model to the setting of n...
