Decision Variance in Online Learning

07/24/2018
by   Sattar Vakili, et al.

Online learning has classically focused on the expected behaviour of learning policies. Recently, risk-averse online learning has gained much attention. This paper studies a risk-averse multi-armed bandit problem in which the performance of policies is measured by the mean-variance of the rewards. The variance of the rewards depends on the variance of the underlying processes as well as the variance of the player's decisions. The performance of two existing policies is analyzed, and new fundamental limitations on risk-averse learning are established. In particular, it is shown that although an O(log T) distribution-dependent regret in time T is achievable (similar to the risk-neutral setting), the worst-case (i.e., minimax) regret is lower bounded by Ω(T) (in contrast to the Ω(√T) lower bound in the risk-neutral setting). The lower bound results are even stronger in the sense that they are proven for the case of online learning with full feedback.
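
The claim that reward variance has both a process component and a decision component can be illustrated with the law of total variance. The sketch below is only an illustration under simplifying assumptions: a single reward is drawn, and the arm is chosen at random with fixed probabilities. The function name `reward_variance` and the two-arm numbers are hypothetical, and the paper's formal mean-variance measure over a horizon T is not reproduced here.

```python
def reward_variance(p, means, variances):
    """Split the variance of one reward into process and decision parts.

    Assumes arm i is chosen with probability p[i], independently of the rewards.
    Law of total variance: Var(X) = E[Var(X | arm)] + Var(E[X | arm]).
    """
    # Average of the per-arm variances: the part due to the underlying processes.
    process_part = sum(pi * vi for pi, vi in zip(p, variances))
    # Spread of the arm means under the random choice: the part due to the decision.
    overall_mean = sum(pi * mi for pi, mi in zip(p, means))
    decision_part = sum(pi * (mi - overall_mean) ** 2 for pi, mi in zip(p, means))
    return process_part, decision_part


# Hypothetical two-arm example (illustrative numbers, not taken from the paper).
means = [0.0, 1.0]
variances = [0.25, 0.25]

# A player that mixes evenly between the arms adds decision variance.
mixed_process, mixed_decision = reward_variance([0.5, 0.5], means, variances)

# A player that always plays one arm has no decision variance.
fixed_process, fixed_decision = reward_variance([1.0, 0.0], means, variances)
```

In this toy setting, a player who commits to one arm carries only the process variance, while a player who keeps switching between arms with different means pays an extra term on top of it.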

research
05/04/2015

On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits

We consider the problem of learning in single-player and multiplayer mul...
research
02/12/2018

Multi-Armed Bandits on Unit Interval Graphs

An online learning problem with side information on the similarity and d...
research
06/07/2022

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits

We design new policies that ensure both worst-case optimality for expect...
research
08/18/2012

Online Learning with Predictable Sequences

We present methods for online linear optimization that take advantage of...
research
02/07/2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters

We investigate the extent to which offline demonstration data can improv...
research
03/13/2019

Online Budgeted Learning for Classifier Induction

In real-world machine learning applications, there is a cost associated ...
research
06/15/2023

Optimal Best-Arm Identification in Bandits with Access to Offline Data

Learning paradigms based purely on offline data as well as those based s...
