Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

06/24/2022
by Yifan Lin, et al.

In this paper, we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of O((1 + ρ + 1/ρ) · d ln T · ln(K/δ) · √(d K T^(1+2ϵ) · ln(K/δ) · (1/ϵ))) that holds with probability 1 − δ under the mean-variance criterion with risk tolerance ρ, for any 0 < ϵ < 1/2 and 0 < δ < 1. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
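
To make the setup concrete, below is a minimal, illustrative sketch of Thompson Sampling for the disjoint linear model with arms scored by a mean-variance criterion. It is not the paper's exact algorithm: the Gaussian posterior, the use of the posterior predictive variance as a stand-in for the reward variance, and the score ρ·(mean) − (variance) are simplifying assumptions made only for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's exact algorithm): Thompson
# Sampling for the disjoint linear contextual bandit, where each arm is
# scored by a mean-variance criterion with risk tolerance rho.
# Assumptions made for illustration: a Gaussian posterior per arm, the
# posterior predictive variance as a proxy for the reward variance, and
# the score rho * (estimated mean) - (estimated variance).

class MeanVarianceLinTS:
    def __init__(self, n_arms, dim, rho=1.0, v=1.0):
        self.rho = rho                                   # risk tolerance
        self.v = v                                       # posterior scale
        self.B = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.f = [np.zeros(dim) for _ in range(n_arms)]  # per-arm sums of reward * context

    def select_arm(self, contexts):
        """contexts: one d-dimensional feature vector per arm."""
        scores = []
        for a, x in enumerate(contexts):
            B_inv = np.linalg.inv(self.B[a])
            mu_hat = B_inv @ self.f[a]
            # Thompson step: sample a parameter from the Gaussian posterior.
            theta = np.random.multivariate_normal(mu_hat, self.v ** 2 * B_inv)
            mean_est = x @ theta        # sampled mean reward
            var_est = x @ B_inv @ x     # predictive variance used as a proxy
            scores.append(self.rho * mean_est - var_est)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Ridge-regression style posterior update for the chosen arm only.
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

A round then consists of calling select_arm on the revealed contexts, pulling the returned arm, and feeding the observed reward back through update; the regret studied in the paper is measured against the arm with the largest true mean-variance reward at each round.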

