Continuous Mean-Covariance Bandits

02/24/2021
by Yihan Du, et al.

Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model that explicitly takes option correlation into account. Specifically, in CMCB, a learner sequentially chooses weight vectors over the given options and observes random feedback according to those decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured by the option covariance. To capture important reward-observation scenarios in practice, we consider three feedback settings: full-information, semi-bandit, and full-bandit feedback. We propose novel algorithms that achieve optimal regret (up to logarithmic factors), and provide matching lower bounds to confirm their optimality. Our experimental results also demonstrate the superiority of the proposed algorithms. To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures affect learning performance.
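The mean-covariance trade-off the abstract describes can be made concrete with a small numerical sketch. The snippet below is a hypothetical illustration, not the paper's algorithm: it assumes the trade-off takes the standard mean-covariance form f(w) = w^T theta - rho * w^T Sigma w, and simulates one round of the full-information feedback setting; the values of theta, Sigma, and rho are placeholders chosen for the example.

    import numpy as np

    # Hypothetical CMCB-style environment (illustrative values, not from the paper):
    # d options with unknown mean rewards theta and unknown covariance Sigma.
    rng = np.random.default_rng(0)

    d = 3
    theta = np.array([0.5, 0.3, 0.7])       # assumed mean rewards of the options
    Sigma = np.array([[0.20, 0.05, 0.00],   # assumed covariance among the options
                      [0.05, 0.10, 0.02],
                      [0.00, 0.02, 0.30]])
    rho = 1.0                               # risk-aversion parameter (assumed)

    def objective(w):
        """Mean-covariance trade-off the learner wants to maximize:
        expected reward of the weighted portfolio minus rho times its variance."""
        return w @ theta - rho * w @ Sigma @ w

    # One round: the learner plays a weight vector w on the simplex and the
    # environment draws a random reward vector with the given mean/covariance.
    w = np.array([0.3, 0.3, 0.4])
    reward_vec = rng.multivariate_normal(theta, Sigma)

    # Full-information feedback: the learner observes the entire reward vector.
    # (Semi-bandit would reveal only the coordinates with nonzero weight;
    # full-bandit would reveal only the scalar payoff w @ reward_vec.)
    payoff = w @ reward_vec
    print(objective(w), payoff)

Under this form of the objective, the covariance term penalizes concentrating weight on strongly correlated options, which is why per-option variance alone cannot capture the risk of a weight vector.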


Related research

12/02/2021 | Risk-Aware Algorithms for Combinatorial Semi-Bandits
In this paper, we study the stochastic combinatorial multi-armed bandit ...

09/15/2022 | Risk-aware linear bandits with convex loss
In decision-making problems such as the multi-armed bandit, an agent lea...

02/01/2020 | Thompson Sampling Algorithms for Mean-Variance Bandits
The multi-armed bandit (MAB) problem is a classical learning task that e...

02/22/2018 | Collaboratively Learning the Best Option, Using Bounded Memory
We consider multi-armed bandit problems in social groups wherein each in...

05/12/2022 | A Survey of Risk-Aware Multi-Armed Bandits
In several applications such as clinical trials and financial portfolio ...

09/14/2018 | Dueling Bandits with Qualitative Feedback
We formulate and study a novel multi-armed bandit problem called the qua...

08/25/2021 | A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
This paper unifies the design and simplifies the analysis of risk-averse...
