
Continuous Mean-Covariance Bandits

by Yihan Du, et al.

Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model that explicitly takes option correlation into account. Specifically, in CMCB, a learner sequentially chooses weight vectors over the given options and observes random feedback according to its decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured by the option covariance. To capture important reward observation scenarios in practice, we consider three feedback settings: full-information, semi-bandit and full-bandit feedback. We propose novel algorithms with optimal regret (up to logarithmic factors), and provide matching lower bounds to establish their optimality. Our experimental results also demonstrate the superiority of the proposed algorithms. To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact learning performance.
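To make the reward-risk trade-off concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not spell out: the objective is taken to be the expected reward of a weight vector minus a risk-aversion parameter `rho` times its variance under the option covariance, and the three feedback settings are read as "see every option's reward", "see the rewards of options with positive weight", and "see only the weighted sum". The names `mean_covariance_value`, `observe`, `theta`, `sigma` and `rho` are illustrative, not taken from the paper.

```python
def mean_covariance_value(w, theta, sigma, rho):
    """Assumed objective: w . theta - rho * (w^T Sigma w)."""
    d = len(w)
    mean = sum(w[i] * theta[i] for i in range(d))
    risk = sum(w[i] * sigma[i][j] * w[j] for i in range(d) for j in range(d))
    return mean - rho * risk


def observe(w, rewards, feedback):
    """What the learner sees in one round (assumed reading of each setting)."""
    if feedback == "full-information":
        return list(rewards)  # reward of every option
    if feedback == "semi-bandit":
        # rewards only for options that received positive weight
        return [r if wi > 0 else None for wi, r in zip(w, rewards)]
    if feedback == "full-bandit":
        return sum(wi * r for wi, r in zip(w, rewards))  # weighted sum only
    raise ValueError(f"unknown feedback setting: {feedback}")


# Two options with equal mean reward and perfectly negative correlation.
theta = [0.2, 0.2]
sigma = [[1.0, -1.0], [-1.0, 1.0]]
rho = 0.5

single = mean_covariance_value([1.0, 0.0], theta, sigma, rho)
mixed = mean_covariance_value([0.5, 0.5], theta, sigma, rho)
```

In this toy instance both weight vectors have the same expected reward, but splitting the weight cancels the risk entirely, so `mixed` exceeds `single`. A per-option variance measure would not reveal this, which is the kind of effect the covariance-based objective is meant to capture.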


Risk-Aware Algorithms for Combinatorial Semi-Bandits

In this paper, we study the stochastic combinatorial multi-armed bandit ...

Risk-aware linear bandits with convex loss

In decision-making problems such as the multi-armed bandit, an agent lea...

Thompson Sampling Algorithms for Mean-Variance Bandits

The multi-armed bandit (MAB) problem is a classical learning task that e...

Collaboratively Learning the Best Option, Using Bounded Memory

We consider multi-armed bandit problems in social groups wherein each in...

A Survey of Risk-Aware Multi-Armed Bandits

In several applications such as clinical trials and financial portfolio ...

Dueling Bandits with Qualitative Feedback

We formulate and study a novel multi-armed bandit problem called the qua...

Harnessing Natural Fluctuations: Analogue Computer for Efficient Socially Maximal Decision Making

Each individual handles many tasks of finding the most profitable option...