Risk-Averse Stochastic Convex Bandit

10/01/2018
by Adrian Rivera Cardoso, et al.

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first is a descent-type algorithm that is easy to implement. The second, which combines the ellipsoid method with a center-point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk aversion in the online convex bandit problem.
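The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of what a descent-type bandit method with a risk-averse objective can look like. It assumes a mean-variance risk measure and a one-point spherical gradient estimator (Flaxman et al. style); the names risk_averse_bandit_descent, noisy_loss, lam, delta, and eta are hypothetical and not taken from the paper, and the paper's actual algorithms and regret analysis differ.

```python
import numpy as np

def project_to_ball(x, radius):
    """Project x onto the Euclidean ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def risk_averse_bandit_descent(noisy_loss, dim, rounds, lam=1.0,
                               delta=0.1, eta=0.01, radius=1.0, seed=0):
    """Illustrative sketch (not the paper's method): projected descent with
    bandit feedback on a mean-variance surrogate, E[loss] + lam * Var[loss].

    noisy_loss: callable x -> float, one stochastic loss sample per query.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)          # kept in the (radius - delta) ball
    mean_est = 0.0             # running estimate of E[loss]
    for t in range(1, rounds + 1):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)             # random unit direction
        y = x + delta * u                  # perturbed, still feasible
        loss = noisy_loss(y)               # only this scalar is observed

        # Running mean feeds the per-round mean-variance surrogate.
        mean_est += (loss - mean_est) / t
        surrogate = loss + lam * (loss ** 2 - 2.0 * mean_est * loss)

        # One-point gradient estimate of the surrogate, then projected step.
        grad_est = (dim / delta) * surrogate * u
        x = project_to_ball(x - eta * grad_est, radius - delta)
    return x

if __name__ == "__main__":
    # Usage: noisy quadratic loss around a hypothetical target point.
    target = np.array([0.3, -0.2])
    noise = np.random.default_rng(1)
    loss = lambda z: float(np.sum((z - target) ** 2) + 0.05 * noise.normal())
    print("final decision:", risk_averse_bandit_descent(loss, dim=2, rounds=5000))
```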


research · 11/21/2017
Regret Analysis for Continuous Dueling Bandit
The dueling bandit is a learning framework wherein the feedback informat...

research · 07/23/2020
Online Boosting with Bandit Feedback
We consider the problem of online boosting for regression tasks, when on...

research · 02/22/2017
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
We consider the problem of bandit optimization, inspired by stochastic o...

research · 09/06/2022
A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
We consider risk-averse learning in repeated unknown games where the goa...

research · 09/12/2017
Setpoint Tracking with Partially Observed Loads
We use online convex optimization (OCO) for setpoint tracking with uncer...

research · 03/16/2022
Risk-Averse No-Regret Learning in Online Convex Games
We consider an online stochastic game with risk-averse agents whose goal...

research · 10/10/2022
Towards an efficient and risk aware strategy for guiding farmers in identifying best crop management
Identification of best performing fertilizer practices among a set of co...
