Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

09/09/2022
by   Yi Shen, et al.

In this paper, we consider a risk-averse multi-armed bandit (MAB) problem in which the goal is to learn a policy that minimizes the risk of low returns, rather than maximizing the expected return itself, as in the standard risk-neutral MAB. Specifically, we formulate the problem as transfer learning between an expert and a learner agent in the presence of contexts that are observable by the expert but not by the learner; from the learner's perspective, these contexts are unobserved confounders (UCs). Given a dataset generated by the expert that omits the UCs, the goal of the learner is to identify the true minimum-risk arm in fewer online learning steps than learning from scratch would require, while avoiding the biased decisions that can arise from the presence of UCs in the expert's data.
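The abstract does not specify which risk measure the paper adopts, so as a minimal sketch of the risk-averse objective, the following assumes Conditional Value-at-Risk (CVaR) at level alpha as the criterion and a simple epsilon-greedy exploration rule; the function names (empirical_cvar, risk_averse_bandit) and all parameters are illustrative choices, not the paper's method.

```python
# A minimal sketch of a risk-averse bandit loop. Assumption: CVaR_alpha
# (the mean of the worst alpha-fraction of returns) is the risk criterion,
# and exploration is epsilon-greedy -- both illustrative, not the paper's
# algorithm.
import numpy as np

def empirical_cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns."""
    samples = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[:k].mean()

def risk_averse_bandit(pull_arm, n_arms, horizon, alpha=0.1, eps=0.1,
                       rng=None):
    """Select the arm with the highest empirical CVaR (least downside risk)."""
    rng = rng or np.random.default_rng()
    rewards = [[] for _ in range(n_arms)]
    # Pull each arm once so every CVaR estimate is defined.
    for a in range(n_arms):
        rewards[a].append(pull_arm(a))
    for _ in range(horizon - n_arms):
        if rng.random() < eps:                       # explore
            a = int(rng.integers(n_arms))
        else:                                        # exploit best CVaR
            a = int(np.argmax([empirical_cvar(r, alpha) for r in rewards]))
        rewards[a].append(pull_arm(a))
    return int(np.argmax([empirical_cvar(r, alpha) for r in rewards]))

# Toy example: arm 1 has a lower mean but far less downside risk.
rng = np.random.default_rng(0)
arms = [lambda: rng.normal(1.0, 3.0), lambda: rng.normal(0.8, 0.2)]
best = risk_averse_bandit(lambda a: arms[a](), n_arms=2, horizon=2000)
print("selected arm:", best)  # typically arm 1 under the CVaR criterion
```

Note the contrast with the risk-neutral objective: a mean-maximizing learner would prefer arm 0 here, while the CVaR criterion prefers the arm whose worst-case returns are less severe.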
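The abstract also warns that UCs in the expert's data can bias the learner. A toy simulation of that effect, with an invented reward model and an invented expert policy (all numbers are purely illustrative), shows how pooling logged data over an unobserved context skews naive per-arm estimates, a Simpson's-paradox-style effect:

```python
# Why the expert's logged data can mislead the learner: the expert chose
# arms based on a context u that the learner never observes. Pooling over
# u biases the naive per-arm averages. The reward model is invented purely
# for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.integers(2, size=n)      # context seen only by the expert
expert_arm = u                   # illustrative expert policy: play arm u
# True mean reward of arm a in context u (illustrative numbers):
mean = np.array([[0.9, 0.1],     # arm 0: great if u=0, poor if u=1
                 [0.6, 0.6]])    # arm 1: same return in both contexts
reward = rng.normal(mean[expert_arm, u], 0.1)

# Naive learner estimates from the logged data (u is absent from the log):
for a in (0, 1):
    print(f"logged mean of arm {a}: {reward[expert_arm == a].mean():.2f}")
# Arm 0 looks like 0.9 in the log, but a learner that cannot see u and
# always plays arm 0 actually earns (0.9 + 0.1) / 2 = 0.5 < 0.6, so the
# naive estimate points to the wrong arm.
```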

Related research

Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning (06/07/2021)
In this paper, we consider a transfer Reinforcement Learning (RL) problem ...

Online Meta-Learning in Adversarial Multi-Armed Bandits (05/31/2022)
We study meta-learning for adversarial multi-armed bandits. We consider ...

Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits (04/30/2019)
In this paper, we study multi-armed bandit problems in explore-then-commit ...

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards (06/03/2019)
Classical multi-armed bandit problems use the expected value of an arm a...

Online learning with feedback graphs and switching costs (10/23/2018)
We study online learning when partial feedback information is provided f...

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk (04/01/2022)
We investigate a natural but surprisingly unstudied approach to the multi-armed ...

A General Approach to Multi-Armed Bandits Under Risk Criteria (06/04/2018)
Different risk-related criteria have received recent interest in learning ...
