Robust Contextual Bandit via the Capped-ℓ_2 norm

08/17/2017 · by Feiyun Zhu, et al.

This paper considers the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. They use least-square-based algorithms to estimate the expected reward, which are sensitive to outliers. To deal with outliers, we propose a novel robust actor-critic contextual bandit method for the mHealth intervention. In the critic updating, the capped-ℓ₂ norm is used to measure the approximation error, which prevents outliers from dominating the objective and yields a set of sample weights. These weights give a weighted objective for the actor updating, assigning zero weight to the samples that are badly corrupted by noise. As a result, the robustness of both the actor and critic updating is enhanced. The capped-ℓ₂ norm has a key thresholding parameter; we provide a reliable method to set it properly, based on one of the most fundamental definitions of outliers in statistics. Extensive experiment results demonstrate that our method achieves almost identical results to the state-of-the-art methods on datasets without outliers and dramatically outperforms them on datasets corrupted by outliers.




1 Introduction

Nowadays, billions of people frequently use various kinds of smart devices, such as smartphones and wearable activity sensors [13, 14, 26, 25]. It is increasingly popular in the scientific community to use state-of-the-art artificial intelligence technology, supercomputers and big data to facilitate the prediction of healthcare tasks [21, 29, 20]. In this paper, we use mobile health (mHealth) technologies to collect and analyze real-time data from users. Based on that data, the goal of mHealth is to decide when, where, and how to deliver in-time interventions that best serve users, helping them lead healthier lives. For example, mHealth interventions can guide people to reduce alcohol abuse, increase physical activity, and regain control over eating disorders and obesity/weight management [13, 14, 9].

The tailoring of mHealth interventions is generally modeled as a sequential decision-making (SDM) problem, and the contextual bandit provides a paradigm for SDM [18, 22, 25, 26]. In mHealth, the first contextual bandit [10] was proposed in 2014. It works in an actor-critic setting with an explicitly parameterized stochastic policy. This setting has two advantages: (1) the actor-critic algorithm has good convergence properties with low variance [8]; (2) by analyzing the estimated parameters, we can understand which key features contribute most to the policy, which is important for behavioral scientists when designing the state (feature). Later, Lei [9] improved the method by emphasizing exploration and introducing a stochasticity constraint on the policy coefficients.

Those two methods serve as a good start for mHealth. However, they assume that there are no outliers in the data. They use least-square-based algorithms to learn the expected reward, which are prone to the presence of outliers [24, 27, 23, 19, 5]. In practice, there are various kinds of complex noise in the mHealth system. For example, wearable devices may be unable to accurately record the states and rewards of users under various conditions. mHealth also requires self-reports to deliver effective interventions, but some users are unwilling to complete them and sometimes fill them out randomly to save time. We treat the various complex noises in the system as outliers, and want to get rid of such extreme observations.

In this paper, a novel robust actor-critic contextual bandit is proposed to deal with the outlier issue in the mHealth system. The capped-ℓ₂ norm is used in the estimation of the expected reward in the critic updating. As a result, we obtain a set of sample weights. With them, we propose a weighted objective for the actor updating, which gives zero weight to the samples that are ineffective for the critic updating. As a result, the robustness of both the actor and critic updating is greatly enhanced. The capped-ℓ₂ norm has a key thresholding parameter; we propose a principled method to set it, based on a fundamental definition of outliers in statistics. With it, we achieve the conflicting goals of enhancing the robustness of our algorithm while obtaining almost the same results as the state-of-the-art method on datasets without outliers. Extensive experiment results show that, across a variety of parameter settings, our method obtains clear gains compared with the state-of-the-art methods.

2 Preliminaries

The expected reward is a core concept in the contextual bandit to evaluate the policy for the dynamic system. In the case of large state or action spaces, a parameterized approximation is widely adopted: r̄(s, a) ≈ x(s, a)ᵀw, which is assumed to lie in a low-dimensional space, where w is the unknown coefficient vector and x(s, a) is the contextual feature for the state-action pair.

The aim of the actor-critic algorithm is to learn an optimal policy that maximizes the reward over all the state-action pairs. The objective is θ* = argmax_θ J(θ), where J(θ) = Σ_s d(s) Σ_a π_θ(a|s) r̄(s, a) is the average reward over all the possible states and actions, and d(s) is a reference distribution over states. To make the actor updating a well-posed objective, various constraints on θ are considered [10]. Specifically, the stochasticity constraint is introduced to reduce habituation and facilitate learning [9]. The stochasticity constraint specifies that the probability of selecting each action is at least p₀ for more than a 1 − α₀ fraction of contexts: P{s : min_a π_θ(a|s) ≥ p₀} ≥ 1 − α₀. Via the Markov inequality, a relaxed and smoother stochasticity constraint is obtained [9], leading to the objective

J(θ) = Σ_s d(s) Σ_a π_θ(a|s) r̄(s, a) − λ θᵀ E_d[g(s) g(s)ᵀ] θ,    (1)

where g(s) = φ(s, 1) − φ(s, 0) and φ(s, a) is the feature for the policy [9].

According to (1), we need an estimate of the expected reward to form the objective. This process is called the critic updating [8]. Current methods generally use ridge regression to learn it. The objective is defined as

min_w Σ_{i=1}^{T} (r_i − x_iᵀw)² + μ ‖w‖₂²,    (2)

which has a closed-form solution w = (XXᵀ + μI)⁻¹Xr, where X is the designed matrix whose i-th column is x_i = x(s_i, a_i), and r consists of all the immediate rewards. However, like the existing least-square-based algorithms, objective (2) is sensitive to the existence of outliers [16, 15].
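As a concrete illustration of the ridge-regression critic above, the sketch below solves the closed-form system w = (XXᵀ + μI)⁻¹Xr on synthetic data. All names (X, r, mu) and the data-generation step are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 200                        # feature dimension, number of tuples
w_true = rng.normal(size=p)          # ground-truth coefficients (synthetic)
X = rng.normal(size=(p, T))          # designed matrix, columns are x_i
r = X.T @ w_true + 0.01 * rng.normal(size=T)   # immediate rewards + noise

mu = 0.1                             # ridge regularizer
# closed-form ridge solution: solve (X X^T + mu*I) w = X r
w_hat = np.linalg.solve(X @ X.T + mu * np.eye(p), X @ r)
```

With small noise and no outliers the estimate recovers the true coefficients closely; a single corrupted reward, however, shifts w_hat noticeably, which motivates the capped loss in Section 3.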

3 Robust Contextual Bandit with the Capped-ℓ₂ Norm

To boost the robustness of the actor-critic learning, the capped-ℓ₂ norm is used to measure the approximation error:

min_w Σ_{i=1}^{T} min{(r_i − x_iᵀw)², ε} + μ ‖w‖₂².    (3)

By properly setting the value of ε, we can get rid of the outliers that lie far away from the majority of samples while keeping the effective samples. When ε is too large, outliers are left in the data; when ε is too small, many effective samples are removed, leading to unstable estimations.
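The effect of the capped loss can be seen element-wise: any residual whose square exceeds the threshold contributes the same constant, so a single outlier cannot dominate the sum. A minimal sketch (the value of the threshold here is arbitrary):

```python
import numpy as np

eps = 4.0
residuals = np.array([0.5, -1.0, 1.5, 100.0])   # the last entry is an outlier
# capped-ell_2 loss per sample: min{e_i^2, eps}
capped = np.minimum(residuals ** 2, eps)
```

The clean residuals keep their squared loss, while the outlier's contribution is capped at eps instead of 10000.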

It is important to properly set the value of ε. We propose an effective method to do so, derived from one of the most widely accepted outlier definitions in statistics: when a boxplot is used to give a descriptive illustration of the distribution of a dataset, the samples that lie more than 1.5 × IQR above the third quartile are treated as outliers. Thus, we set

ε = Q₃ + β (Q₃ − Q₁),    (4)

where Q₁ and Q₃ are the first and third quartiles of the squared approximation errors, IQR = Q₃ − Q₁ is the interquartile range, and β is a tuning parameter that gives us a flexible setting of ε; it is set to 1 by default.
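The quartile rule above can be sketched directly. As an assumption (matching the capped term in (3)), the rule is applied to the squared residuals; the helper name `set_eps` is illustrative.

```python
import numpy as np

def set_eps(sq_residuals, beta=1.0):
    """Threshold eps = Q3 + beta * IQR over the squared residuals."""
    q1, q3 = np.percentile(sq_residuals, [25, 75])
    return q3 + beta * (q3 - q1)       # IQR = Q3 - Q1

# 99 well-fit samples and one huge squared residual
sq = np.concatenate([np.ones(99), [1e6]])
eps = set_eps(sq)
```

Because quartiles are insensitive to a few extreme values, the threshold stays near the bulk of the data and the outlier lands far above it.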

3.1 Algorithm for the Critic Updating

Proposition 1

The critic objective (3) is equivalent to the following objective:

min_w Σ_{i=1}^{T} u_i (r_i − x_iᵀw)² + μ ‖w‖₂²,    (5)

where the weight u_i depends on the unknown variable w.

According to Proposition 1, we have a simplified objective for the critic updating. However, it is still hard to minimize (5), since the weight term u_i depends on the unknown variable w. In this section, an iteratively re-weighted algorithm is proposed for the optimization of (5) (cf. Algorithm 1). It assumes that the weights {u_i} are fixed when seeking the optimal w, and vice versa. When {u_i} is fixed, objective (5) is convex over w. We obtain the solution by differentiating (5) and setting the derivative to zero, leading to the following linear system:

(Σ_{i=1}^{T} u_i^(k) x_i x_iᵀ + μI) w^(k+1) = Σ_{i=1}^{T} u_i^(k) r_i x_i,    (6)

where u_i^(k) is the weight at the k-th iteration. Then we update the weight term as u_i^(k+1) = 1{(r_i − x_iᵀw^(k+1))² ≤ ε} for i = 1, …, T.
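The inner loop of the critic can be sketched as follows: alternate the weighted ridge system (6) with the 0/1 weight refresh. The function name, data generation, and iteration count are illustrative assumptions.

```python
import numpy as np

def robust_critic(X, r, eps, mu=0.1, iters=20):
    """Iteratively re-weighted critic: returns (w_hat, weights u)."""
    p, T = X.shape
    u = np.ones(T)                                   # start with all samples
    for _ in range(iters):
        Xu = X * u                                   # scale columns by u_i
        w = np.linalg.solve(Xu @ X.T + mu * np.eye(p), Xu @ r)
        # weight update: u_i = 1{ (r_i - x_i^T w)^2 <= eps }
        u = ((r - X.T @ w) ** 2 <= eps).astype(float)
    return w, u

rng = np.random.default_rng(1)
p, T = 4, 300
w_true = rng.normal(size=p)
X = rng.normal(size=(p, T))
r = X.T @ w_true + 0.05 * rng.normal(size=T)
r[:10] += 50.0                                       # inject 10 outlier rewards
w_hat, u = robust_critic(X, r, eps=1.0)
```

After a few alternations the corrupted tuples receive zero weight and the fit is driven by the clean samples only.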

3.2 Algorithm for the Actor Updating

Since the reference distribution d(s) in objective (1) is generally unavailable, we consider the following T-trial based objective:

Ĵ(θ) = Σ_{i=1}^{T} u_i Σ_a π_θ(a|s_i) x(s_i, a)ᵀw − λ θᵀ [Σ_{i=1}^{T} u_i g(s_i) g(s_i)ᵀ] θ,    (7)

where u_i is the weight learned from the critic updating (cf. Section 3.1). With the weights {u_i}, the outlier tuples that have large approximation errors are removed from the actor updating. As a result, the robustness is boosted. The actor updating maximizes objective (7) over θ. We use the Sequential Quadratic Programming (SQP) algorithm for the optimization. Specifically, the implementation of SQP with a finite-difference approximation to the gradient in fmincon is utilized in our algorithm (cf. Algorithm 1).
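A minimal sketch of the weighted actor step: maximize the u_i-weighted expected reward under a two-action softmax policy, using SciPy's SLSQP (a gradient-free analogue of fmincon's SQP used in the paper). The reward model (a constant treatment effect) and the plain ridge penalty standing in for the stochasticity term are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def actor_objective(theta, S, u, w, lam=0.01):
    """Negative weighted actor objective (scipy minimizes)."""
    logits = S @ theta                     # preference for action a = 1
    pi1 = 1.0 / (1.0 + np.exp(-logits))    # softmax over two actions
    r1 = S @ w + 1.0                       # assumed reward if a = 1
    r0 = S @ w                             # assumed reward if a = 0
    J = np.sum(u * (pi1 * r1 + (1 - pi1) * r0)) / np.sum(u)
    return -(J - lam * theta @ theta)      # penalty in place of constraint

rng = np.random.default_rng(2)
T, p = 100, 3
S = rng.normal(size=(T, p))                # states
u = np.ones(T)                             # critic weights (all clean here)
w = rng.normal(size=p)                     # critic estimate
res = minimize(actor_objective, np.zeros(p), args=(S, u, w), method="SLSQP")
theta_hat = res.x
```

SLSQP approximates the gradient by finite differences when no Jacobian is supplied, mirroring the fmincon configuration described above.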


1:  Initialize the state and the policy parameter θ.
2:  repeat
3:     /* Critic updating to estimate the expected reward */
4:     repeat
5:        Update the parameter w for the expected reward via (6).
6:        Update the weight terms {u_i} according to the estimated w.
7:     until convergence
8:     Actor updating to estimate the policy parameter: θ ← argmax_θ Ĵ(θ), where Ĵ(θ) is defined in (7).
9:  until convergence

Output: the stochastic policy π_θ(a|s).

Algorithm 1 Robust actor-critic contextual bandit (RS-ACCB).

4 Experiments

4.1 Datasets

To evaluate the performance, we utilize a dataset from an mHealth study (called HeartSteps) to approximate the generative model. HeartSteps is a 42-day mHealth study, resulting in 210 decision points per user. It aims to increase the users' daily activity (i.e. steps) by sending them positive interventions, for example, suggesting that they go for a hike on a sunny weekend.

For each user, a trajectory of tuples (s_i, a_i, r_i) is generated via the micro-randomized trials [14, 13]. The initial state is drawn from the Gaussian distribution N(0, Σ_s) with a pre-defined covariance matrix Σ_s. The random policy provides a method to select actions: each action is chosen with equal probability, i.e. μ(a|s) = 0.5 for all states s. For t ≥ 1, the state and immediate reward are generated as

where β is the main coefficient vector of the dynamic system, set as [0.4, 0.3, 0.4, 0.7, 0.05, 0.6, 0.25, 3, 0.25, 0.25, 0.4, 0.1, 0.5, 500]; ξ_t is the Gaussian noise in the state model (8) and ς_t is the Gaussian noise in the reward model (9).

To simulate outliers in the trajectory, there are two processing steps: (a) a fixed ratio of tuples is randomly selected in each user's trajectory; (b) we add a large noise (a multiple of the average value in the trajectory) to the states and rewards in the selected tuples. Additionally, the actions in the selected tuples are randomly reset, to simulate the random failure of sending interventions due to a weak mobile network.
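The two-step corruption above can be sketched as follows. The helper name `inject_outliers` and the scaling by the trajectory's mean absolute value are illustrative assumptions about the paper's procedure.

```python
import numpy as np

def inject_outliers(states, rewards, actions, ratio, strength, rng):
    """(a) pick a ratio of tuples; (b) corrupt their states/rewards/actions."""
    T = rewards.shape[0]
    idx = rng.choice(T, size=int(ratio * T), replace=False)
    states, rewards, actions = states.copy(), rewards.copy(), actions.copy()
    states[idx] += strength * np.abs(states).mean()    # large state noise
    rewards[idx] += strength * np.abs(rewards).mean()  # large reward noise
    actions[idx] = rng.integers(0, 2, size=idx.size)   # random failed actions
    return states, rewards, actions, idx

rng = np.random.default_rng(3)
S = rng.normal(size=(200, 3))
r = rng.normal(size=200)
a = np.zeros(200, dtype=int)
S2, r2, a2, idx = inject_outliers(S, r, a, ratio=0.1, strength=10.0, rng=rng)
```

Only the selected tuples are altered; the rest of the trajectory is untouched.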

4.2 Experiments Settings

In the experiment, three contextual bandit methods are compared: (1) Lin-UCB (linear upper confidence bound) is a famous contextual bandit method that has achieved great success in Internet advertising [12, 6, 18]; (2) S-ACCB is the stochasticity-constrained actor-critic contextual bandit for mHealth [9]; (3) RS-ACCB is the proposed robust ACCB with the stochasticity constraint.

We use the expected long-run average reward (ElrAR) [14] to evaluate the estimated policies. There are two processing steps to obtain the ElrAR: (a) get the average reward for the i-th user by averaging the rewards over the last elements of a long trajectory of tuples generated under the policy π_θ; (b) the ElrAR is obtained by averaging over all the users.
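The two-step ElrAR computation can be sketched directly; here the per-user reward trajectories are faked with random numbers purely to exercise the averaging, and the tail length is an assumption.

```python
import numpy as np

def elrar(reward_trajs, tail=100):
    """Expected long-run average reward over a set of user trajectories."""
    per_user = [traj[-tail:].mean() for traj in reward_trajs]   # step (a)
    return float(np.mean(per_user))                             # step (b)

rng = np.random.default_rng(4)
# 10 synthetic users, long trajectories fluctuating around 1000 steps
trajs = [1000.0 + rng.normal(size=500) for _ in range(10)]
score = elrar(trajs)
```

Averaging over the tail of a long rollout approximates the long-run behavior of the policy rather than its transient start.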

There are users’ MDPs used in the experiment. Each user has a trajectory of tuples. There are variables in the state. The noises in the MDP are set as and respectively. The parameterized policy is assumed to be the Boltzmann distribution  [14], where is the unknown coefficients, is the policy feature and

. The feature vector for the estimation of expected rewards is set as

, where . The tuning parameters for the actor-critic learning are set as . The outlier ratio and strength are set and respectively. In our algorithm, is set as 1.
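The Boltzmann policy used above can be sketched for the binary-action case. The particular policy feature map below (action times an intercept-augmented state) is an assumption for illustration.

```python
import numpy as np

def boltzmann_policy(theta, s):
    """Return [P(a=0|s), P(a=1|s)] under the softmax policy."""
    def phi(s, a):
        return a * np.concatenate(([1.0], s))   # assumed policy feature
    prefs = np.array([theta @ phi(s, a) for a in (0, 1)])
    e = np.exp(prefs - prefs.max())             # numerically stable softmax
    return e / e.sum()

theta = np.array([0.0, 0.0, 0.0])
probs = boltzmann_policy(theta, np.array([1.0, -1.0]))
```

With a zero coefficient vector both actions are equally likely; as the intercept coefficient grows, the policy concentrates on action 1.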

Figure 1: Average reward of the three contextual bandit methods. The left sub-figure shows the results when the trajectory is short; the right one shows the results for a longer trajectory. RS-ACCB is our method. A larger value is better.

4.3 Results and Discussion

In this section, the experiments are carried out to verify the performance of three contextual bandit methods from the following two aspects:

(S1) We change the ratio of tuples that contain outliers. The experiment results are displayed in the left sub-table of Table 1 and in Fig. 1(a). When the ratio is zero, there are no outliers in the dataset; under this condition, our method achieves almost identical results to S-ACCB [9]. These results verify that although our method aims at robust learning, it is well adapted to datasets without outliers. As the outlier ratio rises, the performance of both Lin-UCB and S-ACCB drops noticeably, while their standard deviations increase dramatically. In contrast, both the performance and the standard deviation of our method remain stable. On average, our method improves the performance by 146.8 steps, i.e. 10.26%, compared with the best of the state-of-the-art methods.

(S2) The strength of outliers (a multiple of the average value in the trajectory) is varied. The right sub-table of Table 1 and Fig. 1(b) summarize the experiment results. We have the following observations: (1) when there is no outlier in the trajectory, our method achieves results similar to S-ACCB; (2) as the outlier strength rises, the performance of S-ACCB and Lin-UCB decreases noticeably and their standard deviations increase dramatically; (3) in contrast, both the performance and the standard deviation of our method remain stable. Compared with the state-of-the-art methods, our method obtains clear gains across a variety of parameter settings. On average, it improves the performance by 139.3 steps and 143.3 steps compared with Lin-UCB and S-ACCB respectively.


Average reward (± std) vs. outlier ratio (left) and outlier strength (right):

Lin-UCB | S-ACCB | RS-ACCB || Lin-UCB | S-ACCB | RS-ACCB
1578.7±13.75 | 1578.3±12.70 | 1578.3±12.55 || 1578.7±13.75 | 1578.3±12.70 | 1578.3±12.55
1462.5±40.24 | 1462.9±39.88 | 1578.4±12.61 || 1535.6±21.94 | 1527.7±30.71 | 1578.3±12.68
1428.1±49.69 | 1429.5±45.79 | 1578.2±12.57 || 1431.7±44.13 | 1424.7±46.53 | 1578.2±12.65
1391.0±49.42 | 1383.2±50.40 | 1578.6±12.66 || 1380.8±49.03 | 1377.2±48.83 | 1578.2±12.62
1370.6±50.20 | 1365.0±49.02 | 1578.7±12.62 || 1359.8±49.76 | 1357.1±48.51 | 1578.2±12.63
1358.9±48.43 | 1365.0±49.02 | 1578.7±12.62 || 1346.8±48.83 | 1344.9±46.94 | 1578.2±12.64
Avg 1431.6 | 1430.7 | 1578.5 || Avg 1438.9 | 1435.0 | 1578.2

Table 1: Average reward vs. outlier ratio (setting S1) and outlier strength (setting S2) in the two sub-tables. The three methods are (a) Lin-UCB [12], (b) S-ACCB [9] and (c) RS-ACCB (our method). A larger value is better.

5 Conclusions and Future Directions

To alleviate the influence of outliers in the mHealth study, a robust actor-critic contextual bandit method is proposed to form robust interventions. We use the capped-ℓ₂ norm to boost the robustness of the critic updating, which yields a set of sample weights. With them, we propose a weighted objective for the actor updating, which gives zero weight to the tuples that have large approximation errors, enhancing the robustness against those tuples. Additionally, a principled method is provided to properly set the thresholding parameter ε in the capped-ℓ₂ norm. With it, we achieve the conflicting goals of enhancing the robustness of the actor-critic algorithm while obtaining almost identical results to the state-of-the-art method on datasets without outliers. Extensive experiment results show that, across a variety of parameter settings, the proposed method obtains significant improvements compared with the state-of-the-art contextual bandit methods. In the future, we may explore robust learning for reinforcement learning methods, in both the discounted reward setting and the average reward setting [8, 14]. These directions are much more challenging, since estimating the value function is no longer a general regression task. Besides, mining the cohesion information among users could help enrich the data (or restrict the parameter space) [28, 11, 1, 2, 4, 3].

Appendix: the proof of Proposition 1


The objective (3) is non-convex and non-differentiable [17, 7]. We can obtain its sub-gradient:

∂/∂w [ Σ_{i=1}^{T} min{(r_i − x_iᵀw)², ε} + μ ‖w‖₂² ] = −2 Σ_{i=1}^{T} u_i x_i (r_i − x_iᵀw) + 2μw,    (10)

where

u_i = 1 if (r_i − x_iᵀw)² < ε;  u_i ∈ [0, 1] if (r_i − x_iᵀw)² = ε;  u_i = 0 otherwise.

Letting u_i = 1{(r_i − x_iᵀw)² ≤ ε} for i = 1, …, T gives a simplified partial derivative of (3) that satisfies the sub-gradient (10). It is defined as

−2 Σ_{i=1}^{T} u_i x_i (r_i − x_iᵀw) + 2μw,

which is equivalent to the partial derivative of the following objective:

Σ_{i=1}^{T} u_i (r_i − x_iᵀw)² + μ ‖w‖₂².    (11)

From the perspective of optimization, objective (11) is equivalent to (3).


  • [1] G. Cheng, Y. Wang, Y. Gong, F. Zhu, and C. Pan. Urban road extraction via graph cuts based probability propagation. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 5072–5076. IEEE, 2014.
  • [2] G. Cheng, Y. Wang, F. Zhu, and C. Pan. Road extraction via adaptive graph cuts with multiple features. In Image Processing (ICIP), IEEE International Conference on, pages 3962–3966. IEEE, 2015.
  • [3] G. Cheng, F. Zhu, S. Xiang, and C. Pan. Road centerline extraction via semisupervised segmentation and multidirection nonmaximum suppression. IEEE Geoscience and Remote Sensing Letters, 13(4):545–549, 2016.
  • [4] G. Cheng, F. Zhu, S. Xiang, Y. Wang, and C. Pan. Accurate urban road centerline extraction from VHR imagery via multiscale segmentation and tensor voting. Neurocomputing, 205:407–420, 2016.
  • [5] G. Cheng, F. Zhu, S. Xiang, Y. Wang, and C. Pan. Semisupervised hyperspectral image classification via discriminant analysis and robust regression. IEEE J. of Selected Topics in Applied Earth Observations and Remote Sensing, 9(2):595–608, 2016.
  • [6] M. Dudík, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097–1104, 2011.
  • [7] H. Gao, F. Nie, T. W. Cai, and H. Huang. Robust capped norm nonnegative matrix factorization: Capped norm nmf. In ACM International Conference on Information and Knowledge (CIKM), pages 871–880, 2015.
  • [8] I. Grondman, L. Busoniu, G. A. D. Lopes, and R. Babuska. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans. Systems, Man, and Cybernetics, 42(6):1291–1307, 2012.
  • [9] H. Lei. An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention. PhD thesis, University of Michigan, 2016.
  • [10] H. Lei, A. Tewari, and S. Murphy. An actor-critic contextual bandit algorithm for personalized interventions using mobile devices. In NIPS 2014 Workshop: Personalization: Methods and Applications, pages 1 – 9, 2014.
  • [11] H. Li, Y. Wang, S. Xiang, J. Duan, F. Zhu, and C. Pan. A label propagation method using spatial-spectral consistency for hyperspectral image classification. International Journal of Remote Sensing, 37(1):191–211, 2016.
  • [12] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In International Conference on World Wide Web (WWW), pages 661–670, 2010.
  • [13] P. Liao, A. Tewari, and S. Murphy. Constructing just-in-time adaptive interventions. Phd Section Proposal, pages 1–49, 2015.
  • [14] S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, and K. Witkiewitz. A batch, off-policy, actor-critic algorithm for optimizing the average reward. CoRR, abs/1607.05047, 2016.
  • [15] F. Nie, H. Huang, X. Cai, and C. H. Ding. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In Advances in Neural Information Processing Systems (NIPS), pages 1813–1821. Curran Associates, Inc., 2010.
  • [16] F. Nie, H. Wang, X. Cai, H. Huang, and C. Ding. Robust matrix completion via joint schatten p-norm and lp-norm minimization. In IEEE International Conference on Data Mining (ICDM), pages 566–574, Washington, DC, USA, 2012.
  • [17] Q. Sun, S. Xiang, and Y. Ye. Robust principal component analysis via capped norms. In ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 311–319, 2013.
  • [18] A. Tewari and S. A. Murphy. From ads to interventions: Contextual bandits in mobile health. In J. Rehg, S. A. Murphy, and S. Kumar, editors, Mobile Health: Sensors, Analytic Methods, and Applications. Springer, 2017.
  • [19] Y. Wang, C. Pan, S. Xiang, and F. Zhu. Robust hyperspectral unmixing with correntropy-based metric. IEEE Transactions on Image Processing, 24(11):4027–4040, 2015.
  • [20] Z. Xu, S. Wang, F. Zhu, and J. Huang. Seq2seq fingerprint: An unsupervised deep molecular embedding for drug discovery. In ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB), 2017.
  • [21] J. Yao, X. Zhu, F. Zhu, and J. Huang. Deep correlational learning for survival prediction from multi-modality data. In International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2017.
  • [22] L. Zhou and E. Brunskill. Latent contextual bandits and their application to personalized recommendations for new users. In International Joint Conference on Artificial Intelligence, pages 3646–3653, 2016.
  • [23] F. Zhu. Unsupervised Hyperspectral Unmixing Methods. PhD thesis, 2015.
  • [24] F. Zhu, B. Fan, X. Zhu, Y. Wang, S. Xiang, and C. Pan. 10,000+ times accelerated robust subset selection (ARSS). In Proc. Assoc. Adv. Artif. Intell. (AAAI), pages 3217–3224, 2015.
  • [25] F. Zhu and P. Liao. Effective warm start for the online actor-critic reinforcement learning based mhealth intervention. In The Multi-disciplinary Conference on Reinforcement Learning and Decision Making, pages 6 – 10, 2017.
  • [26] F. Zhu, P. Liao, X. Zhu, Y. Yao, and J. Huang. Cohesion-based online actor-critic reinforcement learning for mhealth intervention. arXiv:1703.10039, 2017.
  • [27] F. Zhu, Y. Wang, B. Fan, G. Meng, and C. Pan. Effective spectral unmixing via robust representation and learning-based sparsity. CoRR, abs/1409.0685, 2014.
  • [28] F. Zhu, Y. Wang, S. Xiang, B. Fan, and C. Pan. Structured sparse method for hyperspectral unmixing. {ISPRS} Journal of Photogrammetry and Remote Sensing, 88(0):101–118, 2014.
  • [29] X. Zhu, J. Yao, F. Zhu, and J. Huang. WSISA: Making survival prediction from whole slide histopathological images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7234–7242, 2017.