The Assistive Multi-Armed Bandit

01/24/2019
by Lawrence Chan, et al.

Learning preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work assumes that humans act (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves still learning what they want. In this work, we introduce the assistive multi-armed bandit, in which a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it from the rewards received from arm pulls; the robot observes only which arms the human pulls, not the reward associated with each pull. We offer necessary and sufficient conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing toward a theory behind algorithms for human-robot interaction.
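
To make the information structure concrete, the following is a minimal sketch of one possible reading of the interaction, not the authors' algorithm. It assumes Bernoulli arms, an epsilon-greedy human who learns from observed rewards, and a deliberately naive robot that pulls whichever arm the human has chosen most often so far. The function name `simulate`, its parameters, and both policies are illustrative assumptions; the only property taken from the abstract is that the human sees rewards while the robot sees only the human's arm choices.

```python
import random


def simulate(arm_means, horizon, epsilon=0.1, seed=0):
    """Toy assistive bandit loop (illustrative assumptions throughout).

    arm_means: success probability of each Bernoulli arm (assumed reward model).
    Returns (cumulative_reward, choice_counts).
    """
    rng = random.Random(seed)
    k = len(arm_means)

    # Human's private statistics, built from rewards the robot never sees.
    pulls = [0] * k
    totals = [0.0] * k

    # Robot's only information: how often the human chose each arm.
    choice_counts = [0] * k

    cumulative = 0.0
    for _ in range(horizon):
        # Human policy (assumed): epsilon-greedy on empirical means.
        if rng.random() < epsilon or not any(pulls):
            human_arm = rng.randrange(k)
        else:
            human_arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a] if pulls[a] else 0.0,
            )
        choice_counts[human_arm] += 1

        # Robot policy (assumed): pull the arm the human has chosen most often.
        robot_arm = max(range(k), key=lambda a: choice_counts[a])

        # Reward is drawn from the arm actually pulled and shown only to the human.
        reward = 1.0 if rng.random() < arm_means[robot_arm] else 0.0
        pulls[robot_arm] += 1
        totals[robot_arm] += reward
        cumulative += reward

    return cumulative, choice_counts


if __name__ == "__main__":
    total, counts = simulate([0.2, 0.5, 0.8], horizon=1000)
    print("cumulative reward:", total)
    print("human choice counts:", counts)
```

In this sketch the human's choices are the only channel through which reward information reaches the robot, which is why a human policy that signals its observations clearly could, in principle, help the team more than one that simply performs well alone.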
