Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings

04/12/2021
by Michael Koller, et al.

Assistive multi-armed bandit problems can model team situations between a human and an autonomous system such as a domestic service robot. To account for human biases such as the risk aversion described by Cumulative Prospect Theory, the setting is extended to observable rewards. When robots leverage knowledge of the risk-averse human model, they can eliminate the bias and make more rational choices. We present an algorithm that increases the utility value of such human-robot teams. A brief evaluation indicates that arbitrary reward functions can be handled.
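The core mechanism can be illustrated with a small sketch. The code below is not the paper's algorithm; it only combines the well-known Cumulative Prospect Theory value function of Tversky and Kahneman (1992) with a hypothetical two-armed bandit to show how a risk-averse human ranks arms by distorted value, and how a robot that models this distortion can recover the reward-maximizing choice. All names, arm payoffs, and parameters in the snippet are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: the value-function shape and the default
# parameters (alpha = beta = 0.88, lam = 2.25) are the Cumulative
# Prospect Theory estimates of Tversky and Kahneman (1992); the
# two-armed bandit below is a made-up example, not the paper's setup.

def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """CPT value function: concave over gains, convex over losses,
    with losses amplified by the loss-aversion factor lam."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0.0, None) ** alpha
    losses = -lam * np.clip(-x, 0.0, None) ** beta
    return np.where(x >= 0.0, gains, losses)

# Arm 0 is safe; arm 1 is risky but has the higher true mean (0.50 > 0.45).
arm_outcomes = {0: np.array([0.45]),        # deterministic payoff
                1: np.array([-1.0, 2.0])}   # equiprobable outcomes

# A CPT-biased human ranks arms by distorted value and prefers the
# safe arm, sacrificing expected reward.
human_pref = {a: cpt_value(o).mean() for a, o in arm_outcomes.items()}
human_choice = max(human_pref, key=human_pref.get)      # -> arm 0

# A robot that models the human's distortion can compare the
# undistorted expectations instead and recover the rational choice.
true_means = {a: o.mean() for a, o in arm_outcomes.items()}
robot_choice = max(true_means, key=true_means.get)      # -> arm 1

print(f"biased human: arm {human_choice}, debiased robot: arm {robot_choice}")
```

In this toy instance the distorted value of the risky arm comes out negative (losses are weighted by lam = 2.25), so the modeled human settles on the safe arm; comparing undistorted means reverses the choice.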

Related research

01/24/2019  The Assistive Multi-Armed Bandit
Learning preferences implicit in the choices humans make is a well studi...

06/30/2020  Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits
While traditional economics assumes that humans are fully rational agent...

06/15/2023  A Framework for Learning from Demonstration with Minimal Human Effort
We consider robot learning in the context of shared autonomy, where cont...

04/03/2020  Hawkes Process Multi-armed Bandits for Disaster Search and Rescue
We propose a novel framework for integrating Hawkes processes with multi...

06/30/2019  Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams
Much work in robotics and operations research has focused on optimal res...

01/13/2020  When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans
In order to collaborate safely and efficiently, robots need to anticipat...

06/07/2017  Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing an inspiration from behavioral studies of human decision making,...
