Learning Behaviors with Uncertain Human Feedback

06/07/2020
by Xu He, et al.

Human feedback is widely used to train agents in many domains. However, previous work rarely considers the uncertainty in the feedback that humans provide, especially in cases where the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceed that of the optimal action, which is common in games and in the real world. Trainers are then likely to give positive feedback to sub-optimal actions and negative feedback to optimal actions, and may give no feedback at all in confusing situations. Existing approaches, which use the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not account for uncertainty in the learning environment or in human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this model makes the computations in the EM algorithm intractable. We therefore propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.
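
The abstract does not give the feedback model or the details of the approximate EM algorithm, so the following is only a minimal sketch of the general idea of mixing EM with gradient updates, not the authors' method. It assumes a hypothetical toy model: exactly one of several actions is optimal, and the trainer's feedback (positive, negative, or none) follows one categorical distribution for the optimal action and another for sub-optimal ones. Note that this sketch keeps an exact expectation step and uses gradient steps in the maximization step (a standard generalized-EM variant), whereas the paper states that it approximates the expectation step. All names (`approx_em`, `counts`, `theta`) are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def approx_em(counts, n_iters=50, grad_steps=20, lr=0.5):
    """Toy generalized EM (hypothetical model, not the paper's).

    counts[a, f] = number of times feedback f was given to action a,
    with f in {0: positive, 1: negative, 2: none}.
    Returns (posterior over which action is optimal, theta), where
    theta[0] = P(f | action is optimal), theta[1] = P(f | sub-optimal).
    """
    n_actions = counts.shape[0]
    # Initial logits encode a mild assumption that optimal actions attract
    # positive feedback; this breaks the symmetry between the two rows.
    logits = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    post = np.full(n_actions, 1.0 / n_actions)
    total = counts.sum()
    for _ in range(n_iters):
        log_theta = np.log(softmax(logits))
        # E-step: posterior over the hidden optimal action (uniform prior).
        ll_opt = counts @ log_theta[0]  # log-lik. of each action's data if optimal
        ll_sub = counts @ log_theta[1]  # log-lik. of each action's data if sub-optimal
        post = softmax(ll_opt + (ll_sub.sum() - ll_sub))
        # Expected feedback counts attributed to the optimal / sub-optimal role.
        exp_counts = np.stack([post @ counts, (1.0 - post) @ counts])
        # Approximate M-step: a few gradient-ascent steps on the expected
        # log-likelihood sum(exp_counts * log softmax(logits)).
        for _ in range(grad_steps):
            theta = softmax(logits)
            grad = exp_counts - exp_counts.sum(axis=1, keepdims=True) * theta
            logits = logits + lr * grad / total
    return post, softmax(logits)

# Illustrative data: action 0 mostly receives positive feedback,
# but the trainer is noisy and sometimes silent.
counts = np.array([[6, 3, 1], [3, 5, 2], [1, 7, 2]])
post, theta = approx_em(counts)
print("posterior over optimal action:", post)
print("estimated feedback model:", theta)
```

Parameterizing the feedback probabilities through softmax logits keeps them valid distributions under plain gradient updates, which is why no projection step is needed in this sketch.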

research
01/16/2019

ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

In autonomous vehicle (AV) control, allowing mistakes can be quite dange...
research
03/14/2023

RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback

Reinforcement learning-based policies for continuous control robotic nav...
research
03/03/2019

Analysis of Gradient-Based Expectation-Maximization-Like Algorithms via Integral Quadratic Constraints

The Expectation-Maximization (EM) algorithm is one of the most popular m...
research
11/02/2020

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent

Expectation maximization (EM) is the default algorithm for fitting proba...
research
05/24/2019

A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

We show that under mild conditions, Estimation of Distribution Algorithm...
research
09/28/2020

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Reactions such as gestures, facial expressions, and vocalizations are an...
research
02/16/2018

Dealing with Uncertainties in User Feedback: Strategies Between Denying and Accepting

Latest research revealed a considerable lack of reliability within user ...
