The Expertise Problem: Learning from Specialized Feedback

11/12/2022
by Oliver Daniels-Koch, et al.

Reinforcement learning from human feedback (RLHF) is a powerful technique for training agents to perform difficult-to-specify tasks. However, human feedback can be noisy, particularly when human teachers lack relevant knowledge or experience. Levels of expertise vary across teachers, and a given teacher may have differing levels of expertise for different components of a task. RLHF algorithms that learn from multiple teachers therefore face an expertise problem: the reliability of a given piece of feedback depends on both the teacher it comes from and how specialized that teacher is in the relevant components of the task. Existing state-of-the-art RLHF algorithms assume that all evaluations come from the same distribution. This assumption obscures inter- and intra-human variance and prevents the algorithms from accounting for, or taking advantage of, variations in expertise. We formalize this problem, implement it as an extension of an existing RLHF benchmark, evaluate the performance of a state-of-the-art RLHF algorithm, and explore techniques to improve query and teacher selection. Our key contribution is to demonstrate and characterize the expertise problem, and to provide an open-source implementation for testing future solutions.
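
To make the expertise problem concrete, here is a minimal sketch of feedback whose reliability depends on both the teacher and the task component. It is an illustration under stated assumptions, not the paper's formalization or implementation: the Boltzmann (logistic) noise model, the `EXPERTISE` table, the teacher names, and the component names `navigation` and `manipulation` are all hypothetical.

```python
import math
import random


def preference_prob(return_a, return_b, beta):
    """Probability that a teacher prefers A over B under an assumed
    Boltzmann (logistic) noise model; beta = 0 means random feedback."""
    return 1.0 / (1.0 + math.exp(-beta * (return_a - return_b)))


# Hypothetical expertise table: one reliability parameter (beta) per
# teacher and per task component. Higher beta means less noisy feedback.
EXPERTISE = {
    "teacher_1": {"navigation": 5.0, "manipulation": 0.2},
    "teacher_2": {"navigation": 0.2, "manipulation": 5.0},
}


def query_teacher(teacher, component, return_a, return_b, rng):
    """Sample one noisy comparison: True if the teacher says A is better."""
    beta = EXPERTISE[teacher][component]
    return rng.random() < preference_prob(return_a, return_b, beta)


def accuracy(teacher, component, n=10000, seed=0):
    """Fraction of n queries in which the teacher picks the truly better
    option (A has return 1.0, B has return 0.0)."""
    rng = random.Random(seed)
    correct = sum(query_teacher(teacher, component, 1.0, 0.0, rng) for _ in range(n))
    return correct / n


if __name__ == "__main__":
    for teacher in EXPERTISE:
        for component in EXPERTISE[teacher]:
            print(teacher, component, round(accuracy(teacher, component), 3))
```

In this sketch, each teacher is nearly always correct on one component and close to chance on the other. A learner that pools all comparisons into a single distribution cannot see that difference, which is the variance the abstract says existing algorithms obscure.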

research
03/02/2023

Active Reward Learning from Multiple Teachers

Reward learning algorithms utilize human feedback to infer a reward func...
research
04/02/2021

Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics

A key challenge in Imitation Learning (IL) is that optimal state actions...
research
05/22/2023

Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers

The ability to pick up on language signals in an ongoing interaction is ...
research
07/19/2019

Interactive Learning of Environment Dynamics for Sequential Tasks

In order for robots and other artificial agents to efficiently learn to ...
research
03/12/2019

Learning Gaussian Policies from Corrective Human Feedback

Learning from human feedback is a viable alternative to control design t...
research
06/05/2023

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

Coaching, which involves classroom observation and expert feedback, is a...
research
08/08/2020

Hierarchial Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection

This work is inspired by recent advances in hierarchical reinforcement l...
