Teachable Reinforcement Learning via Advice Distillation

03/19/2022
by Olivia Watkins, et al.

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention. Can we overcome these challenges by building agents that learn from rich, interactive feedback instead? We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher. We begin by formalizing a class of human-in-the-loop decision making problems in which multiple forms of teacher-provided advice are available to a learner. We then describe a simple learning algorithm for these problems that first learns to interpret advice, then learns from advice to complete tasks even in the absence of human supervision. In puzzle-solving, navigation, and locomotion domains, we show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms and often less than imitation learning.
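The two-phase idea in the abstract (first learn to interpret advice, then learn from advice so that the task can be done without a teacher) can be illustrated with a toy sketch. This is not the authors' implementation. The one-dimensional corridor, the two-word advice vocabulary, the scripted teacher, the tabular learner, and the majority-vote cloning step are all assumptions made up for this illustration, and every name in the code is hypothetical.

```python
import random
from collections import Counter, defaultdict

# Toy assumptions (not from the paper): a 1-D corridor, two actions,
# and a two-word advice vocabulary supplied by a scripted teacher.
N = 7
ACTIONS = (-1, +1)                 # step left / step right
ADVICE = ("go_left", "go_right")   # hypothetical advice tokens


def teacher_advice(pos, goal):
    """Scripted stand-in for a human teacher: a directional hint."""
    return "go_left" if goal < pos else "go_right"


def step(pos, action):
    """Move one cell, clamped to the corridor."""
    return min(N - 1, max(0, pos + action))


def learn_to_interpret_advice(rng, episodes=500, eps=0.2, lr=0.5):
    """Phase 1: learn Q[advice][action] from a reward for moving closer to the goal."""
    q = {adv: {act: 0.0 for act in ACTIONS} for adv in ADVICE}
    for _ in range(episodes):
        pos, goal = rng.randrange(N), rng.randrange(N)
        if pos == goal:
            continue
        advice = teacher_advice(pos, goal)
        if rng.random() < eps:     # epsilon-greedy exploration
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[advice][a])
        closer = abs(step(pos, action) - goal) < abs(pos - goal)
        q[advice][action] += lr * ((1.0 if closer else 0.0) - q[advice][action])
    return q


def distill(q):
    """Phase 2: roll out the advice-following policy from every start, then
    clone its actions into a policy that sees only (position, goal)."""
    votes = defaultdict(Counter)
    for start in range(N):
        for goal in range(N):
            pos = start
            while pos != goal:
                advice = teacher_advice(pos, goal)
                action = max(ACTIONS, key=lambda a: q[advice][a])
                votes[(pos, goal)][action] += 1
                pos = step(pos, action)
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def run_without_advice(policy, pos, goal):
    """Execute the distilled policy with no teacher in the loop."""
    for _ in range(N):
        if pos == goal:
            return True
        pos = step(pos, policy[(pos, goal)])
    return pos == goal


q_table = learn_to_interpret_advice(random.Random(0))
advice_free_policy = distill(q_table)
solved = all(run_without_advice(advice_free_policy, p, g)
             for p in range(N) for g in range(N))
print("solved every start/goal pair without advice:", solved)
```

In this sketch the teacher is consulted only during the two training phases. The final policy takes no advice as input, which mirrors the abstract's claim that the agent completes tasks in the absence of human supervision, though in a far simpler setting than the puzzle-solving, navigation, and locomotion domains the paper reports.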
