Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

06/02/2019
by Sebastian Tschiatschek, et al.

Inverse reinforcement learning (IRL) enables an agent to learn complex behavior by observing demonstrations from a (near-)optimal policy. The typical assumption is that the learner's goal is to match the teacher's demonstrated behavior. In this paper, we consider the setting where the learner has her own preferences, which she takes into consideration in addition to the demonstrations. These preferences can, for example, capture behavioral biases, mismatched worldviews, or physical constraints. We study two teaching approaches: learner-agnostic teaching, where the teacher provides demonstrations from an optimal policy and ignores the learner's preferences, and learner-aware teaching, where the teacher accounts for the learner's preferences. We design learner-aware teaching algorithms and show that they can achieve significant performance improvements over learner-agnostic teaching.
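The difference between the two teaching approaches can be illustrated with a small toy sketch. This is not the algorithm from the paper; it is a minimal illustration under stated assumptions. The candidate policies, their two-dimensional feature expectations (progress, risk), the reward weights, the risk limit, and the rule that the learner picks the feasible policy closest to the demonstration are all hypothetical choices made only for this example.

```python
import math

# Hypothetical toy setup (not from the paper): each candidate policy is
# summarized by its feature expectations (progress, risk).
POLICIES = {
    "A": (1.0, 1.0),  # highest reward, but too risky for the learner
    "B": (0.8, 0.2),  # slightly lower reward, well within the risk limit
    "C": (0.5, 0.5),  # low reward, exactly at the risk limit
}
REWARD_WEIGHTS = (1.0, 0.0)  # assumed: reward depends on progress only
RISK_LIMIT = 0.5             # assumed learner constraint on the risk feature


def reward(mu):
    """Linear reward of a policy with feature expectations mu."""
    return sum(w * f for w, f in zip(REWARD_WEIGHTS, mu))


def is_feasible(mu):
    """The learner only accepts policies that respect her risk limit."""
    return mu[1] <= RISK_LIMIT


def learner_response(demo_mu):
    """Assumed learner model: choose the feasible policy whose feature
    expectations are closest (Euclidean distance) to the demonstration."""
    feasible = [name for name, mu in POLICIES.items() if is_feasible(mu)]
    return min(feasible, key=lambda name: math.dist(POLICIES[name], demo_mu))


def learner_agnostic_demo():
    """Teacher demonstrates the reward-optimal policy, ignoring the learner."""
    return max(POLICIES, key=lambda name: reward(POLICIES[name]))


def learner_aware_demo():
    """Teacher picks the demonstration that leads the learner to the policy
    with the highest reward, given the assumed learner model."""
    return max(
        POLICIES,
        key=lambda name: reward(POLICIES[learner_response(POLICIES[name])]),
    )


if __name__ == "__main__":
    for label, demo in [("agnostic", learner_agnostic_demo()),
                        ("aware", learner_aware_demo())]:
        learned = learner_response(POLICIES[demo])
        print(label, "demo:", demo, "learned:", learned,
              "reward:", reward(POLICIES[learned]))
```

In this toy setting, the learner-agnostic teacher demonstrates policy A, which the learner cannot adopt, so she falls back to the nearest feasible policy C. The learner-aware teacher instead demonstrates B, which the learner can match exactly and which earns a higher reward than C.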

