Option Compatible Reward Inverse Reinforcement Learning

11/07/2019
by   Rakhoon Hwang, et al.

Reinforcement learning for complex tasks is a challenging problem. Often, expert demonstrations of complex multitasking operations are required to train agents. However, it is difficult to design a reward function for such tasks. In this paper, we solve a hierarchical inverse reinforcement learning (IRL) problem within the framework of options. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our segmented rewards provide a solution to the IRL problem for multitasking operations and show good performance and robustness against noise in the expert demonstrations.

research
05/31/2018

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

A significant challenge for the practical application of reinforcement l...
research
01/19/2020

Learning Options from Demonstration using Skill Segmentation

We present a method for learning options from segmented demonstration tr...
research
11/08/2021

Batch Reinforcement Learning from Crowds

A shortcoming of batch reinforcement learning is its requirement for rew...
research
09/25/2022

Reward Learning using Structural Motifs in Inverse Reinforcement Learning

The Inverse Reinforcement Learning (IRL) problem has seen rapid evolutio...
research
06/03/2021

LiMIIRL: Lightweight Multiple-Intent Inverse Reinforcement Learning

Multiple-Intent Inverse Reinforcement Learning (MI-IRL) seeks to find a ...
research
10/24/2018

Inverse reinforcement learning for video games

Deep reinforcement learning achieves superhuman performance in a range o...
research
11/28/2017

Hierarchical Policy Search via Return-Weighted Density Estimation

Learning an optimal policy from a multi-modal reward function is a chall...
