Improving Interactive Reinforcement Agent Planning with Human Demonstration

04/18/2019
by   Guangliang Li, et al.

TAMER has proven to be a powerful interactive reinforcement learning method that allows ordinary people to teach and personalize an autonomous agent's behavior by providing evaluative feedback. However, a TAMER agent planning with UCT, a Monte Carlo Tree Search strategy, can only update states along its path, which might induce a high learning cost, especially for a physical robot. In this paper, we propose to drive the agent's exploration along the optimal path and reduce the learning cost by initializing the agent's reward function via inverse reinforcement learning from demonstration. We test our proposed method in the RL benchmark domain Grid World with different discounts on human reward. Our results show that learning from demonstration can allow a TAMER agent to learn a roughly optimal policy up to the deepest search and encourage the agent to explore along the optimal path. In addition, we find that learning from demonstration can improve learning efficiency by reducing the total feedback and the number of incorrect actions needed to obtain an optimal policy and by increasing the ratio of correct actions, allowing a TAMER agent to converge faster.
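The idea of seeding the human reward model from a demonstration can be illustrated with a minimal sketch. This is not the paper's implementation: the grid size, the tabular model `H`, the fixed `bonus` given to demonstrated state-action pairs (a crude stand-in for inverse reinforcement learning), the learning rate `alpha`, and all function names are assumptions made for illustration. The agent here acts greedily on `H` rather than planning with UCT.

```python
from collections import defaultdict

# Hypothetical 4x3 grid world; all names and sizes are illustrative assumptions.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
WIDTH, HEIGHT = 4, 3


def step(state, action):
    """Move one cell; stay in place if the move would leave the grid."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        return (x, y)
    return state


def init_reward_from_demo(demo, bonus=1.0):
    """Seed a tabular human-reward model from demonstrated (state, action) pairs.

    Assumption: a fixed bonus replaces a real inverse reinforcement learning step.
    """
    H = defaultdict(float)
    for state, action in demo:
        H[(state, action)] = bonus
    return H


def tamer_update(H, state, action, feedback, alpha=0.5):
    """Move the predicted human reward toward the trainer's feedback signal."""
    H[(state, action)] += alpha * (feedback - H[(state, action)])


def greedy_action(H, state):
    """Pick the action with the highest predicted human reward (myopic choice)."""
    return max(ACTIONS, key=lambda a: H[(state, a)])


def rollout(H, start, goal, max_steps=20):
    """Follow the greedy policy from start until the goal or the step limit."""
    path = [start]
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        state = step(state, greedy_action(H, state))
        path.append(state)
    return path


# A demonstration that walks along the top row and then down to the goal.
demo = [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "right"),
        ((3, 0), "down"), ((3, 1), "down")]
H = init_reward_from_demo(demo)
path = rollout(H, start=(0, 0), goal=(3, 2))
```

In this sketch the seeded model already steers the agent along the demonstrated path before any feedback arrives, and later calls to `tamer_update` can still correct individual entries, which is the intuition behind needing less total feedback.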


research
10/14/2022

Multi-trainer Interactive Reinforcement Learning System

Interactive reinforcement learning can effectively facilitate the agent ...
research
06/22/2018

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

Humans are able to understand and perform complex tasks by strategically...
research
02/09/2022

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

A major challenge in real-world reinforcement learning (RL) is the spars...
research
01/16/2019

ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

In autonomous vehicle (AV) control, allowing mistakes can be quite dange...
research
07/25/2018

A Minimax Tree Based Approach for Minimizing Detectability and Maximizing Visibility

We introduce and study the problem of planning a trajectory for an agent...
research
11/02/2020

Useful Policy Invariant Shaping from Arbitrary Advice

Reinforcement learning is a powerful learning paradigm in which agents c...
research
10/16/2017

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gr...
