A New Framework for Query Efficient Active Imitation Learning

12/30/2019
by Daniel Hsu, et al.

We seek to align agent policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge of the dynamics, reward function, or unsafe states. A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user's reward function with efficient queries. We build an adversarial generative model of states and a successor representation (SR) model, both trained on transition experience collected by the learning policy. Our method uses these models to select state-action pairs, asks the user to comment on their optimality or safety, and trains an adversarial neural network to predict the rewards. Unlike previous work, which is almost entirely based on uncertainty sampling, the key idea is to actively and efficiently select state-action pairs from both on-policy and off-policy experience, by discriminating between queried (expert) and unqueried (generated) data and maximizing the efficiency of value-function learning. We call this method adversarial reward query with successor representation. We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games, which have high-dimensional observations and complex state dynamics. The results show that the proposed method significantly outperforms uncertainty-based methods at learning reward models and achieves better query efficiency: the adversarial discriminator lets the agent learn human behavior more efficiently, and the SR selects states that have a stronger impact on the value function. Moreover, the proposed method also learns to avoid unsafe states while training the reward model.
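To make the query-selection idea concrete, here is a minimal sketch of how state-action candidates might be ranked for expert queries by combining a discriminator score (how unlike already-queried expert data a candidate looks) with an SR-based measure of how strongly a state influences the value function. The function name `select_queries`, the scoring inputs, and the product-based priority are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_queries(candidates, discriminator_scores, sr_features, k=2):
    """Rank candidate state-action pairs for expert queries (illustrative sketch).

    discriminator_scores: probability that each candidate is "unqueried/generated",
        i.e. least like already-queried expert data, hence most informative to ask about.
    sr_features: successor-representation feature vectors; their norm is used here
        as a proxy for how strongly a state impacts value-function learning.
    Returns the k candidates with the highest combined priority.
    """
    sr_impact = np.linalg.norm(sr_features, axis=1)
    # Prefer pairs that both look unlike the queried data and matter for the values.
    priority = discriminator_scores * sr_impact
    top = np.argsort(priority)[::-1][:k]
    return [candidates[i] for i in top]
```

In this sketch the two signals are simply multiplied; any monotone combination (e.g. a weighted sum) would express the same idea of trading off novelty against value-function impact.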

