Reinforced Imitation in Heterogeneous Action Space

04/06/2019
by Konrad Zolna, et al.

Imitation learning is an effective alternative for learning a policy when the reward function is sparse. In this paper, we consider a challenging setting in which the agent and the expert have different action spaces. We assume that the agent has access to a sparse reward function and to state-only expert observations. We propose a method that gradually balances the imitation learning cost and the reinforcement learning objective. In addition, the method adapts the agent's policy by either mimicking expert behavior or maximizing the sparse reward. Through navigation scenarios, we show that (i) the agent can efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions differ from the expert's, and (iii) its performance is not bounded by that of the expert, owing to its optimized use of sparse rewards.
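The abstract does not say how the balance between the two objectives is scheduled. The following is a purely illustrative sketch of one generic way to "gradually balance" an imitation cost against a reinforcement learning loss. It assumes a convex combination whose weight is annealed linearly from pure imitation to pure RL. The names `mixing_weight` and `blended_loss`, and the linear schedule itself, are hypothetical and should not be read as the authors' method.

```python
def mixing_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule (an assumption, not from the paper):
    0.0 at step 0 (pure imitation), rising to 1.0 at total_steps and
    staying there afterwards (pure reinforcement learning)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def blended_loss(il_loss: float, rl_loss: float, step: int, total_steps: int) -> float:
    """Convex combination of an imitation cost and an RL loss
    (for example, the negative expected sparse return).
    Early in training the imitation term dominates; later the RL term does."""
    w = mixing_weight(step, total_steps)
    return (1.0 - w) * il_loss + w * rl_loss


if __name__ == "__main__":
    for step in (0, 50, 100, 200):
        print(step, blended_loss(2.0, 4.0, step, total_steps=100))
```

With an imitation loss of 2.0 and an RL loss of 4.0, the blended value moves from 2.0 at step 0 to 3.0 halfway through and 4.0 once the schedule saturates. A linear ramp is only the simplest choice. An adaptive weight driven by observed returns would be closer in spirit to a policy that switches between mimicking the expert and maximizing reward.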
