A Ranking Game for Imitation Learning

02/07/2022
by Harshit Sikchi, et al.

We propose a new framework for imitation learning that treats imitation as a two-player, ranking-based Stackelberg game between a policy and a reward function. In this game, the reward agent learns to satisfy pairwise performance rankings within a set of policies, while the policy agent learns to maximize this reward. The game encompasses a large subset of inverse reinforcement learning (IRL) methods as well as methods that learn from offline preferences. The Stackelberg formulation allows us to use optimization methods that take the game structure into account, leading to more sample-efficient and stable learning dynamics than existing IRL methods. We theoretically analyze the requirements that the loss function used for ranking policy performances must meet to facilitate near-optimal imitation learning at equilibrium. We use insights from this analysis to further increase the sample efficiency of the ranking game by using either automatically generated rankings or offline annotated rankings. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and solves previously unsolvable tasks in the Learning from Observation (LfO) setting.
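
The reward agent's side of the game, learning a reward under which a preferred behavior outranks a less preferred one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear reward, the synthetic state features, the function names, and the Bradley-Terry-style logistic loss, which is a common choice in preference learning and not necessarily the ranking loss analyzed in the paper. The policy agent's step (maximizing the learned reward) is omitted.

```python
import numpy as np

def performance(w, features):
    """Average linear reward w . phi(s) over the states a behavior visited."""
    return float(np.mean(features @ w))

def ranking_loss_and_grad(w, feats_low, feats_high):
    """Logistic pairwise loss: small when the preferred behavior scores higher."""
    diff_feat = feats_high.mean(axis=0) - feats_low.mean(axis=0)
    margin = diff_feat @ w                        # J(high) - J(low) under reward w
    loss = np.logaddexp(0.0, -margin)             # log(1 + exp(-margin))
    grad = -diff_feat / (1.0 + np.exp(margin))    # derivative of the loss w.r.t. w
    return loss, grad

def fit_reward(feats_low, feats_high, steps=200, lr=0.5):
    """Plain gradient descent on the ranking loss (hypothetical hyperparameters)."""
    w = np.zeros(feats_low.shape[1])
    for _ in range(steps):
        _, grad = ranking_loss_and_grad(w, feats_low, feats_high)
        w -= lr * grad
    return w

# Synthetic stand-ins for state features visited by the current policy and by the expert.
rng = np.random.default_rng(0)
agent_feats = rng.normal(loc=0.0, scale=1.0, size=(256, 4))
expert_feats = rng.normal(loc=1.0, scale=1.0, size=(256, 4))

w = fit_reward(agent_feats, expert_feats)
print(performance(w, expert_feats) > performance(w, agent_feats))
```

In a full ranking game, this reward update would alternate with a policy update that maximizes the learned reward, with the update schedule reflecting which player acts as the Stackelberg leader.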

research
07/09/2019

Ranking-Based Reward Extrapolation without Rankings

The performance of imitation learning is typically upper-bounded by the ...
research
12/10/2019

Deep Bayesian Reward Learning from Preferences

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe...
research
07/24/2020

Bayesian Robust Optimization for Imitation Learning

One of the main challenges in imitation learning is determining what act...
research
02/21/2020

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Bayesian reward learning from demonstrations enables rigorous safety and...
research
01/25/2019

Evaluation Function Approximation for Scrabble

The current state-of-the-art Scrabble agents are not learning-based but ...
research
09/24/2019

Avoidance Learning Using Observational Reinforcement Learning

Imitation learning seeks to learn an expert policy from sampled demonstr...
research
09/17/2020

Evolutionary Selective Imitation: Interpretable Agents by Imitation Learning Without a Demonstrator

We propose a new method for training an agent via an evolutionary strate...
