A Ranking Game for Imitation Learning

by Harshit Sikchi, et al.

We propose a new framework for imitation learning: treating imitation as a two-player ranking-based Stackelberg game between a policy and a reward function. In this game, the reward agent learns to satisfy pairwise performance rankings within a set of policies, while the policy agent learns to maximize this reward. This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods that learn from offline preferences. The Stackelberg formulation allows us to use optimization methods that take the game structure into account, leading to more sample-efficient and stable learning dynamics than existing IRL methods. We theoretically analyze the requirements that the loss function used for ranking policy performances must satisfy to enable near-optimal imitation learning at equilibrium. We use insights from this analysis to further increase the sample efficiency of the ranking game, using either automatically generated rankings or offline annotated rankings. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and solves previously unsolvable tasks in the Learning from Observation (LfO) setting.
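The abstract does not spell out the ranking loss the reward agent minimizes. As a minimal sketch of what "satisfying pairwise performance rankings" can mean, the code below assumes a linear per-state reward r(s) = w · phi(s) and swaps in the standard Bradley-Terry (logistic) pairwise loss, a common choice for learning rewards from rankings but not necessarily the loss analyzed in the paper. The trajectory representation and the names `ret`, `feature_sum`, `pairwise_loss`, and `train_reward` are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a trajectory is a list of per-state feature vectors.
Trajectory = List[List[float]]
# Each pair is (worse, better): the second trajectory is ranked higher.
Pair = Tuple[Trajectory, Trajectory]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ret(w: Sequence[float], traj: Trajectory) -> float:
    """Undiscounted return under the assumed linear reward r(s) = w . phi(s)."""
    return sum(sum(wi * xi for wi, xi in zip(w, phi)) for phi in traj)


def feature_sum(traj: Trajectory) -> List[float]:
    """Summed features along a trajectory, i.e. the gradient of ret() w.r.t. w."""
    return [sum(col) for col in zip(*traj)]


def pairwise_loss(w: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Bradley-Terry loss: mean of -log sigmoid(R(better) - R(worse))."""
    # -log sigmoid(m) == softplus(-m)
    return sum(softplus(ret(w, worse) - ret(w, better)) for worse, better in pairs) / len(pairs)


def train_reward(pairs: Sequence[Pair], dim: int, lr: float = 0.1, steps: int = 200) -> List[float]:
    """Reward agent's update: plain gradient descent on the pairwise ranking loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for worse, better in pairs:
            margin = ret(w, better) - ret(w, worse)
            # d/dw softplus(-margin) = -sigmoid(-margin) * d(margin)/dw
            coef = -sigmoid(-margin) / len(pairs)
            for i, (fw, fb) in enumerate(zip(feature_sum(worse), feature_sum(better))):
                grad[i] += coef * (fb - fw)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy ranking over trajectories from three policies: bad < mid < expert.
expert = [[1.0, 0.0], [1.0, 0.0]]
mid = [[0.5, 0.5], [0.5, 0.5]]
bad = [[0.0, 1.0], [0.0, 1.0]]
pairs = [(bad, mid), (mid, expert), (bad, expert)]

w = train_reward(pairs, dim=2)
print(ret(w, expert) > ret(w, mid) > ret(w, bad))  # True
```

In the game described above, the policy agent would then be trained (for example with a reinforcement learning algorithm) to maximize `ret(w, ·)`, and the reward agent would be refit as rankings over new policies arrive. That outer loop, and the Stackelberg-aware optimizer the abstract refers to, are outside the scope of this sketch.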

Ranking-Based Reward Extrapolation without Rankings

The performance of imitation learning is typically upper-bounded by the ...

Deep Bayesian Reward Learning from Preferences

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe...

Bayesian Robust Optimization for Imitation Learning

One of the main challenges in imitation learning is determining what act...

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Bayesian reward learning from demonstrations enables rigorous safety and...

Evaluation Function Approximation for Scrabble

The current state-of-the-art Scrabble agents are not learning-based but ...

Scalable Bayesian Inverse Reinforcement Learning

Bayesian inference over the reward presents an ideal solution to the ill...

Evolutionary Selective Imitation: Interpretable Agents by Imitation Learning Without a Demonstrator

We propose a new method for training an agent via an evolutionary strate...