Quantile Filtered Imitation Learning

12/02/2021
by David Brandfonbrener, et al.

We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning. QFIL performs policy improvement by running imitation learning on a filtered version of the offline dataset. The filtering process removes (s, a) pairs whose estimated Q values fall below a given quantile of the pushforward distribution over values induced by sampling actions from the behavior policy. The definitions of both the pushforward Q distribution and the resulting value function quantile are key contributions of our method. We prove that QFIL gives a safe policy improvement step with function approximation and that the choice of quantile provides a natural hyperparameter for trading off the bias and variance of the improvement step. Empirically, a synthetic experiment illustrates how QFIL effectively makes this bias-variance tradeoff, and QFIL performs well on the D4RL benchmark.
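To make the filtering step concrete, here is a minimal sketch of the quantile filter described above, not the authors' implementation. It assumes a learned Q estimate `q_hat`, a sampler for the estimated behavior policy `behavior_policy`, and a `dataset` of (state, action) pairs; all of these names, and the default quantile `tau` and sample count `n_samples`, are illustrative placeholders.

```python
import numpy as np

def qfil_filter(dataset, q_hat, behavior_policy, tau=0.7, n_samples=32):
    """Keep (s, a) pairs whose estimated Q value is at or above the tau-quantile
    of the pushforward distribution of Q values under the behavior policy."""
    filtered = []
    for s, a in dataset:
        # Sample actions from the (estimated) behavior policy at state s to
        # approximate the pushforward distribution of Q values at that state.
        sampled_actions = [behavior_policy(s) for _ in range(n_samples)]
        q_samples = np.array([q_hat(s, a_i) for a_i in sampled_actions])
        threshold = np.quantile(q_samples, tau)
        # Retain the pair only if its estimated value clears the quantile threshold.
        if q_hat(s, a) >= threshold:
            filtered.append((s, a))
    return filtered
```

Policy improvement then reduces to ordinary imitation learning (e.g., behavior cloning) on the filtered pairs: fit a policy to maximize log pi(a | s) over the output of `qfil_filter`. A higher `tau` keeps fewer, higher-value actions (less bias toward the behavior policy, more variance from the smaller dataset), while a lower `tau` does the opposite, which is the bias-variance tradeoff the abstract refers to.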


