Curriculum Offline Imitation Learning

11/03/2021
by Minghuan Liu, et al.

Offline reinforcement learning (RL) tasks require the agent to learn from a pre-collected dataset without further interaction with the environment. Despite their potential to surpass the behavioral policies, RL-based methods are generally impractical due to training instability and the bootstrapping of extrapolation errors, which typically demand careful hyperparameter tuning via online evaluation. In contrast, offline imitation learning (IL) has no such issues, since it learns the policy directly without estimating a value function by bootstrapping. However, IL is usually limited by the capability of the behavioral policies and tends to learn mediocre behavior from datasets collected by a mixture of policies. In this paper, we aim to retain the advantages of IL while mitigating this drawback. Observing that behavior cloning can imitate neighboring policies with less data, we propose Curriculum Offline Imitation Learning (COIL), which uses an experience-picking strategy to imitate adaptive neighboring policies with higher returns, improving the current policy over a sequence of curriculum stages. On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
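The staged picking-and-cloning loop the abstract describes can be sketched on a toy tabular problem. Everything below is an illustrative assumption, not the paper's actual algorithm: the toy environment, the use of action log-likelihood as a proxy for "neighboring" behavior, and the tabular behavior cloner are all simplifications introduced for this sketch.

```python
import math
import random

random.seed(0)

N_STATES, N_ACTIONS = 4, 2

def traj_return(traj):
    # total reward collected along a trajectory of (state, action, reward)
    return sum(r for _, _, r in traj)

def rollout(p_good, length=20):
    # toy environment (an assumption for this sketch): reward 1 whenever
    # action 1 is taken; the behavior policy takes it with probability
    # p_good, so p_good controls the quality of the collected trajectory
    traj = []
    for _ in range(length):
        s = random.randrange(N_STATES)
        a = 1 if random.random() < p_good else 0
        traj.append((s, a, float(a)))
    return traj

def log_likelihood(policy, traj):
    # average log-probability of the trajectory's actions under the current
    # policy, used here as a rough proxy for how "neighboring" a behavior is
    return sum(math.log(max(policy[s][a], 1e-8)) for s, a, _ in traj) / len(traj)

def behavior_clone(trajs):
    # tabular behavior cloning: smoothed per-state action frequencies
    counts = [[1e-3] * N_ACTIONS for _ in range(N_STATES)]
    for traj in trajs:
        for s, a, _ in traj:
            counts[s][a] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def coil_sketch(dataset, n_stages=10, k=10):
    # each stage imitates the k trajectories closest to the current policy
    # among those whose return beats the current return estimate
    policy = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    cur_return, history = float("-inf"), []
    for _ in range(n_stages):
        better = [t for t in dataset if traj_return(t) > cur_return]
        if not better:
            break
        better.sort(key=lambda t: log_likelihood(policy, t), reverse=True)
        picked = better[:k]
        policy = behavior_clone(picked)
        cur_return = sum(traj_return(t) for t in picked) / len(picked)
        history.append(cur_return)
    return policy, history

# mixed-quality offline dataset: behaviors ranging from poor to near-expert
dataset = [rollout(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9) for _ in range(10)]
policy, history = coil_sketch(dataset)
print([round(r, 1) for r in history])
```

Because every picked trajectory must beat the current return estimate, the per-stage return in `history` increases monotonically, which is the curriculum effect the abstract alludes to: the learner climbs through the mixed dataset instead of averaging over it.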


