
A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

by   Jake Grigsby, et al.
University of Virginia

Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
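The abstract describes two mechanisms: filtering dataset transitions by their estimated advantage before behavioral cloning, and prioritized sampling that surfaces rare expert transitions among noise. Below is a minimal, hedged sketch of both ideas; the advantage values, threshold, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def advantage_filter_weights(advantages, threshold=0.0):
    """Binary filter: keep only transitions whose estimated advantage
    A(s, a) = Q(s, a) - V(s) exceeds a threshold (hypothetical cutoff)."""
    return (np.asarray(advantages) > threshold).astype(float)

def prioritized_probabilities(advantages, alpha=1.0, eps=1e-6):
    """Sampling probabilities proportional to clipped advantage, so the
    few expert-level transitions are drawn far more often than the noise."""
    prio = np.maximum(np.asarray(advantages), 0.0) + eps
    prio = prio ** alpha
    return prio / prio.sum()

# Toy dataset: mostly sub-optimal noise (negative advantage),
# with two expert-level transitions (large positive advantage).
adv = np.array([-1.0, -1.2, -0.8, 2.0, -1.1, 1.8])
weights = advantage_filter_weights(adv)       # mask for the BC loss
probs = prioritized_probabilities(adv)        # replay sampling distribution
```

In a full pipeline, `weights` would mask the behavioral-cloning loss so only high-advantage actions are imitated, while `probs` would drive minibatch sampling so expert demonstrations are not drowned out by the roughly 65:1 noise ratio the paper reports.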

