MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations

03/30/2023
by Anqi Li, et al.

We study a new paradigm for sequential decision making, called offline Policy Learning from Observation (PLfO). Offline PLfO aims to learn policies from datasets of substandard quality: 1) only a subset of trajectories is labeled with rewards, 2) labeled trajectories may not contain actions, 3) labeled trajectories may not be of high quality, and 4) the overall data may not have full coverage. Such imperfections are common in real-world learning scenarios, so offline PLfO encompasses many existing offline learning setups, including offline imitation learning (IL), ILfO, and reinforcement learning (RL). In this work, we present a generic approach for offline PLfO, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO). Built upon the pessimism concept in offline RL, MAHALO optimizes the policy using a performance lower bound that accounts for uncertainty due to the dataset's insufficient coverage. We implement this idea by adversarially training data-consistent critic and reward functions in policy optimization, which forces the learned policy to be robust to the data deficiency. We show, in theory and in experiments, that MAHALO consistently outperforms or matches specialized algorithms across a variety of offline PLfO tasks.
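
The pessimism idea in the abstract can be illustrated with a toy sketch. This is not MAHALO itself: it assumes a one-step problem with three actions, a small finite set of candidate reward functions, and exact agreement with the labeled data as the consistency test. The names `data_consistent`, `pessimistic_action`, `HYPOTHESES`, and `LABELED` are hypothetical and exist only for this illustration. Each action is scored by its worst-case reward over all hypotheses that agree with the labeled data, and the action with the highest lower bound is chosen.

```python
from typing import List, Sequence, Tuple

Reward = Sequence[float]          # one reward value per action (toy assumption)
Labeled = List[Tuple[int, float]]  # (action index, observed reward) pairs


def data_consistent(hypotheses: List[Reward], labeled: Labeled,
                    tol: float = 1e-6) -> List[Reward]:
    """Keep the reward hypotheses that match every labeled sample within tol."""
    return [h for h in hypotheses
            if all(abs(h[a] - r) <= tol for a, r in labeled)]


def pessimistic_action(hypotheses: List[Reward], labeled: Labeled,
                       tol: float = 1e-6) -> Tuple[int, List[float]]:
    """Pick the action that maximizes the worst-case consistent reward."""
    consistent = data_consistent(hypotheses, labeled, tol)
    if not consistent:
        raise ValueError("no hypothesis is consistent with the labeled data")
    n_actions = len(consistent[0])
    # Lower bound per action: the adversary picks the least favorable
    # hypothesis among those the data cannot rule out.
    lower_bound = [min(h[a] for h in consistent) for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: lower_bound[a])
    return best, lower_bound


# Toy data (assumed): actions 0 and 1 carry reward labels, action 2 does not.
LABELED: Labeled = [(0, 0.6), (1, 0.4)]
HYPOTHESES: List[Reward] = [
    [0.6, 0.4, 1.0],  # consistent, optimistic about the unseen action
    [0.6, 0.4, 0.0],  # consistent, pessimistic about the unseen action
    [0.9, 0.4, 0.5],  # inconsistent with the label for action 0
]

best, lower_bound = pessimistic_action(HYPOTHESES, LABELED)
print(best, lower_bound)
```

In this sketch the unlabeled action 2 gets a lower bound of 0.0 because the data cannot exclude the pessimistic hypothesis, so the well-covered action 0 is selected. The approach described in the abstract replaces the finite enumeration with adversarially trained critic and reward functions.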
