Distance Weighted Supervised Learning for Offline Interaction Data

04/26/2023
by Joey Hejna, et al.

Sequential decision-making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods based on supervised learning are robust, but require optimal demonstrations, which are hard to collect. Offline goal-conditioned reinforcement learning (RL) algorithms promise to learn from sub-optimal data, but face optimization challenges, especially with high-dimensional data. To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning (DWSL), a supervised method for learning goal-conditioned policies from offline data. DWSL models the entire distribution of time-steps between states in offline data using only supervised learning, and uses this distribution to approximate shortest-path distances. To extract a policy, we weight actions by the reduction in estimated distance they produce. Theoretically, DWSL converges to an optimal policy constrained to the data distribution without any bootstrapping, an attractive property for offline learning. Across all datasets we test, DWSL empirically maintains behavior cloning as a lower bound while still exhibiting policy improvement. In high-dimensional image domains, DWSL surpasses the performance of both prior goal-conditioned IL and RL algorithms. Visualizations and code can be found at https://sites.google.com/view/dwsl/home.
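The abstract names three steps: fit the distribution of time-steps between a state and a goal with supervised learning, collapse that distribution into an approximate shortest-path distance, and weight actions by how much they reduce the distance. The following is a minimal sketch of those steps on a toy tabular dataset, not the authors' implementation. Several choices are assumptions made only for illustration: empirical counts stand in for the learned classifier, the soft minimum is a temperature-scaled log-sum-exp, each step costs 1, the action weight is a clipped exponentiated advantage, and the helper names and the parameters `temperature`, `beta`, and `max_weight` are hypothetical.

```python
import math
from collections import Counter


def distance_distribution(trajectories, state, goal, max_dist):
    """Empirical distribution over the number of time-steps k between `state`
    and a later occurrence of `goal` in the same trajectory.

    Assumption: counting stands in for a supervised classifier; on tabular
    data, a classifier trained on these labels would fit the same frequencies.
    Returns None if the pair never co-occurs within `max_dist` steps.
    """
    counts = Counter()
    for traj in trajectories:
        for t, s in enumerate(traj):
            if s != state:
                continue
            # Look ahead at most max_dist steps, stopping at the trajectory end.
            for k in range(min(max_dist, len(traj) - 1 - t) + 1):
                if traj[t + k] == goal:
                    counts[k] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[k] / total for k in range(max_dist + 1)]


def soft_min_distance(probs, temperature):
    """Hypothetical soft minimum: -T * log(sum_k p_k * exp(-k / T)).

    As T -> 0 this approaches the smallest k with p_k > 0, i.e. the shortest
    distance observed in the data. Computed with a stable log-sum-exp.
    """
    terms = [math.log(p) - k / temperature for k, p in enumerate(probs) if p > 0]
    m = max(terms)
    return -temperature * (m + math.log(sum(math.exp(x - m) for x in terms)))


def action_weight(d_state, d_next, beta, step_cost=1.0, max_weight=100.0):
    """Hypothetical weight for an action that moves from s to s'.

    The advantage is the reduction in estimated distance to the goal after
    paying one step: d(s, g) - (step_cost + d(s', g)). The weight is
    exponentiated and clipped, in the style of advantage-weighted regression.
    """
    advantage = d_state - (step_cost + d_next)
    return min(math.exp(advantage / beta), max_weight)


# Toy dataset: two trajectories over integer states that both reach goal 3.
trajectories = [[0, 1, 2, 3], [0, 2, 3]]
T, BETA, MAX_DIST = 0.1, 0.5, 3

d = {s: soft_min_distance(distance_distribution(trajectories, s, 3, MAX_DIST), T)
     for s in (0, 1, 2)}

w_detour = action_weight(d[0], d[1], BETA)    # action 0 -> 1 (3 steps to the goal)
w_shortcut = action_weight(d[0], d[2], BETA)  # action 0 -> 2 (2 steps to the goal)
print(round(d[0], 2), round(w_shortcut, 2), round(w_detour, 2))
```

In this toy case, state 2 lies on the shorter path, so the action 0 -> 2 receives a larger weight than 0 -> 1. A weighted behavior-cloning loss would therefore favor it even though both actions appear in the data, which loosely mirrors how weighting by distance reduction can improve on plain behavior cloning while staying within the dataset's support.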
