Back to the Manifold: Recovering from Out-of-Distribution States

07/18/2022
by Alfredo Reichlin, et al.
Learning from previously collected expert data promises to yield robotic policies without unsafe and costly online exploration. However, a major challenge is the distributional shift between the states in the training dataset and those visited by the learned policy at test time. While prior work has mainly studied the distribution shift caused by the policy during offline training, the problem of recovering from out-of-distribution states at deployment time has received little attention. We alleviate the distributional shift at deployment time by introducing a recovery policy that brings the agent back to the training manifold whenever it leaves the in-distribution states, e.g., due to an external perturbation. The recovery policy relies on an approximation of the training data density and on a learned equivariant mapping that embeds visual observations into a latent space in which translations correspond to robot actions. We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform. Our results show that the recovery policy enables the agent to complete tasks on which behavioral cloning alone fails because of the distributional shift.
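The recovery idea described above can be illustrated with a small sketch. This is not the authors' implementation; it only assumes what the abstract states, namely that observations are already encoded into a latent space where applying an action translates the latent state, and that some approximation of the training data density is available. Everything else is a hypothetical choice made for illustration: the Gaussian kernel density estimate, the gradient-ascent recovery step, and the names `log_density`, `log_density_grad`, `select_action`, `bandwidth`, `log_threshold`, and `max_step`.

```python
import math


def _kernel_logits(z, train_z, bandwidth):
    """Gaussian kernel exponents between latent point z and each training latent."""
    return [
        -sum((p_i - z_i) ** 2 for p_i, z_i in zip(p, z)) / (2.0 * bandwidth ** 2)
        for p in train_z
    ]


def log_density(z, train_z, bandwidth):
    """Log of an unnormalized kernel density estimate at z (assumed density model)."""
    logits = _kernel_logits(z, train_z, bandwidth)
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logits) / len(logits))


def log_density_grad(z, train_z, bandwidth):
    """Gradient of log_density with respect to z."""
    logits = _kernel_logits(z, train_z, bandwidth)
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [
        sum(w * (p[i] - z[i]) for w, p in zip(weights, train_z))
        / (total * bandwidth ** 2)
        for i in range(len(z))
    ]


def select_action(z, train_z, bc_policy, bandwidth=0.5,
                  log_threshold=-2.0, max_step=0.2):
    """Use the cloned policy in-distribution, otherwise step toward higher density.

    Because actions are assumed to act as translations in latent space,
    a clipped gradient-ascent step on the log-density is used directly
    as the recovery action.
    """
    if log_density(z, train_z, bandwidth) >= log_threshold:
        return bc_policy(z)
    grad = log_density_grad(z, train_z, bandwidth)
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_step / norm) if norm > 0 else 0.0
    return [g * scale for g in grad]
```

In this sketch the threshold decides when the agent is considered out of distribution, and the step clipping keeps recovery actions bounded; both values would have to be tuned for a real encoder and robot.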

research
11/09/2021

Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

Reinforcement Learning (RL) has been shown effective in domains where th...
research
05/02/2023

Get Back Here: Robust Imitation by Return-to-Distribution Planning

We consider the Imitation Learning (IL) setup where expert data are not ...
research
06/21/2022

Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control

Learned models and policies can generalize effectively when evaluated wi...
research
10/24/2021

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn the optimal policy fro...
research
10/06/2022

Distributionally Adaptive Meta Reinforcement Learning

Meta-reinforcement learning algorithms provide a data-driven way to acqu...
research
10/15/2021

Value Penalized Q-Learning for Recommender Systems

Scaling reinforcement learning (RL) to recommender systems (RS) is promi...
research
02/19/2020

Online Policies for Efficient Volunteer Crowdsourcing

Nonprofit crowdsourcing platforms such as food recovery organizations re...
