Instabilities of Offline RL with Pre-Trained Neural Representation

03/08/2021
by Ruosong Wang, et al.

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated. Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold; otherwise, there are lower bounds exhibiting exponential error amplification (in the problem horizon) unless the data collection distribution has only a mild distribution shift relative to the target policy. This work studies these issues from an empirical perspective to gauge how stable offline RL methods are. In particular, our methodology explores these ideas when using features from pre-trained neural networks, in the hope that such representations are powerful enough to permit sample-efficient offline RL. Through extensive experiments on a range of tasks, we see that substantial error amplification does occur even when using such pre-trained representations (trained on the same task itself); we find that offline RL is stable only under extremely mild distribution shift. The implication of these results, both from a theoretical and an empirical perspective, is that successful offline RL (where we seek to go beyond the low distribution shift regime) requires substantially stronger conditions than those which suffice for successful supervised learning.
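To make the setting concrete, the sketch below shows one standard way to run offline policy evaluation on top of a frozen, pre-trained feature map: an FQE-style least-squares update that repeatedly regresses onto bootstrapped targets. This is a minimal illustrative sketch, not the paper's exact implementation; the function and variable names (fitted_q_evaluation, phi_sa, phi_next_pi) are hypothetical.

```python
import numpy as np

def fitted_q_evaluation(phi_sa, rewards, phi_next_pi, gamma=0.99,
                        num_iters=100, reg=1e-5):
    """Offline policy evaluation with a frozen feature map (FQE-style sketch).

    phi_sa      : (n, d) features of the logged (state, action) pairs
    rewards     : (n,)   logged rewards
    phi_next_pi : (n, d) features of (next_state, target-policy action)
    Returns a weight vector w such that Q(s, a) is approximated by phi(s, a) @ w.
    """
    n, d = phi_sa.shape
    # The features are fixed, so the regularized Gram matrix is computed once.
    gram = phi_sa.T @ phi_sa + reg * np.eye(d)
    w = np.zeros(d)
    for _ in range(num_iters):
        # Regression targets bootstrap off the previous weight iterate.
        targets = rewards + gamma * (phi_next_pi @ w)
        w = np.linalg.solve(gram, phi_sa.T @ targets)
    return w
```

Each iteration fits the value estimate to targets built from the previous iterate, so any extrapolation error on (state, action) pairs that the logged data cover poorly is fed back into the next regression. Over an effective horizon of 1/(1-gamma), this feedback loop is the mechanism by which error can amplify when the data collection distribution differs substantially from the target policy's distribution.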
