Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

05/13/2022
by Miroslav Štrupl, et al.

Upside-Down Reinforcement Learning (UDRL) is an approach to solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditioned Supervised Learning (GCSL) – which can be viewed as a simplified version of UDRL – optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and we identify the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for a wide class of stochastic environments. Finally, we provide a concrete example of a very simple environment where eUDRL diverges. Since the primary aim of this paper is to present a negative result, and the best counterexamples are the simplest ones, we restrict all discussions to finite (discrete) environments, ignoring issues of function approximation and limited sample size.
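For readers unfamiliar with the algorithm family the abstract refers to, the sketch below illustrates what an eUDRL/GCSL-style recursive policy update can look like in a finite setting: roll out episodes under the current goal-conditioned policy, relabel each trajectory segment in hindsight with the goal it actually reached, and replace the policy with the resulting empirical conditional distribution over actions. Everything environment-specific here (the transition kernel, the two-step horizon, identifying goals with final states) is an assumption made for illustration; it is not the paper's construction or counterexample.

```python
# Minimal, hypothetical tabular sketch of an eUDRL/GCSL-style recursive
# policy update in a tiny stochastic MDP with episodic resets.
# The transition kernel, horizon, and identification of goals with final
# states are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 3, 2, 2
N_GOALS = N_STATES  # a goal means "end the episode in this state"

# Assumed stochastic transition kernel P[s, a, s'].
P = np.array([
    [[0.8, 0.2, 0.0],   # from state 0, action 0
     [0.2, 0.0, 0.8]],  # from state 0, action 1
    [[0.1, 0.9, 0.0],
     [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0],   # state 2 is absorbing
     [0.0, 0.0, 1.0]],
])


def rollout(policy, s0, goal, horizon):
    """Sample one episode under a (state, remaining-horizon, goal)-conditioned policy."""
    s, traj = s0, []
    for h in range(horizon, 0, -1):
        a = rng.choice(N_ACTIONS, p=policy[s, h, goal])
        traj.append((s, h, a))
        s = rng.choice(N_STATES, p=P[s, a])
    return traj, s  # (state, remaining horizon, action) triples and the final state


def eudrl_iteration(policy, n_episodes=20_000):
    """One recursive update: relabel trailing segments with the goal actually
    reached, then set pi_new(a | s, h, g) to the empirical conditional
    P_old(A = a | S = s, remaining horizon = h, final state = g)."""
    counts = np.zeros((N_STATES, HORIZON + 1, N_GOALS, N_ACTIONS))
    for _ in range(n_episodes):
        s0, g_cmd = rng.integers(N_STATES), rng.integers(N_GOALS)
        traj, s_final = rollout(policy, s0, g_cmd, HORIZON)
        for s, h, a in traj:               # hindsight relabelling: credit each
            counts[s, h, s_final, a] += 1  # step with the goal it actually led to
    totals = counts.sum(axis=-1, keepdims=True)
    # Normalize where data exists; keep the old policy for unvisited (s, h, g).
    return np.where(totals > 0, counts / np.maximum(totals, 1), policy)


def goal_reaching_rate(policy, n_episodes=20_000):
    """Monte Carlo estimate of P(final state == commanded goal)."""
    hits = 0
    for _ in range(n_episodes):
        s0, g_cmd = rng.integers(N_STATES), rng.integers(N_GOALS)
        _, s_final = rollout(policy, s0, g_cmd, HORIZON)
        hits += int(s_final == g_cmd)
    return hits / n_episodes


# Start from the uniform policy and iterate the recursive update.
policy = np.full((N_STATES, HORIZON + 1, N_GOALS, N_ACTIONS), 1.0 / N_ACTIONS)
for it in range(5):
    print(f"iteration {it}: goal-reaching rate ~ {goal_reaching_rate(policy):.3f}")
    policy = eudrl_iteration(policy)
```

Whether the goal-reaching rate improves or degrades under this iteration depends on the environment's stochasticity; the paper's point is precisely that such updates carry no general guarantee of convergence to the optimal policy in stochastic environments with episodic resets.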


