Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics

11/02/2021
by Matthias Weissenbacher, et al.

Offline reinforcement learning leverages large datasets to train policies without interacting with the environment. The learned policies may then be deployed in real-world settings where interactions are costly or dangerous. Current algorithms overfit to the training dataset and, as a consequence, perform poorly when deployed to out-of-distribution generalizations of the environment. We aim to address these limitations by learning a Koopman latent representation, which allows us to infer symmetries of the system's underlying dynamics. These symmetries are then used to extend the otherwise static offline dataset during training. This constitutes a novel data augmentation framework that reflects the system's dynamics and can thus be interpreted as an exploration of the environment's phase space. To obtain the symmetries we employ Koopman theory, in which nonlinear dynamics are represented by a linear operator acting on the space of measurement functions of the system, so that symmetries of the dynamics may be inferred directly. We provide novel theoretical results on the existence and nature of symmetries relevant for control systems such as reinforcement learning settings. Moreover, we empirically evaluate our method on several benchmark offline reinforcement learning tasks and datasets, including D4RL, Metaworld and Robosuite, and find that our framework consistently improves the state of the art for Q-learning methods.
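
The idea of inferring symmetries from a linear Koopman representation and using them for data augmentation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy system, the hand-written measurement functions in `lift`, the matrices `K` and `S`, and all helper names are hypothetical. The abstract describes learning the latent representation from data and targets control systems with actions, whereas this sketch hard-codes the lifting and omits actions. It shows a nonlinear map that becomes linear in lifted coordinates, a matrix `S` that commutes with the Koopman matrix `K`, and how applying `S` to a recorded transition yields a new transition that is still consistent with the dynamics.

```python
# Toy nonlinear system (illustrative assumption, not from the paper):
#   x1' = A * x1
#   x2' = B * x2 + (A**2 - B) * x1**2
A, B = 0.9, 0.5
C = A**2 - B


def step(x):
    """One step of the nonlinear dynamics in the original state space."""
    x1, x2 = x
    return (A * x1, B * x2 + C * x1**2)


def lift(x):
    """Hand-picked measurement functions g(x) = (x1, x2, x1^2)."""
    x1, x2 = x
    return (x1, x2, x1**2)


def decode(z):
    """Recover the state from the lifted coordinates (first two entries)."""
    return (z[0], z[1])


# In the lifted coordinates the dynamics are exactly linear: g(step(x)) = K g(x).
K = [[A, 0.0, 0.0],
     [0.0, B, C],
     [0.0, 0.0, A**2]]

# Candidate symmetry: the reflection x1 -> -x1, which leaves x1^2 unchanged.
S = [[-1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]


def matvec(M, v):
    """Matrix-vector product for small nested-list matrices."""
    return tuple(sum(m * vi for m, vi in zip(row, v)) for row in M)


def matmul(P, Q):
    """Matrix-matrix product for small nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def close(u, v, tol=1e-12):
    """Element-wise comparison with an absolute tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))


def augment(x, x_next):
    """Map a recorded transition (x, x_next) to a new one via the symmetry S."""
    return decode(matvec(S, lift(x))), decode(matvec(S, lift(x_next)))


x = (1.5, -0.3)
x_next = step(x)
x_aug, x_aug_next = augment(x, x_next)

print("lifted dynamics are linear:", close(lift(x_next), matvec(K, lift(x))))
print("S commutes with K:", all(close(r, s) for r, s in zip(matmul(K, S), matmul(S, K))))
print("augmented transition is valid:", close(step(x_aug), x_aug_next))
```

Because `S` commutes with `K`, every transformed pair is again a transition of the same system, so the augmented data stays consistent with the dynamics rather than being arbitrary noise. In this toy case the symmetry is known in closed form; in the setting the abstract describes, both the representation and the symmetries would have to be inferred from the offline dataset.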

research
03/10/2021

S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning

Offline reinforcement learning proposes to learn policies from large col...
research
03/13/2022

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

Offline reinforcement learning algorithms promise to be applicable in se...
research
05/19/2022

Data Valuation for Offline Reinforcement Learning

The success of deep reinforcement learning (DRL) hinges on the availabil...
research
04/05/2022

Configuration Path Control

Reinforcement learning methods often produce brittle policies – policies...
research
02/13/2022

Goal Recognition as Reinforcement Learning

Most approaches for goal recognition rely on specifications of the possi...
research
07/06/2023

Offline Reinforcement Learning with Imbalanced Datasets

The prevalent use of benchmarks in current offline reinforcement learnin...
research
04/12/2021

Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

Reinforcement learning from large-scale offline datasets provides us wit...
