CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

04/26/2023
by Philipp Altmann, et al.

The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation that contains only crucial information has been shown to be a challenging task in itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation and action spaces and provide a methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data augmentation in two different-sized procedurally generated mazes.
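To make the idea of reducing state information concrete, the following is a minimal sketch of one way a fully observable gridworld observation could be reshaped into a compact, agent-centred window. It is an illustration under stated assumptions, not the formulation from the paper: the function name `crop_observation`, the square window, the `radius` parameter, and the `pad_value` used for cells outside the grid are all hypothetical and do not appear in the abstract.

```python
def crop_observation(grid, agent_pos, radius=1, pad_value=-1):
    """Return a (2*radius+1) x (2*radius+1) window of `grid` centred on the agent.

    Assumptions (not from the abstract): `grid` is a rectangular list of lists,
    `agent_pos` is a (row, col) tuple, and cells outside the grid are filled
    with `pad_value`.
    """
    rows, cols = len(grid), len(grid[0])
    agent_row, agent_col = agent_pos
    window = []
    for r in range(agent_row - radius, agent_row + radius + 1):
        window_row = []
        for c in range(agent_col - radius, agent_col + radius + 1):
            # Keep the cell if it lies inside the grid, otherwise pad.
            inside = 0 <= r < rows and 0 <= c < cols
            window_row.append(grid[r][c] if inside else pad_value)
        window.append(window_row)
    return window


# Hypothetical 4x4 layout; the numbers only label cells for illustration.
grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
corner_view = crop_observation(grid, agent_pos=(0, 0), radius=1)
inner_view = crop_observation(grid, agent_pos=(2, 2), radius=1)
```

Because the window always has the same shape and is expressed relative to the agent, the policy input in this sketch no longer encodes the absolute layout of a particular training grid, which is the kind of effect the abstract attributes to providing only relevant information.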


