The Principle of Unchanged Optimality in Reinforcement Learning Generalization

06/02/2019
by Alex Irpan, et al.

Several recent papers have examined generalization in reinforcement learning (RL) by proposing new environments, or ways to add noise to existing environments, and then benchmarking algorithms and model architectures on those environments. We discuss subtle conceptual properties of RL benchmarks that are not required in supervised learning (SL), as well as properties that an RL benchmark should possess. Chief among them is one we call the principle of unchanged optimality: there should exist a single policy π that is optimal across all train and test tasks. In this work, we argue why this principle is important and show how it can be broken or satisfied through subtle choices in state representation or model architecture. We conclude by discussing challenges and future lines of research in theoretically analyzing generalization benchmarks.
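To make the principle concrete, the following Python sketch is a minimal, hypothetical illustration (not code from the paper): two toy "reach the goal" tasks on a line that differ only in goal position. When the observation includes the goal, one greedy policy is optimal on both tasks; when the state representation hides the goal, both tasks alias to the same observation, so no single fixed policy can be optimal across train and test, and unchanged optimality is broken.

# A hypothetical toy example (not from the paper): two "reach the goal" tasks
# on a 1-D line that differ only in where the goal sits.

def greedy_policy(obs):
    """A single fixed policy: step toward the goal it observes."""
    position, goal = obs
    return +1 if goal > position else -1

def optimal_action(position, goal):
    """Ground-truth optimal action for a given task."""
    return +1 if goal > position else -1

tasks = [+5, -5]   # train task: goal at +5; test task: goal at -5
position = 0

# Case 1: the goal is part of the observation.
# One policy is optimal on every task, so unchanged optimality holds.
for goal in tasks:
    obs = (position, goal)
    assert greedy_policy(obs) == optimal_action(position, goal)

# Case 2: the state representation drops the goal.
# Both tasks now produce the identical observation (position,), so any fixed
# policy must pick the same action for both, yet the optimal actions differ
# (+1 vs -1). No single policy is optimal across train and test: the principle
# of unchanged optimality is broken by this representation choice.
aliased_obs = (position,)
print("aliased observation:", aliased_obs,
      "| optimal actions per task:",
      [optimal_action(position, g) for g in tasks])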
