Grounding Aleatoric Uncertainty in Unsupervised Environment Design

07/11/2022
by Minqi Jiang, et al.

Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the training and test environments. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. However, in partially observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution. We formalize this phenomenon as curriculum-induced covariate shift (CICS), and describe how its occurrence in aleatoric parameters can lead to suboptimal policies. Directly sampling these parameters from the ground-truth distribution avoids the issue, but thwarts curriculum learning. We propose SAMPLR, a minimax regret UED method that optimizes the ground-truth utility function, even when the underlying training data is biased due to CICS. We prove, and validate on challenging domains, that our approach preserves optimality under the ground-truth distribution, while promoting robustness across the full range of environment settings.
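
The effect of a shifted distribution over an aleatoric parameter can be illustrated with a toy one-step problem. The sketch below is not SAMPLR and is not taken from the paper; the guessing task, the probabilities 0.3 and 0.7, and all function names are assumptions chosen only for illustration. An agent guesses a hidden binary outcome and is rewarded when correct. A policy that is optimal under a curriculum-shifted outcome probability picks the wrong guess for the ground-truth probability, while choosing the action by its ground-truth utility recovers the optimal guess.

```python
# Toy illustration of curriculum-induced covariate shift (CICS).
# All names and numbers here are hypothetical, not from the paper.

def expected_return(action: int, p_one: float) -> float:
    """Expected reward for guessing `action` when the hidden
    outcome equals 1 with probability `p_one` (reward 1 if correct)."""
    return p_one if action == 1 else 1.0 - p_one

def best_action(p_one: float) -> int:
    """Action maximizing expected return under the given outcome probability."""
    return max((0, 1), key=lambda a: expected_return(a, p_one))

P_GROUND_TRUTH = 0.3  # assumed deployment probability of outcome 1
P_CURRICULUM = 0.7    # assumed probability induced by the curriculum

# Policy fitted to the shifted training distribution.
shifted_action = best_action(P_CURRICULUM)
# Policy chosen by its utility under the ground-truth distribution.
grounded_action = best_action(P_GROUND_TRUTH)

# Both policies are evaluated under the ground-truth distribution.
shifted_value = expected_return(shifted_action, P_GROUND_TRUTH)
grounded_value = expected_return(grounded_action, P_GROUND_TRUTH)

print(f"shifted policy guesses {shifted_action}, value {shifted_value:.1f}")
print(f"grounded policy guesses {grounded_action}, value {grounded_value:.1f}")
```

In this toy setting the gap between the two values is the cost of the shift. The abstract's claim is that SAMPLR keeps the curriculum while optimizing the ground-truth utility; how it does so is described in the full text, not in this sketch.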

research
12/30/2022

Reinforcement Learning with Success Induced Task Prioritization

Many challenging reinforcement learning (RL) problems require designing ...
research
12/03/2020

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

A wide range of reinforcement learning (RL) problems - including robustn...
research
02/12/2022

Automatic Curriculum Generation for Learning Adaptation in Networking

As deep reinforcement learning (RL) showcases its strengths in networkin...
research
06/10/2021

Mode recovery in neural autoregressive sequence modeling

Despite its wide use, recent studies have revealed unexpected and undesi...
research
10/19/2022

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Reinforcement Learning (RL) algorithms are often known for sample ineffi...
research
11/28/2016

Nonparametric General Reinforcement Learning

Reinforcement learning (RL) problems are often phrased in terms of Marko...
research
01/12/2021

Joint Demosaicking and Denoising in the Wild: The Case of Training Under Ground Truth Uncertainty

Image demosaicking and denoising are the two key fundamental steps in di...
