Distributional Robustness and Regularization in Reinforcement Learning

03/05/2020
by Esther Derman et al.

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes external uncertainty through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address internal uncertainty due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they satisfy out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning methods.
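The robustness-regularization equivalence the abstract builds on is classical in supervised learning: for a loss that is Lipschitz in the data point, the worst-case expected loss over a type-1 Wasserstein ball around the empirical distribution collapses into the empirical loss plus a norm penalty. As a hedged sketch in generic notation (radius \epsilon and loss \ell_\theta are placeholders, not the paper's notation; equality holds under standard conditions such as unbounded support):

    \sup_{Q \,:\, W_1(Q, \hat{P}_n) \le \epsilon} \; \mathbb{E}_{\xi \sim Q}\bigl[\ell_\theta(\xi)\bigr]
        \;=\; \mathbb{E}_{\xi \sim \hat{P}_n}\bigl[\ell_\theta(\xi)\bigr] \;+\; \epsilon \,\mathrm{Lip}(\ell_\theta),

where \mathrm{Lip}(\ell_\theta) is the Lipschitz constant of the loss in \xi, so the right-hand side is an ordinary regularized empirical risk.

In the same spirit, the lower bound the abstract describes can be mimicked by a reward-penalized Bellman backup. The sketch below is a toy illustration under assumptions of our own: the penalty alpha * ||V||_inf is a generic stand-in, not the paper's actual Wasserstein-radius-dependent regularizer, and regularized_value_iteration is a hypothetical helper.

    import numpy as np

    def regularized_value_iteration(P, R, gamma=0.9, alpha=0.05, iters=500):
        """Toy penalized value iteration on a tabular MDP.

        P: (S, A, S) transition tensor, R: (S, A) reward matrix.
        Subtracting a value-norm penalty keeps each backup pessimistic,
        echoing (but not reproducing) the paper's lower bound on the
        Wasserstein distributionally robust value function.
        """
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(iters):
            # Generic stand-in penalty; the paper derives a specific regularizer.
            penalty = alpha * np.linalg.norm(V, np.inf)
            Q = R - penalty + gamma * np.einsum("ijk,k->ij", P, V)
            V = Q.max(axis=1)  # greedy backup over actions
        return V

    # Toy 2-state, 2-action MDP: the penalized values sit below the
    # unpenalized ones (set alpha=0.0 to compare).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(regularized_value_iteration(P, R))

With gamma + alpha < 1 the penalized operator remains a contraction, so the iteration converges; this is a design choice of the sketch, not a statement about the paper's algorithm.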


