Your Policy Regularizer is Secretly an Adversary

03/23/2022
by Rob Brekelmans, et al.

Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen from a limited set by an imagined adversary. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.
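The KL-regularized case described in the abstract can be made concrete with a small numerical check. Below is a minimal single-step (bandit) sketch, assuming a uniform reference policy pi0, a reward vector r, and regularization weight 1/beta; the variable names are illustrative and not taken from the paper's code. Under KL regularization the optimal policy is pi*(a) proportional to pi0(a) exp(beta r(a)), and the worst-case perturbed reward r'(a) = r(a) - (1/beta) log(pi*(a)/pi0(a)) collapses to the regularized value for every action, i.e. the imagined adversary perturbs the rewards until all actions look equally attractive.

```python
import numpy as np

# Single-step (bandit) sketch of the KL-regularized duality described above.
# Names (r, pi0, beta) are illustrative; beta is the inverse regularization strength.

rng = np.random.default_rng(0)
r = rng.normal(size=5)          # original rewards for 5 actions
pi0 = np.full(5, 1.0 / 5)       # uniform reference policy
beta = 2.0                      # (1/beta) scales the KL penalty

# KL-regularized optimal policy and value:
#   pi*(a) ∝ pi0(a) exp(beta r(a)),  V* = (1/beta) log sum_a pi0(a) exp(beta r(a))
logits = np.log(pi0) + beta * r
pi_star = np.exp(logits - logits.max())
pi_star /= pi_star.sum()
v_star = np.log(np.sum(pi0 * np.exp(beta * r))) / beta

# Worst-case perturbation for pi*:  delta_r(a) = (1/beta) log(pi*(a)/pi0(a)).
# The perturbed reward r'(a) = r(a) - delta_r(a) equals V* for every action,
# so the adversary makes the agent indifferent between all actions.
delta_r = np.log(pi_star / pi0) / beta
r_perturbed = r - delta_r

print("regularized value      :", v_star)
print("perturbed rewards      :", r_perturbed)   # every entry ≈ v_star
print("E_pi*[r] - (1/beta) KL :",
      np.dot(pi_star, r) - np.dot(pi_star, np.log(pi_star / pi0)) / beta)
```

Running the sketch, all entries of r_perturbed agree with the regularized value, illustrating the robustness interpretation above: the regularized objective equals the value of the policy under the worst-case rewards the adversary is allowed to choose.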

Related research

01/18/2021
Regularized Policies are Reward Robust
Entropic regularization of policies in Reinforcement Learning (RL) is a ...

01/28/2022
Do You Need the Entropy Reward (in Practice)?
Maximum entropy (MaxEnt) RL maximizes a combination of the original task...

05/16/2022
Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...

06/18/2019
Robust Reinforcement Learning for Continuous Control with Model Misspecification
We provide a framework for incorporating robustness -- to perturbations ...

01/26/2023
Policy Optimization with Robustness Certificates
We present a policy optimization framework in which the learned policy c...

08/24/2022
Entropy Regularization for Population Estimation
Entropy regularization is known to improve exploration in sequential dec...

11/17/2017
Calibration of Distributionally Robust Empirical Optimization Models
In this paper, we study the out-of-sample properties of robust empirical...
