Regularized Policies are Reward Robust

01/18/2021
by Hisham Husain, et al.

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state space sufficiently before overfitting to a locally optimal policy. The primary motivations for using entropy are exploration and disambiguating optimal policies; however, its theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, we apply them to other existing regularization schemes. Our results thus give insights into the effects of regularization of policies and deepen our understanding of exploration through robust rewards at large.
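The duality described in the abstract can be illustrated numerically in the simplest possible setting. The sketch below is not the paper's derivation: it assumes a single-state problem (a bandit) with Shannon-entropy regularization, and all names (`rewards`, `tau`, `adversarial_value`, and so on) are hypothetical. In this special case, Fenchel duality rewrites the regularized objective `<pi, r> + tau * H(pi)` as a minimum over reward perturbations `d` of `<pi, r - d> + tau * log(sum(exp(d / tau)))`, and the minimizing perturbation at the optimal policy leaves every action with the same perturbed reward.

```python
import math

def softmax(rewards, tau):
    """Maximizer of <pi, r> + tau * H(pi) over the probability simplex."""
    m = max(rewards)
    weights = [math.exp((r - m) / tau) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def regularized_value(pi, rewards, tau):
    """Expected reward plus tau times the Shannon entropy of pi."""
    expected = sum(p * r for p, r in zip(pi, rewards))
    entropy = -sum(p * math.log(p) for p in pi if p > 0)
    return expected + tau * entropy

def log_sum_exp(values, tau):
    """tau * log(sum(exp(v / tau))): the convex conjugate of
    tau * (negative entropy) restricted to the simplex."""
    m = max(values)
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def adversarial_value(pi, rewards, perturbation, tau):
    """Dual objective: expected perturbed reward plus the adversary's penalty."""
    expected = sum(p * (r - d) for p, r, d in zip(pi, rewards, perturbation))
    return expected + log_sum_exp(perturbation, tau)

# Hypothetical three-action problem (values chosen only for illustration).
rewards = [1.0, 0.5, -0.2]
tau = 0.7

pi_star = softmax(rewards, tau)

# One minimizing perturbation for pi_star: d_a = tau * log(pi_star_a).
worst_case = [tau * math.log(p) for p in pi_star]
perturbed_rewards = [r - d for r, d in zip(rewards, worst_case)]

primal = regularized_value(pi_star, rewards, tau)
dual = adversarial_value(pi_star, rewards, worst_case, tau)

print("optimal policy:   ", pi_star)
print("perturbed rewards:", perturbed_rewards)
print("primal vs. dual:  ", primal, dual)
```

Under these assumptions, the primal and dual values coincide, and the perturbed rewards are equal across actions, which is one way to read the claim that the regularized policy is optimal against a worst-case reward. Any other perturbation, such as the all-zeros vector, gives a dual value at least as large as the regularized value.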


research
09/20/2019

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Algorithms based on the entropy regularized framework, such as Soft Q-le...
research
03/23/2022

Your Policy Regularizer is Secretly an Adversary

Policy regularization methods such as maximum entropy regularization are...
research
03/02/2019

A Unified Framework for Regularized Reinforcement Learning

We propose and study a general framework for regularized Markov decision...
research
12/02/2022

Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning

In reinforcement learning (RL), the ability to utilize prior knowledge f...
research
06/20/2023

The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

Discount regularization, using a shorter planning horizon when calculati...
research
02/26/2019

Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

We consider a setting of hierarchical reinforcement learning, in which ...
research
03/13/2018

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

In this work, we provide theoretical guarantees for reward decomposition...
