Discount Factor as a Regularizer in Reinforcement Learning

07/04/2020
by   Ron Amit, et al.
12

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime. Yet the exact nature of this regularizer has not been investigated. In this work, we fill in this gap. For several Temporal-Difference (TD) learning methods, we show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss. Motivated by the equivalence, we empirically study this technique compared to standard L_2 regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate.

READ FULL TEXT
research
08/17/2022

Choquet regularization for reinforcement learning

We propose Choquet regularizers to measure and manage the level of explo...
research
10/21/2022

Bridging the Gap Between Target Networks and Functional Regularization

Bootstrapping is behind much of the successes of Deep Reinforcement Lear...
research
12/09/2021

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Despite overparameterization, deep networks trained via supervised learn...
research
03/14/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Many modern approaches to offline Reinforcement Learning (RL) utilize be...
research
10/17/2021

Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

Anderson mixing has been heuristically applied to reinforcement learning...
research
06/20/2023

The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

Discount regularization, using a shorter planning horizon when calculati...
research
11/01/2018

Temporal Regularization in Markov Decision Process

Several applications of Reinforcement Learning suffer from instability d...

Please sign up or login with your details

Forgot password? Click here to reset