Taylor Expansion of Discount Factors

06/11/2021
by Yunhao Tang, et al.

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
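To make the interpolation idea concrete, here is a minimal numpy sketch on a toy tabular MDP of my own construction (not the paper's code): for a fixed policy with transition matrix P and reward vector r, the exact value function at a larger discount gamma' admits a Taylor series around a smaller discount gamma, V_{gamma'} = sum_{k>=0} [(gamma' - gamma)(I - gamma P)^{-1} P]^k V_gamma, and truncating the series at order K interpolates between V_gamma (K = 0) and V_{gamma'} (K -> infinity). All names below (P, r, gamma_eval, etc.) are illustrative assumptions.

```python
import numpy as np

# Toy MDP under a fixed policy (hypothetical example, not from the paper):
# P is the policy-induced transition matrix, r the expected reward vector.
rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
r = rng.random(n)

gamma, gamma_eval = 0.9, 0.95       # "learning" vs "evaluation" discount factors

def exact_value(g):
    # V_g solves the Bellman equation (I - g P) V = r.
    return np.linalg.solve(np.eye(n) - g * P, r)

V_gamma = exact_value(gamma)
V_eval = exact_value(gamma_eval)

# Taylor expansion of V_{gamma_eval} around gamma:
#   V_{gamma'} = sum_{k>=0} [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma
# The k-th term is M^k V_gamma with M as below; the series converges when
# (gamma' - gamma) / (1 - gamma) < 1.
M = (gamma_eval - gamma) * np.linalg.solve(np.eye(n) - gamma * P, P)
approx = np.zeros(n)
term = V_gamma.copy()
for k in range(30):                 # truncate at order K = 30
    approx += term
    term = M @ term

err0 = np.max(np.abs(V_gamma - V_eval))   # zeroth-order error (just V_gamma)
errK = np.max(np.abs(approx - V_eval))    # error after the full truncation
```

Each extra order shrinks the error by roughly (gamma' - gamma)/(1 - gamma) = 0.5 here, so the truncated expansion rapidly closes the gap between the two value functions.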


Related research:

- 02/17/2020, Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
  Policy evaluation is a key process in Reinforcement Learning (RL). It as...

- 08/03/2020, Proximal Deterministic Policy Gradient
  This paper introduces two simple techniques to improve off-policy Reinfo...

- 03/13/2020, Taylor Expansion Policy Optimization
  In this work, we investigate the application of Taylor expansions in rei...

- 07/17/2020, Discovering Reinforcement Learning Algorithms
  Reinforcement learning (RL) algorithms update an agent's parameters acco...

- 06/24/2021, Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
  Model-agnostic meta-reinforcement learning requires estimating the Hessi...

- 07/28/2020, Munchausen Reinforcement Learning
  Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most a...

- 02/19/2019, Hyperbolic Discounting and Learning over Multiple Horizons
  Reinforcement learning (RL) typically defines a discount factor as part ...