
Taylor Expansion of Discount Factors

by Yunhao Tang et al.

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from the one used to define the evaluation objective. In this work, we study the effect this discrepancy between discount factors has during learning, and discover a family of objectives that interpolate between value functions of two distinct discount factors. Our analysis suggests new ways of estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly used deep RL heuristic modifications to policy optimization algorithms.
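One concrete way to see such an interpolation, consistent with the abstract but not necessarily the paper's full estimation procedure: in a tabular MDP with row-stochastic transition matrix P and reward vector r under a fixed policy, the exact value function is V_g = (I - gP)^{-1} r. Expanding (I - g'P)^{-1} as a Neumann series around a smaller discount g yields V_{g'} = sum_{k >= 0} [(g' - g)(I - gP)^{-1} P]^k V_g, so truncating at order K gives a family of approximations that interpolate between V_g (K = 0) and V_{g'} (K -> infinity). The sketch below checks this series numerically on a made-up 3-state MDP; P, r, and the two discounts are illustrative placeholders.

```python
import numpy as np

# Illustrative sketch only: a random 3-state MDP under a fixed policy.
rng = np.random.default_rng(0)
n = 3
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
r = rng.random(n)                  # per-state expected reward

gamma, gamma_prime = 0.9, 0.95     # small and large discount factors

def value(P, r, g):
    """Exact discounted value function: V_g = (I - g P)^{-1} r."""
    return np.linalg.solve(np.eye(len(r)) - g * P, r)

V_lo = value(P, r, gamma)          # value under the smaller discount
V_hi = value(P, r, gamma_prime)    # target value under the larger discount

# Expansion operator M = (gamma' - gamma) (I - gamma P)^{-1} P.
# Its spectral radius is at most (gamma' - gamma) / (1 - gamma) < 1,
# so the series sum_k M^k V_lo converges to V_hi.
M = (gamma_prime - gamma) * np.linalg.solve(np.eye(n) - gamma * P, P)

approx = np.zeros(n)
term = V_lo.copy()
for K in range(8):
    approx = approx + term         # K-th order truncation of the series
    err = np.abs(approx - V_hi).max()
    print(f"order {K}: max error = {err:.5f}")
    term = M @ term                # next term: M^{K+1} V_lo
```

Running this prints a maximum error that shrinks geometrically with the truncation order K (here by roughly a factor (gamma' - gamma) / (1 - gamma) = 0.5 per order), illustrating how low-order truncations behave like intermediate, interpolated objectives between the two discount factors.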




Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

Policy evaluation is a key process in Reinforcement Learning (RL). It as...

Offline Policy Optimization in RL with Variance Regularization

Learning policies from fixed offline datasets is a key challenge to scal...

Proximal Deterministic Policy Gradient

This paper introduces two simple techniques to improve off-policy Reinfo...

Taylor Expansion Policy Optimization

In this work, we investigate the application of Taylor expansions in rei...

Discovering Reinforcement Learning Algorithms

Reinforcement learning (RL) algorithms update an agent's parameters acco...

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Model-agnostic meta-reinforcement learning requires estimating the Hessi...

Hyperbolic Discounting and Learning over Multiple Horizons

Reinforcement learning (RL) typically defines a discount factor as part ...