Taylor Expansion of Discount Factors

06/11/2021
by Yunhao Tang et al.

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
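As an illustration of how value functions under two discount factors can be related, the sketch below numerically checks the Neumann-series identity V_{γ'} = Σ_{k≥0} [(γ' − γ)(I − γP)^{-1} P]^k V_γ on a small Markov reward process. This is a toy example with assumed placeholders (a random transition matrix P, random rewards r, and hand-picked discounts 0.95 and 0.99), not code or data from the paper, but it shows the kind of interpolation between value functions at a smaller and a larger discount factor that the abstract describes.

```python
import numpy as np

# Hypothetical tabular setup (placeholder, not from the paper's experiments):
# a random 5-state Markov reward process with transition matrix P and rewards r.
rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # make P row-stochastic
r = rng.random(n)                   # per-state expected reward

gamma, gamma_prime = 0.95, 0.99     # estimation and evaluation discount factors

# Exact value functions: V_gamma = (I - gamma * P)^{-1} r.
I = np.eye(n)
V_gamma = np.linalg.solve(I - gamma * P, r)
V_gamma_prime = np.linalg.solve(I - gamma_prime * P, r)

# K-th order truncation of the expansion
#   V_{gamma'} = sum_{k>=0} [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma,
# which converges whenever (gamma' - gamma) / (1 - gamma) < 1.
M = (gamma_prime - gamma) * np.linalg.solve(I - gamma * P, P)
approx = np.zeros(n)
term = V_gamma.copy()
for k in range(4):                  # zeroth- through third-order approximations
    approx += term
    err = np.max(np.abs(approx - V_gamma_prime))
    print(f"order {k}: max error {err:.5f}")
    term = M @ term
```

The truncation error shrinks with the order because the sufficient condition (γ' − γ)/(1 − γ) < 1 holds for these discounts, which is why expanding the value function of a larger evaluation discount around that of a smaller estimation discount is well behaved in this toy setting.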


Related research

02/17/2020 · Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Policy evaluation is a key process in Reinforcement Learning (RL). It as...

12/29/2022 · Offline Policy Optimization in RL with Variance Regularization
Learning policies from fixed offline datasets is a key challenge to scal...

08/03/2020 · Proximal Deterministic Policy Gradient
This paper introduces two simple techniques to improve off-policy Reinfo...

03/13/2020 · Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...

07/17/2020 · Discovering Reinforcement Learning Algorithms
Reinforcement learning (RL) algorithms update an agent's parameters acco...

06/24/2021 · Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Model-agnostic meta-reinforcement learning requires estimating the Hessi...

02/19/2019 · Hyperbolic Discounting and Learning over Multiple Horizons
Reinforcement learning (RL) typically defines a discount factor as part ...
