Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

06/03/2021
by Taisuke Kobayashi, et al.

This paper proposes a new reinforcement learning method with hyperbolic discounting. By combining a new temporal difference error that incorporates hyperbolic discounting in a recursive manner with the reward-punishment framework, a new scheme for learning the optimal policy is derived. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of reward and punishment. In addition, the average discount factors for reward and punishment differ from each other, resembling the sign effect observed in animal behavior.
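
For readers unfamiliar with the term, the following is a minimal sketch of how hyperbolic discounting differs from the exponential discounting used in standard reinforcement learning. It uses the common textbook form 1 / (1 + k t), which is an assumption here; the abstract does not give the paper's recursive formulation, and this sketch does not reproduce it. The function names and the parameters `gamma` and `k` are hypothetical.

```python
def exponential_discount(t, gamma=0.9):
    """Standard RL discount weight gamma**t for a reward t steps ahead."""
    return gamma ** t


def hyperbolic_discount(t, k=0.1):
    """Textbook hyperbolic discount weight 1 / (1 + k*t).

    Assumption: this is the generic form, not the paper's recursive scheme.
    """
    return 1.0 / (1.0 + k * t)


def discounted_return(rewards, discount):
    """Sum of rewards weighted by discount(t), with t counted from 0."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    rewards = [1.0] * 50  # hypothetical constant reward stream
    print("exponential:", discounted_return(rewards, exponential_discount))
    print("hyperbolic: ", discounted_return(rewards, hyperbolic_discount))
```

Under these assumed parameters, the hyperbolic weight decays more slowly than the exponential one at long horizons, so distant rewards contribute more to the return.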


research
06/30/2019

Detecting Spiky Corruption in Markov Decision Processes

Current reinforcement learning methods fail if the reward function is im...
research
07/28/2023

Curiosity-Driven Reinforcement Learning based Low-Level Flight Control

Curiosity is one of the main motives in many of the natural creatures wi...
research
08/24/2023

Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Robot control using reinforcement learning has become popular, but its l...
research
05/18/2015

A Definition of Happiness for Reinforcement Learning Agents

What is happiness for reinforcement learning agents? We seek a formal de...
research
10/19/2022

Scaling Laws for Reward Model Overoptimization

In reinforcement learning from human feedback, it is common to optimize ...
research
04/22/2022

Reward Reports for Reinforcement Learning

The desire to build good systems in the face of complex societal effects...
research
12/30/2021

Self Reward Design with Fine-grained Interpretability

Transparency and fairness issues in Deep Reinforcement Learning may stem...
