Convergence of Q-value in case of Gaussian rewards

03/07/2020
by Konatsu Miyamoto, et al.

This paper studies reinforcement learning with unbounded rewards, such as rewards drawn from a Gaussian distribution, and proves that the Q-function converges in that setting. By the central limit theorem, it is natural in some real-world applications to assume that rewards follow a Gaussian distribution, but existing proofs cannot guarantee convergence of the Q-function in that case. Furthermore, in distributional reinforcement learning and Bayesian reinforcement learning, which have become popular in recent years, it is preferable to allow rewards to follow a Gaussian distribution. We therefore prove convergence of the Q-function under the condition E[r(s,a)^2] < ∞, which is much weaker than the conditions assumed in existing research. Finally, as a bonus, we include a proof of the policy gradient theorem for distributed reinforcement learning.
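
As an informal illustration of the setting described in the abstract, the following minimal sketch runs tabular Q-learning on a small MDP whose rewards are Gaussian, and therefore unbounded but with finite second moment. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state MDP (MEAN_REWARD, NEXT_STATE), the discount GAMMA, the noise level SIGMA, the uniform random behaviour policy, and the step size n^(-0.7), chosen only because it satisfies the usual Robbins-Monro conditions.

```python
import random

# Hypothetical 2-state, 2-action MDP with deterministic transitions.
# These numbers are illustrative assumptions, not values from the paper.
MEAN_REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): -1.0, (1, 1): 2.0}
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
ACTIONS = (0, 1)
GAMMA = 0.5   # discount factor (assumed)
SIGMA = 1.0   # standard deviation of the Gaussian reward noise (assumed)


def optimal_q(iterations=200):
    """Reference Q* from value iteration on the mean rewards."""
    q = {sa: 0.0 for sa in MEAN_REWARD}
    for _ in range(iterations):
        q = {
            (s, a): MEAN_REWARD[(s, a)]
            + GAMMA * max(q[(NEXT_STATE[(s, a)], b)] for b in ACTIONS)
            for (s, a) in MEAN_REWARD
        }
    return q


def q_learning(steps=200_000, seed=0):
    """Tabular Q-learning driven by sampled Gaussian rewards."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in MEAN_REWARD}
    visits = {sa: 0 for sa in MEAN_REWARD}
    state = 0
    for _ in range(steps):
        action = rng.choice(ACTIONS)  # uniform exploration
        # Unbounded reward: Gaussian around the mean for this (state, action).
        reward = rng.gauss(MEAN_REWARD[(state, action)], SIGMA)
        next_state = NEXT_STATE[(state, action)]
        visits[(state, action)] += 1
        # Step size n^(-0.7): its sum diverges, the sum of its squares converges.
        alpha = visits[(state, action)] ** -0.7
        target = reward + GAMMA * max(q[(next_state, b)] for b in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def max_error(steps=200_000, seed=0):
    """Largest absolute gap between the learned Q and the reference Q*."""
    q_star = optimal_q()
    q_hat = q_learning(steps=steps, seed=seed)
    return max(abs(q_hat[sa] - q_star[sa]) for sa in q_star)
```

Calling max_error() with a larger steps value should, in this toy setting, give a smaller gap between the learned and reference values. This is an empirical check on one example only and says nothing about the general proof.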
