Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

02/11/2019
by   Nikki Lijing Kuang, et al.

In reinforcement learning, a decision must be made at some point as to whether it is worthwhile to continue the learning process or to terminate it. In many such situations, stochastic elements govern the occurrence of rewards, with positive rewards randomly interleaved with negative ones. For most practical learners, the learning is considered useful if the number of positive rewards always exceeds the number of negative ones, and a situation that often calls for terminating the learning is when the number of negative rewards exceeds the number of positive rewards. While this seems reasonable, the error of premature termination, whereby learning is terminated and declared a failure even though the positive rewards would eventually far outnumber the negative ones, can be significant. In this paper, we use combinatorial analysis to study the probability of wrongly terminating a reinforcement learning activity, an error that undermines the effectiveness of an optimal policy, and we show that this probability can be quite high. Although we demonstrate mathematically that such errors can never be eliminated, we propose practical mechanisms that can effectively reduce them. Simulation experiments have been carried out, and their results are in close agreement with our theoretical findings.
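To give a feel for how large the premature-termination error can be, the following is a minimal sketch and not the paper's own model or code. It assumes, purely for illustration, that each reward is independently positive with probability p and negative otherwise, and that learning is terminated the first time the negative rewards outnumber the positive ones. The function names, the finite horizon, and the parameter values are hypothetical. Under these assumptions the reward balance is a simple random walk, and the classical gambler's-ruin result gives a termination probability of (1 - p) / p when p > 1/2, even though positive rewards dominate in the long run.

```python
import random


def premature_termination_rate(p, horizon, trials, seed=0):
    """Estimate how often learning is stopped under the naive rule.

    Assumed model (illustrative only): each reward is +1 with probability p
    and -1 otherwise, independently. Learning is terminated the first time
    the count of negative rewards exceeds the count of positive rewards.
    """
    rng = random.Random(seed)
    stopped = 0
    for _ in range(trials):
        balance = 0  # positives minus negatives seen so far
        for _ in range(horizon):
            balance += 1 if rng.random() < p else -1
            if balance < 0:  # negatives now outnumber positives
                stopped += 1
                break
    return stopped / trials


def ruin_probability(p):
    """Closed-form chance that the balance ever drops below zero
    for an unbounded horizon (classical gambler's-ruin result)."""
    q = 1.0 - p
    return 1.0 if p <= 0.5 else q / p


if __name__ == "__main__":
    p = 0.6  # positive rewards are more likely, so stopping is an error
    estimate = premature_termination_rate(p, horizon=500, trials=20000)
    print(f"simulated: {estimate:.3f}, closed form: {ruin_probability(p):.3f}")
```

With p greater than 1/2, every termination in this toy model is premature, since the positive rewards would almost surely outnumber the negative ones in the long run. The finite horizon only truncates the simulation, so the estimate sits slightly below the closed-form value.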

research
02/11/2019

Stochastic Reinforcement Learning

In reinforcement learning episodes, the rewards and punishments are ofte...
research
09/27/2022

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over tim...
research
05/30/2022

Reinforcement Learning with a Terminator

We present the problem of reinforcement learning with exogenous terminat...
research
08/24/2023

Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Robot control using reinforcement learning has become popular, but its l...
research
02/23/2021

State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards

Constrained reinforcement learning involves multiple rewards that must i...
research
03/07/2020

Convergence of Q-value in case of Gaussian rewards

In this paper, as a study of reinforcement learning, we converge the Q f...
research
03/27/2016

Negative Learning Rates and P-Learning

We present a method of training a differentiable function approximator f...