Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

06/18/2019
by Tadashi Kozuno, et al.

In real-world applications of reinforcement learning (RL), noise from the inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which play a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It leverages two recent ideas: (1) gap-increasing value update operators from advantage learning, for noise tolerance, and (2) off-policy eligibility traces from the Retrace algorithm, for efficient learning. We provide a detailed theoretical analysis of the new algorithm showing that it inherits efficiency from Retrace and noise tolerance from advantage learning. Furthermore, our analysis shows that GRAPE learns significantly more efficiently than a simple learning-rate-based approach while keeping the same level of noise tolerance. We applied GRAPE to control problems and obtained experimental results supporting our theoretical analysis.
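The abstract names its two ingredients without spelling them out. As a rough illustration, here is a minimal tabular sketch of (1) the Retrace correction (Munos et al., 2016), which weights temporal-difference errors by truncated importance ratios c_s = lam * min(1, pi(a_s|x_s) / mu(a_s|x_s)), and (2) the advantage-learning shift (Bellemare et al., 2016), T_AL Q(x,a) = T Q(x,a) - alpha * (max_b Q(x,b) - Q(x,a)), which lowers the values of non-greedy actions and so widens the action gap. This is not GRAPE's update rule: the function names, the list-of-lists tables, and the way the two pieces are chained in the usage lines are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# Hypothetical tabular representation: table[state][action].
Table = List[List[float]]
Step = Tuple[int, int, float, int]  # (state, action, reward, next_state)


def expected_q(q: Table, pi: Table, x: int) -> float:
    """E_{a ~ pi(.|x)} Q(x, a) for the target policy pi."""
    return sum(p * qa for p, qa in zip(pi[x], q[x]))


def retrace_correction(q: Table, traj: List[Step], pi: Table, mu: Table,
                       gamma: float, lam: float) -> float:
    """Retrace correction for Q(x_0, a_0) along one trajectory drawn from mu.

    Computes sum_t gamma^t * (prod_{s=1..t} c_s) * delta_t with
      c_s     = lam * min(1, pi(a_s|x_s) / mu(a_s|x_s))
      delta_t = r_t + gamma * E_pi Q(x_{t+1}, .) - Q(x_t, a_t).
    Terminal states are not handled, to keep the sketch short.
    """
    total = 0.0
    weight = 1.0  # gamma^t * prod_{s=1..t} c_s; the first step is not traced
    for t, (x, a, r, x_next) in enumerate(traj):
        if t > 0:
            c = lam * min(1.0, pi[x][a] / mu[x][a])  # truncated importance ratio
            weight *= gamma * c
        delta = r + gamma * expected_q(q, pi, x_next) - q[x][a]
        total += weight * delta
    return total


def gap_increasing_target(q: Table, x: int, a: int,
                          base_target: float, alpha: float) -> float:
    """Advantage-learning style shift of a base target.

    Subtracts alpha * (max_b Q(x, b) - Q(x, a)); greedy actions are unchanged,
    all other actions are pushed down, which increases the action gap.
    """
    return base_target - alpha * (max(q[x]) - q[x][a])


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 2.0]]
    pi = [[0.5, 0.5], [0.0, 1.0]]   # target policy
    mu = [[0.5, 0.5], [0.5, 0.5]]   # behavior policy that generated traj
    traj = [(0, 0, 1.0, 1), (1, 1, 0.0, 0)]
    # Hypothetical chaining: Retrace target first, then the gap-increasing shift.
    target = q[0][0] + retrace_correction(q, traj, pi, mu, gamma=0.9, lam=1.0)
    print(gap_increasing_target(q, 0, 0, target, alpha=0.5))
```

With `lam = 0.0` every trace weight after the first step vanishes and the correction reduces to the one-step error r_0 + gamma * E_pi Q(x_1, .) - Q(x_0, a_0). Truncating the ratio at 1 keeps the product of trace coefficients bounded, which is what lets Retrace use off-policy data without the variance blow-up of plain importance sampling.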

12/13/2019

More Efficient Off-Policy Evaluation through Regularized Targeted Learning

We study the problem of off-policy evaluation (OPE) in Reinforcement Lea...
08/02/2023

Direct Gradient Temporal Difference Learning

Off-policy learning enables a reinforcement learning (RL) agent to reaso...
06/07/2022

On the Role of Discount Factor in Offline Reinforcement Learning

Offline reinforcement learning (RL) enables effective learning from prev...
10/01/2022

Integrating Conventional Headway Control with Reinforcement Learning to Avoid Bus Bunching

Bus bunching is a naturally occurring phenomenon that undermines the effic...
05/29/2021

An algorithm for identifying eigenvectors exhibiting strong spatial localization

We introduce an approach for exploring eigenvector localization phenomen...
11/06/2019

Improving reinforcement learning algorithms: towards optimal learning rate policies

This paper investigates to what extent we can improve reinforcement lear...
05/29/2023

Privileged Knowledge Distillation for Sim-to-Real Policy Generalization

Reinforcement Learning (RL) has recently achieved remarkable success in ...
