Off-Policy Reinforcement Learning with Delayed Rewards

06/22/2021
by Beining Han, et al.

We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are not readily accessible, or are not even defined, immediately after the agent acts. In this work, we first formally define the environment with delayed rewards and discuss the challenges that arise from the non-Markovian nature of such environments. We then introduce a general off-policy RL framework with a new Q-function formulation that handles delayed rewards with theoretical convergence guarantees. For practical tasks with high-dimensional state spaces, we further introduce the HC-decomposition rule of the Q-function in our framework, which naturally leads to an approximation scheme that improves training efficiency and stability. Finally, we conduct extensive experiments that demonstrate the superior performance of our algorithms over existing methods and their variants.
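As an illustration of why delayed rewards make an environment non-Markovian, here is a minimal Python sketch. It assumes, purely for illustration, that per-step rewards are withheld and paid out as a sum every `interval` steps or at episode end; the abstract does not specify the paper's formal definition, and `DelayedRewardWrapper`, `make_toy_step`, and `observed_rewards` are hypothetical names rather than the authors' code.

```python
class DelayedRewardWrapper:
    """Hypothetical wrapper: withholds per-step rewards and pays out
    their sum every `interval` steps, or when the episode ends."""

    def __init__(self, step_fn, interval):
        self.step_fn = step_fn    # callable: action -> (state, reward, done)
        self.interval = interval  # assumed fixed delay length
        self._pending = 0.0       # rewards earned but not yet delivered
        self._steps = 0

    def step(self, action):
        state, reward, done = self.step_fn(action)
        self._pending += reward
        self._steps += 1
        if done or self._steps % self.interval == 0:
            # Deliver everything accumulated since the last payout.
            delayed, self._pending = self._pending, 0.0
        else:
            delayed = 0.0
        return state, delayed, done


def make_toy_step(horizon=5):
    """Toy environment: reward 1.0 per step, episode ends after `horizon` steps."""
    counter = {"n": 0}

    def step(action):
        counter["n"] += 1
        return counter["n"], 1.0, counter["n"] >= horizon

    return step


def observed_rewards(interval, horizon=5):
    """Run one episode and return the rewards the agent actually observes."""
    env = DelayedRewardWrapper(make_toy_step(horizon), interval)
    rewards, done = [], False
    while not done:
        _, r, done = env.step(action=0)
        rewards.append(r)
    return rewards
```

In this sketch the reward observed at a given step depends on the rewards accumulated over earlier steps, not only on the current state and action, which is the non-Markovian property the abstract refers to. The total return of the episode is unchanged; only the timing of the signal differs.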

