Time-Aware Q-Networks: Resolving Temporal Irregularity for Deep Reinforcement Learning

05/06/2021
by Yeo Jin Kim, et al.

Deep Reinforcement Learning (DRL) has shown outstanding performance in inducing effective action policies that maximize expected long-term return on many complex tasks. Much DRL work has focused on sequences of events with discrete time steps and has ignored the irregular time intervals between consecutive events. In many real-world domains, however, data often consist of temporal sequences with irregular time intervals, and it is important to consider the intervals between temporal events in order to capture latent progressive patterns of states. In this work, we present a general time-aware RL framework, Time-aware Q-Networks (TQN), which takes physical time intervals into account within a deep RL framework. TQN deals with time irregularity from two aspects: 1) the elapsed time in the past and an expected next observation time, for time-aware state approximation, and 2) the action time window for the future, for time-aware discounting of rewards. Experimental results show that by capturing the underlying structures in sequences with time irregularities from both aspects, TQNs significantly outperform DQN in four types of contexts with irregular time intervals. More specifically, our results show that in classic RL tasks such as CartPole and MountainCar and in the Atari benchmark, all with randomly segmented time intervals, time-aware discounting alone is more important, while in real-world tasks with intrinsic time intervals, such as nuclear reactor operation and septic patient treatment, both the time-aware state and time-aware discounting are crucial. Moreover, to improve the agent's learning capacity, we explored three boosting methods: Double networks, Dueling networks, and Prioritized Experience Replay. Our results show that for the two real-world tasks, combining all three boosting methods with TQN is especially effective.
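
As a rough illustration of what time-aware discounting could look like, the sketch below scales the discount by the physical length of each interval. The abstract does not give the exact rule, so the exponential form gamma ** (dt / tau) is an assumption, and the names `time_aware_discount`, `time_aware_return`, `time_aware_td_target`, and the reference interval `tau` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def time_aware_discount(gamma: float, dt: float, tau: float = 1.0) -> float:
    """Discount for an interval of physical length dt.

    Assumed form (not from the paper): gamma applies per tau units of time,
    so a longer interval is discounted more heavily.
    """
    if dt < 0 or tau <= 0:
        raise ValueError("dt must be non-negative and tau must be positive")
    return gamma ** (dt / tau)


def time_aware_return(
    rewards: Sequence[float],
    intervals: Sequence[float],
    gamma: float,
    tau: float = 1.0,
) -> float:
    """Discounted return when intervals[k] is the time from step k to step k+1."""
    if len(rewards) != len(intervals):
        raise ValueError("rewards and intervals must have the same length")
    g = 0.0
    # Work backwards: G_k = r_k + gamma^(dt_k / tau) * G_{k+1}
    for r, dt in zip(reversed(rewards), reversed(intervals)):
        g = r + time_aware_discount(gamma, dt, tau) * g
    return g


def time_aware_td_target(
    reward: float,
    next_q_max: float,
    dt: float,
    gamma: float,
    tau: float = 1.0,
    done: bool = False,
) -> float:
    """One-step Q-learning target with an interval-dependent discount."""
    if done:
        return reward
    return reward + time_aware_discount(gamma, dt, tau) * next_q_max
```

When every interval equals `tau`, this reduces to ordinary fixed-step discounting, so a standard DQN target is recovered as a special case of the assumed form.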

