Making Deep Q-learning methods robust to time discretization

01/28/2019
by Corentin Tallec, et al.

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al., 2017; Zhang et al., 2018). Overcoming such sensitivity is key to making DRL applicable to real-world problems. In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical factor; this covers, e.g., changing the number of frames per second, or the action frequency of the controller. Empirically, we find that Q-learning-based approaches such as Deep Q-learning (Mnih et al., 2015) and Deep Deterministic Policy Gradient (Lillicrap et al., 2015) collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performance over a wide range of time discretizations, and confirm this robustness empirically.
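One way to build intuition for the collapse at small time steps is to look at how the gap between the Q-values of two actions behaves as the time step shrinks. The sketch below is only an illustration, not the paper's algorithm or experiments: the 1-D dynamics ds/dt = a, the reward rate, the value function `value`, and the constant `GAMMA` are all hypothetical choices made for this example. Under these assumptions, a one-step lookahead Q-value differs between actions only through the next state, which moves by an amount proportional to dt, so the action gap shrinks roughly linearly with dt.

```python
# Illustrative sketch only: every quantity below is an assumption made for
# this example, not something taken from the paper.

GAMMA = 0.9  # assumed discount factor per unit of physical time


def value(s):
    # Hypothetical smooth state-value function, chosen for illustration.
    return -s * s


def q_value(s, a, dt):
    # One-step lookahead with time step dt:
    # reward accumulated over dt, plus the discounted value of the next state.
    reward_rate = -s * s      # hypothetical reward per unit time
    next_s = s + a * dt       # Euler step of the assumed dynamics ds/dt = a
    return reward_rate * dt + (GAMMA ** dt) * value(next_s)


def action_gap(s, dt):
    # Absolute difference between the Q-values of actions a = +1 and a = -1.
    return abs(q_value(s, 1.0, dt) - q_value(s, -1.0, dt))


for dt in (1.0, 0.1, 0.01, 0.001):
    print(f"dt={dt:<6} action gap={action_gap(1.0, dt):.6f}")
```

In this toy setting the gap at s = 1 equals 4 · dt · GAMMA^dt, so it vanishes as dt goes to 0: the Q-values of all actions converge to the same number, and ranking actions by Q-value becomes increasingly sensitive to approximation error.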


