Model-Based Stabilisation of Deep Reinforcement Learning

09/06/2018
by Felix Leibfried, et al.

Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues, as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach, which is known to reduce the number of agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. With the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming the experiences of practitioners and researchers. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe that exploiting the relation between reinforcement learning (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results on 20 games from the Atari benchmark. Our results show that the proposed model-based learning approach not only ensures convergence but also reduces sample complexity and yields superior performance.
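The abstract does not spell out how the model's predictions for transitions, rewards, and termination flags enter the Q-learning update. The sketch below shows one plausible reading, stated as an assumption rather than as the authors' algorithm: the bootstrapped target is built from the model's predicted reward r̂, predicted termination probability d̂, and predicted next state ŝ', i.e. y = r̂ + γ(1 − d̂) max_a' Q(ŝ', a'). All names (`ModelPrediction`, `model_based_target`, `linear_q`, `td_step`) are hypothetical; a linear Q-function stands in for the deep network, and a hand-written toy model stands in for a learned one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModelPrediction:
    """Hypothetical output of a learned model for one (state, action) pair."""
    next_state: List[float]  # predicted next observation (here: a feature vector)
    reward: float            # predicted immediate reward
    done_prob: float         # predicted probability that the episode terminates


QFunction = Callable[[List[float], int], float]
Model = Callable[[List[float], int], ModelPrediction]


def model_based_target(q: QFunction, model: Model, state: List[float],
                       action: int, n_actions: int, gamma: float = 0.99) -> float:
    """Target built from model predictions (an assumed formulation):
    y = r_hat + gamma * (1 - d_hat) * max_a' Q(s_hat', a')."""
    pred = model(state, action)
    best_next = max(q(pred.next_state, a) for a in range(n_actions))
    return pred.reward + gamma * (1.0 - pred.done_prob) * best_next


def linear_q(weights: List[List[float]]) -> QFunction:
    """Q(s, a) = w_a . s, a linear stand-in for the deep Q-network."""
    def q(state: List[float], action: int) -> float:
        return sum(w * x for w, x in zip(weights[action], state))
    return q


def td_step(weights: List[List[float]], model: Model, state: List[float],
            action: int, n_actions: int, lr: float = 0.1,
            gamma: float = 0.99) -> float:
    """One semi-gradient step moving Q(s, a) towards the model-based target.
    The target is treated as a constant; returns the TD error."""
    q = linear_q(weights)
    target = model_based_target(q, model, state, action, n_actions, gamma)
    error = target - q(state, action)
    # The gradient of w_a . s with respect to w_a is s itself.
    weights[action] = [w + lr * error * x
                       for w, x in zip(weights[action], state)]
    return error


if __name__ == "__main__":
    # Toy model (hypothetical): action 1 yields reward 1 and surely terminates;
    # action 0 stays in place with no reward.
    def toy_model(state: List[float], action: int) -> ModelPrediction:
        if action == 1:
            return ModelPrediction(next_state=[0.0, 1.0], reward=1.0, done_prob=1.0)
        return ModelPrediction(next_state=state, reward=0.0, done_prob=0.0)

    weights = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(50):
        td_step(weights, toy_model, [1.0, 0.0], action=1, n_actions=2)
    print(linear_q(weights)([1.0, 0.0], 1))  # approaches 1.0, the true return
```

Because the predicted termination probability multiplies the bootstrap term, a confidently terminal prediction (d̂ = 1) reduces the target to the predicted reward alone. In the toy run, Q(s, 1) therefore converges towards 1.0 rather than accumulating discounted future value.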
