An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

08/06/2017
by Felix Leibfried, et al.

In this paper, we methodically address the problem of cumulative reward overestimation in deep reinforcement learning. We generalise notions from information-theoretic bounded rationality to handle high-dimensional state spaces efficiently. The resulting algorithm encompasses a wide range of learning outcomes that can be realised by tuning a Lagrange multiplier that intrinsically penalises rewards. We show that deep Q-networks arise as a special case of our proposed approach. We introduce a novel scheduling scheme for bounded-rational behaviour that ensures sample efficiency and robustness. In experiments on Atari games, we show that our algorithm outperforms other deep reinforcement learning algorithms (e.g., deep and double deep Q-networks) in terms of both game-play performance and sample complexity.
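To make the core idea concrete, the sketch below contrasts DQN's hard-max bootstrap with an information-theoretically penalised ("free-energy" or log-sum-exp) bootstrap controlled by an inverse-temperature parameter, consistent with the abstract's claim that deep Q-networks arise as a limiting case. This is a minimal NumPy illustration, not the paper's code: the function name `free_energy_target`, the uniform prior, and the parameter names are assumptions for exposition.

```python
import numpy as np

def free_energy_target(q_next, reward, gamma, beta, prior=None):
    """One-step bootstrap target with an information-theoretic penalty.

    Instead of DQN's hard max over next-state Q-values, the value is a
    log-partition ("soft max") with inverse temperature beta, weighted by
    a prior policy. As beta -> infinity this recovers the standard DQN
    target; smaller beta yields conservative estimates that damp
    cumulative reward overestimation.
    """
    if prior is None:  # assume a uniform prior over actions
        prior = np.full(q_next.shape[-1], 1.0 / q_next.shape[-1])
    # Numerically stable (1/beta) * log E_prior[exp(beta * Q)]
    z = beta * q_next
    z_max = z.max(axis=-1, keepdims=True)
    soft_value = (z_max.squeeze(-1)
                  + np.log((prior * np.exp(z - z_max)).sum(axis=-1))) / beta
    return reward + gamma * soft_value

# Low beta shrinks the bootstrap toward the prior-averaged value;
# high beta approaches the max-based DQN target.
q_next = np.array([[1.0, 2.0, 3.0]])
print(free_energy_target(q_next, reward=0.0, gamma=0.99, beta=0.1))    # ~2.0 (conservative)
print(free_energy_target(q_next, reward=0.0, gamma=0.99, beta=100.0))  # ~2.96 (near DQN)
```

The abstract's scheduling scheme would then correspond, on this reading, to annealing `beta` over training, starting penalised and approaching the DQN limit as value estimates become reliable.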
