Learning values across many orders of magnitude

02/24/2016
by Hado van Hasselt, et al.

Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were all clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using the adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.
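
The abstract does not spell out the normalization rule, so the following is only a minimal sketch of the general idea of adaptively normalizing learning targets, not the paper's algorithm. It assumes exponential moving averages for the mean and second moment of the targets; the class name `RunningTargetNormalizer` and the parameters `step_size` and `min_scale` are hypothetical, chosen for illustration.

```python
import math


class RunningTargetNormalizer:
    """Illustrative only: tracks a running mean and scale of the targets.

    Assumption: statistics are exponential moving averages; the abstract
    does not say how the normalization is actually computed.
    """

    def __init__(self, step_size=0.01, min_scale=1e-4):
        self.step_size = step_size      # hypothetical update rate for the statistics
        self.min_scale = min_scale      # lower bound that avoids division by zero
        self.mean = 0.0                 # running estimate of E[target]
        self.second_moment = 1.0        # running estimate of E[target^2]

    @property
    def scale(self):
        # Standard deviation derived from the two running moments.
        variance = self.second_moment - self.mean ** 2
        return max(math.sqrt(max(variance, 0.0)), self.min_scale)

    def update(self, target):
        # Move both running moments a small step toward the new target.
        self.mean += self.step_size * (target - self.mean)
        self.second_moment += self.step_size * (target ** 2 - self.second_moment)

    def normalize(self, target):
        # Map a raw target into the normalized space used for learning.
        return (target - self.mean) / self.scale

    def denormalize(self, value):
        # Map a normalized prediction back to the original scale.
        return value * self.scale + self.mean


# Usage: targets near 1000 end up with normalized magnitude of order 1.
normalizer = RunningTargetNormalizer(step_size=0.1)
for raw_target in [900.0, 1100.0] * 200:
    normalizer.update(raw_target)
normalized = normalizer.normalize(1100.0)
```

A learner trained on `normalize(target)` sees targets of roughly unit scale regardless of the raw reward magnitude, which is the property that reward clipping otherwise provides. Note that in this sketch a change in the statistics also changes what the learner's existing normalized predictions mean in raw units; a complete method has to account for that.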

research
02/16/2016

Reinforcement Learning approach for Real Time Strategy Games Battle city and S3

In this paper we proposed reinforcement learning algorithms with the gen...
research
07/30/2019

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Document summarisation can be formulated as a sequential decision-making...
research
05/06/2023

A Novel Reward Shaping Function for Single-Player Mahjong

Mahjong is a complex game with an intractably large state space with ext...
research
05/24/2018

Meta-Gradient Reinforcement Learning

The goal of reinforcement learning algorithms is to estimate and/or opti...
research
05/11/2021

Return-based Scaling: Yet Another Normalisation Trick for Deep RL

Scaling issues are mundane yet irritating for practitioners of reinforce...
research
05/16/2022

Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

In this paper we analyze the qualitative differences between evolutionar...
research
02/24/2021

Combining Off and On-Policy Training in Model-Based Reinforcement Learning

The combination of deep learning and Monte Carlo Tree Search (MCTS) has ...
