Reward prediction for representation learning and reward shaping

05/07/2021
by Hlynur Davíð Hlynsson, et al.

One of the fundamental challenges in reinforcement learning (RL) is data efficiency. Modern algorithms require a very large number of training samples to solve environments with high-dimensional observations, especially compared to humans. The problem is more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner by training it for reward prediction. The reward predictor learns to estimate either the raw reward signal or a smoothed version of it in environments with a single terminating goal state. We augment the training of out-of-the-box RL agents by using our reward predictor to shape the reward during policy learning. We show that using our representation to preprocess high-dimensional observations, and using the predictor for reward shaping, significantly improves Actor Critic using Kronecker-factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO) in single-goal environments with visual inputs.
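The abstract does not specify how the smoothed reward is computed or how the predictor's output enters the shaped reward. What follows is a minimal sketch of one plausible reading, under two stated assumptions. First, the smoothed target is the discounted return of a sparse, terminal-only reward, so states closer to the goal get larger targets. Second, shaping uses the standard potential-based form F = γΦ(s') − Φ(s), with the reward predictor's output playing the role of Φ. Both are swapped-in choices, not necessarily the authors' method. The function names `smoothed_reward_targets` and `shaped_reward` are hypothetical.

```python
from typing import List, Sequence


def smoothed_reward_targets(rewards: Sequence[float], gamma: float = 0.9) -> List[float]:
    """Hypothetical smoothing (an assumption, not the paper's definition):
    each step's target is the discounted sum of future rewards. With a
    single terminal reward r_T, step t receives gamma**(T - t) * r_T."""
    targets = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each target accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets


def shaped_reward(
    reward: float,
    phi_s: float,
    phi_next: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping (assumed form): r + gamma * Phi(s') - Phi(s).

    phi_s and phi_next stand in for a (hypothetical) reward predictor's
    outputs on the current and next observations. The potential of a
    terminal state is taken to be zero.
    """
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s


if __name__ == "__main__":
    # A sparse episode: zero reward until the goal is reached on the last step.
    episode_rewards = [0.0, 0.0, 0.0, 1.0]
    print(smoothed_reward_targets(episode_rewards, gamma=0.5))
    print(shaped_reward(0.0, phi_s=0.25, phi_next=0.5, gamma=1.0))
```

The potential-based form is used here because it is known to leave the set of optimal policies unchanged. A simpler additive bonus (for example, r + β·Φ(s')) would also fit the abstract's description, but it carries no such guarantee.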

08/26/2022

Visual processing in context of reinforcement learning

Although deep reinforcement learning (RL) has recently enjoyed many succ...
07/26/2019

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

Empowerment is an information-theoretic method that can be used to intri...
05/09/2018

Reward Estimation for Variance Reduction in Deep Reinforcement Learning

In reinforcement learning (RL), stochastic environments can make learnin...
06/30/2022

Denoised MDPs: Learning World Models Better Than the World Itself

The ability to separate signal from noise, and reason with clean abstrac...
10/07/2022

How to Enable Uncertainty Estimation in Proximal Policy Optimization

While deep reinforcement learning (RL) agents have showcased strong resu...
11/07/2022

Reward-Predictive Clustering

Recent advances in reinforcement-learning research have demonstrated imp...
12/25/2022

Novel Reinforcement Learning Algorithm for Suppressing Synchronization in Closed Loop Deep Brain Stimulators

Parkinson's disease is marked by altered and increased firing characteri...