Reinforcement Learning Upside Down: Don't Predict Rewards – Just Map Them to Actions

12/05/2019
by Juergen Schmidhuber, et al.

We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input observations as commands, mapping them to actions (or action probabilities) through SL on past (possibly accidental) experience. UDRL generalizes to achieve high rewards or other goals, through input commands such as: get lots of reward within at most so much time! A separate paper [61] on first experiments with UDRL shows that even a pilot version of UDRL can outperform traditional baseline algorithms on certain challenging RL problems. We also introduce a related simple but general approach for teaching a robot to imitate humans. First videotape humans imitating the robot's current behaviors, then let the robot learn through SL to map the videos (as input commands) to these behaviors, then let it generalize and imitate videos of humans executing previously unknown behavior. This Imitate-Imitator concept may actually explain why biological evolution has resulted in parents who imitate the babbling of their babies.
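The mapping from commands to actions described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that past episodes are relabeled in hindsight: every segment of a recorded episode yields one supervised example whose input is the observation plus the return actually achieved and the number of steps it took, and whose target is the action taken. The names `relabel_episode` and `TabularCommandPolicy` are hypothetical, and a simple count table stands in for whatever learned function approximator a real system would use.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Optional, Tuple

# One recorded step: (observation, action, reward). Hypothetical layout.
Step = Tuple[Hashable, int, float]
# One supervised example: ((observation, return, horizon), action).
Example = Tuple[Tuple[Hashable, float, int], int]


def relabel_episode(episode: List[Step]) -> List[Example]:
    """Turn one past episode into supervised (command input, action) pairs.

    For every start index and every later end index, the reward summed over
    the segment and the segment length are treated as the command that the
    action at the start index happened to fulfil (assumed relabeling scheme).
    """
    examples: List[Example] = []
    for start in range(len(episode)):
        obs, action, _ = episode[start]
        achieved = 0.0
        for end in range(start, len(episode)):
            achieved += episode[end][2]
            horizon = end - start + 1
            examples.append(((obs, achieved, horizon), action))
    return examples


class TabularCommandPolicy:
    """Count-based stand-in for a supervised learner (illustration only)."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)

    def fit(self, examples: List[Example]) -> None:
        # Supervised step: remember which action followed each input.
        for key, action in examples:
            self.counts[key][action] += 1

    def act(self, obs: Hashable, desired_return: float,
            horizon: int) -> Optional[int]:
        # Return the most frequent action for this exact command,
        # or None if the command was never seen (no generalization here).
        seen = self.counts.get((obs, desired_return, horizon))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# Toy episode: action 1 in "s0" (reward 0), then action 0 in "s1" (reward 1).
episode = [("s0", 1, 0.0), ("s1", 0, 1.0)]
examples = relabel_episode(episode)
policy = TabularCommandPolicy()
policy.fit(examples)
```

Calling `policy.act("s0", 1.0, 2)` asks for the action that, in past experience, led to a return of 1.0 within 2 steps from `"s0"`. Unlike the table used here, a trained function approximator could also respond to commands it has never seen, which is the generalization the abstract refers to.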


