Evolved Policy Gradients

02/13/2018
by   Rein Houthooft, et al.
0

We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. Moreover, at test time, our learner optimizes only its learned loss function, and requires no explicit reward signal. In effect, the agent internalizes the reward structure, suggesting a direction toward agents that learn to solve new tasks simply from intrinsic motivation.

READ FULL TEXT
research
04/17/2018

On Learning Intrinsic Rewards for Policy Gradient Methods

In many sequential decision making tasks, it is challenging to design re...
research
02/24/2022

Learning Transferable Reward for Query Object Localization with Policy Adaptation

We propose a reinforcement learning based approach to query object local...
research
08/04/2021

Policy Gradients Incorporating the Future

Reasoning about the future – understanding how decisions in the present ...
research
04/25/2023

Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning

This paper introduces two learning schemes for distributed agents in Rei...
research
05/04/2017

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

Watching a 360 sports video requires a viewer to continuously select a v...
research
11/29/2019

VIABLE: Fast Adaptation via Backpropagating Learned Loss

In few-shot learning, typically, the loss function which is applied at t...
research
03/05/2018

Recurrent Predictive State Policy Networks

We introduce Recurrent Predictive State Policy (RPSP) networks, a recurr...

Please sign up or login with your details

Forgot password? Click here to reset