
Hindsight policy gradients

11/16/2017

by Paulo Rauber, et al.

Goal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary subgoals, enabling higher-level planning. While trying to achieve a specific goal, an agent may also be able to exploit information about the degree to which it has achieved alternative goals. Reinforcement learning agents have only recently been endowed with such capacity for hindsight, which is highly valuable in environments with sparse rewards. In this paper, we show how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms. Our preliminary experiments suggest that hindsight may increase the sample efficiency of policy gradient methods.
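The abstract describes hindsight only in words. As a rough illustration, the sketch below shows the general idea with a generic per-decision importance-sampling estimator. It is not the estimator derived in the paper. It assumes a tabular softmax policy π(a | s, g) and a sparse reward r(s, g) that is nonzero only when the agent reaches goal g. An episode collected while pursuing `behavior_goal` is reweighted by the likelihood ratio of its actions under an alternative `goal`, so the same trajectory yields a REINFORCE-style gradient estimate for that goal as well. The names `policy`, `hindsight_gradient`, and the `theta[(state, goal)]` parameter layout are hypothetical choices made for this example.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical tabular parameterization (an assumption, not from the paper):
# theta[(state, goal)] holds action logits, and pi(a | s, g) = softmax(logits)[a].
Params = Dict[Tuple[int, int], List[float]]


def policy(theta: Params, state: int, goal: int) -> List[float]:
    """Softmax action probabilities of the goal-conditional tabular policy."""
    logits = theta[(state, goal)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def hindsight_gradient(
    theta: Params,
    states: List[int],       # S_1, ..., S_T (length T)
    actions: List[int],      # A_1, ..., A_{T-1} (length T - 1)
    behavior_goal: int,      # goal the episode was actually collected for
    goal: int,               # alternative goal evaluated in hindsight
    reward: Callable[[int, int], float],  # r(s, g), e.g. 1.0 if s == g
) -> Params:
    """Per-decision importance-weighted REINFORCE estimate for `goal`."""
    T = len(states)
    assert len(actions) == T - 1
    # weights[j] = prod_{k < j} pi(A_k | S_k, goal) / pi(A_k | S_k, behavior_goal):
    # the reward at state index j depends only on the first j actions.
    weights = [1.0]
    for s, a in zip(states[:-1], actions):
        ratio = policy(theta, s, goal)[a] / policy(theta, s, behavior_goal)[a]
        weights.append(weights[-1] * ratio)

    grad: Params = {key: [0.0] * len(v) for key, v in theta.items()}
    for t in range(T - 1):
        # Importance-weighted sum of future rewards for the alternative goal.
        ret = sum(weights[j] * reward(states[j], goal) for j in range(t + 1, T))
        if ret == 0.0:
            continue  # sparse reward: nothing achieved after step t
        s, a = states[t], actions[t]
        probs = policy(theta, s, goal)
        # Gradient of log-softmax w.r.t. the logits is one_hot(a) - probs.
        row = grad[(s, goal)]
        for b, p in enumerate(probs):
            row[b] += ((1.0 if b == a else 0.0) - p) * ret
    return grad
```

For example, an episode that visits states 0, 1, 2 while pursuing goal 2 still produces a nonzero gradient for goal 1, because state 1 was reached along the way. When `goal == behavior_goal`, every ratio equals 1 and the estimate reduces to ordinary REINFORCE with reward-to-go. Products of ratios can have high variance on long episodes, so practical variants may need weighted or truncated importance weights.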

Paulo Rauber, Filipe Mutz, Juergen Schmidhuber

Related research

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Rui Zhao, et al. ∙ 05/21/2019
In Multi-Goal Reinforcement Learning, an agent learns to achieve multipl...

On Policy Gradients
Mattis Manfred Kämmerer, et al. ∙ 11/12/2019
The goal of policy gradient approaches is to find a policy in a given cl...

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning
Sihan Zeng, et al. ∙ 06/08/2020
We develop a mathematical framework for solving multi-task reinforcement...

Independent Policy Gradient Methods for Competitive Reinforcement Learning
Constantinos Daskalakis, et al. ∙ 01/11/2021
We obtain global, non-asymptotic convergence guarantees for independent ...

Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods
Niklas Hopner, et al. ∙ 01/28/2022
Enabling reinforcement learning (RL) agents to leverage a knowledge base...

Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets
Dimitrios C. Gklezakos, et al. ∙ 07/07/2022
Inspired by Gibson's notion of object affordances in human vision, we as...

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Shauharda Khadka, et al. ∙ 06/18/2019
A key challenge for Multiagent RL (Reinforcement Learning) is the design...