On Learning Intrinsic Rewards for Policy Gradient Methods

04/17/2018
by Zeyu Zheng, et al.

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that the agent designer considers good. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.
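To make the phrase "additive intrinsic rewards" concrete, here is a minimal sketch of how an augmented agent's training signal could be formed. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the intrinsic rewards are learned, so the intrinsic values below are fixed placeholders, and the mixing weight `lam` and both function names are hypothetical.

```python
from typing import List, Sequence


def mixed_rewards(
    extrinsic: Sequence[float],
    intrinsic: Sequence[float],
    lam: float = 0.1,
) -> List[float]:
    """Add a weighted intrinsic reward to the extrinsic reward at each step.

    `lam` is a hypothetical mixing weight, not a value taken from the paper.
    """
    if len(extrinsic) != len(intrinsic):
        raise ValueError("reward sequences must have the same length")
    return [r_ex + lam * r_in for r_ex, r_in in zip(extrinsic, intrinsic)]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return from every time step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy episode: the task (extrinsic) reward is sparse, the placeholder
# intrinsic reward is dense.
r_ex = [0.0, 0.0, 1.0]
r_in = [0.5, 0.5, 0.0]  # placeholder values; in the paper these are learned

policy_signal = discounted_returns(mixed_rewards(r_ex, r_in, lam=0.5), gamma=1.0)
task_signal = discounted_returns(r_ex, gamma=1.0)
print(policy_signal)
print(task_signal)
```

In this sketch the policy learner would train on `policy_signal`, while `task_signal` remains the measure of how well the agent does on the task itself, which mirrors the framework's definition of a good intrinsic reward as one that improves extrinsic performance.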


research
12/11/2019

What Can Learned Intrinsic Rewards Capture?

Reinforcement learning agents can include different components, such as ...
research
02/13/2018

Evolved Policy Gradients

We propose a meta-learning approach for learning gradient-based reinforc...
research
12/21/2020

Evaluating Agents without Rewards

Reinforcement learning has enabled agents to solve challenging tasks in ...
research
06/19/2019

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

Learning about many things can provide numerous benefits to a reinforcem...
research
06/10/2018

Deep Curiosity Loops in Social Environments

Inspired by infants' intrinsic motivation to learn, which values informa...
research
07/11/2019

Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle

Many real-world human behaviors can be characterized as a sequential dec...
research
06/12/2019

Fast Task Inference with Variational Intrinsic Successor Features

It has been established that diverse behaviors spanning the controllable...
