Learning Intrinsic Symbolic Rewards in Reinforcement Learning

10/08/2020
by Hassam Sheikh, et al.

Learning effective policies for sparse objectives is a key challenge in Deep Reinforcement Learning (RL). A common approach is to design task-related dense rewards to improve task learnability. While such rewards are easily interpreted, they rely on heuristics and domain expertise. Alternative approaches that train neural networks to discover dense surrogate rewards avoid heuristics, but they are high-dimensional, black-box solutions that offer little interpretability. In this paper, we present a method that discovers dense rewards in the form of low-dimensional symbolic trees, which makes them more tractable for analysis. The trees use simple functional operators to map an agent's observations to a scalar reward, which then supervises the policy gradient learning of a neural network policy. We test our method on continuous action spaces in MuJoCo and discrete action spaces in Atari and Pygame environments. We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks. Notably, we significantly outperform a widely used, contemporary neural-network-based reward-discovery algorithm in all environments considered.
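The core mechanism in the abstract, a symbolic tree that maps an observation to a scalar reward which then feeds policy-gradient learning, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the operator set (`add`, `sub`, `mul`, `max`, `min`, `neg`, `abs`, `tanh`), the node classes, and the helpers `size`, `show`, and `discounted_returns` are all hypothetical, and the tree is written by hand rather than discovered by search. The return computation is the standard REINFORCE-style discounted reward-to-go.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Sequence, Union

# Hypothetical operator set; the paper's actual operators may differ.
BINARY_OPS: dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}
UNARY_OPS: dict[str, Callable[[float], float]] = {
    "neg": lambda a: -a,
    "abs": abs,
    "tanh": math.tanh,
}


@dataclass
class Obs:
    """Leaf node: reads one component of the observation vector."""
    index: int

    def __call__(self, obs: Sequence[float]) -> float:
        return float(obs[self.index])


@dataclass
class Const:
    """Leaf node: a fixed numeric constant."""
    value: float

    def __call__(self, obs: Sequence[float]) -> float:
        return self.value


@dataclass
class Unary:
    """Internal node applying a one-argument operator to its child."""
    op: str
    child: Node

    def __call__(self, obs: Sequence[float]) -> float:
        return UNARY_OPS[self.op](self.child(obs))


@dataclass
class Binary:
    """Internal node applying a two-argument operator to its children."""
    op: str
    left: Node
    right: Node

    def __call__(self, obs: Sequence[float]) -> float:
        return BINARY_OPS[self.op](self.left(obs), self.right(obs))


Node = Union[Obs, Const, Unary, Binary]


def size(node: Node) -> int:
    """Number of nodes in the tree; small trees stay human-readable."""
    if isinstance(node, (Obs, Const)):
        return 1
    if isinstance(node, Unary):
        return 1 + size(node.child)
    return 1 + size(node.left) + size(node.right)


def show(node: Node) -> str:
    """Render the tree as a readable formula."""
    if isinstance(node, Obs):
        return f"o[{node.index}]"
    if isinstance(node, Const):
        return repr(node.value)
    if isinstance(node, Unary):
        return f"{node.op}({show(node.child)})"
    return f"{node.op}({show(node.left)}, {show(node.right)})"


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> list[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hand-written example tree: tanh(o[0]) - 0.5 * |o[1]|
reward_tree = Binary(
    "sub",
    Unary("tanh", Obs(0)),
    Binary("mul", Const(0.5), Unary("abs", Obs(1))),
)
trajectory = [[0.0, 2.0], [0.0, -4.0]]
rewards = [reward_tree(o) for o in trajectory]    # -> [-1.0, -2.0]
returns = discounted_returns(rewards, gamma=0.5)  # -> [-2.0, -2.0]
```

Because the whole reward here is a seven-node expression, `show(reward_tree)` prints it as `sub(tanh(o[0]), mul(0.5, abs(o[1])))`, a formula a reader can inspect directly. That is the interpretability advantage the abstract claims for low-dimensional symbolic trees over neural-network reward models. In a full policy-gradient loop, the returns would weight the log-probabilities of the actions taken.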




04/21/2020

SIBRE: Self Improvement Based REwards for Reinforcement Learning

We propose a generic reward shaping approach for improving rate of conve...
11/28/2016

Improving Policy Gradient by Exploring Under-appreciated Rewards

This paper presents a novel form of policy gradient for model-free reinf...
09/30/2021

Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

Recent advances in reinforcement learning (RL) have led to a growing int...
05/31/2016

VIME: Variational Information Maximizing Exploration

Scalable and effective exploration remains a key challenge in reinforcem...
10/23/2020

Learning Guidance Rewards with Trajectory-space Smoothing

Long-term temporal credit assignment is an important challenge in deep r...
06/16/2021

Unbiased Methods for Multi-Goal Reinforcement Learning

In multi-goal reinforcement learning (RL) settings, the reward for each ...
09/16/2021

Interpretable Local Tree Surrogate Policies

High-dimensional policies, such as those represented by neural networks,...