Self-Supervised Online Reward Shaping in Sparse-Reward Environments

03/08/2021
by Farzan Memarian, et al.

We propose a novel reinforcement learning framework that performs self-supervised online reward shaping, yielding faster, more sample-efficient learning in sparse-reward environments. The proposed framework alternates between updating a policy and inferring a reward function. While the policy is updated with the inferred, potentially dense reward function, the original sparse reward provides a self-supervisory signal for the reward update by serving as an ordering over the observed trajectories. The framework rests on the theory that altering the reward function does not affect the optimal policy of the original MDP as long as certain relations between the altered and the original reward are maintained. We name the proposed framework ClAssification-based REward Shaping (CaReS), since we learn the altered reward in a self-supervised manner using classifier-based reward inference. Experimental results on several sparse-reward environments demonstrate that the proposed algorithm is not only significantly more sample efficient than the state-of-the-art baseline, but also achieves sample efficiency similar to that obtained with hand-designed dense reward functions.
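The reward-update half of the alternating loop described above can be illustrated with a small sketch. This is not the CaReS implementation; it is one possible instantiation under our own assumptions: a hypothetical linear reward model r_w(s) = w · s over state features, pairwise labels derived from the sparse return (a trajectory with a higher sparse return should receive a higher summed inferred reward than one with a lower sparse return), and a pairwise logistic (Bradley-Terry-style) classifier fitted by gradient ascent. The names `dense_reward`, `ordered_pairs`, and `update_reward` are hypothetical, and the policy-update step, which would train any RL agent on r_w instead of the sparse reward, is omitted.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Trajectory = List[State]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense_reward(w: Sequence[float], s: State) -> float:
    """Hypothetical linear reward model r_w(s) = w . s (an assumption, not from the paper)."""
    return sum(wi * si for wi, si in zip(w, s))


def trajectory_score(w: Sequence[float], traj: Trajectory) -> float:
    """Sum of inferred dense rewards along a trajectory."""
    return sum(dense_reward(w, s) for s in traj)


def ordered_pairs(sparse_returns: Sequence[float]) -> List[Tuple[int, int]]:
    """Self-supervision: every pair (i, j) with sparse return G_i > G_j becomes
    one labelled example "trajectory i ranks above trajectory j". Ties give no label."""
    n = len(sparse_returns)
    return [(i, j) for i in range(n) for j in range(n)
            if sparse_returns[i] > sparse_returns[j]]


def update_reward(w: Sequence[float], trajs: Sequence[Trajectory],
                  sparse_returns: Sequence[float],
                  lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit w by gradient ascent on sum log sigma(R_w(tau_i) - R_w(tau_j))
    over the ordered pairs, i.e. logistic classification of pair order."""
    w = list(w)
    # With a linear model, R_w(tau) = w . phi(tau), where phi(tau) sums the state features.
    phis = [[sum(s[k] for s in traj) for k in range(len(w))] for traj in trajs]
    pairs = ordered_pairs(sparse_returns)
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(phis[i], phis[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # d/dw log sigma(margin) = (1 - sigma(margin)) * diff
            g = 1.0 - sigmoid(margin)
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


if __name__ == "__main__":
    # Toy 1-D chain: the single feature is the agent's position; reaching 3.0 succeeds.
    trajs = [
        [(0.0,), (1.0,), (2.0,), (3.0,)],  # success, sparse return 1
        [(0.0,), (1.0,), (0.0,), (1.0,)],  # failure, sparse return 0
        [(0.0,), (0.0,), (1.0,), (0.0,)],  # failure, sparse return 0
    ]
    w = update_reward([0.0], trajs, [1.0, 0.0, 0.0])
    # The inferred reward should now increase with progress along the chain.
    print("w =", w)
    print("r(0.0) =", dense_reward(w, (0.0,)), " r(3.0) =", dense_reward(w, (3.0,)))
```

Because only pairs with strictly different sparse returns produce labels, this sketch gets no training signal until some trajectory's sparse return differs from another's (for example, after a first success); the abstract does not say how CaReS handles that case.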
