SIBRE: Self Improvement Based REwards for Reinforcement Learning

04/21/2020
by Somjit Nath et al.

We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach can be used in episodic environments in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We show that SIBRE converges under the same conditions as the algorithm whose reward has been modified. The new rewards help discriminate between policies when the original rewards discriminate only weakly between them or are sparse. Experiments show that, in certain environments, this approach speeds up learning and reaches the optimal policy faster. We analyse SIBRE theoretically, and follow it up with tests on several well-known benchmark environments for reinforcement learning.
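The core idea, rewarding an agent for beating its own past performance, can be illustrated with a small wrapper around episode rewards. This is a minimal sketch, not the authors' implementation. It assumes that "past performance" is tracked as an exponential moving average of undiscounted episode returns (the baseline, with a hypothetical step size `beta`), and that the shaped signal is delivered as a single terminal reward equal to the episode return minus that baseline. The abstract does not specify the exact threshold or update rule, so `SelfImprovementShaper` and its parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelfImprovementShaper:
    """Hypothetical helper that rewards improvement over a running baseline.

    Assumption: the baseline is an exponential moving average of past
    (undiscounted) episode returns; the paper's actual rule may differ.
    """

    beta: float = 0.1      # assumed step size for the baseline update
    baseline: float = 0.0  # running estimate of the agent's past performance

    def shape_episode(self, rewards: List[float]) -> List[float]:
        """Replace an episode's rewards with one terminal improvement signal.

        Intermediate rewards become 0.0; the final reward becomes
        (episode return - baseline), where the baseline reflects only
        earlier episodes. The baseline is then moved toward this return.
        """
        if not rewards:
            # Nothing to shape; leave the baseline unchanged.
            return []
        episode_return = sum(rewards)
        shaped = [0.0] * len(rewards)
        # Positive if this episode beat past performance, negative otherwise.
        shaped[-1] = episode_return - self.baseline
        # Update the baseline after shaping, so the comparison uses past episodes only.
        self.baseline += self.beta * (episode_return - self.baseline)
        return shaped


if __name__ == "__main__":
    shaper = SelfImprovementShaper(beta=0.5)
    for episode in ([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]):
        print(shaper.shape_episode(episode), shaper.baseline)
```

With `beta=0.5`, two successful episodes yield terminal rewards of 1.0 and then 0.5 as the baseline rises, and a subsequent failed episode yields -0.75. The same environment outcome is therefore rewarded less once the agent has come to expect it, which is the kind of signal the abstract credits with helping to separate policies under sparse or weakly discriminating rewards. The shaped rewards would then be passed to whatever RL algorithm is already in use.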

research
02/08/2023

Temporal Video-Language Alignment Network for Reward Shaping in Reinforcement Learning

Designing appropriate reward functions for Reinforcement Learning (RL) a...
research
09/12/2019

Joint Inference of Reward Machines and Policies for Reinforcement Learning

Incorporating high-level knowledge is an effective way to expedite reinf...
research
10/02/2018

Reinforcement Learning with Perturbed Rewards

Recent studies have shown the vulnerability of reinforcement learning (R...
research
10/08/2020

Learning Intrinsic Symbolic Rewards in Reinforcement Learning

Learning effective policies for sparse objectives is a key challenge in ...
research
06/22/2021

Off-Policy Reinforcement Learning with Delayed Rewards

We study deep reinforcement learning (RL) algorithms with delayed reward...
research
07/19/2023

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

The main challenge in developing effective reinforcement learning (RL) p...
research
03/07/2020

Convergence of Q-value in case of Gaussian rewards

In this paper, as a study of reinforcement learning, we converge the Q f...