Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

06/12/2022
by   Souradip Chakraborty, et al.

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse rewards are common in continuous control robotics tasks such as manipulation and navigation, and they make the learning problem hard because value functions are difficult to estimate over the state space. This typically demands either reward shaping or expert demonstrations for the sparse-reward environment. However, obtaining high-quality demonstrations is expensive and sometimes impossible. We propose a heavy-tailed policy parametrization along with a modified momentum-based policy gradient tracking scheme to induce stable exploratory behavior in the algorithm. The proposed algorithm does not require access to expert demonstrations. We test the performance of HT-SPG on various benchmark continuous control tasks with sparse rewards, such as 1D Mario, Pathological Mountain Car, and Sparse Pendulum in OpenAI Gym, and Sparse MuJoCo environments (Hopper-v2). We show consistent performance improvement across all tasks in terms of higher average cumulative reward. HT-SPG also converges faster while using fewer samples, which highlights the sample efficiency of the proposed algorithm.
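
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Cauchy distribution as one example of a heavy-tailed policy (the abstract does not say which distribution is used) and substitutes a plain exponential moving average for the paper's modified momentum-based gradient tracking scheme. All function names and parameter values here are hypothetical.

```python
import math
import random


def sample_cauchy(loc, scale, rng):
    """Draw one action from a Cauchy(loc, scale) policy by inverse-CDF sampling."""
    u = rng.random()  # uniform in [0, 1)
    return loc + scale * math.tan(math.pi * (u - 0.5))


def cauchy_score_loc(action, loc, scale):
    """d/d(loc) of the log Cauchy density; its magnitude never exceeds 1/scale."""
    diff = action - loc
    return 2.0 * diff / (scale * scale + diff * diff)


def gaussian_score_loc(action, loc, scale):
    """d/d(loc) of the log Gaussian density; grows linearly with the action."""
    return (action - loc) / (scale * scale)


def tail_fraction(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for s in samples if abs(s) > threshold) / len(samples)


def momentum_track(prev_direction, grad, beta=0.9):
    """Assumed stand-in for gradient tracking: an exponential moving average."""
    return beta * prev_direction + (1.0 - beta) * grad


rng = random.Random(0)  # fixed seed so the comparison is repeatable
cauchy_actions = [sample_cauchy(0.0, 1.0, rng) for _ in range(10_000)]
gauss_actions = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Heavy tails: far-from-mean actions are sampled much more often.
cauchy_tail = tail_fraction(cauchy_actions, 3.0)
gauss_tail = tail_fraction(gauss_actions, 3.0)

# Smooth the per-sample score terms into a single update direction.
direction = 0.0
for a in cauchy_actions[:100]:
    direction = momentum_track(direction, cauchy_score_loc(a, 0.0, 1.0))
```

In this sketch the Cauchy policy proposes actions far from its location parameter much more often than a Gaussian with the same scale, which is the exploratory effect a heavy-tailed parametrization is meant to provide. The Cauchy score with respect to the location stays bounded even for extreme actions, and the moving average further damps step-to-step noise in the update direction.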


research
04/22/2020

Policy Gradient from Demonstration and Curiosity

With reinforcement learning, an agent could learn complex behaviors from...
research
07/08/2022

HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm

We present a novel approach to improve the performance of deep reinforce...
research
02/20/2021

On Proximal Policy Optimization's Heavy-tailed Gradients

Modern policy gradient algorithms, notably Proximal Policy Optimization ...
research
01/28/2022

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

We focus on parameterized policy search for reinforcement learning over ...
research
07/27/2017

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

We propose a general and model-free approach for Reinforcement Learning ...
research
02/14/2022

Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration

Learning from Demonstration (LfD) approaches empower end-users to teach ...
research
07/20/2023

Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment

We leverage the fast physics simulator, MuJoCo to run tasks in a continu...
