Learning Sample-Efficient Target Reaching for Mobile Robots

03/05/2018
by Arbaaz Khan, et al.

In this paper, we propose a novel architecture and a self-supervised policy gradient algorithm that employs unsupervised auxiliary tasks to enable a mobile robot to learn how to navigate to a given goal. The dependency on global information is eliminated by providing the robot with only sparse range-finder measurements. The partially observable planning problem is addressed by splitting it into a hierarchical process: we use convolutional networks to plan locally, and a differentiable memory to provide information about past time steps in the trajectory. Combined in our network architecture, these modules produce globally consistent plans. The sparse reward problem is mitigated by our modified policy gradient algorithm, which models the robot's uncertainty with unsupervised tasks to force exploration. Together, the proposed architecture and the modified policy gradient algorithm allow our robot to learn to reach the goal in a sample-efficient manner, orders of magnitude faster than the current state-of-the-art policy gradient algorithm. Simulation and experimental results are provided to validate the proposed approach.
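The abstract does not spell out the modified policy gradient update. As a rough illustration only, the Python sketch below shows the generic pattern of combining a REINFORCE-style loss driven by a sparse, end-of-episode reward with a weighted auxiliary-task loss. It is not the authors' algorithm: the function names, the discount factor gamma, and the weight aux_weight are hypothetical, and the per-step auxiliary losses are assumed to be computed elsewhere.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, iterating backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_loss(
    log_probs: Sequence[float],
    rewards: Sequence[float],
    aux_losses: Sequence[float],
    gamma: float = 0.99,
    aux_weight: float = 0.1,
) -> float:
    """Hypothetical objective: REINFORCE surrogate loss plus a weighted auxiliary loss.

    log_probs[t] is log pi(a_t | s_t) for the action actually taken at step t.
    aux_losses holds precomputed auxiliary-task losses (an assumption of this sketch).
    """
    if len(log_probs) != len(rewards):
        raise ValueError("log_probs and rewards must have the same length")
    returns = discounted_returns(rewards, gamma)
    # Minimizing -sum(log_prob * return) raises the probability of well-rewarded actions.
    pg_loss = -sum(lp * g for lp, g in zip(log_probs, returns))
    # Mean auxiliary loss; an empty list contributes nothing.
    aux_loss = sum(aux_losses) / len(aux_losses) if aux_losses else 0.0
    return pg_loss + aux_weight * aux_loss
```

With a reward only at the final step (rewards [0, 0, 1] and gamma 0.5), the returns are [0.25, 0.5, 1.0], so earlier actions receive only a discounted share of the sparse signal. In a real implementation built on an autodiff framework, the auxiliary term would supply a learning signal even in episodes where the reward is zero.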
