Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics

10/22/2019
by   Kuangen Zhang, et al.

Controlling a biped robot to walk stably is challenging because of its nonlinear and hybrid dynamics. Reinforcement learning can address these issues by directly mapping observed states to actions that maximize the cumulative reward. However, local minima caused by unsuitable rewards, together with overestimation of the cumulative reward, impede that maximization. To increase the cumulative reward, this paper designs a gait reward based on walking principles, which compensates for the local minima associated with unnatural motions. In addition, an Adversarial Twin Delayed Deep Deterministic (ATD3) policy gradient algorithm with a recurrent neural network (RNN) is proposed to further boost the cumulative reward by mitigating its overestimation. Experimental results in the Roboschool Walker2d and Webots Atlas simulators indicate that the test rewards increase by 23.50 [...] increase by 15.96 [...] that the ATD3_RNN decreases the error of estimating cumulative reward from 19.86 [...] the biped robot trained by the gait reward and ATD3_RNN increases by over 69.23 [...] cumulative reward and teach biped robots to walk better.
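
To make the overestimation issue concrete, the sketch below shows how a critic's estimate can be compared against the discounted cumulative reward actually collected, and how the standard Twin Delayed DDPG (TD3) target limits overestimation by bootstrapping from the smaller of two critic estimates. This is a minimal illustration under assumptions, not the paper's ATD3 algorithm: the adversarial critics and the RNN are not reproduced, and the function names, the discount factor, and the relative-error metric are hypothetical choices made for this example.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def clipped_double_q_target(reward: float, q1_next: float, q2_next: float,
                            done: bool, gamma: float = 0.99) -> float:
    """Standard TD3 target: bootstrap from the smaller of two critic estimates."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def mean_relative_error(estimates: Sequence[float], returns: Sequence[float]) -> float:
    """Mean of |Q - G| / |G| (hypothetical metric; assumes every G is non-zero)."""
    errors = [abs(q - g) / abs(g) for q, g in zip(estimates, returns)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    actual = discounted_returns(rewards, gamma=0.5)
    # Made-up critic outputs that overestimate the first step only.
    critic_estimates = [2.0, 1.5, 1.0]
    print("actual returns:", actual)
    print("relative error:", mean_relative_error(critic_estimates, actual))
    print("TD3 target:", clipped_double_q_target(1.0, 10.0, 8.0, done=False, gamma=0.5))
```

Taking the minimum of the two critics biases the target downward, which is the mechanism TD3 relies on to counter the upward bias of a single learned critic.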

research
07/11/2019

Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

Cooperative game is a critical research area in multi-agent reinforcemen...
research
07/16/2018

Bipedal Walking Robot using Deep Deterministic Policy Gradient

Machine learning algorithms have found several applications in the field...
research
06/24/2019

Using Human Ratings for Feedback Control: A Supervised Learning Approach with Application to Rehabilitation Robotics

This paper presents a method for tailoring a parametric controller based...
research
06/23/2020

Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

We introduce a novel framework to account for sensitivity to rewards unc...
research
05/20/2022

Adversarial Body Shape Search for Legged Robots

We propose an evolutionary computation method for an adversarial attack ...
research
02/12/2018

ReinforceWalk: Learning to Walk in Graph with Monte Carlo Tree Search

Learning to walk over a graph towards a target node for a given input qu...
research
12/20/2022

Settling the Reward Hypothesis

The reward hypothesis posits that, "all of what we mean by goals and pur...
