Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics
Controlling a biped robot to walk stably is a challenging task given its nonlinearity and hybrid dynamics. Reinforcement learning can address these issues by directly mapping the observed states to the optimal actions that maximize the cumulative reward. However, the local minima caused by unsuitable rewards and the overestimation of the cumulative reward impede its maximization. To increase the cumulative reward, this paper designs a gait reward based on walking principles, which compensates for the local minima caused by unnatural motions. In addition, an Adversarial Twin Delayed Deep Deterministic (ATD3) policy gradient algorithm with a recurrent neural network (RNN) is proposed to further boost the cumulative reward by mitigating its overestimation. Experimental results in the Roboschool Walker2d and Webots Atlas simulators indicate that the test rewards increase by 23.50% with the designed gait reward and by 15.96% with ATD3_RNN, and that ATD3_RNN decreases the error of estimating the cumulative reward from 19.86%. Moreover, the distance walked by the biped robot trained with the gait reward and ATD3_RNN increases by over 69.23%. Together, the gait reward and ATD3_RNN boost the cumulative reward and teach biped robots to walk better.
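The abstract does not give the gait-reward formula, but a purely illustrative sketch of this style of reward shaping is shown below: gait-principle terms (posture, foot alternation, effort) are added to the usual forward-progress reward so that unnatural motions stop being local minima. Every function name, term, and coefficient here is hypothetical, not the paper's actual design.

```python
import numpy as np

def shaped_gait_reward(forward_velocity, torso_pitch, left_foot_z, right_foot_z,
                       joint_torques, alive_bonus=1.0):
    """Illustrative shaped reward; terms and weights are hypothetical,
    standing in for a gait reward derived from walking principles."""
    progress = 1.0 * forward_velocity                # reward forward walking
    posture = -0.5 * abs(torso_pitch)                # penalize leaning (unnatural posture)
    # Encourage alternating swing/stance: penalize having both feet airborne
    gait = -0.2 * float(left_foot_z > 0.05 and right_foot_z > 0.05)
    effort = -1e-3 * np.sum(np.square(joint_torques))  # discourage jerky, high-torque actions
    return alive_bonus + progress + posture + gait + effort
```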
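ATD3 builds on TD3 (Twin Delayed Deep Deterministic policy gradient), whose twin critics curb overestimation of the cumulative reward by bootstrapping from the smaller of two Q estimates. The minimal PyTorch-style sketch below shows that standard TD3 target computation, not the paper's adversarial critic update or RNN architecture, whose exact forms the abstract does not specify; the function and parameter names are assumptions.

```python
import torch

def td3_target(critic1_t, critic2_t, actor_t, next_obs, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target from TD3, the baseline that ATD3 extends.
    Taking the min over twin target critics mitigates value overestimation."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise
        next_action = actor_t(next_obs)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Twin target critics: bootstrap from the smaller Q estimate
        q1 = critic1_t(next_obs, next_action)
        q2 = critic2_t(next_obs, next_action)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```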