Hybrid and dynamic policy gradient optimization for bipedal robot locomotion

07/05/2021
by   Changxin Huang, et al.

Controlling a bipedal robot that is not statically stable is challenging because of the complex dynamics and the multi-criterion optimization involved. Recent works have demonstrated the effectiveness of deep reinforcement learning (DRL) for both simulated and physical bipeds. In these methods, the rewards from different criteria are normally summed to learn a single value function. However, summing may lose the dependency information between the hybrid rewards and lead to a sub-optimal policy. In this work, we propose a novel policy gradient reinforcement learning method for biped locomotion that allows the control policy to be optimized simultaneously by multiple criteria through a dynamic mechanism. The proposed method uses a multi-head critic to learn a separate value function for each component reward function, which in turn yields hybrid policy gradients. We further propose dynamic weights for the hybrid policy gradients so that the policy is optimized with different priorities. This hybrid and dynamic policy gradient (HDPG) design makes the agent learn more efficiently. We show that the proposed method outperforms summed-up-reward approaches and is able to transfer to physical robots. Results in MuJoCo further demonstrate the effectiveness and generalization of HDPG.
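The abstract does not give the update equations, so the following is only a minimal sketch of the general idea: a critic with one value head per reward component, per-component advantages, and a weighted combination of them. Because the policy gradient is linear in the advantage, weighting the per-component gradients is equivalent to weighting the per-component advantages. The function names, the one-step TD advantage, and in particular the softmax weighting rule in `dynamic_weights` are assumptions made for illustration, not the paper's mechanism.

```python
import numpy as np


def component_advantages(rewards, values, gamma=0.99):
    """One-step TD advantages, computed separately per reward component.

    rewards: array of shape (T, K), one column per reward component.
    values:  array of shape (T + 1, K), outputs of a K-head critic
             (assumed given here; no network is trained in this sketch).
    """
    return rewards + gamma * values[1:] - values[:-1]


def dynamic_weights(advantages, temperature=1.0):
    """Hypothetical weighting rule (an assumption, not from the paper):
    a softmax over the mean absolute advantage of each component, so
    components with larger critic error get higher priority."""
    score = np.abs(advantages).mean(axis=0) / temperature
    score = score - score.max()  # shift for numerical stability
    w = np.exp(score)
    return w / w.sum()


def hybrid_advantage(rewards, values, gamma=0.99, temperature=1.0):
    """Combine per-component advantages into one signal of shape (T,)
    that would multiply grad log pi(a_t | s_t) in a policy gradient."""
    adv = component_advantages(rewards, values, gamma)
    w = dynamic_weights(adv, temperature)
    return adv @ w, w
```

By contrast, a summed-reward baseline would collapse `rewards` to shape `(T,)` before the critic sees it, so the critic could no longer distinguish which criterion produced a given return.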

research
09/07/2019

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Maximum entropy deep reinforcement learning (RL) methods have been demon...
research
03/22/2019

Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie

Deep reinforcement learning (DRL) is a promising approach for developing...
research
09/09/2020

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...
research
02/22/2018

Structured Control Nets for Deep Reinforcement Learning

In recent years, Deep Reinforcement Learning has made impressive advance...
research
09/16/2022

Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning

This paper proposes a novel scoring function for the planning module of ...
research
09/04/2020

Policy Gradient Reinforcement Learning for Policy Represented by Fuzzy Rules: Application to Simulations of Speed Control of an Automobile

A method of a fusion of fuzzy inference and policy gradient reinforcemen...
research
02/14/2021

Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning

Training deep reinforcement learning agents on environments with multipl...