Learning walk and trot from the same objective using different types of exploration

04/28/2019
by Zinan Liu, et al.

In quadruped gait learning, policy search methods that scale to high-dimensional continuous action spaces are commonly used. Most approaches need prior knowledge of the gaits to limit the highly non-convex search space of the policies. In this work, we propose a new approach that encodes the symmetry properties of the desired gaits in the initial covariance of the Gaussian search distribution, allowing for strategic exploration. Using episode-based likelihood ratio policy gradient and relative entropy policy search, we learned the walk and trot gaits on a simulated quadruped. Comparing these gaits to random gaits learned from an initial diagonal covariance matrix, we show that the performance can be significantly enhanced.
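
To make the idea of putting gait symmetry into the initial covariance concrete, here is a minimal sketch. It is not the authors' encoding, which the abstract does not specify. It assumes one hypothetical policy parameter per leg (LF, RF, LH, RH), a hypothetical sign vector that marks which legs should move together, and hypothetical values for `sigma` and `rho`. For a trot, diagonal leg pairs get the same sign, so their parameters start out positively correlated while the two pairs are negatively correlated with each other.

```python
import numpy as np

# Hypothetical leg ordering: left-front, right-front, left-hind, right-hind.
LEGS = ("LF", "RF", "LH", "RH")


def symmetric_initial_covariance(signs, sigma=1.0, rho=0.9):
    """Build an initial covariance that couples legs according to `signs`.

    Legs with equal sign get correlation +rho, legs with opposite sign get
    correlation -rho. The matrix (1 - rho) * I + rho * s s^T has eigenvalues
    1 - rho and 1 + (n - 1) * rho, so it is positive definite for 0 <= rho < 1.
    """
    s = np.asarray(signs, dtype=float)
    corr = (1.0 - rho) * np.eye(len(s)) + rho * np.outer(s, s)
    return sigma**2 * corr


# Assumed trot pattern: diagonal pairs (LF, RH) and (RF, LH) move together.
trot_signs = [1, -1, -1, 1]
cov_trot = symmetric_initial_covariance(trot_signs)

# Baseline from the abstract: an uninformed diagonal initial covariance.
cov_diag = np.eye(len(LEGS))

# Draw a few exploration samples from the structured search distribution.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(LEGS)), cov_trot, size=5)
```

Samples drawn from `cov_trot` tend to perturb diagonal legs in the same direction, whereas samples from `cov_diag` perturb each leg independently. Under these assumptions, that difference is the sense in which the search distribution explores strategically rather than at random.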


research
02/10/2019

Diverse Exploration via Conjugate Policies for Policy Gradient Methods

We address the challenge of effective exploration while maintaining good...
research
07/09/2020

A Policy Gradient Method for Task-Agnostic Exploration

In a reward-free environment, what is a suitable intrinsic objective for...
research
11/18/2020

Cautious Bayesian Optimization for Efficient and Scalable Policy Search

Sample efficiency is one of the key factors when applying policy search ...
research
12/11/2019

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

The policy gradient theorem is defined based on an objective with respec...
research
06/27/2019

Quantile Regression Deep Reinforcement Learning

Policy gradient based reinforcement learning algorithms coupled with neu...
research
07/19/2021

Improving exploration in policy gradient search: Application to symbolic optimization

Many machine learning strategies designed to automate mathematical tasks...
research
01/28/2022

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

We focus on parameterized policy search for reinforcement learning over ...
