Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

02/08/2022
by Stephen James, et al.

We propose a new policy parameterization for representing 3D rotations during reinforcement learning. In the continuous-control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that a Gaussian policy parameterization is not desirable for every environment. One case in particular is tasks that involve predicting a 3D rotation output, either in isolation or coupled with translation as part of a full 6D pose output. Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation (quaternion) prediction than a Gaussian policy parameterization in a range of reinforcement learning tasks. We evaluate BPP on the rotation Wahba problem task, as well as on a set of vision-based next-best pose robot manipulation tasks from RLBench. We hope that this paper encourages more research into developing other policy parameterizations that are better suited to particular environments, rather than always assuming a Gaussian.
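To illustrate why the Bingham distribution suits quaternion outputs, here is a minimal sketch of its unnormalized log-density in the standard textbook form, log p(q) = sum_i z_i (m_i^T q)^2 + const, where the m_i are the columns of an orthogonal matrix M and the z_i are non-positive concentrations. This is not the authors' implementation: the function name, the choice of M and z, and the test quaternions are all assumptions made for illustration, and the normalizing constant is omitted. The point it shows is that q and -q, which encode the same rotation, receive the same density, which a Gaussian over quaternion components does not guarantee.

```python
import numpy as np


def bingham_log_density_unnormalized(q, M, z):
    """Unnormalized Bingham log-density at a quaternion q (hypothetical helper).

    q: length-4 array, normalized to unit length inside the function.
    M: 4x4 orthogonal matrix whose columns are the principal directions.
    z: length-4 array of concentrations (non-positive, one entry fixed to 0).
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)      # project onto the unit sphere
    proj = M.T @ q                 # coordinates of q in the basis given by M
    return float(np.sum(z * proj ** 2))


# Illustrative parameters (assumptions): the mode is the first column of M.
M = np.eye(4)
z = np.array([0.0, -10.0, -10.0, -10.0])

q_mode = np.array([1.0, 0.0, 0.0, 0.0])
q_other = np.array([0.0, 1.0, 0.0, 0.0])

log_p_mode = bingham_log_density_unnormalized(q_mode, M, z)
log_p_antipode = bingham_log_density_unnormalized(-q_mode, M, z)
log_p_other = bingham_log_density_unnormalized(q_other, M, z)
```

Because the density depends on q only through squared projections, flipping the sign of q leaves it unchanged, so the two quaternion representations of one rotation are treated identically.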


