Faster Policy Learning with Continuous-Time Gradients

12/12/2020
by Samuel Ainsworth, et al.

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
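As a toy illustration of the distinction drawn above (a minimal sketch, not the authors' CTPG implementation), consider a hypothetical scalar linear system dx/dt = a x + b u with linear feedback u = -k x and running cost J = ∫ x^2 dt over [0, T]. BPTT through a fixed-step Euler discretization returns the exact gradient of the *discretized* cost, which can differ noticeably from the true continuous-time gradient when the step h is coarse. For this system the continuous-time gradient is available in closed form, so the two can be compared directly. The system, cost, parameter values, and the function names `bptt_gradient` and `continuous_gradient` are assumptions chosen for illustration; the adaptive discretization that CTPG uses is not shown.

```python
import math


def bptt_gradient(a, b, k, x0, T, h):
    """Exact gradient dJ_h/dk of the Euler-discretized cost, via BPTT.

    Hypothetical toy system: dx/dt = (a - b*k) x, cost = sum_n h * x_n^2.
    """
    c = a - b * k                       # closed-loop rate for u = -k x
    n_steps = int(round(T / h))
    # Forward pass: x_{n+1} = x_n + h * c * x_n (fixed-step explicit Euler)
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + h * c * xs[-1])
    cost = h * sum(x * x for x in xs[:-1])  # left Riemann sum of x^2
    # Backward pass: lam holds dJ_h/dx_{n+1} while processing step n
    lam = 0.0                           # x_N does not appear in the cost
    grad_c = 0.0
    for n in range(n_steps - 1, -1, -1):
        x = xs[n]
        grad_c += lam * h * x           # d x_{n+1} / d c = h * x_n
        lam = 2.0 * h * x + lam * (1.0 + h * c)
    return cost, -b * grad_c            # chain rule: dc/dk = -b


def continuous_gradient(a, b, k, x0, T):
    """Closed-form J and dJ/dk for the continuous-time system (needs a - b*k != 0)."""
    c = a - b * k
    e = math.exp(2.0 * c * T)
    cost = x0 ** 2 * (e - 1.0) / (2.0 * c)   # integral of x0^2 exp(2ct) over [0, T]
    dcost_dc = x0 ** 2 * (2.0 * c * T * e - (e - 1.0)) / (2.0 * c ** 2)
    return cost, -b * dcost_dc


if __name__ == "__main__":
    a, b, k, x0, T = 0.5, 1.0, 1.5, 1.0, 2.0
    _, g_true = continuous_gradient(a, b, k, x0, T)
    for h in (0.5, 0.1, 0.01):
        _, g_bptt = bptt_gradient(a, b, k, x0, T, h)
        print(f"h={h}: BPTT {g_bptt:.4f}, continuous {g_true:.4f}")
```

With a coarse step the BPTT value is exact for the Euler model yet biased relative to the continuous-time gradient; shrinking h reduces the gap at the price of more steps, which is the cost/accuracy trade-off the abstract attributes to fixed discretizations.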

research
04/21/2009

A method for Hedging in continuous time

We present a method for hedging in continuous time....
research
01/17/2022

Optimisation of Structured Neural Controller Based on Continuous-Time Policy Gradient

This study presents a policy optimisation framework for structured nonli...
research
01/15/2022

ChevOpt: Continuous-time State Estimation by Chebyshev Polynomial Optimization

In this paper, a new framework for continuous-time maximum a posteriori ...
research
06/22/2021

Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models

Differential equations in general and neural ODEs in particular are an e...
research
01/28/2019

Making Deep Q-learning methods robust to time discretization

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not r...
research
11/06/2021

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

In reinforcement learning, continuous time is often discretized by a tim...
research
02/02/2022

Do Differentiable Simulators Give Better Policy Gradients?

Differentiable simulators promise faster computation time for reinforcem...
