Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

11/06/2021
by Seohong Park, et al.
In reinforcement learning, continuous time is often discretized with a time scale δ, and the resulting performance is known to be highly sensitive to its value. In this work, we seek a δ-invariant algorithm for policy gradient (PG) methods, one that performs well regardless of the value of δ. We first identify the underlying reasons that cause PG methods to fail as δ → 0, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to achieve δ-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel δ-invariant method named Safe Action Repetition (SAR), applicable to any existing PG algorithm. SAR handles the stochasticity of environments by adaptively reacting to changes in states during action repetition. We empirically show that our method is not only δ-invariant but also robust to stochasticity, outperforming previous δ-invariant approaches on eight MuJoCo environments in both deterministic and stochastic settings. Our code is available at https://vision.snu.ac.kr/projects/sar.
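
The idea of reacting to state changes during action repetition can be illustrated with a small sketch. The abstract does not spell out the stopping rule, so the code below assumes one plausible reading: an action is repeated until the state has moved more than a given distance from the state where the repetition began. The names `ToyEnv`, `repeat_until_state_change`, and `threshold` are hypothetical and are not taken from the authors' code.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment (an assumption for illustration only).

    The state integrates the action with step size delta, plus optional
    Gaussian noise scaled by sqrt(delta).
    """

    def __init__(self, delta, noise=0.0, seed=0):
        self.delta = delta
        self.noise = noise
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        # Euler step of ds/dt = action, with additive noise.
        shock = self.noise * self.rng.gauss(0.0, 1.0) * self.delta ** 0.5
        self.state += action * self.delta + shock
        # Reward is scaled by delta so returns are comparable across time scales.
        reward = -abs(self.state - 1.0) * self.delta
        return self.state, reward


def repeat_until_state_change(env, action, threshold, max_steps=10_000):
    """Repeat `action` until the state is more than `threshold` away from
    the state at which the repetition started (assumed stopping rule)."""
    start = env.state
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        state, reward = env.step(action)
        total_reward += reward
        steps += 1
        if abs(state - start) > threshold:
            break  # the state changed enough: hand control back to the policy
    return env.state, total_reward, steps


# The same action and threshold under two different time scales.
coarse = repeat_until_state_change(ToyEnv(delta=0.125), 1.0, 0.5)
fine = repeat_until_state_change(ToyEnv(delta=0.015625), 1.0, 0.5)
```

Because the stopping condition is expressed in state space rather than as a fixed number of steps, the physical duration of one repeated action (`steps * delta`) stays roughly the same when δ shrinks, and a large random disturbance ends the repetition early. A fixed repetition count has neither property.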
