Smoothing Policies and Safe Policy Gradients

05/08/2019
by   Matteo Papini, et al.
0

Policy gradient algorithms are among the best candidates for the much anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm.

READ FULL TEXT
research
11/25/2021

Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

This paper studies a distributed policy gradient in collaborative multi-...
research
03/13/2018

Learning to Explore with Meta-Policy Gradient

The performance of off-policy learning, including deep Q-learning and de...
research
06/13/2018

Marginal Policy Gradients for Complex Control

Many complex domains, such as robotics control and real-time strategy (R...
research
01/28/2019

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...
research
01/10/2018

Expected Policy Gradients for Reinforcement Learning

We propose expected policy gradients (EPG), which unify stochastic polic...
research
06/30/2020

Policy Gradient Optimization of Thompson Sampling Policies

We study the use of policy gradient algorithms to optimize over a class ...
research
06/29/2023

Probabilistic Constraint for Safety-Critical Reinforcement Learning

In this paper, we consider the problem of learning safe policies for pro...

Please sign up or login with your details

Forgot password? Click here to reset