Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

07/19/2021
by   Balázs Varga, et al.
0

This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning with the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the help of the policy gradient theorem and the neural tangent kernel. Then, this enables us the evaluation of the policy at arbitrary states too. In the same spirit, learning can be guided, ensuring safety via augmenting episode batches with states where the desired action probabilities are prescribed. Finally, exogenous discounted sum of future rewards (returns) can be computed at these specific state-action pairs such that the policy network satisfies constraints. Computing the returns is based on solving a system of linear equations (equality constraints) or a constrained quadratic program (inequality constraints). Simulation results suggest that adding constraints (external information) to the learning can improve learning in terms of speed and safety reasonably if constraints are appropriately selected. The efficiency of the constrained learning was demonstrated with a shallow and wide ReLU network in the Cartpole and Lunar Lander OpenAI gym environments. The main novelty of the paper is giving a practical use of the neural tangent kernel in reinforcement learning.

READ FULL TEXT
research
12/27/2021

Safe Reinforcement Learning with Chance-constrained Model Predictive Control

Real-world reinforcement learning (RL) problems often demand that agents...
research
06/20/2017

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

We show how an action-dependent baseline can be used by the policy gradi...
research
10/26/2019

Convergent Policy Optimization for Safe Reinforcement Learning

We study the safe reinforcement learning problem with nonlinear function...
research
01/28/2019

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...
research
04/02/2020

Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

For all its successes, Reinforcement Learning (RL) still struggles to de...
research
07/11/2023

Safe Reinforcement Learning for Strategic Bidding of Virtual Power Plants in Day-Ahead Markets

This paper presents a novel safe reinforcement learning algorithm for st...
research
07/25/2023

Submodular Reinforcement Learning

In reinforcement learning (RL), rewards of states are typically consider...

Please sign up or login with your details

Forgot password? Click here to reset