Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization

02/22/2021
by Jyun-Li Lin, et al.

Action-constrained reinforcement learning (RL) is widely used in real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints. While existing projection-based approaches ensure zero constraint violation, they can suffer from the zero-gradient problem because the policy gradient is tightly coupled with the projection, which results in sample-inefficient training and slow convergence. To tackle this issue, we propose a learning algorithm that decouples the action constraints from the policy parameter update by leveraging state-wise Frank-Wolfe and a regression-based policy update scheme. Moreover, we show that the proposed algorithm enjoys convergence and policy improvement properties in the tabular case and generalizes the popular DDPG algorithm for action-constrained RL in the general case. Through experiments, we demonstrate that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
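The contrast between a projection step and a Frank-Wolfe step can be illustrated with a small sketch. This is not the authors' implementation: the box-shaped feasible set, the quadratic `toy_q` critic, the fixed step size, and all function names are assumptions chosen only for illustration. It shows that the gradient through a clipping projection vanishes once the raw action lies outside the box, while a Frank-Wolfe step taken from a feasible action still moves toward a better feasible action.

```python
import numpy as np

# Hypothetical toy critic: Q(a) = -||a - 0.3||^2, maximized at a = 0.3.
def toy_q(action):
    return -np.sum((action - 0.3) ** 2)

def toy_q_grad(action):
    return -2.0 * (action - 0.3)

def project_box(raw_action, low, high):
    """Euclidean projection onto the box [low, high] (assumed constraint set)."""
    return np.clip(raw_action, low, high)

def lmo_box(grad, low, high):
    """Linear maximization oracle: argmax over the box of <c, grad>."""
    return np.where(grad >= 0, high, low)

def frank_wolfe_step(action, grad, low, high, step=0.5):
    """Move a feasible action toward the LMO vertex; the result stays in the box."""
    target = lmo_box(grad, low, high)
    return action + step * (target - action)

def numeric_grad_through_projection(raw_action, low, high, eps=1e-6):
    """Central-difference gradient of Q(project(raw_action)) w.r.t. raw_action."""
    grad = np.zeros_like(raw_action)
    for i in range(raw_action.size):
        shift = np.zeros_like(raw_action)
        shift[i] = eps
        upper = toy_q(project_box(raw_action + shift, low, high))
        lower = toy_q(project_box(raw_action - shift, low, high))
        grad[i] = (upper - lower) / (2 * eps)
    return grad

low, high = np.array([-1.0]), np.array([1.0])

# Raw action outside the box: the projection is locally constant, so the
# gradient with respect to the raw action is zero.
zero_grad = numeric_grad_through_projection(np.array([2.0]), low, high)

# Frank-Wolfe step from the projected (feasible) action still improves Q.
feasible = project_box(np.array([2.0]), low, high)
improved = frank_wolfe_step(feasible, toy_q_grad(feasible), low, high)
```

In the abstract's scheme, the policy would then be fitted toward such improved actions by regression; that part is omitted here, since the abstract does not specify the loss or network details.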

research
01/06/2021

Smoothed functional-based gradient algorithms for off-policy reinforcement learning

We consider the problem of control in an off-policy reinforcement learni...
research
06/09/2020

Constrained episodic reinforcement learning in concave-convex and knapsack settings

We propose an algorithm for tabular episodic reinforcement learning with...
research
06/12/2022

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

We consider the problem of constrained Markov decision process (CMDP) in...
research
01/21/2023

Quasi-optimal Learning with Continuous Treatments

Many real-world applications of reinforcement learning (RL) require maki...
research
11/22/2020

Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

Typical reinforcement learning (RL) methods show limited applicability f...
research
06/24/2021

Density Constrained Reinforcement Learning

We study constrained reinforcement learning (CRL) from a novel perspecti...
research
01/28/2019

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...