Dynamic penalty function approach for constraints handling in reinforcement learning

12/22/2020
by Haeun Yoo, et al.

Reinforcement learning (RL) is attracting attention as an effective way to solve sequential optimization problems involving high-dimensional state/action spaces and stochastic uncertainties. Many such problems involve constraints expressed as inequalities. This study focuses on using RL to solve such constrained optimal control problems. Most RL application studies have treated inequality constraints as soft constraints by adding penalty terms for constraint violations to the reward function. However, while training neural networks to represent the value (or Q) function, a key step in RL, one can run into computational issues caused by the sharp change in the function value at the constraint boundary that results from the large penalty imposed. This difficulty during training can lead to convergence problems and ultimately to poor closed-loop performance. To address this problem, this study suggests the use of a dynamic penalty function, which gradually and systematically increases the penalty factor as the training episodes proceed. First, we examined the ability of a neural network to represent an artificial value function when uniform, linear, or dynamic penalty functions are added to prevent constraint violation. Then, an agent trained by a Deep Q Network (DQN) algorithm with the dynamic penalty function approach was compared with agents trained with constant penalty functions in a simple vehicle control problem. The results show that the dynamic penalty approach can improve the neural network's approximation accuracy, which leads to faster convergence to a solution closer to the optimal one.
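The idea of a penalty factor that grows over training can be illustrated with a minimal Python sketch. The abstract does not give the schedule the authors used, so everything here is an assumption made for illustration: the linear ramp, the function names `penalty_factor` and `penalized_reward`, the parameter `max_factor`, and the convention that the constraint is written as g(s, a) <= 0.

```python
def penalty_factor(episode, total_episodes, max_factor=100.0):
    """Hypothetical schedule: ramp the penalty factor linearly from 0
    (first episode) to max_factor (last episode). The actual schedule
    in the paper may differ."""
    if total_episodes <= 1:
        return max_factor
    # Fraction of training completed, clipped to [0, 1].
    frac = min(max(episode / (total_episodes - 1), 0.0), 1.0)
    return max_factor * frac


def penalized_reward(reward, constraint_value, episode, total_episodes,
                     max_factor=100.0):
    """Soft-constraint reward for a constraint assumed to be g(s, a) <= 0.

    Only the positive part of constraint_value counts as a violation, so
    feasible transitions keep their original reward.
    """
    violation = max(constraint_value, 0.0)
    factor = penalty_factor(episode, total_episodes, max_factor)
    return reward - factor * violation


if __name__ == "__main__":
    # The same violation (g = 0.5) is penalized more heavily later in training.
    for ep in (0, 50, 100):
        print(ep, penalized_reward(1.0, 0.5, ep, total_episodes=101))
```

Early in training the penalized reward stays close to the unconstrained one, so the value function the network must fit changes only mildly at the constraint boundary. The jump at the boundary then grows gradually toward its final size, which is the behavior the abstract credits for the improved approximation accuracy.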


research
02/05/2020

Deep RBF Value Functions for Continuous Control

A core operation in reinforcement learning (RL) is finding an action tha...
research
11/29/2022

Interpreting Primal-Dual Algorithms for Constrained MARL

Constrained multiagent reinforcement learning (C-MARL) is gaining import...
research
11/22/2020

Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

Typical reinforcement learning (RL) methods show limited applicability f...
research
06/30/2019

Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge

Time-optimal path tracking, as a significant tool for industrial robots,...
research
05/25/2021

Safe Value Functions

The relationship between safety and optimality in control is not well un...
research
05/30/2019

On Value Functions and the Agent-Environment Boundary

When function approximation is deployed in reinforcement learning (RL), ...
research
05/17/2019

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

We assume that we are given a time series of data from a dynamical syste...
