Dynamic penalty function approach for constraints handling in reinforcement learning
Reinforcement learning (RL) is attracting attention as an effective way to solve sequential optimization problems involving high-dimensional state/action spaces and stochastic uncertainties. Many such problems involve constraints expressed as inequalities. This study focuses on using RL to solve such constrained optimal control problems. Most RL application studies have treated inequality constraints as soft constraints by adding penalty terms for constraint violations to the reward function. However, while training neural networks to represent the value (or Q) function, a key step in RL, one can run into computational issues caused by the sharp change in the function value at the constraint boundary due to the large penalty imposed. This difficulty during training can lead to convergence problems and ultimately poor closed-loop performance. To address this problem, this study suggests the use of a dynamic penalty function, which gradually and systematically increases the penalty factor as the training episodes proceed. First, we examined the ability of a neural network to represent an artificial value function when uniform, linear, or dynamic penalty functions are added to prevent constraint violation. An agent trained by a Deep Q-Network (DQN) algorithm with the dynamic penalty function approach was then compared with agents trained with constant penalty functions on a simple vehicle control problem. The results show that the dynamic penalty approach improves the neural network's approximation accuracy and thereby brings faster convergence to a solution closer to the optimum.
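To make the idea concrete, here is a minimal sketch of dynamic penalty shaping as described above. The function names, the geometric growth schedule, and all parameter values (`rho_init`, `growth`, `rho_max`) are illustrative assumptions, not the paper's actual implementation; the paper specifies only that the penalty factor increases gradually and systematically with the training episodes.

```python
def dynamic_penalty_factor(episode, rho_init=1.0, growth=1.01, rho_max=1000.0):
    """Penalty factor that grows geometrically with the episode index.

    Early episodes see a small factor, so the value function the network
    must fit is smooth near the constraint boundary; later episodes see a
    large factor, so the learned policy is pushed to respect the constraint.
    Capped at rho_max to keep rewards bounded. (Schedule and constants are
    hypothetical choices for illustration.)
    """
    return min(rho_init * growth ** episode, rho_max)


def shaped_reward(base_reward, constraint_violation, episode):
    """Reward with a soft-constraint penalty scaled by the dynamic factor.

    constraint_violation is the amount by which an inequality constraint
    is exceeded (<= 0 means feasible, so no penalty is applied).
    """
    rho = dynamic_penalty_factor(episode)
    return base_reward - rho * max(0.0, constraint_violation)
```

In a DQN training loop, `shaped_reward` would replace the raw environment reward when storing transitions, so the same violation is penalized more heavily as training proceeds.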