
Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge
Time-optimal path tracking, a significant tool for industrial robots, has attracted the attention of numerous researchers. Most time-optimal path tracking formulations assume conservative actuator torque constraints, which ignores the motor characteristic: the actuator torque limits are in fact velocity-dependent, and the relationship between torque and velocity is piecewise linear. Because accounting for the motor characteristic makes the problem harder to solve, this study proposes an improved Q-learning algorithm for robotic time-optimal path tracking that uses prior knowledge. After considering the limitations of the standard Q-learning algorithm, an improved action-value function is proposed to raise the convergence rate. The proposed algorithm follows a reward-and-penalty scheme, rewarding actions that satisfy the constraint conditions and penalizing actions that break them, to finally obtain a time-optimal trajectory that satisfies the constraints. The effectiveness of the algorithm is verified by experiments.
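To illustrate the reward-and-penalty idea described in the abstract, the following is a minimal, self-contained Q-learning sketch (not the authors' algorithm). It discretizes the state as a (path sample, velocity level) pair, lets actions decelerate/hold/accelerate, rewards fast motion with a step reward proportional to negative traversal time, and applies a large penalty when an action violates a hypothetical velocity-dependent actuator limit (a crude stand-in for the piecewise-linear torque-speed characteristic). All names, grid sizes, and the `feasible` constraint are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_S, N_V, N_A = 30, 10, 3          # path samples, velocity levels, actions
ALPHA, GAMMA, EPS = 0.2, 0.99, 0.1  # learning rate, discount, exploration
PENALTY = -100.0                    # penalty for breaking the torque constraint

def feasible(v, dv):
    """Hypothetical velocity-dependent actuator limit: at high speed the
    motor can no longer accelerate (stand-in for the piecewise-linear
    torque-velocity characteristic; not the paper's actual constraint)."""
    return not (dv > 0 and v >= N_V - 2)

Q = np.zeros((N_S, N_V, N_A))

for episode in range(3000):
    s, v = 0, 0
    while s < N_S - 1:
        # epsilon-greedy action selection
        a = int(rng.integers(N_A)) if rng.random() < EPS else int(np.argmax(Q[s, v]))
        dv = a - 1                              # decelerate / hold / accelerate
        if feasible(v, dv):
            v2 = int(np.clip(v + dv, 0, N_V - 1))
            r = -1.0 / (v2 + 1)                 # step time ~ 1/velocity: faster is better
        else:
            v2, r = v, PENALTY                  # penalize constraint-breaking action
        s2 = s + 1
        # standard Q-learning update
        Q[s, v, a] += ALPHA * (r + GAMMA * Q[s2, v2].max() - Q[s, v, a])
        s, v = s2, v2

# Greedy rollout: the learned velocity profile along the path
v, profile = 0, [0]
for s in range(N_S - 1):
    dv = int(np.argmax(Q[s, v])) - 1
    if feasible(v, dv):
        v = int(np.clip(v + dv, 0, N_V - 1))
    profile.append(v)
print(profile)
```

Since infeasible actions accumulate the large penalty in their Q-values, the greedy policy learns to accelerate only while the (assumed) velocity-dependent limit allows it, which is the essence of the reward-and-penalty constraint handling the abstract describes.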