Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

09/07/2019
by Wenjie Shi, et al.

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Existing policy gradient methods employ a single actor-critic pair and achieve neither satisfactory tracking accuracy nor stable learning. In contrast, our proposed algorithm achieves high tracking accuracy and stable learning through a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error selects the worst critic to be updated at each time step. Then, to compute the loss function of the chosen critic with a more accurate target value, Pseudo Q-learning, which replaces the greedy policy in Q-learning with a sub-greedy policy, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce overestimation of the action-value function and to stabilize learning. For the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified by applying it to an AUV with two different reference trajectories. The results demonstrate the high tracking accuracy and stable learning of MPQ-DPG, and they also show that increasing the number of actors and critics further improves performance.
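As a rough illustration of two ideas named in the abstract, the sketch below averages several actors into one final policy and picks the critic with the largest mean absolute Bellman error on a batch. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear critics, the tanh-bounded linear actors, all function names, and in particular the way the target value is formed (averaging all critics at the averaged actors' action) are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def q_value(w, s, a):
    # Hypothetical linear critic: Q(s, a) = [s, a] . w
    return np.concatenate([s, a], axis=-1) @ w


def actor_action(theta, s, a_max=1.0):
    # Hypothetical linear actor; tanh keeps the action inside [-a_max, a_max],
    # standing in for the constrained inputs mentioned in the abstract.
    return a_max * np.tanh(s @ theta)


def average_policy(thetas, s, a_max=1.0):
    # Final policy defined as the average of all actors (stated in the abstract).
    return np.mean([actor_action(t, s, a_max) for t in thetas], axis=0)


def worst_critic_index(ws, thetas, batch, gamma=0.99):
    # Returns the index of the critic with the largest mean absolute
    # Bellman error on the batch, together with the targets used.
    s, a, r, s_next = batch
    # Assumption: the averaged actors play the role of the sub-greedy policy.
    a_next = average_policy(thetas, s_next)
    # Assumption: the target averages all critics to damp overestimation.
    q_next = np.mean([q_value(w, s_next, a_next) for w in ws], axis=0)
    y = r + gamma * q_next
    errors = [np.mean(np.abs(y - q_value(w, s, a))) for w in ws]
    return int(np.argmax(errors)), y
```

In a training loop, only the critic returned by `worst_critic_index` would be regressed toward `y` at that step, while each actor would be updated separately with the deterministic policy gradient; both steps are omitted here.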


