An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention

11/11/2022
by   Yong Hong, et al.

This study proposes an improved end-to-end multi-target tracking algorithm that adapts to multi-view, multi-scale scenes based on the self-attention mechanism of the transformer's encoder-decoder structure. A multi-dimensional feature extraction backbone network is combined with a self-built semantic raster map, which is stored in the encoder for correlation and generates target position encodings and multi-dimensional feature vectors. The decoder incorporates four methods: spatial clustering and semantic filtering of multi-view targets, dynamic matching of multi-dimensional features, space-time logic-based multi-target tracking, and space-time convergence network (STCN)-based parameter passing. Through the fusion of these decoding methods, multi-camera targets are tracked in three dimensions: temporal logic, spatial logic, and feature matching. On the MOT17 dataset, this study's method significantly outperforms the current state-of-the-art method MiniTrackV2 [49] by 2.2%. Furthermore, this study proposes a retrospective mechanism for the first time, adopting a reverse-order processing method to correct historically mislabeled targets and thereby improve the Identification F1-score (IDF1). On the self-built dataset OVIT-MOT01, the IDF1 improves from 0.948 to 0.967, and the Multi-camera Tracking Accuracy (MCTA) improves from 0.878 to 0.909, significantly improving continuous tracking accuracy and scene adaptation. This research introduces a new attentional tracking paradigm that achieves state-of-the-art performance on multi-target tracking (MOT17 and OVIT-MOT01) tasks.
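The retrospective mechanism described above can be illustrated with a minimal sketch. The paper's actual implementation is not shown here; this assumes a simplified representation in which each frame maps a detection key (e.g. an appearance-feature hash) to an assigned track ID, and the reverse-order pass propagates the more reliable later IDs backward to overwrite earlier mislabeled ones. The function name `retrospective_relabel` and the data layout are illustrative assumptions, not the authors' API.

```python
def retrospective_relabel(track_history):
    """Reverse-order pass over per-frame track assignments.

    track_history: list of dicts, one per frame, each mapping a
    detection key -> assigned track ID. Later frames are assumed to
    carry the more reliable IDs; the backward pass overwrites earlier,
    possibly mislabeled IDs for the same detection key.
    """
    latest_id = {}   # detection key -> most recently confirmed track ID
    corrected = []
    for frame in reversed(track_history):
        fixed = {}
        for key, tid in frame.items():
            # Prefer the ID already seen in a later (processed) frame;
            # otherwise register this frame's ID as the confirmed one.
            fixed[key] = latest_id.setdefault(key, tid)
        corrected.append(fixed)
    corrected.reverse()  # restore chronological order
    return corrected
```

Under this sketch, a target labeled ID 1 in early frames but ID 3 once later evidence accumulates would have its early frames relabeled to ID 3, which is the kind of identity-switch repair that raises IDF1.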


