RPT: Learning Point Set Representation for Siamese Visual Tracking

by Ziang Ma, et al.

While remarkable progress has been made in robust visual tracking, accurate target state estimation remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of the object. We therefore propose an efficient visual tracking framework that accurately estimates the target state with a finer representation: a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of the target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy that obtains detailed structural information by fusing hierarchical convolution layers. Extensive experiments on several challenging benchmarks, including OTB2015, VOT2018, VOT2019 and GOT-10k, demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.
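The abstract does not say how the learned point set is turned back into a box for evaluation, or how the hierarchical layers are fused. The sketch below illustrates two plausible building blocks under stated assumptions. `points_to_box` uses a min-max conversion (the tightest axis-aligned box around all points), one common way to map a point set to a box; whether RPT uses exactly this rule is an assumption. `aggregate_levels` is a hypothetical fusion rule, a normalized weighted sum of same-sized feature vectors from several layers, standing in for the paper's multi-level aggregation. Both function names and the weighting scheme are illustrative, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def points_to_box(points: Sequence[Point]) -> Box:
    """Min-max conversion (assumed): tightest axis-aligned box enclosing all points."""
    if not points:
        raise ValueError("point set must be non-empty")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def aggregate_levels(levels: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Hypothetical multi-level fusion: normalized weighted sum of per-layer features."""
    if not levels or len(levels) != len(weights):
        raise ValueError("need one weight per feature level")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(levels[0])
    fused = [0.0] * n
    for feat, w in zip(levels, weights):
        if len(feat) != n:
            raise ValueError("all levels must have the same feature size")
        for i, v in enumerate(feat):
            # Each layer contributes in proportion to its normalized weight.
            fused[i] += (w / total) * v
    return fused


if __name__ == "__main__":
    # Three representative points on a target; the box spans their extremes.
    print(points_to_box([(10, 20), (30, 5), (25, 40)]))
    # Two layers fused with equal weights give the element-wise mean.
    print(aggregate_levels([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

The min-max rule is chosen here only because it is the simplest conversion to reason about; a moment-based rule (mean and spread of the points) would be less sensitive to a single outlying point.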



