DeepAI AI Chat
Log In Sign Up

Siamese Tracking with Lingual Object Constraints

by   Maximilian Filtenborg, et al.

Classically, visual object tracking involves following a target object throughout a given video, and it provides us the motion trajectory of the object. However, for many practical applications, this output is often insufficient since additional semantic information is required to act on the video material. Example applications of this are surveillance and target-specific video summarization, where the target needs to be monitored with respect to certain predefined constraints, e.g., 'when standing near a yellow car'. This paper explores, tracking visual objects subjected to additional lingual constraints. Differently from Li et al., we impose additional lingual constraints upon tracking, which enables new applications of tracking. Whereas in their work the goal is to improve and extend upon tracking itself. To perform benchmarks and experiments, we contribute two datasets: c-MOT16 and c-LaSOT, curated through appending additional constraints to the frames of the original LaSOT and MOT16 datasets. We also experiment with two deep models SiamCT-DFG and SiamCT-CA, obtained through extending a recent state-of-the-art Siamese tracking method and adding modules inspired from the fields of natural language processing and visual question answering. Through experimental results, we show that the proposed model SiamCT-CA can significantly outperform its counterparts. Furthermore, our method enables the selective compression of videos, based on the validity of the constraint.


page 3

page 7

page 12

page 15

page 17

page 18

page 19


Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook

Accurate and robust visual object tracking is one of the most challengin...

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

In video object tracking, there exist rich temporal contexts among succe...

Visual Object Tracking based on Adaptive Siamese and Motion Estimation Network

Recently, convolutional neural network (CNN) has attracted much attentio...

Does Video Compression Impact Tracking Accuracy?

Everyone "knows" that compressing a video will degrade the accuracy of o...

A Generic Object Re-identification System for Short Videos

Short video applications like TikTok and Kwai have been a great hit rece...

Trajectory Factory: Tracklet Cleaving and Re-connection by Deep Siamese Bi-GRU for Multiple Object Tracking

Multi-Object Tracking (MOT) is a challenging task in the complex scene s...