Video Annotation for Visual Tracking via Selection and Refinement

08/09/2021
by Kenan Dai, et al.

Deep-learning-based visual trackers require offline pre-training on large video datasets with accurate bounding box annotations, which are labor-intensive to obtain. We present a new framework to facilitate bounding box annotation for video sequences. It uses a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorithms. We propose a temporal assessment network (T-Assess Net) that captures the temporal coherence of target locations and selects reliable tracking results by measuring their quality. In addition, we design a visual-geometry refinement network (VG-Refine Net) that further enhances the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. Together, the two networks provide a principled approach to ensuring the quality of automatic video annotation. Experiments on large-scale tracking benchmarks demonstrate that our method can deliver highly accurate bounding box annotations and significantly reduce human labor by 94.0%, yielding an effective means to further boost tracking performance with augmented training data.
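The selection-and-refinement loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: both networks are replaced by hypothetical hand-written stand-ins. `coherence_scores` uses the maximum IoU with an adjacent frame as a crude temporal-coherence score in place of T-Assess Net, and `refine` averages each accepted box with its accepted neighbours in place of VG-Refine Net. The 0.5 threshold and the rule that rejected frames are handed to a human annotator are also assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def coherence_scores(boxes: List[Box]) -> List[float]:
    """Hypothetical stand-in for T-Assess Net: a frame scores high if its box
    overlaps well with the box in at least one adjacent frame."""
    scores = []
    for i, box in enumerate(boxes):
        neighbours = [boxes[j] for j in (i - 1, i + 1) if 0 <= j < len(boxes)]
        scores.append(max(iou(box, n) for n in neighbours) if neighbours else 1.0)
    return scores


def refine(boxes: List[Box], selected: Set[int]) -> Dict[int, Box]:
    """Hypothetical stand-in for VG-Refine Net: smooth each selected box by
    averaging it with its selected temporal neighbours (a simple geometry prior)."""
    refined: Dict[int, Box] = {}
    for i in selected:
        group = [boxes[j] for j in (i - 1, i, i + 1) if j in selected]
        refined[i] = tuple(sum(b[k] for b in group) / len(group) for k in range(4))
    return refined


def annotate(boxes: List[Box], threshold: float = 0.5) -> Tuple[Dict[int, Box], List[int]]:
    """Select reliable frames, refine them, and return the frames left for manual labeling."""
    scores = coherence_scores(boxes)
    selected = {i for i, s in enumerate(scores) if s >= threshold}
    refined = refine(boxes, selected)
    manual = [i for i in range(len(boxes)) if i not in selected]
    return refined, manual


if __name__ == "__main__":
    # Frame 2 is a tracker failure that jumps far away from the target.
    track = [(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 10, 10), (3, 0, 10, 10), (4, 0, 10, 10)]
    refined, manual = annotate(track)
    print(manual)  # [2]
```

In this sketch only the incoherent frame is routed to a person, which mirrors, in spirit, how automatic selection reduces manual effort; the real networks learn both the quality score and the correction from data.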
