TAP-Vid: A Benchmark for Tracking Any Point in a Video

11/07/2022
by Carl Doersch, et al.

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful for making inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model, TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
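
To make the idea of using optical flow for short-term motion concrete, here is a minimal sketch of carrying a single point forward through a clip by chaining per-frame flow. It is an illustration only, not the annotation pipeline described in the paper: the names `propagate_point` and `constant_flow`, the nested-list flow layout, and the nearest-neighbour lookup are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed layout: flow[y][x] = (dx, dy), the displacement of pixel (x, y)
# from frame t to frame t + 1.
Flow = List[List[Tuple[float, float]]]
Point = Tuple[float, float]


def propagate_point(start: Point, flows: List[Flow]) -> List[Point]:
    """Chain per-frame flow to carry one (x, y) point forward through a clip."""
    x, y = start
    track = [(x, y)]
    for flow in flows:
        height, width = len(flow), len(flow[0])
        # Nearest-neighbour lookup, clamped to the image bounds.
        xi = min(max(int(round(x)), 0), width - 1)
        yi = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
        track.append((x, y))
    return track


def constant_flow(height: int, width: int, dx: float, dy: float) -> Flow:
    """Hypothetical helper: a flow field where every pixel moves by (dx, dy)."""
    return [[(dx, dy) for _ in range(width)] for _ in range(height)]


# Two toy flow fields: shift right by one pixel, then down by one pixel.
flows = [constant_flow(4, 4, 1.0, 0.0), constant_flow(4, 4, 0.0, 1.0)]
track = propagate_point((1.0, 1.0), flows)
print(track)
```

Chained flow of this kind accumulates error over many frames, which is consistent with the abstract's framing of flow as a way to handle easier, short-term motion while annotators focus on the harder sections.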


research
06/30/2021

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

Association, aiming to link bounding boxes of the same identity in a vid...
research
12/30/2021

SFU-HW-Tracks-v1: Object Tracking Dataset on Raw Video Sequences

We present a dataset that contains object annotations with unique object...
research
07/14/2023

CoTracker: It is Better to Track Together

Methods for video motion prediction either estimate jointly the instanta...
research
12/21/2016

Learning Motion Patterns in Videos

The problem of determining whether an object is in motion, irrespective ...
research
07/27/2023

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

We introduce PointOdyssey, a large-scale synthetic dataset, and data gen...
research
06/14/2023

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

We present a novel model for Tracking Any Point (TAP) that effectively t...
research
04/22/2022

Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines

We propose an end-to-end pipeline for both building and tracking 3D faci...
