SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong

by Steven Schwarcz, et al.

We introduce SPIN, a new high-resolution, high-frame-rate stereo video dataset for tracking and action recognition in the game of ping pong. The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models: the position of the ping pong ball, the poses of the humans in the videos, and the spin of the ball as it is hit by the players. The training corpus consists of 53 hours of data with labels derived from previous models in a semi-supervised manner. The testing corpus contains 1 hour of data with the same annotation streams, except that human annotations of the ball position were obtained through crowd compute, and ball spin was derived from those annotations. Along with the dataset, we introduce several baseline models trained on this data. The models were chosen specifically so that they can perform inference at the rate at which the images are generated, namely 150 fps. We explore the advantages of multi-task training on this data, and also show interesting properties of ping pong ball trajectories that are derived from our observational data rather than from prior physics models. To our knowledge, this is the first large-scale dataset of ping pong; we offer it to the community as a rich dataset that can be used for a wide variety of machine learning and vision tasks, such as tracking, pose estimation, semi-supervised and unsupervised learning, and generative modeling.




DiscrimNet: Semi-Supervised Action Recognition from Videos using Generative Adversarial Networks

We propose an action recognition framework using Generative Adversaria...

Pose from Action: Unsupervised Learning of Pose Features based on Motion

Human actions are comprised of a sequence of poses. This makes videos of...

Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models

Deep video action recognition models have been highly successful in rece...

HUST bearing: a practical dataset for ball bearing fault diagnosis

In this work, we introduce a practical dataset named HUST bearing, that ...

Delta Sampling R-BERT for limited data and low-light action recognition

We present an approach to perform supervised action recognition in the d...

Iterate Cluster: Iterative Semi-Supervised Action Recognition

We propose a novel system for active semi-supervised feature-based actio...

CVB: A Video Dataset of Cattle Visual Behaviors

Existing image/video datasets for cattle behavior recognition are mostly...
