
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

by Silvio Giancola, et al.

In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one-minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one-second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances δ ranging from 5 to 60 seconds.
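The sparsity figure and the idea of spotting within a tolerance δ can be illustrated with a small sketch. It is not the authors' evaluation code: the function names, the greedy one-to-one matching, the reading of "within tolerance" as an absolute time difference of at most δ seconds, and the 5-second step of the sweep are all assumptions made for illustration, and the timestamps are made-up toy values.

```python
from typing import List


def minutes_per_event(total_hours: float, num_events: int) -> float:
    """Average gap between annotated events, in minutes."""
    return total_hours * 60.0 / num_events


def count_hits(predicted: List[float], anchors: List[float], delta: float) -> int:
    """Count predicted spots that land within `delta` seconds of an anchor.

    Assumption: each anchor can be matched at most once, and predictions
    are matched greedily (in time order) to the closest unmatched anchor.
    """
    unmatched = sorted(anchors)
    hits = 0
    for t in sorted(predicted):
        if not unmatched:
            break
        closest = min(unmatched, key=lambda a: abs(a - t))
        if abs(closest - t) <= delta:
            unmatched.remove(closest)
            hits += 1
    return hits


if __name__ == "__main__":
    # Sparsity figure from the abstract: 764 hours, 6,637 annotations.
    print(round(minutes_per_event(764, 6637), 1))

    # Toy timestamps in seconds (hypothetical, not from the dataset).
    anchors = [110.0, 430.0, 2000.0]
    predicted = [100.0, 400.0, 1000.0]

    # Sweep tolerances from 5 to 60 seconds (step size is an assumption).
    for delta in range(5, 65, 5):
        recall = count_hits(predicted, anchors, delta) / len(anchors)
        print(delta, round(recall, 2))
```

A full Average-mAP computation would additionally rank predictions by confidence, compute Average Precision per class at each tolerance, and then average over classes and tolerances; the sketch only shows how the tolerance changes which spots count as correct.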




Improved Soccer Action Spotting using both Audio and Video Streams

In this paper, we propose a study on multi-modal (audio and video) actio...

RMS-Net: Regression and Masking for Soccer Event Spotting

The recently proposed action spotting task consists in finding the exact...

Multi-shot Temporal Event Localization: a Benchmark

Current developments in temporal event or action localization usually ta...

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

An event happening in the world is often made of different activities an...

Hajj and Umrah Event Recognition Datasets

In this note, new Hajj and Umrah Event Recognition datasets (HUER) are p...

Moments in Time Dataset: one million videos for event understanding

We present the Moments in Time Dataset, a large-scale human-annotated co...

Learning to score the figure skating sports videos

This paper targets at learning to score the figure skating sports videos...