Continuous Video to Simple Signals for Swimming Stroke Detection with Convolutional Neural Networks

by Brandon Victor, et al.
Australian Sports Commission
La Trobe University

In many sports, it is useful to analyse video of an athlete in competition for training purposes. In swimming, stroke rate is a common metric used by coaches, but measuring it requires laborious labelling of each individual stroke. We show that a Convolutional Neural Network (CNN) can automatically detect discrete events in continuous video (in this case, swimming strokes). We create a CNN that learns a mapping from a window of frames to a point on a smooth 1D target signal, in which peaks denote the locations of strokes, and we evaluate it as a sliding window over the video. To our knowledge, this process of training and utilizing a CNN has not been investigated before, either in sports or in fundamental computer vision research. Most research has instead focused on action recognition, and on using it to classify many clips in continuous video for action localisation. In this paper we demonstrate that our process works well on the task of detecting swimming strokes in the wild. Moreover, without any modification to the model architecture or training method, the process is shown to work equally well on detecting tennis strokes, implying that it is a general process. The outputs of our system are surprisingly smooth signals that predict an arbitrary event at least as accurately as humans (manually evaluated from a sample of negative results). A number of different architectures are evaluated, pertaining to slightly different problem formulations and signal targets.
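
The signal formulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that the target is a smooth 1D signal with peaks at stroke locations, so the Gaussian bump shape, the `sigma` and `threshold` values, and the helper names `make_target_signal`, `sliding_windows`, and `detect_peaks` are all assumptions made for illustration. The CNN itself is omitted; the sketch only shows how labelled stroke frames could become a regression target and how strokes could be read back from a predicted signal.

```python
import numpy as np


def make_target_signal(n_frames, stroke_frames, sigma=3.0):
    """Build a smooth 1D target with a peak of height 1 at each labelled stroke.

    Assumption: Gaussian bumps; the source does not specify the peak shape.
    """
    t = np.arange(n_frames, dtype=float)
    signal = np.zeros(n_frames)
    for s in stroke_frames:
        bump = np.exp(-0.5 * ((t - s) / sigma) ** 2)
        # Take the maximum so nearby strokes do not sum above 1.
        signal = np.maximum(signal, bump)
    return signal


def sliding_windows(n_frames, window):
    """Return (start, stop) frame ranges, one per window centre.

    Each window would be fed to the network to predict the signal value
    at its centre frame (hypothetical pairing, for illustration).
    """
    half = window // 2
    return [(c - half, c + half + 1) for c in range(half, n_frames - half)]


def detect_peaks(signal, threshold=0.5):
    """Return indices of local maxima at or above the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= threshold:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    target = make_target_signal(100, [20, 50, 80])
    windows = sliding_windows(100, window=9)
    # With a perfect predictor, the detected peaks match the labels.
    print(len(windows), detect_peaks(target))
```

Given detected peak indices and the video frame rate, stroke rate follows from the spacing between consecutive peaks, which is the quantity coaches otherwise obtain by manual labelling.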




