Attentive Spatio-Temporal Representation Learning for Diving Classification

04/30/2019
by Gagan Kanojia, et al.

Competitive diving is a well-recognized aquatic sport in which a person dives from a platform or a springboard into the water. Based on the acrobatics performed during the dive, diving is classified into a finite set of action classes standardized by FINA. In this work, we propose an attention-guided LSTM-based neural network architecture for the task of diving classification. The network takes the frames of a diving video as input and determines its class. We evaluate the performance of the proposed model on a recently introduced competitive diving dataset, Diving48, which contains over 18,000 video clips covering 48 classes of diving. The proposed model outperforms the classification accuracy of the state-of-the-art models in both 2D and 3D frameworks by 11.54 [...] the network is able to localize the diver in the video frames during the dive without being trained with explicit localization supervision.
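The abstract describes the architecture only at a high level. As a rough illustration of what an attention-guided LSTM classifier over video frames can look like, here is a minimal, untrained pure-Python sketch. It assumes that each frame has already been turned into a grid of region feature vectors (for example by a CNN), that each region is scored against the previous LSTM hidden state with a bilinear product, and that a linear layer over the final hidden state produces class probabilities. None of these choices, nor the hypothetical class name `AttentiveLSTMClassifier`, the dimensions, or the random weights, come from the paper. This is not the authors' model.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AttentiveLSTMClassifier:
    """Hypothetical attention-guided LSTM classifier (illustrative, untrained)."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Assumption: projects the previous hidden state into feature space to score regions.
        self.w_query = rand_matrix(feat_dim, hidden_dim, rng)
        # One weight matrix per LSTM gate (input, forget, output, candidate),
        # each acting on the concatenation [context; h_prev].
        self.w_gates = [rand_matrix(hidden_dim, feat_dim + hidden_dim, rng) for _ in range(4)]
        self.b_gates = [[0.0] * hidden_dim for _ in range(4)]
        # Linear readout from the final hidden state to class scores.
        self.w_out = rand_matrix(num_classes, hidden_dim, rng)

    def attend(self, regions, h):
        """Soft attention: softmax-weighted average of region features, scored against h."""
        query = matvec(self.w_query, h)
        scores = [sum(f * q for f, q in zip(region, query)) for region in regions]
        weights = softmax(scores)
        dim = len(regions[0])
        context = [sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)]
        return context, weights

    def step(self, context, h, c):
        """One standard LSTM cell update driven by the attended context vector."""
        x = context + h  # list concatenation: [context; h_prev]
        pre = [[a + b for a, b in zip(matvec(w, x), bias)]
               for w, bias in zip(self.w_gates, self.b_gates)]
        i = [sigmoid(v) for v in pre[0]]
        f = [sigmoid(v) for v in pre[1]]
        o = [sigmoid(v) for v in pre[2]]
        g = [math.tanh(v) for v in pre[3]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def forward(self, video):
        """video: list of frames; each frame is a list of region feature vectors.

        Returns class probabilities and the per-frame attention weights.
        """
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        attention_maps = []
        for regions in video:
            context, weights = self.attend(regions, h)
            attention_maps.append(weights)
            h, c = self.step(context, h, c)
        probs = softmax(matvec(self.w_out, h))
        return probs, attention_maps


# Toy usage: 8 frames, each a 7x7 grid (49 regions) of 16-dim random features,
# classified into 48 classes to mirror the size of Diving48's label set.
data_rng = random.Random(42)
video = [[[data_rng.gauss(0, 1) for _ in range(16)] for _ in range(49)] for _ in range(8)]
model = AttentiveLSTMClassifier(feat_dim=16, hidden_dim=32, num_classes=48)
probs, maps = model.forward(video)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Besides the class probabilities, the sketch returns one attention distribution per frame. In a trained model of this general shape, such weights are the kind of signal one could visualize to check where the network looks in each frame.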

research
02/11/2020

Learning spatio-temporal representations with temporal squeeze pooling

In this paper, we propose a new video representation learning method, na...
research
10/01/2018

Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos

Inspired by the observation that humans are able to process videos effic...
research
08/24/2022

A Spatio-Temporal Attentive Network for Video-Based Crowd Counting

Automatic people counting from images has recently drawn attention for u...
research
01/01/2023

Hierarchical Explanations for Video Action Recognition

We propose Hierarchical ProtoPNet: an interpretable network that explain...
research
07/15/2019

A Short Note on the Kinetics-700 Human Action Dataset

We describe an extension of the DeepMind Kinetics human action dataset f...
research
03/30/2022

Rabbit, toad, and the Moon: Can machine categorize them into one class?

Recent machine learning algorithms such as neural networks can classify ...
research
02/21/2021

Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM

Automatically detecting violence from surveillance footage is a subset o...
