An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos

02/06/2020
by S. H. Shabbeer Basha, et al.

We propose a novel scheme for human action recognition in videos, using a classifier based on a 3-dimensional Convolutional Neural Network (3D CNN). Traditionally, deep-learning approaches to human activity recognition train the 3D CNN on either a few random frames or every k-th frame of the video, where k is a small positive integer such as 4, 5, or 6. This kind of sampling reduces the volume of input data, which speeds up training and also avoids over-fitting to some extent, thus improving the performance of the 3D CNN model. In the proposed video sampling technique, k consecutive frames of a video are aggregated into a single frame by computing a Gaussian-weighted summation of the k frames. The resulting aggregated frame preserves more of the information than the conventional sampling approaches do, and is shown experimentally to perform better. In this letter, a 3D CNN architecture is proposed to extract the spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize the human actions. The proposed 3D CNN architecture is capable of handling videos in which the camera is placed at a distance from the performer. Experiments are performed on the KTH and WEIZMANN human action datasets, where the method produces results comparable to state-of-the-art techniques.
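
The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract only states that k consecutive frames are combined by a Gaussian-weighted summation, so several details here are assumptions rather than the authors' method: the weights are centered on the middle frame of each window, normalized to sum to 1, controlled by a hypothetical `sigma` parameter, and applied to non-overlapping windows (trailing frames that do not fill a window are dropped). The function names `gaussian_weights` and `aggregate_frames` are illustrative only.

```python
import numpy as np


def gaussian_weights(k, sigma=1.0):
    """Return k Gaussian weights centered on the middle of the window.

    Assumption: the weights are normalized to sum to 1 so that the
    aggregated frame stays in the same intensity range as the input.
    """
    positions = np.arange(k) - (k - 1) / 2.0
    weights = np.exp(-0.5 * (positions / sigma) ** 2)
    return weights / weights.sum()


def aggregate_frames(video, k=5, sigma=1.0):
    """Collapse every k consecutive frames into one weighted-sum frame.

    video: array of shape (T, H, W) or (T, H, W, C).
    Returns an array of shape (T // k, H, W) or (T // k, H, W, C).
    Assumption: windows do not overlap and leftover frames are discarded.
    """
    video = np.asarray(video, dtype=np.float64)
    n_groups = video.shape[0] // k
    if n_groups == 0:
        raise ValueError("video has fewer than k frames")
    # Group frames into windows of length k along a new axis.
    grouped = video[: n_groups * k].reshape(n_groups, k, *video.shape[1:])
    weights = gaussian_weights(k, sigma)
    # Weighted sum over the window axis (axis 1 of `grouped`).
    return np.tensordot(grouped, weights, axes=([1], [0]))
```

For example, `aggregate_frames(video, k=5)` applied to a 100-frame clip would yield 20 aggregated frames. Unlike keeping only every k-th frame, each output frame draws on all k frames of its window, with the frames nearest the window center weighted most heavily under these assumptions.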

