Three-stream network for enriched Action Recognition

04/27/2021
by Ivaxi Sheth, et al.

Accurately understanding human behaviour is one of the most important tasks in machine intelligence. Human Activity Recognition, which aims to understand human activities from video, is challenging because of problems including background, camera motion and dataset variations. This paper proposes two CNN-based architectures with three streams, which allow the model to exploit the dataset under different settings. The three pathways differ in frame rate. The single pathway operates at a single frame rate and captures spatial information; the slow pathway operates at low frame rates and also captures spatial information; the fast pathway operates at high frame rates and captures fine temporal information. After the CNN encoders, we add bidirectional LSTM and attention heads, respectively, to capture context and temporal features. Experimenting with various algorithms on the UCF-101, Kinetics-600 and AVA datasets, we observe that the proposed models achieve state-of-the-art performance on the human action recognition task.
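The idea of feeding one clip to three pathways at different frame rates can be illustrated with a minimal sketch. The abstract gives no sampling details, so everything concrete here is an assumption made for illustration: the function name `sample_pathway_indices`, the stride values, and the reading of the single pathway as one frame taken from the middle of the clip. It shows only how frame indices might be chosen per pathway, not the authors' actual pipeline.

```python
def sample_pathway_indices(num_frames, slow_stride=16, fast_stride=2):
    """Pick frame indices for three hypothetical pathways of one clip.

    All defaults are illustrative assumptions, not values from the paper.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if slow_stride <= 0 or fast_stride <= 0:
        raise ValueError("strides must be positive")

    # Single pathway (assumed): one frame from the middle of the clip.
    single = [num_frames // 2]
    # Slow pathway: large stride, so few frames (low frame rate).
    slow = list(range(0, num_frames, slow_stride))
    # Fast pathway: small stride, so many frames (high frame rate).
    fast = list(range(0, num_frames, fast_stride))

    return {"single": single, "slow": slow, "fast": fast}


# Example: a 64-frame clip.
indices = sample_pathway_indices(64)
for name, frames in indices.items():
    print(name, len(frames), frames[:4])
```

With these assumed strides, the fast pathway sees eight times as many frames as the slow one, which is the kind of asymmetry that lets it pick up fine temporal detail while the slow and single pathways concentrate on appearance.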


research
12/10/2018

SlowFast Networks for Video Recognition

We present SlowFast networks for video recognition. Our model involves (...
research
05/23/2017

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

It remains a challenge to efficiently extract spatial-temporal informatio...
research
05/02/2017

Investigation of Different Skeleton Features for CNN-based 3D Action Recognition

Deep learning techniques are being used in skeleton based action recogni...
research
03/05/2021

Slow-Fast Auditory Streams For Audio Recognition

We propose a two-stream convolutional network for audio recognition, tha...
research
12/09/2017

A Deep Recurrent Framework for Cleaning Motion Capture Data

We present a deep, bidirectional, recurrent framework for cleaning noisy...
research
04/01/2022

Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition

We propose Multi-head Self/Cross-Attention (MSCA), which introduces a te...
research
10/22/2020

Learning to Sort Image Sequences via Accumulated Temporal Differences

Consider a set of n images of a scene with dynamic objects captured with...
