Time and Frequency Network for Human Action Detection in Videos

03/08/2021
by Changhai Li, et al.

Most deep learning approaches to human action detection in videos rely on spatiotemporal features but neglect important features in the frequency domain. In this work, we propose TFNet, an end-to-end network that considers time and frequency features simultaneously. TFNet has two branches: a time branch, built from a three-dimensional convolutional neural network (3D-CNN), which takes the image sequence as input and extracts time features; and a frequency branch, which extracts frequency features from DCT coefficients with a two-dimensional convolutional neural network (2D-CNN). Finally, to obtain the action patterns, the two sets of features are deeply fused under an attention mechanism. Experimental results on the JHMDB51-21 and UCF101-24 datasets demonstrate that our approach achieves remarkable performance in frame-mAP.
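As a rough illustration of the kind of input a frequency branch could consume, the sketch below computes block-wise DCT coefficients for a single grayscale frame in plain Python. It is not the authors' preprocessing: the abstract does not say how the DCT coefficients are obtained, so the 8x8 block size, the orthonormal DCT-II variant, the grayscale input, and the function names (`dct_1d`, `dct_2d`, `frame_to_dct_blocks`) are all assumptions made for this example.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1D sequence (assumed variant, not from the paper)."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            v * math.cos(math.pi * (i + 0.5) * k / n)
            for i, v in enumerate(values)
        )
        # Scaling that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [horizontal freq][vertical freq]; transpose back.
    return [list(r) for r in zip(*cols)]


def frame_to_dct_blocks(frame, block_size=8):
    """Split a grayscale frame (list of rows) into non-overlapping
    block_size x block_size tiles and return the DCT of each tile.
    Pixels that do not fill a whole tile are ignored (hypothetical choice)."""
    height, width = len(frame), len(frame[0])
    grid = []
    for top in range(0, height - block_size + 1, block_size):
        row_of_blocks = []
        for left in range(0, width - block_size + 1, block_size):
            tile = [
                frame[top + r][left:left + block_size]
                for r in range(block_size)
            ]
            row_of_blocks.append(dct_2d(tile))
        grid.append(row_of_blocks)
    return grid


# A flat 16x16 frame: all energy should land in each tile's DC coefficient.
flat_frame = [[10.0] * 16 for _ in range(16)]
dct_grid = frame_to_dct_blocks(flat_frame)
```

For a constant tile, only the top-left (DC) coefficient is non-zero, which is a quick sanity check on the transform. In a real pipeline the coefficients would more likely come from a library routine or directly from the compressed video stream.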


