V4D:4D Convolutional Neural Networks for Video-level Representation Learning

02/18/2020
by   Shiwen Zhang, et al.
0

Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into the existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.

READ FULL TEXT
research
08/25/2017

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) ...
research
04/30/2022

Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer

Video denoising aims to recover high-quality frames from the noisy video...
research
06/19/2018

Spatio-Temporal Channel Correlation Networks for Action Classification

The work in this paper is driven by the question if spatio-temporal corr...
research
11/28/2017

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

Convolutional Neural Networks (CNN) have been regarded as a powerful cla...
research
04/18/2019

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

In many robotics and VR/AR applications, 3D-videos are readily-available...
research
02/04/2019

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

Deep learning approaches have been established as the main methodology f...
research
05/31/2019

Design Light-weight 3D Convolutional Networks for Video Recognition Temporal Residual, Fully Separable Block, and Fast Algorithm

Deep 3-dimensional (3D) Convolutional Network (ConvNet) has shown promis...

Please sign up or login with your details

Forgot password? Click here to reset