4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

04/18/2019
by   Christopher Choy, et al.
50

In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks and are faster than the 3D counterpart in some cases.

READ FULL TEXT

page 7

page 8

page 16

page 17

page 18

page 19

page 20

page 21

research
03/25/2015

Initialization Strategies of Spatio-Temporal Convolutional Neural Networks

We propose a new way of incorporating temporal information present in vi...
research
07/29/2019

Seeing Things in Random-Dot Videos

The human visual system correctly groups features and interprets videos ...
research
02/18/2020

V4D:4D Convolutional Neural Networks for Video-level Representation Learning

Most existing 3D CNNs for video representation learning are clip-based m...
research
02/04/2019

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

Deep learning approaches have been established as the main methodology f...
research
03/30/2023

Hybrid Dealiasing of Complex Convolutions

Efficient algorithms for computing linear convolutions based on the fast...
research
05/31/2021

Continual 3D Convolutional Neural Networks for Real-time Processing of Videos

This paper introduces Continual 3D Convolutional Neural Networks (Co3D C...

Please sign up or login with your details

Forgot password? Click here to reset