TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices

09/27/2021
by Ji Lin, et al.

The explosive growth in video streaming requires video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN-based methods can achieve good performance but are computationally intensive. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. The key idea of TSM is to shift part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero additional computation and zero additional parameters. TSM offers several unique advantages. Firstly, TSM has high performance: it ranked first on the Something-Something leaderboard upon submission. Secondly, TSM has high efficiency: it achieves high frame rates of 74 fps on Jetson Nano and 29 fps on Galaxy Note8 for online video recognition. Thirdly, TSM has higher scalability than 3D networks, enabling large-scale Kinetics training on 1,536 GPUs in 15 minutes. Lastly, TSM enables action concept learning, which 2D networks cannot model; we visualize the category attention maps and find that a spatio-temporal action detector emerges during the training of the classification task. The code is publicly available at https://github.com/mit-han-lab/temporal-shift-module.
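The channel-shift idea described in the abstract is simple to express in a few lines of PyTorch. The sketch below is for illustration only and is not taken from the linked repository: the tensor layout (N*T, C, H, W), the function name temporal_shift, and the split of one eighth of the channels in each temporal direction are assumptions based on the description above, not confirmed details of the authors' implementation.

    import torch

    def temporal_shift(x, n_segment, fold_div=8):
        # x: activations of shape (N*T, C, H, W), with T = n_segment frames per clip.
        nt, c, h, w = x.size()
        n_batch = nt // n_segment
        x = x.view(n_batch, n_segment, c, h, w)

        fold = c // fold_div
        out = torch.zeros_like(x)
        # First group of channels: frame t receives channels from frame t+1.
        out[:, :-1, :fold] = x[:, 1:, :fold]
        # Second group: frame t receives channels from frame t-1.
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
        # Remaining channels are left untouched.
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
        return out.view(nt, c, h, w)

Because the operation only reindexes memory, it adds no multiply-adds and no learnable weights, which is why it keeps the cost of the 2D backbone unchanged; in practice such a shift is typically applied inside the residual branches of a frame-wise 2D CNN so that the original spatial features are preserved.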

Related research

11/20/2018: Temporal Shift Module for Efficient Video Understanding
03/11/2021: ACTION-Net: Multipath Excitation for Action Recognition
10/01/2019: Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
07/02/2019: Learnable Gated Temporal Shift Module for Deep Video Inpainting
12/10/2018: SlowFast Networks for Video Recognition
07/25/2020: Approximated Bilinear Modules for Temporal Modeling
09/02/2023: ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning
