TDN: Temporal Difference Networks for Efficient Action Recognition

12/18/2020
by Limin Wang, et al.

Temporal modeling remains challenging for action recognition in videos. To mitigate this issue, this paper presents a new video architecture, termed Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition. The core of our TDN is an efficient temporal module (TDM) that explicitly leverages a temporal difference operator, together with a systematic assessment of its effect on short-term and long-term motion modeling. To fully capture temporal information over the entire video, our TDN is built on a two-level difference modeling paradigm. Specifically, for local motion modeling, temporal difference over consecutive frames is used to supply 2D CNNs with finer motion patterns, while for global motion modeling, temporal difference across segments is incorporated to capture long-range structure for motion feature excitation. TDN provides a simple and principled temporal modeling framework and can be instantiated with existing CNNs at a small extra computational cost. Our TDN achieves a new state of the art on the Something-Something V1 and V2 datasets and is on par with the best performance on the Kinetics-400 dataset. In addition, we conduct in-depth ablation studies and visualize the results of our TDN, hopefully providing insightful analysis of the temporal difference operation. We release the code at https://github.com/MCG-NJU/TDN.
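The two-level difference idea can be illustrated with a toy sketch. It assumes that a frame or a segment feature is simply a flat list of floats, and the function names are hypothetical. The local part takes element-wise differences of consecutive frames. The global part is a simplified stand-in for "motion feature excitation": each segment is gated by a sigmoid of its difference to the next segment and added back residually. This is not the TDN implementation; the released module in the linked repository uses learned layers that are omitted here.

```python
import math
from typing import List

# Toy representation (assumption): a frame or a segment feature is a flat list of floats.
Vector = List[float]


def local_temporal_difference(frames: List[Vector]) -> List[Vector]:
    """Short-term motion cue: element-wise differences of consecutive frames,
    D_t = I_{t+1} - I_t, giving len(frames) - 1 difference maps."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def segment_difference_excitation(segments: List[Vector]) -> List[Vector]:
    """Long-range cue (hypothetical, simplified): each segment's features are
    gated by a sigmoid of their difference to the next segment and added back
    residually as x + x * gate. The last segment uses a zero difference."""
    excited = []
    for t, feat in enumerate(segments):
        nxt = segments[t + 1] if t + 1 < len(segments) else feat
        gate = [sigmoid(b - a) for a, b in zip(feat, nxt)]
        excited.append([x + x * g for x, g in zip(feat, gate)])
    return excited


if __name__ == "__main__":
    frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
    print(local_temporal_difference(frames))  # [[1.0, 0.0], [2.0, -1.0]]
    segments = [[1.0, 2.0], [1.0, 4.0]]
    print(segment_difference_excitation(segments))
```

Each difference costs a single subtraction per element. In this sketch, a segment whose features do not change relative to its neighbor receives a neutral gate of 0.5, while larger changes push the gate toward 1.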
