Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection

08/07/2018
by Da Zhang, et al.

Recognizing instances at different scales simultaneously is a fundamental challenge in visual detection problems. While spatial multi-scale modeling has been well studied in object detection, how to effectively apply a multi-scale architecture to temporal models for activity detection is still under-explored. In this paper, we identify three unique challenges that need to be specifically handled for temporal activity detection compared to its spatial counterpart. To address all these issues, we propose Dynamic Temporal Pyramid Network (DTPN), a new activity detection framework with a multi-scale pyramidal architecture featuring three novel designs: (1) We sample input video frames dynamically with varying frames per second (FPS) to construct a natural pyramidal input for a video of arbitrary length. (2) We design a two-branch multi-scale temporal feature hierarchy to deal with the inherent temporal scale variation of activity instances. (3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important. By combining all these components into a unified network, we end up with a single-shot activity detector involving single-pass inference and end-to-end training. Extensive experiments show that the proposed DTPN achieves state-of-the-art performance on the challenging ActivityNet dataset.
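To make design (1) more concrete, the following is a minimal Python sketch of sampling a video at several frame rates so that every level spans the whole clip. It is an illustration only, not the authors' implementation: the function name `pyramid_sample_indices`, the parameters `base_len` and `num_levels`, the doubling of samples per level, and the bin-centre sampling rule are all assumptions, since the abstract does not specify them.

```python
def pyramid_sample_indices(num_frames, base_len=4, num_levels=3):
    """Return one list of frame indices per pyramid level.

    Hypothetical scheme (not taken from the paper): level k holds
    base_len * 2**k indices spread evenly over the whole video, so each
    level covers the full duration at a different effective frame rate,
    whatever the video length.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    levels = []
    for k in range(num_levels):
        n = base_len * 2 ** k
        # Take the centre of each of n equal-width bins and clamp it to a
        # valid frame index; short videos simply repeat frames.
        idx = [min(num_frames - 1, int((i + 0.5) * num_frames / n))
               for i in range(n)]
        levels.append(idx)
    return levels


if __name__ == "__main__":
    for level, idx in enumerate(pyramid_sample_indices(100)):
        print(level, len(idx), idx[:4])
```

Because each level has a fixed number of samples, the effective FPS adapts to the video's duration, which is one plausible reading of "dynamic" sampling in the abstract.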

research
01/28/2018

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

Activity detection is a fundamental problem in computer vision. Detectin...
research
01/14/2022

Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals

Activity detection is one of the attractive computer vision tasks to exp...
research
05/10/2021

Temporal-Spatial Feature Pyramid for Video Saliency Detection

In this paper, we propose a 3D fully convolutional encoder-decoder archi...
research
07/20/2023

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Temporal video grounding (TVG) aims to retrieve the time interval of a l...
research
11/18/2019

Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

Multi-scale (MS) approaches have been widely investigated for blind sing...
research
08/27/2019

Temporal Reasoning Graph for Activity Recognition

Despite great success has been achieved in activity analysis, it still h...
research
07/23/2018

Person Search by Multi-Scale Matching

We consider the problem of person search in unconstrained scene images. ...
