Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

11/23/2021
by Zhaobo Qi, et al.

Event analysis in untrimmed videos has attracted increasing attention due to the application of cutting-edge techniques such as CNNs. As a well-studied property of CNN-based models, the receptive field measures the spatial range covered by a single feature response, and it is crucial for improving image categorization accuracy. In the video domain, event semantics are described by complex interactions among different concepts, whose behaviors vary drastically from one video to another. This variation makes accurate concept-based event categorization difficult. To model concept behavior, we study the temporal concept receptive field of concept-based event representation, which encodes the temporal occurrence pattern of different mid-level concepts. Accordingly, we introduce temporal dynamic convolution (TDC) to give concept-based event analytics greater flexibility. TDC adjusts the temporal concept receptive field size dynamically according to the input. Specifically, a set of coefficients is learned to fuse the results of multiple convolutions with different kernel widths, each of which provides a different temporal concept receptive field size. Different coefficients yield an appropriate temporal concept receptive field size for each input video and highlight crucial concepts. Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis. Experimental results on FCVID and ActivityNet show that TDCMN exhibits adaptive event recognition ability conditioned on different inputs and improves the event recognition performance of concept-based methods by a large margin. Code is available at https://github.com/qzhb/TDCMN.
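The coefficient-weighted fusion of multi-width temporal convolutions described above can be illustrated with a small, forward-only NumPy sketch. This is not the TDCMN implementation: the class name `TemporalDynamicConv`, the kernel widths (1, 3, 5), the zero "same" padding, and the choice to predict the fusion coefficients with temporal average pooling, a linear layer, and a softmax are all assumptions made for illustration. The input is assumed to be a (T, C) matrix of concept scores over T video segments.

```python
import numpy as np


def conv1d_same(x, w):
    """Temporal 1D convolution with zero 'same' padding.

    x: (T, C_in) concept scores over T time steps.
    w: (k, C_in, C_out) kernel with odd width k.
    Returns a (T, C_out) array.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along time only
    T = x.shape[0]
    out = np.zeros((T, w.shape[2]))
    for t in range(T):
        window = xp[t:t + k]  # k consecutive steps centred on t
        out[t] = np.einsum("kc,kco->o", window, w)
    return out


def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class TemporalDynamicConv:
    """Hypothetical sketch: fuse branches of different kernel widths
    (i.e. different temporal receptive field sizes) with
    input-dependent coefficients."""

    def __init__(self, c_in, c_out, widths=(1, 3, 5), seed=0):
        rng = np.random.default_rng(seed)
        self.widths = widths
        # One randomly initialised kernel per branch (untrained).
        self.kernels = [rng.normal(0.0, 0.1, (k, c_in, c_out)) for k in widths]
        # Assumed attention head: pooled features -> one logit per branch.
        self.att_w = rng.normal(0.0, 0.1, (c_in, len(widths)))
        self.att_b = np.zeros(len(widths))

    def coefficients(self, x):
        pooled = x.mean(axis=0)  # (C_in,) temporal average pooling
        return softmax(pooled @ self.att_w + self.att_b)  # (K,), sums to 1

    def __call__(self, x):
        alpha = self.coefficients(x)
        branches = [conv1d_same(x, w) for w in self.kernels]
        # Weighted sum of branch outputs, shape (T, C_out).
        y = sum(a * b for a, b in zip(alpha, branches))
        return y, alpha


if __name__ == "__main__":
    x = np.random.default_rng(1).random((16, 8))  # 16 segments, 8 concepts
    tdc = TemporalDynamicConv(c_in=8, c_out=4)
    y, alpha = tdc(x)
    print(y.shape, alpha.round(3))
```

Because the coefficients are computed from the input itself, videos with different concept occurrence patterns receive different mixtures of narrow and wide kernels. This is the sense in which the temporal concept receptive field is dynamic. The weights in this sketch are random; in a real model they would be learned end to end.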

research
04/13/2020

Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions

In problems such as sports video analytics, it is difficult to obtain ac...
research
08/05/2022

Blockwise Temporal-Spatial Pathway Network

Algorithms for video action recognition should consider not only spatial...
research
11/15/2022

Dynamic Temporal Filtering in Video Models

Video temporal dynamics is conventionally modeled with 3D spatial-tempor...
research
05/05/2018

Revisiting Temporal Modeling for Video-based Person ReID

Video-based person reID is an important task, which has received much at...
research
07/22/2021

EAN: Event Adaptive Network for Enhanced Action Recognition

Efficiently modeling spatial-temporal information in videos is crucial f...
research
11/26/2019

Learning Efficient Video Representation with Video Shuffle Networks

3D CNN shows its strong ability in learning spatiotemporal representatio...
research
07/19/2023

Exploring Transformer Extrapolation

Length extrapolation has attracted considerable attention recently since...
