Enriching Local and Global Contexts for Temporal Action Localization

by Zixin Zhu, et al.
Xi'an Jiaotong University
University of Illinois at Chicago

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated and then passed to action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, consists of three sub-networks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at IoU@0.5) and ActivityNet v1.3 (51.24% at IoU@0.5) datasets, where it outperforms the recent state of the art.
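The query-and-retrieval idea behind L-Net can be illustrated with a small sketch. The abstract does not specify how queries are matched against snippets, so the code below assumes scaled dot-product similarity with softmax weights and a residual sum. The function name `retrieve_local_context`, the toy feature vectors, and the fusion step are all hypothetical and are not taken from ContextLoc itself.

```python
import math


def retrieve_local_context(query, snippets):
    """Return a softmax-weighted average of snippet features.

    Assumption (not from the abstract): relevance is scaled dot-product
    similarity between the proposal query and each snippet feature.
    """
    dim = len(query)
    # Score each snippet against the proposal-level query.
    scores = [
        sum(q * s for q, s in zip(query, snippet)) / math.sqrt(dim)
        for snippet in snippets
    ]
    # Numerically stable softmax over the snippet scores.
    max_score = max(scores)
    exp_scores = [math.exp(score - max_score) for score in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    # Retrieve: aggregate snippet features using the attention weights.
    context = [
        sum(w * snippet[i] for w, snippet in zip(weights, snippets))
        for i in range(dim)
    ]
    return context, weights


# Toy data: one proposal feature (the query) and three snippet features.
proposal_feature = [1.0, 0.0]
snippet_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

context, weights = retrieve_local_context(proposal_feature, snippet_features)
# Hypothetical fusion: add the retrieved context back to the proposal feature.
enriched = [p + c for p, c in zip(proposal_feature, context)]
```

Under these assumptions, snippets that resemble the proposal feature receive larger weights, so the enriched feature leans toward the most relevant local evidence while still drawing on every snippet.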

