Coarse-Fine Networks for Temporal Activity Detection in Videos

03/01/2021
by   Kumara Kahatapitiya, et al.

In this paper, we introduce 'Coarse-Fine Networks', a two-stream architecture that exploits different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process inputs at one (or a few) fixed temporal resolutions, without any dynamic frame selection. We argue, however, that processing multiple temporal resolutions of the input, and doing so dynamically by learning to estimate the importance of each frame, can largely improve video representations, especially in the domain of temporal activity localization. To this end, we propose (1) 'Grid Pool', a learned temporal downsampling layer to extract coarse features, and (2) 'Multi-stage Fusion', a spatio-temporal attention mechanism to fuse a fine-grained context with the coarse features. We show that our method can outperform the state of the art for action detection on public datasets, including Charades, with a significantly reduced compute and memory footprint.
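The idea of downsampling in time according to per-frame importance can be illustrated with a minimal sketch. This is not the authors' Grid Pool layer: the importance scores here are supplied by hand rather than learned, and the function names `sampling_positions` and `interpolate` are hypothetical. The sketch assumes one plausible reading of the abstract, namely that output frames are placed more densely where importance is high and that features at fractional positions are obtained by linear interpolation.

```python
from bisect import bisect_left
from itertools import accumulate


def sampling_positions(importance, num_out):
    """Return num_out fractional frame indices, denser where importance is high.

    Assumption: importance is a list of non-negative per-frame scores
    (hand-supplied here; a real model would predict them).
    """
    total = sum(importance)
    if num_out < 1 or total <= 0 or min(importance) < 0:
        raise ValueError("need num_out >= 1 and non-negative scores with a positive sum")
    # cdf[i] is the share of total importance carried by frames 0..i.
    cdf = [c / total for c in accumulate(importance)]
    last = len(importance) - 1
    positions = []
    for k in range(num_out):
        target = (k + 0.5) / num_out  # midpoint of the k-th equal-mass bin
        i = min(bisect_left(cdf, target), last)
        prev = cdf[i - 1] if i > 0 else 0.0
        span = cdf[i] - prev
        frac = (target - prev) / span if span > 0 else 0.0
        # Frame i covers [i, i + 1) on the time axis; shift by 0.5 so that
        # integer positions coincide with frame centers, then clamp.
        positions.append(min(max(i + frac - 0.5, 0.0), float(last)))
    return positions


def interpolate(features, positions):
    """Linearly interpolate per-frame feature vectors at fractional positions."""
    out = []
    for p in positions:
        lo = int(p)  # floor, since p >= 0
        hi = min(lo + 1, len(features) - 1)
        w = p - lo
        out.append([(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])])
    return out
```

With uniform importance the sketch reduces to ordinary average pooling over equal windows; with skewed importance the sampled positions move toward the high-scoring frames, which is the intuition behind a coarse stream that keeps the most informative moments.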


research
08/19/2020

CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization

Most current pipelines for spatio-temporal action localization connect f...
research
06/17/2019

Spatio-Temporal Fusion Networks for Action Recognition

The video based CNN works have focused on effective ways to fuse appeara...
research
03/29/2021

Video Classification with FineCoarse Networks

A rich representation of the information in video data can be realized b...
research
05/23/2021

Coarse to Fine Multi-Resolution Temporal Convolutional Network

Temporal convolutional networks (TCNs) are a commonly used architecture ...
research
12/08/2021

STAF: A Spatio-Temporal Attention Fusion Network for Few-shot Video Classification

We propose STAF, a Spatio-Temporal Attention Fusion network for few-shot...
research
04/16/2020

Top-Down Networks: A coarse-to-fine reimagination of CNNs

Biological vision adopts a coarse-to-fine information processing pathway...
research
07/16/2022

Monitoring Vegetation From Space at Extremely Fine Resolutions via Coarsely-Supervised Smooth U-Net

Monitoring vegetation productivity at extremely fine resolutions is valu...
