Listen to Look: Action Recognition by Previewing Audio

12/10/2019
by   Ruohan Gao, et al.
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities—a single frame and its accompanying audio—reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on ImgAud2Vid, we further propose ImgAud-Skimming, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves the state-of-the-art in terms of both recognition accuracy and speed.
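
The iterative selection idea behind ImgAud-Skimming can be illustrated with a small sketch. This is not the authors' architecture: the LSTM is replaced by a plain running-average state, the attention is a simple dot product, and the names `skim`, `features`, and `init_state` are hypothetical, chosen only for illustration. It shows how a recurrent state might repeatedly attend over cheap per-moment features and pick the moment with the highest attention weight.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def skim(features, steps, init_state=None):
    """Pick `steps` moments by attending over per-moment features.

    features: list of feature vectors, one per moment (hypothetical
              stand-ins for cheap image-audio features).
    Returns (selected indices, final state).
    """
    dim = len(features[0])
    state = list(init_state) if init_state is not None else [1.0] * dim
    selected = []
    for _ in range(steps):
        # Attention weights of the current state over all moments.
        weights = softmax([dot(state, f) for f in features])
        # Hard selection: take the moment with the largest weight.
        idx = max(range(len(features)), key=lambda i: weights[i])
        selected.append(idx)
        # Assumption: a running average stands in for the LSTM update.
        state = [0.5 * s + 0.5 * x for s, x in zip(state, features[idx])]
    return selected, state
```

Calling `skim(features, steps=2)` on a list of per-moment vectors returns the chosen indices and the final state. A real system would learn the state update and the attention projections rather than fix them as done here.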

research
06/30/2021

Long-Short Temporal Modeling for Efficient Action Recognition

Efficient long-short temporal modeling is key for enhancing the performa...
research
08/02/2018

RGB Video Based Tennis Action Recognition Using a Deep Weighted Long Short-Term Memory

Action recognition has attracted increasing attention from RGB input in ...
research
10/19/2021

LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

In this paper, we place the atomic action detection problem into a Long-...
research
04/23/2021

The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System

Memories are the tethering threads that tie us to the world, and memorab...
research
08/26/2021

Identity-aware Graph Memory Network for Action Detection

Action detection plays an important role in high-level video understandi...
research
04/02/2021

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

The standard way of training video models entails sampling at each itera...
research
06/26/2023

Action Anticipation with Goal Consistency

In this paper, we address the problem of short-term action anticipation,...
