Paying More Attention to Motion: Attention Distillation for Learning Video Representations

04/05/2019
by Miao Liu, et al.

We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules, and develop a novel method for attention distillation. Our method is evaluated on major action benchmarks, and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that our attention maps can leverage motion cues in learning to identify the location of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models.
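
The idea of using attention maps to carry a representation from the flow network to the RGB network can be illustrated with a small sketch. The abstract does not state how the maps are computed or which loss is used, so everything below is an assumption made for illustration: the attention map is taken to be a softmax over spatial positions, the distillation term is taken to be a KL divergence from the flow (teacher) map to the RGB (student) map, and the names `spatial_softmax`, `attention_distillation_loss`, `flow_scores`, and `rgb_scores` are hypothetical.

```python
import math


def spatial_softmax(scores):
    """Normalize a 2D grid of raw scores into an attention map that sums to 1."""
    peak = max(s for row in scores for s in row)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def attention_distillation_loss(teacher_map, student_map, eps=1e-12):
    """KL(teacher || student) summed over spatial positions.

    Assumed loss form; the abstract does not specify the actual objective.
    """
    loss = 0.0
    for t_row, s_row in zip(teacher_map, student_map):
        for t, s in zip(t_row, s_row):
            if t > 0.0:
                loss += t * math.log(t / max(s, eps))
    return loss


# Hypothetical raw attention scores on a 2x2 spatial grid.
flow_scores = [[0.1, 2.0], [0.3, 0.2]]  # teacher: motion (flow) network
rgb_scores = [[1.0, 0.5], [0.2, 0.1]]   # student: RGB network

flow_attn = spatial_softmax(flow_scores)
rgb_attn = spatial_softmax(rgb_scores)

# Penalty that would push the RGB attention toward the flow attention.
loss = attention_distillation_loss(flow_attn, rgb_attn)
```

In a training setup along these lines, a term like `loss` would presumably be added to the usual recognition loss of the RGB network, so that the student is encouraged to attend to the regions the flow network highlights while still learning to classify actions.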

research
06/23/2023

Learning Scene Flow With Skeleton Guidance For 3D Action Recognition

Among the existing modalities for 3D action recognition, 3D flow has bee...
research
12/11/2018

Learning Discriminative Motion Features Through Detection

Despite huge success in the image domain, modern detection models such a...
research
07/13/2020

Deep Reinforced Attention Learning for Quality-Aware Visual Recognition

In this paper, we build upon the weakly-supervised generation mechanism ...
research
03/30/2023

Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection

Temporal action detection aims to predict the time intervals and the cla...
research
11/18/2021

M2A: Motion Aware Attention for Accurate Video Action Recognition

Advancements in attention mechanisms have led to significant performance...
research
07/18/2019

Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video

Automatic instrument segmentation in video is an essentially fundamental...
research
04/19/2023

MAMAF-Net: Motion-Aware and Multi-Attention Fusion Network for Stroke Diagnosis

Stroke is a major cause of mortality and disability worldwide from which...
