MotionSqueeze: Neural Motion Feature Learning for Video Understanding

07/20/2020
by Heeseung Kwon, et al.

Motion plays a crucial role in understanding videos, and most state-of-the-art neural models for video classification incorporate motion information, typically using optical flows extracted by a separate off-the-shelf method. As frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding. In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features. We propose a trainable neural module, dubbed MotionSqueeze, for effective motion feature extraction. Inserted in the middle of any neural network, it learns to establish correspondences across frames and convert them into motion features, which are readily fed to the next downstream layer for better prediction. We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost, outperforming the state of the art on the Something-Something-V1 and V2 datasets.
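
The idea of establishing correspondences across frames and turning them into a motion signal can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame feature maps stored as nested Python lists, scores candidate matches by a plain dot product inside a small search window, and picks the best match with a hard argmax, whereas a trainable module would need a differentiable alternative. The function name `correlation_displacement` and the parameter `max_disp` are hypothetical and chosen only for illustration.

```python
def correlation_displacement(feat_a, feat_b, max_disp=1):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    feat_a, feat_b: feature maps of two adjacent frames, indexed as
    feat[y][x] -> list of channel values.
    Returns a map of (dy, dx) displacements, one per position of feat_a,
    pointing at the best-matching position of feat_b inside the window.
    """
    h, w = len(feat_a), len(feat_a[0])

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    flow = []
    for y in range(h):
        row = []
        for x in range(w):
            # Start from "no motion" so ties resolve to zero displacement.
            best_score = dot(feat_a[y][x], feat_b[y][x])
            best_disp = (0, 0)
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    ty, tx = y + dy, x + dx
                    if not (0 <= ty < h and 0 <= tx < w):
                        continue  # candidate falls outside the frame
                    score = dot(feat_a[y][x], feat_b[ty][tx])
                    if score > best_score:
                        best_score, best_disp = score, (dy, dx)
            row.append(best_disp)
        flow.append(row)
    return flow


def blank_frame(h, w, channels):
    """Build an all-zero feature map (helper for the toy example)."""
    return [[[0.0] * channels for _ in range(w)] for _ in range(h)]


# Toy example: one distinctive feature moves one step to the right.
frame_a = blank_frame(3, 3, 2)
frame_b = blank_frame(3, 3, 2)
frame_a[1][1] = [1.0, 0.0]
frame_b[1][2] = [1.0, 0.0]

flow = correlation_displacement(frame_a, frame_b)
print(flow[1][1])  # (0, 1): the feature moved by dy=0, dx=+1
```

In a network, the resulting displacement map would be passed through further layers to produce motion features for the downstream layer; that step is omitted here.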


research
10/22/2019

Predictive Coding Networks Meet Action Recognition

Action recognition is a key problem in computer vision that labels video...
research
01/05/2023

EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

Recent advances in egocentric video understanding models are promising, ...
research
01/23/2018

Let's Dance: Learning From Online Dance Videos

In recent years, deep neural network approaches have naturally extended ...
research
11/08/2019

Stacked dense optical flows and dropout layers to predict sperm motility and morphology

In this paper, we analyse two deep learning methods to predict sperm mot...
research
09/23/2018

Learning for Video Super-Resolution through HR Optical Flow Estimation

Video super-resolution (SR) aims to generate a sequence of high-resoluti...
research
03/23/2021

Learning Comprehensive Motion Representation for Action Recognition

For action recognition learning, 2D CNN-based methods are efficient but ...
research
08/06/2023

Learning Fine-Grained Features for Pixel-wise Video Correspondences

Video analysis tasks rely heavily on identifying the pixels from differe...
