Memory Enhanced Global-Local Aggregation for Video Object Detection

03/26/2020
by   Yihong Chen, et al.

How do humans recognize an object in a video? Because the quality of a single frame is often degraded, it can be hard to identify an occluded object using only the information within that frame. We argue that humans rely on two important cues to recognize objects in videos: global semantic information and local localization information. Recently, many methods have adopted self-attention mechanisms to enhance key-frame features with either global semantic information or local localization information. In this paper we introduce the memory enhanced global-local aggregation (MEGA) network, which is among the first attempts to take full account of both global and local information. Furthermore, empowered by a novel and carefully designed Long Range Memory (LRM) module, MEGA enables the key frame to access much more content than any previous method. Enhanced by these two sources of information, our method achieves state-of-the-art performance on the ImageNet VID dataset. Code is available at <https://github.com/Scalsol/mega.pytorch>.
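
The abstract gives no implementation detail, so the following is only a rough sketch of the general idea, not the authors' code. It assumes plain scaled dot-product attention with a residual connection, one possible ordering of the two stages (global first, then local), and a fixed-size first-in-first-out cache standing in for the Long Range Memory. All names (`attend`, `LongRangeMemory`, `enhance_key_frame`) are hypothetical, and features are plain lists of floats rather than tensors.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, supports):
    """Add to each query a softmax-weighted sum of the support vectors.

    Assumes `supports` is non-empty and all vectors share one dimension.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    enhanced = []
    for q in queries:
        # Scaled dot-product similarity between the query and every support.
        scores = [sum(a * b for a, b in zip(q, s)) / scale for s in supports]
        weights = softmax(scores)
        context = [
            sum(w * s[i] for w, s in zip(weights, supports)) for i in range(dim)
        ]
        # Residual connection: keep the original feature, add the context.
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced


class LongRangeMemory:
    """Hypothetical stand-in for the LRM: caches features of recent frames."""

    def __init__(self, max_frames):
        # Oldest frames are dropped automatically once the cache is full.
        self.frames = deque(maxlen=max_frames)

    def update(self, features):
        self.frames.append(features)

    def read(self):
        # Flatten the cached frames into one list of feature vectors.
        return [f for frame in self.frames for f in frame]


def enhance_key_frame(key, local_feats, global_feats, memory):
    """Enhance key-frame features with global, then local plus cached, context."""
    # Global stage (assumed order): attend to frames sampled across the video.
    with_global = attend(key, global_feats)
    # Local stage: attend to neighbouring frames plus whatever is cached.
    enhanced = attend(with_global, local_feats + memory.read())
    # Cache the result so later key frames can reach further back in time.
    memory.update(enhanced)
    return enhanced
```

Under these assumptions, each call to `enhance_key_frame` lets the current key frame draw on features from earlier frames that are no longer in its local window, which is the intuition behind giving the key frame access to more content.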

