Memory Warps for Learning Long-Term Online Video Representations

03/28/2018
by   Tuan-Hung Vu, et al.
0

This paper proposes a novel memory-based online video representation that is efficient, accurate and predictive. This is in contrast to prior works that often rely on computationally heavy 3D convolutions, ignore actual motion when aligning features over time, or operate in an off-line mode to utilize future frames. In particular, our memory (i) holds the feature representation, (ii) is spatially warped over time to compensate for observer and scene motions, (iii) can carry long-term information, and (iv) enables predicting feature representations in future frames. By exploring a variant that operates at multiple temporal scales, we efficiently learn across even longer time horizons. We apply our online framework to object detection in videos, obtaining a large 2.3 times speed-up and losing only 0.9 dataset, compared to prior works that even use future frames. Finally, we demonstrate the predictive property of our representation in two novel detection setups, where features are propagated over time to (i) significantly enhance a real-time detector by more than 10 setup and to (ii) anticipate objects in future frames.

READ FULL TEXT

page 14

page 25

page 26

research
04/14/2021

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Learning to predict the long-term future of video frames is notoriously ...
research
12/16/2017

Impression Network for Video Object Detection

Video object detection is more challenging compared to image object dete...
research
07/14/2022

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

We present XMem, a video object segmentation architecture for long video...
research
06/13/2017

Long-Term Video Interpolation with Bidirectional Predictive Network

This paper considers the challenging task of long-term video interpolati...
research
03/14/2022

Implicit Motion Handling for Video Camouflaged Object Detection

We propose a new video camouflaged object detection (VCOD) framework tha...
research
04/20/2019

Cubic LSTMs for Video Prediction

Predicting future frames in videos has become a promising direction of r...
research
03/23/2020

RoboMem: Giving Long Term Memory to Robots

Robots have the potential to improve health monitoring outcomes for the ...

Please sign up or login with your details

Forgot password? Click here to reset