Memory Warps for Learning Long-Term Online Video Representations

by   Tuan-Hung Vu, et al.

This paper proposes a novel memory-based online video representation that is efficient, accurate and predictive. This is in contrast to prior works that often rely on computationally heavy 3D convolutions, ignore actual motion when aligning features over time, or operate in an off-line mode to utilize future frames. In particular, our memory (i) holds the feature representation, (ii) is spatially warped over time to compensate for observer and scene motions, (iii) can carry long-term information, and (iv) enables predicting feature representations in future frames. By exploring a variant that operates at multiple temporal scales, we efficiently learn across even longer time horizons. We apply our online framework to object detection in videos, obtaining a large 2.3 times speed-up and losing only 0.9 dataset, compared to prior works that even use future frames. Finally, we demonstrate the predictive property of our representation in two novel detection setups, where features are propagated over time to (i) significantly enhance a real-time detector by more than 10 setup and to (ii) anticipate objects in future frames.



There are no comments yet.


page 14

page 25

page 26


Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Learning to predict the long-term future of video frames is notoriously ...

Impression Network for Video Object Detection

Video object detection is more challenging compared to image object dete...

Long-Term Video Interpolation with Bidirectional Predictive Network

This paper considers the challenging task of long-term video interpolati...

Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos

In this paper, we propose a novel approach for exploiting structural rel...

Spatial-Temporal Memory Networks for Video Object Detection

We introduce Spatial-Temporal Memory Networks (STMN) for video object de...

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...

RoboMem: Giving Long Term Memory to Robots

Robots have the potential to improve health monitoring outcomes for the ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.