Memory Warps for Learning Long-Term Online Video Representations

03/28/2018
by   Tuan-Hung Vu, et al.
0

This paper proposes a novel memory-based online video representation that is efficient, accurate and predictive. This is in contrast to prior works that often rely on computationally heavy 3D convolutions, ignore actual motion when aligning features over time, or operate in an off-line mode to utilize future frames. In particular, our memory (i) holds the feature representation, (ii) is spatially warped over time to compensate for observer and scene motions, (iii) can carry long-term information, and (iv) enables predicting feature representations in future frames. By exploring a variant that operates at multiple temporal scales, we efficiently learn across even longer time horizons. We apply our online framework to object detection in videos, obtaining a large 2.3 times speed-up and losing only 0.9 dataset, compared to prior works that even use future frames. Finally, we demonstrate the predictive property of our representation in two novel detection setups, where features are propagated over time to (i) significantly enhance a real-time detector by more than 10 setup and to (ii) anticipate objects in future frames.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 14

page 25

page 26

04/14/2021

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Learning to predict the long-term future of video frames is notoriously ...
12/16/2017

Impression Network for Video Object Detection

Video object detection is more challenging compared to image object dete...
06/13/2017

Long-Term Video Interpolation with Bidirectional Predictive Network

This paper considers the challenging task of long-term video interpolati...
12/19/2016

Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos

In this paper, we propose a novel approach for exploiting structural rel...
12/18/2017

Spatial-Temporal Memory Networks for Video Object Detection

We introduce Spatial-Temporal Memory Networks (STMN) for video object de...
01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...
03/23/2020

RoboMem: Giving Long Term Memory to Robots

Robots have the potential to improve health monitoring outcomes for the ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.