TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

03/17/2023
by Haoran Li, et al.

Video prediction is a complex time-series forecasting task with great potential in many use cases. However, conventional methods overemphasize accuracy and ignore the slow prediction speed caused by complicated model structures, which learn too much redundant information and consume excessive GPU memory. Furthermore, conventional methods mostly predict frames sequentially (frame-by-frame) and are thus hard to accelerate. Consequently, valuable use cases such as real-time danger prediction and warning cannot achieve inference speeds fast enough to be applicable in practice. We therefore propose a transformer-based keypoint prediction neural network (TKN), an unsupervised learning method that speeds up the prediction process via constrained information extraction and a parallel prediction scheme. To the best of our knowledge, TKN is the first real-time video prediction solution, and it significantly reduces computation costs while maintaining performance in other respects. Extensive experiments on the KTH and Human3.6 datasets demonstrate that TKN predicts 11 times faster than existing methods while reducing memory consumption by 17.4% and achieving state-of-the-art prediction performance on average.


