Compressed Vision for Efficient Video Understanding

10/06/2022
by   Olivia Wiles, et al.
14

Experience and reasoning occur across multiple temporal scales: milliseconds, seconds, hours or days. The vast majority of computer vision research, however, still focuses on individual images or short videos lasting only a few seconds. This is because handling longer videos require more scalable approaches even to process them. In this work, we propose a framework enabling research on hour-long videos with the same hardware that can now process second-long videos. We replace standard video compression, e.g. JPEG, with neural compression and show that we can directly feed compressed videos as inputs to regular video networks. Operating on compressed videos improves efficiency at all pipeline levels – data transfer, speed and memory – making it possible to train models faster and on much longer videos. Processing compressed signals has, however, the downside of precluding standard augmentation techniques if done naively. We address that by introducing a small network that can apply transformations to latent codes corresponding to commonly used augmentations in the original video space. We demonstrate that with our compressed vision pipeline, we can train video models more efficiently on popular benchmarks such as Kinetics600 and COIN. We also perform proof-of-concept experiments with new tasks defined over hour-long videos at standard frame rates. Processing such long videos is impossible without using compressed representation.

READ FULL TEXT

page 8

page 9

page 13

page 20

page 21

page 22

page 23

research
12/14/2017

Learning Binary Residual Representations for Domain-specific Video Streaming

We study domain-specific video streaming. Specifically, we target a stre...
research
02/06/2022

Perceptual Coding for Compressed Video Understanding: A New Framework and Benchmark

Most video understanding methods are learned on high-quality videos. How...
research
03/24/2023

Towards Scalable Neural Representation for Diverse Videos

Implicit neural representations (INR) have gained increasing attention i...
research
01/02/2021

Video Captioning in Compressed Video

Existing approaches in video captioning concentrate on exploring global ...
research
10/15/2022

A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution

Online processing of compressed videos to increase their resolutions att...
research
09/03/2020

Mononizing Binocular Videos

This paper presents the idea ofmono-nizingbinocular videos and a frame-w...
research
02/13/2023

Anti-Compression Contrastive Facial Forgery Detection

Forgery facial images and videos have increased the concern of digital s...

Please sign up or login with your details

Forgot password? Click here to reset