ECO: Efficient Convolutional Network for Online Video Understanding

04/24/2018
by Mohammadreza Zolfaghari, et al.

The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video and therefore misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, processing the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture that takes long-term content into account and at the same time enables fast per-video processing. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion step. Together with a sampling strategy that exploits the redundancy between neighboring frames, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods.
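
A minimal Python sketch of the sampling idea described above: the video is split into a fixed number of temporal segments and one frame is drawn from each, which is cheap because neighboring frames are largely redundant. The function name, default segment count, and training/test-time details below are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the authors' code): segment-based frame sampling.
    # Split the video into N equal temporal segments and take one frame per
    # segment, exploiting the redundancy of neighboring frames.
    import random
    from typing import List

    def sample_frame_indices(num_frames: int, num_segments: int = 16,
                             training: bool = True) -> List[int]:
        """Return num_segments frame indices, one per uniform segment:
        a random frame during training, the segment center at test time."""
        if num_frames < num_segments:
            # Short clip: spread (and reuse) the available frames.
            return [i * num_frames // num_segments for i in range(num_segments)]
        seg_len = num_frames / num_segments
        indices = []
        for s in range(num_segments):
            start, end = int(s * seg_len), int((s + 1) * seg_len)
            if training:
                indices.append(random.randrange(start, max(end, start + 1)))
            else:
                indices.append((start + end) // 2)
        return indices

    # Example: a 400-frame video is reduced to 16 representative frames; their
    # per-frame features are then fused inside the network rather than post hoc.
    print(sample_frame_indices(400, num_segments=16, training=False))

Because the number of sampled frames is fixed regardless of video length, the per-video cost stays roughly constant, which is what makes throughputs of up to 230 videos per second attainable for videos of a few hundred frames.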


