Technical Report: Temporal Aggregate Representations

by Fadime Sener, et al.
National University of Singapore
University of Bonn

This technical report extends our work presented in [9] with additional experiments. In [9], we tackle long-term video understanding, which requires reasoning jointly over current and past or future observations and raises several fundamental questions: How should temporal or sequential relationships be modelled? What temporal extent of information and context needs to be processed? At what temporal scale should these representations be derived? [9] addresses these questions with a flexible multi-granular temporal aggregation framework. In this report, we conduct further experiments with this framework on different tasks and a new dataset, EPIC-KITCHENS-100.
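The core idea of multi-granular temporal aggregation — summarizing per-frame features over several temporal extents and combining the resulting multi-scale summaries — can be sketched as follows. This is an illustrative assumption, not the exact model of [9] (which uses learned, attention-based pooling over recent and spanning snippets); here plain mean pooling stands in for the learned aggregation:

```python
import numpy as np

def multi_granular_aggregate(features, spans=(10, 20, 30)):
    """Sketch of multi-granular temporal aggregation.

    features: (T, D) array of per-frame features.
    spans: temporal extents (in frames) to pool over; the actual
    framework learns attention-based summaries per scale, whereas
    this sketch simply mean-pools for illustration.
    Returns a (len(spans) * D,) multi-scale representation.
    """
    T, D = features.shape
    pooled = []
    for span in spans:
        # pool over the most recent `span` frames (or all, if fewer exist)
        window = features[max(0, T - span):]
        pooled.append(window.mean(axis=0))  # one D-dim summary per scale
    return np.concatenate(pooled)

# toy usage: 50 frames of 8-dimensional features
feats = np.random.rand(50, 8)
rep = multi_granular_aggregate(feats)
print(rep.shape)  # (24,) = 3 scales x 8 dims
```

Concatenating per-scale summaries lets a downstream classifier see both short-range detail and long-range context, which is the flexibility the framework's design questions are aimed at.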



Code Repositories


[ECCV 2020] Temporal Aggregate Representations for Long-Range Video Understanding

