Technical Report: Temporal Aggregate Representations

06/06/2021
by Fadime Sener, et al.

This technical report extends our work presented in [9] with additional experiments. In [9], we tackle long-term video understanding, which requires reasoning from current and past or future observations and raises several fundamental questions: How should temporal or sequential relationships be modelled? What temporal extent of information and context needs to be processed? At what temporal scale should these representations be derived? [9] addresses these questions with a flexible multi-granular temporal aggregation framework. In this report, we conduct further experiments with this framework on different tasks and a new dataset, EPIC-KITCHENS-100.
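To make the idea of multi-granular temporal aggregation concrete, the sketch below pools per-frame features over past windows of several lengths and concatenates the summaries into one representation. This is only an illustration of the general principle, not the architecture from [9]; the feature dimensionality, window scales, pooling operation, and function names are assumptions introduced here.

```python
# Minimal sketch of multi-granular temporal aggregation (illustrative only):
# per-frame features from the observed video are max-pooled over "spanning"
# suffixes of several temporal scales plus a short "recent" window, and the
# pooled summaries are concatenated. All constants below are assumptions.

import numpy as np

def aggregate_temporal(features: np.ndarray,
                       spanning_scales=(0.25, 0.5, 1.0),
                       recent_frames: int = 8) -> np.ndarray:
    """features: (T, D) array of per-frame features ordered in time."""
    T, _ = features.shape
    pooled = []

    # Spanning past: pool over progressively longer suffixes of the observation.
    for scale in spanning_scales:
        start = max(0, T - int(round(scale * T)))
        pooled.append(features[start:].max(axis=0))

    # Recent past: pool over the last few frames only.
    pooled.append(features[-min(recent_frames, T):].max(axis=0))

    # Concatenate the multi-scale summaries into one aggregate representation.
    return np.concatenate(pooled, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_features = rng.standard_normal((120, 256))  # 120 frames, 256-d features
    print(aggregate_temporal(frame_features).shape)   # (1024,) = 4 scales x 256
```

In the actual framework, such pooled summaries of recent and spanning past would feed a learned fusion module before prediction; the max-pooling used here simply stands in for that aggregation step.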
