MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

07/31/2023
by   Enxin Song, et al.

Recent work integrates video foundation models and large language models to build video understanding systems that overcome the limitations of specific pre-defined vision tasks. Yet existing systems can only handle videos with very few frames. For long videos, computational complexity, memory cost, and long-term temporal connection remain challenges. Inspired by the Atkinson-Shiffrin memory model, we develop a memory mechanism that includes a rapidly updated short-term memory and a compact, and thus sustained, long-term memory. We employ tokens in Transformers as the carriers of memory. MovieChat achieves state-of-the-art performance in long video understanding.
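
As an illustration only, the two-tier memory idea described in the abstract can be sketched in plain Python. The sketch assumes details the abstract does not state: a fixed-capacity short-term buffer that is emptied once full, and consolidation into long-term memory by repeatedly averaging the most similar pair of adjacent tokens. The names `TwoTierMemory`, `consolidate`, `short_capacity`, and `merged_len` are hypothetical and are not taken from MovieChat.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length token vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(tokens, target_len):
    """Shrink a token sequence to target_len entries.

    Assumption (not from the abstract): greedily average the most
    similar pair of temporally adjacent tokens until the target is met.
    """
    tokens = [list(t) for t in tokens]
    while len(tokens) > max(target_len, 1):
        # Index of the adjacent pair with the highest similarity.
        i = max(range(len(tokens) - 1),
                key=lambda k: cosine(tokens[k], tokens[k + 1]))
        merged = [(x + y) / 2 for x, y in zip(tokens[i], tokens[i + 1])]
        tokens[i:i + 2] = [merged]
    return tokens


class TwoTierMemory:
    """Hypothetical short-term / long-term token store."""

    def __init__(self, short_capacity=4, merged_len=2):
        self.short_capacity = short_capacity  # assumed buffer size
        self.merged_len = merged_len          # assumed size after merging
        self.short = []  # rapidly updated: one entry per incoming token
        self.long = []   # compact: grows by merged_len per consolidation

    def add(self, token):
        """Store a token; consolidate when the short-term buffer fills."""
        self.short.append(list(token))
        if len(self.short) >= self.short_capacity:
            self.long.extend(consolidate(self.short, self.merged_len))
            self.short = []  # assumed policy: clear after consolidation


if __name__ == "__main__":
    memory = TwoTierMemory(short_capacity=4, merged_len=2)
    for tok in ([1, 0], [1, 0], [0, 1], [0, 1], [1, 1]):
        memory.add(tok)
    print(memory.long)   # [[1.0, 0.0], [0.0, 1.0]]
    print(memory.short)  # [[1, 1]]
```

Under these assumptions, long-term memory grows by `merged_len` tokens for every `short_capacity` tokens seen, which is one way a long video could be kept within a bounded token budget.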

research
07/14/2022

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

We present XMem, a video object segmentation architecture for long video...
research
01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...
research
03/25/2023

Selective Structured State-Spaces for Long-Form Video Understanding

Effective modeling of complex spatiotemporal dependencies in long-form v...
research
07/06/2023

RecallM: An Architecture for Temporal Context Understanding and Question Answering

The ideal long-term memory mechanism for Large Language Model (LLM) base...
research
03/23/2020

RoboMem: Giving Long Term Memory to Robots

Robots have the potential to improve health monitoring outcomes for the ...
research
02/04/2021

Adaptive Semiparametric Language Models

We present a language model that combines a large parametric neural netw...
research
07/22/2012

A New Training Algorithm for Kanerva's Sparse Distributed Memory

The Sparse Distributed Memory proposed by Pentti Kanerva (SDM in short) ...
