Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

08/25/2023
by Matthew Dutson, et al.

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are often applied repeatedly across frames or temporal chunks. In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing. We describe a method for identifying and re-processing only those tokens that have changed significantly over time. Our proposed family of models, Eventful Transformers, can be converted from existing Transformers (often without any re-training) and give adaptive control over the compute cost at runtime. We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100). Our approach leads to significant computational savings (on the order of 2-4x) with only minor reductions in accuracy.
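
The abstract describes a gating mechanism that re-processes only those tokens whose values have changed significantly between frames, reusing cached results for the rest. The snippet below is a minimal sketch of that idea in PyTorch; the TokenGate class, the per-token L2 change measure, and the fixed threshold are illustrative assumptions, not the authors' actual Eventful Transformer implementation.

```python
import torch
import torch.nn as nn


class TokenGate(nn.Module):
    """Sketch of temporal token gating (assumed names and policy):
    cache the last processed tokens, re-run an expensive per-token
    module only on tokens whose change exceeds a threshold, and
    reuse cached outputs for everything else."""

    def __init__(self, block: nn.Module, threshold: float = 0.1):
        super().__init__()
        self.block = block          # any per-token module, e.g. a token-wise MLP
        self.threshold = threshold  # larger threshold -> fewer tokens recomputed
        self.reference = None       # last processed token values, shape (N, D)
        self.cached_out = None      # cached outputs for unchanged tokens

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (N, D) token embeddings for the current frame.
        if self.reference is None:
            # First frame: process everything and initialize the caches.
            self.reference = tokens.detach().clone()
            self.cached_out = self.block(tokens)
            return self.cached_out

        # Per-token L2 change relative to the stored reference tokens.
        delta = (tokens - self.reference).norm(dim=-1)   # shape (N,)
        changed = delta > self.threshold                 # boolean mask

        out = self.cached_out.clone()
        if changed.any():
            # Recompute only the changed tokens and scatter them back.
            out[changed] = self.block(tokens[changed])
            self.reference[changed] = tokens[changed].detach()
        self.cached_out = out
        return out
```

Raising the threshold recomputes fewer tokens per frame, which mirrors the adaptive runtime control over compute cost mentioned in the abstract.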

Related research:

07/01/2021 - VideoLightFormer: Lightweight Action Recognition using Transformers
  Efficient video action recognition remains a challenging problem. One la...

08/25/2023 - Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers
  Visual place recognition tasks often encounter significant challenges in...

07/19/2022 - Time Is MattEr: Temporal Self-supervision for Video Transformers
  Understanding temporal dynamics of video is an essential aspect of learn...

11/18/2021 - Evaluating Transformers for Lightweight Action Recognition
  In video action recognition, transformers consistently reach state-of-th...

03/15/2023 - EgoViT: Pyramid Video Transformer for Egocentric Action Recognition
  Capturing interaction of hands with objects is important to autonomously...

03/17/2023 - Dual-path Adaptation from Image to Video Transformers
  In this paper, we efficiently transfer the surpassing representation pow...

12/02/2021 - Improved Multiscale Vision Transformers for Classification and Detection
  In this paper, we study Multiscale Vision Transformers (MViT) as a unifi...
