
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022

07/22/2022
by María Escobar, et al.

We implemented Video Swin Transformer as the base architecture for two tasks: Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.


08/02/2022

Two-Stream Transformer Architecture for Long Video Understanding

Pure vision transformer architectures are highly effective for short vid...
11/16/2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022

Capturing the state changes of interacting objects is a key technology f...
10/10/2022

Turbo Training with Token Dropout

The objective of this paper is an efficient training method for video ta...
10/15/2019

Tiny Video Networks

Video understanding is a challenging problem with great impact on the ab...
06/15/2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

This technical report describes the SViT approach for the Ego4D Point of...
07/19/2022

Time Is MattEr: Temporal Self-supervision for Video Transformers

Understanding temporal dynamics of video is an essential aspect of learn...
09/08/2022

Video Vision Transformers for Violence Detection

Law enforcement and city safety are significantly impacted by detecting ...