Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022

07/22/2022
by María Escobar, et al.

We implemented the Video Swin Transformer as the base architecture for the Point-of-No-Return (PNR) temporal localization and Object State Change Classification (OSCC) tasks. Our method achieved competitive performance on both challenges.
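To make the setup concrete, here is a minimal sketch, not the authors' released code, of how a Video Swin backbone can feed the two challenge tasks: a per-frame head for PNR keyframe localization and a clip-level head for OSCC. The class name, feature dimension, and pooling scheme are illustrative assumptions; in practice the backbone would be a pretrained Video Swin Transformer returning spatio-temporal features.

```python
# Hypothetical sketch: Video Swin backbone with a PNR head and an OSCC head.
# Shapes and names are assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn


class HandsObjectsModel(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 768, num_frames: int = 16):
        super().__init__()
        # `backbone` is assumed to return features of shape (B, C, T', H', W'),
        # e.g. a Video Swin Transformer without its classification head.
        self.backbone = backbone
        self.pool = nn.AdaptiveAvgPool3d((num_frames, 1, 1))
        # PNR head: one score per frame, marking the point-of-no-return keyframe.
        self.pnr_head = nn.Linear(feat_dim, 1)
        # OSCC head: binary clip-level state-change classification.
        self.oscc_head = nn.Linear(feat_dim, 2)

    def forward(self, clip: torch.Tensor):
        # clip: (B, 3, T, H, W)
        feats = self.backbone(clip)                      # (B, C, T', H', W')
        feats = self.pool(feats)                         # (B, C, T, 1, 1)
        feats = feats.flatten(2).transpose(1, 2)         # (B, T, C)
        pnr_logits = self.pnr_head(feats).squeeze(-1)    # (B, T) frame scores
        oscc_logits = self.oscc_head(feats.mean(dim=1))  # (B, 2) clip label
        return pnr_logits, oscc_logits


if __name__ == "__main__":
    # Stand-in backbone so the sketch runs end to end; replace with Video Swin.
    dummy_backbone = nn.Conv3d(3, 768, kernel_size=1)
    model = HandsObjectsModel(dummy_backbone)
    pnr, oscc = model(torch.randn(2, 3, 16, 224, 224))
    print(pnr.shape, oscc.shape)  # torch.Size([2, 16]) torch.Size([2, 2])
```

Under this framing, PNR localization is trained with a per-frame objective (e.g. cross-entropy over frames against the annotated keyframe) and OSCC with a standard clip-level classification loss.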


Related research

08/02/2022 · Two-Stream Transformer Architecture for Long Video Understanding
Pure vision transformer architectures are highly effective for short vid...

11/16/2022 · Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022
Capturing the state changes of interacting objects is a key technology f...

10/10/2022 · Turbo Training with Token Dropout
The objective of this paper is an efficient training method for video ta...

10/15/2019 · Tiny Video Networks
Video understanding is a challenging problem with great impact on the ab...

02/03/2023 · Egocentric Video Task Translation @ Ego4D Challenge 2022
This technical report describes the EgoTask Translation approach that ex...

06/15/2022 · Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022
This technical report describes the SViT approach for the Ego4D Point of...

04/01/2023 · SVT: Supertoken Video Transformer for Efficient Video Understanding
Whether by processing videos with fixed resolution from start to end or ...
