Knowledge Fusion Transformers for Video Action Recognition

09/29/2020
by Ganesh Samarth, et al.

We introduce Knowledge Fusion Transformers for video action classification. We present a self-attention based feature enhancer that fuses action knowledge into the 3D Inception based spatio-temporal context of the video clip to be classified. We show how a single-stream network with little or no pretraining can reach performance close to the current state of the art. Additionally, we show how different self-attention architectures, used at different levels of the network, can be blended in to enhance the feature representation. Our architecture is trained and evaluated on the UCF-101 and Charades datasets, where it is competitive with the state of the art. It also outperforms single-stream networks with little or no pretraining by a large margin.
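
The abstract does not spell out the feature enhancer, so the following is only a minimal sketch of the general idea of self-attention based feature enhancement, not the authors' architecture. It assumes one pooled feature vector per temporal position of a clip, uses the features themselves as queries, keys, and values (no learned projections), and adds the attended context back through a residual connection. The names `self_attention_enhance` and `clip_features` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_enhance(features):
    """Enhance each feature vector with a residual self-attention update.

    features: list of T vectors (each a list of D floats), e.g. one pooled
    descriptor per temporal position of a clip. Queries, keys, and values
    are the raw features (assumption: no learned projection matrices).
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced = []
    for query in features:
        # Scaled dot-product similarity of this position to every position.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in features]
        weights = softmax(scores)
        # Context vector: attention-weighted average of all positions.
        context = [sum(w * value[d] for w, value in zip(weights, features))
                   for d in range(dim)]
        # Residual connection keeps the original feature and adds context.
        enhanced.append([x + c for x, c in zip(query, context)])
    return enhanced


# Toy input: three temporal positions with two-dimensional features.
clip_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced_features = self_attention_enhance(clip_features)
```

In a trained model the queries, keys, and values would normally come from learned linear projections of backbone features; they are omitted here to keep the sketch self-contained.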


research
07/27/2022

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Transformer-based methods have recently achieved great advancement on 2D...
research
08/02/2022

Two-Stream Transformer Architecture for Long Video Understanding

Pure vision transformer architectures are highly effective for short vid...
research
09/03/2023

COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

We present COMEDIAN, a novel pipeline to initialize spatio-temporal tran...
research
09/13/2022

Vision Transformers for Action Recognition: A Survey

Vision transformers are emerging as a powerful tool to solve computer vi...
research
03/23/2022

A Context-Aware Feature Fusion Framework for Punctuation Restoration

To accomplish the punctuation restoration task, most existing approaches...
research
11/02/2021

Relational Self-Attention: What's Missing in Attention for Video Understanding

Convolution has been arguably the most important feature transform for m...
research
02/16/2022

ActionFormer: Localizing Moments of Actions with Transformers

Self-attention based Transformer models have demonstrated impressive res...
