An Image Classifier Can Suffice For Video Understanding

06/26/2021
by   Quanfu Fan, et al.

We propose a new perspective on video understanding by casting the video recognition problem as an image recognition task. We show that an image classifier alone can suffice for video understanding without temporal modeling. Our approach is simple and universal: it composes input frames into a super image and trains an image classifier on it for action recognition, in exactly the same way as classifying an image. We demonstrate the viability of this idea with strong and promising performance on four public datasets, Kinetics400, Something-Something (V2), MiT and Jester, using a recently developed vision transformer. We also experiment with the prevalent ResNet image classifiers in computer vision to further validate our idea. The results on Kinetics400 are comparable to some of the best-performing CNN approaches based on spatio-temporal modeling. Our code and models will be made available at https://github.com/IBM/sifar-pytorch.
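
To make the idea of a super image concrete, here is a minimal sketch of how a clip's frames might be tiled into a single image before being passed to an ordinary image classifier. The abstract does not specify the layout, so the near-square grid, the zero-padding of unused cells, and the function name `make_super_image` are all assumptions for illustration, not the authors' implementation.

```python
import math

import numpy as np


def make_super_image(frames: np.ndarray) -> np.ndarray:
    """Tile frames of shape (T, H, W, C) into one (rows*H, cols*W, C) image.

    Hypothetical helper: the near-square grid and zero padding are
    assumptions, not details taken from the abstract.
    """
    t, h, w, c = frames.shape
    cols = math.ceil(math.sqrt(t))
    rows = math.ceil(t / cols)

    # Pad with blank frames so that every grid cell is filled.
    pad = rows * cols - t
    if pad:
        blank = np.zeros((pad, h, w, c), dtype=frames.dtype)
        frames = np.concatenate([frames, blank], axis=0)

    # Frames are placed in row-major order: frame k lands in
    # grid cell (k // cols, k % cols).
    grid = frames.reshape(rows, cols, h, w, c)
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)


# Example: 8 RGB frames of 224x224 become one 3x3 grid with one blank cell.
video = np.random.rand(8, 224, 224, 3).astype(np.float32)
super_image = make_super_image(video)
print(super_image.shape)  # (672, 672, 3)
```

The resulting array can be treated like any other image: resized, normalized, and fed to a vision transformer or ResNet classifier with no video-specific layers.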


