Human Action Recognition in Still Images Using ConViT

07/18/2023
by Seyed Rohollah Hosseyni, et al.

Understanding the relationships between different parts of an image is crucial in many visual recognition tasks. Although Convolutional Neural Networks (CNNs) have demonstrated impressive results in detecting single objects, they lack the capability to extract the relationships between different regions of an image, which is a crucial factor in human action recognition. To address this problem, this paper proposes a new module that functions like a convolutional layer and is built on a Vision Transformer (ViT). The proposed action recognition model comprises two components: a deep convolutional network that extracts high-level spatial features from the image, and a Vision Transformer that extracts the relationships between regions of the image from the feature map produced by the CNN. The proposed model has been evaluated on the Stanford40 and PASCAL VOC 2012 action datasets, achieving 95.5 mAP and 91.5 mAP, respectively, which is promising compared with other state-of-the-art methods.
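To make the two-stage idea concrete, the following is a minimal sketch, not the authors' implementation, of how a CNN feature map can be turned into one token per spatial location and how self-attention then relates those regions to each other. The function names (`feature_map_to_tokens`, `self_attention`) and the toy values are hypothetical. The sketch assumes a single attention head with identity projections, whereas a real ViT block would add learned query/key/value projections, multiple heads, and further layers.

```python
import math


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map (nested lists) into H*W tokens of length C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), purely to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights = []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy 2-channel, 2x2 "CNN feature map"; the values are made up for illustration.
toy_fmap = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1
]
tokens = feature_map_to_tokens(toy_fmap)  # 4 region tokens, each of length 2
mixed, attn = self_attention(tokens)      # attn[i][j]: how much region i attends to region j
```

Each row of `attn` is a distribution over all regions, so every output token mixes information from the whole image rather than from a local neighbourhood only, which is the property the abstract says plain convolutions lack.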


research
03/21/2018

Exploiting deep residual networks for human action recognition from skeletal data

The computer vision community is currently focusing on solving action re...
research
07/18/2018

Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks

We propose a novel skeleton-based representation for 3D action recogniti...
research
07/09/2016

Action Recognition with Joint Attention on Multi-Level Deep Features

We propose a novel deep supervised neural network for the task of action...
research
11/08/2016

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Recently, Convolutional Neural Networks (ConvNets) have shown promising ...
research
01/31/2023

Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN)

Recently, there has been a remarkable increase in the interest towards s...
research
06/06/2021

Transformed ROIs for Capturing Visual Transformations in Videos

Modeling the visual changes that an action brings to a scene is critical...
research
08/26/2020

Visual Concept Reasoning Networks

A split-transform-merge strategy has been broadly used as an architectur...
