MSQNet: Actor-agnostic Action Recognition with Multi-modal Query

07/20/2023
by   Anindya Mondal, et al.
0

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50 https://github.com/mondalanindya/MSQNet.

READ FULL TEXT

page 3

page 4

page 7

page 8

research
11/25/2022

Towards Good Practices for Missing Modality Robust Action Recognition

Standard multi-modal models assume the use of the same modalities in tra...
research
01/15/2022

Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify vari...
research
11/12/2015

Hand-Object Interaction and Precise Localization in Transitive Action Recognition

Action recognition in still images has seen major improvement in recent ...
research
09/05/2017

Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks

Automatic analysis of the video is one of most complex problems in the f...
research
05/12/2023

MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

4D human perception plays an essential role in a myriad of applications,...
research
03/28/2023

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

The relation modeling between actors and scene context advances video ac...
research
08/27/2022

Actor-identified Spatiotemporal Action Detection – Detecting Who Is Doing What in Videos

The success of deep learning on video Action Recognition (AR) has motiva...

Please sign up or login with your details

Forgot password? Click here to reset