Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition

01/27/2021
by Pranay Gupta, et al.

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between the action sequence embedding and the embeddings of Parts of Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., to recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets, with multiple splits for each. Our results demonstrate SynSE's state-of-the-art performance in both the ZSL and GZSL settings, outperforming strong baselines on both datasets. The code and pretrained models are available at https://github.com/skelemoa/synse-zsl.
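The abstract names two mechanisms concrete enough to sketch: splitting an action description into PoS-tagged groups so that verbs and nouns can receive separate language embeddings, and a confidence-based gate that routes a test sample either to a seen-class classifier or to the zero-shot classifier in the GZSL setting. The sketch below is illustrative only and not the authors' implementation; the `pos_groups` and `gzsl_gate` helpers and the 0.5 threshold are hypothetical stand-ins.

```python
import nltk
import numpy as np

# One-time setup (uncomment on first run):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

def pos_groups(description: str) -> dict:
    """Group the words of an action description by coarse PoS tag,
    so each group can later be embedded by its own language encoder."""
    tagged = nltk.pos_tag(nltk.word_tokenize(description))
    groups = {"verb": [], "noun": []}
    for word, tag in tagged:
        if tag.startswith("VB"):
            groups["verb"].append(word)
        elif tag.startswith("NN"):
            groups["noun"].append(word)
    return groups

def gzsl_gate(seen_probs: np.ndarray, zsl_probs: np.ndarray,
              threshold: float = 0.5) -> int:
    """Confidence-based gating for GZSL: trust the seen-class classifier
    only when its top probability clears a threshold; otherwise fall back
    to the zero-shot classifier over unseen classes. The threshold and the
    gate's exact form are assumptions, not values from the paper."""
    if seen_probs.max() >= threshold:
        return int(seen_probs.argmax())               # index into seen classes
    return len(seen_probs) + int(zsl_probs.argmax())  # offset into unseen classes

print(pos_groups("a person throwing a ball"))
seen = np.array([0.20, 0.15, 0.10, 0.55])  # hypothetical seen-class posteriors
unseen = np.array([0.70, 0.30])            # hypothetical unseen-class scores
print(gzsl_gate(seen, unseen))             # confident, so predicts seen class 3
```

Splitting the description by PoS is what enables the compositional generalization claimed above: a novel description such as "throw hat" can still be matched through its verb and noun embeddings even if that exact phrase never appeared during training.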


Related Research

08/07/2023
Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization
Zero-shot skeleton-based action recognition aims to recognize actions of...

03/27/2023
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
This study investigates unsupervised anomaly action recognition, which i...

02/23/2022
ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers
Automatically understanding human behaviour allows household robots to i...

11/26/2019
Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space
How does one represent an action? How does one describe an action that w...

10/16/2018
Cross-Modal and Hierarchical Modeling of Video and Text
Visual data and text data are composed of information at multiple granul...

11/28/2022
Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
We introduce Action-GPT, a plug-and-play framework for incorporating Lar...

07/17/2022
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Existing temporal action detection (TAD) methods rely on large training ...
