Semantic Embedding Space for Zero-Shot Action Recognition

02/05/2015
by Xun Xu, et al.

The number of categories for action recognition is growing rapidly. It is thus becoming increasingly hard to collect sufficient training data to learn conventional models for each category. This issue may be ameliorated by the increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a mapping is constructed between visual features and a human-interpretable semantic description of each category, allowing categories to be recognised in the absence of any training data. Existing ZSL studies focus primarily on image data and attribute-based semantic representations. In this paper, we address zero-shot recognition in contemporary video action recognition tasks, using a semantic word vector space as the common space in which to embed videos and category labels. This is more challenging because the mapping between the semantic space and the space-time features of videos containing complex actions is harder to learn. We demonstrate that a simple self-training and data augmentation strategy can significantly improve the efficacy of this mapping. Experiments on the human action datasets HMDB51 and UCF101 demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance.
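To make the embedding-and-self-training idea concrete, here is a minimal sketch of the general recipe the abstract describes: regress visual features into a word-vector space, classify test videos by nearest unseen-class word vector, and then refine the unseen-class prototypes from the test predictions themselves. Everything below is an illustrative assumption rather than the authors' actual pipeline; in particular, the random arrays stand in for real video descriptors (e.g., dense-trajectory features) and word2vec label embeddings, and the dimensions and iteration count are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

D_VIS, D_SEM = 128, 50              # visual / semantic dimensions (assumed)
n_train, n_test = 200, 60

# Stand-ins for real data: video features and per-class word vectors.
X_train = rng.normal(size=(n_train, D_VIS))
seen_vecs = rng.normal(size=(5, D_SEM))    # word vectors of 5 seen classes
unseen_vecs = rng.normal(size=(3, D_SEM))  # word vectors of 3 unseen classes
y_train = rng.integers(0, 5, size=n_train)
X_test = rng.normal(size=(n_test, D_VIS))

# 1. Learn the visual -> semantic mapping on seen classes only.
reg = Ridge(alpha=1.0)
reg.fit(X_train, seen_vecs[y_train])

# 2. Project test videos into the word-vector space.
Z_test = reg.predict(X_test)

def nn_classify(Z, prototypes):
    """Nearest-neighbour label assignment by cosine similarity."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (Zn @ Pn.T).argmax(axis=1)

# 3. Self-training: move each unseen-class prototype toward the mean of
#    the projected test points currently assigned to it, adapting the
#    prototypes to the shift between regression outputs and word vectors.
prototypes = unseen_vecs.copy()
for _ in range(5):
    labels = nn_classify(Z_test, prototypes)
    for c in range(len(prototypes)):
        members = Z_test[labels == c]
        if len(members):
            prototypes[c] = members.mean(axis=0)

pred = nn_classify(Z_test, prototypes)
print("predicted unseen-class labels:", pred[:10])
```

The self-training loop is the key design choice: because the regressor is trained only on seen classes, its projections of unseen-class videos land in a shifted region of the semantic space, and re-estimating prototypes from the test projections compensates for that shift.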


Related research

Transductive Zero-Shot Action Recognition by Word-Vector Embedding (11/13/2015)
The number of categories for action recognition is growing rapidly and i...

Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation (11/26/2016)
Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassi...

Alternative Semantic Representations for Zero-Shot Human Action Recognition (06/28/2017)
A proper semantic representation for encoding side information is key to...

Zero-Shot Action Recognition in Videos: A Survey (09/13/2019)
Zero-Shot Action Recognition has attracted attention in the last years, ...

Action2Vec: A Crossmodal Embedding Approach to Action Learning (01/02/2019)
We describe a novel cross-modal embedding space for actions, named Actio...

Tell me what you see: A zero-shot action recognition method based on natural language descriptions (12/18/2021)
Recently, several approaches have explored the detection and classificat...

Learning a Pose Lexicon for Semantic Action Recognition (04/01/2016)
This paper presents a novel method for learning a pose lexicon comprisin...
