Zero-Shot Action Recognition in Videos: A Survey

Zero-shot action recognition has attracted increasing attention in recent years, and many approaches have been proposed for recognizing objects, events, and actions in images and videos. There is a growing demand for methods that can classify instances of classes absent from the training data, especially in the complex task of automatic video understanding, since collecting, annotating, and labeling videos is difficult and laborious. Although many methods are available in the literature, it is difficult to determine which techniques constitute the state of the art. Despite the existence of surveys on zero-shot recognition in still images and on experimental protocols, no work has focused specifically on videos. Hence, in this paper, we survey methods for zero-shot action recognition in videos, covering techniques for visual feature extraction and semantic feature extraction, as well as for learning the mapping between these two feature spaces. We also provide a complete description of datasets, experiments, and protocols, and present open issues and directions for future work that are essential to the development of this computer vision research field.
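The mapping between visual and semantic features mentioned above is often learned as a simple projection from the visual space into a semantic embedding space, where unseen classes are recognized by nearest-neighbor search against their class embeddings. A minimal sketch of this common embedding-based pipeline follows; the feature dimensions, random data, and class prototypes are all placeholders standing in for a real video encoder and real word vectors:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
d_vis, d_sem = 512, 300  # placeholder visual / semantic dimensions

# Training on *seen* classes: fit a linear map W from visual features X
# to the semantic embeddings S of their labels, via ridge regression:
#   minimize ||X W - S||^2 + lam * ||W||^2
X = rng.normal(size=(100, d_vis))   # visual features of seen-class videos
S = rng.normal(size=(100, d_sem))   # semantic embeddings of their labels
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(d_vis), X.T @ S)

# Zero-shot inference: project an unseen video into the semantic space
# and assign the unseen class whose embedding is most similar.
unseen_protos = {
    "surfing": rng.normal(size=d_sem),  # placeholder class embeddings
    "fencing": rng.normal(size=d_sem),
}
x_test = rng.normal(size=d_vis)  # visual features of an unseen-class video
z = x_test @ W
pred = max(unseen_protos, key=lambda c: cosine_sim(z, unseen_protos[c]))
```

Because the class embeddings come from an independent semantic source (e.g., word vectors), the classifier can score classes that contributed no training videos at all, which is the core idea the surveyed methods elaborate on.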


