Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions

07/28/2017
by Pascal Mettes, et al.

We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. The relevance of objects is determined in a word embedding space and further enforced with estimated spatial preferences. Besides local object awareness, we also incorporate global object awareness into our embedding to maximize actor and object interaction. Finally, we exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries. Action localization and classification experiments on four contemporary action video datasets support our proposal. Apart from state-of-the-art results in the zero-shot localization and classification settings, our spatial-aware embedding is even competitive with recent supervised action localization alternatives.
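To make the core idea concrete, here is a minimal Python sketch of how an actor box might be scored for an unseen action: semantic relevance of each detected object comes from word-embedding similarity to the action name, and is modulated by detector confidence and a spatial preference for objects near the actor. All names (score_actor_for_action, spatial_weight) and the exponential distance weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors (e.g., word2vec).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def spatial_weight(actor_box, object_box):
    # Toy spatial preference: objects whose centers lie close to the
    # actor's center, relative to the actor's size, get higher weight.
    ax, ay = (actor_box[0] + actor_box[2]) / 2, (actor_box[1] + actor_box[3]) / 2
    ox, oy = (object_box[0] + object_box[2]) / 2, (object_box[1] + object_box[3]) / 2
    scale = max(actor_box[2] - actor_box[0], actor_box[3] - actor_box[1])
    return float(np.exp(-np.hypot(ax - ox, ay - oy) / (scale + 1e-8)))

def score_actor_for_action(action_vec, actor_box, object_dets, word_vecs):
    # object_dets: list of (class_name, box, detector_confidence) tuples,
    # with boxes as (x1, y1, x2, y2); word_vecs maps names to vectors.
    # An actor scores high when a semantically relevant object is
    # detected confidently in its preferred spatial location.
    best = 0.0
    for cls, box, conf in object_dets:
        relevance = cosine(action_vec, word_vecs[cls])  # object-action semantic relevance
        best = max(best, conf * relevance * spatial_weight(actor_box, box))
    return best
```

Per-frame actor scores of this form could then be linked over time into spatio-temporal action tubes; the paper's full embedding additionally folds in the global object-awareness term mentioned in the abstract.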


Related research

04/10/2021 · Object Priors for Classifying and Localizing Unseen Actions
This work strives for the classification and localization of human actions...

07/17/2022 · Zero-Shot Temporal Action Detection via Vision-Language Prompting
Existing temporal action detection (TAD) methods rely on large training...

01/02/2019 · Action2Vec: A Crossmodal Embedding Approach to Action Learning
We describe a novel cross-modal embedding space for actions, named Action2Vec...

09/24/2022 · Global Semantic Descriptors for Zero-Shot Action Recognition
The success of Zero-shot Action Recognition (ZSAR) methods is intrinsically...

06/13/2019 · Semantics to Space (S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing
We present a novel deep zero-shot learning (ZSL) model for inferencing h...

08/15/2023 · Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
We introduce an object-aware decoder for improving the performance of sp...

11/15/2022 · A Low-Shot Object Counting Network With Iterative Prototype Adaptation
We consider low-shot counting of arbitrary semantic categories in the image...
