Action2Vec: A Crossmodal Embedding Approach to Action Learning

01/02/2019
by Meera Hahn, et al.

We describe a novel cross-modal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatio-temporal features derived from video clips. Our approach uses a hierarchical recurrent network to capture the temporal structure of video features. We train our embedding using a joint loss that combines classification accuracy with similarity to Word2Vec semantics. We evaluate Action2Vec by performing zero-shot action recognition and obtain state-of-the-art results on three standard datasets. In addition, we present two novel analogy tests which quantify the extent to which our joint embedding captures distributional semantics. This is the first joint embedding space to combine verbs and action videos, and the first to be thoroughly evaluated with respect to its distributional semantics.
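To make the described architecture concrete, below is a minimal PyTorch sketch of the kind of model the abstract outlines: a two-level (hierarchical) LSTM over per-frame visual features whose output is projected into a 300-dimensional Word2Vec space and trained with a joint loss combining classification cross-entropy with cosine similarity to the class label's word vector. The module names, feature dimensions, segment length, and loss weighting here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative Action2Vec-style crossmodal encoder (not the paper's code).
# Assumes per-frame features (e.g. 1024-d), a two-level LSTM hierarchy,
# projection into a 300-d Word2Vec space, and a joint loss of
# cross-entropy + cosine distance to the verb's Word2Vec vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalActionEncoder(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=512, embed_dim=300, num_classes=101):
        super().__init__()
        self.frame_lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)     # low level: frames within a segment
        self.segment_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True) # high level: sequence of segments
        self.to_embed = nn.Linear(hidden_dim, embed_dim)                      # project into the word-vector space
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, frame_feats, segment_len=16):
        # frame_feats: (batch, num_frames, feat_dim); assumes num_frames >= segment_len
        b, t, d = frame_feats.shape
        t = (t // segment_len) * segment_len
        x = frame_feats[:, :t].reshape(b * (t // segment_len), segment_len, d)
        _, (h, _) = self.frame_lstm(x)                        # encode each segment of frames
        seg = h[-1].reshape(b, t // segment_len, -1)
        _, (h, _) = self.segment_lstm(seg)                    # encode the segment sequence
        emb = self.to_embed(h[-1])                            # (batch, embed_dim) action embedding
        return emb, self.classifier(emb)

def joint_loss(emb, logits, labels, label_word_vecs, alpha=0.5):
    """Cross-entropy on class logits plus cosine distance to the label's Word2Vec vector."""
    ce = F.cross_entropy(logits, labels)
    target = label_word_vecs[labels]                          # (batch, embed_dim) word vectors for the labels
    cos = 1.0 - F.cosine_similarity(emb, target, dim=1).mean()
    return ce + alpha * cos
```

Because the visual embedding lives in the same space as Word2Vec, zero-shot recognition can then be sketched as nearest-neighbor lookup of an unseen class's word vector, and analogy tests as vector arithmetic followed by cosine-similarity ranking.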

