AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings

05/27/2020
by   Pratik Mazumder, et al.

In this paper, we address the problem of generalized zero-shot learning in a multi-modal setting, where novel classes of audio/video that were not seen during training appear at test time. We demonstrate that projecting the audio and video embeddings into the class label text feature space allows us to use the semantic relatedness of text embeddings for zero-shot learning. Importantly, our multi-modal zero-shot learning approach works even if a modality is missing at test time. Our approach uses a cross-modal decoder, which enforces the constraint that the class label text features can be reconstructed from the audio and video embeddings of data points; this constraint improves performance on the multi-modal zero-shot learning task. We further minimize the gap between the audio and video embedding distributions using a KL-divergence loss. We test our approach on zero-shot classification and retrieval tasks, where it outperforms other models both when a single modality is present and when multiple modalities are present.
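
The inference idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the networks, the distance measure, or how the KL-divergence term is computed. The sketch assumes that audio and video embeddings have already been projected into the label text feature space, uses squared Euclidean distance (an assumption), and turns embeddings into distributions with a softmax before computing KL divergence (also an assumption). All names, such as `classify` and `label_feats`, and all numeric values are hypothetical.

```python
import math


def softmax(v):
    """Turn a vector into a discrete distribution (assumed normalization)."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(label_feats, audio_emb=None, video_emb=None):
    """Pick the label whose text feature is nearest to the available embeddings.

    Either modality may be None, mirroring the missing-modality case.
    """
    embs = [e for e in (audio_emb, video_emb) if e is not None]
    if not embs:
        raise ValueError("at least one modality is required")

    def score(name):
        # Average distance over whichever modalities are present.
        return sum(sq_dist(e, label_feats[name]) for e in embs) / len(embs)

    return min(label_feats, key=score)


# Hypothetical 2-D label text features; "cat" stands in for an unseen class.
label_feats = {"dog": [1.0, 0.0], "cat": [0.9, 0.3], "train": [-1.0, 0.2]}
audio = [0.8, 0.35]   # hypothetical projected audio embedding
video = [0.95, 0.25]  # hypothetical projected video embedding

both = classify(label_feats, audio_emb=audio, video_emb=video)
audio_only = classify(label_feats, audio_emb=audio)
video_only = classify(label_feats, video_emb=video)
gap = kl_divergence(softmax(audio), softmax(video))
```

Because unseen classes still have text features, a sample from such a class can be assigned to it by proximity alone, and the same scoring works with one or both modalities. In training, a term like `gap` would be minimized to pull the audio and video embedding distributions together.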

research
03/07/2022

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

Learning to classify video data from classes not included in the trainin...
research
12/08/2021

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

Multi-modal learning from video data has seen increased attention recent...
research
12/05/2018

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

Many approaches in generalized zero-shot learning rely on cross-modal ma...
research
09/01/2022

Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

When creating 3D content, highly specialized skills are generally needed...
research
09/06/2021

Zero-Shot Open Set Detection by Extending CLIP

In a regular open set detection problem, samples of known classes (also ...
research
08/24/2023

Hyperbolic Audio-visual Zero-shot Learning

Audio-visual zero-shot learning aims to classify samples consisting of a...
research
09/20/2019

Retro-Actions: Learning 'Close' by Time-Reversing 'Open' Videos

We investigate video transforms that result in class-homogeneous label-t...
