DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

07/04/2022
by   Zhuo Chen, et al.

Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training, often using additional semantic information (a.k.a. side information) to bridge the training (seen) classes and the unseen classes. One of the most effective and widely used forms of semantic information for zero-shot image classification is attributes, which are annotations of class-level visual characteristics. However, due to the shortage of fine-grained annotations, as well as attribute imbalance and co-occurrence, current methods often fail to discriminate the subtle visual distinctions between images, which limits their performance. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pretrained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability to disentangle semantic attributes from images, (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance, and (3) propose a multi-task learning policy for considering multi-model objectives. Through extensive experiments on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, we find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
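To make the attribute-level contrastive idea concrete, here is a minimal, illustrative sketch of an InfoNCE-style loss that pulls each image embedding toward its ground-truth attribute embedding and pushes it away from the others. The abstract does not specify DUET's exact loss, so the function name, shapes, and temperature value below are assumptions, not the paper's implementation.

```python
import numpy as np

def attribute_contrastive_loss(img_feats, attr_feats, labels, temperature=0.1):
    """Illustrative InfoNCE-style attribute-level contrastive loss (not DUET's
    exact formulation). img_feats: (N, D) image-side embeddings;
    attr_feats: (K, D) attribute-side embeddings; labels: (N,) index of the
    positive attribute for each image."""
    # L2-normalize both sides so the dot product is cosine similarity
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    attr = attr_feats / np.linalg.norm(attr_feats, axis=1, keepdims=True)
    logits = img @ attr.T / temperature          # (N, K) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the positive attribute for each image
    return -log_prob[np.arange(len(labels)), labels].mean()
```

In this toy setup, matching an image to its own attribute yields a lower loss than matching it to a shifted (wrong) attribute, which is the discrimination pressure the paper leverages against attribute co-occurrence.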


Related research

- Transductive Zero-Shot Learning using Cross-Modal CycleGAN (11/13/2020): In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen ...
- Zero-Shot Learning Through Cross-Modal Transfer (01/16/2013): This work introduces a model that can recognize objects in images even i...
- Information Symmetry Matters: A Modal-Alternating Propagation Network for Few-Shot Learning (09/03/2021): Semantic information provides intra-class consistency and inter-class di...
- Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification (12/24/2021): Learning a common latent embedding by aligning the latent spaces of cros...
- Attribute-Modulated Generative Meta Learning for Zero-Shot Classification (04/22/2021): Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to...
- Cross-modal Hallucination for Few-shot Fine-grained Recognition (06/13/2018): State-of-the-art deep learning algorithms generally require large amount...
- Long-tail learning with attributes (04/05/2020): Learning to classify images with unbalanced class distributions is chall...
