Transductive Zero-Shot Learning using Cross-Modal CycleGAN

11/13/2020
by Patrick Bordes, et al.

In Computer Vision, Zero-Shot Learning (ZSL) aims to classify unseen classes, that is, classes for which no matching training image exists. Most ZSL approaches learn a cross-modal mapping between images and class labels for seen classes. However, the data distributions of seen and unseen classes may differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes that the unseen classes and their associated images are known during training, but not the correspondence between them. As current T-ZSL approaches do not scale efficiently when the number of seen classes is high, we tackle this problem with a new model for T-ZSL based on CycleGAN. Our model jointly (i) projects images onto their seen class labels with a supervised objective and (ii) aligns unseen class labels and visual exemplars with adversarial and cycle-consistency objectives. We show the efficiency of our Cross-Modal CycleGAN model (CM-GAN) on the ImageNet T-ZSL task, where we obtain state-of-the-art results. We further validate CM-GAN on a language grounding task and on a new task that we propose: zero-shot sentence-to-image matching on MS COCO.
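The joint objective described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify architectures or loss forms, so the linear mappings `G` (visual to textual) and `F` (textual to visual), the linear discriminators `d_txt` and `d_img`, the squared-error supervised term, the L1 cycle term, and the weights `lam_adv` and `lam_cyc` are all assumptions chosen for brevity. Only the generator-side value of the adversarial term is computed; discriminator updates are omitted.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as a toy discriminator output."""
    return 1.0 / (1.0 + np.exp(-z))


def joint_loss(x_seen, t_seen, x_unseen, t_unseen, G, F, d_txt, d_img,
               lam_adv=1.0, lam_cyc=1.0):
    """Hypothetical combined objective (all loss forms are assumptions).

    x_seen   : (n, dv) visual features of seen-class images
    t_seen   : (n, dt) label embeddings paired with x_seen
    x_unseen : (m, dv) visual features of unseen-class images (unpaired)
    t_unseen : (k, dt) label embeddings of unseen classes (unpaired)
    G        : (dv, dt) linear map, visual -> textual space
    F        : (dt, dv) linear map, textual -> visual space
    d_txt    : (dt,) weights of a linear discriminator on textual space
    d_img    : (dv,) weights of a linear discriminator on visual space
    """
    eps = 1e-12  # avoids log(0)

    # (i) Supervised term: project seen images onto their class labels.
    sup = np.mean(np.sum((x_seen @ G - t_seen) ** 2, axis=1))

    # (ii-a) Adversarial term (generator side): mapped samples should be
    # scored as "real" by the discriminator of the target modality.
    fake_txt = x_unseen @ G
    fake_img = t_unseen @ F
    adv = (-np.mean(np.log(sigmoid(fake_txt @ d_txt) + eps))
           - np.mean(np.log(sigmoid(fake_img @ d_img) + eps)))

    # (ii-b) Cycle-consistency term: mapping to the other modality and
    # back should recover the input.
    cyc = (np.mean(np.abs(fake_txt @ F - x_unseen))
           + np.mean(np.abs(fake_img @ G - t_unseen)))

    total = sup + lam_adv * adv + lam_cyc * cyc
    return total, {"sup": sup, "adv": adv, "cyc": cyc}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dv, dt = 8, 5  # toy feature sizes, not taken from the paper
    total, parts = joint_loss(
        rng.normal(size=(16, dv)), rng.normal(size=(16, dt)),
        rng.normal(size=(12, dv)), rng.normal(size=(4, dt)),
        0.1 * rng.normal(size=(dv, dt)), 0.1 * rng.normal(size=(dt, dv)),
        rng.normal(size=dt), rng.normal(size=dv),
    )
    print(total, parts)
```

Note that the supervised term uses only paired seen-class data, while the adversarial and cycle terms use only unpaired unseen-class images and labels, which mirrors the transductive setting where the unseen correspondence is unknown.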
