Cross-modal Hallucination for Few-shot Fine-grained Recognition

06/13/2018
by Frederik Pahde, et al.

State-of-the-art deep learning algorithms generally require large amounts of training data. A lack of data can severely degrade performance, particularly when the boundaries between categories are fine-grained. To address this, we propose a multimodal approach that bridges the information gap by means of meaningful joint embeddings. Specifically, we present a benchmark that is multimodal during training (i.e. images and texts) and single-modal at test time (i.e. images). The associated task is to use multimodal data from base classes (with many samples) to learn explicit visual classifiers for novel classes (with few samples). Next, we propose a framework built upon the idea of cross-modal data hallucination. In this regard, we introduce a discriminative text-conditional GAN for sample generation, together with a simple self-paced strategy for sample selection. We show the results of our proposed discriminative hallucination method for 1-, 2-, and 5-shot learning on the CUB dataset, where employing multimodal data improves accuracy.
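
The abstract does not spell out how the self-paced sample selection works. As a rough illustration only, the sketch below assumes that each generated sample is scored by a classifier and that, for every class, the k samples with the highest score for the class they were conditioned on are kept. The function name `select_confident_samples`, the score layout, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def select_confident_samples(
    scores: Sequence[Sequence[float]],
    target_labels: Sequence[int],
    k: int,
) -> Dict[int, List[int]]:
    """Keep, per class, the k generated samples scored highest for that class.

    scores[i][c]     : assumed classifier score that generated sample i is class c
    target_labels[i] : class whose text description conditioned sample i
    """
    # Group generated sample indices by the class they were conditioned on.
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(target_labels):
        by_class.setdefault(label, []).append(idx)

    # Rank each group by the score for its own target class and keep the top k.
    selected: Dict[int, List[int]] = {}
    for label, indices in by_class.items():
        ranked = sorted(indices, key=lambda i: scores[i][label], reverse=True)
        selected[label] = ranked[:k]
    return selected


# Toy example (made-up scores): four hallucinated samples, two novel classes.
scores = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
    [0.7, 0.3],
]
target_labels = [0, 0, 1, 1]
picked = select_confident_samples(scores, target_labels, k=1)
```

In a few-shot setting, the selected samples would then be added to the handful of real images of each novel class before the classifier is retrained; how often this is repeated and how k is chosen are left open here.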


research
11/22/2018

Self Paced Adversarial Training for Multimodal Few-shot Learning

State-of-the-art deep learning algorithms yield remarkable results in ma...
research
11/17/2020

Multimodal Prototypical Networks for Few-shot Learning

Although providing exceptional results for many computer vision tasks, s...
research
01/16/2023

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

The ability to quickly learn a new task with minimal instruction - known...
research
01/04/2019

Low-Shot Learning from Imaginary 3D Model

Since the advent of deep learning, neural networks have demonstrated rem...
research
07/04/2022

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Zero-shot learning (ZSL) aims to predict unseen classes whose samples ha...
research
05/08/2018

Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data

In this work, travel destination and business location are taken as venu...
research
05/19/2023

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

Existing research on multimodal relation extraction (MRE) faces two co-e...
