Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)

07/31/2018
by Jatin Garg, et al.

Learning from different modalities is a challenging task. In this paper, we address cross-modal face verification and recognition between the caricature and visual image modalities. Caricatures exaggerate the facial features of a person. Because caricatures vary so widely, building vision models that recognize and verify data from this modality is extremely difficult. Visual images, which contain far fewer distortions, can act as a bridge for analyzing the caricature modality. We introduce CaVI, a large, publicly available Caricature-VIsual dataset with images from both modalities that captures the rich variations in the caricatures of an identity. This paper presents the first cross-modal architecture that handles the extreme distortions of caricatures using a deep learning network that learns similar representations across the modalities. We use two convolutional networks along with transformations that are subjected to orthogonality constraints to capture the shared and modality-specific representations. In contrast to prior research, our approach depends neither on manually extracted facial landmarks for learning the representations nor on the identities of the persons for performing verification. The learned shared representation achieves 91% accuracy for verifying unseen images and 75% [...]. Further, recognizing the identity in the image by knowledge transfer, using a combination of shared and modality-specific representations, resulted in an unprecedented performance of 85% [...] accuracy for visual images.
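The abstract does not state the exact form of the orthogonality constraint, so the following is only a minimal sketch of one common way such a constraint can be expressed: as a penalty equal to the squared Frobenius norm of the product between a shared transformation and a modality-specific one. The function names, the matrix layout (lists of rows, one column per output dimension), and the choice of penalty are all assumptions made for illustration, not the paper's implementation.

```python
def matmul_transpose_left(a, b):
    """Return a^T b for matrices given as lists of rows with equal row counts."""
    rows = len(a)
    if rows != len(b):
        raise ValueError("a and b must have the same number of rows")
    cols_a, cols_b = len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(rows)) for j in range(cols_b)]
        for i in range(cols_a)
    ]


def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T specific (assumed penalty form).

    The value is zero exactly when every column of `shared` is orthogonal
    to every column of `specific`, and grows as the two transformations
    overlap.
    """
    product = matmul_transpose_left(shared, specific)
    return sum(v * v for row in product for v in row)


# Hypothetical 3x1 transformations acting on 3-dimensional features.
shared = [[1.0], [0.0], [0.0]]
specific_ok = [[0.0], [1.0], [0.0]]   # orthogonal to `shared`
specific_bad = [[1.0], [1.0], [0.0]]  # overlaps with `shared`

print(orthogonality_penalty(shared, specific_ok))   # 0.0
print(orthogonality_penalty(shared, specific_bad))  # 1.0
```

In a training setup, a term like this would typically be added to the task loss so that the shared and modality-specific transformations are pushed toward capturing different information; how the paper weights or applies it is not specified in the abstract.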


