Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

07/12/2018
by Yandong Wen, et al.

We propose a novel framework, called the Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces. Unlike existing methods, DIMNet does not explicitly learn a joint relationship between the modalities. Instead, it learns a shared representation for the different modalities by mapping each of them individually to their common covariates. These shared representations can then be used to find correspondences between the modalities. We show empirically that DIMNet achieves better performance than current methods, with the additional benefits of being conceptually simpler and less data-intensive.
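To make the idea concrete, the sketch below shows one way the covariate-supervision scheme described in the abstract could be set up in PyTorch: each modality gets its own encoder into a shared embedding space, a single classifier head predicts a common covariate (identity is used here) from either embedding, and matching at test time compares the embeddings directly. The encoder architectures, dimensions, optimizer settings, and the use of cosine similarity are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 128  # shared embedding size (assumed for illustration)

class VoiceEncoder(nn.Module):
    """Maps a fixed-size voice feature vector to the shared embedding space."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, EMB_DIM))
    def forward(self, x):
        return self.net(x)

class FaceEncoder(nn.Module):
    """Maps a fixed-size face feature vector to the same embedding space."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, EMB_DIM))
    def forward(self, x):
        return self.net(x)

# One classifier shared by both modalities: predicting the same covariate
# (here, identity) from either embedding pushes both encoders toward a
# common representation without any explicit voice-face pairing.
num_identities = 1000  # illustrative
classifier = nn.Linear(EMB_DIM, num_identities)
voice_enc, face_enc = VoiceEncoder(), FaceEncoder()
params = (list(voice_enc.parameters()) + list(face_enc.parameters())
          + list(classifier.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)

def training_step(voice_x, voice_id, face_x, face_id):
    """Each modality is trained independently against covariate labels;
    no paired cross-modal supervision is required."""
    loss = (F.cross_entropy(classifier(voice_enc(voice_x)), voice_id) +
            F.cross_entropy(classifier(face_enc(face_x)), face_id))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def match_score(voice_x, face_x):
    """At test time, cross-modal correspondence is scored by comparing the
    shared embeddings (cosine similarity is one reasonable choice)."""
    with torch.no_grad():
        v = F.normalize(voice_enc(voice_x), dim=-1)
        f = F.normalize(face_enc(face_x), dim=-1)
    return (v * f).sum(dim=-1)

In this setup the cross-entropy losses on the covariate labels are the only training signal, which is what lets the two encoders be trained on unpaired, per-modality labeled data.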


