MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval

08/08/2017
by   Xin Huang, et al.

Cross-modal retrieval, which retrieves relevant data across different modalities, has drawn wide interest. However, existing DNN-based methods face the challenge of insufficient cross-modal training data, which limits training effectiveness and easily leads to overfitting. Transfer learning is commonly used to relieve the problem of insufficient training data, but it mainly focuses on knowledge transfer from a large-scale single-modal source domain to a single-modal target domain. Such large-scale single-modal datasets also contain rich modal-independent semantic knowledge that can be shared across different modalities. Moreover, large-scale cross-modal datasets are labor-intensive to collect and label, so it is significant to fully exploit the knowledge in single-modal datasets to boost cross-modal retrieval. This paper proposes the modal-adversarial hybrid transfer network (MHTN), which, to the best of our knowledge, is the first work to realize knowledge transfer from a single-modal source domain to a cross-modal target domain and to learn a cross-modal common representation. It is an end-to-end architecture with two subnetworks: (1) a modal-sharing knowledge transfer subnetwork, which jointly transfers knowledge from a large-scale single-modal dataset in the source domain to all modalities in the target domain through a star network structure, distilling modal-independent supplementary knowledge to promote cross-modal common representation learning; and (2) a modal-adversarial semantic learning subnetwork, which constructs an adversarial training mechanism between the common representation generator and a modality discriminator, making the common representation discriminative for semantics but indiscriminative for modalities, thus enhancing cross-modal semantic consistency during the transfer process. Comprehensive experiments on four widely used datasets show the effectiveness and generality of MHTN.
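The modal-adversarial mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the toy dimensions, the linear generator, and the logistic modality discriminator are all assumptions for illustration. The key idea shown is gradient reversal: the discriminator descends its cross-entropy loss, while the generator receives the negated gradient, pushing the common representation to be indiscriminative for modalities.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_REP = 8, 4                                 # toy dimensions (assumed)
W_g = rng.normal(scale=0.1, size=(D_REP, D_IN))    # common-representation generator
w_d = rng.normal(scale=0.1, size=D_REP)            # modality discriminator

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_step(x, m, lam=1.0, lr=0.1):
    """One adversarial update with gradient reversal.

    x: input feature vector; m: modality label (0 = image, 1 = text).
    The discriminator minimizes its loss; the generator is updated with
    the reversed (negated, scaled by lam) gradient, so it learns features
    that the discriminator cannot separate by modality.
    """
    global W_g, w_d
    h = W_g @ x                             # common representation
    p = sigmoid(w_d @ h)                    # P(modality = 1 | h)
    err = p - m                             # d(cross-entropy)/d(logit)
    grad_wd = err * h                       # discriminator gradient
    grad_h = err * w_d                      # gradient flowing back into h
    grad_Wg = np.outer(-lam * grad_h, x)    # gradient reversal: flip the sign
    w_d -= lr * grad_wd                     # discriminator: minimize loss
    W_g -= lr * grad_Wg                     # generator: maximize discriminator loss
    return p
```

In the full model, a semantic classification loss on `h` would be minimized normally (no reversal), yielding a representation discriminative for semantics but indiscriminative for modalities.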


