Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

02/03/2018
by   Jing Yu, et al.

Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query of one modality. The main challenge is to map different modalities into a common semantic space in which the distance between concepts in different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are used to build deep learning models that represent texts. Beyond word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path uses a Graph Convolutional Network (GCN) for text modeling based on graph representations. The other path uses a neural network with layers of nonlinearities for image modeling based on off-the-shelf features. The model is trained with a pairwise similarity loss function to maximize the similarity of relevant text-image pairs and minimize the similarity of irrelevant pairs. Experimental results show that the proposed model outperforms the state-of-the-art methods significantly, with 17
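The text path described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the k-nearest-neighbor graph construction, the symmetric normalization D^-1/2 (A + I) D^-1/2 (the propagation rule popularized by Kipf and Welling, which may differ from the GCN variant used in the paper), the hinge-style margin loss, and all function names and parameters (`knn_adjacency`, `gcn_layer`, `pairwise_loss`, `k`, `margin`) are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_adjacency(vectors, k):
    """Assumed graph construction: link each word to its k most similar words."""
    n = len(vectors)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            adj[i][j] = adj[j][i] = 1.0  # keep the graph undirected
    return adj


def normalize(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ features @ weights)."""
    z = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, x) for x in row] for row in z]


def pairwise_loss(sim_pos, sim_neg, margin=0.5):
    """Assumed hinge-style loss: zero once the relevant pair beats the
    irrelevant pair by at least `margin`."""
    return max(0.0, margin - sim_pos + sim_neg)
```

In this sketch, each node is a word, `features` holds one feature row per word, and the loss pushes the similarity of a relevant text-image pair above that of an irrelevant pair. The image path and the way node features are pooled into a single text embedding are left out, since the abstract does not specify them.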

research
10/31/2018

Semantic Modeling of Textual Relationships in Cross-Modal Retrieval

Feature modeling of different modalities is a basic problem in current r...
research
10/31/2018

Textual Relationship Modeling for Cross-Modal Information Retrieval

Feature representation of different modalities is the main focus of curr...
research
10/10/2022

Semantically Enhanced Hard Negatives for Cross-modal Information Retrieval

Visual Semantic Embedding (VSE) aims to extract the semantics of images ...
research
05/25/2016

SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

There have been multiple attempts to resolve various inflection matching...
research
04/20/2023

Is Cross-modal Information Retrieval Possible without Training?

Encoded representations from a pretrained deep learning model (e.g., BER...
research
05/13/2022

Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering

A key challenge in video question answering is how to realize the cross-...
research
05/19/2018

Do Neural Network Cross-Modal Mappings Really Bridge Modalities?

Feed-forward networks are widely used in cross-modal applications to bri...
