Multimodal Convolutional Neural Networks for Matching Image and Sentence

04/23/2015
by   Lin Ma, et al.

In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching images and sentences. Our m-CNN provides an end-to-end framework with convolutional architectures to exploit image representation, word composition, and the matching relations between the two modalities. More specifically, it consists of one image CNN that encodes the image content and one matching CNN that learns the joint representation of image and sentence. The matching CNN composes words into different semantic fragments and learns the inter-modal relations between the image and the composed fragments at different levels, thus fully exploiting the matching relations between image and sentence. Experimental results on benchmark databases for bidirectional image and sentence retrieval demonstrate that the proposed m-CNNs can effectively capture the information necessary for image and sentence matching. Specifically, our proposed m-CNNs achieve state-of-the-art performance for bidirectional image and sentence retrieval on the Flickr30K and Microsoft COCO databases.
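
The abstract describes the matching CNN only at a high level, so the following is a rough sketch rather than the authors' implementation. It assumes the image CNN has already produced a fixed-length image vector, and it illustrates one plausible reading of "composing words into fragments and relating them to the image": a single convolution window slides over the word embeddings, the image vector is appended to each window, and the resulting fragment features are max-pooled into one matching score. All names, dimensions, the ReLU nonlinearity, and the random weights are hypothetical choices made for illustration.

```python
import numpy as np


def compose_fragments(word_vecs, image_vec, kernel, window=3):
    """Build one joint feature vector per window of consecutive words.

    Assumption: each window of `window` word embeddings is concatenated
    with the image vector and passed through a shared linear map + ReLU.
    """
    n_words = word_vecs.shape[0]
    if n_words < window:
        raise ValueError("sentence is shorter than the convolution window")
    fragments = []
    for start in range(n_words - window + 1):
        words = word_vecs[start:start + window].reshape(-1)
        joint_input = np.concatenate([words, image_vec])
        fragments.append(np.maximum(0.0, kernel @ joint_input))  # ReLU
    return np.stack(fragments)


def matching_score(word_vecs, image_vec, kernel, out_weights, window=3):
    """Max-pool the fragment features over positions, then score linearly."""
    fragments = compose_fragments(word_vecs, image_vec, kernel, window)
    pooled = fragments.max(axis=0)
    return float(out_weights @ pooled)


# Hypothetical sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
word_dim, image_dim, hidden_dim, window = 8, 16, 32, 3
sentence = rng.normal(size=(6, word_dim))   # 6 word embeddings
image = rng.normal(size=image_dim)          # stand-in for image CNN output
kernel = rng.normal(size=(hidden_dim, window * word_dim + image_dim)) * 0.1
out_weights = rng.normal(size=hidden_dim) * 0.1

score = matching_score(sentence, image, kernel, out_weights, window)
```

In a retrieval setting, a score of this kind would be computed for every image and sentence pair; ranking sentences by score for a fixed image gives image-to-sentence retrieval, and ranking images for a fixed sentence gives the reverse direction.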

research
06/01/2015

Learning to Answer Questions From Image Using Convolutional Neural Network

In this paper, we propose to employ the convolutional neural network (CN...
research
06/22/2014

Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

We introduce a model for bidirectional retrieval of images and sentences...
research
03/26/2019

Exploring Confidence Measures for Word Spotting in Heterogeneous Datasets

In recent years, convolutional neural networks (CNNs) took over the fiel...
research
04/23/2016

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction

This paper strives to find the sentence best describing the content of a...
research
10/30/2018

Multimodal matching using a Hybrid Convolutional Neural Network

In this work we propose a novel Convolutional Neural Network (CNN) archi...
research
09/14/2015

Deep Learning Applied to Image and Text Matching

The ability to describe images with natural language sentences is the ha...
research
08/19/2017

Image2song: Song Retrieval via Bridging Image Content and Lyric Words

Image is usually taken for expressing some kinds of emotions or purposes...
