Learning Semantic Concepts and Order for Image and Sentence Matching

12/06/2017
by   Yan Huang, et al.
0

Image and sentence matching has made great progress recently, but it remains challenging due to the large visual-semantic discrepancy. This mainly arises from that the representation of pixel-level image usually lacks of high-level semantic information as in its matched sentence. In this work, we propose a semantic-enhanced image and sentence matching model, which can improve the image representation by learning semantic concepts and then organizing them in a correct semantic order. Given an image, we first use a multi-regional multi-label CNN to predict its semantic concepts, including objects, properties, actions, etc. Then, considering that different orders of semantic concepts lead to diverse semantic meanings, we use a context-gated sentence generation scheme for semantic order learning. It simultaneously uses the image global context containing concept relations as reference and the groundtruth semantic order in the matched sentence as supervision. After obtaining the improved image representation, we learn the sentence representation with a conventional LSTM, and then jointly perform image and sentence matching and sentence generation for model learning. Extensive experiments demonstrate the effectiveness of our learned semantic concepts and order, by achieving the state-of-the-art results on two public benchmark datasets.

READ FULL TEXT

page 1

page 4

page 8

research
05/16/2020

Sequential Sentence Matching Network for Multi-turn Response Selection in Retrieval-based Chatbots

Recently, open domain multi-turn chatbots have attracted much interest f...
research
09/12/2021

Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval

Existing research for image text retrieval mainly relies on sentence-lev...
research
11/17/2016

Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

Effective image and sentence matching depends on how to well measure the...
research
04/23/2016

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction

This paper strives to find the sentence best describing the content of a...
research
04/29/2020

"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition

Humor plays an important role in human languages and it is essential to ...
research
09/26/2022

Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned

This paper focuses on enhancing the captions generated by image-caption ...
research
03/30/2016

Enhancing Sentence Relation Modeling with Auxiliary Character-level Embedding

Neural network based approaches for sentence relation modeling automatic...

Please sign up or login with your details

Forgot password? Click here to reset