Using Text to Teach Image Retrieval

by Haoyu Dong, et al.

Image retrieval relies heavily on the quality of the data modeling and the distance measurement in the feature space. Building on the concept of the image manifold, we first propose to represent the feature space of images, learned via neural networks, as a graph. Neighborhoods in the feature space are then defined by the geodesic distance between images, which are represented as graph vertices or manifold samples. When only a limited number of images is available, this manifold is sparsely sampled, making the geodesic computation and the corresponding retrieval harder. To address this, we augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images. In addition to extensive results on standard datasets illustrating the power of text to help in image retrieval, we introduce a new public dataset based on CLEVR to quantify the semantic similarity between visual data and text data. The experimental results show that the joint embedding manifold is a robust representation, making it a better basis for image retrieval given only an image and a textual instruction describing the desired modifications to the image.
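To make the graph view of the feature space concrete, here is a minimal sketch of retrieval by geodesic distance. It is an illustration under stated assumptions, not the authors' implementation: the symmetric k-nearest-neighbor graph, the Euclidean edge weights, the use of Dijkstra's algorithm, and the names `knn_graph`, `geodesic_distances`, and `retrieve` are all hypothetical choices, and hand-made 2-D points stand in for learned image features. The text augmentation step described in the abstract is not modeled.

```python
import heapq
import math


def knn_graph(features, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for d, j in others[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Shortest-path lengths from source (Dijkstra); unreachable nodes stay inf."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def retrieve(features, query, k, top):
    """Rank all other samples by geodesic distance to the query sample."""
    dist = geodesic_distances(knn_graph(features, k), query)
    ranked = sorted((i for i in dist if i != query), key=lambda i: (dist[i], i))
    return ranked[:top]


# Toy "manifold": samples along an arch (assumed data, for illustration only).
points = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 3), (3, 2), (3, 1), (3, 0)]
neighbors = retrieve(points, query=0, k=2, top=3)
```

In this toy example the two ends of the arch, samples 0 and 7, are close in Euclidean terms but far apart along the graph, so sample 7 drops out of the top results for query 0. With fewer samples the graph becomes sparser and such paths become less reliable, which is the difficulty the abstract proposes to address by adding aligned text samples.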





Composing Text and Image for Image Retrieval - An Empirical Odyssey

In this paper, we study the task of image retrieval, where the input que...

Optimized Feature Space Learning for Generating Efficient Binary Codes for Image Retrieval

In this paper we propose an approach for learning low dimensional optimi...

Learning Deep Similarity Models with Focus Ranking for Fabric Image Retrieval

Fabric image retrieval is beneficial to many applications including clot...

Multitask Text-to-Visual Embedding with Titles and Clickthrough Data

Text-visual (or called semantic-visual) embedding is a central problem i...

Learning Global and Local Consistent Representations for Unsupervised Image Retrieval via Deep Graph Diffusion Networks

Diffusion has shown great success in improving accuracy of unsupervised ...

Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval

Existing manifold learning methods are not appropriate for image retriev...

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

The re-ranking approach leverages high-confidence retrieved samples to r...