Learning to Learn from Web Data through Deep Semantic Embeddings

08/20/2018
by Raul Gomez, et al.

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision, and we perform a thorough analysis of five different text embeddings on three different benchmarks. We show that the embeddings learnt with Web and Social Media data perform competitively with supervised methods on the text-based image retrieval task, and that we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Furthermore, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, which can be used for fair comparison of image-text embeddings.
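The abstract does not spell out how retrieval is carried out once the joint embedding is learnt. A common approach, assumed here rather than taken from the paper, is to place image and text vectors in the same space and rank images by cosine similarity to a query vector; a multimodal query can then be formed by adding normalized vectors, for example an image embedding plus the embedding of a word. The helper names, image ids, and 3-dimensional vectors below are hypothetical toy values, not outputs of the paper's models.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy embeddings: in the real pipeline these would come from the
# trained visual model (images) and the chosen text embedding (queries).
IMAGE_EMBEDDINGS: Dict[str, List[float]] = {
    "img_beach": [0.9, 0.1, 0.0],
    "img_city": [0.1, 0.9, 0.2],
    "img_dog": [0.2, 0.1, 0.9],
    "img_dog_beach": [0.6, 0.0, 0.7],
}


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def rank_images(query: Sequence[float],
                images: Dict[str, Sequence[float]]) -> List[str]:
    """Return image ids sorted from most to least similar to the query."""
    return sorted(images, key=lambda k: cosine(query, images[k]), reverse=True)


def combine(positive: Sequence[Sequence[float]],
            negative: Sequence[Sequence[float]] = ()) -> List[float]:
    """Build a multimodal query: sum of normalized positive vectors minus
    normalized negative vectors (an assumed, simple combination rule)."""
    total = [0.0] * len(positive[0])
    for v in positive:
        for i, x in enumerate(normalize(v)):
            total[i] += x
    for v in negative:
        for i, x in enumerate(normalize(v)):
            total[i] -= x
    return total


if __name__ == "__main__":
    text_beach = [1.0, 0.0, 0.0]  # hypothetical embedding of the word "beach"
    text_dog = [0.0, 0.0, 1.0]    # hypothetical embedding of the word "dog"

    # Text-based retrieval: query with a word vector only.
    print(rank_images(text_beach, IMAGE_EMBEDDINGS))

    # Multimodal retrieval: an image of a beach plus the word "dog".
    query = combine([IMAGE_EMBEDDINGS["img_beach"], text_dog])
    print(rank_images(query, IMAGE_EMBEDDINGS))
```

With these toy vectors, the text query ranks the beach image first, while the combined query promotes the image that mixes both concepts. Summing normalized vectors keeps either modality from dominating simply because its raw vectors have a larger norm.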
