Self-Supervised Learning from Web Data for Multimodal Retrieval

01/07/2019
by Raul Gomez, et al.

Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features without human-annotated data. Web and social media platforms provide a virtually unlimited amount of such multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision, and we analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state-of-the-art text embeddings on three different benchmarks. We show that the embeddings learnt from Web and social media data achieve performance competitive with supervised methods on the text-based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, which can be used for fair comparison of image-text embeddings.
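The retrieval step described above can be sketched as nearest-neighbour ranking in a joint space. This is a minimal illustration, not the authors' code: it assumes images and text queries have already been mapped into the same embedding space (for instance, by a visual model trained to predict the text embedding of each image's associated text). The toy 3-dimensional vectors, the `IMAGE_EMBEDDINGS` table, and the helpers `rank_images` and `combine_queries` are hypothetical, and building a multimodal query by summing unit-normalized vectors is an assumed combination rule.

```python
import math

# Hypothetical toy embeddings: in practice these would be the visual model's
# outputs for each image, expressed in the same space as the text embedding.
IMAGE_EMBEDDINGS = {
    "img_beach": [0.9, 0.1, 0.0],
    "img_city": [0.1, 0.9, 0.1],
    "img_dog_beach": [0.6, 0.1, 0.8],
}


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def combine_queries(*vectors):
    """Assumed multimodal query rule: sum of unit-normalized query vectors."""
    units = [normalize(v) for v in vectors]
    return [sum(components) for components in zip(*units)]


def rank_images(query, images, k=None):
    """Return image ids sorted by decreasing cosine similarity to the query."""
    ranked = sorted(images, key=lambda name: cosine(query, images[name]), reverse=True)
    return ranked[:k] if k is not None else ranked


if __name__ == "__main__":
    beach = [1.0, 0.0, 0.0]  # hypothetical text embedding of "beach"
    dog = [0.0, 0.0, 1.0]    # hypothetical text embedding of "dog"
    print(rank_images(beach, IMAGE_EMBEDDINGS))
    # ['img_beach', 'img_dog_beach', 'img_city']
    print(rank_images(combine_queries(beach, dog), IMAGE_EMBEDDINGS, k=1))
    # ['img_dog_beach']
```

Normalizing before comparing makes the ranking depend only on direction in the embedding space, so a query combining two concepts favours images whose embeddings lie between them rather than images with large vector norms.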

research
08/20/2018

Learning to Learn from Web Data through Deep Semantic Embeddings

In this paper we propose to learn a multimodal image and text embedding ...
research
07/24/2017

Full-Network Embedding in a Multimodal Embedding Pipeline

The current state-of-the-art for image annotation and image retrieval ta...
research
04/26/2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

Multimodal self-supervised learning is getting more and more attention a...
research
05/23/2017

Better Text Understanding Through Image-To-Text Transfer

Generic text embeddings are successfully used in a variety of tasks. How...
research
02/17/2021

I Want This Product but Different : Multimodal Retrieval with Synthetic Query Expansion

This paper addresses the problem of media retrieval using a multimodal q...
research
09/27/2017

Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction

In this paper, we present a method to learn a visual representation adap...
research
03/06/2023

MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval

Image retrieval has garnered growing interest in recent times. The curre...