Full-Network Embedding in a Multimodal Embedding Pipeline

07/24/2017
by   Armand Vilalta, et al.

State-of-the-art results for image annotation and image retrieval are obtained with deep neural networks that combine an image representation and a text representation in a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure its influence, we evaluate its performance on three datasets and compare the results with the original multimodal embedding generation scheme using a one-layer image embedding, and with the rest of the state of the art. Results for image annotation and image retrieval indicate that the Full-Network embedding is consistently superior to the one-layer embedding. These results motivate integrating the Full-Network embedding into any multimodal embedding generation scheme, which is feasible thanks to the flexibility of the approach.
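
To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a multi-scale image representation can be approximated by concatenating spatially pooled activations from every layer, and that retrieval in the shared space is done by cosine similarity. The function names, the toy random activations, and the pooling choice are all hypothetical and are not taken from the abstract.

```python
import numpy as np

def one_layer_embedding(activations):
    """Baseline: use only the last layer's activations as the image embedding."""
    return activations[-1].ravel()

def multi_layer_embedding(activations):
    """Hypothetical multi-scale embedding: concatenate features from all layers.

    Convolutional maps of shape (channels, height, width) are average-pooled
    over space, so each channel contributes a single value.
    """
    parts = []
    for act in activations:
        if act.ndim == 3:
            parts.append(act.mean(axis=(1, 2)))
        else:
            parts.append(act.ravel())
    return np.concatenate(parts)

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def rank_by_cosine(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = l2_normalize(candidates) @ l2_normalize(query)
    return np.argsort(-scores)

rng = np.random.default_rng(0)
# Toy activations standing in for a CNN: two conv layers and one dense layer.
activations = [
    rng.normal(size=(8, 16, 16)),
    rng.normal(size=(16, 8, 8)),
    rng.normal(size=(32,)),
]
single = one_layer_embedding(activations)    # 32 features
multi = multi_layer_embedding(activations)   # 8 + 16 + 32 = 56 features

# Assumed shared space: image and caption vectors already projected to 10 dims.
image_vectors = rng.normal(size=(5, 10))
caption_vector = rng.normal(size=(10,))
ranking = rank_by_cosine(caption_vector, image_vectors)  # image retrieval order
```

In this sketch, swapping `one_layer_embedding` for `multi_layer_embedding` changes only the image-side input to the shared space, which mirrors the abstract's claim that the image representation can be replaced without altering the rest of the scheme.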


