Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval

05/16/2021
by Kazuya Ueki, et al.

Visual-semantic embedding is an active research topic because it is useful for a variety of tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on zero-shot image retrieval using sentences as queries and present a survey of the technological trends in this area. First, we provide a comprehensive overview of the history of the technology, beginning with the early studies of image-to-text matching and tracing how the technology has evolved over time. In addition, we describe the datasets commonly used in experiments and compare the evaluation results of each method. We also introduce implementations available on GitHub that can be used to confirm the accuracy reported in experiments and as a starting point for further improvement. We hope that this survey will encourage researchers to further advance research on bridging images and language.
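To make the sentence-query retrieval setting concrete, here is a minimal sketch, not any specific method from the survey. It assumes that images and sentences have already been mapped into a shared embedding space by trained encoders, which are not shown. The random vectors stand in for those encoder outputs, and the helper names `l2_normalize` and `rank_images` are hypothetical. Retrieval then reduces to ranking gallery images by cosine similarity to the query embedding.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length so that dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_images(query_emb: np.ndarray, image_embs: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k gallery images most similar to a sentence query.

    query_emb:  shape (d,), embedding of the query sentence (assumed text-encoder output)
    image_embs: shape (n, d), embeddings of gallery images (assumed image-encoder output)
    """
    q = l2_normalize(query_emb)
    g = l2_normalize(image_embs)
    scores = g @ q               # cosine similarity of each image to the query, shape (n,)
    order = np.argsort(-scores)  # indices sorted by descending similarity
    return order[:top_k]


# Placeholder embeddings: in practice these would come from trained encoders.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
# A synthetic "sentence" embedding constructed to lie close to image 42.
query = gallery[42] + 0.1 * rng.normal(size=64)
print(rank_images(query, gallery, top_k=3))
```

Because the synthetic query is built as a small perturbation of image 42's embedding, image 42 should come out first in the ranking. With real encoders, the quality of this ranking depends entirely on how well the learned embedding space aligns the two modalities, which is what the surveyed methods differ on.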



research
09/14/2022

MUST-VQA: MUltilingual Scene-text VQA

In this paper, we present a framework for Multilingual Scene Text Visual...
research
03/07/2023

Graph Neural Networks in Vision-Language Image Understanding: A Survey

2D image understanding is a complex problem within Computer Vision, but ...
research
11/17/2016

Zero-Shot Visual Question Answering

Part of the appeal of Visual Question Answering (VQA) is its promise to ...
research
07/23/2020

ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions

Most existing algorithms for cross-modal Information Retrieval are based...
research
03/30/2023

If At First You Don't Succeed: Test Time Re-ranking for Zero-shot, Cross-domain Retrieval

In this paper we propose a novel method for zero-shot, cross-domain imag...
research
11/12/2022

Partial Visual-Semantic Embedding: Fashion Intelligence System with Sensitive Part-by-Part Learning

In this study, we propose a technology called the Fashion Intelligence S...
research
03/28/2015

Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

Where previous reviews on content-based image retrieval emphasize on wha...
