EDIS: Entity-Driven Image Search over Multimodal Web Content

05/23/2023
by Siqi Liu, et al.

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scale, entity comprehension, and multimodal information fusion. In this work, we introduce Entity-Driven Image Search (EDIS), a challenging dataset for cross-modal image search in the news domain. EDIS consists of 1 million web images drawn from actual search engine results and curated datasets, with each image paired with a textual description. Unlike datasets that assume a small set of single-modality candidates, EDIS reflects real-world web image search scenarios by including a million multimodal image-text pairs as candidates. EDIS encourages the development of retrieval models that simultaneously address cross-modal information fusion and matching. To achieve accurate ranking results, a model must: 1) understand named entities and events from text queries, 2) ground entities onto images or text descriptions, and 3) effectively fuse textual and visual representations. Our experimental results show that EDIS, with its dense entities and large-scale candidate set, challenges state-of-the-art methods. The ablation study also shows that fusing textual features with visual features is critical to improving retrieval results.
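
To make the idea of ranking multimodal image-text candidates concrete, the following is a minimal sketch of score-level fusion. It is not the method evaluated in the paper. It assumes that the query, each candidate image, and each candidate text description have already been embedded into a shared vector space by some encoder; the function names `cosine` and `rank_candidates`, the weight `alpha`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidates, alpha=0.5):
    """Rank (id, image_vec, text_vec) candidates by a weighted sum of similarities.

    alpha weights the query-image similarity; (1 - alpha) weights the
    query-text similarity. alpha=1.0 ignores the text description entirely.
    """
    scored = []
    for cand_id, image_vec, text_vec in candidates:
        image_score = cosine(query_vec, image_vec)
        text_score = cosine(query_vec, text_vec)
        fused = alpha * image_score + (1.0 - alpha) * text_score
        scored.append((cand_id, fused))
    # Highest fused score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-d embeddings (illustrative only, not real model outputs).
candidates = [
    ("a", [1.0, 0.0], [0.0, 1.0]),  # image matches the query, text does not
    ("b", [1.0, 0.0], [1.0, 0.0]),  # both image and text match
    ("c", [0.0, 1.0], [0.0, 1.0]),  # neither matches
]
query = [1.0, 0.0]

fused_ranking = rank_candidates(query, candidates, alpha=0.5)
image_only_ranking = rank_candidates(query, candidates, alpha=1.0)
```

In this toy setup, image-only scoring cannot separate candidates "a" and "b", while the fused score places "b" first because its text description also matches the query. A real system over a million candidates would need learned encoders and an approximate nearest-neighbor index rather than a linear scan.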


