Joint Visual-Textual Embedding for Multimodal Style Search

06/15/2019
by Gil Sadeh, et al.

We introduce a multimodal visual-textual search refinement method for fashion garments. Existing search engines do not support intuitive, interactive refinement of retrieved results based on the properties of a particular product. We propose a method that retrieves similar items given a query item image and textual refinement properties. We believe this method can address many real-life customer scenarios in which a similar item in a different color, pattern, length, or style is desired. We employ a joint embedding training scheme in which product images and their catalog textual metadata are mapped close to each other in a shared space. This joint visual-textual embedding space enables semantic manipulation of catalog images based on textual refinement requirements. We propose a new training objective function, Mini-Batch Match Retrieval, and demonstrate its superiority over the commonly used triplet loss. Additionally, we demonstrate the feasibility of adding an attribute extraction module, trained on the same catalog data, and show how to integrate it within the multimodal search to boost its performance. We introduce an evaluation protocol with an associated benchmark and compare several approaches.
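
The abstract names the Mini-Batch Match Retrieval objective but does not define it. As a rough illustration only, the sketch below shows one common way to train a joint image-text embedding with an in-batch retrieval loss: each image must pick out its own catalog text among all texts in the mini-batch, and vice versa, via softmax cross-entropy over cosine similarities. This is an assumption about the general family of objectives, not the authors' formulation; the function names, the symmetric averaging, and the `temperature` parameter are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def in_batch_retrieval_loss(image_emb, text_emb, temperature=0.1):
    """Hypothetical in-batch retrieval loss (not the paper's exact objective).

    Row i of image_emb and row i of text_emb are assumed to describe the same
    product. Every other row in the batch acts as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    # (N, N) similarity matrix; matched pairs lie on the diagonal.
    logits = img @ txt.T / temperature

    def diag_cross_entropy(z):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image retrieval losses.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


# Toy data: 8 products with 16-dimensional embeddings (values are synthetic).
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 16))
aligned_texts = images + 0.01 * rng.normal(size=(8, 16))
random_texts = rng.normal(size=(8, 16))

aligned_loss = in_batch_retrieval_loss(images, aligned_texts)
random_loss = in_batch_retrieval_loss(images, random_texts)
print(aligned_loss < random_loss)
```

Unlike a triplet loss, which compares one anchor against a single sampled negative per term, a loss of this shape uses every other item in the batch as a negative at once. Whether that accounts for the improvement reported in the abstract is not something this sketch can establish.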


research
01/08/2018

DeepStyle: Multimodal Search Engine for Fashion and Interior Design

In this paper, we propose a multimodal search engine that combines visua...
research
11/27/2018

One-Shot Item Search with Multimodal Data

In the task of near similar image search, features from Deep Neural Netw...
research
02/10/2023

Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval

Same-style products retrieval plays an important role in e-commerce plat...
research
08/17/2023

FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings

Logo embedding plays a crucial role in various e-commerce applications b...
research
07/21/2017

What Looks Good with my Sofa: Multimodal Search Engine for Interior Design

In this paper, we propose a multi-modal search engine for interior desig...
research
06/01/2023

PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

Information extraction, e.g., attribute value extraction, has been exten...
research
06/23/2018

Towards Practical Visual Search Engine within Elasticsearch

In this paper, we describe our end-to-end content-based image retrieval ...
