Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications

08/30/2023
by   Wenyi Wu, et al.
0

We present Catalog Phrase Grounding (CPG), a model that can associate product textual data (title, brands) into corresponding regions of product images (isolated product region, brand logo region) for e-commerce vision-language applications. We use a state-of-the-art modulated multimodal transformer encoder-decoder architecture unifying object detection and phrase-grounding. We train the model in self-supervised fashion with 2.3 million image-text pairs synthesized from an e-commerce site. The self-supervision data is annotated with high-confidence pseudo-labels generated with a combination of teacher models: a pre-trained general domain phrase grounding model (e.g. MDETR) and a specialized logo detection model. This allows CPG, as a student model, to benefit from transfer knowledge from these base models combining general-domain knowledge and specialized knowledge. Beyond immediate catalog phrase grounding tasks, we can benefit from CPG representations by incorporating them as ML features into downstream catalog applications that require deep semantic understanding of products. Our experiments on product-brand matching, a challenging e-commerce application, show that incorporating CPG representations into the existing production ensemble system leads to on average 5 improvement across all countries globally (with the largest lift of 11 single country) at fixed 95 including a logo detection teacher model and ResNet50.

READ FULL TEXT
research
04/14/2021

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce

Existing pre-trained language models (PLMs) have demonstrated the effect...
research
12/07/2021

Grounded Language-Image Pre-training

This paper presents a grounded language-image pre-training (GLIP) model ...
research
09/07/2020

E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Pre-trained language models such as BERT have achieved great success in ...
research
06/17/2020

Contrastive Learning for Weakly Supervised Phrase Grounding

Phrase grounding, the problem of associating image regions to caption wo...
research
11/05/2020

Utilizing Every Image Object for Semi-supervised Phrase Grounding

Phrase grounding models localize an object in the image given a referrin...
research
03/14/2023

Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment

Medical phrase grounding (MPG) aims to locate the most relevant region i...
research
10/21/2022

Describing Sets of Images with Textual-PCA

We seek to semantically describe a set of images, capturing both the att...

Please sign up or login with your details

Forgot password? Click here to reset