CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

02/15/2022
by Licheng Yu, et al.

We introduce CommerceMM, a multimodal model that provides a diverse and granular understanding of the commerce topics associated with a given piece of content (image, text, or image+text) and that generalizes to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, and Image-to-Product Retrieval. We follow the pre-training + fine-tuning regime and present 5 effective pre-training tasks on image-text pairs. To embrace more common and diverse commerce data with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal mappings, we propose another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training. Pre-training is efficient, requiring only two forward/backward updates for the combined 14 tasks. Extensive experiments and analysis show the effectiveness of each task. When all pre-training tasks are combined, our model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning. Additionally, we propose a novel modality-randomization approach that dynamically adjusts our model under different efficiency constraints.
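
As a rough illustration of how 9 retrieval tasks can arise, the sketch below pairs each of three query representations (image, text, multimodal) with each of three target representations, giving 3 x 3 = 9 combinations. This is one reading consistent with the count in the abstract, not a statement of the authors' implementation. The InfoNCE-style contrastive loss, the temperature value, and all function names (`omni_retrieval_tasks`, `info_nce`, `omni_loss`) are assumptions made for illustration only.

```python
import math
from itertools import product

# Assumed embedding types available on both the query side and the target side.
MODALITIES = ("image", "text", "multimodal")


def omni_retrieval_tasks():
    """Enumerate (query_modality, target_modality) pairs: 3 x 3 = 9."""
    return list(product(MODALITIES, MODALITIES))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(queries, targets, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to targets[i],
    treating the other targets in the batch as negatives."""
    queries = [normalize(q) for q in queries]
    targets = [normalize(t) for t in targets]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        top = max(logits)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def omni_loss(query_embs, target_embs, temperature=0.07):
    """Sum a symmetric contrastive loss over all 9 modality pairs.

    query_embs and target_embs map a modality name to a list of vectors;
    index i on the query side is assumed to match index i on the target side.
    """
    loss = 0.0
    for q_mod, t_mod in omni_retrieval_tasks():
        q, t = query_embs[q_mod], target_embs[t_mod]
        loss += info_nce(q, t, temperature) + info_nce(t, q, temperature)
    return loss
```

Summing all pairs into one scalar means a single backward pass can cover every retrieval direction, which is in the spirit of the abstract's claim that the combined tasks need only a small number of updates; the actual scheduling used by the authors is not described here.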


research
10/26/2022

FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning

Multimodal tasks in the fashion domain have significant potential for e-...
research
12/14/2021

ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval

Nowadays on E-commerce platforms, products are presented to the customer...
research
01/30/2023

MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

Taobao Search consists of two phases: the retrieval phase and the rankin...
research
04/10/2023

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

E-commerce search engines comprise a retrieval phase and a ranking phase...
research
06/25/2023

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

In the multimedia era, image is an effective medium in search advertisin...
research
05/27/2023

Benchmarking Diverse-Modal Entity Linking with Generative Models

Entities can be expressed in diverse formats, such as texts, images, or ...
research
05/22/2023

Efficient Large-Scale Vision Representation Learning

In this article, we present our approach to single-modality vision repre...
