ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval

12/14/2021
by Boxuan Zhang, et al.

Nowadays, on E-commerce platforms, products are presented to customers through multiple modalities. These modalities are significant for a retrieval system in surfacing attractive products for customers, so taking them into account simultaneously to boost retrieval performance is crucial. This problem is challenging for two reasons: (1) extracting patch features with a pre-trained image model (e.g., a CNN-based model) carries strong inductive bias, making it difficult to capture effective information from E-commerce product images; (2) the heterogeneity of multimodal data makes it hard to construct representations of the query text and the product (including its title and image) in a common subspace. We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval. Specifically, ACE-BERT leverages patch features and pixel features as the image representation, so the Transformer architecture can be applied directly to raw image sequences. With the pre-trained enhanced BERT as the backbone network, ACE-BERT further adopts adversarial learning, adding a domain classifier to enforce distribution consistency across modality representations and thereby narrow the representation gap between query and product. Experimental results demonstrate that ACE-BERT outperforms state-of-the-art approaches on the retrieval task. Notably, ACE-BERT has already been deployed in our E-commerce search engine, leading to a 1.46% increase in revenue.
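The adversarial component described in the abstract amounts to a domain classifier that tries to tell query embeddings from product embeddings while the shared encoder is trained to fool it. Below is a minimal PyTorch sketch of that idea using a gradient reversal layer; the paper's exact architecture, losses, and hyperparameters are not given here, so the module names, dimensions, and the choice of gradient reversal are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of adversarial modality alignment with a domain classifier.
# All names, dimensions, and the gradient reversal trick are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on the
    backward pass, so minimizing the domain loss w.r.t. the encoder
    maximizes domain confusion."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts whether an embedding came from the query side or the product side."""
    def __init__(self, dim=768, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, 2),   # 2 domains: query vs. product
        )

    def forward(self, emb):
        return self.net(GradReverse.apply(emb, self.lambd))

# Usage: query_emb and product_emb would come from the shared BERT-style backbone.
query_emb   = torch.randn(32, 768, requires_grad=True)   # stand-in for encoder output
product_emb = torch.randn(32, 768, requires_grad=True)

clf = DomainClassifier()
logits = clf(torch.cat([query_emb, product_emb], dim=0))
labels = torch.cat([torch.zeros(32), torch.ones(32)]).long()
domain_loss = nn.functional.cross_entropy(logits, labels)
domain_loss.backward()   # reversed gradients push the two embedding
                         # distributions toward a common subspace
```

The gradient reversal layer folds the min-max game into a single backward pass; an alternative is to alternate optimization of the classifier and the encoder, GAN-style.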


Related research

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval (02/15/2022)
We introduce CommerceMM - a multimodal model capable of providing a dive...

MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search (01/30/2023)
Taobao Search consists of two phases: the retrieval phase and the rankin...

A Comparison of Supervised Learning to Match Methods for Product Search (07/20/2020)
The vocabulary gap is a core challenge in information retrieval (IR). In...

AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning (08/14/2023)
Multimodal contrastive learning aims to train a general-purpose feature ...

Extending CLIP for Category-to-image Retrieval in E-commerce (12/21/2021)
E-commerce provides rich multimodal data that is barely leveraged in pra...

Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation (09/17/2020)
In this paper, we introduce Cross-modal Alignment with mixture experts N...

A Sequence to Sequence Model for Extracting Multiple Product Name Entities from Dialog (10/28/2021)
E-commerce voice ordering systems need to recognize multiple product nam...
