A Semantic Alignment System for Multilingual Query-Product Retrieval

08/05/2022
by Qi Zhang, et al.

This paper describes our winning solution (team name: www) to the Amazon ESCI Challenge of KDD Cup 2022, which achieves an NDCG score of 0.9043 and takes first place on task 1, the query-product ranking track. Participants in this competition are provided with a real-world, large-scale multilingual shopping queries data set containing query-product pairs in English, Japanese and Spanish. The competition proposes three tasks: ranking the results list (task 1), classifying query-product pairs into Exact, Substitute, Complement, or Irrelevant (ESCI) categories (task 2), and identifying substitute products for a given query (task 3). We focus mainly on task 1 and propose a semantic alignment system for multilingual query-product retrieval. Pre-trained multilingual language models (LMs) are adopted to obtain semantic representations of queries and products. All of our models are first trained with cross-entropy loss to classify query-product pairs into the four ESCI categories; we then take a weighted sum of the four class probabilities as the ranking score. To further boost the models, we also apply careful data preprocessing, data augmentation by translation, special handling of English texts with English LMs, adversarial training with AWP and FGM, self-distillation, pseudo-labeling, label smoothing and ensembling. Our solution outperforms the others on both the public and private leaderboards.
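The scoring step in the abstract (classify into four ESCI classes, then collapse the class probabilities into one ranking score) can be illustrated with a minimal sketch. The abstract does not state the actual weights, so the `ESCI_WEIGHTS` values below are an assumption chosen only for illustration, and the function names, product ids and logits are hypothetical.

```python
import math

# ASSUMPTION: illustrative weights in (Exact, Substitute, Complement, Irrelevant)
# order. The abstract does not give the values the authors used.
ESCI_WEIGHTS = (1.0, 0.1, 0.01, 0.0)


def softmax(logits):
    """Turn raw 4-class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_score(logits, weights=ESCI_WEIGHTS):
    """Weighted sum of the four class probabilities."""
    probs = softmax(logits)
    return sum(w * p for w, p in zip(weights, probs))


def rank_products(candidates):
    """Sort (product_id, logits) pairs for one query by descending score."""
    scored = [(pid, ranking_score(logits)) for pid, logits in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in scored]


# Hypothetical classifier outputs for three products under one query.
candidates = [
    ("p_irrelevant", [0.1, 0.2, 0.1, 3.0]),
    ("p_exact", [4.0, 0.5, 0.1, 0.0]),
    ("p_substitute", [0.3, 3.5, 0.2, 0.1]),
]
print(rank_products(candidates))
```

With these assumed weights, a product whose probability mass sits on Exact ranks above one concentrated on Substitute, which in turn ranks above one concentrated on Irrelevant. Any weights that decrease from Exact to Irrelevant would produce the same ordering for this example.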


research
06/14/2022

Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Improving the quality of search results can significantly enhance users ...
research
02/14/2023

Enhancing Model Performance in Multilingual Information Retrieval with Comprehensive Data Engineering Techniques

In this paper, we present our solution to the Multilingual Information R...
research
12/30/2019

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

While billions of non-English speaking users rely on search engines ever...
research
08/09/2022

A Boring-yet-effective Approach for the Product Ranking Task of the Amazon KDD Cup 2022

In this work we describe our submission to the product ranking task of t...
research
06/09/2016

e-Commerce product classification: our participation at cDiscount 2015 challenge

This report describes our participation in the cDiscount 2015 challenge ...
research
11/30/2018

Cost-sensitive Learning of Deep Semantic Models for Sponsored Ad Retrieval

This paper formulates the problem of learning a neural semantic model fo...
research
05/01/2023

Contextual Multilingual Spellchecker for User Queries

Spellchecking is one of the most fundamental and widely used search feat...
