Recent improvements in information retrieval, mainly due to pretrained transformer models, opened up the possibility of improving search in various domains (Jin et al., 2020; MacAvaney et al., 2020; Bi et al., 2020; Choi et al., 2020; Lin et al., 2021; Pradeep et al., 2021; Nogueira et al., 2020a; Ma et al., 2021). Among such domains, e-commerce search receives special attention by the industry as improvements in search quality often lead to increases in revenue.
In this work, we detail our submission to the Amazon KDD Cup 2022, whose goal is to evaluate ranking methods that can be used to improve the customer experience when searching for products.
2. Related Work
Our solution is based on the monoT5 model, which has demonstrated strong effectiveness on passage ranking tasks in different domains. We qualify our method as “boring”, since it is well known in the recent IR literature that models with more parameters can outperform smaller ones with task-specific adaptations. For example, Nogueira et al. (2020b) used the model to achieve state-of-the-art results on the TREC 2004 Robust Track (Voorhees, 2004), while Pradeep et al. (2020) used the same model, finetuned only on MS MARCO, to achieve the best or second-best performance on medical-domain ranking datasets such as Precision Medicine (Roberts et al., 2019) and TREC-COVID (Zhang et al., 2020). In addition, Rosa et al. (2021, 2022b) used large versions of monoT5 to reach the state of the art in a legal-domain entailment task in the COLIEE competition (Rabelo et al., 2021; Kim et al., 2022). Furthermore, Rosa et al. (2022a) showed that the 3-billion-parameter variant of monoT5 achieves the state of the art on 12 out of 18 datasets of Benchmarking-IR (BEIR) (Thakur et al., 2021), which consists of datasets from different domains such as web, biomedical, scientific, financial and news.
3. Method

In this section, we describe mMonoT5, a multilingual variant of monoT5 (Nogueira et al., 2020b), which is an adaptation of the T5 model (Raffel et al., 2020) for the passage ranking task. We first finetune a multilingual T5 model (Xue et al., 2021) on the mMARCO dataset (Bonifacio et al., 2021), a version of MS MARCO (Bajaj et al., 2018) translated into 9 languages. The model is trained to generate a “yes” or “no” token depending on the relevance of a document to a query.
mMonoT5 uses the following input template:

`Query: q Document: d Relevant:`

where `q` represents a query and `d` represents a document that may or may not be relevant to the given query.
During inference, the model receives the same input prompt and estimates a score that quantifies the relevance of document `d` to query `q`. The score is the probability assigned to the “yes” token, computed with a softmax over the logits of the “yes” and “no” tokens. After computing the scores of all documents for a given query, we rank them in decreasing order of score.
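As a minimal sketch of this scoring and ranking procedure (the helper names are ours, not from the original implementation; in the real model the “yes”/“no” logits come from the decoder's first generated token, here they are taken as inputs):

```python
import math

def make_prompt(query: str, document: str) -> str:
    """Build the monoT5 input template: 'Query: q Document: d Relevant:'."""
    return f"Query: {query} Document: {document} Relevant:"

def relevance_score(yes_logit: float, no_logit: float) -> float:
    """Softmax probability of the 'yes' token over {'yes', 'no'}."""
    m = max(yes_logit, no_logit)          # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

def rank(doc_logits: dict) -> list:
    """Sort documents by decreasing relevance score.

    doc_logits maps a document id to its (yes_logit, no_logit) pair.
    Returns a list of (doc_id, score) tuples, most relevant first.
    """
    scored = [(doc, relevance_score(y, n)) for doc, (y, n) in doc_logits.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Because only the relative order matters for nDCG, any monotonic transformation of the “yes” probability would produce the same ranking.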
After finetuning on mMARCO, we further finetuned the model on the training data of tasks 1 and 2 of the competition. We use the Beautiful Soup library to clean any remaining HTML tags that may appear in the product fields. Products are presented to the model as the concatenation of the fields product_title, product_description, product_bullet_point, product_brand and product_color_name, joined by whitespace.
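A minimal sketch of this preprocessing, assuming Beautiful Soup 4 is installed (the helper names are ours; the field names come from the competition dataset):

```python
from bs4 import BeautifulSoup

# Product fields concatenated into the document text, in this order.
FIELDS = ["product_title", "product_description", "product_bullet_point",
          "product_brand", "product_color_name"]

def clean_html(text: str) -> str:
    """Strip any remaining HTML tags with Beautiful Soup."""
    return BeautifulSoup(text, "html.parser").get_text(" ", strip=True)

def product_text(product: dict) -> str:
    """Concatenate the selected fields, joined by whitespace.

    Missing (None) fields are treated as empty and skipped.
    """
    parts = [clean_html(product.get(field) or "") for field in FIELDS]
    return " ".join(p for p in parts if p)
```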
During the competition we observed that using the task 2 training data substantially improved the model. Hence, we used both the task 1 and task 2 training data, mapping the ‘exact’ label to the target token “true” and all other classes to “false”. We use these tokens instead of the “yes” and “no” used by the original mMonoT5. We trained the model for 5 epochs, which takes about 72 hours on a TPU v3, using batches of 128 examples and a maximum sequence length of 512 tokens.
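The construction of training pairs can be sketched as follows (a hypothetical helper; the label names are those of the competition's training data, where every class other than ‘exact’ maps to “false”):

```python
def training_example(query: str, product_text: str, esci_label: str):
    """Build one (input, target) pair for finetuning.

    esci_label is the competition's relevance class; only 'exact'
    maps to the positive target token "true".
    """
    prompt = f"Query: {query} Document: {product_text} Relevant:"
    target = "true" if esci_label == "exact" else "false"
    return prompt, target
```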
We show our results in Table 1. Our best model achieved an nDCG@20 of 0.9012 and 0.9007 on the public and private test sets, respectively, placing us ninth on the leaderboard and only 0.0036 behind the first-place team.
Initially, to test the model’s zero-shot capability, we used mMonoT5-base (580M parameters) finetuned only on mMARCO. This model achieves an nDCG@20 of 0.8640. We then further finetuned it on the training data of the competition, which improves the nDCG@20 to 0.8900; the 3.7B-parameter version later surpassed this by 0.0112 points. We also tried translating the corpus and queries into English and using monoT5-3B (English-only) finetuned on the competition data, but it did not outperform its multilingual counterpart.
Table 1. nDCG@20 on the public and private test sets.

| Model | Public | Private |
| --- | --- | --- |
| monoT5-3B (dataset translated to En) | 0.8750 | - |
| mMonoT5-580M (mMARCO only) | 0.8640 | - |
| mMonoT5-3.7B (our best submission) | 0.9012 | 0.9007 |
| First place (team www) | 0.9057 | 0.9043 |
| 20th place (team we666) | 0.8933 | 0.8929 |
We described a boring but effective approach based on a multilingual variant of monoT5 that achieved competitive results in the product ranking task of the Amazon KDD Cup 2022.
References

- MS MARCO: a human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268. Cited by: §3.
- A transformer-based embedding model for personalized product search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1521–1524. Cited by: §1.
- mMARCO: a multilingual version of the MS MARCO passage ranking dataset. arXiv preprint arXiv:2108.13897. Cited by: §3.
- Semantic product search for matching structured product catalogs in e-commerce. arXiv preprint arXiv:2008.08180. Cited by: §1.
- Alibaba DAMO Academy at TREC Precision Medicine 2020: state-of-the-art evidence retriever for precision medicine with expert-in-the-loop active learning. In TREC. Cited by: §1.
- COLIEE 2022 summary: methods for legal document retrieval and entailment. Proceedings of the Sixteenth International Workshop on Juris-informatics (JURISIN 2022). Cited by: §2.
- Pretrained transformers for text ranking: BERT and beyond. Synthesis Lectures on Human Language Technologies 14 (4), pp. 1–325. Cited by: §1.
- Retrieving legal cases from a large-scale candidate corpus. Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment, COLIEE2021. Cited by: §1.
- SLEDGE-Z: a zero-shot baseline for COVID-19 literature search. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4171–4179. Cited by: §1.
- Navigation-based candidate expansion and pretrained language models for citation recommendation. Scientometrics 125 (3), pp. 3001–3016. Cited by: §1.
- Document ranking with a pretrained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 708–718. Cited by: §2, §3.
- VERA: prediction techniques for reducing harmful misinformation in consumer health search. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2066–2070. Cited by: §1.
- H2oloo at TREC 2020: when all you got is a hammer… deep learning, health misinformation, and precision medicine. Cited by: §2.
- Summary of the competition on legal information extraction/entailment (COLIEE) 2021. Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment. Cited by: §2.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 (140), pp. 1–67. Cited by: §3.
- Overview of the TREC 2019 precision medicine track. In Text REtrieval Conference (TREC) 26. Cited by: §2.
- No parameter left behind: how distillation and model size affect zero-shot retrieval. arXiv preprint arXiv:2206.02873. Cited by: §2.
- Billions of parameters are worth more than in-domain training data: a case study in the legal case entailment task. arXiv preprint arXiv:2205.15172. Cited by: §2.
- To tune or not to tune? Zero-shot models for legal case entailment. In ICAIL’21: Eighteenth International Conference on Artificial Intelligence and Law, June 21–25, 2021, São Paulo, Brazil. Cited by: §2.
- BEIR: a heterogeneous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663. Cited by: §2.
- Overview of the TREC 2004 robust track. In Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland, November 16–19, 2004. Cited by: §2.
- mT5: a massively multilingual pre-trained text-to-text transformer. Cited by: §3.
- Rapidly deploying a neural search engine for the COVID-19 open research dataset. In Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020. Cited by: §2.