ProMap: Datasets for Product Mapping in E-commerce

09/13/2023
by Kateřina Macková, et al.

The goal of product mapping is to decide whether two listings from two different e-shops describe the same product. Existing datasets of matching and non-matching product pairs, however, often suffer from incomplete product information or contain only very distant non-matching products. Therefore, while predictive models trained on these datasets achieve good results on them, they are unusable in practice because they cannot distinguish very similar but non-matching pairs of products. This paper introduces two new datasets for product mapping: ProMapCz, consisting of 1,495 Czech product pairs, and ProMapEn, consisting of 1,555 English product pairs, both containing matching and non-matching products manually scraped from two pairs of e-shops. The datasets contain both images and textual descriptions of the products, including their specifications, making them among the most complete datasets for product mapping. Additionally, the non-matching products were selected in two phases, creating two types of non-matches: close non-matches and medium non-matches. Even the medium non-matches are pairs of products that are much more similar than the non-matches in other datasets; for example, they still need to have the same brand and a similar name and price. After simple data preprocessing, several machine learning algorithms were trained on these and two other datasets to demonstrate the complexity and completeness of the ProMap datasets. The ProMap datasets are presented as a gold standard for further research on product mapping that fills the gaps in existing datasets.
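To make the "same brand and similar name and price" criterion for medium non-matches concrete, the following is a minimal sketch of how such a candidate filter might look. It is not the authors' selection procedure: the `Listing` fields, the function names, the use of `difflib` for name similarity, and both thresholds (`min_name_sim`, `max_price_ratio`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Listing:
    # Hypothetical minimal record; the real datasets also carry
    # images, descriptions, and specifications.
    name: str
    brand: str
    price: float


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_similar_candidate(
    a: Listing,
    b: Listing,
    min_name_sim: float = 0.5,     # assumed threshold, not from the paper
    max_price_ratio: float = 1.5,  # assumed threshold, not from the paper
) -> bool:
    """Return True if the pair shares a brand and has a similar name and price.

    Such a pair is only a candidate: whether it is a match or a
    non-match still has to be decided by a human annotator.
    """
    if a.brand.lower() != b.brand.lower():
        return False
    if min(a.price, b.price) <= 0:
        return False  # cannot compare prices meaningfully
    price_ratio = max(a.price, b.price) / min(a.price, b.price)
    return (
        name_similarity(a.name, b.name) >= min_name_sim
        and price_ratio <= max_price_ratio
    )


phone_128 = Listing("Galaxy S21 128GB Black", "Samsung", 700.0)
phone_256 = Listing("Galaxy S21 256GB Black", "Samsung", 760.0)
print(is_similar_candidate(phone_128, phone_256))
```

A filter of this kind only narrows the pool to pairs that are hard to tell apart, which is the property the abstract says is missing from the distant non-matches in earlier datasets.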


research
05/31/2022

Multilingual Transformers for Product Matching – Experiments and a New Benchmark in Polish

Product matching corresponds to the task of matching identical products ...
research
02/10/2023

Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval

Same-style products retrieval plays an important role in e-commerce plat...
research
03/07/2019

A Clustering-Based Combinatorial Approach to Unsupervised Matching of Product Titles

The constant growth of the e-commerce industry has rendered the problem ...
research
10/07/2021

Cross-Language Learning for Entity Matching

Transformer-based matching methods have significantly moved the state-of...
research
04/01/2022

Unitail: Detecting, Reading, and Matching in Retail Scene

To make full use of computer vision technology in stores, it is required...
research
01/19/2023

Unposed: Unsupervised Pose Estimation based Product Image Recommendations

Product images are the most impressing medium of customer interaction on...
research
06/20/2017

A Bayesian algorithm for detecting identity matches and fraud in image databases

A statistical algorithm for categorizing different types of matches and ...
