A Clustering-Based Combinatorial Approach to Unsupervised Matching of Product Titles

by Leonidas Akritidis, et al.

The constant growth of the e-commerce industry has rendered the problem of product retrieval particularly important. As more enterprises move their activities to the Web, the volume and diversity of product-related information increase quickly. These factors make it difficult for users to identify and compare the features of their desired products. Recent studies have shown that standard similarity metrics cannot effectively identify identical products, since similar titles often refer to different products and vice versa. Other studies employed external data sources (search engines) to enrich the titles; these solutions are rather impractical, mainly because fetching the external data is slow. In this paper we introduce UPM, an unsupervised algorithm for matching products by their titles. UPM is independent of any external sources, since it analyzes the titles and extracts combinations of words from them. These combinations are evaluated according to several criteria, and the most appropriate one constitutes the cluster into which a product is classified. UPM is also parameter-free, avoids pairwise product comparisons, and includes a post-processing verification stage that corrects erroneous matches. The experimental evaluation of UPM demonstrated its superiority over state-of-the-art approaches in terms of both efficiency and effectiveness.
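
To make the combinatorial idea concrete, the following is a minimal Python sketch of clustering titles by word combinations. It is not the UPM algorithm itself: the abstract does not state the evaluation criteria, so the score used here (combination frequency multiplied by the squared combination length) is an assumption chosen purely for illustration, as are the function names `tokenize`, `candidate_combinations`, and `cluster_titles` and the `max_size` cap. The post-processing verification stage is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase a title, split on whitespace, and drop repeated tokens."""
    return list(dict.fromkeys(title.lower().split()))


def candidate_combinations(tokens, max_size=3):
    """Yield order-independent word combinations of size 1..max_size."""
    ordered = sorted(tokens)  # sorting makes each combination canonical
    for k in range(1, min(max_size, len(ordered)) + 1):
        yield from combinations(ordered, k)


def cluster_titles(titles, max_size=3):
    """Assign each title to the cluster named by its best-scoring combination.

    The score (frequency * length**2) is a hypothetical stand-in for the
    criteria mentioned in the abstract, which are not specified there.
    """
    # Pass 1: count in how many titles each combination occurs.
    freq = Counter()
    per_title = []
    for title in titles:
        combos = set(candidate_combinations(tokenize(title), max_size))
        per_title.append(combos)
        freq.update(combos)

    # Pass 2: pick the best combination per title; no title is ever
    # compared directly with another title.
    clusters = defaultdict(list)
    for title, combos in zip(titles, per_title):
        if not combos:  # empty title: nothing to cluster on
            continue
        best = max(combos, key=lambda c: (freq[c] * len(c) ** 2, c))
        clusters[best].append(title)
    return dict(clusters)


titles = [
    "Apple iPhone 8 64GB Gold",
    "iPhone 8 Apple 64GB",
    "Samsung Galaxy S9 64GB Black",
    "Galaxy S9 Samsung 64GB",
]
for label, members in cluster_titles(titles).items():
    print(label, members)
```

In this sketch the two iPhone titles share a three-word combination, as do the two Galaxy titles, so each pair lands in one cluster even though the word order differs and the token `64gb` appears in all four titles. Unlike UPM, which the abstract describes as parameter-free, the sketch needs `max_size` to bound the number of combinations generated per title.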


Improving Usability of User Centric Decision Making of Multi-Attribute Products on E-commerce Websites

The high number of products available makes it difficult for a user to f...

Semantic Product Search for Matching Structured Product Catalogs in E-Commerce

Retrieving all semantically relevant products from the product catalog i...

Identifying Substitute and Complementary Products for Assortment Optimization with Cleora Embeddings

Recent years brought an increasing interest in the application of machin...

Exploiting Knowledge Graphs for Facilitating Product/Service Discovery

Most of the existing techniques to product discovery rely on syntactic a...

Dropping diversity of products of large US firms: Models and measures

It is widely assumed that in our lifetimes the products available in the...

An Integrated System of Drug Matching and Abnormal Approval Number Correction

This essay is based on the joint project with 111, Inc. The pharmacy e-C...

Interpretable Methods for Identifying Product Variants

For e-commerce companies with large product selections, the organization...