Block-SCL: Blocking Matters for Supervised Contrastive Learning in Product Matching

07/05/2022
by Mario Almagro, et al.

Product matching is a fundamental step for the global understanding of consumer behavior in e-commerce. In practice, product matching refers to the task of deciding whether two product offers from different data sources (e.g. retailers) represent the same product. Standard pipelines use a preceding stage called blocking, in which a set of potential matching candidates is retrieved for a given product offer based on similar characteristics (e.g. same brand, category, flavor). Among these similar candidates, those that are not a match can be considered hard negatives. We present Block-SCL, a strategy that uses the blocking output to make the most of Supervised Contrastive Learning (SCL). Concretely, Block-SCL builds enriched batches using the hard-negative samples obtained in the blocking stage. These batches provide a strong training signal, leading the model to learn more meaningful sentence embeddings for product matching. Experimental results on several public datasets demonstrate that Block-SCL achieves state-of-the-art results despite using only short product titles as input, no data augmentation, and a lighter transformer backbone than competing methods.
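As a rough illustration of the idea in the abstract, the sketch below assembles one batch from a blocking result and scores it with a standard supervised contrastive loss. It is not the authors' implementation: the function names `build_block_batch` and `supcon_loss`, the data layout (a mapping from offer id to product id, plus a candidate list per offer), the random filling of leftover batch slots, and the temperature of 0.07 are all assumptions made for this example. In a real pipeline the embeddings would come from a transformer encoder applied to the product titles.

```python
import math
import random


def build_block_batch(anchor, candidates, labels, batch_size, seed=0):
    """Assemble one batch around `anchor` from its blocking candidates.

    labels:     dict mapping offer id -> product id (hypothetical layout).
    candidates: offer ids returned by the blocking stage for `anchor`.
    Candidates with the anchor's product id are positives; the remaining
    candidates are treated as hard negatives.
    """
    unique = [c for c in dict.fromkeys(candidates) if c != anchor]
    positives = [c for c in unique if labels[c] == labels[anchor]]
    hard_negatives = [c for c in unique if labels[c] != labels[anchor]]
    batch = ([anchor] + positives + hard_negatives)[:batch_size]

    # Assumption: leftover slots are filled with random offers from outside the block.
    pool = [o for o in labels if o not in batch]
    random.Random(seed).shuffle(pool)
    return batch + pool[: batch_size - len(batch)]


def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over one batch, in plain Python.

    embeddings: list of equal-length float vectors, one per batch item.
    labels:     product id of each batch item, in the same order.
    Items without any positive in the batch are skipped.
    """
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity between item i and every other item.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in others
        }
        # Numerically stable log-sum-exp over all other items in the batch.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    offer_labels = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
    print(build_block_batch("a1", ["a2", "b1"], offer_labels, batch_size=4))

    # Toy 2-d embeddings: the same positive pair with an easy vs. a hard negative.
    easy = supcon_loss([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]], ["A", "A", "B"])
    hard = supcon_loss([[1.0, 0.0], [1.0, 0.1], [1.0, 0.2]], ["A", "A", "B"])
    print(easy, hard)
```

With the toy vectors, the batch containing a negative that lies close to the anchor produces a larger loss than the batch with a distant negative, which is the sense in which hard negatives from blocking give a stronger training signal than randomly sampled ones.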

research
02/04/2022

Supervised Contrastive Learning for Product Matching

Contrastive learning has seen increasing success in the fields of comput...
research
03/06/2023

SC-Block: Supervised Contrastive Blocking within Entity Resolution Pipelines

The goal of entity resolution is to identify records in multiple dataset...
research
05/09/2023

Consistent Text Categorization using Data Augmentation in e-Commerce

The categorization of massive e-Commerce data is a crucial, well-studied...
research
03/01/2022

Two-Level Supervised Contrastive Learning for Response Selection in Multi-Turn Dialogue

Selecting an appropriate response from many candidates given the utteran...
research
04/08/2021

Deep Indexed Active Learning for Matching Heterogeneous Entity Representations

Given two large lists of records, the task in entity resolution (ER) is ...
research
05/31/2018

Skyblocking for Entity Resolution

In this paper, for the first time, we introduce the concept of skyblocki...
research
09/24/2018

An Empirical Study of the I2P Anonymity Network and its Censorship Resistance

Tor and I2P are well-known anonymity networks used by many individuals t...
