LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes

04/19/2021
by   Kartik Mehta, et al.

In this paper, we present LaTeX-Numeric, a high-precision, fully automated, scalable framework for extracting E-commerce numeric attributes from product text such as product descriptions. Most past work on attribute extraction is not scalable because it relies on manually curated training data, with or without active learning. We rely on distant supervision for training data generation, removing the dependency on manual labels. One issue with distant supervision is that it produces incomplete training annotations because of missing attribute values during matching. We propose a multi-task learning architecture to deal with missing labels in the training data, leading to an F1 improvement of 9.2%. Although this architecture benefits both numeric and non-numeric attributes, we also present automated techniques to further improve extraction models for numeric attributes. Numeric attributes require a list of units (or aliases) for better matching with distant supervision. We propose an automated algorithm for alias creation using product text and attribute values, leading to a 20.2% improvement. Extensive experiments on a real-world dataset covering 20 numeric attributes across 5 product categories and 3 English marketplaces show that LaTeX-Numeric achieves a high F1-score without any manual intervention, making it suitable for practical applications. Finally, we show that the improvements are language-agnostic: LaTeX-Numeric achieves a 13.9% improvement on Romance languages.
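
To make the distant-supervision step concrete, the sketch below labels a product description by matching a known attribute value followed by a unit alias. It is only an illustration under stated assumptions, not the paper's implementation: the function names `tokenize` and `distant_labels`, the B/I/O tagging scheme, and the example text and alias lists are all hypothetical.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: numbers (optionally decimal) and alphabetic words.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text)


def distant_labels(tokens, value, aliases):
    """Tag a number equal to `value` with "B" and a following unit alias with "I".

    All other tokens stay "O". This is an assumed labeling rule for
    illustration, not the matching logic described in the paper.
    """
    labels = ["O"] * len(tokens)
    alias_set = {alias.lower() for alias in aliases}
    target = float(value)
    for i in range(len(tokens) - 1):
        try:
            number = float(tokens[i])
        except ValueError:
            continue  # not a numeric token
        if number == target and tokens[i + 1].lower() in alias_set:
            labels[i] = "B"
            labels[i + 1] = "I"
    return labels


tokens = tokenize("Powerful 1200 watts motor with 2 speed settings")

# With a reasonably complete alias list, the value is matched.
full = distant_labels(tokens, "1200", ["w", "watt", "watts"])

# With an incomplete alias list, the same mention is silently left unlabeled.
partial = distant_labels(tokens, "1200", ["w"])
```

Here `full` marks `1200 watts` as the attribute mention, while `partial` leaves every token as "O". The second case shows, in miniature, why a richer alias list improves matching and why distantly supervised training data can contain missing labels.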

research
06/01/2023

Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes

E-commerce websites (e.g. Amazon) have a plethora of structured and unst...
research
06/01/2018

OpenTag: Open Attribute Value Extraction from Product Profiles

Extraction of missing attribute values is to find values describing an a...
research
06/12/2021

Scalable Approach for Normalizing E-commerce Text Attributes (SANTA)

In this paper, we present SANTA, a scalable framework to automatically n...
research
05/26/2023

Towards Open-World Product Attribute Mining: A Lightly-Supervised Approach

We present a new task setting for attribute mining on e-commerce product...
research
04/29/2022

OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

Automatic extraction of product attributes from their textual descriptio...
research
06/28/2022

Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

A key challenge in attribute value extraction (AVE) from e-commerce site...
research
09/16/2021

Efficient Attribute Injection for Pretrained Language Models

Metadata attributes (e.g., user and product IDs from reviews) can be inc...
