OpenTag: Open Attribute Value Extraction from Product Profiles

06/01/2018
by Guineng Zheng, et al.

Extraction of missing attribute values means finding values that describe an attribute of interest in free-text input. Most prior work on this task makes a closed-world assumption, with the possible set of values known beforehand, or relies on dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do this with limited human annotation or supervision? We study this problem in the context of product catalogs, which often have missing values for many attributes of interest. In this work, we leverage product profile information such as titles and descriptions to discover missing values of product attributes. We develop a novel deep tagging model, OpenTag, for this extraction problem with the following contributions: (1) we formalize the problem as a sequence tagging task and propose a joint model that uses recurrent neural networks (specifically, a bidirectional LSTM) to capture context and semantics, and Conditional Random Fields (CRF) to enforce tagging consistency; (2) we develop a novel attention mechanism to provide interpretable explanations for our model's decisions; (3) we propose a novel sampling strategy based on active learning to reduce the burden of human annotation. Unlike prior work, OpenTag does not use any dictionary or hand-crafted features. Extensive experiments on real-life datasets in different domains show that OpenTag with our active learning strategy discovers new attribute values from as few as 150 annotated samples (a 3.3x reduction in annotation effort) with a high F-score of 83%, outperforming state-of-the-art models.
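
To make the sequence tagging formulation concrete, the sketch below shows how per-token tags could be turned back into attribute values. It is only an illustration: the abstract does not specify the tag set, so the B/I/O/E scheme (begin, inside, outside, end), the function name `extract_values`, and the example title and tags are all assumptions. The tags are written by hand here; in the described model they would come from the BiLSTM-CRF tagger, which is not reproduced.

```python
from typing import List


def extract_values(tokens: List[str], tags: List[str]) -> List[str]:
    """Group contiguous B/I/E-tagged tokens into attribute-value strings.

    Assumed tag scheme (hypothetical, not stated in the abstract):
    B = begin of a value, I = inside, E = end, O = outside any value.
    """
    values: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new value starts; flush any value still being built.
            if current:
                values.append(" ".join(current))
            current = [token]
        elif tag in ("I", "E") and current:
            current.append(token)
            if tag == "E":
                values.append(" ".join(current))
                current = []
        else:
            # "O", or a stray I/E with no open value: close any open value.
            if current:
                values.append(" ".join(current))
                current = []
    if current:
        values.append(" ".join(current))
    return values


# Hypothetical product title with hand-written tags for a "flavor" attribute.
title = ["beef", "and", "ranch", "raised", "lamb", "dog", "food"]
tags = ["B", "O", "B", "I", "E", "O", "O"]
flavors = extract_values(title, tags)
print(flavors)
```

Because the values are read directly off the tagged tokens rather than looked up in a dictionary, a value never seen in training can still be extracted as long as the tagger marks its span, which is the sense in which the extraction is open.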

research
04/19/2021

LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes

In this paper, we present LaTeX-Numeric - a high-precision fully-automat...
research
08/15/2022

Exploring Generative Models for Joint Attribute Value Extraction from Product Titles

Attribute values of the products are an essential component in any e-com...
research
03/29/2018

Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce

Extracting accurate attribute qualities from product titles is a vital c...
research
04/29/2022

OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

Automatic extraction of product attributes from their textual descriptio...
research
08/15/2016

Attribute Extraction from Product Titles in eCommerce

This paper presents a named entity extraction system for detecting attri...
research
06/15/2020

Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data

Product catalogs are valuable resources for eCommerce website. In the ca...
research
03/09/2017

Information Extraction in Illicit Domains

Extracting useful entities and attribute values from illicit domains suc...
