Unsupervised domain-agnostic identification of product names in social media posts

12/11/2018
by Nicolai Pogrebnyakov, et al.

Product name recognition is a significant practical problem, spurred by the greater availability of platforms for discussing products, such as social media and the product review features of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as the sentiment surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and has used supervised or semi-supervised methods. With massive numbers of new products released to the market every year, such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.
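To make the two-step structure concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation: the regular expression is a hypothetical stand-in for the CRF model, part-of-speech tagging and patterns of step (a), and the greedy seed-based grouping over caller-supplied vectors is an assumed stand-in for the clustering over word embeddings of step (b). The names `find_candidates`, `filter_candidates`, `CANDIDATE_RE` and the `threshold` value are all illustrative.

```python
import math
import re
from collections import Counter

# Assumption: a capitalized word followed by one or more capitalized words or
# model-like tokens containing digits (e.g. "Galaxy S9"). This regex only
# illustrates step (a); the paper uses a pretrained CRF, POS tags and patterns.
CANDIDATE_RE = re.compile(
    r"\b[A-Z][a-z]+(?:\s(?:[A-Z][a-z]+|[A-Za-z]*\d+[A-Za-z]*))+"
)


def find_candidates(posts):
    """Step (a): count candidate product names across a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(m.group(0) for m in CANDIDATE_RE.finditer(post))
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(vectors, threshold=0.8):
    """Step (b): group candidates by vector similarity, keep the largest group.

    `vectors` maps candidate name -> embedding (assumed to be supplied by the
    caller, e.g. learned from the posts). Each candidate joins the first
    cluster whose seed (first member) is similar enough, else starts a new one.
    """
    clusters = []
    for name, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return set(max(clusters, key=len)) if clusters else set()


if __name__ == "__main__":
    posts = ["I love my Galaxy S9", "my new Pixel 3 is great"]
    print(find_candidates(posts))
    # Toy two-dimensional vectors, invented purely for illustration.
    toy_vectors = {
        "Galaxy S9": [1.0, 0.1],
        "Pixel 3": [0.9, 0.2],
        "Happy Monday": [0.0, 1.0],
    }
    print(filter_candidates(toy_vectors))
```

Keeping only the largest cluster assumes that genuine product names dominate the candidates and occupy a similar region of the embedding space. That assumption belongs to this sketch, not to the paper.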

