Towards Semantic Noise Cleansing of Categorical Data based on Semantic Infusion

02/06/2020
by Rishabh Gupta, et al.

Semantic noise significantly affects text analytics in domain-specific industries. It impedes text understanding, which is of prime importance in critical decision-making tasks. In this work, we formalize semantic noise as a sequence of terms that do not contribute to the narrative of the text. We look beyond the notion of standard, statistically derived stop words and consider the semantics of terms to exclude semantic noise. We present a novel Semantic Infusion technique to associate metadata with categorical corpus text and demonstrate its near-lossless nature. Based on this technique, we propose an unsupervised text-preprocessing framework that filters semantic noise using the context of the terms. Finally, we present evaluation results for the proposed framework on a web forum dataset from the automobile domain.

research
11/18/2020

Accelerating Text Mining Using Domain-Specific Stop Word Lists

Text preprocessing is an essential step in text mining. Removing words t...
research
05/15/2022

From Cognitive to Computational Modeling: Text-based Risky Decision-Making Guided by Fuzzy Trace Theory

Understanding, modelling and predicting human risky decision-making is c...
research
12/07/2010

A study on the relation between linguistics-oriented and domain-specific semantics

In this paper we dealt with the comparison and linking between lexical r...
research
10/09/2020

Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models

The growth of domain-specific applications of semantic models, boosted b...
research
10/03/2020

Integrating Categorical Semantics into Unsupervised Domain Translation

While unsupervised domain translation (UDT) has seen a lot of success re...
research
08/24/2019

DAST Model: Deciding About Semantic Complexity of a Text

Measuring of text complexity is a needed task in several domains and app...
research
06/18/2019

Mimicking Human Process: Text Representation via Latent Semantic Clustering for Classification

Considering that words with different characteristic in the text have di...
