The Power of Communities: A Text Classification Model with Automated Labeling Process Using Network Community Detection

09/25/2019
by   Minjun Kim, et al.

Text classification is one of the most critical areas in machine learning and artificial intelligence research. It has been actively adopted in many business applications such as conversational intelligence systems, news article categorization, sentiment analysis, emotion detection systems, and many other recommendation systems in our daily life. One of the problems in supervised text classification is that model performance depends heavily on the quality of data labeling, which is typically done by humans. In this study, we propose a new network community detection-based approach to automatically label and classify text data into multiclass value spaces. Specifically, we build a network with sentences as the network nodes and pairwise cosine similarities between TF-IDF vector representations of the sentences as the network link weights. We use the Louvain method to detect the communities in the sentence network. We train and test support vector machine and random forest models on both the human-labeled data and the data labeled by network community detection. Results showed that models trained on the data labeled by network community detection outperformed the models trained on the human-labeled data by 2.68-3.75% in classification accuracy. Our method may help the development of more accurate conversational intelligence systems and other text classification systems.
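The labeling pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes scikit-learn and networkx (2.8 or later) are available, the toy sentences and the function name `label_by_communities` are hypothetical, and `LinearSVC` stands in for the support vector machine whose exact configuration the abstract does not give.

```python
from networkx import Graph
from networkx.algorithms.community import louvain_communities
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC


def label_by_communities(sentences, seed=0):
    """Assign each sentence the id of its Louvain community.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # TF-IDF vector representation of every sentence.
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(sentences)

    # Pairwise cosine similarities become the link weights.
    sim = cosine_similarity(tfidf)

    # Sentences are nodes; only pairs with positive similarity are linked.
    graph = Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sim[i, j] > 0:
                graph.add_edge(i, j, weight=float(sim[i, j]))

    # Louvain community detection on the weighted sentence network.
    communities = louvain_communities(graph, weight="weight", seed=seed)

    # The community id serves as the automatically generated class label.
    labels = [0] * len(sentences)
    for community_id, members in enumerate(communities):
        for node in members:
            labels[node] = community_id
    return labels, vectorizer, tfidf


# Toy corpus (assumed for illustration): two topics with no shared words.
sentences = [
    "refund my payment",
    "payment refund request",
    "my payment refund failed",
    "rain and wind tomorrow",
    "tomorrow rain forecast",
    "wind and rain forecast",
]

labels, vectorizer, tfidf = label_by_communities(sentences)

# Train a classifier on the community labels instead of human labels.
classifier = LinearSVC()
classifier.fit(tfidf, labels)

# Classify an unseen sentence using the same TF-IDF vocabulary.
predicted = classifier.predict(vectorizer.transform(["refund payment"]))[0]
print(labels, predicted)
```

In this sketch the two topics share no vocabulary, so they form disconnected parts of the network and cannot end up in the same community. On real data, choices such as a similarity threshold for links or the Louvain resolution would matter, and the abstract does not specify them.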

research
09/12/2020

Improving Indonesian Text Classification Using Multilingual Language Model

Compared to English, the amount of labeled data for Indonesian text clas...
research
02/05/2022

Improving Probabilistic Models in Text Classification via Active Learning

When using text data, social scientists often classify documents in orde...
research
03/10/2021

An Amharic News Text classification Dataset

In NLP, text classification is one of the primary problems we try to sol...
research
05/04/2023

Enhancing Pashto Text Classification using Language Processing Techniques for Single And Multi-Label Analysis

Text classification has become a crucial task in various fields, leading...
research
12/26/2019

Text Classification for Azerbaijani Language Using Machine Learning and Embedding

Text classification systems will help to solve the text clustering probl...
research
11/22/2019

Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features

Text classification is an important field of research from mid 90s up to...
research
08/11/2023

Weakly Supervised Text Classification on Free Text Comments in Patient-Reported Outcome Measures

Free text comments (FTC) in patient-reported outcome measures (PROMs) da...
