Corpus Statistics in Text Classification of Online Data

03/16/2018
by Marina Sokolova, et al.

The transformation of Machine Learning (ML) from a boutique science into a generally accepted technology has increased the importance of reproduction and transportability of ML studies. In the current work, we investigate how corpus characteristics of textual data sets correspond to text classification results. We work with two data sets gathered from sub-forums of an online health-related forum. Our empirical results are obtained for a multi-class sentiment analysis application.

