An Automated Text Categorization Framework based on Hyperparameter Optimization

04/06/2017
by Eric S. Tellez, et al.

A great variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as supervised learning problems and tackled with a text classifier. A text classifier consists of several subprocesses. Some of them are general enough to be applied to any supervised learning problem, whereas others are designed for a particular task and rely on complex, computationally expensive processes such as lemmatization and syntactic analysis. Contrary to these traditional approaches, we propose microTC, a minimalistic and broadly applicable system able to tackle text classification tasks independently of domain and language. It is composed of a few easy-to-implement text transformations, text representations, and a supervised learning algorithm. Together, these pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance on 20 datasets and achieved competitive results on the remaining 10. The datasets cover several problems, including topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, our approach allows the technology to be used even without knowledge of machine learning or natural language processing.
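The abstract describes a pipeline of text transformations, a text representation, and a supervised learner, with the configuration chosen automatically. The following is a minimal sketch of that idea, not microTC's actual code or API: every function name, the four-parameter search space, and the toy data are hypothetical. It assumes a nearest-centroid classifier over TF-IDF vectors as a stand-in for whichever learner the framework uses, and an exhaustive search over a tiny grid as a stand-in for its hyperparameter optimization.

```python
import itertools
import math
import re
from collections import Counter, defaultdict

# Hypothetical search space: two text transformations and two tokenizer sizes.
SPACE = {"lowercase": [False, True], "strip_punct": [False, True],
         "char_q": [0, 3], "word_n": [0, 1]}


def normalize(text, lowercase, strip_punct):
    """Apply the optional text transformations selected by the configuration."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def tokenize(text, char_q, word_n):
    """Emit character q-grams and word n-grams (a size of 0 disables that kind)."""
    toks = []
    if char_q:
        toks += [text[i:i + char_q] for i in range(len(text) - char_q + 1)]
    if word_n:
        words = text.split()
        toks += [" ".join(words[i:i + word_n]) for i in range(len(words) - word_n + 1)]
    return toks


def featurize(text, cfg):
    """Turn raw text into a bag of token counts under the given configuration."""
    clean = normalize(text, cfg["lowercase"], cfg["strip_punct"])
    return Counter(tokenize(clean, cfg["char_q"], cfg["word_n"]))


def train(texts, labels, cfg):
    """Build one L2-normalized TF-IDF centroid per class."""
    bags = [featurize(t, cfg) for t in texts]
    df = Counter(tok for bag in bags for tok in bag)  # document frequency
    idf = {tok: math.log((1 + len(bags)) / (1 + n)) + 1.0 for tok, n in df.items()}
    centroids = defaultdict(Counter)
    for bag, label in zip(bags, labels):
        for tok, tf in bag.items():
            centroids[label][tok] += tf * idf[tok]
    for vec in centroids.values():
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for tok in vec:
            vec[tok] /= norm
    return {"cfg": cfg, "idf": idf, "centroids": dict(centroids)}


def predict(model, text):
    """Return the class whose centroid has the largest dot product with the text."""
    bag = featurize(text, model["cfg"])
    query = {tok: tf * model["idf"][tok] for tok, tf in bag.items() if tok in model["idf"]}

    def score(label):
        centroid = model["centroids"][label]
        return sum(w * centroid.get(tok, 0.0) for tok, w in query.items())

    return max(sorted(model["centroids"]), key=score)


def search(train_set, valid_set):
    """Try every configuration and keep the one with the best validation accuracy."""
    best = None
    keys = sorted(SPACE)
    for values in itertools.product(*(SPACE[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if not cfg["char_q"] and not cfg["word_n"]:
            continue  # a configuration with no tokenizer produces no features
        model = train(train_set[0], train_set[1], cfg)
        hits = sum(predict(model, t) == y for t, y in zip(*valid_set))
        acc = hits / len(valid_set[0])
        if best is None or acc > best[0]:
            best = (acc, model)
    return best


# Toy data, invented purely for illustration.
TRAIN = (["win a FREE prize now", "free money, click here!",
          "meeting moved to monday", "see you at lunch tomorrow"],
         ["spam", "spam", "ham", "ham"])
VALID = (["click for a free prize", "lunch meeting on monday"], ["spam", "ham"])

accuracy, model = search(TRAIN, VALID)
print(accuracy, model["cfg"], predict(model, "free money"))
```

The point of the sketch is the shape of the approach: the user supplies only labeled text, and the choice of preprocessing and tokenization is treated as a search problem scored on held-out data, which is what lets the method be used without NLP expertise.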

research
11/09/2020

Bangla Text Classification using Transformers

Text classification has been one of the earliest problems in NLP. Over t...
research
10/13/2020

Language Networks: a Practical Approach

This manuscript provides a short and practical introduction to the topic...
research
11/29/2018

EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis

Sentiment analysis (SA) is a task related to understanding people's feel...
research
06/21/2019

Meta-learning of textual representations

Recent progress in AutoML has led to state-of-the-art methods (e.g., Au...
research
08/03/2023

Tag Prediction of Competitive Programming Problems using Deep Learning Techniques

In the past decade, the amount of research being done in the fields of m...
research
08/31/2018

A Supervised Learning Approach For Heading Detection

As the Portable Document Format (PDF) file format increases in popularit...
research
02/14/2018

Authorship Attribution Using the Chaos Game Representation

The Chaos Game Representation, a method for creating images from nucleot...