A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification

08/11/2020
by Anna Glazkova, et al.

The authors compare oversampling methods for the problem of multi-class topic classification. One of the most popular oversampling methods is based on the SMOTE algorithm, which selects two examples of a minority class and generates a new example from them. The paper compares the basic SMOTE method with two of its modifications (Borderline SMOTE and ADASYN) and with random oversampling, using a text classification task as an example. The classifiers considered are the k-nearest neighbors (KNN) algorithm, the support vector machine (SVM) algorithm, and three types of neural networks (a feedforward network, a long short-term memory (LSTM) network, and a bidirectional LSTM). The authors combine these machine learning algorithms with different text representations and compare the synthetic oversampling methods. In most cases, oversampling can significantly improve classification quality. The authors conclude that, for this task, the quality of the KNN and SVM algorithms is more affected by class imbalance than that of the neural networks.
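To make the difference between random oversampling and SMOTE concrete, here is a minimal NumPy sketch of both ideas. It is not the authors' code. It assumes dense numeric feature vectors for a single minority class (for example, TF-IDF rows or averaged embeddings) and Euclidean distance, and the function names `random_oversample` and `smote` are hypothetical. Random oversampling duplicates existing minority examples, while basic SMOTE picks a minority example, picks one of its k nearest minority neighbours, and places a new point at a random position on the segment between them. Borderline SMOTE and ADASYN differ mainly in which base examples they favour and are not shown here.

```python
import numpy as np


def random_oversample(X, n_new, rng):
    """Random oversampling: duplicate randomly chosen minority rows."""
    idx = rng.integers(0, len(X), size=n_new)
    return X[idx].copy()


def smote(X, n_new, k=5, rng=None):
    """Basic SMOTE sketch (hypothetical helper, Euclidean distance assumed).

    For each synthetic point: choose a minority example, choose one of its
    k nearest minority neighbours, and interpolate between the two.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority examples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of k nearest

    base = rng.integers(0, n, size=n_new)  # first example of each pair
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # second example
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X[base] + gap * (X[nb] - X[base])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = rng.normal(size=(10, 3))  # toy minority class: 10 points in 3-D
    X_syn = smote(X_min, n_new=20, k=3, rng=rng)
    print(X_syn.shape)  # (20, 3)
```

Because every SMOTE point is a convex combination of two minority examples, it stays inside the region the minority class already occupies, whereas random oversampling only repeats existing rows. For real experiments, the imbalanced-learn library provides `SMOTE`, `BorderlineSMOTE`, `ADASYN`, and `RandomOverSampler` implementations.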

research
03/25/2020

Adversarial Multi-Binary Neural Network for Multi-class Classification

Multi-class text classification is one of the key problems in machine le...
research
10/10/2022

Predicting Blossom Date of Cherry Tree With Support Vector Machine and Recurrent Neural Network

Our project probes the relationship between temperatures and the blossom...
research
07/03/2022

Job Offers Classifier using Neural Networks and Oversampling Methods

Both policy and research benefit from a better understanding of individu...
research
12/01/2022

Inference of Media Bias and Content Quality Using Natural-Language Processing

Media bias can significantly impact the formation and development of opi...
research
01/18/2021

Classification of Pedagogical content using conventional machine learning and deep learning model

The advent of the Internet and a large number of digital technologies ha...
research
11/07/2016

AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification

Recently deep learning models have been shown to be capable of making rem...
research
07/17/2020

Training with reduced precision of a support vector machine model for text classification

This paper presents the impact of using quantization on the efficiency o...
