Text classification dataset and analysis for Uzbek language

02/28/2023
by Elmurod Kuriyozov, et al.

Text classification is an important task in Natural Language Processing (NLP), where the goal is to categorise text data into predefined classes. In this study, we analyse the dataset creation steps and evaluation techniques for a multi-label news categorisation task as part of text classification. We first present a newly obtained dataset for Uzbek text classification, which was collected from 10 different news and press websites and covers 15 categories of news, press and law texts. We then present a comprehensive evaluation of different models on this dataset, ranging from traditional bag-of-words models to deep learning architectures. Our experiments show that the models based on Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) outperform the rule-based models. The best performance is achieved by BERTbek, a transformer-based BERT model trained on an Uzbek corpus. Our findings provide a baseline for further research in Uzbek text classification.
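To make the "traditional bag-of-words" end of the evaluated model range concrete, here is a minimal sketch of a bag-of-words classifier using multinomial Naive Bayes with Laplace smoothing. It is not the authors' pipeline: the toy English samples, the two labels, and the function names (tokenize, train, predict) are all hypothetical, and the sketch assigns a single label per text, whereas the abstract describes the task as multi-label.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, NOT taken from the Uzbek dataset described above.
TOY_SAMPLES = [
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("parliament passes new law", "law"),
    ("court reviews tax law", "law"),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real Uzbek text
    would likely need proper normalisation and tokenisation)."""
    return text.lower().split()

def train(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior under
    multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TOY_SAMPLES)
```

Calling predict(model, "football player goal") scores each class by its word counts alone, ignoring word order, which is the defining simplification of a bag-of-words model and the kind of baseline that the neural and transformer-based models are compared against.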

