Hierarchical Text Classification of Urdu News using Deep Neural Network

07/07/2021
by Taimoor Ahmed Javed, et al.

Digital text on the internet is growing every day. Classifying a large and heterogeneous collection of data is very challenging and requires improved information processing methods to organize the text. One common approach to classifying a large corpus is hierarchical text classification, which assigns textual data to classes arranged in a hierarchical structure. Several approaches have been proposed for text classification, but most of this research has focused on English. This paper proposes a deep learning model for hierarchical text classification of Urdu news, together with a dataset of 51,325 sentences collected from 8 online news websites and belonging to three genres: Sports, Technology, and Entertainment. The objectives of this paper are twofold: (1) to develop a large human-annotated dataset of Urdu news for hierarchical text classification; and (2) to classify Urdu news hierarchically using our proposed LSTM-based model, Hierarchical Multi-layer LSTMs (HMLSTM). The model consists of two modules: a Text Representing Layer, which uses Word2vec embeddings to transform words into vectors, and the Urdu Hierarchical LSTM Layer (UHLSTML), an end-to-end, fully connected deep LSTM network that performs automatic feature learning, with one LSTM layer trained for each level of the class hierarchy. We performed extensive experiments on our self-created dataset, the Urdu News Dataset for Hierarchical Text Classification (UNDHTC). The results show that the proposed method is effective for hierarchical text classification, significantly outperforms the baseline methods, and also achieves good results compared to deep neural models.
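To make the "one LSTM layer per level of the class hierarchy" idea concrete, here is a minimal, untrained forward-pass sketch in pure Python. It is an illustration under stated assumptions, not the authors' HMLSTM: the level-2 subcategory names are hypothetical, random vectors stand in for Word2vec embeddings, the per-level LSTMs are assumed to be stacked (each level reads the previous level's hidden states), and each level gets its own softmax head. The abstract does not say how the model keeps child predictions consistent with parent predictions, so this sketch does not enforce that.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

class LSTMLayer:
    """Minimal LSTM forward pass (biases omitted for brevity, no training)."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        def mat():  # one gate matrix over the concatenated [x; h_prev] vector
            return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                    for _ in range(hidden_dim)]
        self.Wi, self.Wf, self.Wo, self.Wc = mat(), mat(), mat(), mat()

    def forward(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.Wi, z)]    # input gate
            f = [sigmoid(a) for a in matvec(self.Wf, z)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.Wo, z)]    # output gate
            g = [math.tanh(a) for a in matvec(self.Wc, z)]  # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs

# Hypothetical two-level hierarchy: genres from the paper, subcategories invented.
HIERARCHY = {
    "Sports": ["Cricket", "Football"],
    "Technology": ["Mobile", "Internet"],
    "Entertainment": ["Film", "Music"],
}
LEVELS = [list(HIERARCHY), [c for cs in HIERARCHY.values() for c in cs]]

class HierarchicalLSTMSketch:
    """One LSTM layer plus one softmax head per hierarchy level (assumed stacking)."""
    def __init__(self, emb_dim, hidden_dim, levels, seed=0):
        rng = random.Random(seed)
        self.levels = levels
        self.lstms, self.heads = [], []
        in_dim = emb_dim
        for labels in levels:
            self.lstms.append(LSTMLayer(in_dim, hidden_dim, rng))
            self.heads.append([[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                               for _ in labels])
            in_dim = hidden_dim  # the next level's LSTM reads this level's hidden states

    def predict(self, embeddings):
        seq, preds = embeddings, []
        for lstm, head, labels in zip(self.lstms, self.heads, self.levels):
            seq = lstm.forward(seq)
            probs = softmax(matvec(head, seq[-1]))  # classify from the last hidden state
            best = max(range(len(labels)), key=lambda k: probs[k])
            preds.append((labels[best], probs))
        return preds

rng = random.Random(1)
EMB_DIM = 8
# Random stand-in for the Word2vec vectors of a 5-token Urdu sentence.
sentence = [[rng.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(5)]
model = HierarchicalLSTMSketch(EMB_DIM, hidden_dim=6, levels=LEVELS)
for depth, (label, probs) in enumerate(model.predict(sentence), start=1):
    print(f"level {depth}: {label} (p={max(probs):.3f})")
```

Because the weights are random, the printed labels carry no meaning; the sketch only shows how per-level LSTM layers and classifiers could be wired together. In practice one would train such a model with a per-level cross-entropy loss in a framework such as PyTorch or Keras.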
