An Attention Ensemble Approach for Efficient Text Classification of Indian Languages

02/20/2021
by   Atharva Kulkarni, et al.
The recent surge of complex attention-based deep learning architectures has led to extraordinary results in various downstream NLP tasks in the English language. However, such research for resource-constrained and morphologically rich Indian vernacular languages has been relatively limited. This paper presents team SPPU_AKAH's solution for TechDOfication 2020 subtask-1f, which focuses on coarse-grained technical domain identification of short text documents in Marathi, a Devanagari-script Indian language. Leveraging the large dataset provided, a hybrid CNN-BiLSTM attention ensemble model is proposed that competently combines the intermediate sentence representations generated by the convolutional neural network and the bidirectional long short-term memory, leading to efficient text classification. Experimental results show that the proposed model outperforms various baseline machine learning and deep learning models on the given task, achieving the best validation accuracy of 89.57% and F1-score of 0.8875. Furthermore, the solution was the best system submission for this subtask, with a test accuracy of 64.26% and F1-score of 0.6157, surpassing the performances of other teams as well as the baseline system provided by the organizers of the shared task.
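The abstract's core idea, a hybrid model that fuses intermediate representations from a CNN branch and a BiLSTM branch through an attention layer before classifying, can be sketched roughly as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: all dimensions, layer choices (single Conv1d, single BiLSTM layer, additive attention over the stacked branch outputs), and the class name are assumptions.

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttentionEnsemble(nn.Module):
    """Hypothetical sketch of a hybrid CNN-BiLSTM attention ensemble:
    intermediate per-token representations from a CNN branch and a BiLSTM
    branch are pooled jointly via a learned attention layer, then classified.
    All hyperparameters below are illustrative, not taken from the paper."""

    def __init__(self, vocab_size=5000, embed_dim=128, hidden=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # CNN branch: 1-D convolution over the token dimension,
        # padded so the sequence length is preserved
        self.conv = nn.Conv1d(embed_dim, 2 * hidden, kernel_size=3, padding=1)
        # BiLSTM branch: outputs 2*hidden features (forward + backward states)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Scalar attention score per time step, shared across both branches
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, tokens):                        # tokens: (batch, seq)
        e = self.embed(tokens)                        # (batch, seq, embed)
        c = torch.relu(self.conv(e.transpose(1, 2))).transpose(1, 2)
        r, _ = self.bilstm(e)                         # both: (batch, seq, 2h)
        # Stack the two branches along the time axis and attend over them
        h = torch.cat([c, r], dim=1)                  # (batch, 2*seq, 2h)
        w = torch.softmax(self.attn(h), dim=1)        # (batch, 2*seq, 1)
        pooled = (w * h).sum(dim=1)                   # (batch, 2h)
        return self.fc(pooled)                        # (batch, num_classes)

model = CNNBiLSTMAttentionEnsemble()
logits = model(torch.randint(0, 5000, (2, 10)))  # batch of 2 short texts
print(logits.shape)  # torch.Size([2, 4])
```

The design point the abstract emphasizes is that attention weights are learned over the *combined* intermediate representations, letting the classifier draw on local n-gram features (CNN) and long-range context (BiLSTM) simultaneously rather than averaging two independent predictions.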

