COVID-19-related Nepali Tweets Classification in a Low Resource Setting

10/11/2022
by Rabin Adhikari, et al.

Billions of people across the globe use social media platforms in their local languages to voice their opinions on topics related to the COVID-19 pandemic. Several organizations, including the World Health Organization, have developed automated social media analysis tools that classify COVID-19-related tweets into various topics. However, these tools, which help combat the pandemic, support very few languages, leaving many countries unable to benefit from them. While multi-lingual and low-resource language-specific tools are being developed, their coverage still needs to expand to languages such as Nepali. In this paper, we identify the eight most common COVID-19 discussion topics among the Twitter community using the Nepali language, set up an online platform to automatically gather Nepali tweets containing COVID-19-related keywords, classify the tweets into the eight topics, and visualize the results over time in a web-based dashboard. We compare the performance of two state-of-the-art multi-lingual language models for Nepali tweet classification: one generic (mBERT) and the other specific to the Nepali language family (MuRIL). Our results show that the models' relative performance depends on the data size, with MuRIL doing better for a larger dataset. The annotated data, models, and the web-based dashboard are open-sourced at https://github.com/naamiinepal/covid-tweet-classification.
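
The gather-and-visualize steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword list, the `(date, text, labels)` tuple layout, the placeholder topic names, and the assumption that a tweet may carry more than one topic label are all hypothetical, since the abstract does not specify them. The sketch keeps only tweets that contain a COVID-19-related keyword and counts their already-predicted topic labels per day, which is the kind of aggregate a time-series dashboard could plot.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical keyword list (Latin and Devanagari spellings); the keywords
# actually used by the authors are not given in the abstract.
COVID_KEYWORDS = ("covid", "कोभिड", "corona", "कोरोना")


def is_covid_related(text: str, keywords: Iterable[str] = COVID_KEYWORDS) -> bool:
    """Return True if the text contains any keyword, ignoring letter case."""
    lowered = text.casefold()
    return any(kw.casefold() in lowered for kw in keywords)


def daily_topic_counts(
    tweets: Iterable[Tuple[date, str, List[str]]],
) -> Dict[date, Counter]:
    """Count predicted topic labels per day for keyword-matching tweets.

    Each item is (posting date, tweet text, predicted topic labels).
    The labels are assumed to come from an upstream classifier; no model
    is run here.
    """
    counts: Dict[date, Counter] = defaultdict(Counter)
    for day, text, labels in tweets:
        if is_covid_related(text):
            counts[day].update(labels)
    return dict(counts)
```

Calling `daily_topic_counts` on a batch of classified tweets returns one `Counter` per day, which can be serialized directly for plotting. Substring matching is the simplest possible filter; a production collector would more likely rely on the platform's search API.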

research
10/31/2020

Leveraging Natural Language Processing to Mine Issues on Twitter During the COVID-19 Pandemic

The recent global outbreak of the coronavirus disease (COVID-19) has spr...
research
08/07/2020

Change-Point Analysis of Cyberbullying-Related Twitter Discussions During COVID-19

Due to the outbreak of COVID-19, users are increasingly turning to onlin...
research
09/08/2020

Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder

The novel corona-virus disease (also known as COVID-19) has led to a pan...
research
04/20/2021

Measuring Shifts in Attitudes Towards COVID-19 Measures in Belgium Using Multilingual BERT

We classify seven months' worth of Belgian COVID-related Tweets using mu...
research
03/19/2022

Multi-channel CNN to classify nepali covid-19 related tweets using hybrid features

Because of the current COVID-19 pandemic with its increasing fears among...
research
07/26/2021

IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

Emerged in Wuhan city of China in December 2019, COVID-19 continues to s...