Sinhala Sentence Embedding: A Two-Tiered Structure for Low-Resource Languages

10/26/2022
by Gihan Weeraprameshwara, et al.

In the process of numerically modeling natural languages, developing language embeddings is a vital step. However, it is challenging to develop functional embeddings for resource-poor languages such as Sinhala, for which sufficiently large corpora, effective language parsers, and other required resources are difficult to find. In such conditions, exploiting existing models to devise an efficacious embedding methodology for numerically representing text can be quite fruitful. This paper explores the effectiveness of several one-tiered and two-tiered embedding architectures for representing Sinhala text in the sentiment analysis domain. Our findings show that a two-tiered embedding architecture, in which the lower tier is a word embedding and the upper tier a sentence embedding, outperforms one-tier word embeddings, achieving a maximum F1 score of 88.04 in contrast to 83.76. Embeddings in the hyperbolic space are also developed and compared with Euclidean embeddings in terms of performance. A sentiment data set consisting of Facebook posts and associated reactions has been used for this research. To compare the performance of the different embedding systems fairly, the same deep neural network structure has been trained on the sentiment data, with each embedding system used in turn to encode the associated text.
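The two-tiered idea can be sketched in a few lines of NumPy: a lower tier that maps tokens to word vectors, and an upper tier that aggregates them into a single sentence vector. This is a minimal illustrative sketch, not the authors' model; the vocabulary, dimensions, and the mean-pool-plus-projection sentence encoder are all assumptions chosen for brevity (in practice the lower tier would be trained vectors such as fastText, and the upper tier a trained sentence encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; the paper works with Sinhala Facebook posts.
vocab = {"good": 0, "bad": 1, "movie": 2, "very": 3}
EMB_DIM, SENT_DIM = 8, 4

# Tier 1: word embedding lookup table (random here; would be trained vectors).
word_emb = rng.normal(size=(len(vocab), EMB_DIM))

# Tier 2: sentence embedding -- a learned projection over the mean of the
# word vectors, standing in for a trained sentence encoder.
W_sent = rng.normal(size=(EMB_DIM, SENT_DIM))

def embed_sentence(tokens):
    """Tokens -> word vectors (tier 1) -> one sentence vector (tier 2)."""
    ids = [vocab[t] for t in tokens if t in vocab]
    word_vecs = word_emb[ids]          # shape (n_words, EMB_DIM)
    pooled = word_vecs.mean(axis=0)    # aggregate the lower tier
    return np.tanh(pooled @ W_sent)    # shape (SENT_DIM,)

vec = embed_sentence(["very", "good", "movie"])
print(vec.shape)  # (4,)
```

The resulting fixed-length vector is what the downstream sentiment classifier consumes, which is why the same deep network can be reused across embedding systems in the comparison.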

Related research:

- 05/23/2018: Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages. Sentiment analysis in low-resource languages suffers from a lack of anno...
- 06/08/2020: CS-Embed-francesita at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis. The growing popularity and applications of sentiment analysis of social ...
- 11/30/2020: Blind signal decomposition of various word embeddings based on join and individual variance explained. In recent years, natural language processing (NLP) has become one of the...
- 03/24/2021: When Word Embeddings Become Endangered. Big languages such as English and Finnish have many natural language pro...
- 11/23/2020: Advancing Humor-Focused Sentiment Analysis through Improved Contextualized Embeddings and Model Architecture. Humor is a natural and fundamental component of human interactions. When...
- 05/09/2018: Learning Word Embeddings for Low-resource Languages by PU Learning. Word embedding is a key component in many downstream applications in pro...
- 07/25/2023: Word Sense Disambiguation as a Game of Neurosymbolic Darts. Word Sense Disambiguation (WSD) is one of the hardest tasks in natural l...
