DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings

11/22/2020
by Muhammad Abdul-Mageed, et al.

Word embeddings are a core component of modern natural language processing systems, so evaluating them thoroughly is a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing Arabic word embeddings as well as new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be publicly available.
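The abstract does not spell out how the word pairs are scored. One common way to use relation word pairs for intrinsic evaluation is the vector-offset analogy test: given two pairs (a, b) and (c, d) from the same relation, check whether d is the nearest neighbour of b - a + c. The Python sketch below illustrates that idea under stated assumptions; the function name `analogy_accuracy`, the placeholder tokens, and the toy vectors are all hypothetical and are not taken from DiaLex or its evaluation code.

```python
import numpy as np


def normalize(vector):
    """Scale a vector to unit length."""
    return vector / np.linalg.norm(vector)


def analogy_accuracy(embeddings, pairs, top_k=1):
    """Vector-offset analogy accuracy over one relation.

    embeddings: dict mapping word -> 1-D numpy array (assumed format).
    pairs: list of (source, target) word pairs for a single relation.
    An analogy question a:b :: c:? is counted as correct when the
    expected word d is among the top_k neighbours of b - a + c.
    """
    vocab = list(embeddings)
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.stack([normalize(embeddings[word]) for word in vocab])

    correct = 0
    total = 0
    for a, b in pairs:
        for c, d in pairs:
            if (a, b) == (c, d):
                continue
            # Skip questions containing out-of-vocabulary words.
            if any(word not in index for word in (a, b, c, d)):
                continue
            query = normalize(matrix[index[b]] - matrix[index[a]] + matrix[index[c]])
            scores = matrix @ query  # cosine similarity to every word
            # Exclude the three question words from the candidates.
            for word in (a, b, c):
                scores[index[word]] = -np.inf
            best = np.argsort(-scores)[:top_k]
            correct += int(index[d] in best)
            total += 1
    return correct / total if total else 0.0


# Toy vectors with placeholder tokens (hypothetical, not real DiaLex data).
toy_embeddings = {
    "m1": np.array([1.0, 0.0, 0.0]),
    "f1": np.array([1.0, 0.0, 1.0]),
    "m2": np.array([0.0, 1.0, 0.0]),
    "f2": np.array([0.0, 1.0, 1.0]),
    "other": np.array([1.0, 1.0, 0.0]),
}
toy_pairs = [("m1", "f1"), ("m2", "f2")]
toy_score = analogy_accuracy(toy_embeddings, toy_pairs)
```

In this sketch, out-of-vocabulary questions are skipped rather than counted as errors; a benchmark could reasonably choose either convention, and the choice affects how scores compare across embedding models with different vocabularies.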


