Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training

This paper addresses the classification of Arabic text in Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language: few data sets are available, which limits the availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. We then train and evaluate transformer-based machine learning models. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
