Deep Contextualized Pairwise Semantic Similarity for Arabic Language Questions

09/19/2019
by Hesham Al-Bataineh, et al.

Question semantic similarity is a challenging and active research problem that is useful in many NLP applications, such as detecting duplicate questions on community question answering platforms like Quora. Arabic is considered an under-resourced language; it has many dialects and is morphologically rich. Together, these challenges make identifying semantically similar questions in Arabic even more difficult. In this paper, we introduce a novel approach to this problem and test it on two benchmarks: one for Modern Standard Arabic (MSA) and another for the 24 major Arabic dialects. We show that our new system outperforms state-of-the-art approaches by achieving 93… This is achieved by using contextualized word representations (ELMo embeddings) trained on a text corpus containing MSA and dialectal sentences. Combined with a pairwise fine-grained similarity layer, these representations help our question-to-question similarity model generalize its predictions to different dialects while being trained only on question-to-question MSA data.
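The abstract does not say how the pairwise fine-grained similarity layer is computed. As an illustration only, the following sketch assumes one common formulation: a token-by-token cosine similarity matrix between the contextualized embeddings of the two questions, followed by taking each token's best match in the other question and averaging the two directions. The toy 2-dimensional vectors stand in for ELMo outputs, and the names `pairwise_similarity` and `aligned_score` are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(q1: List[Vector], q2: List[Vector]) -> List[List[float]]:
    """Fine-grained matrix: entry [i][j] compares token i of q1 with token j of q2."""
    return [[cosine(a, b) for b in q2] for a in q1]


def aligned_score(sim: List[List[float]]) -> float:
    """Hypothetical symmetric score: average best match per token, in both directions."""
    best_for_q1 = [max(row) for row in sim]         # best q2 token for each q1 token
    best_for_q2 = [max(col) for col in zip(*sim)]   # best q1 token for each q2 token
    return 0.5 * (sum(best_for_q1) / len(best_for_q1)
                  + sum(best_for_q2) / len(best_for_q2))


if __name__ == "__main__":
    # Toy "contextualized" token vectors standing in for ELMo embeddings.
    q1 = [[1.0, 0.0], [0.0, 1.0]]  # two-token question
    q2 = [[1.0, 0.0]]              # one-token question matching q1's first token
    sim = pairwise_similarity(q1, q2)
    print(sim)                 # [[1.0], [0.0]]
    print(aligned_score(sim))  # 0.75
```

In a trained model, a matrix like `sim` would typically be passed to further learned layers rather than reduced to a single hand-crafted score; the averaging step here only shows how fine-grained token alignments can be summarized.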
