Cross-Language Question Re-Ranking

10/04/2017
by Giovanni Da San Martino, et al.

We study how to find relevant questions in community forums when the language of the new question differs from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that the performance of both approaches degrades when they work on translated text. The drop is largest for the kernel-based system, which depends heavily on a syntactic kernel. We address this issue with a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. With these embeddings, the results also come close to those of the monolingual neural network. Overall, the kernel system outperforms the neural network in all cases.
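To make the idea of representing the Arabic input and the English related questions "in the same space" concrete, here is a minimal sketch of re-ranking by cosine similarity over averaged word vectors. It is not the system described above, which feeds the embeddings to a feed-forward neural network. The tiny `EMBEDDINGS` table, its values, and the helper names `embed`, `cosine`, and `rerank` are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional vectors standing in for cross-language embeddings.
# Every word and value here is made up for illustration; real embeddings
# would be trained on a parallel corpus and have far more dimensions.
EMBEDDINGS = {
    "تأشيرة": [0.90, 0.10, 0.00],  # Arabic: "visa"
    "تجديد": [0.70, 0.30, 0.10],   # Arabic: "renewal"
    "visa": [0.88, 0.12, 0.02],
    "renew": [0.72, 0.28, 0.08],
    "restaurant": [0.05, 0.10, 0.95],
    "best": [0.10, 0.50, 0.60],
}


def embed(tokens):
    """Average the vectors of the known tokens; return None if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_tokens, candidates):
    """Sort candidate questions (token lists) by similarity to the query."""
    q = embed(query_tokens)
    scored = []
    for cand in candidates:
        c = embed(cand)
        score = cosine(q, c) if q is not None and c is not None else 0.0
        scored.append((score, cand))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    arabic_query = ["تجديد", "تأشيرة"]
    english_candidates = [["best", "restaurant"], ["renew", "visa"]]
    for score, cand in rerank(arabic_query, english_candidates):
        print(round(score, 3), " ".join(cand))
```

Because both languages share one vector space, the Arabic query can be scored against English candidates without translating either side. A learned model would replace the plain cosine score in a realistic setup.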


