Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment

08/21/2021
by   Charith Rajitha, et al.
0

Document alignment techniques based on multilingual sentence representations have recently shown state of the art results. However, these techniques rely on unsupervised distance measurement techniques, which cannot be fined-tuned to the task at hand. In this paper, instead of these unsupervised distance measurement techniques, we employ Metric Learning to derive task-specific distance measurements. These measurements are supervised, meaning that the distance measurement metric is trained using a parallel dataset. Using a dataset belonging to English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific supervised distance learning metrics outperform their unsupervised counterparts, for document alignment.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset