Multilingual Sentence Transformer as A Multilingual Word Aligner

01/28/2023
by Weikang Wang, et al.

Multilingual pretrained language models (mPLMs) have shown their effectiveness in multilingual word alignment induction. However, these methods usually start from mBERT or XLM-R. In this paper, we investigate whether the multilingual sentence Transformer LaBSE is a strong multilingual word aligner. This question is non-trivial: LaBSE is trained to learn language-agnostic sentence-level embeddings, whereas the alignment extraction task requires the more fine-grained word-level embeddings to be language-agnostic. We demonstrate that vanilla LaBSE outperforms other mPLMs currently used for the alignment task, and then propose to finetune LaBSE on parallel corpora for further improvement. Experimental results on seven language pairs show that our best aligner outperforms all varieties of previous state-of-the-art models. In addition, our aligner supports different language pairs in a single model, and even achieves new state-of-the-art results on zero-shot language pairs that do not appear in the finetuning process.
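The extraction step the abstract refers to is commonly done by comparing word-level embeddings across the two sentences and keeping mutually best-matching pairs. As a minimal sketch (not the paper's exact method), the following toy example assumes we already have one embedding vector per word from some encoder and applies the bidirectional-argmax ("intersection") heuristic over the cosine-similarity matrix:

```python
import numpy as np

def extract_alignments(src_vecs, tgt_vecs):
    """Return (source_index, target_index) pairs whose embeddings are
    mutual nearest neighbors under cosine similarity."""
    # Normalize rows so the dot product equals cosine similarity.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T                # shape (len_src, len_tgt)
    fwd = sim.argmax(axis=1)        # best target word for each source word
    bwd = sim.argmax(axis=0)        # best source word for each target word
    # Keep only pairs on which both directions agree.
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

# Toy check: three "word embeddings" per side, target is a permutation
# of the source, so the expected alignment is that permutation.
src = np.eye(3)
tgt = np.eye(3)[[2, 0, 1]]
print(extract_alignments(src, tgt))  # [(0, 1), (1, 2), (2, 0)]
```

In practice the vectors would come from a subword-level encoder (the paper studies LaBSE), with subword embeddings pooled back to words before this matching step; the intersection heuristic trades recall for precision compared with one-directional argmax.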

