A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

04/29/2020
by Masaaki Nagata, et al.

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize word alignment as a collection of independent predictions, each from a token in the source sentence to a span in the target sentence. Because this is equivalent to a SQuAD v2.0 style question answering task, we solve it using multilingual BERT fine-tuned on manually created gold word alignment data. Adding the context of the token to the question greatly improved word alignment accuracy. In experiments on five word alignment datasets among Chinese, Japanese, German, Romanian, French, and English, the proposed method significantly outperformed previous supervised and unsupervised word alignment methods without using any bitexts for pretraining. For example, we achieved an F1 score of 86.7 on the Chinese-English data, which is 13.3 points higher than the previous state-of-the-art supervised methods.
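
To make the span-prediction framing concrete, the following is a minimal sketch of how one source token could be turned into a SQuAD v2.0 style example: the source sentence, with the token marked in context, acts as the question, and the target sentence acts as the context. The function name `build_span_query`, the `¶` marker, and the record layout are assumptions chosen for illustration and are not taken from the abstract.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_span_query(
    source_tokens: List[str],
    token_index: int,
    target_text: str,
    answer_span: Optional[Tuple[int, int]] = None,
    marker: str = "¶",  # hypothetical boundary marker, an assumption of this sketch
) -> Dict[str, Any]:
    """Build one SQuAD v2.0 style record for a single source token.

    The question is the whole source sentence with the queried token
    wrapped in markers, so the model sees the token in its context.
    answer_span is a (start, end) character range in target_text,
    or None when the token has no aligned span (unanswerable case).
    """
    if not 0 <= token_index < len(source_tokens):
        raise IndexError("token_index is outside the source sentence")

    # Wrap the queried token in markers, keeping the rest of the sentence.
    marked = (
        source_tokens[:token_index]
        + [marker, source_tokens[token_index], marker]
        + source_tokens[token_index + 1:]
    )
    question = " ".join(marked)

    if answer_span is None:
        # SQuAD v2.0 represents "no answer" with empty answer lists.
        answers = {"text": [], "answer_start": []}
    else:
        start, end = answer_span
        answers = {"text": [target_text[start:end]], "answer_start": [start]}

    return {
        "question": question,
        "context": target_text,
        "answers": answers,
        "is_impossible": answer_span is None,
    }


# Example: align the English token "cats" to a span in a German sentence.
example = build_span_query(["I", "like", "cats"], 2, "Ich mag Katzen", (8, 14))
```

Each source token yields one such record, and the predictions are made independently of one another, which matches the "collection of independent predictions" described above. Records of this shape could then be fed to a question answering fine-tuning setup for multilingual BERT.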


research
02/03/2021

Bootstrapping Multilingual AMR with Contextual Word Alignments

We develop high performance multilingual Abstract Meaning Representation ...
research
10/01/2021

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

In this paper, we propose a span labeling approach to model n-gram infor...
research
04/29/2020

Bilingual Text Extraction as Reading Comprehension

In this paper, we propose a method to extract bilingual texts automatica...
research
01/12/2022

PromptBERT: Improving BERT Sentence Embeddings with Prompts

The poor performance of the original BERT for sentence semantic similari...
research
09/21/2023

Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

We present the joint contribution of Unbabel and Instituto Superior Técn...
research
08/21/2021

Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment

Document alignment techniques based on multilingual sentence representat...
research
07/01/2020

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

We introduce a novel paraphrastic augmentation strategy based on sentenc...
