Contextual Label Projection for Cross-Lingual Structure Extraction

09/16/2023
by Tanmay Parekh, et al.

Translating training data into target languages has proven beneficial for cross-lingual transfer. For structure extraction tasks, however, translating data requires a label projection step, which jointly translates the input text and obtains the translated labels within the translated text. Previous label projection methods mostly compromise translation quality, either by facilitating easy identification of translated labels in the translated text or by using word-level alignment between translation pairs to assemble translated phrase-level labels from the aligned words. In this paper, we introduce CLAP, which first translates the text into the target language and then performs contextual translation on the labels, using the translated text as context, to obtain more accurate translated labels. We use instruction-tuned language models with multilingual capabilities as the contextual translator, and impose the constraint that the translated labels must be present in the translated text via instructions. We compare CLAP with other label projection techniques for creating pseudo-training data in target languages on event argument extraction, a representative structure extraction task. Results show that CLAP improves over other methods by 2 to 2.5 F1 points on the Chinese and Arabic ACE05 datasets.
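The two-step idea in the abstract (translate the text first, then translate each label with the translated text as context, under the constraint that the label must appear in that text) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names `build_label_prompt` and `project_label`, and the `translator` callable (a stand-in for any instruction-tuned multilingual model) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple


def build_label_prompt(translated_text: str, label: str, target_lang: str) -> str:
    """Build a hypothetical instruction for contextual label translation.

    The wording is an assumption; the paper's actual prompt is not given
    in the abstract.
    """
    return (
        f'Translate the phrase "{label}" into {target_lang}. '
        f"The translation must appear verbatim in this {target_lang} text:\n"
        f"{translated_text}"
    )


def project_label(
    translated_text: str,
    label: str,
    target_lang: str,
    translator: Callable[[str], str],
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of the translated label.

    `translator` is a hypothetical callable that sends a prompt to an
    instruction-tuned model and returns its text response. Returns None
    when the model's answer does not occur in the translated text, i.e.
    when the presence constraint is violated.
    """
    prompt = build_label_prompt(translated_text, label, target_lang)
    candidate = translator(prompt).strip()
    if not candidate:
        return None
    start = translated_text.find(candidate)
    if start == -1:
        return None
    return start, start + len(candidate)
```

In this sketch the instruction only asks the model to respect the constraint, so the substring check is an extra safeguard: a label whose translation cannot be located is reported as `None` and can be dropped or retried by the caller. Whether CLAP applies such a post-hoc check is not stated in the abstract.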


