T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks

12/20/2022
by Iker García-Ferrero, et al.

When labeled data is not readily available for a given task and language, annotation projection has been proposed as one strategy for automatically generating annotated data that can then be used to train supervised systems. Annotation projection is often formulated as the task of projecting labels from a source language into a target language over parallel corpora. In this paper we present T-Projection, a new approach to annotation projection that leverages large pretrained text2text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) a candidate generation step, in which a multilingual T5 model generates a set of projection candidates, and (ii) a candidate selection step, in which the candidates are ranked by their translation probabilities. We evaluate our method on three downstream tasks and five different languages. Our results show that T-Projection improves the average F1 score of previous methods by more than 8 points.
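
As a rough illustration of the candidate selection step only, the sketch below ranks a handful of projection candidates for one labeled source span. It is not the authors' implementation: the names `rank_candidates` and `make_toy_scorer`, the tiny word lexicon, and the example sentence are all hypothetical, and the lexicon-based score merely stands in for the translation probabilities that the abstract says come from machine translation technology. The candidates are written by hand here, whereas in T-Projection they would be generated by a multilingual T5 model.

```python
import math
from typing import Callable, Dict, List, Tuple

# A scorer maps (source span, candidate target span) to a log-probability-like score.
Scorer = Callable[[str, str], float]


def rank_candidates(
    source_span: str, candidates: List[str], scorer: Scorer
) -> List[Tuple[str, float]]:
    """Score every candidate against the source span and sort best-first."""
    scored = [(cand, scorer(source_span, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_toy_scorer(
    lexicon: Dict[Tuple[str, str], float], floor: float = 1e-4
) -> Scorer:
    """Build a toy scorer from word-level translation probabilities.

    ASSUMPTION: this lexicon lookup is a stand-in for a real MT model;
    unseen word pairs receive the small probability `floor`.
    """

    def score(src: str, tgt: str) -> float:
        src_words = src.lower().split()
        tgt_words = tgt.lower().split()
        if not src_words or not tgt_words:
            return float("-inf")
        # Forward: each target word is explained by its best source word.
        forward = [
            max(lexicon.get((s, t), floor) for s in src_words) for t in tgt_words
        ]
        # Backward: each source word must be covered by some target word.
        backward = [
            max(lexicon.get((s, t), floor) for t in tgt_words) for s in src_words
        ]
        logs = [math.log(p) for p in forward + backward]
        # Average so that longer candidates are not penalized for length alone.
        return sum(logs) / len(logs)

    return score


# Hypothetical example: project the English span "New York" into the
# Spanish sentence "Vivo en Nueva York desde 2010".
toy_lexicon = {("new", "nueva"): 0.9, ("york", "york"): 0.95}
candidates = ["York", "Nueva York", "Vivo en Nueva York"]

ranked = rank_candidates("New York", candidates, make_toy_scorer(toy_lexicon))
best_candidate = ranked[0][0]  # "Nueva York" under this toy lexicon
print(best_candidate)
```

Scoring in both directions is a design choice of this sketch: the forward term penalizes candidates that contain extra words ("Vivo en Nueva York"), and the backward term penalizes candidates that drop part of the source span ("York").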

research
10/03/2017

Transferring Semantic Roles Using Translation and Syntactic Information

Our paper addresses the problem of annotation projection for semantic ro...
research
05/08/2023

MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset

Relation extraction (RE) is a fundamental task in information extraction...
research
09/16/2023

Contextual Label Projection for Cross-Lingual Structure Extraction

Translating training data into target languages has proven beneficial fo...
research
07/08/2017

Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection

The state-of-the-art named entity recognition (NER) systems are supervis...
research
11/23/2021

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages

Developing Named Entity Recognition (NER) systems for Indian languages h...
research
08/14/2022

Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU

Multilingual Neural Machine Translation has been showing great success u...
research
05/14/2016

Capturing divergence in dependency trees to improve syntactic projection

Obtaining syntactic parses is a crucial part of many NLP pipelines. Howe...
