
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

10/13/2022
by Jian Yang, et al.

Named entity recognition (NER) suffers from the scarcity of annotated training data, especially for low-resource languages without labeled data. Cross-lingual NER has been proposed to alleviate this issue by transferring knowledge from high-resource languages to low-resource languages via aligned cross-lingual representations or machine translation results. However, the performance of cross-lingual NER methods is severely limited by the unsatisfactory quality of translation or label projection. To address these problems, we propose a Cross-lingual Entity Projection framework (CROP) that enables zero-shot cross-lingual NER with the help of a multilingual labeled sequence translation model. Specifically, the target sentence is first translated into the source language and then tagged by a source-language NER model. We further adopt a labeled sequence translation model to project the tagged sequence back into the target language and label the raw target sentence. Finally, the whole pipeline is integrated into an end-to-end model by way of self-training. Experimental results on two benchmarks demonstrate that our method substantially outperforms the previous strong baselines by +3 to +7 F1 and achieves state-of-the-art performance.
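The translate–tag–project pipeline described in the abstract can be sketched as follows. This is a minimal illustration of the data flow only: the word-level "translation" tables and the gazetteer-based tagger are toy stand-ins, not the paper's multilingual translation or NER models.

```python
# Sketch of the CROP translate-tag-project pipeline on one sentence.
# All models below are toy dictionary stand-ins (assumptions for
# illustration), NOT the paper's actual components.

# Toy word-level translation tables (target language <-> source language).
TGT_TO_SRC = {"berlin": "Berlin", "ist": "is", "schoen": "beautiful"}
SRC_TO_TGT = {v: k for k, v in TGT_TO_SRC.items()}


def translate_to_source(target_tokens):
    """Step 1: translate the target sentence into the source language."""
    return [TGT_TO_SRC.get(tok, tok) for tok in target_tokens]


def source_ner(tokens):
    """Step 2: tag the source-language sentence with a source NER model
    (here: a toy gazetteer lookup producing BIO labels)."""
    gazetteer = {"Berlin": "B-LOC"}
    return [(tok, gazetteer.get(tok, "O")) for tok in tokens]


def labeled_sequence_translate(tagged_tokens):
    """Step 3: project the tagged sequence back into the target language,
    keeping each token's label attached (the paper trains a dedicated
    labeled sequence translation model for this step)."""
    return [(SRC_TO_TGT.get(tok, tok), label) for tok, label in tagged_tokens]


def pseudo_label(target_tokens):
    """Full pipeline: produce pseudo-labels for a raw target sentence;
    the pseudo-labeled data would then drive self-training of the
    end-to-end target-language NER model."""
    src_tokens = translate_to_source(target_tokens)
    tagged = source_ner(src_tokens)
    return labeled_sequence_translate(tagged)


print(pseudo_label(["berlin", "ist", "schoen"]))
# -> [('berlin', 'B-LOC'), ('ist', 'O'), ('schoen', 'O')]
```

In the actual framework each step is a learned model, and label projection is handled jointly with translation so that entity spans survive reordering, which a word-by-word sketch like this cannot capture.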



Code Repositories

CROP

[EMNLP 2022] CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

