CalibreNet: Calibration Networks for Multilingual Sequence Labeling

11/11/2020
by Shining Liang, et al.

Lack of training data in low-resource languages presents huge challenges to sequence labeling tasks such as named entity recognition (NER) and machine reading comprehension (MRC). One major obstacle is boundary errors in predicted answers. To tackle this problem, we propose CalibreNet, which predicts answers in two steps. In the first step, any existing sequence labeling method can be adopted as a base model to generate an initial answer. In the second step, CalibreNet refines the boundary of the initial answer. To address the scarcity of training data in low-resource languages, we develop a novel unsupervised phrase boundary recovery pre-training task that enhances the multilingual boundary detection capability of CalibreNet. Experiments on two cross-lingual benchmark datasets show that the proposed approach achieves state-of-the-art results on zero-shot cross-lingual NER and MRC tasks.
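The two-step prediction described above can be sketched in a minimal form. This is an illustrative toy, not the paper's implementation: `base_predict` stands in for any base sequence labeler proposing an initial answer span, and `calibrate` stands in for CalibreNet's refinement step, here approximated by re-scoring boundary positions within a small window around the initial span. All function names, score inputs, and the windowed search are assumptions for illustration.

```python
def base_predict(start_scores, end_scores):
    """Step 1: a base model proposes an initial span by picking the
    highest-scoring start position and the best end at or after it."""
    start = max(range(len(start_scores)), key=start_scores.__getitem__)
    end = max(range(start, len(end_scores)), key=end_scores.__getitem__)
    return start, end


def calibrate(span, boundary_scores, window=1):
    """Step 2 (hypothetical stand-in for the calibration network):
    refine each boundary within a small window around the initial
    span, using per-token boundary scores."""
    start, end = span
    n = len(boundary_scores)
    cand_starts = range(max(0, start - window), min(n, start + window + 1))
    cand_ends = range(max(0, end - window), min(n, end + window + 1))
    new_start = max(cand_starts, key=boundary_scores.__getitem__)
    new_end = max(cand_ends, key=boundary_scores.__getitem__)
    if new_end < new_start:  # keep the initial span if refinement degenerates
        return span
    return new_start, new_end
```

For example, if the base model proposes span (1, 3) but the boundary scores peak at token 2, the calibration step shifts the span toward the higher-confidence boundary. In the actual model, the calibration step is a learned network pre-trained with the unsupervised phrase boundary recovery task, rather than a fixed windowed argmax.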


