Cross-Lingual Transfer for Distantly Supervised and Low-Resource Indonesian NER

07/25/2019
by   Fariz Ikhwantri, et al.

Manually annotated corpora for low-resource languages are usually small (gold) or large but distantly supervised (silver). Inspired by recent progress in injecting pre-trained language models (LMs) into many Natural Language Processing (NLP) tasks, we propose fine-tuning an LM pre-trained on a high-resource language for a low-resource language, to improve performance in both scenarios. Our experiments demonstrate significant improvement when fine-tuning a pre-trained LM in a cross-lingual transfer setting on the small gold corpus, and competitive results on the large silver corpus compared to supervised cross-lingual transfer, which is useful when no parallel annotation for the same task exists to begin with. We compare our proposed cross-lingual transfer using a pre-trained LM against other sources of transfer, such as a monolingual LM and Part-of-Speech (POS) tagging, on the downstream task over both the large silver and small gold NER datasets, exploiting the character-level input of the bi-directional language model.
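The recipe the abstract describes is: take a character-level bi-directional LM pre-trained on a high-resource language, attach a tagging head, and fine-tune the whole stack on the Indonesian NER data (gold or silver). Below is a minimal PyTorch sketch of that recipe; the encoder architecture, hyperparameters, checkpoint name ("english_char_bilm.pt"), and tag inventory are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: cross-lingual transfer for NER via a character-level bi-directional
# LM (ELMo-style). Checkpoint name and tag set are hypothetical placeholders.
import torch
import torch.nn as nn

class CharBiLMEncoder(nn.Module):
    """Character-level bi-directional LM encoder (ELMo-style)."""
    def __init__(self, n_chars=262, char_dim=16, hidden=512):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, hidden, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(hidden, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_chars_per_token)
        b, t, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, c)).transpose(1, 2)
        x = torch.relu(self.char_cnn(x)).max(dim=2).values  # one vector per token
        out, _ = self.bilstm(x.view(b, t, -1))              # contextual token reps
        return out

class NERTagger(nn.Module):
    """Pre-trained encoder plus a linear tag-classification head."""
    def __init__(self, encoder, n_tags):
        super().__init__()
        self.encoder = encoder
        self.proj = nn.Linear(encoder.bilstm.hidden_size * 2, n_tags)

    def forward(self, char_ids):
        return self.proj(self.encoder(char_ids))

# 1) Start from a biLM pre-trained on a high-resource language (e.g. English).
encoder = CharBiLMEncoder()
# encoder.load_state_dict(torch.load("english_char_bilm.pt"))  # assumed checkpoint

# 2) Fine-tune the full stack on the (small gold / large silver) Indonesian NER set.
model = NERTagger(encoder, n_tags=9)  # e.g. BIO tags for PER/LOC/ORG/MISC + O
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding positions

def train_step(char_ids, tag_ids):
    optim.zero_grad()
    logits = model(char_ids)  # (batch, seq_len, n_tags)
    loss = loss_fn(logits.view(-1, logits.size(-1)), tag_ids.view(-1))
    loss.backward()
    optim.step()
    return loss.item()
```

Because the encoder consumes characters rather than a fixed word vocabulary, the same pre-trained weights apply to Indonesian tokens without any vocabulary mapping, which is what makes this form of cross-lingual transfer possible even without parallel annotations.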


