Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

10/18/2022
by Ayyoob Imani, et al.

Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring labels from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source languages to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we align words for all language pairs and create a graph with words as nodes and alignment links as edges. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state of the art for unsupervised POS tagging of low-resource languages.
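
To make the graph formulation concrete, here is a minimal Python sketch of the data structure described in the abstract: words are nodes, alignment links are edges, and tags flow from labeled source words to unlabeled target words. This is an illustration under stated assumptions, not the authors' implementation. In particular, a simple majority vote over aligned neighbors stands in for the Graph Neural Network with transformer layers. The function names, the `(language, position)` node format, the toy alignments, and the placeholder target language `xx` are all hypothetical.

```python
from collections import Counter, defaultdict


def build_alignment_graph(alignments):
    """Build an undirected graph from word-alignment links.

    Each link is a pair of nodes; a node is a (language, word_position) tuple.
    """
    graph = defaultdict(set)
    for node_a, node_b in alignments:
        graph[node_a].add(node_b)
        graph[node_b].add(node_a)
    return graph


def propagate_labels(graph, source_tags, target_lang):
    """Give each target-language node the most frequent tag among its labeled neighbors.

    ASSUMPTION: majority voting replaces the learned GNN propagation of the paper.
    Ties are broken alphabetically so the result is deterministic.
    """
    projected = {}
    for node, neighbors in graph.items():
        if node[0] != target_lang:
            continue
        votes = Counter(source_tags[n] for n in neighbors if n in source_tags)
        if votes:
            # sorted() fixes the candidate order; max() keeps the first best one.
            projected[node] = max(sorted(votes), key=votes.get)
    return projected


# Toy example: one sentence in English, German, and an unlabeled target "xx".
alignments = [
    (("en", 0), ("de", 0)), (("en", 1), ("de", 1)), (("en", 2), ("de", 2)),
    (("en", 0), ("xx", 0)), (("de", 0), ("xx", 0)),
    (("en", 1), ("xx", 1)), (("de", 1), ("xx", 1)),
    (("en", 2), ("xx", 2)),
]
source_tags = {
    ("en", 0): "DET", ("en", 1): "NOUN", ("en", 2): "VERB",
    ("de", 0): "DET", ("de", 1): "NOUN", ("de", 2): "VERB",
}

graph = build_alignment_graph(alignments)
projected = propagate_labels(graph, source_tags, target_lang="xx")
```

In this sketch, `projected` maps each aligned target word to a tag, and those projected tags would then serve as a training set for a target-language tagger. Using several source languages gives each target word more than one vote, which is the intuition behind multilingual rather than single-source projection.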

