Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

04/28/2020
by Katharina Kann, et al.

Part-of-speech (POS) taggers for low-resource languages that rely exclusively on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly evaluated only on languages that are very different from truly low-resource languages, and the taggers use sources of information, such as high-coverage and almost error-free dictionaries, which are unlikely to be available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model tags fewer than half of the words correctly. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.
