Taxonomic Loss for Morphological Glossing of Low-Resource Languages

08/29/2023
by Michael Ginn, et al.

Morpheme glossing is a critical task in automated language documentation and can greatly benefit downstream applications. While state-of-the-art glossing systems perform very well for languages with large amounts of existing data, it is more difficult to create useful models for low-resource languages. In this paper, we propose a taxonomic loss function that exploits morphological information to improve morphological glossing when data is scarce. We find that although this loss function does not outperform a standard loss function with regard to single-label prediction accuracy, it produces better predictions when the top-n predicted labels are considered. We suggest that this property makes the taxonomic loss function useful in a human-in-the-loop annotation setting.
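
To make the idea of a hierarchy-aware loss and top-n evaluation concrete, here is a minimal sketch in Python. It is not the paper's formulation, which the abstract does not spell out. It assumes a hypothetical two-level taxonomy (the `PARENT` mapping and its labels are invented for illustration) and adds a coarse-category term to ordinary cross-entropy, so that a wrong guess inside the correct category is penalized less than a wrong guess outside it. The `coarse_weight` parameter and both function names are likewise assumptions.

```python
import math

# Hypothetical taxonomy (an assumption, not taken from the paper):
# each fine-grained gloss label belongs to exactly one coarse category.
PARENT = {"1SG": "person", "2SG": "person", "PST": "tense", "FUT": "tense"}
LABELS = list(PARENT)  # fixed label order matching the score vector


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def taxonomic_loss(scores, gold, coarse_weight=1.0):
    """Cross-entropy on the fine label plus a weighted cross-entropy
    on the gold label's coarse category.

    The coarse probability is the summed probability of all labels
    sharing the gold label's parent. With coarse_weight=0 this reduces
    to standard cross-entropy.
    """
    probs = dict(zip(LABELS, softmax(scores)))
    fine_nll = -math.log(probs[gold])
    coarse_prob = sum(
        p for label, p in probs.items() if PARENT[label] == PARENT[gold]
    )
    coarse_nll = -math.log(coarse_prob)
    return fine_nll + coarse_weight * coarse_nll


def top_n_hit(scores, gold, n):
    """Return True if the gold label is among the n highest-scoring labels."""
    ranked = sorted(zip(scores, LABELS), reverse=True)
    return gold in [label for _, label in ranked[:n]]


if __name__ == "__main__":
    gold = "2SG"
    same_category_error = [3.0, 0.0, 0.0, 0.0]   # model prefers 1SG (person)
    other_category_error = [0.0, 0.0, 3.0, 0.0]  # model prefers PST (tense)
    print(taxonomic_loss(same_category_error, gold))
    print(taxonomic_loss(other_category_error, gold))
    print(top_n_hit([2.0, 1.5, 0.1, -1.0], gold, 1))
    print(top_n_hit([2.0, 1.5, 0.1, -1.0], gold, 2))
```

Under these assumptions, both error cases give the gold label the same probability, but the same-category error receives the lower loss. This is one way a model could be nudged to rank related labels near the top, which is what matters when an annotator picks from the top-n suggestions.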


research
10/06/2020

Tackling the Low-resource Challenge for Canonical Segmentation

Canonical morphological segmentation consists of dividing words into the...
research
03/16/2022

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

Automatic morphological processing can aid downstream natural language p...
research
11/29/2018

Tuplemax Loss for Language Identification

In many scenarios of a language identification task, the user will speci...
research
08/16/2019

Pushing the Limits of Low-Resource Morphological Inflection

Recent years have seen exceptional strides in the task of automatic morp...
research
10/26/2022

Modeling the Graphotactics of Low-Resource Languages Using Sequential GANs

Generative Adversarial Networks (GANs) have been shown to aid in the cre...
research
09/26/2019

On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages

Recent work has validated the importance of subword information for word...
research
08/29/2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

We introduce DsDs: a cross-lingual neural part-of-speech tagger that lea...
