Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages

04/17/2018
by Katharina Kann, et al.

Morphological segmentation for polysynthetic languages is challenging because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches, one with and one without the need for external unlabeled resources, and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train a single multilingual model for related languages while maintaining comparable or even improved performance, thus reducing the number of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research.
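The abstract gives no code, but a rough sketch may help illustrate how morphological segmentation is commonly cast as a character-level seq2seq problem, and how one multilingual model can be shared across related languages by prepending a language tag to the input. This is a minimal sketch under those assumptions, not the authors' implementation: the helper names (make_seq2seq_example, LANG_TAGS, SEG_MARK), the boundary symbol, and the English toy example are all hypothetical.

```python
# Minimal sketch (not the authors' code): surface morphological segmentation
# as a character-level seq2seq task. A single multilingual model can be
# trained by tagging each source sequence with its language and concatenating
# the training data of related languages. All names and data are illustrative.

LANG_TAGS = {"mexicanero": "<mex>", "nahuatl": "<nah>",
             "wixarika": "<wix>", "yorem_nokki": "<yor>"}

SEG_MARK = "!"  # arbitrary output-side symbol marking a morpheme boundary


def make_seq2seq_example(word, morphemes, lang=None):
    """Turn (word, list of morphemes) into (source, target) token sequences.

    Source: the characters of the word, optionally prefixed with a language
    tag so one multilingual model can condition on the language.
    Target: the same characters with a boundary symbol between morphemes.
    """
    src = list(word)
    if lang is not None:
        src = [LANG_TAGS[lang]] + src

    tgt = []
    for i, morpheme in enumerate(morphemes):
        if i > 0:
            tgt.append(SEG_MARK)
        tgt.extend(morpheme)
    return src, tgt


if __name__ == "__main__":
    # English toy example for readability; the paper's languages are
    # Mexicanero, Nahuatl, Wixarika and Yorem Nokki.
    src, tgt = make_seq2seq_example("untouchable", ["un", "touch", "able"],
                                    lang="nahuatl")
    print(" ".join(src))  # <nah> u n t o u c h a b l e
    print(" ".join(tgt))  # u n ! t o u c h ! a b l e
```

An encoder-decoder with attention (or any seq2seq architecture) can then be trained on such pairs; in the multilingual setting, pooling the tagged data of several related languages is what allows one shared model to stand in for several monolingual ones, which is where a parameter reduction on the order of 75% would come from.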
