
Bootstrapping Techniques for Polysynthetic Morphological Analysis

by William Lane, et al.

Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word. This complexity, together with a general scarcity of written data, poses a challenge to the development of natural language technologies. To address this challenge, we offer linguistically-informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language. We generate data from a finite state transducer (FST) to train an encoder-decoder model. We improve the model by "hallucinating" missing linguistic structure into the training data, and by resampling from a Zipf distribution to simulate a more natural distribution of morphemes. The best model accounts for all instances of reduplication in the test set and achieves an accuracy of 94.7%, an improvement over the FST baseline. This process demonstrates the feasibility of bootstrapping a neural morph analyzer from minimal resources.
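
The Zipf resampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `zipf_weights` and `zipf_resample`, the exponent `s`, and the placeholder root labels are all assumptions made for illustration. The idea shown is only the general one: items that an FST would generate roughly uniformly are redrawn so that the item at rank r appears with probability proportional to 1 / r^s.

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Normalized Zipf weights for ranks 1..n (proportional to 1 / rank**s)."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def zipf_resample(items, k, s=1.0, seed=0):
    """Draw k items with replacement; items earlier in the list are more likely.

    The order of `items` is treated as the frequency rank (an assumption;
    how ranks are assigned in practice is a separate design decision).
    """
    rng = random.Random(seed)
    weights = zipf_weights(len(items), s)
    return rng.choices(items, weights=weights, k=k)


# Hypothetical placeholder labels, not real Kunwinjku morphemes.
roots = ["root_a", "root_b", "root_c", "root_d", "root_e"]
sample = zipf_resample(roots, k=10000)
counts = Counter(sample)

for root in roots:
    print(root, counts[root])
```

In this sketch the resampled counts fall off with rank instead of being flat, which is the sense in which the training data would look closer to a natural distribution of morphemes.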

