Bootstrapping Techniques for Polysynthetic Morphological Analysis

05/03/2020
by   William Lane, et al.

Polysynthetic languages have exceptionally large and sparse vocabularies, owing to the number of morpheme slots in a word and the ways they can combine. This complexity, together with a general scarcity of written data, poses a challenge to the development of natural language technologies. To address this challenge, we offer linguistically informed approaches for bootstrapping a neural morphological analyzer and demonstrate its application to Kunwinjku, a polysynthetic Australian language. We generate data from a finite state transducer (FST) to train an encoder-decoder model. We improve the model by "hallucinating" missing linguistic structure into the training data, and by resampling from a Zipf distribution to simulate a more natural distribution of morphemes. The best model accounts for all instances of reduplication in the test set and achieves an accuracy of 94.7%, an improvement over the FST baseline. This process demonstrates the feasibility of bootstrapping a neural morphological analyzer from minimal resources.
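
The Zipf resampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the exponent `s`, the rank ordering, and the placeholder stem labels are all assumptions made for illustration. The idea shown is only that items are drawn with probability proportional to 1 / rank**s, so a few items dominate the sample and most are rare, which is closer to how morphemes are distributed in natural text than uniform sampling from an FST.

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Normalized Zipf weights for ranks 1..n (proportional to 1 / rank**s)."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def zipf_resample(items, k, s=1.0, seed=0):
    """Draw k items with replacement; earlier items are treated as more frequent."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(items, weights=zipf_weights(len(items), s), k=k)


# Hypothetical placeholder labels, not real Kunwinjku morphemes.
stems = ["stem_a", "stem_b", "stem_c", "stem_d", "stem_e"]
sample = zipf_resample(stems, k=10_000)
counts = Counter(sample)
```

In a pipeline of the kind the abstract describes, the resampled items would presumably be fed back through the FST to generate surface forms for training; how the ranks are assigned in practice is not stated here, so the list order above is purely illustrative.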

research
06/02/2016

Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection

Morphological reinflection is the task of generating a target form given...
research
09/17/2021

CKMorph: A Comprehensive Morphological Analyzer for Central Kurdish

A morphological analyzer, which is a significant component of many natur...
research
02/29/2020

A Finite State Transducer Based Morphological Analyzer of Maithili Language

Morphological analyzers are the essential milestones for many linguistic...
research
08/13/2018

Comparing morphological complexity of Spanish, Otomi and Nahuatl

We use two small parallel corpora for comparing the morphological comple...
research
11/18/2022

AVATAR submission to the Ego4D AV Transcription Challenge

In this report, we describe our submission to the Ego4D AudioVisual (AV)...
research
06/07/2023

Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs

Compared to English, German word order is freer and therefore poses addi...
research
06/13/2019

A Computational Analysis of Natural Languages to Build a Sentence Structure Aware Artificial Neural Network

Natural languages are complexly structured entities. They exhibit charac...
