Linguistically inspired morphological inflection with a sequence to sequence model

09/04/2020
by   Eleni Metheniti, et al.

Inflection is an essential part of every human language's morphology, yet little effort has been made in recent years to unify linguistic theory and computational methods. Methods of string manipulation are used to infer inflectional changes; our research question is whether a neural network is capable of learning inflectional morphemes for inflection production in a way similar to a human in the early stages of language acquisition. To test this hypothesis, we use an inflectional corpus (Metheniti and Neumann, 2020) and a single-layer seq2seq model in which the inflectional affixes are learned and predicted as a block, while the word stem is modelled as a character sequence to account for infixation. Our character-morpheme-based model thus creates an inflected form by predicting the stem character by character and the inflectional affixes as character blocks. We conducted three experiments on creating an inflected form of a word given the lemma and a set of input and target features, comparing our architecture to a mainstream character-based model with the same hyperparameters, training sets, and test sets. Overall, for 17 languages, we noticed small improvements on inflecting known lemmas (+0.68 better performance of our model in predicting inflected forms of unknown words (+3.7 (+1.09
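To make the difference between the two target representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes the stem/affix segmentation is already available from the corpus. The function names, the `<...>` feature-tag format, and the `+` boundary marker on affix tokens are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def encode_source(lemma: str, features: Sequence[str]) -> List[str]:
    """Source sequence: one token per feature tag, then the lemma's characters.

    The "<TAG>" format is an assumption of this sketch.
    """
    return [f"<{feat}>" for feat in features] + list(lemma)


def encode_target_char(form: str) -> List[str]:
    """Baseline target: every character of the inflected form is one token."""
    return list(form)


def encode_target_char_morpheme(stem: str, prefix: str = "", suffix: str = "") -> List[str]:
    """Character-morpheme target: stem as characters, each affix as one block.

    The "+" marker (hypothetical) distinguishes an affix block from a
    single stem character that happens to look the same.
    """
    tokens: List[str] = []
    if prefix:
        tokens.append(f"{prefix}+")
    tokens.extend(stem)  # the stem stays a plain character sequence
    if suffix:
        tokens.append(f"+{suffix}")
    return tokens


def decode(tokens: Sequence[str]) -> str:
    """Join predicted tokens into a surface form, dropping boundary markers.

    Assumes "+" never occurs as a literal character in a word.
    """
    return "".join(tok.strip("+") for tok in tokens)


# German past participle "gespielt" of "spielen", segmented as ge + spiel + t.
source = encode_source("spielen", ["V", "PTCP", "PST"])
baseline_target = encode_target_char("gespielt")
block_target = encode_target_char_morpheme("spiel", prefix="ge", suffix="t")
surface = decode(block_target)
```

Under these assumptions the block representation needs fewer decoding steps for the affixes (one step per affix rather than one per character), while the stem keeps character-level granularity so that stem-internal changes can still be modelled.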


