Low-Resource Machine Translation using Interlinear Glosses

11/07/2019 · by Zhong Zhou, et al. · Carnegie Mellon University

Neural Machine Translation (NMT) does not handle low-resource translation well because NMT is data-hungry and low-resource languages, by their nature, have limited parallel data. Many low-resource languages are also morphologically rich, which further complicates matters by increasing data sparsity. However, a good linguist can build a morphological analyzer in far fewer hours than it would take to collect and translate the amount of parallel data needed for conventional NMT. In this work we combine the benefits of NMT and linguistic information. We use a morphological analyzer to automatically generate interlinear glosses with a dictionary or parallel data, translate the source text into interlinear gloss as an interlingua representation, and finally translate into the target text using NMT trained on the ODIN dataset, a large collection of interlinear glosses and their corresponding target translations. Translating from interlinear gloss to target text using the entire ODIN dataset achieves a BLEU score of 35.07. Our qualitative results also show positive findings in a low-resource Turkish-English scenario with 865 lines of training data. Our translation system yields better results than training NMT directly from the source language to the target language in a constrained-data setting, and helps produce translations with sufficiently good content and fluency when data is scarce.


1 Introduction

There are more than seven thousand languages in the world, but most people speak only the thirteen most popular ones, leaving a long tail of low-resource languages [Nordhoff et al., 2013]. NMT does not handle low-resource translation well because it is data-hungry, and low-resource languages, by their nature, have limited parallel corpora covering a wide range of topics and lack good bilingual dictionaries of consistent quality [Koehn and Knowles, 2017, El-Kahlout and Oflazer, 2006, Chaudhary et al., 2018]. The scarcity of low-resource data makes transferring generalizations learned in high-resource settings to low-resource settings a difficult yet important task [Sennrich and Zhang, 2019].

Many low-resource languages, such as the Inuit-Yupik-Unangan languages, are morphologically rich, which further complicates matters by increasing data sparsity. However, a good linguist can build a morphological analyzer in far fewer hours than it would take to collect and translate the amount of parallel data needed for conventional NMT. When data is extremely scarce, extensive morphological analysis and disambiguation that reduces word sparsity is very useful [Habash and Sadat, 2006]. As the data size grows, the benefit of morphological techniques for translation quality lessens [Habash and Sadat, 2006, Lee, 2004]. Even with thorough use of computational linguistic knowledge spanning morphology, syntax, semantics, discourse analysis and more, low-resource translation still poses many interesting challenges for researchers [Hajič, 2000].

In our work, we combine the benefits of both NMT and linguistic information. First, we exploit the interlinear gloss representation, traditionally used by linguists to encode morphosyntactic structure and cross-lingual lexical relations, as shown in Table 1 [Samardzic et al., 2015, Moeller and Hulden, 2018]. We use a morphological analyzer to automatically generate interlinear glosses with a dictionary or parallel data, and translate the source text into interlinear gloss as an interlingua representation. Second, we exploit the ODIN dataset, which includes a large collection of interlinear glosses and their corresponding target translations [Lewis and Xia, 2010, Xia et al., 2014]. We train a single NMT model with attention and BPE preprocessing on the entire ODIN dataset. In this way, we translate from the interlinear gloss generated by the morphological analyzer to the final fluent target translation using NMT.

Our result for translating from the interlinear gloss to the target text using the entire ODIN dataset achieves a BLEU score of 35.07. Our qualitative results also show positive findings in a low-resource scenario of Turkish-English translation using 865 lines of training data. Our translation system yields better results than training NMT directly from the source language to the target language in a constrained-data setting, and helps produce translations with sufficiently good content and fluency when data is scarce.

Data | Example
1. Source Language (Turkish) | Kadin dans ediyor.
2. Interlinear gloss with source-lemma | Kadin.NOM dance ediyor-AOR.3.SG.
3. Interlinear gloss with target-lemma | Woman.NOM dance do-AOR.3.SG.
4. Target Language (English) | The woman dances.
1. Source Language (Turkish) | Adam kadin-i gör-dü.
2. Interlinear gloss with source-lemma | Adam.NOM kadin-ACC gör-AOR.3.SG.
3. Interlinear gloss with target-lemma | Man.NOM woman-ACC see-PST.3.SG.
4. Target Language (English) | The man saw the woman.
Table 1: Examples of the translation sequence using interlinear glosses.
Notation | Meaning in the translation sequence
1 | Source Language (Turkish) text
2 | Interlinear gloss with source-lemma
3 | Interlinear gloss with target-lemma
4 | Target Language (English) text
Table 2: Notation used in the translation sequence of our model.

2 Related Work

2.1 Sub-word Level Machine Translation

A large open vocabulary, caused by morphological richness, diversity, and lexical productivity, is challenging for machine translation: many source words never appear in training, and new word forms must be generated in the target translations [Burlot et al., 2017, Matthews et al., 2018].

Most out-of-vocabulary words (OOVs) are tagged as unknowns (UNKs), even though they are semantically important and distinct from one another [Ling et al., 2015, Sennrich et al., 2016]. To build robustness in translating OOVs, many researchers turn to sub-word level analysis [Chaudhary et al., 2018, Cotterell and Schütze, 2015, Wu et al., 2016, Sennrich et al., 2016], including byte-level and character-level models [Gillick et al., 2016, Ling et al., 2015, Chung et al., 2016]. Many character-level models do not work as well as word-level models and do not produce optimal alignments, which limits their representation power [Tiedemann, 2012]. Consequently, much recent research has shifted to sub-word modeling between the character and word levels. One prominent direction is BPE, which iteratively learns subword units and strikes a balance between a fixed vocabulary size and the expressiveness of translating a complete open vocabulary [Sennrich et al., 2016, Burlot et al., 2017].
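To make the BPE procedure concrete, below is a minimal Python sketch of the merge-learning loop in the style of Sennrich et al. [2016]; the function name and the toy word frequencies are ours and purely illustrative.

    from collections import Counter

    def learn_bpe_merges(word_freqs, num_merges):
        # word_freqs maps a word string to its corpus frequency. Each word is
        # split into characters plus an end-of-word marker, and the most
        # frequent adjacent symbol pair is merged repeatedly.
        vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
        merges = []
        for _ in range(num_merges):
            pair_counts = Counter()
            for symbols, freq in vocab.items():
                for pair in zip(symbols, symbols[1:]):
                    pair_counts[pair] += freq
            if not pair_counts:
                break
            best = max(pair_counts, key=pair_counts.get)
            merges.append(best)
            new_vocab = {}
            for symbols, freq in vocab.items():
                out, i = [], 0
                while i < len(symbols):
                    if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                        out.append(symbols[i] + symbols[i + 1])
                        i += 2
                    else:
                        out.append(symbols[i])
                        i += 1
                new_vocab[tuple(out)] = freq
            vocab = new_vocab
        return merges

    # Toy example: early merges pick up frequent character pairs such as ('e', 's').
    print(learn_bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10))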

Normalized gloss | Meaning of the abbreviation | Gloss variants in the Turkish ODIN data
NMLZ | Nominalizer or nominalization | NML, NOMZ, FNom, NOML
PRS | Present tense | PRES, PR, pres, Pres, PRESENT
PST | Past tense | PA, Pst, PST, Past, pst, PAST, PT, PTS, REPPAST, PST1S, past
ABL | Ablative | Abl, Abli, abl, ABL
ADV | Adverb(ial) | ADVL, Adv
RPRT | Reported past tense | ReportedPast, REPPAST
Table 3: Examples of the normalization mapping created for the Turkish ODIN data.
Tag from the morphological analyzer | Meaning of the abbreviation | Set of normalized glosses included
P1pl | 1st person plural possessive | 1, PL, POSS
A1sg | 1st person singular | 1, SG
Reflex | Reflexive pronoun | REFL
NarrPart | Evidential participle | EVID, PTCP
AorPart | Aorist participle | AOR, PTCP
PresPart | Present participle | PRS, PTCP
Table 4: Examples of the normalization mapping created for the outputs from the Turkish morphological analyzer.
Interlinear gloss with target-lemma | Example
Before normalization | Ahmet self-3.sg-ACC very admire-Progr.-Rep.Past.
After normalization | Ahmet self-3.SG-ACC very admire-PROG-Rep.PST.
Before normalization | Woman.NOM dance do-AOR.3SG.
After normalization | Woman.NOM dance do-AOR.SG.3.
Before normalization | Man.NOM woman-ACC see-PAST.3SG.
After normalization | Man.NOM woman-ACC see-PST.SG.3.
Table 5: Examples of the normalization of glosses from the Turkish ODIN data.

2.2 Morpheme-Level Machine Translation

Many researchers have incorporated morphology into transfer learning for low-resource settings, including fusing morphological knowledge into language models [El-Kahlout and Oflazer, 2006], learning word embeddings [Alexandrescu and Kirchhoff, 2006, Luong et al., 2013], and co-learning word embeddings and morpheme embeddings [Qiu et al., 2014]. In MT, morphology expresses both content and function, so it is crucial not only to use morphology for content translation but also to use it to increase fluency, which is both important and challenging in low-resource scenarios [Clifton and Sarkar, 2011, Abeillé et al., 1990, Bisazza and Federico, 2009, El-Kahlout and Oflazer, 2006, Dyer et al., 2010, Brown, 1999]. In the SMT era, researchers used lexicalized tags (the LTAG structure) for tree-level machine translation [Abeillé et al., 1990], worked on morphological word segmentation [Bisazza and Federico, 2009], and used language model scores as a feature in the cdec hierarchical machine translation system [El-Kahlout and Oflazer, 2006, Dyer et al., 2010]. Some researchers used morpheme alignment and grouping to improve translation results [El-Kahlout and Oflazer, 2006], while others used morphological information to replace terms for structural translation [Brown, 1999].

Researchers have also used linguistic knowledge to improve NMT translation quality: lemmatisation reduces data sparsity and allows a word's different inflectional forms to share a common representation; part-of-speech (POS) tagging helps with disambiguation; and combining linguistic features, such as concatenating the embedding matrices of words, lemmas, subword tags, POS tags, and dependency parsing tags, improves NMT performance [Sennrich and Haddow, 2016].

Morphemes are the smallest units that encode the form-meaning pairing, and are often used as the basic units of analysis instead of words [Goenaga, 1980, Abaitua et al., 1992, Stroppa et al., 2006]. Morpheme-level translation allows words to share the same stem vector while allowing variation in meaning [Cotterell and Schütze, 2015, Chaudhary et al., 2018, Renduchintala et al., 2019, Passban et al., 2018, Dalvi et al., 2017], shrinks the vocabulary size and decreases the number of OOVs [Bisazza and Federico, 2009], benefits the inference of rare word representations from their morphological composition [Qiu et al., 2014], introduces smoothing through morphological information [Goldwater and McClosky, 2005], and makes fine-grained correction of NMT translations, such as agreement in number and person, possible [Stroppa et al., 2006, Matthews et al., 2018]. Morpheme-level representation is not the only channel for incorporating linguistic knowledge in MT: researchers have used grapheme-level and phoneme-level information together with lemma tags and morpheme-level information to adapt continuous word representations, which requires neither a bilingual dictionary nor parallel corpora and yet exceeds the results of methods that do [Chaudhary et al., 2018].

2.3 Interlinear Gloss Generation

Interlinear gloss is an abstract linguistic representation of morphosyntactic categories and cross-linguistic lexical relations in a succinct, language-agnostic style [Samardzic et al., 2015, Moeller and Hulden, 2018]. Although linguists have long used interlinear glosses to record formerly under-documented or undocumented languages, systematic glossing standards developed only much later [Samardzic et al., 2015, Moeller and Hulden, 2018]. The elegance of interlinear gloss is the one-to-one mapping between each segment of the source sentence and its gloss [Samardzic et al., 2015].

Researchers have tried to generate interlinear glosses automatically. Some use supervised POS tagging for grammatical glossing, together with a dictionary and word disambiguation, to replace source lemmas with target lemmas [Samardzic et al., 2015]. Others use conditional random fields and active learning to learn gloss representations automatically [Moeller and Hulden, 2018].

3 Data

3.1 ODIN dataset

We use the ODIN dataset, a multilingual resource of interlinear glosses [Lewis and Xia, 2010, Xia et al., 2014, Xia et al., 2015]. The entire ODIN dataset contains 57608 parallel sentences with interlinear glosses across all its languages.

We translate from Turkish to English. ODIN has 1081 Turkish sentences. The data is split into training, validation, and test sets in an 80/10/10 ratio, leaving an extremely small training set of 865 lines. English has subject-verb-object (SVO) word order while Turkish has subject-object-verb (SOV) word order [Yeniterzi and Oflazer, 2010].

Outputs from the morphological analyzer | Example
Before normalization | Kadi+A3sg+Pnon+Nom dans+A3sg+Pnon+Nom et+Prog1+A3sg.
After normalization | Kadin.3.SG.NPOSS.NOM dans.3.SG.NPOSS.NOM ediyor-PROG.3.SG.
After using dictionary | Woman.3.SG.NPOSS.NOM dance.3.SG.NPOSS.NOM be-PROG.3.SG.
Reference in ODIN | Woman.NOM dance do-AOR.3.SG.
Before normalization | Adam+A3pl+Pnon+Nom kadi+A3sg+Pnon+Acc gör+Past+A3sg.
After normalization | Adam.3.SG.NPOSS.NOM kadi.3.SG.NPOSS.ACC gör-PST.3.SG.
After using dictionary | Man.3.SG.NPOSS.NOM woman.3.SG.NPOSS.ACC see-PST.3.SG.
Reference in ODIN | Man.NOM woman-ACC see-PST.3.SG.
Table 6: Examples of interlinear gloss generation (1→2→3) from the output of the morphological analyzer. Notation of the translation sequence follows Table 2.
Translation sequence | Baseline1: 1→4 | Baseline2: 1→4* | Full: 1→2→3→4 | Fluency: 3→4
4-gram BLEU | 3.05 | 8.52 | 4.74 | 35.07
1-gram BLEU | 21.30 | 27.60 | 21.50 | 64.20
Table 7: BLEU scores for different translation sequences, with notation as in Table 2. All experiments except the starred Baseline2 use 865 lines of training data. Baseline2 uses an additional 57608 lines of Turkish-English parallel data.
Interlinear gloss with target (English) lemma | NMT gloss-to-target result in fluent English | Expected English sentence for reference
Peter and Mary that/those not came-3SG/3PL | Peter and Mary , he didn’t come. | Peter and Mary, they didn’t come.
PERF.AV-buy NOM-man ERG-fish DAT-store | The man bought fish at the store. | The man bought fish at thestore’. (The ODIN dataset is not clean; in this example two words are concatenated without a space and followed by an unnecessary punctuation symbol. It also shows that our NMT output automatically corrects typos when producing a fluent English sentence.)
Popeye has Olive Oyl.DAT the kitchen table clean-wiped | Popeye wiped the kitchen clean to clean the table within. | Popeye wiped the kitchen table clean for Olive Oyl.
AGR-do-make-ASP that waterpot AGR-fall-ASP | The girl made that waterpot fall. | The girl made the waterpot fall.
Table 8: Examples of gloss-to-target (3→4 in Table 2) NMT translation results. The source is the interlinear gloss with target (English) lemmas and the target is fluent English.
Source sentence | Expected translation | Baseline1: 1→4 | Baseline2: 1→4* | Generation: 1→2→3 | Full: 1→2→3→4
Ahmet gelmi. | Ahmet must have come. | As for the book, the book. | Who did you? | Ahmet-3.SG.NOM come-3.SG.NOM.POSS. | Ahmet came.
Fatma’y bir ogrenci arad bugun. | As for Fatma, it was a STUDENT who called her today. | As for the book, the book, the book. | As for the house, I was going to help me. | Fatma-3.SG.NOM it-3.SG.NOM a-ADV student-3.SG.NOM her-3.SG.NOM today-3.SG.GEN. | Fatma has a student yesterday.
Problemi çöz-mek zor-dur. | To solve the problem is difficult. | Ali read the book. | Ali saw Ali. | Solv-3.SG.ACC the is difficult-3.SG.COP.PRS. | The fact that it is difficult for that.
Adam cocuga top verdi. | The man gave the child a ball. | As for the book, the book, the book, the book. | They said that (of the girl), I was going to help me. | Man-1.3.SG.NOM.POSS child-3.SG.DAT ball-3.SG.NOM. | The man’s child is the ball.
Adamun evi. | The man’s house. | The child whose cove. | The man read the book. | Adamun hous-3.SG.ACC. | A (specific) house.
Fatma bu kitabkimin yazd gn sanyor. | Who does Fatma think wrote this book. | As for Fatma knows that I one. | Who did you? | Fatma-3.SG.NOM this-DET kitabkimin ATATURK-3.SG.NOM wrote-3.SG.NOM it-3.SG.NOM. | As for Fatma, it is possible that he wrote this.
Table 9: Qualitative evaluation. All experiments except the starred Baseline2 use 865 lines of training data. Baseline2 uses an additional 57608 lines of parallel data. Notation of the translation sequence follows Table 2.

We choose Turkish for several reasons. The multitude of word forms in Turkish, a morphologically rich language, creates many challenges for language modeling, machine translation, and speech recognition [Matthews et al., 2018, Botha and Blunsom, 2014]. Turkish, which is agglutinative, may also benefit from the use of morphology to push the limits of phrasal translation and to generate novel combinations of multiple target terms [Clifton and Sarkar, 2011, El-Kahlout and Oflazer, 2006, Bisazza and Federico, 2009]. A morphological analyzer also aids language modeling and the translation of morphologically rich languages [Matthews et al., 2018]. Indeed, morphology is more helpful for agglutinative languages than for others [Clifton and Sarkar, 2011].

3.2 Gloss Normalization

Since our goal is to produce meaningful translations of low-resource languages in extremely data-scarce scenarios, we need to exploit our training data to the fullest by ensuring its quality. Different researchers use different symbols for linguistic tags, which creates challenges for training NMT; the ODIN dataset, for example, is not uniform in its use of linguistic glosses [Lewis and Xia, 2010, Xia et al., 2014].

We worked with two experienced linguists over an extended period to normalize the glosses in the Turkish ODIN dataset; Table 5 shows examples of the normalized glosses. Gloss normalization is important, and we follow the Leipzig glossing convention (https://www.eva.mpg.de/lingua/resources/glossing-rules.php). If a gloss is not covered by the Leipzig convention, we use UniMorph (http://unimorph.org/). This step is crucial for creating good data to train the NMT system.

We normalize not only the glosses in the Turkish ODIN dataset but also the outputs from the morphological analyzer [Oflazer, 1994]. In Table 6, the second line of each example shows the normalized output from the morphological analyzer. Again, we use a combination of the Leipzig and UniMorph conventions, just as in the normalization of the Turkish ODIN dataset.
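As a rough illustration of this normalization step, the sketch below maps gloss-tag variants to their normalized forms with a small lookup table drawn from Tables 3 and 4; the actual mapping curated with the linguists is far larger, and the helper names are ours.

    # Illustrative subset of the normalization mapping (cf. Tables 3 and 4);
    # the real mapping covers many more tags and variants.
    GLOSS_NORMALIZATION = {
        "PRES": "PRS", "PR": "PRS", "pres": "PRS", "Pres": "PRS", "PRESENT": "PRS",
        "PA": "PST", "Pst": "PST", "Past": "PST", "pst": "PST", "PAST": "PST",
        "Abl": "ABL", "abl": "ABL",
        "ADVL": "ADV", "Adv": "ADV",
    }

    def normalize_gloss_token(token):
        # Normalize each '-'/'.'-separated gloss segment of one token,
        # leaving unknown segments (including lemmas) unchanged.
        parts = []
        for piece in token.split("-"):
            subparts = [GLOSS_NORMALIZATION.get(p, p) for p in piece.split(".")]
            parts.append(".".join(subparts))
        return "-".join(parts)

    def normalize_gloss_line(line):
        return " ".join(normalize_gloss_token(tok) for tok in line.split())

    # cf. Table 5: the PAST variant becomes PST.
    print(normalize_gloss_line("Man.NOM woman-ACC see-PAST.3SG."))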

4 Methodology

For clarity, we use 1, 2, 3, 4 to denote each line of the translation sequence as shown in Table 2.

4.1 1→2: Generation of Interlinear Gloss with Source-Lemma

We use a morphological analyzer [Oflazer, 1994] to generate morphological tags and the root for each word token in the source text. After the normalization step introduced above, we obtain the interlinear gloss with source lemmas. In Table 6, the second line of each example shows the interlinear gloss with source Turkish tokens.
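A minimal sketch of this step is shown below, assuming the analyzer output format root+tag+tag as in Table 6; the tag-to-gloss lookup is a small illustrative subset of our normalization mapping, and the function name is ours.

    # Illustrative subset of the analyzer-tag to normalized-gloss mapping (cf. Table 4).
    ANALYZER_TAG_TO_GLOSS = {
        "A3sg": "3.SG", "A3pl": "3.PL",
        "Pnon": "NPOSS",
        "Nom": "NOM", "Acc": "ACC",
        "Prog1": "PROG", "Past": "PST",
    }

    def analysis_to_source_gloss(analysis):
        # Turn one analyzer output such as 'gör+Past+A3sg' into a gloss with the
        # source lemma kept as the root and the tags normalized, e.g. 'gör-PST.3.SG'.
        root, *tags = analysis.split("+")
        glossed = [ANALYZER_TAG_TO_GLOSS.get(t, t.upper()) for t in tags]
        return root if not glossed else root + "-" + ".".join(glossed)

    print(analysis_to_source_gloss("gör+Past+A3sg"))  # gör-PST.3.SG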

4.2 2→3: Generation of Interlinear Gloss with Target-Lemma

We use a dictionary produced by aligning a parallel corpus [Dyer et al., 2013] to construct the interlinear gloss with target English tokens from the gloss with source Turkish tokens. We could build this dictionary from the 865 lines of training data alone, but to obtain a higher-quality dictionary we use an additional parallel corpus of 57608 lines; this additional resource is an assumption we would like to remove in the future. Using the dictionary, we generate the interlinear gloss with target English tokens, shown in the third line of each example in Table 6.
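The sketch below illustrates the lemma substitution, assuming tags are attached to the lemma with '-' or '.' as in our normalized glosses; the dictionary entries and helper name are toy illustrations, whereas in our pipeline the dictionary is induced from word alignments [Dyer et al., 2013].

    # Toy Turkish-to-English lemma dictionary; in practice it is induced
    # automatically by aligning a parallel corpus.
    LEMMA_DICT = {"kadin": "woman", "adam": "man", "gör": "see", "dans": "dance"}

    def source_gloss_to_target_gloss(gloss_line):
        # Replace the source lemma of each glossed token with its target lemma,
        # keeping the normalized morphological tags untouched.
        out_tokens = []
        for token in gloss_line.split():
            for sep in ("-", "."):
                if sep in token:
                    lemma, rest = token.split(sep, 1)
                    break
            else:
                lemma, sep, rest = token, "", ""
            target = LEMMA_DICT.get(lemma.lower(), lemma)
            if lemma[0].isupper():
                target = target.capitalize()
            out_tokens.append(target + sep + rest if rest else target)
        return " ".join(out_tokens)

    # cf. Table 1: "Adam.NOM kadin-ACC gör-PST.3.SG." -> "Man.NOM woman-ACC see-PST.3.SG."
    print(source_gloss_to_target_gloss("Adam.NOM kadin-ACC gör-PST.3.SG."))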

4.3 3→4: Training the NMT System for Gloss-to-Target Translation

Even though the ODIN dataset has only 1081 lines of Turkish data, it has 57608 lines of multilingual glossed data. We exploit the full ODIN dataset to train the translation from interlinear gloss with target English tokens to the target English text, i.e., from the third line to the fourth line in Table 1.

We normalize the ODIN dataset in collaboration with two experienced linguists. Using the interlinear gloss with target English tokens as the source and the English translation as the target, we train an attentional NMT model with BPE. We use a minibatch size of 64, a dropout rate of 0.3, 4 RNN layers of size 1000, a word vector size of 600, 13 epochs, and a learning rate of 0.8 that decays by a factor of 0.7 when the validation score stops improving or once training is past epoch 9. Our code is built on OpenNMT [Klein et al., 2017], and we evaluate our models using BLEU scores [Papineni et al., 2002] as well as qualitative evaluation.
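The learning rate schedule we describe can be sketched as follows; this is an illustration of the decay rule rather than the exact OpenNMT implementation.

    def decayed_learning_rate(epoch, val_scores, lr0=0.8, decay=0.7, start_decay_epoch=9):
        # Starting from lr0, multiply the learning rate by `decay` for every epoch
        # in which the validation score (higher is better) did not improve, or once
        # training is past `start_decay_epoch`.
        lr = lr0
        for e in range(1, epoch + 1):
            improved = e < 2 or val_scores[e - 1] > val_scores[e - 2]
            if e > start_decay_epoch or not improved:
                lr *= decay
        return lr

    # Example: the validation score stalls from epoch 5 onward.
    scores = [10, 12, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]
    for ep in (5, 9, 13):
        print(ep, round(decayed_learning_rate(ep, scores), 4))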

5 Results

We use 1, 2, 3, 4 to denote each representation in the translation sequence, as shown in Table 2, and a few acronyms to describe our experiments. In Table 7 and Table 9, Baseline1 denotes an attentional NMT system trained solely on the 865 lines of Turkish source text and English target text, without using any interlinear gloss information; Baseline2 denotes an attentional NMT system trained with an additional 57608 lines of Turkish-English parallel data; Generation denotes the interlinear gloss with target-lemma generation step (1→2→3); and Full denotes our complete system from interlinear gloss generation to the final gloss-to-fluent-English translation (1→2→3→4).

Quantitatively, we evaluate using both 4-gram and 1-gram BLEU, where 1-gram BLEU evaluates the translation without considering grammatical order. Our gloss-to-target translation (3→4) achieves a high 4-gram BLEU score of 35.07. Since the end-to-end systems translate from only 865 lines of training data, their BLEU scores are very low. Full improves over Baseline1 but is still lower than Baseline2; however, we show qualitatively that Full improves substantially over both baselines.
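For reference, the two metrics differ only in the n-gram weights; one way to compute them, e.g. with NLTK's corpus_bleu, is sketched below (an illustration with toy tokenized sentences, not the exact evaluation script we used).

    from nltk.translate.bleu_score import corpus_bleu

    # Tokenized hypotheses and single references; toy examples only.
    hyps = [["the", "man", "saw", "the", "woman", "."]]
    refs = [[["the", "man", "saw", "the", "woman", "."]]]

    bleu4 = corpus_bleu(refs, hyps, weights=(0.25, 0.25, 0.25, 0.25))  # standard 4-gram BLEU
    bleu1 = corpus_bleu(refs, hyps, weights=(1.0, 0.0, 0.0, 0.0))      # unigram precision only
    print(round(bleu4 * 100, 2), round(bleu1 * 100, 2))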

In extreme low-resource scenarios like ours, qualitative evaluation is more important than quantitative evaluation. In Table 9, we see clearly that the two baseline NMT systems hallucinate: their translations have nothing in common with the source sentence except fluency. In contrast, our Full model produces meaningful translations that preserve the content of the source sentence to a certain extent while also achieving fluency through a good gloss-to-target NMT system. Even though the content is only partially preserved, the essential concept is translated and fluency is preserved.

6 Conclusion

We use a morphological analyzer and a dictionary produced by aligning parallel data to automatically generate glosses, and we use gloss-to-target attentional NMT with BPE to produce the final target translation. The system produces meaningful translations that preserve the content of the source sentence to a certain extent while also achieving fluency through the gloss-to-target NMT model. Our translation system helps produce translations with sufficiently good content and fluency in extremely low-resource settings.

In the future, we would benefit from a more detailed gloss normalization process. Such normalization takes time and effort; if done thoroughly, it should yield scores much higher than those reported here.

We would also like to explore disambiguation in the morphological analyzer [Shen et al., 2016], as well as morphological analyzers with more detailed morpheme segmentation.

Factored translation, which translates a composition of annotations such as word, lemma, part-of-speech, morphology, and word class into a factored target representation, has achieved considerable success in SMT [Koehn and Hoang, 2007, Yeniterzi and Oflazer, 2010]. In the NMT era, morphological information and grammatical decompositions produced by a morphological analyzer have been employed in NMT systems [García-Martínez et al., 2016, Burlot et al., 2017, Hokamp, 2017]. It would be interesting to see how factored translation could improve the performance of our gloss-to-target translation.

Machine polyglotism, which trains machines to translate from multiple source languages to multiple target languages, is an exciting direction in multilingual NMT [Johnson et al., 2017, Ha et al., 2016, Firat et al., 2016, Zoph and Knight, 2016, Dong et al., 2015, Gillick et al., 2016, Al-Rfou et al., 2013, Tsvetkov et al., 2016]. Building a single unified attention mechanism by adding source and target language labels while training a universal model with BPE preprocessing is the state of the art [Johnson et al., 2017, Ha et al., 2016]. Since each language in the ODIN dataset contains very little data, we did not train gloss-to-target in a multilingual fashion; it would be interesting to explore ways to train in a multilingual NMT style with very little data. Again, this hinges on the data-hungry nature of NMT.

References

  • [Abaitua et al., 1992] Abaitua, J., Aduriz, I., Agirre, E., Alegria, I., Arregi, X., Artola, X., Arriola, J., de Ilarraza, A. D., Ezeiza, N., Gojenola, K., et al. (1992). Estudio comparativo de diferentes formalismos sintácticos para su aplicación al euskara. Technical report, Technical report, UPV/EHU/LSI, Donostia.
  • [Abeillé et al., 1990] Abeillé, A., Schabes, Y., and Joshi, A. K. (1990). Using lexicalized tags for machine translation. In Proceedings of the 13th conference on Computational linguistics-Volume 3, pages 1–6. Association for Computational Linguistics.
  • [Al-Rfou et al., 2013] Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual nlp. In Proceedings of the 17th Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.
  • [Alexandrescu and Kirchhoff, 2006] Alexandrescu, A. and Kirchhoff, K. (2006). Factored neural language models. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 1–4. Association for Computational Linguistics.
  • [Bisazza and Federico, 2009] Bisazza, A. and Federico, M. (2009). Morphological pre-processing for turkish to english statistical machine translation. In IWSLT, pages 129–135.
  • [Botha and Blunsom, 2014] Botha, J. and Blunsom, P. (2014). Compositional morphology for word representations and language modelling. In International Conference on Machine Learning, pages 1899–1907.
  • [Brown, 1999] Brown, R. D. (1999). Adding linguistic knowledge to a lexical example-based translation system. In Proceedings of the Eighth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99), pages 22–32.
  • [Burlot et al., 2017] Burlot, F., Garcia-Martinez, M., Barrault, L., Bougares, F., and Yvon, F. (2017). Word representations in factored neural machine translation. In Conference on Machine Translation, volume 1, pages 43–55.
  • [Chaudhary et al., 2018] Chaudhary, A., Zhou, C., Levin, L., Neubig, G., Mortensen, D. R., and Carbonell, J. G. (2018). Adapting word embeddings to new languages with morphological and phonological subword representations. arXiv preprint arXiv:1808.09500.
  • [Chung et al., 2016] Chung, J., Cho, K., and Bengio, Y. (2016). A character-level decoder without explicit segmentation for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1693–1703.
  • [Clifton and Sarkar, 2011] Clifton, A. and Sarkar, A. (2011). Combining morpheme-based machine translation with post-processing morpheme prediction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 32–42. Association for Computational Linguistics.
  • [Cotterell and Schütze, 2015] Cotterell, R. and Schütze, H. (2015). Morphological word-embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1287–1292.
  • [Dalvi et al., 2017] Dalvi, F., Durrani, N., Sajjad, H., Belinkov, Y., and Vogel, S. (2017). Understanding and improving morphological learning in the neural machine translation decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151.
  • [Dong et al., 2015] Dong, D., Wu, H., He, W., Yu, D., and Wang, H. (2015). Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, pages 1723–1732.
  • [Dyer et al., 2013] Dyer, C., Chahuneau, V., and Smith, N. A. (2013). A simple, fast, and effective reparameterization of ibm model 2. In Proceedings of the 12th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, pages 644–648.
  • [Dyer et al., 2010] Dyer, C., Weese, J., Setiawan, H., Lopez, A., Ture, F., Eidelman, V., Ganitkevitch, J., Blunsom, P., and Resnik, P. (2010). cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proceedings of the ACL 2010 System Demonstrations, pages 7–12. Association for Computational Linguistics.
  • [El-Kahlout and Oflazer, 2006] El-Kahlout, I. D. and Oflazer, K. (2006). Initial explorations in english to turkish statistical machine translation. In Proceedings of the Workshop on Statistical Machine Translation, pages 7–14. Association for Computational Linguistics.
  • [Firat et al., 2016] Firat, O., Cho, K., and Bengio, Y. (2016). Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, pages 866–875.
  • [García-Martínez et al., 2016] García-Martínez, M., Barrault, L., and Bougares, F. (2016). Factored neural machine translation. arXiv preprint arXiv:1609.04621.
  • [Gillick et al., 2016] Gillick, D., Brunk, C., Vinyals, O., and Subramanya, A. (2016). Multilingual language processing from bytes. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, pages 1296–1306.
  • [Goenaga, 1980] Goenaga, P. (1980). Gramatika bideetan. Erein.
  • [Goldwater and McClosky, 2005] Goldwater, S. and McClosky, D. (2005). Improving statistical mt through morphological analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pages 676–683. Association for Computational Linguistics.
  • [Ha et al., 2016] Ha, T.-L., Niehues, J., and Waibel, A. (2016). Toward multilingual neural machine translation with universal encoder and decoder. arXiv preprint arXiv:1611.04798.
  • [Habash and Sadat, 2006] Habash, N. and Sadat, F. (2006). Arabic preprocessing schemes for statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 49–52. Association for Computational Linguistics.
  • [Hajič, 2000] Hajič, J. (2000). Machine translation of very close languages. In Sixth Applied Natural Language Processing Conference.
  • [Hokamp, 2017] Hokamp, C. (2017). Ensembling factored neural machine translation models for automatic post-editing and quality estimation. arXiv preprint arXiv:1706.05083.
  • [Johnson et al., 2017] Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., et al. (2017). Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.
  • [Klein et al., 2017] Klein, G., Kim, Y., Deng, Y., Senellart, J., and Rush, A. (2017). OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, System Demonstrations, pages 67–72.
  • [Koehn and Hoang, 2007] Koehn, P. and Hoang, H. (2007). Factored translation models. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL).
  • [Koehn and Knowles, 2017] Koehn, P. and Knowles, R. (2017). Six challenges for neural machine translation. arXiv preprint arXiv:1706.03872.
  • [Lee, 2004] Lee, Y.-S. (2004). Morphological analysis for statistical machine translation. In Proceedings of HLT-NAACL 2004: Short Papers, pages 57–60. Association for Computational Linguistics.
  • [Lewis and Xia, 2010] Lewis, W. D. and Xia, F. (2010). Developing odin: A multilingual repository of annotated language data for hundreds of the world’s languages. Literary and Linguistic Computing, 25(3):303–319.
  • [Ling et al., 2015] Ling, W., Trancoso, I., Dyer, C., and Black, A. W. (2015). Character-based neural machine translation. arXiv preprint arXiv:1511.04586.
  • [Luong et al., 2013] Luong, T., Socher, R., and Manning, C. (2013). Better word representations with recursive neural networks for morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113.
  • [Matthews et al., 2018] Matthews, A., Neubig, G., and Dyer, C. (2018). Using morphological knowledge in open-vocabulary neural language models. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1435–1445.
  • [Moeller and Hulden, 2018] Moeller, S. and Hulden, M. (2018). Automatic glossing in a low-resource setting for language documentation. In Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages, pages 84–93.
  • [Nordhoff et al., 2013] Nordhoff, S., Hammarström, H., Forkel, R., and Haspelmath, M. (2013). Glottolog 2.0.
  • [Oflazer, 1994] Oflazer, K. (1994). Two-level description of turkish morphology. Literary and linguistic computing, 9(2):137–148.
  • [Papineni et al., 2002] Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.
  • [Passban et al., 2018] Passban, P., Liu, Q., and Way, A. (2018). Improving character-based decoding using target-side morphological information for neural machine translation. arXiv preprint arXiv:1804.06506.
  • [Qiu et al., 2014] Qiu, S., Cui, Q., Bian, J., Gao, B., and Liu, T.-Y. (2014). Co-learning of word representations and morpheme representations. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 141–150.
  • [Renduchintala et al., 2019] Renduchintala, A., Shapiro, P., Duh, K., and Koehn, P. (2019). Character-aware decoder for translation into morphologically rich languages. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 244–255.
  • [Samardzic et al., 2015] Samardzic, T., Schikowski, R., and Stoll, S. (2015). Automatic interlinear glossing as two-level sequence classification. In Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 68–72.
  • [Sennrich and Haddow, 2016] Sennrich, R. and Haddow, B. (2016). Linguistic input features improve neural machine translation. arXiv preprint arXiv:1606.02892.
  • [Sennrich et al., 2016] Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725.
  • [Sennrich and Zhang, 2019] Sennrich, R. and Zhang, B. (2019). Revisiting low-resource neural machine translation: A case study. arXiv preprint arXiv:1905.11901.
  • [Shen et al., 2016] Shen, Q., Clothiaux, D., Tagtow, E., Littell, P., and Dyer, C. (2016). The role of context in neural morphological disambiguation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 181–191.
  • [Stroppa et al., 2006] Stroppa, N., Groves, D., Way, A., and Sarasola, K. (2006). Example-based machine translation of the basque language. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages 232–241. The Association for Machine Translation in the Americas.
  • [Tiedemann, 2012] Tiedemann, J. (2012). Character-based pivot translation for under-resourced languages and domains. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 141–151. Association for Computational Linguistics.
  • [Tsvetkov et al., 2016] Tsvetkov, Y., Sitaram, S., Faruqui, M., Lample, G., Littell, P., Mortensen, D., Black, A. W., Levin, L., and Dyer, C. (2016). Polyglot neural language models: A case study in cross-lingual phonetic representation learning. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, pages 1357–1366.
  • [Wu et al., 2016] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • [Xia et al., 2015] Xia, F., Goodman, M. W., Georgi, R., Slayden, G., and Lewis, W. D. (2015). Enriching, editing, and representing interlinear glossed text. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 32–46. Springer.
  • [Xia et al., 2014] Xia, F., Lewis, W. D., Goodman, M. W., Crowgey, J., and Bender, E. M. (2014). Enriching odin. In LREC, pages 3151–3157.
  • [Yeniterzi and Oflazer, 2010] Yeniterzi, R. and Oflazer, K. (2010). Syntax-to-morphology mapping in factored phrase-based statistical machine translation from english to turkish. In Proceedings of the 48th annual meeting of the association for computational linguistics, pages 454–464. Association for Computational Linguistics.
  • [Zoph and Knight, 2016] Zoph, B. and Knight, K. (2016). Multi-source neural translation. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, pages 30–34.