FST Morphological Analyser and Generator for Mapudüngun

09/19/2021
by Andrés Chandía, et al.
Universitat Pompeu Fabra

Following the Mapuche grammar by Smeets, this article describes the main morphophonological aspects of Mapudüngun, explaining what triggers them and the contexts where they arise. We present a computational approach producing a finite state morphological analyser (and generator) capable of classifying and appropriately tagging all the components (roots and suffixes) that interact in a Mapuche word form. The bulk of the article focuses on the morphology of the Mapudüngun verb and its formalisation using FOMA. A system evaluation process and its results are also presented.

1 Introduction

This article explains the morphophonological aspects of Mapudüngun that have to be taken into account when developing a rule-based morphological analyser, which is our purpose.

We have chosen to implement the analyser by means of Finite State Transducers (FST). The language that feeds these machines is made of complex regular expressions, which must encode, for the language of interest (Mapudüngun in this case), the way the different elements (roots and suffixes) interact, the conditions they have to fulfil in doing so, and the changes this interaction produces in the elements.

This work therefore describes how we have translated Mapudüngun’s morphophonological behaviour into regular expressions, which have to be as accurate as possible in order to obtain optimal results from an FST analyser. (The system user interface is available at http://www.chandia.net/dungupeyem and the code at http://www.chandia.net/dungupeyem/repositorio).

In section 2 we present the Mapuche language. Its typology and morphology are the central topics, including its suffixes and how verbs are formed, from the stem to the final form, along with some exceptions and particularities.

Section 4 centres on the computational technology we use and the specific tool with which we achieve our goal. We begin by explaining what computational morphology implies and how it can be handled by Finite State Transducers (FST). FOMA is the FST compiler we use to build our tools, so we review it. Finally, we describe the first steps in incorporating the Mapudüngun elements into the computational workflow.

Section 5 presents how the processes and phenomena explained in the section describing Mapudüngun are embodied in code that the compiler can interpret and process. It describes the techniques applied to encode the different parts of Mapudüngun, the rules that manage their interaction and the changes derived from it, and how the different aspects of Mapudüngun morphology are treated from the computational point of view. We explain in detail the typology of stems and the strategies to manage them, the interaction of suffixes after the stem, verb paradigms and verb nominalisation. The mobility of some suffixes and the special behaviour of some verb roots are also presented in this section.

In section 6 we give an account of some Mapudüngun realisations that come from sources and dialects other than Smeets’ work, which is the basis of our development. We explain how and why we have incorporated them into our system.

Section 7 gives a brief account of the FST analyser, displaying data on the size of the lexicon, the number of suffixes and rules, and the compilation values.

Section 8 addresses evaluation. We explain how our machine has been evaluated and which parameters were taken into account. For the sake of an accurate comparison and evaluation of the system's output, we introduce some other machines to compare against, as well as another similar system. This section also contains a comprehensive analysis of the forms that the system did not recognise, and the reasons for that.

Finally, section 9, the last one before the conclusions, briefly describes the web interfaces we have developed to access our tools: we show the elements found on these interfaces and how to operate them.

2 Mapudüngun, the Mapuche language

In this section we present Mapudüngun: its location, its typology and a basic description of its structure, including its phonemes and their graphic representation. Morphology comes next, where we present some information about Mapudüngun suffixes and how verbs are formed. Finally, we introduce stem formation and its particularities. Above all, we present the morphophonological aspects of some specific phenomena of Mapudüngun. A complete description of the language is found in the book on which we base our analyser: "A Grammar of Mapuche" by Ineke Smeets RefB:21 .

Mapudüngun is a language isolate (its relationship with other Indigenous American languages has not yet been established), spoken actively by approximately 144,000 people in Chile [Zúñiga 2006] RefB:24 , as well as by some 8,400 people in Argentina [Instituto Nacional de Estadísticas y Censos 2005], virtually all of whom are bilingual in Spanish [Sadowsky, S. 2013: 87-96] RefB:18 .

The word Mapudüngun is a compound of two nominal roots, mapu meaning ’soil, land, earth, ground, country’; düngu meaning ’language, matter, subject, tongue (as in mother tongue)’. Mapudüngun is usually translated as ’the language of the land, the speaking of earth’ or ’lengua/habla de la tierra’ in Spanish.

2.1 Mapudüngun: polysynthetic and agglutinative

Polysynthesis means that (verb) forms contain many elements or morphemes, a trait typical of many Native American languages, Mapudüngun among them.

In agglutinative languages a series of concepts is distributed over several morphemes [Zúñiga 2006: 199] RefB:24 . In agglutination, morphemes are joined inside words without altering their own form, so that they remain identifiable in different contexts, as in Basque, Turkish, Quechua and Mapudüngun. The original meaning of the stem is modified by the affixes attached to it.

In Mapudüngun, verbs may contain many morphemes, e.g., di-tu-l-me-tu-a-fi-ñ ’I will reach it, I will find it’. This word has eight significant elements.

"The fact that the language is polysynthetic means that it is rich, in terms of the ability to create new words " [Zúñiga 2006: 202] RefB:24 .

In Mapudüngun, nominal forms are simple, while verbal ones are extremely complex, presenting a good number of derivational and inflectional morphemes. Verbs can be univalent, with only one actant (the subject); bivalent or monotransitive, with two actants (subject and object); or trivalent or ditransitive, with three actants (subject, primary object and secondary object, with the semantic roles of agent (A), human receiver (R) and inanimate patient or theme (T), respectively).

"In verbal phrases there are morphemes that behave as verbal derivatives (verbaliser, causative, transitivizer, benefactive/malefactive, modal, locative and directional, manner; and affixes that are part of non-finite verb forms), also obligatory verbal inflectional suffixes (time, mode, person and number) and facultative (negation, aspect, passive, reflexive/reciprocal/medial and mediative)" [Fernández-Garay & Malvestitti 2002: 36-37] RefB:07 .

2.2 The Mapuche alphabet

Smeets states that 19 consonants [table 1] and 6 vowels [table 2] form the Mapuche phonemic system.

             Labial  Interdental  Alveolar  Palatal  Retroflex  Velar
Plosives     p                    t         ch       tr         k
Fricatives   f       d            s         sh
Glides       w                              y        r          q
Nasals       m                    n         ñ                   ng
Laterals                          l         ll
Table 1: Consonants [Smeets, I. 2008: 23] RefB:21
        Front  Central  Back
High    i      ü        u
Mid     e               o
Low            a
Table 2: Vowels [Smeets, I. 2008: 25] RefB:21

Smeets also adds the sounds b, d, g, loaned from Spanish (as in Spanish ’bote’, ’duende’ and ’guerra’ respectively), and the voiceless fricative x (as in Spanish "jefe"). She does not include the interdental series present in some Mapudüngun variants, usually represented as l’, n’, t’, because the dialect she studied did not present it, and her data, "in agreement with Croese’s findings, do not call for a distinction between the interdental" and the alveolar series l, n, t. "A tentative conclusion might be that the distinction is dying out" [Smeets, I. 2008: 31] RefB:21 .

3 Mapudüngun morphology

In this section, and throughout this article, we mainly refer to verb morphology because it is the most complex part of Mapudüngun, and virtually all morphophonological changes occur inside the verb form. Other word categories are mentioned because they occur as verb stems together with a verbalising suffix, to illustrate a special case, or because a specific suffix also interacts with nouns, adjectives, adverbs, etc. in a non-verbal form. This does not mean that our work covers only the Mapuche verb; we simply do not extend the discussion to all parts of speech (categories) in order to keep this article to a reasonable length.

3.1 Verb suffixes

We begin with suffixes because they occur in almost all verb stems; the only stems without a suffix are those formed by a single verb root. In Mapudüngun, adjective, adverb, noun and other roots need a suffix to become verb stems.

For simplicity, we use the term verb stem for any form, simple or complex, to which suffixes are bonded in order to form a complete verb predication. A simple stem is made of one root only; a complex stem may consist of two roots, a root with some suffixes, or a combination of these. It may be argued that these are lemmas instead, but, as we said, to keep it simple we call all these forms verb stems. More details are found in sections 3.1.8 (Verb derivational nominalisation), 3.2 (Verb stems) and 5.2.1 (Stems codification).

After the stem, the suffixes of a Mapuche verb form "occur in a more or less fixed position relative to one another" [Smeets, I. 2008: 17] RefB:21 . However, quite a few incidental factors also shape the complex Mapuche verb form.

Verb suffixes are located in one of the thirty-six slots assigned to the verb form on the basis of their relative position and function. Slot 1 occupies word-final position and slot 36 is next to the root. The order of these slots determines the morphotactics of verb forms. Some slots host a few mutually exclusive affixes, some of which may vary in form, and some may be zero markers. Some suffixes may exclude others from different slots for grammatical or semantic reasons.

Even though it is not rare to find seven or eight suffixes following the root, or even more (E1 has ten), verbs in spontaneous speech usually contain between four and six suffixes.

In the following lines we graphically represent three different Mapuche verb forms. S represents the stem. Each dot represents a slot: the leftmost dot is slot 36 and the rightmost dot is slot 1. X marks a suffix occurring in a slot. Ø also marks a suffix occurrence, but of a null morpheme, i.e. a morpheme with no phonemic or graphic realisation.

Minimal intransitive verb 2nd person plural
S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X X X .

Minimal transitive verb 2nd → 1st persons plural
S . . . . . . . . . . . . . X . . . . . . . . . . . . . . . . . . X Ø X .

Representation of example E1
S . . . . X . . . . . X . . . . . . . X X X . . . . . . . . . X . X Ø X X
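The diagrams follow a mechanical rule: 36 positions, with slot 36 on the left and slot 1 on the right. As a hedged illustration (the helper name slot_diagram is ours and is not part of the analyser), the Python sketch below draws such a line from a mapping of slot numbers to markers; fed the slots of E1, it should reproduce the third diagram.

```python
def slot_diagram(filled, n_slots=36):
    """Draw a slot diagram in the style used above.

    `filled` maps a slot number to its marker: "X" for an overt suffix,
    "Ø" for a null morpheme. Empty slots are drawn as ".". The leftmost
    position is slot `n_slots` (next to the stem), the rightmost is slot 1.
    """
    cells = [filled.get(slot, ".") for slot in range(n_slots, 0, -1)]
    return "S " + " ".join(cells)


# Slots filled in E1 (nü-nie-ñma-r-pu-tu-e-y-iñ-mu); slot 3 holds a null morpheme.
e1 = {32: "X", 26: "X", 18: "X", 17: "X", 16: "X",
      6: "X", 4: "X", 3: "Ø", 2: "X", 1: "X"}
print(slot_diagram(e1))

# Minimal intransitive form: mood (4), person (3) and number (2).
print(slot_diagram({4: "X", 3: "X", 2: "X"}))
```

Generating the diagrams from slot numbers, rather than typing the dots by hand, makes it easy to keep every line exactly 36 slots wide.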

Example 1

Verb with 10 suffixes [Smeets, I. 2008: 443 (76)] RefB:21
nü-nie-ñma-r-pu-tu-e-y-iñ-mu
’they continued to take it away from us’
Root: nü- -TV.nü_tomar

  Suffixes:

  1. -nie- Progressive persistent (+PRPS.nie32)

  2. -ñma- Indirect object (+IO.ñma26)

  3. -r- Interruptive (+ITR.r18)

  4. -pu- Locative (+LOC.pu17)

  5. -tu- Iterative/restorative (+RE.tu16)

  6. -e- Internal direct object (+IDO.e6)

  7. -y- Indicative (+IND.y4)

  8. -Ø- First person (+1.Ø3)

  9. -iñ- Plural (+PL.iñ2)

  10. -mu Dative subject (+DS3A.mew1)

In example E1 the root and suffixes are displayed as items to better identify them, but the analyser output is visualised linearly, as follows:
-TV.nü_tomar+PRPS.nie32+IO.ñma26+ITR.r18+LOC.pu17+RE.tu16+IDO.e6+IND.y4+1.Ø3+PL.iñ2+DS3A.mew1
Each analysis tag begins with the abbreviated name of the part of speech (PoS) or suffix.
PoS tags are introduced by a - (minus) sign and suffix tags by a + (plus) sign. -TV is ’transitive verb’, -IV is ’intransitive verb’, -N is ’noun’, etc.
Concerning suffixes, +PRPS is ’progressive persistent’, +IDO is ’internal direct object’, +PL is ’plural’, etc. A complete list of tag meanings is found in annex 11.1 (Tags meaning).
After the abbreviated name of the PoS or suffix, separated by a dot, comes the standard form of the root or suffix. Roots are followed by their meaning in Spanish, with an underscore _ as separator: .nü_tomar in E1. For the suffixes just mentioned, the forms are .nie, .e and .iñ, respectively.
The number at the end of each tag indicates the slot (the position in the verb chain) where the verb suffix is located.
For instance:
-TV.nü_tomar: "the transitive verb root which means ’tomar’ in Spanish (’take’)".
+PRPS.nie32: "the progressive persistent suffix, whose form is nie, is located in slot 32".
+IDO.e6: "the internal direct object suffix, whose form is e, is located in slot 6", etc.
These items are the output of the analyser we have built, and all the examples show the same structure and information.
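Because the tag format is regular, a single regular expression can split an analysis string into its parts. The Python sketch below is only an illustration, not the project's code: the names TAG, Tag and parse_analysis are hypothetical. It assumes that glosses contain no +, - or digits (a gloss such as poder-hacer in E29 would therefore be cut at the hyphen), and it normalises the non-breaking hyphen (U+2011) and stray spaces that appear in some of the examples.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One tag: sign, abbreviated name, then optionally ".form", "_gloss" and a slot.
TAG = re.compile(
    r"(?P<sign>[+-])"                 # '-' introduces a PoS, '+' a suffix
    r"(?P<label>[A-Z0-9]+)"           # e.g. TV, PRPS, IND1SG, 1
    r"(?:\.(?P<form>[^\W\d_]+)"       # standard form: letters only (nü, Ø, mew)
    r"(?:_(?P<gloss>[^+\-\d]+))?"     # Spanish gloss of roots (may contain _)
    r"(?P<slot>\d+)?)?"               # slot number of verb suffixes
)


@dataclass
class Tag:
    kind: str              # "PoS" or "suffix"
    label: str
    form: Optional[str]
    gloss: Optional[str]
    slot: Optional[int]


def parse_analysis(analysis: str) -> list[Tag]:
    """Split a linear analysis such as the one for E1 into Tag records."""
    text = analysis.replace("\u2011", "-").replace(" ", "")
    return [
        Tag(kind="PoS" if m["sign"] == "-" else "suffix",
            label=m["label"],
            form=m["form"],
            gloss=m["gloss"],
            slot=int(m["slot"]) if m["slot"] else None)
        for m in TAG.finditer(text)
    ]


e1_analysis = ("-TV.nü_tomar+PRPS.nie32+IO.ñma26+ITR.r18+LOC.pu17"
               "+RE.tu16+IDO.e6+IND.y4+1.Ø3+PL.iñ2+DS3A.mew1")
for tag in parse_analysis(e1_analysis):
    print(tag)
```

Tags without a dot, such as the category marker -IV after a verbaliser in E2, come out as PoS records with no form, gloss or slot.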

Slots 1 to 15 hold inflectional suffixes in fixed positions. Slots 16 to 27 hold derivational suffixes, some of which are mobile. Slots 28 to 36 hold derivational suffixes in fixed positions, except for the rather mobile suffix -uw-, which usually fits in slot 31 and marks reflexivity/reciprocity. Mobile suffixes are assigned to their most usual position. "A difference in order of the suffixes does not always result in a semantic difference" [Smeets, I. 2008: 177] RefB:21 (see sections 3.1.5 and 5.3.4).
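The three slot ranges just described can be captured in a small classifier. The Python sketch below is a hypothetical helper (slot_zone is our name); it encodes only the coarse grouping of this paragraph, treating slot 31 as the home of the mobile -uw-, and does not model which individual suffixes in slots 16 to 27 actually move.

```python
def slot_zone(slot: int) -> str:
    """Coarse grouping of the 36 verb slots as described in the text."""
    if not 1 <= slot <= 36:
        raise ValueError(f"slot must be between 1 and 36, got {slot}")
    if slot <= 15:
        return "inflectional, fixed"
    if slot <= 27:
        return "derivational, possibly mobile"
    if slot == 31:
        return "derivational, mobile (-uw-)"
    return "derivational, fixed"


# Slots of the suffixes of E1, from left to right.
e1_slots = [32, 26, 18, 17, 16, 6, 4, 3, 2, 1]
for s in e1_slots:
    print(s, slot_zone(s))
```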

3.1.1 Verbalisers (slot 36)

Nouns, adjectives, adverbs and numerals (roots) "can be changed into verbs by means of suffixation" [Smeets, I. 2008: 304] RefB:21 . "There are six verbalising suffixes. They immediately follow the root and fill slot 36" [Smeets, I. 2008: 121] RefB:21 .

  1. Suffix -Ø- verbalises noun, adjective and numeral roots, as well as a number of adverb roots.

  2. Suffix -l- verbalises noun, adverb, numeral roots and the interrogative pronoun tunte- ’how much’.

  3. Suffix -nge- can verbalise noun, adjective, numeral roots and the interrogative element chum- ’how’. A verb formed with -nge- is intransitive.

  4. Suffix -ntu- verbalises adjective roots.

  5. Suffix -tu- verbalises noun roots.

  6. Suffix -ye- verbalises noun roots.
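The list above amounts to a lookup table from each verbaliser to the root categories it accepts. The Python sketch below uses hypothetical names (VERBALISERS, candidate_verbalisers) and returns the verbalisers that may follow a root of a given category. In an FST lexicon, restrictions of this kind would more typically be encoded as continuation classes; the table also ignores the lexical restriction that only some adverb roots take -Ø-.

```python
# Root categories accepted by each verbaliser (slot 36), following the list above.
VERBALISERS = {
    "Ø":   {"noun", "adjective", "numeral", "adverb"},  # only some adverbs
    "l":   {"noun", "adverb", "numeral", "interrogative:tunte"},
    "nge": {"noun", "adjective", "numeral", "interrogative:chum"},
    "ntu": {"adjective"},
    "tu":  {"noun"},
    "ye":  {"noun"},
}


def candidate_verbalisers(category: str) -> list[str]:
    """Return, sorted, the verbalisers listed for roots of `category`."""
    return sorted(v for v, cats in VERBALISERS.items() if category in cats)


print(candidate_verbalisers("adjective"))
print(candidate_verbalisers("interrogative:chum"))
```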

Example 2

mapu-che ’person of the land (Mapuche person)’
-NN.mapu_tierra-NN.che_persona
mapu-che-nge-n ’I am a person of the land (a Mapuche)’
-NN.mapu_tierra-NN.che_persona
+VRB.nge36-IV+IND1SG.n3

3.1.2 Stem formative (slot 36)

Reduplication is another resource of Mapudüngun, and reduplicated roots are also used to form verbs. To do so, they must be followed by a verbalising suffix, even when the reduplicated root is a verb root. Smeets calls these suffixes "stem formatives in reduplicated roots (SFR)", and they are also assigned to slot 36. There are four stem formatives:

  1. Suffix -Ø- occurs when the reduplicated root is an onomatopoeia or a verb. The resulting verb is intransitive.

  2. Suffix -nge- is added to reduplicated verb roots, the resulting verb is intransitive.

  3. Suffix -tu- is added to reduplicated noun or verb roots. The resulting verb of a reduplicated verb root has the same valence as the single form.

  4. Suffix -ye- is added to reduplicated verb roots, the resulting verb is transitive.
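The four stem formatives pair a set of admissible root types with a resulting valence. The Python sketch below (the names SFR and reduplicated_stem are hypothetical) builds the reduplicated stem and reports its valence; where the text leaves the valence open, as for -tu- on reduplicated noun roots, it returns None rather than guessing.

```python
# Stem formatives in reduplicated roots (slot 36), following the list above:
# suffix -> (root types it attaches to, resulting valence).
# "same" means the valence of the single (non-reduplicated) verb is kept.
SFR = {
    "Ø":   ({"onomatopoeia", "verb"}, "intransitive"),
    "nge": ({"verb"}, "intransitive"),
    "tu":  ({"noun", "verb"}, "same"),
    "ye":  ({"verb"}, "transitive"),
}


def reduplicated_stem(root, root_type, sfr, single_valence=None):
    """Return (stem, valence) for root-root-SFR.

    The valence is None when the text does not state it (-tu- on nouns).
    """
    allowed, valence = SFR[sfr]
    if root_type not in allowed:
        raise ValueError(f"-{sfr}- is not listed for reduplicated {root_type} roots")
    if valence == "same":
        valence = single_valence if root_type == "verb" else None
    # A null (Ø) formative has no surface realisation.
    stem = f"{root}-{root}" if sfr == "Ø" else f"{root}-{root}-{sfr}"
    return stem, valence


print(reduplicated_stem("ñiwa", "verb", "tu", "intransitive"))  # cf. E3
print(reduplicated_stem("püra", "verb", "Ø"))                    # cf. E32
```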

Example 3

[Smeets, I. 2008: 481 (33)] RefB:21
ñiwa-ñiwa-tu-fu-n ’I always did my best’
-IV.ñiwa_esforzar-RVBR+SFR.tu36+IPD.fu8
+IND1SG.n3

3.1.3 Derivational suffixes (slots 16 to 35)

From slot 16 to 27 the suffixes mostly act as semantic modifiers; from slot 28 to 35 they have an aspectual or valency function. The most commonly used suffixes are listed below ("e.r." stands for "examples references": the referred annex, 11.2.6 Examples by suffixes, lists the examples that contain each suffix):

  • Causatives -l- (e.r. 11.2.6) and -m- (e.r. 11.2.6), slot 34. These suffixes cause the event denoted by the stem to actually be applied or to happen; in this sense they also operate as transitivizers.

  • Factitive -ka- (e.r. 11.2.6) and transitivizer -tu- (e.r. 11.2.6), slot 33. They indicate that the agent causes the event denoted by the verb to take place; often they also add intensive value.

  • Reflexive/reciprocal -w- (e.r. 11.2.6), slot 31. It indicates reflexivity when combined with a singular subject. The reflexive morpheme -(u)w- indicates reflexivity or reciprocity when it combines with a dual or plural subject.

  • Stative -le- (e.r. 11.2.6), slot 28. It denotes a state which may or may not involve agentivity on the part of the subject. With a few verbs, it may denote either an ongoing event or the resulting state. It may be used to indicate a quality or characteristic that is not permanent or intrinsic.

  • Beneficiary -el- (e.r. 11.2.6), slot 27. It makes the (animate) patient become the beneficiary of the event.

  • Passive -nge- (e.r. 11.2.6), slot 23. It indicates that a participant, a 3rd person with the role of agent, is not found in the situation described by the sentence, but outside the speech act.

  • 1st person agent -w- (e.r. 11.2.6), slot 23. It indicates an undeclared participant to be determined by the context: a first person non-singular agent, which implicitly includes the listener, who is the patient.

  • Thither -me- (e.r. 11.2.6), slot 20. It indicates that the denoted situation involves motion away from the speaker or another orientation point, with a connotation of temporariness.

  • Persistence -we- (e.r. 11.2.6), slot 19. It indicates a situation which persists after a previous event has taken place.

  • Hither -pa- (e.r. 11.2.6), slot 17. It indicates that the denoted situation either involves a movement towards the speaker or takes place at a location near the speaker. It may indicate a development towards the present.

  • Locative -pu- (e.r. 11.2.6), slot 17. It indicates that the event takes place away from the speaker. It does not imply motion and indicates a permanent situation.

3.1.4 Inflectional suffixes (slots 5 to 15)

Among these suffixes are those that indicate aspect, tense, negation and truth value:

  • Pluperfect -wye-, slot 15. It indicates that the event takes place before the past or future orientation moment (see the following example and E5).

    Example 4

    [Smeets, I. 2008: 69 (62)] RefB:21
    tripa-wye-y ’he had left’
    -IV.tripa_salir+PLPF.wye15+IND.y4+3.Ø3

  • Constant feature -ke- (e.r. 11.2.6), slot 14. Indicates a constant or characteristic feature of the subject.

  • Proximity -pe- (e.r. 11.2.6), slot 13. It seems to indicate an event or a feature in the recent past, a strong probability, or doubt.

  • Reportative -rke-, slot 12. It indicates that the situation has not been directly witnessed; the speaker has been informed by others, has heard rumours or has deduced it (see following example).

    Example 5

    [Smeets, I. 2008: 254 (1)] RefB:21
    füta-nge-wye-rke-y ’she had been married, they say’
    -NN.füta_marido+VRB.nge36-IV+PLPF.wye15
    +REP.rke12+IND.y4+3.Ø3

  • Affirmative -lle-, slot 11. It adds emphasis (E59).

  • Non-realised situation -a- (e.r. 11.2.6), slot 9. It denotes a non-actual fact. The situation will take place after the orientation moment.

  • Impeditive -fu- (e.r. 11.2.6), slot 8. It denotes that the event does not conclude as expected or cannot be completed.

  • Pluperfect -mu- (e.r. 11.2.6), slot 7. It indicates that an event is realised before an orientation moment in the past. It occurs in complementary distribution with the pluperfect -wye-, slot 15.

  • Constant feature -ye-, slot 5. Like -ke- (slot 14), it denotes a characteristic or constant feature, and the two appear in complementary distribution (E162).
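The two complementary distributions mentioned in this list (-wye-/-mu- and -ke-/-ye-) can be stated as pairwise exclusions. The Python sketch below is a hypothetical well-formedness check, not the analyser's implementation. Identifying suffixes by (form, slot) rather than by form alone keeps pluperfect -mu- (slot 7) apart from the homophonous 2nd person agent -mu- (slot 23).

```python
# Suffixes said above to be in complementary distribution, as (form, slot).
EXCLUSIVE_PAIRS = [
    (("wye", 15), ("mu", 7)),   # pluperfect
    (("ke", 14), ("ye", 5)),    # constant feature
]


def exclusion_violations(morphs):
    """Return every exclusive pair whose two members both occur in `morphs`,
    an iterable of (form, slot) tuples describing one verb form."""
    present = set(morphs)
    return [pair for pair in EXCLUSIVE_PAIRS
            if pair[0] in present and pair[1] in present]


# E5: füta-nge-wye-rke-y
e5 = [("nge", 36), ("wye", 15), ("rke", 12), ("y", 4), ("Ø", 3)]
print(exclusion_violations(e5))                            # []
print(exclusion_violations([("wye", 15), ("mu", 7), ("y", 4)]))
```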

3.1.5 Suffix mobility

Smeets identifies the suffixes of slots 28 to 36 as fixed and those of slots 16 to 27 as mobile; the latter may appear in unusual positions relative to other suffixes. A detailed list of the mobile suffixes with their usual position (slot) follows:

  • Repetitive/Restorative -tu- (e.r. 11.2.6), slot 16. It indicates that a situation is repeated or restored.

  • Hither -pa- (e.r. 11.2.6), slot 17. (Explained in 3.1.3).

  • Persistence -we- (e.r. 11.2.6), slot 19. (Explained in 3.1.3).

  • Thither -me- (e.r. 11.2.6), slot 20. (Explained in 3.1.3).

  • Immediate -fem-, slot 21. It denotes immediate action (see following example).

    Example 6

    [Smeets, I. 2008: 271 (20)] RefB:21
    ye-nge-fem-üy ’it was brought immediately’
    -TV.ye_traer+PASS.nge23+IMM.fem21
    +IND.y4+3.Ø3

  • Sudden -rume-, slot 21 (E162). It denotes sudden action.

  • Play -kantu-, slot 22 (E200). It denotes an action performed in jest, for fun, not in earnest, or merely pretending to do it.

  • Simulative -faluw-, slot 22. It indicates simulation, not real intention to do something (see following example).

    Example 7

    [Smeets, I. 2008: 265 (9)] RefB:21
    illku-le-faluw-ün ’I pretended to be angry’
    -IV.illku_enojar+ST.le28+SIM.faluw22
    +IND1SG.n3

  • Passive -nge- (e.r. 11.2.6), slot 23. (Explained in 3.1.3).

  • Pluraliser -ye-, slot 24 (E91). It is used especially with intransitive verbs that take a 3rd person subject. With a 1st or 2nd person plural subject, it indicates a numerous subject. With transitive verbs, it indicates numerous patients of the event.

  • Force -fal-, slot 25 (E155). It indicates either that there is a necessity or obligation for the subject to perform the action, or that the subject orders someone else to perform the action.

  • Beneficiary -el- (e.r. 11.2.6), slot 27. (Explained in 3.1.3).

  • Stative -le- (e.r. 11.2.6), slot 28. (Explained in 3.1.3).

  • Reflexive/Reciprocal -w- (e.r. 11.2.6), slot 31. (Explained in 3.1.3).

  • Transitivizer -tu- (e.r. 11.2.6), slot 33. It may be added to intransitive and transitive verbs, and it adds an object. With intransitive verbs, the form has one object. With transitive verbs, the form has two objects.

Mobility does not necessarily imply a semantic change, and the more suffixes a verb contains, the less displacement occurs. See the following examples:

Example 8

[Smeets, I. 2008: 270 (19)] RefB:21
ngilla-l-me-mu-y-iñ ’you went to buy for us’
‑TV.ngilla_comprar+BEN.el27+TH.me20+2A.mu23
+IND.y4+1.Ø3+PL.iñ2

Example 9

[Smeets, I. 2008: 263 (11)] RefB:21
i-me-we-ke-la-y ’he no longer always eats there’
‑TV.i_comer+TH.me20+PS.we19+CF.ke14+NEG.la10
+IND.y4+3.Ø3

Example 10

[Smeets, I. 2008: 421 (62)] RefB:21
pütu-yekü-me-tu-y-ng-ün ’they drank all the time’
‑TV.püto_beber+ITR.yekü18+TH.me20+RE.tu16
+IND.y4+3.ng3+PL.ün2

In the examples above, the thither suffix -me- occupies three different positions with respect to the other suffixes. In E8 it is close to slot 27, displaced beyond slot 23. In E9 it occurs in its usual position, just before the suffix -we-, slot 19. Finally, in E10, it appears between suffixes of slots 18 and 16, to the right of its usual position.
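Reading the slot numbers off the tags of E8 to E10 makes the displacement visible: in a form with every suffix in its usual position, slot numbers decrease from left to right. The Python sketch below (order_breaks is a hypothetical name) reports where that descent breaks.

```python
def order_breaks(slots):
    """Return the indices i at which slots[i] is higher than slots[i - 1],
    i.e. where the left-to-right suffix order departs from the usual
    descent from slot 36 towards slot 1."""
    return [i for i in range(1, len(slots)) if slots[i] > slots[i - 1]]


# Slot numbers of the suffixes of E8, E9 and E10, read from their tags.
e8 = [27, 20, 23, 4, 3, 2]     # ngilla-l-me-mu-y-iñ
e9 = [20, 19, 14, 10, 4, 3]    # i-me-we-ke-la-y
e10 = [18, 20, 16, 4, 3, 2]    # pütu-yekü-me-tu-y-ng-ün
print(order_breaks(e8), order_breaks(e9), order_breaks(e10))  # [2] [] [1]
```

The check only locates the break; it does not say which suffix moved. In E8 it flags -mu- (slot 23) even though the text describes -me- as the displaced suffix, so deciding which suffix is out of place also requires the list of mobile suffixes and their usual slots.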

3.1.6 Verb paradigms

In the previous sections we skipped the suffixes of slots 10 and 6; since they take part in the transitive verb paradigm, we include them here. The suffixes of slot 23 are also included in this paradigm, together with those of mood, person, number and dative subject in slots 4, 3, 2 and 1, respectively.

Negation, in slot 10, may actually be part of both transitive and intransitive forms. There are three negation morphemes, one per mood, which is why we show them in the verb paradigms.

The simplest verb form is the intransitive one, which has fewer mandatory suffixes than transitive forms: mood (slot 4), person (slot 3) and number (slot 2). See the examples below:

Example 11

küpa-y-m-i ’you (sg) came’
‑IV.küpa_venir+IND.y4+2.m3+SG.i2

Example 12

küpa-la-y-m-u ’you two did not come’
‑IV.küpa_venir+NEG.la10+IND.y4+2.m3+DL.u2

Example 13

küpa-no-l-m-ün ’if you (pl) do not come’
‑IV.küpa_venir+NEG.no10+CND.l4+2.m3+PL.ün2

The imperative mood has forms for the 1st person singular; the 2nd person singular, dual and plural; and the 3rd person (undefined number). Indicative forms of the 1st person dual and plural may be used adhortatively. The negation suffix for the imperative is -ki- (slot 10), which always co-occurs with the conditional marker -l- (slot 4). When the intention is imperative (adhortative), the adhortative forms, which are indicative, are negated with the same -ki-l- combination of imperative negation and conditional mood marker. See the examples below (the complete conjugation of the intransitive verb küpa- ’to come’ is in annex 11.3, table 13):

Example 14

küpa-m-u ’come, you both!’
‑IV.küpa_venir+IMP.Ø4+2.m3+DL.u2

Example 15

küpa-ki-l-chi ’I better not come’
‑IV.küpa_venir+NEG.ki10+CNI.l4+IMP1SG.chi3
(Even though the form is the same, we label -l- +CND ’conditional’ in conditional forms and +CNI ’conditional marker in imperative forms’ in imperative ones, to better distinguish them.)

Example 16

küpa-y-u ind: ’we both came’ imp: ’let us both come’
‑IV.küpa_venir+IND.y4+1.Ø3+DL.u2

Example 17

küpa-ki-l-y-u ’let us both not come’
‑IV.küpa_venir+NEG.ki10+CNI.l4+1.y3+DL.u2
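In the generation direction, the examples above can be rebuilt by placing each suffix in slot order after the stem and choosing the negation morpheme by mood. The Python sketch below uses hypothetical names (NEGATION, assemble) and takes surface forms as given: it applies no morphophonological rule, so a form such as illku-le-faluw-ün (E7), where the tag form n of +IND1SG.n3 surfaces as -ün, is beyond what it can derive.

```python
# Negation suffix (slot 10) for each mood, as described above. Imperative
# -ki- must be accompanied by the marker -l- in slot 4 (tagged +CNI).
NEGATION = {"indicative": "la", "conditional": "no", "imperative": "ki"}


def assemble(stem: str, morphs: dict[int, str]) -> str:
    """Concatenate a stem and its suffixes, given as {slot: surface form}.

    Suffixes are placed from the highest slot (next to the stem) down to
    slot 1; null morphemes ("Ø") have no surface realisation and are
    skipped. No morphophonological rule is applied.
    """
    suffixes = [morphs[slot] for slot in sorted(morphs, reverse=True)
                if morphs[slot] != "Ø"]
    return "-".join([stem] + suffixes)


# E12: küpa-la-y-m-u 'you two did not come'
print(assemble("küpa", {10: NEGATION["indicative"], 4: "y", 3: "m", 2: "u"}))
# E17: küpa-ki-l-y-u 'let us both not come'
print(assemble("küpa", {10: NEGATION["imperative"], 4: "l", 3: "y", 2: "u"}))
```

Passing the slots explicitly, rather than a pre-ordered list, mirrors the slot template: the order of the dictionary entries does not matter.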

The transitive paradigm demands more suffixes to reflect the relations between agent, patient and object. The following suffixes also take part: -w- 1st person agent and -mu- 2nd person agent, slot 23; -e- internal direct object and -fi- external direct object, slot 6; and -Ø- dative subject with a 1st or 2nd person agent, and -mew-/-ew- dative subject with a 3rd person agent (no number), slot 1. A complete explanation of the transitive paradigm is in chapter 26 "Slots" of Smeets 2008 RefB:21 . See the examples below (the complete conjugation of the transitive verb pi- ’to say (to tell)’ is in annex 11.4, table 14, and the negative imperative forms are in annex 11.5, table 15):

Example 18

pi-e-y-u ’I told you’
‑TV.pi+IDO.e6+IND.y4+1.Ø3+DL.u2+DS12A.Ø1

Example 19

pi-mu-l-i ’if you (non-sg) tell me’
‑TV.pi+2A.mu23+CND.l4+1.i3+SG.Ø2

Example 20

pi-fi-m-u ’tell him, you both’
‑TV.pi+EDO.fi6+IMP.Ø4+2.m3+DL.u2

Example 21

pi-ki-fi-l-y-iñ ’let us (pl) not tell him’
‑TV.pi+NEG.ki10+EDO.fi6+CNI.l4+1.y3+PL.iñ2

3.1.7 Verb inflectional nominalisation

In a Mapuche sentence, subordinate clauses are verbs nominalised by inflectional nominalisers. These suffixes share slot 4 with the mood markers; therefore, a verb form is either finite or nominalised. Finite forms take mood, person and number. Nominalised forms cannot take those suffixes and take one of the inflectional nominalisers instead.

Besides acting as subordinate clauses, nominalised verbs may also act "as subject, direct object, instrumental object or complement noun phrase, indicating an event as such, a participant, an instrument, time, place, reason, purpose or background event" [Smeets, I. 2008: 188] RefB:21 ; as noun modifiers; and as predicates in nominal sentences.

"Some nominalised forms can be used as a finite verb form. The subject of a subordinate is usually indicated by a possessive pronoun, which immediately precedes the subordinate. However, when a subordinate is used as a temporal or causal clause, or as a finite verb form, the subject is indicated by a personal pronoun" [Smeets, I. 2008: 189] RefB:21 . There are seven inflectional nominalisers:

  • Agentive verbal noun -t-
    This suffix may denote an event as such; an instrument or location, and the patient or agent of an event.

    Example 22

    [Smeets, I. 2008: 215 (186)] RefB:21
    tüfa ñi pi-e-t-ew
    -DP.tüfa_este -SP.ñi_mi_su
    -TV.pi_decir+IDO.e6+AVN.t4+DS3A.ew1
    ’this is what he told me’ lit: ’this his told me’

  • Completive subjective verbal noun -wma-
    This suffix indicates the subject of a completed event.

    Example 23

    [Smeets, I. 2008: 400 (24)] RefB:21
    füta-nge-wma-rke
    ‑NN.füta_marido+VRB.nge36‑IV+CSVN.wma4
    +REP.rke
    ’she has been married, some say’

  • Instrumental verbal noun -m
    This suffix may indicate an instrument, a location, or an event as such. In combination with -a- non-realised action (slot 9), it may indicate purpose. With -ye- constant feature (slot 5), it forms a temporal clause.

    Example 24

    [Smeets, I. 2008: 206 (137)] RefB:21
    iñchiñ ta-yiñ lleg-mu-m
    ‑NN.-PP.iñchiñ_nosotros
    -AP.ta_el-SP.yiñ_nuestro-s
    -IV.lleg_crecer+PLPF.mu7+IVN.m4
    ’where we (pl) have grown up’ lit: ’we the our have grown up place’

  • Objective verbal noun -el
    This suffix expresses a passive participle, indicating the patient of the event. It can also be used to indicate an event as such; and rarely it is also used as an instrumental or locative.

    Example 25

    [Smeets, I. 2008: 76 (16)] RefB:21
    kuyfi pichi-ka-el
    -AV.kuyfi_antes
    -AJ.pichi_pequeño+VRB.Ø36+CONT.ka16+OVN.el4
    ’long time ago when I was still young’ lit: ’before, in the still little’

  • Plain verbal noun -n
    This suffix indicates an event as such, without time mark. It can convert the form into an adjective denoting an attribute or quality of the modified noun. It can also form a noun denoting a person or thing involved in the event referred to by the verb. It is usually translated as an infinitive: küdaw ’the work’, küdaw-ün ’to work’.

    Example 26

    [Smeets, I. 2008: 192 (51)] RefB:21
    pütrem-tu-n küme-la-y
    -NN.pütrem_tabaco+VRB.tu36+PVN.n4
    -AJ.küme_bueno+VRB.Ø36+NEG.la10+IND.y4+3.Ø3
    ’smoking is not good’ lit: ’tobaccoing good not it is’

  • Subjective verbal noun -lu
    This suffix denotes the subject of an event. It may also be used as an active participle, and form a temporal or causal clause.

    Example 27

    [Smeets, I. 2008: 218 (203)] RefB:21
    pichi che kim-nu-lu
    -AJ.pichi_pequeño -NN.che_persona
    -TV.kim_saber+NEG.no10+SVN.lu4
    ’a child that does not know’ lit: ’little person not knower’

  • Transitive verbal noun -fiel
    This suffix may be used as an infinitive, passive participle, locative or instrumental.

    Example 28

    [Smeets, I. 2008: 237 (16)] RefB:21
    iñche müle-y mi pe-a-fiel
    -PP.iñche_yo -IV.müle_estar+IND.y4+3.Ø3
    -SP.mi_tuyo -TV.pe_ver+NRLD.a9+TVN.fiel4
    ’I have to see you (sg)’ lit: ’I am in your will be seen’
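Since the nominalisers share slot 4 with the mood markers and exclude person and number, a form can be classified as finite or nominalised from its tags alone. The Python sketch below is a hypothetical check: the function form_type and its input format, a list of (label, slot) pairs, are our assumptions. Portmanteau tags such as +IND1SG.n3 (E2) are simply treated as non-nominalising.

```python
# Slot-4 nominaliser labels used in the examples above.
NOMINALISERS = {"AVN", "CSVN", "IVN", "OVN", "PVN", "SVN", "TVN"}


def form_type(tags):
    """Classify one verb form, given its suffix tags as (label, slot) pairs,
    as 'finite' or 'nominalised'. A nominalised form must not carry
    person (slot 3) or number (slot 2) suffixes."""
    if not any(label in NOMINALISERS for label, _ in tags):
        return "finite"
    clashes = [(label, slot) for label, slot in tags if slot in (2, 3)]
    if clashes:
        raise ValueError(f"nominalised form with person/number: {clashes}")
    return "nominalised"


# E22 pi-e-t-ew, and the two verbs of E26 (pütrem-tu-n küme-la-y)
print(form_type([("IDO", 6), ("AVN", 4), ("DS3A", 1)]))             # nominalised
print(form_type([("VRB", 36), ("PVN", 4)]))                         # nominalised
print(form_type([("VRB", 36), ("NEG", 10), ("IND", 4), ("3", 3)]))  # finite
```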

3.1.8 Verb derivational nominalisation

Some non-verbal suffixes can turn a verb into an adjective or a noun. The stem may be formed by a single root, a verbal compound, a verbalised root, a verbalised compound or a reduplicated root, or even by a complex stem: a root followed by some suffixes, mainly from slots 35, 34 or 33.

  • -fal +ADJDO indicates that the event denoted by the verb can actually be done.

    Example 29

    [Smeets, I. 2008: 312 (12)] RefB:21
    pepi-l-fal ’feasible, practicable’
    -TV.pepi_poder-hacer+CA.l34+ADJDO.fal

  • -fe +NOMAG denotes a characteristic agent.

    Example 30

    [Smeets, I. 2008: 311 (1)] RefB:21
    kofke-tu-fe ’bread eater’
    -NN.kofke_pan+VRB.tu36+NOMAG.fe

  • -nten +ADJQE indicates that the event denoted by the verb may be realised quickly and/or easily.

    Example 31

    [Smeets, I. 2008: 312 (14)] RefB:21
    afü-nten ’it gets quickly cooked’
    -IV.afü_cocinar+ADJQE.nten

  • -we +NOMPI denotes a characteristic place or instrument.

    Example 32

    [Smeets, I. 2008: 312 (9)] RefB:21
    püra-püra-we ’stairs’
    -IV.püra_subir-RVBR+SFR.Ø36-IV+NOMPI.we

3.1.9 Non-verbal suffixes

The previous section presented four suffixes that turn verbs into adjectives or nouns. The resulting forms may in turn become complex verb stems, i.e., a verb converted into an adjective or a noun may be used as a verb stem: it may be "re-verbalised".

Other suffixes act upon non-verbal forms; the resulting form, i.e., "non-verb + suffix", can in turn also be a complex verb stem (see 3.2 Verb stems). Some of these suffixes change the class (category) of the form they are attached to, and others do not.

  • Class-changing suffixes (CC)

  • -chi +ADJ changes a noun or nominalised verb into an adjective.

    Example 33

    [Smeets, I. 2008: 114 (25)] RefB:21
    lef-chi che ’runner’ lit: ’running person’
    -IV.lef_correr+SVN.Ø4+ADJ.chi
    -NN.che_persona

  • -tu +ADV changes a noun or nominalised verb into an adverb.

    Example 34

    [Smeets, I. 2008: 114 (b)] RefB:21
    amu-n-tu ’going’, ’on my way there’
    -IV.amu_ir+PVN.n4+ADV.tu

  • Non class-changing suffixes (NCC)

  • -ke +DISTR is affixed to adjectives, adverbs and numerals. It indicates a whole consisting of several component parts, each of which has the feature expressed by the form it accompanies.

    Example 35

    [Smeets, I. 2008: 112 (17)] RefB:21
    küla-ke ’a threesome’
    -NU.küla_tres+DISTR.ke

  • -em +EX is affixed to a noun to indicate that its referent is dead or no longer in function or existence.

    Example 36

    [Smeets, I. 2008: 110 (6)] RefB:21
    fey-tüfa ñi küdaw-yem ’this was my former job’
    -DP.fey_que-DP.tüfa_este -SP.ñi_mi_su
    -NN.küdaw_trabajo+EX.em

  • -ntu +GR refers to a group as a whole, or to a place characterised by the presence of many items of the kind referred to by the noun.

    Example 37

    küra-ntu ’scree’
    -NN.küra_piedra+GR.ntu

  • -rke +REP indicates that the situation or thing expressed by the form it accompanies has not been witnessed by the speaker: the speaker has been informed by others, has heard rumours or has drawn a conclusion. It may also express surprise at the sudden realisation of something.

    Example 38

    [Smeets, I. 2008: 110 (8)] RefB:21
    trewa-rke! ’a dog!’, ’what a big dog!’, ’it must have been a dog’ (when wondering who ate the meat that disappeared)
    -NN.trewa_perro+REP.rke

  • -we +TEMP indicates a period subsequent to an orientation moment.

    Example 39

    kechu-we antü ’in five days’
    -NU.kechu_cinco+TEMP.we -NN.antü_sol_día

  • -wen +REL refers to the relationship between people indicated by the noun it accompanies.

    Example 40

    kompañ-wen iñchiu ’we are partners’ lit: ’we both are partners of one another’
    -NN.kompañ_compañero+REL.wen
    -PP.iñchiu_nosotros-dos

3.1.10 Instrumental object suffix -mew

This suffix can never be part of a complex stem, but it may be added to nominalised verbs (E43), nouns and pronouns. It indicates instrument, place, time or cause, and is used in comparative and partitive constructions. It may also refer to the circumstances under which an event takes place. See the following examples:

Example 41

[Smeets, I. 2008: 62 (5)] RefB:21
anel-tu-fi-ñ kiñe kuchillo-mew ’I threatened him with a knife’
-TV.anel_amenazar+TR.tu33+EDO.fi6+IND1SG.n3
-NU.kiñe_uno -NN.kuchillu_cuchillo+INST.mew

Example 42

[Smeets, I. 2008: 62 (9)] RefB:21
uma-pu-n ta-ñi peñi-mu ’I stayed at my brother’s’
-IV.uma_pernoctar+LOC.pu17+IND1SG.n3
-AP.ta_el-SP.ñi_mi_su
-NN.peñi_hermano+INST.mew

Example 43

[Smeets, I. 2008: 62 (7)] RefB:21
are-tu-n-mew monge-li-y ’he lives on borrowing’
-TV.are_prestar+TR.tu33+PVN.n4+INST.mew
-IV.monge_vivir+ST.le28+IND.y4+3.Ø3

3.2 Verb stems

We have classified the different types of stems according to the way they are composed; in this respect we do not strictly follow Smeets. Verb stems are completed by the verbalising suffix (+VRB) when no verb root is present, or by a stem formative (+SFR) when there is a reduplicated root.

The following list shows the different types of stems from the simplest to the most complex ones:

  • Simple stems

  • Verb root

    Example 44

    [Smeets, I. 2008: 64 (29)] RefB:21
    amu-y-ng-ün ’they (pl) went’
    -IV.amu_ir+IND.y4+3.ng3+PL.ün2

  • Verb compound (verb root + verb root)

    Example 45

    [Smeets, I. 2008: 420 (54)] RefB:21
    amu-mayna-tu-e-n-ew ’he made me stumble’
    -IV.amu_ir-TV.mayna_atar-los-pies-CR.TV
    +TR.tu33+IDO.e6+IND1SG.n3+DS3A.ew1

  • Verb compound (verb root + non-verb root / non-verb root + verb root)

    Example 46

    [Smeets, I. 2008: 401 (38)] RefB:21
    ad-kintu-a-l ’to have a look’
    -NN.ad_forma-TV.kintu_mirar+NRLD.a9+OVN.el4

  • Non-verb root +VRB (verbaliser suffix)

    Example 47

    [Smeets, I. 2008: 68 (56)] RefB:21
    küla antü-nge-y ’it was three days ago’
    -NU.küla_tres
    -NN.antü_sol_día+VRB.nge36-IV+IND.y4+3.Ø3

  • Reduplicated root +SFR (stem formative suffix)

    Example 48

    [Smeets, I. 2008: 112 (22)] RefB:21
    aku-aku-nge-y ’continually arrive (e.g. letters)’
    -IV.aku_llegar-RVBR+SFR.nge36-IV+IND.y4+3.Ø3

  • Non-verb compound (non-verb + non-verb) +VRB

    Example 49

    [Smeets, I. 2008: 123 (11)] RefB:21
    trewa-ad-nge-y ’he has dog face’
    -NN.trewa_perro-NN.ad_cara+VRB.nge36-IV
    +IND.y4+3.Ø3

  • Complex single root stems

  • Numeral + non class-changing suffix +VRB

    Example 50

    [Smeets, I. 2008: 400 (30)] RefB:21
    kiñe-ke-l-fi-y ’he gave one to each of them’
    -NU.kiñe_uno+DISTR.ke+VRB.l36
    +EDO.fi6+IND.y4+3.Ø3

  • Adjective + non class-changing suffix or inflectional nominaliser +VRB

    Example 51

    [Smeets, I. 2008: 473 (40)] RefB:21
    pichi-n-tu-ki-y ’it was for little time’
    -AJ.pichi_pequeño+PVN.n4+VRB.tu36
    +CF.ke14+IND.y4+3.Ø3

  • Question +VRB + inflectional nominaliser +VRB

    Example 52

    [Smeets, I. 2008: 243 (52)] RefB:21
    chum-nge-n-tu-y-m-i? ’what do you (sg) think (about it)?’
    -QC.chum_cómo+VRB.nge36-IV+PVN.n4+VRB.tu36
    +IND.y4+2.m3+SG.i2

  • Adjective + inflectional nominaliser +VRB + derivational nominaliser +VRB

    Example 53

    [Smeets, I. 2008: 375 (25)] RefB:21
    awka-n-tu-fe-nge-y ’he is playful’
    -AJ.awka_salvaje+PVN.n4+VRB.tu36+NOMAG.fe
    +VRB.nge36-IV+IND.y4+3.Ø3

  • Adverb + non class-changing suffix or inflectional nominaliser + optional class-changing suffix +VRB

    Example 54

    [Smeets, I. 2008: 383 (18)] RefB:21
    alü-n-tu-y-ng-ün ’they were more’
    -AV.alü_mucho+PVN.n4+VRB.tu36
    +IND.y4+3.ng3+PL.ün2

  • Noun + non class-changing suffix + inflectional nominaliser + class-changing suffix +VRB

    Example 55

    [Smeets, I. 2008: 411 (53)] RefB:21
    tukuyu-ke-chi-le-wü-y ’it looks like (long) fabric’
    -NN.tukuyu_tela+DISTR.ke+SVN.Ø4+ADJ.chi
    +VRB.Ø36+ST.le28+REF.w31+IND.y4+3.Ø3

  • Noun + optional transitivizer or factitive + optional reflexive + optional non-realised + class-changing suffix or non class-changing suffix or inflectional nominaliser +VRB

    Example 56

    [Smeets, I. 2008: 90 (34)] RefB:21
    as-ka-w-ün-nge-y ’he is capricious’
    -NN.ad_costumbre+FAC.ka33+REF.w31+PVN.n4
    +VRB.nge36-IV+IND.y4+3.Ø3

  • Verb + optional causative + optional transitivizer or factitive + optional reflexive + optional stative + optional hither + optional non-realised + inflectional or derivational nominaliser +VRB

    Example 57

    [Smeets, I. 2008: 225 (243)] RefB:21
    llüka-nten-nge-wma ’I was someone who easily gets afraid’
    -IV.llüka_temer+ADJQE.nten+VRB.nge36-IV
    +CSVN.wma4

  • Reduplicated verb root + causative
    (verb +CA + verb +CA)

    Example 58

    [Smeets, I. 2008: 412 (68)] RefB:21
    ap-üm-ap-üm-ye-nge-y ’we have gradually been finished off’
    -IV.af_acabar+CA.m34-RVBR+SFR.ye36-TV
    +PASS.nge23+IND.y4+3.Ø3

In the last example, the form appears to consist of two roots and two suffixes, but it is actually a reduplicated stem of one root with a suffix attached, which is why we list it as a single root complex stem.

Complex compound stems are the most complex ones. They are formed by two roots and at least one suffix apart from the verbaliser. They are not listed here but in section 5.2.1 (Complex compound stems), where we present them together with the encoding expressions and rules that handle them.

3.3 Special verbs

Some roots (verbs and non-verbs), for semantic or grammatical reasons, must co-occur with certain suffixes when forming complete verb forms. There are some exceptions and/or conditions for these roots to behave this way. Smeets describes the conditions; the exceptions are our own findings.

3.3.1 Question roots

Interrogative roots may be verbalised, but not all forms take the same verbalisers (see D25).

chem- ’what, which’ may be verbalised by the suffixes -Ø- and -ye-; see the following examples:

Example 59

[Smeets, I. 2008: 434 (86)] RefB:21
chem-lle-a-l-e ’whatever they would do’
-QC.chem_qué_cuál+VRB.Ø36+AFF.lle11+NRLD.a9
+CND.l4+3.e3

Example 60

[Smeets, I. 2008: 128 (39)] RefB:21
chem-ye-w-üy-m-u ’how are you both related?’
-QC.chem_qué_cuál+VRB.ye36+REF.w31
+IND.y4+2.m3+DL.u2

chuchi-/tuchi- ’which’ is verbalised by the null suffix -Ø-; see the following example:

Example 61

[Smeets, I. 2008: 405 (7)] RefB:21
chuchi-künu-al ’how they should carry on’
-QC.chuchi_cuál+VRB.Ø36+PFPS.künu32
+NRLD.a9+OVN.el4

chum- ’how’ is verbalised by -Ø- or -nge-; see the following examples:

Example 62

[Smeets, I. 2008: 416 (15)] RefB:21
chum-la-e-n-ew ’he did not do anything to me’
-QC.chum_cómo+VRB.Ø36+NEG.la10+IDO.e6
+IND1SG.n3+DS3A.ew1

Example 63

[Smeets, I. 2008: 225 (246)] RefB:21
chum-nge-wma ’how it was’
-QC.chum_cómo+VRB.nge36-IV+CSVN.wma4

tunte-/chunte- may be verbalised by the suffixes -Ø-, -l- and -ntu-; see the following examples.

Note: Smeets does not specifically mention that tunte may be verbalised by -Ø-, but we have deduced it from the following text: "The interrogative tunten chunten is a quantity noun, which contains the plain verbal noun marker -n +PVN.n4" [Smeets, I. 2008: 105] RefB:21. To be bound to the +PVN.n4 suffix, the interrogative pronoun must be verbalised; since there is no realised form between the root tunte- and the morpheme -n, the only possible verbaliser is -Ø-.

Example 64

[Smeets, I. 2008: 114 (c)] RefB:21
tunte-n-tu ’for how long?’
-QT.tunte_cuánto+VRB.Ø36+PVN.n4+ADV.tu
Note: According to Smeets, the form tuntentu has two more possible analyses [Smeets, I. 2008: 559 (tunte)] RefB:21:
1) tunte-ntu- ’to stay’, ’to be for how long’
-QT.tunte_cuánto+VRB.ntu36
2) tunte-n-tu- ’to take how much’
-QT.tunte_cuánto+VRB.Ø36+PVN.n4+VRB.tu36

Example 65

[Smeets, I. 2008: 128 (33)] RefB:21
tunte-l-e-y-mew ’how much did he give to you?’
-QT.tunte_cuánto+VRB.l36+IDO.e6
+IND.y4+3.Ø3+DS3A.mew1

Example 66

[Smeets, I. 2008: 399 (19)] RefB:21
tunte-ntu-la-y ’it did not last long’
-QT.tunte_cuánto+VRB.ntu36+NEG.la10
+IND.y4+3.Ø3
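In the analyser these restrictions are encoded in FOMA. Purely as an illustration of the constraint as data, the Python sketch below records which verbaliser each interrogative root accepts according to the description above, with "Ø" standing for the null verbaliser. The dictionary and the helper `can_verbalise` are hypothetical names, not the authors' code.

```python
# Hypothetical table: interrogative root -> verbalisers (+VRB, slot 36) it accepts,
# as described in section 3.3.1 following Smeets. "Ø" is the null verbaliser.
QUESTION_VERBALISERS = {
    "chem": {"Ø", "ye"},          # 'what, which'
    "chuchi": {"Ø"},              # 'which'
    "tuchi": {"Ø"},               # variant of chuchi-
    "chum": {"Ø", "nge"},         # 'how'
    "tunte": {"Ø", "l", "ntu"},   # 'how much / how long'
    "chunte": {"Ø", "l", "ntu"},  # variant of tunte-
}


def can_verbalise(root: str, verbaliser: str) -> bool:
    """True if `verbaliser` may follow the interrogative `root`.

    Roots not in the table (i.e. non-interrogatives) return False.
    """
    return verbaliser in QUESTION_VERBALISERS.get(root, set())
```

For instance, `can_verbalise("chum", "nge")` is True (Example 63), whereas `can_verbalise("chum", "ye")` is False.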

3.3.2 Deictic verbs

"Deictic verbs are derived from the roots fa- ’to become like this’ and fe- ’to become like that’. These roots do not occur without a derivational suffix. A verb which is derived from the root fa- denotes a situation which is contextually determined. A verb which is derived from the root fe- denotes an instance which is situationally determined" [Smeets, I. 2008: 321] RefB:21 .

Smeets says that deictic verbs do not occur without a derivational suffix, but there is a case in which fe- directly takes an inflectional suffix without any derivational one:

Example 67

[Smeets, I. 2008: 246 (4)] RefB:21
fi-y llemay ’that is certainly so’
-IV.fe_ser-eso+IND.y4+3.Ø3
-PT.llemay_seguro_ciertamente
Note: "The particle llemay conveys certainty on the part of the speaker" [Smeets, I. 2008: 334] RefB:21. It consists of the affirmative suffix -lle- +AFF.lle11 and the particle may, which is used in rhetorical questions or questions expecting an affirmative answer.

Compare it with the following examples (which follow the rule):

Example 68

[Smeets, I. 2008: 462 (61)] RefB:21
kom fe-le-y ’they all are that way’
-AV.kom_todo
-IV.fe_ser-eso+ST.le28+IND.y4+3.Ø3

Example 69

[Smeets, I. 2008: 321 (4)] RefB:21
fente-n-üy ’it is as much/big as…’
-AV.fente_tanto+PVN.n4+VRB.Ø36+IND.y4+3.Ø3
Note: -nte- is an unproductive derivational suffix that yields fe-nte- ’adv. that much’ and fa-nte- ’adv. this much’. Since it is unproductive, the adverbial form is listed as such in the lexicon.

Example 70

[Smeets, I. 2008: 322 (7)] RefB:21
ka fe-le-pa-tu-n ’I was in the same situation as before’
-AJ.ka_otro
-IV.fe_ser-eso+ST.le28+HH.pa17+RE.tu16
+IND1SG.n3

Example 71

[Smeets, I. 2008: 321 (1)] RefB:21
fa-le-wma iñche ’this is how I was’
-IV.fa_ser-esto+ST.le28+CSVN.wma4
-PP.iñche_yo

Example 72

[Smeets, I. 2008: 322 (10)] RefB:21
fa-m-nge-chi küdaw-ke-n ’I work this way’
-IV.fa_ser-esto+CA.m34+PASS.nge23
+SVN.Ø4+ADJ.chi
-IV.küdaw_trabajar+CF.ke14+IND1SG.n3

3.3.3 Defective verbs

Roots of body posture verbs must occur together with the perfect persistence marker +PFPS.künu32, the progressive persistence marker +PRPS.nie32 or the stative morpheme +ST.le28 when they are the only root of a stem, i.e., when they are not in compounds. When these verbs occur as part of compounds, they are not required to take any of the three suffixes. The verbs identified by Smeets are:

  • kopüd- ’to lie down on one’s belly’

  • kudu- ’to lie down’

  • külü- ’to lean on one’s elbow’

  • llikosh- ’to sit down on one’s heels’, ’to squat’

  • payla- ’to lie down on one’s back’

  • potri- ’to lean over’

  • potrong- ’to bow forward’ (the head)

  • potrü- ’to bow forward’ (the body)

  • rekül- ’to lean’

  • üñif- ’to lie down on the floor’

  • wira- ’to sit down with spread legs’

  • [Smeets, I. 2008: 235] RefB:21
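Smeets' rule can be stated as a simple predicate over a stem. The following Python sketch is a hypothetical encoding, not the analyser's FOMA rules: a stem is represented by its list of roots and the list of suffix tags that follow it, and only the tag names (PFPS, PRPS, ST) are checked.

```python
# Posture roots listed by Smeets (section 3.3.3), spelled as in the text.
POSTURE_ROOTS = {"kopüd", "kudu", "külü", "llikosh", "payla", "potri",
                 "potrong", "potrü", "rekül", "üñif", "wira"}

# Tags of the three suffixes a lone posture root must take:
# -künu- (+PFPS, slot 32), -nie- (+PRPS, slot 32), -le- (+ST, slot 28).
REQUIRED_TAGS = {"PFPS", "PRPS", "ST"}


def posture_constraint_ok(roots, suffix_tags):
    """Return True if the stem satisfies Smeets' posture-verb rule.

    A posture root that is the only root of the stem must co-occur with
    +PFPS, +PRPS or +ST; compounds are exempt from the requirement.
    """
    if len(roots) == 1 and roots[0] in POSTURE_ROOTS:
        return bool(REQUIRED_TAGS & set(suffix_tags))
    return True
```

Applied to Example 74 (`["kudu"]` with ST, TH, PS, NEG, IND1SG) it returns True, and a compound such as potri-tripa- passes trivially. Applied to Example 77 (`["kudu"]` with LOC, NRLD, OVN) it returns False, which is exactly the kind of attested form, discussed below, that contradicts the rule.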

Example 73

[Smeets, I. 2008: 296 (21)] RefB:21
üñif-künu-a-fi-ñ ’I will spread it out’
-TV.ünif_extender+PFPS.künu32
+NRLD.a9+EDO.fi6+IND1SG.n3

Example 74

[Smeets, I. 2008: 261 (2)] RefB:21
kudu-le-me-we-la-n ’I am not going to lie down there any more’
-IV.kudu_yacer+ST.le28+TH.me20+PS.we19
+NEG.la10+IND1SG.n3

For the combination of these verbs with +PRPS.nie32 we have found no examples in Smeets or elsewhere. Smeets provides two meanings for kopüd- when taking this suffix, but no example:

  • kopüd-nie-
    1. ’to hold someone on his belly’ [Smeets, I. 2008: 235] RefB:21
    2. ’to hold in a face-downward position’ [Smeets, I. 2008: 519] RefB:21

Since we follow Smeets’ description of Mapudüngun, we have implemented what she states about these verbs. However, besides the issue with +PRPS.nie32 just mentioned, there is other evidence that does not support her statement about these verbs. She has probably worked with unpublished data.

For the verb kopüd- ’to lie down on one’s belly’, Smeets gives no examples, only the meanings it takes with +PFPS.künu32, +PRPS.nie32 and +ST.le28. We have found examples in other texts, but some of them behave differently from what Smeets describes, i.e., they lack the "obligatory" suffixes.

Example 75

[Febrés, A.]
kopu-n ’to be lying face down on the floor, or head down, or with the body half bent’
-IV.kopüd_yacer-boca-abajo+PVN.n4
Note: All the examples from Augusta, F., Febrés, A. and Valdivia, L. have been consulted online on the CORLEXIM site RefB:03.

Example 76

[Valdivia, L.] RefB:03
kopu-w-ün ’to be facing the floor’
-IV.kopüd_yacer-boca-abajo+PS.we19+PVN.n4

For the verbs kudu- ’to lie down’ and külü- ’to lean on one’s elbow’, Smeets herself presents examples contradicting their obligatory co-occurrence with these suffixes. Other texts also contradict her (Augusta, F. RefB:03, Febrés RefB:03, Zúñiga RefB:24, Mösbach RefB:14).

Example 77

[Smeets, I. 2008: 349 (17)] RefB:21
kudu-pu-a-el ’to go to bed’
-IV.kudu_yacer+LOC.pu17+NRLD.a9+OVN.el4

Example 78

[Smeets, I. 2008: 244 (4)] RefB:21
kudu-nu-l-m-i ’if you do not go to bed’
-IV.kudu_yacer+NEG.no10+CND.l4+2.m3+SG.i2

Example 79

[Smeets, I. 2008: 526 (lüf-)] RefB:21
külü-a-y antü ’the Sun will lie down’
-IV.külü_apoyar+NRLD.a9+IND.y4+3.Ø3
-NN.antü_sol

For llikosh- Smeets gives the meaning it takes with two of the suffixes but no example. The example we show comes from Augusta, F. RefB:03 .

  • llikosh-küle- ’to squat, to crouch’

  • llikosh-künu-w- ’to squat down, to crouch down’

  • [Smeets, I. 2008: 528] RefB:21

Example 80

[Augusta, F.] RefB:03
llikod-küle-n ’to be snuggled’
-IV.llikosh_acurrucar+ST.le28+PVN.n4

For payla-, Smeets again gives no examples, only its definitions in combination with two of the three suffixes. Examples from other sources contradict Smeets’ observations.

  • payla-le- ’to be lying on one’s back’

  • payla-künu-w- ’to lie down on one’s back’

  • [Smeets, I. 2008: 543] RefB:21

Example 81

[Augusta, Febrés & Valdivia] RefB:03 . [Mösbach, E. 1936] RefB:14
payl’a-n ’to lie on one’s back’
-IV.payla_yacer-de-espalda+PVN.n4

Example 82

[Febrés, A.] RefB:03
paylla-l-ün ’to put or leave something on its own back, or in peace’
-IV.payla_yacer-de-espalda+ST.le28+IND1SG.n3
Note: This example seems to support Smeets’ statements if we assume two phonological changes: the stative -le- drops its vowel, and the indicative 1st person adds an epenthetic schwa after the preceding consonant. Normally, -le- keeps its vowel, and a following suffix beginning with a consonant remains unchanged.

For potri- ’to lean over’, Smeets gives two examples, plus its meaning with one of the three suffixes and in a compound. We did not find examples of potri- in other texts. We believe that potri- and potrü- ’to bow forward’ are two variants of the same verb, even though Smeets defines them differently; alternation between ü and i is not rare. She provides two different translations for an example with potrü-, the second of which matches the same example with potri-. Compare:

Example 83

[Smeets, I. 2008: 549 (potri-)] RefB:21
potri-tripa-n ti wangku-mu ’I toppled out of the chair’
-IV.potri_inclinar-IV.tripa_salir-CR.IV
+IND1SG.n3
-AP.ti_el -NN.wangku_silla+INST.mew

Example 84

potrü-tripa-n ti wangku-mu

’I fell backward from the chair’ [Smeets, I. 2008: 62 (11)] RefB:21

’I toppled from the chair’ [Smeets, I. 2008: 563 (tripa-)] RefB:21
-IV.potrü_inclinar-IV.tripa_salir-CR.IV
+IND1SG.n3
-AP.ti_el -NN.wangku_silla+INST.mew

  • potri-le- ’to be leaning (over)’

  • potri-tripa- ’to topple’

  • [Smeets, I. 2008: 543] RefB:21

Example 85

[Smeets, I. 2008: 296 (20)] RefB:21
potri-künu-w-ün ’I bent forward’
-IV.potrü_inclinar+PFPS.künu32
+REF.w31+IND1SG.n3

We did find examples of potrü- in other texts and, as in the previous cases, they do not agree with Smeets’ statements:

Example 86

[Augusta, F.] RefB:03
potrü-w-ün ’to buck’
-IV.potrü_inclinar+PS.we19+PVN.n4

For the verb rekül- ’to lean’ there are some examples in other texts contradicting Smeets.

Example 87

[Augusta, F.] RefB:03
rekül-tu-n ’to cuddle up, to lie down’
-IV.rekül_apoyar+TR.tu33+PVN.n4
rekül-tu-we ’back (of something)’
-IV.rekül_apoyar+TR.tu33+NOMPI.we

Example 88

[Febrés, A.] RefB:03
rekül-ün 1.’to cuddle up, to stand or stand on something’ 2.’to lean’
-IV.rekül_apoyar+PVN.n4

For the verb üñif-/ünif- ’to lie down on the floor’, we did not find examples other than the one given by Smeets (see E73).

For the verb wira- ’to sit down with spread legs’, Smeets presents no examples, only definitions in combination with two of the three suffixes:

  • wira-künu-w- ’to adopt a position with the legs apart’

  • wira-le- ’to sit with the legs apart’

  • [Smeets, I. 2008: 576] RefB:21

Examples from other texts:

Example 89

[Augusta, F.] RefB:03
wira-le-n ’to be with legs open’
-IV.wira_sentar-con-las-piernas-abiertas
+ST.le28+PVN.n4

Example 90

[Mösbach, E. 1936] RefB:14
wira-l-küle-chi ’the open ones’
-IV.wira_sentar-con-las-piernas-abiertas
+CA.l34+ST.le28+SVN.Ø4+ADJ.chi

Example 91

[Mösbach, E. 1936] RefB:14
wira-l-künu-ye-nge-ke-fu-y ’they remain open’
-IV.wira_sentar-con-las-piernas-abiertas
+CA.l34+PFPS.künu32+PLR.ye24+PASS.nge23
+CF.ke14+IPD.fu8+IND.y4+3.Ø3
Note: Pascual Koña seems to use wira- in the sense of "two things that spread apart", not only the legs. Mösbach collected Koña’s memoirs in the book cited as RefB:14.

3.3.4 Verbs that need a directional

Another group of verbs requires a directional suffix unless the verb is part of a compound or takes a transitivizer or causative suffix. The directional suffixes are -me- thither (slot 20), -pa- hither (slot 17) and -pu- locative (slot 17). The affected verbs are:

  • antü- ’to spend a day’

  • fül- ’to come near’

  • küyen- ’to spend a month’

  • llekü- ’to approach’

  • nge- ’to have been’ (it only requires -me- thither or -pa- hither)

  • pülle- ’to come near’

  • ru- ’to pass, to go through’ (it only requires -me- thither or -pa- hither)

  • tripantu- ’to spend a year’

  • And the following compounds:

  • kim-kon- ’(know-enter-) to find out, to understand’ requires -pa- hither

  • kim-püra- ’(know-go_up-) to realise’ requires -me- thither or -pa- hither

  • trem-tripa- ’(grow_up-go_out-) to become conscious of while growing up’ requires -pa- hither

  • [Smeets, I. 2008: 325, 326] RefB:21
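The requirement and its exceptions can likewise be sketched as a check over a stem. This is a hypothetical Python encoding, not the authors' FOMA rules: stems are again given as lists of roots and suffix tags, TH, HH and LOC stand for -me-, -pa- and -pu-, and both negation markers -la- and -no- are assumed to carry the tag NEG, as in the analyses of this article.

```python
ANY_DIRECTIONAL = {"TH", "HH", "LOC"}   # -me- (slot 20), -pa- (17), -pu- (17)
THITHER_OR_HITHER = {"TH", "HH"}

# Hypothetical table: roots of the stem -> directionals that satisfy the requirement.
NEEDS_DIRECTIONAL = {
    ("antü",): ANY_DIRECTIONAL,
    ("fül",): ANY_DIRECTIONAL,
    ("küyen",): ANY_DIRECTIONAL,
    ("llekü",): ANY_DIRECTIONAL,
    ("pülle",): ANY_DIRECTIONAL,
    ("tripantu",): ANY_DIRECTIONAL,
    ("nge",): THITHER_OR_HITHER,
    ("ru",): THITHER_OR_HITHER,
    ("kim", "kon"): {"HH"},
    ("kim", "püra"): THITHER_OR_HITHER,
    ("trem", "tripa"): {"HH"},
}


def directional_ok(roots, suffix_tags):
    """Check the directional requirement of section 3.3.4 for one stem."""
    key = tuple(roots)
    tags = set(suffix_tags)
    allowed = NEEDS_DIRECTIONAL.get(key)
    if allowed is None:
        # Not a listed verb, e.g. a listed single root used inside a compound.
        return True
    if len(key) == 1 and tags & {"TR", "CA"}:
        # A transitivizer or causative lifts the requirement.
        return True
    if key == ("nge",) and "NEG" in tags:
        # Negation exception: -la- (Smeets) and -no- (found by the authors).
        return True
    return bool(allowed & tags)
```

With these definitions, Example 95 (`["kim", "kon"]` with HH, IND1SG) is accepted, while the contradictory Example 94 given later in this section (`["kim", "kon"]` with IND, 1, PL) is rejected.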

For antü- ’to spend a day’, some examples refute Smeets’ observations, i.e., they show the verb occurring without a directional, a transitivizer or a causative:

  • Augusta, F. RefB:03 antü-n, antü-le-iñ, antü-y, antü-ñma-le-n, antü-ñma-n

  • Febrés, A. RefB:03 antü-n, antü-ku-n

  • Mösbach, E. RefB:14 antü-y

  • Smeets, I. RefB:21 antü-le-y, antü-a-y, antü-y, antü-le-chi

  • Zúñiga, F. RefB:24 antü-y

  • Valdivia, L. RefB:03 antü-n, antü-n-ku-n

The same happens with fül- ’to come near’:

  • Augusta, F. RefB:03 fül-küle-n, fül-ma-n, fül-ün

  • Mösbach, E. RefB:14 fül-a-n, fül-el, fül-la-e-y-ew, fül-küle-le-n, fül-ma-nge-fu-lu, fül-e-y

  • Smeets, I. RefB:21 fül-küle-n

For küyen- ’to spend a month’, Smeets gives no examples, and we have also found contradicting ones from other authors:

For llekü- ’to approach’, Smeets gives one example, and other authors have contrary examples.

Example 92

[Smeets, I. 2008: 503 (elu-)] RefB:21
llekü-pu-el ’to come near’
-IV.llekü_acercar+LOC.pu17+OVN.el4

  • Augusta, F. RefB:03 llekü-n, llekü-le-n, llekü-ñma, llekü-ñma-le-n, llekü-ñma-nie-n

  • Febrés, A. RefB:03 lleku-n, lleku-le-n

  • Mösbach, E. RefB:14 llekü-n, llekü-ñma-nie-lu

  • Valdivia, L. RefB:03 llekü-n, llekü-le-n

For nge- ’to have been’, Smeets remarks that the requirement does not hold when the negation marker -la- co-occurs. We have found that this exception also applies with the negation marker -no-. The first two forms below, found in Smeets, are also realised without directional suffixes:

  • Exceptions: nge-n, nge-y

  • Negation -la-: nge-la-y, nge-la-n, nge-we-la-y, nge-ke-la-fu-y, nge-we-tu-la-y

  • Negation -no-: nge-nu-n, nge-ke-nu-lu, nge-nu-n-mu

For pülle- ’to come near’, there are contrary examples even in Smeets:

  • Augusta, F. RefB:03 pülle-le-n, pülle-künu-n, pülle-lu, pülle-le-ye-lu, pülle-le-chi, pülle-le-nu-chi, pülle-nie-gel-chi, pülle-ke-ñma-w-küle-y, pülle-ñma-w-küle-y-u

  • Mösbach, E. RefB:14 pülle-ñma-w-ke-chi

  • Smeets, I. RefB:21 pülle-le-y, pülle-le-lu

For ru- ’to pass, to go through’, the occurrences are as Smeets describes them, except for two examples found in Febrés, A. RefB:03 (ru-n, ru-a-n), which we cannot yet confirm as right or wrong.

Example 93

[Augusta, F.] RefB:03
ru-l-pa-nütram-pe-lu ’interpreter, translator’
-IV.ru_pasar+CA.l34+HH.pa17
-NN.nütram_conversación+PX.pe13+SVN.lu4

For tripantu- ’to spend a year’ there are also examples from other authors not supporting Smeets’ findings:

  • Augusta, F. RefB:03 tripantu-le-n, tripantu-n, tripantu-y, tripantu-chi

  • Febrés, A. RefB:03 tripantu-n, tripantu-a-n, tripantu-y

  • Mösbach, E. RefB:14 tripantu-a-m, tripantu-n, tripantu-el

  • Valdivia, L. RefB:03 tripantu-n

Finally, for the compounds kim-kon- ’to find out, to understand’, kim-püra- ’to realise’ and trem-tripa- ’to become conscious of while growing up’, we have found no examples other than Smeets’, and she herself gives a contradictory one: E94.

Example 94

[Smeets, I. 2008: 447 (26)] RefB:21
kim-kon-y-iñ ’we had become aware’
-TV.kim_saber-IV.kon_entrar-CR.IV
+IND.y4+1.Ø3+PL.iñ2

Example 95

[Smeets, I. 2008: 447 (24)] RefB:21
kim-kon-pa-n ’I understood, I realised’
-TV.kim_saber-IV.kon_entrar-CR.IV+HH.pa17
+IND1SG.n3

Example 96

[Smeets, I. 2008: 381 (1)] RefB:21
kim-püra-me-n ’I became aware, I came to appreciate’
-TV.kim_saber-IV.püra_subir-CR.IV+TH.me20
+IND1SG.n3

Example 97

[Smeets, I. 2008: 446 (11)] RefB:21
kim-püra-me-pa-n ’I realised’
-TV.kim_saber-IV.püra_subir-CR.IV+TH.me20
+HH.pa17+IND1SG.n3

Example 98

[Smeets, I. 2008: 262 (10)] RefB:21
kim-püra-me-pa-fi-ñ ’I have come to know him’
-TV.kim_saber-IV.püra_subir-CR.IV+TH.me20
+HH.pa17+EDO.fi6+IND1SG.n3

Example 99

[Smeets, I. 2008: 415 (4)] RefB:21
trem-tripa-pa-y ’they grew up knowing (about)’
-IV.trem_crecer-IV.tripa_salir-CR.IV+HH.pa17
+IND.y4+3.Ø3

3.4 Morphophonology

As mentioned in the introduction, the interaction between roots and suffixes creates different contexts in which the form of these elements may change. The most common changes are epenthesis and elision, but there are also cases of phoneme alternation; some of these changes are obligatory and others optional. We present all of them in section 5.1 (Phonological changes into spelling), where we also explain how these variations have been encoded to be processed by the FST analyser.

4 The computational approach

In this section, after a basic introduction to computational morphology and Finite State Transducers (FST), we explain how the morphophonological phenomena of Mapudüngun have been encoded in order to process Mapuche words through FST analysis and obtain a proper identification of the parts (roots and suffixes) that form these words.

We then briefly review FOMA, the FST compiler we use to generate the analyser (see section 4.4). Finally, in section 4.5, we turn to the encoding of Mapudüngun, starting with the alphabet and ending with the lexicon: roots and suffixes.

4.1 Computational morphology

Computational morphology is the branch of computational linguistics concerned with word structure (in this section we follow Gasser [2011: 55] RefB:08). Two kinds of processing are of interest. One is morphological analysis, by which a surface word form is analysed into a lexical representation consisting of the word’s component morphemes or grammatical features. The other is morphological generation, by which a lexical representation is converted into a surface word form. Consider the Mapuche verb lelien ’you looked at me’. A basic morphological analysis would simply segment the word into the morphemes that make it up, as seen below:

Example 100

lelien → leli-e-n

A more abstract lexical representation would indicate the lexical and grammatical significance of the morphemes. The word lelien could be represented at the lexical level as follows:

Example 101

leli-e-n ’you looked at me / look at me!’
-TV.leli_mirar+IDO.e6+IND1SG.n3+DS12A.Ø1
Note: The analyser lexicon is compiled with Spanish translations, which is why all the analyses presented in this article give the root meaning in Spanish. The meanings of the tags are listed in annex 11.1 (Tags meaning).

This chain of tags represents the verb root leli- ’look at’, the internal direct object -e-, the indicative 1st person singular portmanteau suffix -n, and the null dative subject marker for 2nd or 1st person agents.
Note: We follow Smeets’ descriptions throughout this work, but authors of Mapudüngun descriptive grammars disagree on this issue. Basically, what Smeets [2008] RefB:21 identifies as the "agent-patient paradigm" is what Zúñiga [2006] RefB:24 calls "verbal inversion" and Salas [2006] RefB:19 "person focalization".

Three types of information are required to perform morphological analysis: a lexicon, the morphotactics and the (alternation) rules.

A lexicon is composed of roots or stems, which combine with grammatical morphemes to yield surface word forms.

Morphotactics refers to the constraints on the order and class of the morphemes that make up a word within a particular category. For example, the morphotactics of Mapudüngun verbs specify the following minimal sequence of morphemes: mood (indicative, conditional or imperative, slot 4), subject (slot 3) and number (slot 2).
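The tag notation used throughout this article makes slot order easy to inspect mechanically. The Python sketch below is an illustration under simplifying assumptions, not the analyser's implementation: it splits a single-line analysis string at "+", reads the trailing digits of each suffix form as its slot, and checks that slots strictly decrease after the last slot-36 verbaliser or stem formative (where a new verb stem begins). The function names are hypothetical.

```python
import re


def parse_suffixes(analysis):
    """Split e.g. '-IV.amu_ir+IND.y4+3.ng3+PL.ün2' into the root part and
    (tag, form, slot) triples; slot is None for suffixes without a slot."""
    root, *suffixes = analysis.split("+")
    parsed = []
    for s in suffixes:
        tag, rest = s.split(".", 1)
        rest = rest.split("-", 1)[0]  # drop a resulting category such as '-IV'
        form, slot = re.fullmatch(r"(\D*)(\d*)", rest).groups()
        parsed.append((tag, form, int(slot) if slot else None))
    return root, parsed


def slots_in_order(analysis):
    """Toy morphotactic check: after the last slot-36 suffix, numbered
    suffixes must appear in strictly decreasing slot order."""
    _, suffixes = parse_suffixes(analysis)
    slots = [slot for _, _, slot in suffixes if slot is not None]
    if 36 in slots:
        last = len(slots) - 1 - slots[::-1].index(36)
        slots = slots[last + 1:]
    return all(a > b for a, b in zip(slots, slots[1:]))
```

For example, `slots_in_order("-IV.miaw_merodear+NRLD.a9+IND.y4+2.m3+SG.i2")` holds, while swapping the mood and non-realised suffixes fails. The check is deliberately strict: it also rejects the attested order +ST.le28+REF.w31 of Example 55, so the real morphotactics needs exceptions that this sketch ignores.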

Alternation rules are responsible for the variation of forms that morphemes take in the presence of other morphemes. For example, the portmanteau suffix of the 1st person indicative takes two forms: -n after vowels and -ün after consonants, where an epenthetic ü appears (compare miaw-a-n and miaw-ün).
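The alternation just described fits in a one-line function. This is a toy Python sketch that covers only +IND1SG.n3; it assumes that a, e, i, o, u and ü are the vowels and treats every other final letter (including w and y) as a consonant. The function name is hypothetical.

```python
VOWELS = set("aeiouü")


def ind1sg(stem: str) -> str:
    """Attach the indicative 1st person singular portmanteau (+IND1SG.n3):
    -n after a vowel, -ün (with epenthetic ü) after a consonant."""
    return stem + ("n" if stem[-1] in VOWELS else "ün")
```

It reproduces the forms cited in this article: `ind1sg("miaw")` gives miawün, `ind1sg("miawa")` gives miawan and `ind1sg("lelie")` gives lelien.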

Together, knowledge of alternation rules, morphotactics, and the forms of roots or stems in the lexicon represent the morphology of a given language.

Morphological analysis may be handled efficiently by finite state transducers (FSTs). An FST is a network of states and transitions between them, and the analysis of a word is a path through this network. Each transition along the path specifies a correspondence between input characters (or phones) and output characters, so the transducer converts sequences of input characters into sequences of output characters. One very useful property of FSTs is that they can be inverted: the same transducer that implements analysis (surface to lexical representation) for a given rule can also implement generation (lexical to surface representation) by simply swapping its input and output characters. Another useful property is composition: a sequence of FSTs that converts a surface representation into a lexical representation through various intermediate stages may be merged into a single FST that behaves the same as the original sequence (see Composition in 4.1.2).
Note: A transducer is a device that converts energy from one form into another; e.g., a microphone converts the vibrations captured from the air into analogous electrical impulses. An FST converts a chain of symbols into another chain of symbols.

4.1.1 Finite state method

A finite state transducer (FST) is a piece of software that operates as an enhanced finite state machine (FSM), which in turn is capable of representing and operating over finite state networks (FSNs). In this section we follow the explanation given by Ríos [2015: 18-21] RefB:17.

4.1.2 Finite state transducers (FSTs)

There is an important distinction between FSMs, which are one-sided, and FSTs, which have an upper and a lower side or, more generally, an input and an output level. Since an FST has two sides, it can not only decide whether a given word belongs to its language, but also return the output corresponding to a given input [Beesley & Karttunen 2003: 11] RefB:01.

An FST thus implements a relation between two regular languages, the upper-side and the lower-side language, and literally "transduces" strings of one language into strings of the other. A non-deterministic FST may produce more than one output for a given string.

Figure 1 shows an example of an FST that encodes the relation between two of the following four word forms of the Mapudüngun root miaw- ’to wander’ and their respective morphological analyses. The four forms are miaw-ün ’I wandered’, miaw-üy-m-i ’you wandered’, miaw-a-n ’I will wander’ and miaw-a-y-m-i ’you will wander’:

Figure 1: Finite state transducer for miaw- with Ind1Sg, present/past or future
Example 102

miawün      miaw+Ind1Sg3[19] (in figure 1)
miawüymi    miaw+Ind4+2p3+Sg2
miawan      miaw+Nrld9+Ind1Sg3 (in figure 1)
miawaymi    miaw+Nrld9+Ind4+2p3+Sg2

[19] The number at the end of each suffix tag indicates its assigned slot. For a complete explanation of how to read suffix tags, see the note following E1 (section 3.1).

Note that the transducer contains an empty transition ε:ε, which makes the NRLD suffix -a- optional. The transducer in figure 1 can be applied in both directions:

Given miawüymi ’you wandered’ as input and applied in the "upward" direction, it produces ‑IV.miaw_merodear+IND.y4+2.m3+SG.i2 as output. This is the procedure for morphological analysis.

Given ‑IV.miaw_merodear+IND1SG.n3 as input and applied in the "downward" direction, it produces miawün ’I wandered’ as output. This is the procedure for generation.
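To illustrate the two directions, the figure 1 transducer can be approximated as a finite relation, that is, a set of (upper, lower) string pairs taken from Example 102. The sketch below is a minimal Python model under that assumption. It builds no states or arcs as FOMA does. The function names `apply_up` and `apply_down` are hypothetical and only echo the "upward"/"downward" terminology used above.

```python
# Minimal sketch: the transducer of figure 1 modelled as a finite relation,
# i.e. a set of (upper, lower) pairs taken from Example 102.
MIAW_RELATION = {
    ("miaw+Ind1Sg3", "miawün"),
    ("miaw+Ind4+2p3+Sg2", "miawüymi"),
    ("miaw+Nrld9+Ind1Sg3", "miawan"),
    ("miaw+Nrld9+Ind4+2p3+Sg2", "miawaymi"),
}


def apply_up(relation, surface):
    """Analysis: every upper string paired with the given lower string."""
    return {upper for upper, lower in relation if lower == surface}


def apply_down(relation, analysis):
    """Generation: every lower string paired with the given upper string."""
    return {lower for upper, lower in relation if upper == analysis}


def invert(relation):
    """Swap the two sides: analysis in the original is generation in the result."""
    return {(lower, upper) for upper, lower in relation}


print(apply_up(MIAW_RELATION, "miawüymi"))
print(apply_down(MIAW_RELATION, "miaw+Ind1Sg3"))
```

Inversion costs nothing here because only the roles of the two members of each pair change. This mirrors why a single compiled FST can serve both the analyser and the generator.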

Composition is a "hard to handle" concept in finite state processing. Here it suffices to state that a cascade of rules compiled into finite state transducers may be combined into a single equivalent FST via composition (see figure 2).
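In relational terms, composing two transducers links the output side of the first to the input side of the second. The Python sketch below shows this on finite sets of pairs. The intermediate strings such as "miaw@Ün" are hypothetical stand-ins for the intermediate representations described in section 4.6. A real FST compiler performs the same operation on networks that may encode infinite relations.

```python
def compose(first, second):
    """Relational composition: (x, z) is in the result iff some y links
    (x, y) in `first` with (y, z) in `second`."""
    return {(x, z) for x, y1 in first for y2, z in second if y1 == y2}


# Hypothetical two-step cascade: lexical -> intermediate -> surface.
LEXICON_STEP = {
    ("miaw+IND1SG.n3", "miaw@Ün"),
    ("miaw+NRLD.a9+IND1SG.n3", "miawa@Ün"),
}
RULE_STEP = {
    ("miaw@Ün", "miawün"),
    ("miawa@Ün", "miawan"),
}

CASCADE = compose(LEXICON_STEP, RULE_STEP)
print(sorted(CASCADE))
```

The composed relation no longer mentions the intermediate strings. This is the sense in which the single FST "behaves the same" as the cascade.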

Figure 2: A cascade of rules compiled into finite state transducers may be combined into a single equivalent FST via composition. This mathematical possibility, shown by Johnson, can be carried out in practice with finite state software [Beesley & Karttunen 2003: 35] RefB:01

4.1.3 Two-level morphology

The upper language (also called the abstract level), the lower language (also called the surface level) and the relation they establish as parts of an FST are explained in detail in Beesley & Karttunen [2003] RefB:01. Figures 3 and 4 are reproduced here only to illustrate the general idea of these three concepts: the upper and lower languages and the relation between them.

Table 3 shows that relations contain pairs of strings. For analysis, the lower language is used as input, and the upper language is produced as output [figures 3 and 4]:

Upper:  the+Art+Def    a+Art+Indef
Lower:  the            a
Table 3: Article/Determiner/Quantifier distinctions.
Figure 3: Analysing canto [Beesley & Karttunen 2003: 13] RefB:01
Figure 4: Another path in the Spanish morphological analyser [Beesley & Karttunen 2003: 13] RefB:01
Regular expressions as rules

In the development of this system we mainly used two types of operators, restriction (=>) and replacement (->). These were complemented with composition (.o.), context (||) and some other operators.

Restriction is one of the fundamental functions in two-level calculus:

Example 103

[a => c _ [r|t]];

E103 denotes the language of strings in which every occurrence of "a" is immediately preceded by "c" and immediately followed by "r" or "t". Thus the strings "cat" and "car" satisfy the condition, whereas strings such as "cab" or "can" do not.
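The restriction in E103 can be checked directly on a string. The Python sketch below is a simplified illustration that assumes single-character contexts. FOMA compiles the restriction into an automaton rather than scanning strings like this. Note that a string containing no "a" satisfies the restriction vacuously.

```python
def satisfies_restriction(word, target="a", left="c", right=("r", "t")):
    """Check [a => c _ [r|t]]: every occurrence of `target` must be
    immediately preceded by `left` and immediately followed by a
    symbol in `right`. Strings without `target` pass vacuously."""
    for i, ch in enumerate(word):
        if ch != target:
            continue
        if i == 0 or word[i - 1] != left:
            return False  # left context missing
        if i + 1 >= len(word) or word[i + 1] not in right:
            return False  # right context missing
    return True


for w in ["cat", "car", "cab", "can", "dog"]:
    print(w, satisfies_restriction(w))
```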

Example 104

[y -> {ie} || _ [r|{st}]];

E104 denotes the relation in which the string "y" is rewritten as "ie" when it is followed by "r" or "st". This condition is the context, which is introduced by the double bar (||). Thus "ugly" becomes "uglier" or "ugliest", and "pretty" becomes "prettier" or "prettiest"[20].

[20] The actual rule that treats this behaviour is much more complex. It is presented here in this simplified way only as an example.

Composition chains rules so that the output of one rule becomes the input of the next. For example, E104 could be decomposed into two rules combined by composition:

Example 105

[y -> {ie} || _ r] .o.
[y -> {ie} || _ {st}];

Alternatively, it could be written as a single rule that states two different contexts in which the change occurs:

Example 106

[y -> {ie} || _ r, _ {st}];
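E104/E106 (one rule with two contexts) and E105 (two composed single-context rules) can be compared with Python's `re` module. Lookahead `(?=...)` checks the right context without consuming it, in the same way as the `_ r` context. The underlying forms "uglyr", "uglyst", etc. are an assumption that follows the simplification acknowledged in note 20. Function composition stands in for `.o.`.

```python
import re


def two_contexts(form):
    """E104 / E106: y -> ie when followed by r or by st."""
    return re.sub(r"y(?=r|st)", "ie", form)


def rule_before_r(form):
    """[y -> {ie} || _ r]"""
    return re.sub(r"y(?=r)", "ie", form)


def rule_before_st(form):
    """[y -> {ie} || _ {st}]"""
    return re.sub(r"y(?=st)", "ie", form)


def composed(form):
    """E105: the output of the first rule is the input of the second (.o.)."""
    return rule_before_st(rule_before_r(form))


# Simplified underlying forms: stem + r / stem + st (see note 20).
for word in ["uglyr", "uglyst", "prettyr", "prettyst", "ugly"]:
    assert composed(word) == two_contexts(word)
    print(word, "->", composed(word))
```

Both formulations give the same outputs on these strings. This illustrates the redundancy of regular expressions discussed in the next paragraph.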

In general, regular expressions are redundant, in the sense that some operators can easily be defined in terms of others. The same language or relation can therefore be expressed by different regular expressions.

For more detailed descriptions of the regular expression operators, please consult FOMA’s Regular Expression Reference: https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki

Another excellent reference is Beesley & Karttunen’s "Finite State Morphology" book RefB:01 .

4.2 The bases of the analyser

The main script of the analyser is a series of regular expressions (regexps) that encodes the Mapudüngun grammar. This is where the different parts of the Mapuche language are declared (roots, suffixes, particles, etc.), together with the rules that make them interact in the ways Mapudüngun allows.

We follow Smeets’ description of Mapudüngun, "A Grammar of Mapuche" RefB:21, to implement the analyser and generator. We began by treating only one variety (dialect) of the language, known as central Mapudüngun. The analyser is basically the same as the generator, but it has slightly broader rules that mainly accept spelling variants. In some cases these variants come from different dialects. Generally, however, they reflect a certain syncretism (and confusion) that resulted from the different spelling proposals and the influence of Spanish orthography.

The spelling proposal we follow is AMU, "Alfabeto Mapuche Unificado" (Unified Mapuche Alphabet) [Sochil 1986, 1988] RefB:23. This "grafemario" (grapheme inventory) is also known as the "academic proposal".

For someone with one foot in computational technology and the other in linguistics, a description such as the one Smeets gives of Mapudüngun in her thesis work leads straight to the idea of implementing her grammar computationally. The organisation of suffixes into slots and the description of their interaction rules are exactly what regular expressions can encode. While implementing the analyser we also consulted other descriptions of Mapudüngun, but only to compare them (Fernández-Garay & Malvestiti RefB:07, Lonkon RefB:12, Ragileo RefB:15, Salas RefB:19 and Zúñiga RefB:24).

4.3 The analyser

This is a rule-based morphological analyser (and generator) for the Mapudüngun language. It was built with finite state transducers using the algorithms that Mans Hulden (https://en.wikipedia.org/wiki/Mans_Hulden) developed for his FOMA project (https://fomafst.github.io/), an open source application for compiling finite state transducers.

The rules generated for the analyser, as well as the tags used in its output, are based on the description and study of Mapudüngun made by Dr. Ineke Smeets. This work was published in her book "A Grammar of Mapuche" [Smeets, I. 2008] RefB:21 (https://www.degruyter.com/view/product/22765).

4.4 FOMA implementations

We follow the path of Annette Ríos’ work on Quechua. She has developed various tools, the main one being a finite state morphological analyser and generator [Ríos 2015] RefB:17.

Ríos developed her analyser and generator with XFST and other tools released by Xerox [Beesley & Karttunen 2003] RefB:01, and she used FOMA for the spell checker. We decided to use FOMA for everything, mainly because it is open source software and we wanted to develop a completely free set of linguistic tools for Mapudüngun.

FOMA is well-known software used by many linguists and developers for a wide range of tasks, mainly language applications. We searched the Internet for FOMA implementations per year[24]. The results are shown in table 4.

[24] This information was compiled using Google Scholar only. Some publications mention more than one FOMA implementation, sometimes in different years. To simplify, we counted the publications about FOMA implementations and left out those that mention FOMA only as a reference. The results vary as the search is repeated.

FOMA has been widely used in Basque-related tools [Alegria et al. 2009]. It has also been used for a good number of indigenous languages of the Americas (Quechua [Ríos 2015]; Arapaho [Kazeminejad, Cowell & Hulden 2017]; Nahuatl [Escobar 2019]), as well as for Turkish [Yıldız, Avar & Ercan 2019], Indonesian [Larasati et al. 2011], Japanese [Sim 2013], Arabic [Attia, Al-Badrashiny & Diab 2014], Kazakh [Kessikbayeva & Cicekli 2014] and others[25]. Some of the implementations counted in table 4 are listed below.

[25] There are many more FST implementations for different languages, built with other FST compilers. Since we developed ours with FOMA, we list only those.

  • 2020. A Finite-State Morphological Analyser for Evenki. Zueva, A., Kuznetsova, A. & Tyers, F. (Indiana University).

  • 2019. Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology. Chen, E., Park, H. & Schwartz, L. (University of Illinois Urbana-Champaign).

  • 2018. Computational syntactic analysis of Setswana. Berg, A. (North-West University. Johannesburg).

  • 2017. Creating lexical resources for polysynthetic languages — the case of Arapaho. Kazeminejad, A., Cowell, A. & Hulden, M. (University of Colorado).

  • 2016. ZeuScansion: A tool for scansion of English poetry. Agirrezabal, M., Astigarraga. A., Arrieta, B. (University of the Basque Country) & Hulden, M. (University of Colorado Boulder).

  • 2015. A Basic Language Technology Toolkit for Quechua. Ríos, A. (University of Zurich).

  • 2014. GWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector. Attia, M., Al-Badrashiny, M. & Diab, M. (The George Washington University).

  • 2013. A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives. Sim, Y. (Carnegie Mellon University).

  • 2012. Finite State Methods Applied to Hebrew Noun Patterns (Mishqalim). Rozenberg, F. (Eberhard Karls Universität. Tübingen, Germany).

  • 2011. Matxin, an open-source rule-based machine translation system for Basque. Mayor, A., Alegria, I., Díaz de Ilarraza, A., Labaka, G., Lersundi, M. & Sarasola, K. (University of the Basque Country).

Year    2011  2012  2013  2014  2015  2016  2017  2018  2019  2020
Count     13    12     9    13    11    10    14    16    17    12
Table 4: FOMA implementations per year (2011 to 2020)

4.5 The encoded Mapuche alphabet

In this section we present the graphemes that the analyser represents differently from Smeets. We then present the rules that produce spelling variation in words, whether through elision, epenthesis or similar phenomena.

Since it also deals with orthography, this section is related to the spelling unifier (section 6.1).

The sigma alphabet of the analyser includes every non-epsilon (non-empty) symbol that appears in the network, either by itself or as a component of a symbol pair [Beesley & Karttunen 2003: 57, 58] RefB:01. For example, the sigma alphabet of the network compiled from [cat "+Noun":0] consists of the symbols a, c, t and +Noun [Beesley & Karttunen 2003: 62] RefB:01. In other words, the sigma alphabet is the list of all single symbols that occur on either the upper or the lower side of the arcs [Beesley & Karttunen 2003: 99] RefB:01 (see section 4.1.2, Finite state transducers (FSTs)).
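The definition can be made concrete by collecting the symbols on both sides of every arc. The Python sketch below represents the network for [cat "+Noun":0] only as a list of (upper, lower) arc labels and ignores states. It uses "0" for epsilon, which follows the FOMA notation in the example. Both the arc-list representation and the `sigma` helper are illustrative assumptions.

```python
EPSILON = "0"  # FOMA-style notation for the empty string on one side of an arc


def sigma(arcs):
    """Collect every non-epsilon symbol occurring on either side of any arc."""
    return {sym for pair in arcs for sym in pair if sym != EPSILON}


# Arc labels of the network compiled from [cat "+Noun":0]: c:c a:a t:t +Noun:0
CAT_ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("+Noun", EPSILON)]

print(sorted(sigma(CAT_ARCS)))  # ['+Noun', 'a', 'c', 't']
```

The multicharacter symbol +Noun counts as a single member of sigma, while the epsilon on its lower side is excluded.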

The analyser’s sigma, among all the symbols it comprises, contains the Mapuche alphabet:

VOW [ a | e | i | o | u | ü ];
SVW [ w | y | g ];
CON [ {ch} | d | f | k | l | {ll} | m | n | ñ | {ng} | p | r | s | {sh} | t | {tr} ];

{VOW, SVW, CON} of the analyser

To apply morphotactics (see section 4.1, Computational morphology) and rules, the alphabet has been divided into three groups:

4.5.1 Vowels

For this group of characters (see table 2), Smeets’ proposal was adopted without any change. These characters form the group named VOW.

It is worth mentioning, though, that "Mapudüngun has six vowel phonemes, / ë  ö  /… It should be noted that vowels of Mapudüngun have traditionally been treated as the five vowels of Spanish (/i e a o u/), with identical stressed and unstressed allophones, plus a high central unrounded vowel /ɨ/ (commonly known as the ’sixth vowel’) having a mid central allophone [ə] in unstressed position" [Sadowsky, S. 2013] RefB:18.

In other words, the grapheme ü represents two sounds, /ɨ/ and [ə], which occur in complementary contexts.

4.5.2 Semivowels

Smeets identifies these sounds as glides (see table 1). She also includes r among them and counts the glides among the consonants. The analyser, by contrast, separates the glides (but not r) from the consonants to form the semivowel group. This categorisation lets us treat certain phonological phenomena that involve these semivowels, such as elision and epenthesis (see 5.1). We could have called them glides, as Smeets does, but we follow Beesley & Karttunen [2003] RefB:01 in calling them semivowels. In any case, they occur both as semivowels and as glides[26].

[26] Glides immediately precede the vowel and semivowels immediately follow it. Both are less sonorous than the vowel.

Smeets represents the sound usually written g with q instead. She does so to reflect a difference in sound, since this sound is softer than the one she writes with g, for instance in the Spanish loan gayeta ’cookie’, as spelled in Smeets’ work. She spells the Mapuche word for ’seven’ as reqle, while we spell it regle. This word and all the others that Smeets spells with q have traditionally been spelled with g. Since this distinction does not affect the meaning of words, we adopted g to represent all the variants of this sound.

For the other two graphemes, w and y, no changes were made. All three are grouped under the SVW denomination.

4.5.3 Consonants

Smeets states that there are 19 consonants. We recognise 16, because the three glides described above are counted as semivowels.

Smeets uses a distinct symbol to represent the voiceless[27] interdental fricative /θ/, to make clear the difference from the d of Spanish loans. We use d because this distinction does not affect the meaning of words.

[27] This sound is not voiceless in all dialects. Evidence of this is found in the early transcriptions made by Jesuit monks, all of whom used d, which represents a voiced sound.

For the interdental series t’, n’, l’, we have removed the apostrophe so that these letters are interpreted as their alveolar counterparts. Letters with an apostrophe appear mainly in older texts. In more recent texts they sometimes appear, but not consistently: a writer may use an interdental in a word and later omit it, even in the same word. The interdental variants are thus used inconsistently, and the distinction is "dying out" [Smeets, I. 2008: 31] RefB:21. For these reasons interdentals are not produced in morphological generation, although they are accepted as variants in analysis.

Removing the apostrophe broadens the possibilities of analysis. The words and roots to be analysed are collected in the lexicon. If only newen ’force, strength’ appears in the lexicon, the variant n’ewen’ would be treated as an unknown root. Removing the apostrophe before analysis allows the transducer to analyse any of the variants n’ewen, n’ewen’, newen’ or newen as the same root.

Two sounds loaned from Spanish are not included in our system: /b/ (bilabial, plosive, voiced), represented by b, and /x/ (velar, fricative, voiceless), represented by j. The spelling unifier (see 6.1 below) converts them into the corresponding letters of the Mapudüngun alphabet, b into f and j into k. These are the most usual conversions we have found in some dictionaries: jabón → kafon; burro → furiku, etc. [Febrés 1882][28].

[28] Consulted at http://corlexim.cl <2019-07-11>. More examples can be found in Febrés’ dictionary [1882] RefB:06.
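The character-level part of these conversions (apostrophe removal as described above, b → f and j → k) can be sketched with Python's `str.translate`. This is a hypothetical illustration and not the spelling unifier of section 6.1. It assumes the apostrophe may be either ASCII ' or typographic ’. It also ignores accents, case and the non-local changes in loans such as burro → furiku.

```python
# Hypothetical translation table: b -> f, j -> k; apostrophes
# (ASCII ' and typographic U+2019) are deleted (mapped to None).
UNIFY_TABLE = str.maketrans({"b": "f", "j": "k", "'": None, "\u2019": None})


def unify_spelling(word):
    """Sketch of character-level spelling unification before analysis."""
    return word.translate(UNIFY_TABLE)


for variant in ["n\u2019ewen", "n\u2019ewen\u2019", "newen'", "jabon"]:
    print(variant, "->", unify_spelling(variant))
```

Because every apostrophe variant collapses to newen, a single lexicon entry is enough for analysis.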

Other possible conversions are reflected directly in the lexicon: vaca → waka[29]; vehículo → weikulo; voto → woto; jamón → kümon; junio → kunio, etc.

[29] It is now well known that Spanish never had a difference in pronunciation between words spelled with b and with v (see http://lema.rae.es/dpd/srv/search?id=d45ahCOicD6TkHkns8 for more information). Otherwise, vaca would probably have entered the Mapuche lexicon as faka.

Within the consonant group, CON, note that the digraphs ({ch}, {ll}, {ng}, {sh}, {tr}) are written in curly braces. This indicates that the two letters together represent a single sound. In other words, the concatenation of these two symbols is invariable and univocal.
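To see how the three groups and the digraphs work together, the Python sketch below segments a word into alphabet symbols and classifies each one as VOW, SVW or CON. It assumes a greedy longest-match reading of digraphs. This is an illustrative assumption. Across a morpheme boundary, an n followed by g would be wrongly merged into ng, a case the FOMA rules handle at the lexical level.

```python
VOW = {"a", "e", "i", "o", "u", "ü"}
SVW = {"w", "y", "g"}
CON = {"ch", "d", "f", "k", "l", "ll", "m", "n", "ñ", "ng",
       "p", "r", "s", "sh", "t", "tr"}
DIGRAPHS = {s for s in CON if len(s) == 2}
ALPHABET = VOW | SVW | CON


def segment(word):
    """Split a word into alphabet symbols, preferring digraphs (longest match)."""
    symbols, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            symbols.append(word[i:i + 2])
            i += 2
        elif word[i] in ALPHABET:
            symbols.append(word[i])
            i += 1
        else:
            raise ValueError(f"{word[i]!r} is not in the Mapuche alphabet")
    return symbols


def classify(symbol):
    """Return the group name (VOW, SVW or CON) of an alphabet symbol."""
    return "VOW" if symbol in VOW else "SVW" if symbol in SVW else "CON"


print(segment("mamüll"))
print([classify(s) for s in segment("regle")])
```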

4.6 Intermediate representation symbols

These symbols are created to treat the different morphophonological changes that occur in the language through interactions between suffixes, between roots, and among all of them together. They may be considered part of the alphabet, at least at an abstract level, because they represent a certain stage in a change process from which the surface form (the one we write or read) arises. A few of them are listed here as examples. The complete list of these symbols and their functions is given in annex 11.2.7 (Intermediate language symbols):

Example 107

"@G" is used to treat epenthesis of glottal stops represented by g in the spelling; see 6.4.2, E116, E117 and R1.

Example 108

"@Ü" is used to treat schwa insertion represented by ü in the spelling; see E110, E111 and E112.

Example 109

"@GK" is used to treat radical alternation in some intransitive verbs. These verbs change their last consonant from g to k when they are in contact with the causative suffix -üm-, which transitivizes them; see D19.

4.7 Roots encoding

In the FST script, this section is headed "Read in roots", because the system reads the files containing the root lexicon and incorporates them into the analyser. The lists are separated by part of speech (grammatical category).

Definition 1

define AVROOT @re"roots/avroot.lex";
define NROOT @re"roots/nroot.lex";

The expression above reads in the lists of roots for the adverb and noun categories, so any listed noun can be referred to through "NROOT". NROOT encodes nouns as shown below:

Definition 2

Sample of nouns lexicon:
|["‑NN"{.aylen_brasa}]: ["@G"{aylen}]
|["‑NN"{.aywiñ_sombra}]: ["@G"{aywiñ}]
|["‑NN"{.chachay_papá-afectuoso}]: [{chacha}|
{chachay}|{tatay}]
|["‑NN"{.chadi_sal}]: {chadi}
|["‑NN"{.chaf_cáscara_piel-de-frutas}]: {chaf}
|["‑NN"{.chafid_bagazo}]: {chafid}

In D2, "‑NN" stands for (simple) noun or nominal (root). "@G" is an intermediate language tag used to treat glottal stop insertion; see 6.4.2, E116, E117 and R1.
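The upper:lower pairs of D2 can be read as a table from analyses to surface variants. The Python sketch below is a hypothetical rendering of the D2 sample (ASCII "-NN" is used instead of the non-breaking hyphen) with a reverse index for lookup. It strips a leading "@G", assuming that word-initially the tag is always deleted, as R1 does when no vowel precedes. `removeprefix` requires Python 3.9 or later.

```python
# Hypothetical Python rendering of the D2 sample: analysis -> surface variants.
NOUN_LEXICON = {
    "-NN.aylen_brasa": ["@Gaylen"],
    "-NN.aywiñ_sombra": ["@Gaywiñ"],
    "-NN.chachay_papá-afectuoso": ["chacha", "chachay", "tatay"],
    "-NN.chadi_sal": ["chadi"],
    "-NN.chaf_cáscara_piel-de-frutas": ["chaf"],
    "-NN.chafid_bagazo": ["chafid"],
}


def build_index(lexicon):
    """Map each isolated surface form to the set of analyses it can receive."""
    index = {}
    for analysis, variants in lexicon.items():
        for lower in variants:
            surface = lower.removeprefix("@G")  # word-initial @G is deleted (R1)
            index.setdefault(surface, set()).add(analysis)
    return index


INDEX = build_index(NOUN_LEXICON)


def lookup(word):
    """Return the analyses of an unsuffixed noun, or an empty set if unknown."""
    return INDEX.get(word, set())


print(lookup("tatay"))
```

All three variants of chachay lead to the same analysis, which is how D2 accepts dialectal or spelling alternatives.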

Just as D1 reads in the noun and adverb roots, there is one file for each part of speech and for other forms (verbs, adjectives, particles, interjections, etc.).

We have divided the lexicon into two major groups: the roots that can be verbalised and the forms that cannot. All roots except verb roots may occur as independent words (without suffixes).

  1. Roots (verbalisable)

     • File ajroot.lex: adjectives

     • File avroot.lex: adverbs

     • File ivroot.lex: intransitive verb roots

     • File names.lex: proper nouns

     • File nroot.lex: nouns

     • File nuroot.lex: numerals

     • File onroot.lex: onomatopoeia

     • File qroot.lex: question forms

     • File tvroot.lex: transitive verb roots

  2. Other lexicon (non-verbalisable)

     • File adverb.lex: adverbs

     • File anaphpr.lex: anaphoric pronouns

     • File auxv.lex: auxiliary verbs

     • File conj.lex: conjunctions

     • File dempr.lex: demonstrative pronouns

     • File forexp.lex: foreign expressions (Spanish loans)

     • File intpr.lex: interrogative pronouns

     • File itj.lex: interjections

     • File neg.lex: negation particles

     • File numbers.lex: numbers

     • File part.lex: particles

     • File perspr.lex: personal pronouns

     • File posspr.lex: possessive pronouns

     • File prep.lex: prepositions

     • File punct.lex: punctuation marks

The meaning of the tags is given in annex 11.1. A list of the tags assigned to every part of speech and suffix, with the names (in Spanish) that identify them, is available at:
http://www.chandia.net/glosas-del-dungupeyem

4.8 Suffixes encoding

Every suffix is assigned to a slot, and each slot is encoded in its own file. This information is added to the main script by calling these files in the script section "Read in slots". Each file contains the fillers of the corresponding slot. For example:

Definition 3

define NEG @re"slots/1-15-Inflectional-Suffixes/slot-10.aff";

D3 defines an expression named "NEG", which is composed of the regular expressions stored in the text file "slot-10.aff". Its complete path is also indicated: "slots/1-15-Inflectional-Suffixes/" (see figure 5).

Figure 5: Directory tree structure for the location of a suffix file.

This file, shown in D4 below, contains the negation suffixes filling slot 10:

Definition 4

["+NEG"{.la10}]: {la}
| ["+NEG"{.ki10}]: "@NK"
| ["+NEG"{.no10}]: [{no}|{nu}]
| ["+NEG"{.kino10}]: [{kino}|{kinu}];

(In the script this regexp is written on a single line; its parts are shown on separate lines here for readability.)

D4 lists four suffixes filling slot 10, separated by the pipe symbol (vertical bar |). Each suffix entry specifies its upper or abstract level (left of the colon) and its lower or surface level (right of the colon). In analysis, every abstract form is returned after a cleaning step that leaves only the category tag, the suffix form and the slot number, e.g. "+NEG.no10". Before cleaning, these same forms are used to apply morphotactics at the abstract level in both analysis and generation. The surface level contains surface forms and/or tags that trigger a replacement process. "@NK" (in D4 above) is such a tag: a rule further down the script replaces it with a certain form in a defined context.
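Because every cleaned tag ends in its slot number, the output of the analyser can be decoded mechanically. The Python sketch below parses tags such as "+NEG.no10". It assumes the tag shape "+" CATEGORY "." FORM SLOT, with category names made of capitals and digits and no digits inside the suffix form. Nominal suffixes such as "+GR.ntu" are assumed to carry no slot number.

```python
import re

# Assumed shape of a cleaned suffix tag: "+" CATEGORY "." FORM [SLOT].
SUFFIX_TAG = re.compile(r"^\+(?P<category>[A-Z0-9]+)\.(?P<form>\D*)(?P<slot>\d*)$")


def parse_suffix_tag(tag):
    """Split a cleaned suffix tag into (category, form, slot or None)."""
    match = SUFFIX_TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognised suffix tag: {tag!r}")
    slot = int(match["slot"]) if match["slot"] else None
    return match["category"], match["form"], slot


def suffix_slots(analysis):
    """Return the slot numbers of the suffixes in an analysis string."""
    _root, *suffixes = analysis.split("+")
    return [parse_suffix_tag("+" + s)[2] for s in suffixes]


print(parse_suffix_tag("+NEG.no10"))
print(suffix_slots("-IV.miaw_merodear+IND.y4+2.m3+SG.i2"))
```

The decreasing slot numbers in the second result reflect the slot order of verbal suffixes in Smeets’ description.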

The complete list of files per slot containing the suffixes is given in annex 11.2.

5 The Mapudüngun FST analyser

In this section we describe how the different aspects of Mapudüngun morphology are treated in our FST analyser and generator. The order of presentation does not necessarily follow that of the encoding script. Some of the encoding is presented in other sections: the Mapuche alphabet (section 4.5), the spelling unifier (section 6.1), the inclusion of the lexicon (section 4.7) and the inclusion of suffixes (section 4.8).

We begin by explaining phonological changes (5.1), including special cases (see 5.1.2, 5.1.3 and 5.1.4). We then move on to stem typology and the strategies to manage it. Next comes the interaction of suffixes after the stem. That discussion introduces the verb paradigms and verb nominalisation, and it raises the issue of the mobility of some suffixes and how to deal with it.

Some verb roots show special behaviour; we treat them at the end of this section.

5.1 Phonological changes into spelling

The occurrence of roots and suffixes in a verb form produces certain phonological changes when they interact with neighbouring suffixes or roots. In Mapudüngun this interaction may occur between suffixes, between the root and the following suffix, between roots inside a compound stem, or among all of these inside complex stems.

These changes are encoded on the "lower" or "surface" side of the language (see sections 4.1.2 and 4.1.3). They are therefore placed under the section "Lower rules" in the FST script, and they affect the word form.

Phonological changes are frequent in Mapudüngun and are reflected in spelling, except for the epenthesis of the voiced velar fricative /ɣ/, represented by g. This epenthesis is optional in the sequences ii, uu and üü, and obligatory in the sequences ae, ea, ee, ai and ia. In each case, the first vowel of the sequence is the last vowel of a suffix and the second vowel is the first vowel of the following suffix[31].

[31] In Smeets’ texts this /ɣ/ is never transcribed, but in texts by other authors it is sometimes present: kellu-ke-(g)e-n ’you helped me’ -NN.kellu_ayuda+VRB.Ø36+CF.ke14+IDO.e6+IND1SG.n3+DS12A.Ø1 [Mösbach, E. RefB:14].

Schwa epenthesis (sometimes i, e or u) is obligatory between a form (root or suffix) ending in a consonant or semivowel and a following form beginning with a consonant or semivowel.

Example 110

[Smeets, I. 2008: 353 (43)] RefB:21
kim-uw-küle-y-iñ ’we (pl) know each other’
‑TV.kim_saber+REF.w31+ST.le28+IND.y4+1.Ø3
+PL.iñ2

Example 111

[Smeets, I. 2008: 68 (56)] RefB:21
puw-ün ’to arrive’
‑IV.puw_llegar+PVN.n4

Example 112

[Smeets, I. 2008: 109 (1)] RefB:21
mamüll-entu ’grove’ ‑NN.mamüll_árbol+GR.ntu

"A schwa is optionally inserted between a consonant and the suffix sequence -l-e +CND.l4+3.e3 and between a consonant and the suffix sequence -y-iñ +IND.y4+1.Ø3+PL.iñ2 and -y-u +IND.y4+1.Ø3+DL.u2" [Smeets, I. 2008: 51].
Sometimes it is i instead of schwa.

Example 113

[Smeets, I. 2008: 52] RefB:21
kim-l-e → kim-ül-e ’if he knows’
‑TV.kim_saber+CND.l4+3.e3

Example 114

lef-y-u → lef-üy-u ’we both ran’
‑IV.lef_correr+IND.y4+1.Ø3+DL.u2

Example 115

[Smeets, I. 2008: 52] RefB:21
lef-y-iñ → lef-iy-iñ ’we (pl) ran’
‑IV.lef_correr+IND.y4+1.Ø3+PL.iñ2

Lexical forms must be stored in an intermediate form with the appropriate tags, so that rules can later transform them into the final surface forms. For example, the portmanteau suffix for indicative first person singular may occur as -ün-, -üñ-, -n- or -ñ-. It is encoded as:

Definition 5

["+IND1SG"{.n3}] : ["@Ü"[n|ñ]];

This intermediate form encodes two variants: the tag "@Ü" followed by either n or ñ.

The rules to generate the four forms mentioned above are:

Definition 6

["@Ü" -> ü || [CON|SVW|.#.] _ ]
.o. ["@Ü" -> 0 || VOW _ ];

These are two rules combined by the composition operator .o. (see figure 2). They apply not only to this case but wherever the tag "@Ü" is found. The first rule replaces "@Ü" with ü when a consonant or semivowel precedes it, or when it occurs at the beginning of the word (.#. is the word boundary symbol). The double bar || introduces the context, and the underscore "_" marks the position of the tag. The second rule deletes the tag when a vowel precedes it. Combined with the intermediate representation, the FST thus compiles the four possible forms of this suffix.
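The behaviour of D6 on intermediate strings can be mimicked in Python. The sketch below is a simplified illustration, not the compiled rule. It assumes that the intermediate string contains only Mapuche alphabet letters and "@Ü" tags. Under that assumption, any non-vowel symbol before the tag is a consonant or semivowel, since the digraphs end in h, l, g or r. The intermediate strings are hypothetical simplifications, e.g. "miawa@Ün" ignores the "@Y" tag of D8.

```python
import re

VOWELS = "aeiouü"  # the VOW group of section 4.5


def realise_schwa_tag(form):
    """Sketch of D6 applied to an intermediate string containing '@Ü':
    '@Ü' -> 'ü' after a consonant, a semivowel or at the word start;
    '@Ü' -> ''  after a vowel."""
    def replace(match):
        start = match.start()
        after_vowel = start > 0 and form[start - 1] in VOWELS
        return "" if after_vowel else "ü"

    return re.sub("@Ü", replace, form)


for intermediate in ["miaw@Ün", "miawa@Ün", "kim@Üle", "@Ün"]:
    print(intermediate, "->", realise_schwa_tag(intermediate))
```

The first two strings reproduce miawün and miawan from Example 102, and kim@Üle gives kimüle as in E113.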

The suffixes affected by these rules are the following. Some suffixes have multiple forms, as in the case presented in D5 and D6. The satisfaction suffix -ñmu-, for instance, may also be -ñmo-. For simplicity, only the most common form of each suffix is given here:

  • More implicated object (s29) -l- ["@Ü"l]

  • Satisfaction (s25) -ñmu- ["@Ü"{ñmu}]

  • Interruptive (s18) -r- ["@Ü"r]

  • Reportative (s12 & NCC) -rke- ["@Ü"{rke}]

  • Conditional (s4) -l- ["@Ü"l]

  • Plain verbal noun (s4) -n- ["@Ü"n]

  • Indicative 1st singular (s3) -n- ["@Ü"n]

  • Adjectiviser quick & easy (NOM) -nten- ["@Ü"{nten}]

The first two of the following suffixes may trigger either -ü- or -u-, while the last two trigger only -u-:

  • Pluperfect (s15) -wye- [["@Ü"|"@U"]{wye}]

  • Completive subjective verbal noun (s4) -wma- [["@Ü"|"@U"]{wma}]

  • Reflexive/reciprocal (s31) -w- ["@U"w]

  • 1st person agent (s23) -w- ["@U"w]

Tags are created and assigned arbitrarily. For the rules to work it is necessary to place the corresponding tag in the appropriate position. As the rule is applied to the surface level, the tag is placed at that same level, which is encoded to the right of the colon, e.g.:

Definition 7

["+PLPF"{.üwye15}] : ["@Ü"{wye}];

The encoding above incorporates the pluperfect marker into the system. To the left of the colon is the upper or abstract level, the analysis representation. To the right is the lower level, the surface representation, where the tag is added as the initial character preceding the suffix. Once the tag has been replaced as appropriate, another rule eliminates any unused tags.

Epenthesis of a glottal stop is optional between the ending vowel of a root and the initial vowel of a following root in compounds.

Example 116

[Smeets, I. 2008: 52] RefB:21
dewma-iyal-la-y → dewma-giyal-la-y ’he did not prepare food’
‑TV.dewma_hacer-N.iyal_comida‑CR.TV+NEG.la10+IND.y4+3.Ø3[33]

[33] iyal is a lexicalized form for "food" that may be analysed as -TV.i_comer+NRLD.a9+OVN.el4 ’what will be eaten’.

To apply the rule described above, we have marked on the lower level all the roots beginning with a vowel by placing "@G" before the root; for example:

Example 117

["-AJ"{.allush_tibio}]: ["@G"{allush}]

["-AJ"{.awka_rebelde}]: ["@G"{awka}]

["-AV"{.aymüñ_bastante}]: ["@G"{aymüñ}]

["-AV"{.ina_cerca}]: ["@G"{ina}]

["-IV"{.echiw_estornudar}]: ["@G"{echiw}]

["-IV"{.uma_dormir}]: ["@G"{uma}]

["-NN"{.antü_sol}]: ["@G"{antü}]

["-NN"{.epew_cuento}]: ["@G"{epew}]

["-TV"{.ingka_defender}]: ["@G"{ingka}]

["-TV"{.üngüm_esperar}]: ["@G"{üngüm}]

The rules encoding this change are the following:

Rule 1

Glottal stop insertion (between roots)
define RuGlot ["@G" (->) g || VOW _ ]
.o. ["@G" -> 0];

The first rule, to the left of the composition operator (.o.), encodes optionality by enclosing the replacement operator in parentheses, (->). When a vowel precedes the tag, it may be realised as g. In every other case, including the variant in which the optional replacement is not applied, the second rule maps the tag to 0 (zero, the null character), i.e., deletes it.
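Because R1 is optional, it relates one intermediate string to several surface strings. The Python sketch below enumerates that set for strings containing "@G". It gives each tag after a vowel two choices, "g" or deletion, and gives every other tag only deletion. It is an illustration of optional replacement under these assumptions, not FOMA's algorithm. The intermediate form "dewma@Giyallay" is a hypothetical flattening of E116.

```python
from itertools import product

VOWELS = set("aeiouü")
GLOTTAL_TAG = "@G"


def apply_glottal_rule(form):
    """Sketch of R1: every surface string the optional rule allows."""
    pieces = form.split(GLOTTAL_TAG)
    choices = []
    for before in pieces[:-1]:
        if before and before[-1] in VOWELS:
            choices.append(("g", ""))  # (->): realise g, or delete the tag
        else:
            choices.append(("",))      # ["@G" -> 0]: tag deleted
    results = set()
    for combo in product(*choices):
        out = pieces[0]
        for realised, piece in zip(combo, pieces[1:]):
            out += realised + piece
        results.add(out)
    return results


print(sorted(apply_glottal_rule("dewma@Giyallay")))
print(apply_glottal_rule("@Guma"))
```

Word-initially (as in "@Guma") there is no preceding vowel, so only the form without g is produced.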

A glottal stop is also optionally added between suffixes, but only in the cases where the sequence ii is generated. Instead of adding a tag to the suffixes, this change is treated with a more general rule:

Rule 2

Glottal stop insertion (between suffixes)
define RuleEliGem [{ii} (->) {igi}];

(This is part of a wider rule that also treats elision, epenthesis and gemination of other phonemes; see R3.)

Applying this rule we can analyse words like the following one:

Example 118

[Smeets, I. 2008: 51] RefB:21
leli-l-i-iñ → leli-l-i-giñ ’if we look’
‑TV.leli_mirar+CND.l4+1.i3+PL.iñ2

Two identical adjacent consonants are pronounced as a geminate in careful speech and as a single sound in colloquial speech. This simplification is sometimes reflected in written text.

Example 119

[Smeets, I. 2008: 51] RefB:21
kon-nu-l-i → konu-l-i ’if I do not enter’
‑IV.kon_entrar+NEG.no10+CND.l4+1.i3+SG.Ø2

To treat gemination we use the same type of rule as in R2. The following rule encodes the gemination not only of n but also of m and e (in the case of e, the phonological term is not gemination but lengthening):

Rule 3

Gemination simplification
define RuleEliGem [{nn} (->) n,
{mm} (->) m, {ee} (->) e];

Example 120

[Smeets, I. 2008: 437 (17)] RefB:21
fillem-mew → fillemew ’in every respect’
-NN.fillem_toda-clase-de-cosas+INST.mew

Example 121

[Smeets, I. 2008: 278 (1)] RefB:21
nie-e-y-u → nie-y-u ’I hold you (sg)’
-TV.nie_tener+IDO.e6+IND.y4+1.Ø3+DL.u2
+DS12A.Ø1
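R2 and R3 are both parts of RuleEliGem and consist of optional rewrites, (->), of fixed substrings. The Python sketch below enumerates every output of such an optional rewrite, branching at each match between rewriting and keeping the original. It is a simplified illustration that ignores overlapping-match subtleties. Its running time is exponential in the number of matches, which is acceptable for single words. The intermediate strings are hypothetical flattenings of E118 to E121.

```python
def optional_rewrite(form, rules):
    """Enumerate every output of optional rewrites `src (->) dst`.

    At each position where a rule's source matches, the walk branches:
    one branch rewrites the match, the other keeps the character as is.
    """
    results = set()

    def walk(i, acc):
        if i == len(form):
            results.add(acc)
            return
        for src, dst in rules:
            if form.startswith(src, i):
                walk(i + len(src), acc + dst)  # apply the rewrite here
        walk(i + 1, acc + form[i])             # leave this character unchanged

    walk(0, "")
    return results


# R2 and R3 together (the wider RuleEliGem).
RULE_ELI_GEM = [("ii", "igi"), ("nn", "n"), ("mm", "m"), ("ee", "e")]

for word in ["leliliiñ", "konnuli", "fillemmew", "nieeyu"]:
    print(word, sorted(optional_rewrite(word, RULE_ELI_GEM)))
```

Each word yields its unchanged form plus the variant with glottal insertion (E118) or gemination simplification (E119 to E121).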

The non-realised suffix -a- (s9) is separated from a preceding a by an inserted y. In this case we have an intermediate representation of the suffix (D8) and a pair of rules that treat the corresponding tag "@Y" in context (R4):

Definition 8

["+NRLD"{.a9}]: ["@Y"a];

Rule 4

y epenthesis
define RuTrEPENTHy ["@Y" (->) y || [a|.#.] _]
.o. ["@Y" -> 0 || \a _ ];

(\a is term negation (\X): any single symbol except X, equivalent to [? - X] [Hulden, M. in https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki].)

This composed rule says that "@Y" becomes y when it is preceded by a or occurs at the beginning of the word. This change is optional, since the first rule uses (->). The tag becomes the null character when it is preceded by any character other than a.

Example 122

[Smeets, I. 2008: 63 (19)] RefB:21
tripa-a-n → tripa-ya-n ’I will leave’
‑IV.tripa_salir+NRLD.a9+IND1SG.n3

The pronouns engu ’they (dl)’ and engün ’they (pl)’ are optionally realised as yengu and yengün respectively, both in isolation and as part of a compound whose preceding element ends in a vowel. The encoding of D8 applies here as well.

Example 123

[Smeets, I. 2008: 95 (ii)] RefB:21
tüfa-yengu ’these two’
-DP.tüfa_este-PP.engu_ellos-dos
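The two composed steps of R4 can be traced with a small Python sketch. It treats "@Y" as a single multicharacter symbol, as FOMA does, and applies the optional and the obligatory sub-rules in sequence. The final clean-up that erases any "@Y" left after an a is our assumption: this section does not show how leftover tags are removed. The sketch is illustrative and is not the authors' implementation.

```python
import itertools
import re


def tokens(form):
    """Split an intermediate form into symbols; "@Y" counts as one symbol."""
    return re.findall(r"@Y|.", form)


def step1(toks):
    # foma: ["@Y" (->) y || [a|.#.] _ ]  (optional: rewrite or keep)
    choices = []
    for i, t in enumerate(toks):
        prev = toks[i - 1] if i > 0 else None   # None stands for the word boundary
        if t == "@Y" and prev in (None, "a"):
            choices.append(["y", "@Y"])
        else:
            choices.append([t])
    return [list(c) for c in itertools.product(*choices)]


def step2(toks):
    # foma: ["@Y" -> 0 || \a _ ]  (delete "@Y" after any symbol other than a)
    return [t for i, t in enumerate(toks)
            if not (t == "@Y" and i > 0 and toks[i - 1] != "a")]


def realise(form):
    """Compose the two steps; erasing leftover "@Y" tags is an assumption."""
    return {"".join(t for t in step2(alt) if t != "@Y")
            for alt in step1(tokens(form))}


if __name__ == "__main__":
    print(sorted(realise("tripa@Yan")))    # E122: tripa + NRLD "@Y"a + n
    print(sorted(realise("tüfa@Yengu")))   # E123: compound with engu / yengu
```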

The sequence ae is optionally simplified to a.

Example 124

[Smeets, I. 2008: 52] RefB:21
leli-la-e-y-u → leli-la-y-u ’I shall not look at you (sg)’
‑TV.leli_mirar+NEG.la10+IDO.e6+IND.y4+1.Ø3
+DL.u2+DS12A.Ø1

Example 125

[Smeets, I. 2008: 52] RefB:21
i-me-a-el → i-me-a-l ’eat there!’ (lit.: ’you will eat there’)
‑TV.i_comer+TH.me20+NRLD.a9+OVN.el4

The sequence ae is never simplified when the a is followed by -e-n (+IDO.e6+IND1SG.n3+DS12A.Ø1) or by -e-n-ew (+IDO.e6+IND1SG.n3+DS3A.ew1).

Example 126

[Smeets, I. 2008: 48] RefB:21
elu-a-e-n ’you (sg) will give to me’
‑TV.elu_dar+NRLD.a9+IDO.e6+IND1SG.n3+DS12A.Ø1

Example 127

[Smeets, I. 2008: 485 (4)] RefB:21
ayü-la-e-n-ew ’she did not love me’
‑TV.ayü_amar+NEG.la10+IDO.e6+IND1SG.n3
+DS3A.ew1

5.1.1 Interaction between suffixes of slots 10 to 4

The most common suffixes in a verb form are those located between slots 10 and 4. Several morphophonological changes take place depending on which of these suffixes occur; the sequence a-e mentioned above is only one of them.

The suffixes of this series of slots that trigger morphophonological changes requiring special rules are encoded as follows:

Definition 9

Slot 10: Negation
["+NEG"{.la10}] : {la}
| ["+NEG"{.ki10}] : [k"@NK"];
(Footnote 37: Some slots of this series encode more suffixes than the ones displayed here, which are the ones relevant to the rules explained in this section.)

Definition 10

Slot 9: Non-realised situation
["+NRLD"{.a9}]: ["@Y"a];

Definition 11

Slot 8: Impeditive
["+IPD"{.fu8}] : [f"@IP"];

Definition 12

Slot 6: Internal and external direct objects
["+EDO"{.fi6}] : "@ED"
| ["+IDO"{.e6}] : "@ID";

Definition 13

Slot 4: inflectional nominalisers
["+OVN"{.el4}] : "@EL";

Definition 14

Slot 3PTMT: Portmanteau morphs
["+IND1SG"{.n3}] : ["@Ü"[n|ñ]];

The following set of rules deals with the suffixes shown above:

Rule 5

Negation for imperative forms (ki → k)
define RuNegKi ["@NK" (->) [i|e] || _ [e|"@ID"]]
.o. ["@NK" -> i];

From slot 10, two negation suffixes can play a role in this set of interconnected rules. One is the indicative negation -la-, whose final a has to be taken into account when it interacts with the +IDO suffix -e- (slot 6). The other is the imperative negation -ki-, which may drop its final i, or replace it with e, when followed by e, which is also the form of the +IDO suffix (encoded as "@ID" in the intermediate representation). This is what R5 above handles. "When the negative marker -ki- (slot 10) is followed by e, the sequence ie is optionally replaced by ee or contracted to e" [Smeets, I. 2008: 52 (8.1.4.3)] RefB:21 .

Example 128

[Smeets, I. 2008: 52 (8.1.4.3)] RefB:21
sungu-we-ki-e-l-i / sungu-we-ke-e-l-i / sungu-we-k-e-l-i
’don’t speak to me any more’
-NN.düngu_palabra+VRB.Ø36+PS.we19
+NEG.ki10+IDO.e6+CNI.l4+1.i3+SG.Ø2+DS12A.Ø1

Another suffix that interacts with the +IDO suffix under the same conditions as the negation suffix -la- is the +NRLD suffix -a- (slot 9), which may also be realised as -ya- (see D8 and R4). We return to -la- (+NEG) and -a- (+NRLD) in R10 below.

The suffix sequence +IPD -fu- followed by +EDO -fi- is realised as -fufi- or -fwi- in Smeets’ texts, and also as -fui- in some other texts.

Example 129

[Smeets, I. 2008: 39 (c)] RefB:21
angkad-fu-fi-n → angkad-fwi-n ’the one I had taken on the back of my horse’
‑TV.angkad_llevar-en-ancas+IPD.fu8+EDO.fi6
+IND1SG.n3

Example 130

[Mösbach, E. 1936: 16] RefB:14
kim-la-fu-fi → kim-la-fui ’I didn’t know (about) that’
‑TV.kim_saber+NEG.la10+IPD.fu8+EDO.fi6
+IND1SG.n3

Rule 6

Impeditive + EDO
define RuTrIPDEDO

[["@IP" -> [u"@1"|w"@2"] || _ "@ED"] .o.

["@ED" -> [{fi}|i] || "@1" _ ] .o.

["@ED" -> i || "@2" _ ]];

R6 encodes the changes exemplified in E129 and E130. RuTrIPDEDO is the composition of three rules. The first sub-rule states that "@IP" is transformed into either u"@1" or w"@2" when followed by "@ED", as expressed in the realisation context || _ "@ED" (the underscore marks the position of the element being rewritten). Since composed rules apply sequentially, once "@IP" has been transformed, this very tag, which is the context for the subsequent change of "@ED", is lost. This is why the transformation of "@IP" introduces new context marks ("@1" and "@2") for the subsequent change of "@ED". These marks are used only as contexts for further processing and are cleared out in a later step.

The second sub-rule of RuTrIPDEDO (composed with .o.) states that "@ED" is transformed into either fi or i when preceded by "@1".

The third sub-rule states that "@ED" is replaced only by i when preceded by "@2". Both "@1" and "@2" are context marks introduced by the first sub-rule, which has already applied when the second and third sub-rules are evaluated.

Whenever +IPD (impeditive) is followed by +EDO (external direct object), the context is met, so the rule accepts three combinations for analysis: -fufi-, -fui- and -fwi-.

The process just described is illustrated in Figure 6 below. Note that the figure shows the generation direction, because it is easier to explain and understand. Since FSTs are reversible, the rules can also be applied backwards, i.e., in the analysis direction: any of the three possible spellings -fufi-, -fwi- and -fui- yields the analysis "+IPD.fu8+EDO.fi6".

Figure 6: Composed rules in the generation direction: simplified view of rule RuTrIPDEDO, which processes the interaction between the impeditive suffix -fu- and the external direct object suffix -fi-.
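A small Python sketch can make the role of the context marks concrete. It is not the FOMA implementation: it mimics each sub-rule of RuTrIPDEDO as an obligatory rewrite on a list of symbols, in the generation direction of Figure 6, and the input symbol list for E129 is our own simplified intermediate representation (the "@Ü" tag of +IND1SG is left out).

```python
def rewrite(alternatives, target, outputs, left=None, right=None):
    """Obligatory rule  target -> outputs || left _ right  (None = no condition).
    Contexts are read on each input sequence; several outputs multiply results."""
    results = []
    for toks in alternatives:
        partial = [[]]
        for i, t in enumerate(toks):
            prev = toks[i - 1] if i > 0 else None
            nxt = toks[i + 1] if i + 1 < len(toks) else None
            if (t == target and (left is None or prev == left)
                    and (right is None or nxt == right)):
                partial = [p + out for p in partial for out in outputs]
            else:
                partial = [p + [t] for p in partial]
        results.extend(partial)
    return results


def ru_tr_ipd_edo(toks):
    # Sub-rule 1: "@IP" -> [u "@1" | w "@2"] || _ "@ED"
    alts = rewrite([toks], "@IP", [["u", "@1"], ["w", "@2"]], right="@ED")
    # Sub-rule 2: "@ED" -> [{fi} | i] || "@1" _
    alts = rewrite(alts, "@ED", [["f", "i"], ["i"]], left="@1")
    # Sub-rule 3: "@ED" -> i || "@2" _
    alts = rewrite(alts, "@ED", [["i"]], left="@2")
    # Later step: the context marks "@1" and "@2" are cleared out
    return {"".join(t for t in a if t not in ("@1", "@2")) for a in alts}


if __name__ == "__main__":
    # angkad + f"@IP" (+IPD) + "@ED" (+EDO) + n (+IND1SG)
    print(sorted(ru_tr_ipd_edo(list("angkadf") + ["@IP", "@ED", "n"])))
```

Running the three sub-rules in order yields exactly the three spellings fufi, fui and fwi; without the marks, the second and third sub-rules would have no context left to check.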

When +IPD -fu- is followed by +IDO -e-, the two suffixes yield the form -fue- or, optionally, the contracted form -fe-.

Example 131

[Smeets, I. 2008: 52 (8.1.6)] RefB:21
ellka-l-ke-rke-fu-e-y-ew → ellka-l-ke-rke-f-e-y-ew ’she used to hide it, they say’
‑TV.ellka_ocultar+CA.l34+CF.ke14+REP.rke12
+IPD.fu8+IDO.e6+IND.y4+3.Ø3+DS3A.ew1

+IPD and +OVN may co-occur, yielding -fu-el or -f-el. Smeets gives no examples of this sequence, but she writes "The suffix -fu- may occur in indicative and conditional forms and in subordinates except those marked with the plain verbal noun suffix -n +PVN (s4) or the completive subjective verbal noun suffix -wma +CSVN (s4)" [Smeets, I. 2008: 231] RefB:21 . Hence, the sequence +IPD +OVN is possible. Moreover, we have found examples where -el contracts with a preceding form ending in e, as well as examples of -fu-el from other authors:

Example 132

[Smeets, I. 2008: 245 (10)] RefB:21
nie-el → niel ’to have had’
-TV.nie_tener+OVN.el4

Example 133

[Smeets, I. 2008: 249 (7)] RefB:21
küdaw-pe-el → küdaw-pel ’the own job’
-IV.küdaw_trabajar+PX.pe13+OVN.el4

Example 134

[Smeets, I. 2008: 411 (53)] RefB:21
fende-ke-el-chi → fende-kel-chi ’a sold thing’
-TV.fende_vender+CF.ke14+OVN.el4+ADJ.chi

Example 135

[Zúñiga, F. 2006: 144 (54)] RefB:21
nge-we-ke-no-fu-el → nge-we-ke-no-fel ’to be no more’
-IV.nge_ser_estar+PS.we19+CF.ke14+NEG.no10
+IPD.fu8+OVN.el4

Rule 7

Impeditive + Internal Direct Object or
Objective Verbal Noun

define RuTrIPDIDOOVN
[["@IP" -> [[u"@3"]|"@3"] || _ ["@ID"|"@EL"]]
.o. ["@ID" -> e, "@EL" -> {el} || "@3" _ ]];

R7 encodes the changes exemplified in E131 and E135. RuTrIPDIDOOVN is the composition of two rules. The first rule states that "@IP" is transformed into either u"@3" or "@3" (the latter being the elision of u) when followed by "@ID" or "@EL". After this conversion, if the sequence is completed by "@ID", this tag is transformed into e, yielding two possible intermediate representations, fu@3e and f@3e. If the sequence is completed by "@EL", this tag is transformed into el, also yielding two possible intermediate representations, fu@3el and f@3el. Then "@3" is wiped out, giving fue or fe for +IPD+IDO, and fuel or fel for +IPD+OVN.

R8 converts "@IP" into u in any other context, yielding fu, as shown in E136:

Rule 8

Impeditive
define RuTrIPD ["@IP" -> u];

Example 136

[Smeets, I. 2008: 63 (17)] RefB:21
kutran-fu-n ’to have been ill’
-NN.kütran_enfermedad+VRB.Ø36+IPD.fu8
+IND1SG.n3

R9 converts "@ED" into fi when it is preceded by anything other than "@IP", as shown in E137:

Rule 9

External direct object
define RuTrEDO ["@ED" -> {fi} || \"@IP" _ ];

Example 137

[Smeets, I. 2008: 65 (31)] RefB:21
allkü-tu-nie-fi-n ’I am listening to him’
-TV.allkü_oir+TR.tu33+PRPS.nie32+EDO.fi6
+IND1SG.n3

Rule 10

Internal direct object
define RuTrIDO [["@ID" -> e || [CON|SVW] _ ]
.o. ["@ID" -> e || a _ "@Ü"n[{ew}|.#.]]
.o. ["@ID" (->) e || [VOW|"@NK"] _ ]];

R10 covers the remaining contexts of realisation for +IDO. First, it becomes e when preceded by any consonant or semivowel.

Example 138

[Smeets, I. 2008: 87 (21)] RefB:21
kim-e-y-u ’I recognised you’ lit: ’I knew you’
-TV.kim_saber+IDO.e6+IND.y4+1.Ø3+DL.u2
+DS12A.Ø1

+IDO also becomes e when it is preceded by a and followed by the intermediate form "@Ü"n, which corresponds to the portmanteau suffix for indicative first person singular. This suffix, in turn, either ends the verb form (since it implies the presence of the null suffix +DS12A after it) or is followed by -ew (+DS3A).

Example 139

[Smeets, I. 2008: 157 (17)] RefB:21
pe-e-n ’you saw me’
-TV.pe_ver+IDO.e6+IND1SG.n3+DS12A.Ø1

Example 140

[Smeets, I. 2008: 94 (63)] RefB:21
pe-me-e-n-ew ’there he saw me’
-TV.pe_ver+TH.me20+IDO.e6+IND1SG.n3+DS3A.ew1

In its last context of realisation, "@ID" is optionally transformed into e, which means that it may be elided, when preceded by a vowel or by the imperative negation tag "@NK" (see R5 and examples E128 and E141).

Example 141

[Smeets, I. 2008: 94 (63)] RefB:21
ina-ni-a-Ø-lu-mu ’they have been followed’
-AV.ina_detrás+VRB.Ø36+PRPS.nie32
+NRLD.a9+IDO.e6+SVN.lu4+DS3A.mew1

The last rule of the set treating suffixes between slots 10 and 4 deals with the objective verbal noun suffix +OVN -el.

Rule 11

Objective Verbal Noun
define RuOVN
["@EL" -> [l|{el}] || [a|e|"@ID"] _ ]
.o. ["@EL" -> {el}];

R11 specifies that "@EL", the encoding of +OVN, is converted into l or el when preceded by a, e or "@ID" (e.r. 11.2.6), and into el in any other context, as the examples below show:

Example 142

[Smeets, I. 2008: 114 (26)] RefB:21
lang-üm-el-chi ufisha ’killed sheep’
-IV.la_morir+CA.m34+OVN.el4+ADJ.chi
-NN.ufisha_oveja

Example 143

[Smeets, I. 2008: 189 (45)] RefB:21
pi-el-mew ’of what is said’
-TV.pi_decir+OVN.el4+INST.mew

Example 144

[Smeets, I. 2008: 189 (46)] RefB:21
entu-el ’what is taken (out)’
-TV.entu_sacar+OVN.el4
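Because the first sub-rule of R11 has two outputs and the second only applies where the first did not, each "@EL" has one or two realisations depending on the preceding symbol. A minimal Python sketch (an illustration, not the FOMA script) makes this explicit for the forms in E132, E133, E143 and E144:

```python
import itertools

AFTER_SHORT = {"a", "e", "@ID"}  # left context of R11, sub-rule 1


def objective_verbal_noun(toks):
    # Sub-rule 1: "@EL" -> [l | {el}] || [a|e|"@ID"] _   (two outputs)
    # Sub-rule 2: "@EL" -> {el}                           (any other context)
    choices = []
    for i, t in enumerate(toks):
        prev = toks[i - 1] if i > 0 else None
        if t == "@EL":
            choices.append(["l", "el"] if prev in AFTER_SHORT else ["el"])
        else:
            choices.append([t])
    return {"".join(c) for c in itertools.product(*choices)}


if __name__ == "__main__":
    print(sorted(objective_verbal_noun(list("nie") + ["@EL"])))           # E132
    print(sorted(objective_verbal_noun(list("pi") + ["@EL"] + list("mew"))))  # E143
```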

5.1.2 Special case: suffix -nge-

Whether it corresponds to the verbaliser (see 3.1.1) or stem formative (see 3.1.2) of slot 36, or to the passive suffix of slot 23, the form -nge- may alternate with -ngi- when it is followed by the indicative suffix -y- and the verb form is third person unspecified for number. This alternation is encoded by the following rule:

Rule 12

Alternative nge form
define RuNGE ["@EY" (->) i || _ "@Ü"y]
.o. ["@EY" -> e];

For this rule to work, all the suffixes mentioned above were encoded as follows:

Definition 15

["+VRB"{.nge36}"-IV"] : [{ng}"@EY"];
["+SFR"{.nge36}"-IV"] : [{ng}"@EY"];
["+PASS"{.nge23}] : [{ng}"@EY"];

An example of each case is given below:

Example 145

[Smeets, I. 2008: 456 (3)] RefB:21
wentru-ngi-y ’they were men’
-NN.wentru_hombre+VRB.nge36-IV+IND.y4+3.Ø3

Example 146

[Smeets, I. 2008: 305 (2)] RefB:21
weyel-weyel-ngi-y ’he always swims’
-IV.weyel_nadar-RVBR+SFR.nge36-IV+IND.y4+3.Ø3

Example 147

[Smeets, I. 2008: 445 (3)] RefB:21
elu-ngi-y mapu ’he was given land’
-TV.elu_dar+PASS.nge23+IND.y4+3.Ø3
-NN.mapu_tierra
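The effect of R12 can be sketched in Python under simplifying assumptions: the indicative -y- is represented as the symbol pair "@Ü" y, which is what the right context of the rule expects, and the "@Ü" tag is simply dropped at the end instead of being handled by the epenthesis rules, which are not shown in this section. The sketch is illustrative only.

```python
import itertools


def nge_alternation(toks):
    # Sub-rule 1: "@EY" (->) i || _ "@Ü" y   (optional before indicative -y-)
    # Sub-rule 2: "@EY" -> e                  (everything that is left)
    choices = []
    for i, t in enumerate(toks):
        if t == "@EY":
            before_y = toks[i + 1:i + 3] == ["@Ü", "y"]
            choices.append(["i", "e"] if before_y else ["e"])
        else:
            choices.append([t])
    # Assumption: the epenthesis tag "@Ü" is simply dropped
    # (in these examples it always follows a vowel).
    return {"".join(c for c in combo if c != "@Ü")
            for combo in itertools.product(*choices)}


if __name__ == "__main__":
    print(sorted(nge_alternation(list("wentrung") + ["@EY", "@Ü", "y"])))  # E145
```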

5.1.3 Special case: verb i- ’to eat’

The verb i- may be realised as i-, iy- or yi- depending on the context. To apply the rules (R13) that regulate this verb form in the different contexts, the verb i- has been encoded as follows:

Definition 16

["-TV".i_comer]: "@i";

The first sub-rule of R13 deletes "@i" at the end of a word, which prevents i- from being recognised and analysed as the final i of any word.

Rule 13

Forms of verb i- ’to eat’
define Verbi

[["@i" -> 0 || _ .#.] .o.

["@i" -> "@G"{iy} || _ [a|e|"@Y"|"@ID"|
"@EL"]] .o.

["@i" -> {yi} || _ k [i|ü|"@NK"], "@i" _ ,
_ "@i"] .o.

["@i" (->) {yi} || _ w] .o.

["@i" -> "@G"i]];

The second sub-rule converts "@i" into the intermediate form "@G"{iy}, which prepares the verb to be part of a compound where "@G" optionally becomes g if preceded by a vowel (see E116, E117, R1 and R2). This change applies when "@i" is followed by a, e or one of the intermediate forms "@Y", "@ID" or "@EL".

Example 148

dewma-iy-a-l-mew → dewma-giy-a-l-mew ’while preparing food’
-TV.dewma_hacer-TV.i_comer-CR.TV+NRLD.a9
+OVN.el4+INST.mew

Example 149

[Smeets, I. 2008: 204 (125)] RefB:21
i-el → iy-el ’what had been eaten’
-TV.i_comer+OVN.el4

The third sub-rule of R13 converts "@i" into yi in the following contexts:

1) when "@i" is followed by ki, kü or k plus the intermediate form "@NK" (see R5 and E128):

Example 150

[Smeets, I. 2008: 445 (3)] RefB:21
i-ki-fi-l-nge → yi-ki-fi-l-nge ’you need not eat it’
-TV.i_comer+NEG.ki10+EDO.fi6+CNI.l4
+IMP2SG.nge3

2) when "@i" is preceded or followed by another "@i"; in this way the rule handles reduplication of the verb root i- ’to eat’:

Example 151

[Smeets, I. 2008: 307 (8)] RefB:21
i-i-künu-fi-ñ → yi-yi-künu-fi-ñ ’I ate it quickly’
-TV.i_comer-RVBR+SFR.Ø36-IV+PFPS.künu32
+EDO.fi6+IND1SG.n3

In the fourth sub-rule, "@i" is optionally converted into yi when followed by w:

Example 152

[Smeets, I. 2008: 263 (11)] RefB:21
i-we-me-ke-la-y ’he no longer goes there to eat (as he used to)’
-TV.i_comer+PS.we19+TH.me20+CF.ke14+NEG.la10
+IND.y4+3.Ø3

Example 153

[Smeets, I. 2008: 260 (4)] RefB:21
i-we-la-n → yi-we-la-n ’I eat no more’
-TV.i_comer+PS.we19+NEG.la10+IND1SG.n3

Finally, the fifth sub-rule converts "@i" into the intermediate representation "@G"i in any other context. Thus, when i- is the second member of a compound, a g may appear before it if the preceding member ends in a vowel (see E148).

Example 154

[Smeets, I. 2008: 309 (4)] RefB:21
i-püra-fi-ñ ’I ate it reluctantly’
-TV.i_comer+AIML.püda+EDO.fi6+IND1SG.n3

Example 155

[Smeets, I. 2008: 43 (f)] RefB:21
i-fal-ün ’I must eat’
-TV.i_comer+FORCE.fal25+IND1SG.n3
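Since the sub-rules of R13 are composed in order and each of them consumes "@i", the first sub-rule whose context matches decides the outcome, except that the optional fourth sub-rule may also fall through to the fifth. The Python sketch below models this ordering. It is a simplification rather than the FOMA code: contexts are read on the original symbol list, and the "@G" tag is resolved by a minimal stand-in for R1 and R2, which are not part of this section.

```python
import itertools

VOWELS = set("aeiouü")


def realise_g(toks):
    """Tag "@G" may surface as g after a vowel and is dropped elsewhere (simplified R1/R2)."""
    options = []
    for i, t in enumerate(toks):
        if t == "@G":
            prev = toks[i - 1] if i > 0 else None
            options.append(["g", ""] if prev in VOWELS else [""])
        else:
            options.append([t])
    return {"".join(o) for o in itertools.product(*options)}


def verb_i(toks):
    """Apply the five sub-rules of Verbi (R13); the first matching one rewrites "@i"."""
    choices = []
    for i, t in enumerate(toks):
        if t != "@i":
            choices.append([[t]])
            continue
        prev = toks[i - 1] if i > 0 else None
        nxt = toks[i + 1] if i + 1 < len(toks) else None
        nxt2 = toks[i + 2] if i + 2 < len(toks) else None
        if nxt is None:                                           # 1: _ .#.
            choices.append([[]])
        elif nxt in {"a", "e", "@Y", "@ID", "@EL"}:               # 2: "@G"{iy}
            choices.append([["@G", "i", "y"]])
        elif (nxt == "k" and nxt2 in {"i", "ü", "@NK"}) or "@i" in (prev, nxt):
            choices.append([["y", "i"]])                          # 3: {yi}
        elif nxt == "w":                                          # 4: optional {yi}
            choices.append([["y", "i"], ["@G", "i"]])
        else:                                                     # 5: "@G"i
            choices.append([["@G", "i"]])
    results = set()
    for combo in itertools.product(*choices):
        results |= realise_g([s for part in combo for s in part])
    return results


if __name__ == "__main__":
    print(sorted(verb_i(["@i", "@i"] + list("künufiñ"))))       # E151
    print(sorted(verb_i(list("dewma") + ["@i"] + list("almew"))))  # E148
```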

5.1.4 Special case: verb entu- ’to take out’

The verb entu- may be realised as ntu-, entu- or nentu- depending on the context. To apply the rules (R14) that regulate this verb form in the different contexts, the verb entu- has been encoded as follows:

Definition 17

["-TV".entu_sacar_quitar]: ["@VE"{ntu}];

Rule 14

Forms of verb entu- ’to take out’
define Verbentu

["@VE" -> [["@G"e]|{ne}] || [.#.|a|i] _ ].o.

["@VE" -> [{ne}|0] || ü _ ] .o.

["@VE" -> e || [d|f|m] _ ] .o.

["@VE" -> {ne} || [e|u] _ ] .o.

["@VE" -> e ];

Five different contexts shape the form of this verb. The first sub-rule of R14 indicates that "@VE" may be converted into the intermediate form "@G"e or into ne at the beginning of a word or after a or i; in the latter two cases, the tag "@G" is optionally converted into g (see R1 and E116):

Example 156

[Smeets, I. 2008: 448 (32)] RefB:21
entu-fi-y-iñ ’we took him out’
-TV.entu_sacar+EDO.fi6+IND.y4+1.Ø3+PL.iñ2

Example 157

[Smeets, I. 2008: 318 (8)] RefB:21
nentu-antü-y ’they fixed a date’
-TV.entu_sacar-NN.antü_día+IND.y4+3.Ø3

Example 158

[Smeets, I. 2008: 407 (24)] RefB:21
tayma-entu-nge-pa-y ’they were taken out there’
-TV.tayma_eliminar-TV.entu_sacar-CR.TV
+PASS.nge23+HH.pa17+IND.y4+3.Ø3

Example 159

[Smeets, I. 2008: 315] RefB:21
witra-nentu-n ’I pulled out’
-TV.witra_levantar-TV.entu_sacar-CR.TV
+IND1SG.n3

Example 160

[Smeets, I. 2008: 409 (40)] RefB:21
dulli-entu-a-y-iñ ’we will choose him’
-TV.dulli_elegir-TV.entu_sacar-CR.TV
+NRLD.a9+IND.y4+1.Ø3+PL.iñ2

Example 161

[Smeets, I. 2008: 553 (rapi-)] RefB:21
rapi-nentu-y ’he threw up’
-IV.rapi_vomitar-TV.entu_sacar-CR.TV
+IND.y4+3.Ø3

The second sub-rule of R14 either converts "@VE" into ne after ü or deletes it, so the verb is realised as nentu- when the tag is converted and as ntu- when it is deleted:

Example 162

[Smeets, I. 2008: 318 (8)] RefB:21
wemü-ntu-nge-rume-ye-m ’they were suddenly expelled without realising (it)’
-TV.wemü_perseguir-TV.entu_sacar-CR.TV
+PASS.nge23+SUD.rume21+CF.ye5+IVN.m4

Example 163

[Smeets, I. 2008: 556 (rüfü-)] RefB:21
rüfü-nentu-me-ki-y ’he is busy serving out there’
-TV.rüfü_servir-comida-TV.entu_sacar-CR.TV
+TH.me20+CF.ke14+IND.y4+3.Ø3

The third sub-rule converts "@VE" into e after d, f or m, yielding the form entu-:

Example 164

[Smeets, I. 2008: 405 (7)] RefB:21
ad-entu-a-l ’how to settle’
-NN.ad_forma-TV.entu_sacar+NRLD.a9+OVN.el4

Example 165

[Smeets, I. 2008: 201 (102)] RefB:21
ütrüf-entu-fi-n ’I have thrown it away’
-TV.ütrüf_tirar-TV.entu_sacar-CR.TV
+EDO.fi6+IND1SG.n3

Example 166

[Smeets, I. 2008: 486 (16)] RefB:21
kim-entu-a-n ’I shall declare’
-TV.kim_saber-TV.entu_sacar-CR.TV
+NRLD.a9+IND1SG.n3

The fourth sub-rule converts "@VE" into ne when preceded by e or u, yielding the form nentu-:

Example 167

[Smeets, I. 2008: 88 (23)] RefB:21
weñe-nentu-nge-r-pu-y ’it would eventually be robbed’
-TV.weñe_robar-TV.entu_sacar-CR.TV
+PASS.nge23+ITR.r18+LOC.pu17+IND.y4+3.Ø3

Example 168

[Smeets, I. 2008: 556 (rüfü-)] RefB:21
utru-nentu-y ’she spilled it out’
-TV.utru_derramar-TV.entu_sacar-CR.TV
+IDO.e6+IND.y4+3.Ø3+DS12A.Ø1

The last sub-rule converts "@VE" into e in any other context not covered by the previous sub-rules of R14, yielding the form entu-.
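Unlike R13, the contexts of R14 depend only on the symbol to the left of "@VE", so the five sub-rules partition the possible contexts. The following Python sketch, illustrative and not the authors' code, generates candidate surface forms that include those of E156 to E167; the treatment of "@G" is again a simplified stand-in for R1, which is not shown in this section.

```python
import itertools

VOWELS = set("aeiouü")


def realise_g(toks):
    """Tag "@G" may surface as g after a vowel and is dropped elsewhere (simplified R1)."""
    options = []
    for i, t in enumerate(toks):
        if t == "@G":
            prev = toks[i - 1] if i > 0 else None
            options.append(["g", ""] if prev in VOWELS else [""])
        else:
            options.append([t])
    return {"".join(o) for o in itertools.product(*options)}


def verb_entu(toks):
    """Apply the five sub-rules of Verbentu (R14) to the tag "@VE" of "@VE"{ntu}."""
    choices = []
    for i, t in enumerate(toks):
        if t != "@VE":
            choices.append([[t]])
            continue
        prev = toks[i - 1] if i > 0 else None
        if prev in (None, "a", "i"):          # 1: "@G"e or ne
            choices.append([["@G", "e"], ["n", "e"]])
        elif prev == "ü":                     # 2: ne or deletion
            choices.append([["n", "e"], []])
        elif prev in ("d", "f", "m"):         # 3: e
            choices.append([["e"]])
        elif prev in ("e", "u"):              # 4: ne
            choices.append([["n", "e"]])
        else:                                 # 5: e in any other context
            choices.append([["e"]])
    results = set()
    for combo in itertools.product(*choices):
        results |= realise_g([s for part in combo for s in part])
    return results


if __name__ == "__main__":
    print(sorted(verb_entu(list("wemü") + ["@VE"] + list("ntu"))))  # E162
```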

5.1.5 Special case: radical consonant alternation before causative -üm-

Mapuche verb roots with an intransitive meaning may be transitivized by adding the causative suffixes -el-, -ül- or -üm- (slot 34), or the factitive -ka- or transitivizer -tu- suffixes (slot 33). Few roots undergo a change in this process; in fact, Smeets describes it as an "unproductive relic phenomena" [Smeets 2008: 53] RefB:21 . She gives the following exhaustive list:

  • af- ’to come to an end’ → ap-üm- ’to finish’

  • lef- ’to run’ → lep-üm- ’to make run (animals)’

  • traf- ’to fit in/on’ → trap-üm- ’to cause to fit in/on’

  • lleg- ’to come up (plants)’ → llek-üm- ’to plant’ (tr.),
    but lleg-üm- ’to make come up’

  • nag- ’to go down’ → nak-üm- ’to carry down’,
    but nag-üm- ’to take down’

  • la- ’to die’ → lang-üm- ’to kill’

We have also found some other cases:

  • trof- ’to explode, crack’ (itr.) → trop-üm- ’to crack’ (tr.)

  • nel- ’to get loose’ → nel(k)-üm- ’to let loose, to set free’

  • lüf- ’to burn’ (itr.) → p-üm- ’to burn’ (tr.), ’to set fire’

Since the goal of the FST is analysis, the system was set up to provide the fullest analysis possible. Instead of entering both forms (intransitive and transitive) in the lexicon, only the intransitive form was entered, and a rule was created to handle the radical change.

Definition 18

Encoding of forms with radical change:

["-IV".af_acabar]: ["@G"a"@FP"];

["-IV".la_morir]: [la"@NG"];

["-IV".lef_correr]: [le"@FP"];

["-IV".lleg_crecer]: [lle"@GK"];

["-IV".lüf_quemar]: [lü"@FP"];

["-IV".nag_bajar]: [na"@GK"];

["-IV".nel_soltar]: [nel"@GK"];

["-IV".traf_encajar]: [tra"@FP"];

["-IV".trof_romper]: [tro"@FP"];

Each form in the list carries a tag on its right side. "@FP" marks roots that end in f in the intransitive sense and in p in the transitive sense. "@NG" marks roots that do not change in the intransitive sense and add ng in the transitive sense. "@GK" marks roots that end in g in the intransitive sense and in either g or k in the transitive sense; the forms marked with "but" in the list above belong to this group.

Definition 19

Causative -üm- encoding:
["+CA".m34]: ["@ÜC"m];

Rule 15

Radical consonant alternation before -üm-
define RuCAlt01

[["@NG" -> {ng}"@4", "@FP" -> p"@5",
"@GK" -> [[k|g]"@6"] || _ "@ÜC"] .o.

["@ÜC" -> ü || ["@4"|"@5"|"@6"|CON|SVW] _ ]
.o.

["@4"|"@5"|"@6" -> 0]];

define RuCAlt02

[["@NG" -> 0, "@FP" -> f, "@GK" -> [g|0]]
.o.

["@ÜC" -> 0 || ["@NG"|VOW] _ ] .o.

["@ÜC" -> ü || ["@FP"|"@GK"|CON|SVW] _ ] .o.

["@NG"|"@FP"|"@GK"|"@ÜC" -> 0]];

The above set of rules follows the same logic as RuTrIPDEDO (R6). When any of the tags "@NG", "@FP" or "@GK" comes into contact with "@ÜC", the transitive option is activated, introducing a new context tag ("@4", "@5" or "@6") that allows the subsequent change of "@ÜC" into ü. After these two steps, the context tags are wiped out.

Rule RuCAlt02 handles the intransitive case: it either deletes the tag or transforms it into the intransitive form. The following analyses show that the processes described above are carried out successfully:

Example 169

[Smeets, I. 2008: 192 (52)] RefB:21
af-a-y ’it will stop’
‑IV.af_acabar+NRLD.a9+IND.y4+3.Ø3

Example 170

[Smeets, I. 2008: 313 (15)] RefB:21
ap-üm-fal-iy ’it can be finished’
‑IV.af_acabar+CA.üm34+ADJDO.fal+VRB.Ø36
+IND.y4+3.Ø3
(Footnote 38: Smeets labels -fal as a nominaliser, placing it among the derivative nominalisers, a broad term for non-verbal suffixes (see chap. 28.1 of ’A Grammar of Mapuche’); we have tagged it as an adjectivizer because -fal indicates that the action denoted by the verb is applicable to the subject of the phrase (e.g., edible) [Smeets 2008: 312] RefB:21 .)

Example 171

[Smeets, I. 2008: 34] RefB:21
lef-iy ’he ran’
‑IV.lef_correr+IND.y4+3.Ø3

Example 172

[Smeets, I. 2008: 265 (6)] RefB:21
lep-üm-kantu-nge-y ’they made it run’ (they made a mare run for exercise)
‑IV.lef_correr+CA.üm34+PLAY.kantu22
+PASS.nge23+IND.y4+3.Ø3

Example 173

[Smeets, I. 2008: 304 (23)] RefB:21
traf-me-n ’I went to meet’ (somebody)
‑IV.traf_encajar+TH.me20+IND1SG.n3

Example 174

[Smeets, I. 2008: 560 (traf-)] RefB:21
trap-üm-a-fi-n ’I will gather’ (it)
‑IV.traf_encajar+CA.üm34+NRLD.a9
+EDO.fi6+IND1SG.n3

Example 175

[Smeets, I. 2008: 206 (137)] RefB:21
lleg-mu-m ’where it had grown up’
‑IV.lleg_crecer+PLPF.mu7+IVN.m4

Example 176

[Smeets, I. 2008: 528 (lleg-)] RefB:21
llek-üm-fi-ñ ’I grew it’
‑IV.lleg_crecer+CA.üm34+EDO.fi6+IND1SG.n3

Example 177

[Zúñiga, F. 2006: 306 (parir)] RefB:24
lleg-üm-ün ’I grew’
‑IV.lleg_crecer+CA.üm34+IND1SG.n3

Example 178

[Smeets, I. 2008: 49] RefB:21
nak-üm-fi-y-u ’we brought him down’
‑IV.nag_bajar+CA.üm34+EDO.fi6+IND.y4
+1.Ø3+DL.u2

Example 179

[Smeets, I. 2008: 137 (37)] RefB:21
nag-ün ’it went down / the going down’
‑IV.nag_bajar+PVN.n4

Example 180

[Smeets, I. 2008: 243 (1)] RefB:21
la-le-la-y ’she is not dead’
‑IV.la_morir+ST.küle28+NEG.la10+IND.y4+3.Ø3

Example 181

[Smeets, I. 2008: 243 (2)] RefB:21
lang-üm-ki-fi-l-nge ’don’t kill it’
‑IV.la_morir+CA.üm34+NEG.ki10+EDO.fi6+CNI.l4
+IMP2SG.nge3

Example 182

[Guevara 1913: 77] RefB:09
trof-lu ’the exploding one’
‑IV.trof_explotar+SVN.lu4

Example 183

[Augusta, F. (tropümün)] RefB:03
trop-üm-ün ’snap, shoot’
‑IV.trof_explotar+CA.üm34+PVN.n4

Example 184

[Augusta, F. (nel-)] RefB:03
nel-ün kawellu ’loose horse’
-IV.nel_soltar+IND1SG.n3
-NN.kawellu_caballo

Example 185

[Smeets, I. 2008: 441 (60)] RefB:21
nelk-üm-nge-nu-a-l ’not to get fired’
‑IV.nel_soltar+CA.üm34+PASS.nge23
+NEG.no10+NRLD.a9+OVN.el4

Example 186

[Smeets, I. 2008: 526 (lüf-)] RefB:21
f-a-y ’it will burn’
‑IV.lüf_quemar+NRLD.a9+IND.y4+3.Ø3

Example 187

[Augusta, F. (encender)] RefB:03
p-üm-ün ’to set fire to’
‑IV.lüf_quemar+CA.üm34+PVN.n4
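Assuming that RuCAlt01 and RuCAlt02 are composed in that order (this section does not state how the two definitions are combined), the following Python sketch traces their effect on simplified intermediate forms built from D18 and D19, such as le"@FP" followed by "@ÜC"m. It illustrates the rule logic and is not the FOMA script; note that, taken literally, the intransitive option "@GK" -> [g|0] also generates forms without g.

```python
import itertools

VOWELS = set("aeiouü")
MARKS = {"@4", "@5", "@6"}
TRANSITIVE = {"@NG": [["n", "g", "@4"]], "@FP": [["p", "@5"]],
              "@GK": [["k", "@6"], ["g", "@6"]]}
INTRANSITIVE = {"@NG": [[]], "@FP": [["f"]], "@GK": [["g"], []]}


def is_consonant(sym):
    """CON|SVW in the rules: any plain (non-tag) symbol that is not a vowel."""
    return sym is not None and not sym.startswith("@") and sym not in VOWELS


def prev(toks, i):
    return toks[i - 1] if i > 0 else None


def nxt(toks, i):
    return toks[i + 1] if i + 1 < len(toks) else None


def expand(alternatives, choose):
    """Replace each symbol by one of choose(toks, i); return every combination."""
    out = []
    for toks in alternatives:
        for combo in itertools.product(*(choose(toks, i) for i in range(len(toks)))):
            out.append([s for part in combo for s in part])
    return out


def causative_alternation(toks):
    # RuCAlt01.1: "@NG" -> {ng}"@4", "@FP" -> p"@5", "@GK" -> [k|g]"@6" || _ "@ÜC"
    alts = expand([toks], lambda ts, i: TRANSITIVE.get(ts[i], [[ts[i]]])
                  if nxt(ts, i) == "@ÜC" else [[ts[i]]])
    # RuCAlt01.2: "@ÜC" -> ü || ["@4"|"@5"|"@6"|CON|SVW] _
    alts = expand(alts, lambda ts, i: [["ü"]] if ts[i] == "@ÜC" and (
        prev(ts, i) in MARKS or is_consonant(prev(ts, i))) else [[ts[i]]])
    # RuCAlt01.3: the context marks are wiped out
    alts = [[t for t in a if t not in MARKS] for a in alts]
    # RuCAlt02.1: intransitive realisation of the remaining root tags
    alts = expand(alts, lambda ts, i: INTRANSITIVE.get(ts[i], [[ts[i]]]))
    # RuCAlt02.2: "@ÜC" -> 0 || ["@NG"|VOW] _
    alts = expand(alts, lambda ts, i: [[]] if ts[i] == "@ÜC" and (
        prev(ts, i) == "@NG" or prev(ts, i) in VOWELS) else [[ts[i]]])
    # RuCAlt02.3: "@ÜC" -> ü || ["@FP"|"@GK"|CON|SVW] _
    alts = expand(alts, lambda ts, i: [["ü"]] if ts[i] == "@ÜC" and (
        prev(ts, i) in {"@FP", "@GK"} or is_consonant(prev(ts, i))) else [[ts[i]]])
    # RuCAlt02.4: any tag still present is deleted
    return {"".join(t for t in a if not t.startswith("@")) for a in alts}


if __name__ == "__main__":
    print(sorted(causative_alternation(["n", "a", "@GK", "@ÜC", "m"])))  # E178
```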

5.2 Morphotactics: constructing the verb form

As explained in section 4.1 (Morphotactics), morphotactics is the set of constraints that regulates the co-occurrence of morphemes. Once the lexicon and the suffixes that interact in the verb form have been declared (see sections 4.7, Roots encoding, and 4.8, Suffixes encoding), their interaction must be regulated.

The Mapuche verb form was introduced in section 3. In short, the verb consists of a stem followed by a series of suffixes that complete the verb form.

5.2.1 Stems codification

Section 3.2 (Verb stems) presents different stem configurations. The simplest stem type is formed by a single verbal root, to which verb suffixes may be added directly (see example E44).

Another type of stem that accepts verb suffixes directly is the simple compound (i.e., one with no suffixes), in which one member is a verbal root and the other may be another verbal root or an adjectival, adverbial, nominal, numeral or question root (see examples E45 and E46, and the following table).

Stem                               Suffixes
Verbal root                        +Suffixes
Verbal root + Verbal root          +Suffixes
Verbal root + Non-verbal root      +Suffixes
Non-verbal root + Verbal root      +Suffixes
Table 5: Simple stem forms

Definition 20

Simple stems encoding
define CMPVBVAL
[CAjVbVSTEM|CAvVbSTEM|CNnVbSTEM|CQtVbSTEM|
CVbAjSTEM|CVbAvSTEM|CVbNnSTEM|CVbVbSTEM];

  • CAjVbVSTEM Complex adjective+verb compound stem

  • CAvVbSTEM Complex adverb+verb compound stem

  • CNnVbSTEM Complex noun+verb compound stem

  • CQtVbSTEM Complex question+verb compound stem

  • CVbAjSTEM Complex verb+adjective compound stem

  • CVbAvSTEM Complex verb+adverb compound stem

  • CVbNnSTEM Complex verb+noun compound stem

  • CVbVbSTEM Complex verb+verb compound stem

Definition D20 above, CMPVBVAL, collects (together with other definitions in the FST script) the possible Mapuche simple stems. CMPVBVAL stands for "verbal compounds with their corresponding valence" (see Valence in compounds below).

Compounds encoding.

In D21, formCNnVbROOT encodes the form of a "noun + verb" compound: the whole form is enclosed in brackets and the tag "-NVCR" is attached to it. Then CNnVbSTEM applies tag neutralisation (see Neutralisation of tags below) and verb valence (see Valence in compounds below) to the compound.

Definition 21

Noun + verb compound
define formCNnVbROOT
[[NROOT [IVROOT|TVROOT]]"-NVCR"];
define CNnVbSTEM [RuIVCNnVb .o. RuTVCNnVb .o.
RuCNnVb01 .o. [neutCNnVb .o. formCNnVbROOT]];

Neutralisation of tags.

Compound stems, complex stems (see Complex single root stems, in 5.2.1) and complex compound stems (see Complex compound stems, in 5.2.1) are called "complex" because they incorporate suffixes into the stem and have their own rules of interaction. For this reason, PoS and suffix tags are converted into different tags while the inner compound rules are applied (R16). We call this process "neutralisation" because it prevents the general rules from affecting stems. The change is reverted before the analysis is output, so the user does not have to interpret a larger set of tags. Neutralisation is applied first and the compound rules are then applied to the resulting form, so these rules are written in terms of the converted tags (see CNnVbSTEM in D21). R16 shows a sample of the neutralisation rules:

Rule 16

Neutralisation of PoS and suffixes tags (sample)
define NeutAj ["-aj0" <- "-AJ"];
define NeutNn ["-nn0" <- "-NN"];
define NeutIv ["-iv0" <- "-IV"];
define NeutTv ["-tv0" <- "-TV"];
define NeutAdjdo ["+adjdo0" <- "+ADJDO"];
define NeutCa ["+ca0" <- "+CA"];
define NeutDistr ["+distr0" <- "+DISTR"];
define NeutHh ["+hh0" <- "+HH"];
define NeutNomag ["+nomag0" <- "+NOMAG"];
define NeutPvn ["+pvn0" <- "+PVN"];
define NeutRef ["+ref0" <- "+REF"];
define NeutTh ["+th0" <- "+TH"];
define NeutTr ["+tr0" <- "+TR"];

Valence in compounds.

When one of the roots in a compound is a verb and the other is not, the resulting compound gets the valence from the verb root. When both members of a compound are verb roots, the valence is derived from the second. This needs to be encoded because transitive verbs take suffixes that intransitive ones do not.

Rule 17

Valence in verbal compounds (sample)
define RuIVCNnVb ["-CR.IV" <- "-NVCR" ||
"-nn0" $["-iv0"] _ ];
define RuTVCNnVb ["-CR.TV" <- "-NVCR" ||
"-nn0" $["-tv0"] _ ];
define RuIVCVbVb ["-CR.IV" <- "-VCR" ||
["-tv0"|"-iv0"] $["-iv0"] _ ];
define RuTVCVbVb ["-CR.TV" <- "-VCR" ||
["-tv0"|"-iv0"] $["-tv0"] _ ];
(Footnote 39: The notation $["-iv0"] is equivalent to ?* "-iv0" ?*, i.e., "-iv0" surrounded by zero or more elements on the left and on the right.)

R17 shows two examples of valence assignment, one for "noun + verb" compounds and another for "verb + verb" compounds; each has one rule for transitive and one for intransitive valence. The tag "-NVCR", which rule formCNnVbROOT (D21) attached to the compound, is transformed into "-CR.IV" when preceded by the sequence of neutralised tags "-nn0" ?* "-iv0" (?* indicates zero or more elements in between). This establishes the intransitive valence of the compound. The transitive case is analogous.

When a compound is made up of two adjectives, two nouns or two verbs, it must be processed so that identical roots are not accepted, since a sequence of two identical roots is not a compound but a reduplicated root.

Definition 22

Verb + verb compound
define formCVbVbROOT [[%< [TVROOT|IVROOT] %#
%< [IVROOT|TVROOT] %#]"-VCR"];
define neutCVbVb [NeutIv .o. NeutTv];
define preCVbVbROOT [_eq(formCVbVbROOT,
%< , %#)];
define CVbVbSTEM [RuIVCVbVb .o. RuTVCVbVb .o.
[neutCVbVb .o. formCVbVbROOT - preCVbVbROOT]];

As in the case of the "noun + verb" compound (D21), the first rule of D22, formCVbVbROOT, defines the elements of the compound and their order, but it also adds marks to the roots: both roots are marked with < on the left and # on the right ("%< ROOT %#"). The % (percent sign) escapes these symbols so that they are read literally. The whole form is then enclosed in brackets and the tag "-VCR" is attached to it. The next rule, neutCVbVb, defines the tags to be neutralised (see Neutralisation of tags above).

Rule preCVbVbROOT filters from the output side of formCVbVbROOT all those strings where some sub-strings occurring between the delimiters < and # are different; that is, it keeps only the forms whose two members are identical. The _eq operation it uses (Footnote 40) is meant to treat reduplicated roots, but we have adapted it slightly so that it can be applied to compounds in order not to analyse reduplication as composition. In practice, we subtract the result of preCVbVbROOT from the forms defined by formCVbVbROOT, obtaining only those forms where the two members are different.
(Footnote 40: "_eq(X,L,R)
Filters from the output side of X all those strings where some sub-strings occurring between the delimiters L and R are different. Example:
Consider the language %< a* b %> %< a b* %>, which contains an infinite number of strings:
<b><a> <b><ab> <ab><a> <ab><ab> <ab><abbb> ...
However, only one of the strings in this language has identical sub-strings between all instances of < and >, namely <ab><ab>. Hence, the language containing the single string
<ab><ab>
is produced by the regular expression:
_eq(%< a* b %> %< a b* %> , %< , %>) ;
This operation is mostly used to model reduplication in natural language lexicons. Usually, the bare words to be reduplicated are marked with delimiters, say < and >, after which one can produce the reduplicated forms. For example:
define Lexicon {cat}|{dog}|{horse};
define RLexicon %< Lexicon %> (%- %< \[%<|%>]+ %>);
regex _eq(RLexicon, %<, %>) .o. %<|%> -> 0 ;
and now we get:
foma[1]: lower-words
cat
cat-cat
dog
dog-dog
horse
horse-horse"
[Hulden, M. in https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki].)

Finally, CVbVbSTEM holds the result of applying neutralisation, valence assignment (see Valence in compounds above) and the subtraction explained in the previous paragraph.
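Over a finite lexicon, the FST subtraction in D22 amounts to a set difference. The Python sketch below uses a hypothetical four-root toy lexicon (the roots and valences come from examples in this article, but the dictionary is ours) to show the combined effect of formCVbVbROOT, the _eq-based filter and the valence rule of R17: identical pairs are removed, and each remaining compound inherits the valence of its second member. It does not model tag neutralisation.

```python
from itertools import product

# Hypothetical toy lexicon: verb root -> valence (roots borrowed from the examples above)
VERB_ROOTS = {"lef": "IV", "nag": "IV", "kim": "TV", "entu": "TV"}


def verb_verb_stems(lexicon):
    """Finite-set analogue of CVbVbSTEM (D22 and R17)."""
    all_pairs = set(product(lexicon, repeat=2))            # formCVbVbROOT
    reduplicated = {(root, root) for root in lexicon}      # what _eq keeps (preCVbVbROOT)
    stems = {}
    for first, second in all_pairs - reduplicated:         # the subtraction
        # R17: a verb + verb compound takes the valence of its second member
        stems[f"{first}-{second}"] = "-CR." + lexicon[second]
    return stems


if __name__ == "__main__":
    for stem, valence in sorted(verb_verb_stems(VERB_ROOTS).items()):
        print(stem, valence)
```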

All stem types are later collected under the rule VERBSTEM (D28), and CMPVBVAL (see Table 5 and D20) is one of them.

Stems formed with a verbaliser suffix.

A further degree of complexity arises because some single roots and compounds need a verbalising suffix in slot 36 (see section 3.1.1, Verbalisers (slot 36)) before they can be used as verb stems. Single roots that need this kind of suffix are adjectives, adverbs, nouns, numerals, onomatopoeia, proper nouns and question forms. Reduplicated roots of any category also need these suffixes, which in this case are called "stem formatives" (see section 3.1.2, Stem formative (slot 36)). Compounds in which neither of the two roots is a verb also need a verbaliser in slot 36.

Table 6 summarises what has been explained in the previous paragraph.

Stem                | Verbalisers (slot 36) | Suffixes
Non-verbal root     | +VRB                  | +Suffixes
Non-verbal compound | +VRB                  | +Suffixes
Reduplicated root   | +SFR                  | +Suffixes
Table 6: Simple stem forms: first degree of complexity
Single non-verbal roots.

These roots need a verbaliser to become verbal stems; they are collected in definition D24 as SPNVBROOT. The verbalisers of slot 36 are encoded in D23 below.

Definition 23

Verbalisers (slot 36)
["+VRB"{.Ø36}] : 0
| ["+VRB"{.nge36}"-IV"] : [{ng}"@EY"]
| ["+VRB"{.tu36}] : {tu}
| ["+VRB"{.ntu36}] : ["@N"{tu}]
| ["+VRB"{.l36}] : l
| ["+VRB"{.ye36}] : {ye};

Definition 24

Single non-verbal roots + verbaliser
define SPNVBROOT [AJROOT|AVROOT|NROOT|NUROOT|
PROPN|QROOT] SVRB;

SPNVBROOT states that any of the single roots it collects must be followed by a verbaliser (collected under SVRB) in order to occur with verbal suffixes. R18 lists the rules that regulate the suffixation of verbalisers by category (see section 3.1.1):

Rule 18

Non-verb roots forming verb stems
define RuAj [["-AJ"|"-CAJ"] =>
_ ?* [{.Ø36}|{.l36}|{.nge36}|{.ntu36}]];
define RuAv [["-AV"|"-CAV"] =>
_ ?* [{.Ø36}|{.l36}|{.nge36}|{.ntu36}]];
define RuNn [["-NN"|"-PN"|"-CNN"|"-CPN"] =>
_ ?* [{.Ø36}|{.nge36}|{.tu36}|{.ye36}]];
define RuNu ["-NU" =>
_ ?* [{.Ø36}|{.l36}|{.nge36}]];
define RuQc [["@Q1" => _ ?* [{.Ø36}|{.ye36}]]
.o. ["@Q2" => _ ?* {.Ø36}]
.o. ["@Q3" => _ ?* [{.Ø36}|{.nge36}]]];
define RuQt ["-QT" => _ ?* [{.l36}|{.ntu36}]];

Rule RuAj in R18 allows adjectives, compounds made of two adjectives and complex adjective stems (see Complex single root stems, section 5.2.1) to be completed as verbal stems by the suffixes -Ø-, -l-, -nge- or -ntu-, slot 36 (see section 3.1.1, Verbalisers (slot 36)).

Rule RuAv allows adverbs and complex adverb stems (see section 5.2.1) to be completed as verbal stems by the suffixes -Ø-, -l-, -nge- or -ntu-, slot 36.

Rule RuNn allows nouns, proper nouns, nominal compounds and complex noun stems (see section 5.2.1) to be completed as verbal stems by the suffixes -Ø-, -nge-, -tu- or -ye-, slot 36.

Rule RuNu allows numerals to form verbal stems with suffixes -Ø-, -l- or -nge-, slot 36.

Rules RuQc and RuQt regulate the verbalising suffixes of question roots. There are only four question roots and they behave differently, so each one has been encoded distinctively, as shown in D25:

Definition 25

Question roots
["-QC""@Q1"{.chem_qué_cuál}]:{chem}
|["-QC""@Q2"{.chuchi_cuál}]:[{chuchi}|{tuchi}]
|["-QC""@Q3"{.chum_cómo}]:{chum}
|["-QT"{.tunte_cuánto}]:{tunte};

The question root tagged "@Q1" is verbalised by the suffixes -Ø- and -ye- (see E59 and E60).

The question root tagged "@Q2" is verbalised by the suffix -Ø- (see E61).

The question root tagged "@Q3" is verbalised by the suffixes -Ø- and -nge- (see E62 and E63).

The question root identified by -QT is verbalised by the suffixes -Ø-, -l- and -ntu- (see E64, E65 and E66).

All restrictions encoded in R18 are applied to SPNVBROOT (D24) by means of a new rule, SPNVBSTEM (D26, displayed below), which in turn is collected by VERBSTEM (see D28).

Definition 26

Verbalisable single non-verbal roots
define SPNVBSTEM [RuAj .o. RuAv .o. RuNn .o.
RuNu .o. RuQc .o. RuQt .o. SPNVBROOT];
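To make the effect of composing the R18 restrictions with SPNVBROOT more concrete, here is a rough Python sketch. It assumes a much simplified representation in which an analysis is a list of symbols (category tag, lemma, +VRB, verbaliser); the verbaliser sets are transcribed from R18, but compound and complex tags (-CAJ, -CNN, ...) and question roots are left out, and all function names are hypothetical. Composing the identity-like restriction networks with SPNVBROOT is modelled simply as keeping the candidates that satisfy every restriction.

```python
from itertools import product

# Allowed slot-36 verbalisers per category tag, transcribed from R18
# (RuAj, RuAv, RuNn, RuNu); question roots are left out for brevity.
ALLOWED = {
    "-AJ": {".Ø36", ".l36", ".nge36", ".ntu36"},
    "-AV": {".Ø36", ".l36", ".nge36", ".ntu36"},
    "-NN": {".Ø36", ".nge36", ".tu36", ".ye36"},
    "-PN": {".Ø36", ".nge36", ".tu36", ".ye36"},
    "-NU": {".Ø36", ".l36", ".nge36"},
}
# The six verbalisers of D23.
VERBALISERS = [".Ø36", ".nge36", ".tu36", ".ntu36", ".l36", ".ye36"]


def restriction(tag, allowed):
    """Predicate imitating `tag => _ ?* [allowed]`: every occurrence of `tag`
    must be followed, somewhere to its right, by one of the `allowed` symbols.
    Lists without `tag` satisfy it vacuously, as in foma."""
    def holds(symbols):
        return all(
            any(s in allowed for s in symbols[i + 1:])
            for i, sym in enumerate(symbols) if sym == tag
        )
    return holds


RULES = [restriction(tag, allowed) for tag, allowed in ALLOWED.items()]


def spnvb_stems(roots):
    """Pair every root with every verbaliser (like SPNVBROOT) and keep only the
    candidates that satisfy all restrictions (like the composition in SPNVBSTEM)."""
    candidates = (
        [cat, lemma, "+VRB", verb]
        for (cat, lemma), verb in product(roots, VERBALISERS)
    )
    return [c for c in candidates if all(rule(c) for rule in RULES)]


stems = spnvb_stems([("-AJ", "küme"), ("-NN", "ngünen")])
print(len(stems))  # 8
```

Each root keeps four of the six verbalisers: küme cannot take -tu- or -ye-, and ngünen cannot take -l- or -ntu-.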

Non-verbal compounds.

One type is made up of two adjectives and is recognised as an adjectival compound; other types are "adjective + noun" and "noun + noun", both recognised as nominal compounds. A further compound, not recorded by Smeets but present in other authors’ texts, is "numeral + noun", also recognised as a nominal compound.

Like single non-verbal roots, these compounds may be verbalised by a suffix of slot 36. The same suffixes that verbalise single adjectives also verbalise adjectival compounds, and the same suffixes that verbalise single noun roots verbalise nominal compounds. These compounds are collected in their own rule, CPNVBROOT in D27; then CPNVBSTEM applies the verbalisation restrictions:

Definition 27

Non-verbal simple compounds + verbaliser
define CPNVBROOT [CAjAjROOT|CAjNnROOT|
CNnNnROOT|CNuNnROOT] SVRB;
define CPNVBSTEM [CLEANu .o. RuAj .o. RuNn
.o. CPNVBROOT .o. CLEANd];

Forms resulting from CPNVBSTEM are also collected in VERBSTEM (see D28).

Reduplicated root stems.

As shown in table 6, reduplication, even of verbal roots, needs what Smeets calls a stem formative (slot 36) before further verb suffixes can be attached. We explain the encoding of nominal reduplication, which is analogous to that of the other two types, verbal and onomatopoeic.

Rule 19

Nominal root reduplication
define NROOTNT [NeutNn .o. NROOT];
define NROOTx2 [%< NROOTNT %>"-RNNR"];
define InsNRoot [[..] -> %< NROOTNT %> ||
%> $[_] "-RNNR"];
define APPLYNN [NROOTx2 .o. InsNRoot];
define REDNNROOT [0 <- %<|%> .o.
_eq(APPLYNN, %<,%>) .o. %<|%>|"-RNNR" -> 0];

The first rule in R19 neutralises the nominal tag (see section 5.2.1). The second rule marks the reduplicated element and adds a tag to the entire structure. In InsNRoot, [..] (the epsilon modifier, see note 41) produces a rule that inserts one instance of < NROOTNT > between > and "-RNNR", which is the right side of the form defined in the previous rule, NROOTx2. Rule APPLYNN combines and applies the previous configurations. Finally, REDNNROOT cleans < and > from the grammatical representation, filters the output of the previous rule with _eq, and cleans any tag from the lexical side, ending in a clean analysis (see examples E32, E48, E146 and E151).

Note 41. Epsilon modifier [..]: "The LHS of a rule may be wrapped in the epsilon modifier, in which case any epsilons on the LHS get a special interpretation, where only one empty string is assumed to exist between each symbol in the input string. For example, the rule:
[.a*.] -> x will produce a transducer that maps the input string a unambiguously to xxx.
Also, [..] will simply produce a rule that inserts one instance of the RHS whenever the context is matched:
[..] -> x will map aaa to xaxaxax."
[Hulden, M. in https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki].
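The construction in R19 can be imitated on a finite lexicon. The Python sketch below is hypothetical and covers only the surface side: it marks each root, inserts a marked copy of every lexicon root before the tag, keeps only the strings whose bracketed roots are identical (the role of _eq), and finally strips delimiters and tag. The roots che and düngu are taken from the examples only as placeholders; we do not claim that the resulting reduplicated forms are attested.

```python
import re


def reduplicate(lexicon, tag="-RNNR"):
    """Finite-set imitation of R19 (surface side only)."""
    # NROOTx2: each root between < and >, followed by the reduplication tag.
    marked = [f"<{root}>{tag}" for root in lexicon]
    # InsNRoot: insert one marked root between > and the tag; the inserted
    # copy may be any root of the lexicon, so every pairing is produced.
    inserted = [m.replace(f">{tag}", f"><{other}>{tag}") for m in marked for other in lexicon]
    # _eq(..., %<, %>): keep only strings whose bracketed roots are identical.
    same = [s for s in inserted if len(set(re.findall(r"<(.*?)>", s))) == 1]
    # Clean-up: drop the delimiters and the tag.
    return [re.sub(r"[<>]", "", s).replace(tag, "") for s in same]


print(reduplicate(["che", "düngu"]))  # ['cheche', 'düngudüngu']
```

The mixed pairings (<che><düngu>, <düngu><che>) are generated first and then discarded, which mirrors why the _eq filter is needed after the insertion rule.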

Rule 20

Reduplicated roots stem formation
define REDSTEMS [[REDONROOT|REDVBROOT|
REDNNROOT] SSFR];
define RuRdOnSt ["-RONR" => _ ?* {.Ø36}];
define RuRdVbSt ["-RVBR" => _ ?* [{.Ø36}|
{.nge36}|{.tu36}|{.ye36}]];
define RuRdNnSt ["-RNNR" => _ ?* [{.nge36}|
{.tu36}]];
define RDROOTSTEM [RuRdOnSt .o. RuRdVbSt .o.
RuRdNnSt .o. REDSTEMS];

R20 assigns the appropriate stem formative (+SFR) to each type of reduplicated root to convert it into a verbal stem. Reduplicated noun, onomatopoeia and verb roots forming stems are collected in RDROOTSTEM, which in turn is part of the VERBSTEM definition (see D28).

Complex single root stems.

As explained before (see Neutralisation of tags, section 5.2.1), stems made up of a single root, a compound or a reduplicated root that incorporate at least one suffix (rarely more than three) are considered "complex stems".

The complex single root stems (one root plus one or more suffixes forming a verb stem, see section 3.2) that we encode are adjectival, adverbial, nominal, numeral, question and nominalised verb stems (for the latter, see sections 3.1.7, Verb inflectional nominalisation, and 3.1.8, Verb derivational nominalisation).

We explain here the complex nominalised verb stem. The other ones follow the same procedure with the appropriate rules for their category; they were listed in section 3.2 as "Complex single root stems".

Rule 21

Complex nominalised verb stem: 1st step
define formCXVBROOT [[IVROOT|TVROOT] (CA)
(TRFAC) (REF) (ST) (HH) (NRLD)
[FLECNOM|NMZ] SVRB];

Rules R21 to R24 together form a composition of 9 rules, some of them including two or three sub-rules. As in the treatment of compounds, we first define the form and order of the elements in the stem. In this case, formCXVBROOT states that the stem begins with a transitive or intransitive verb root. Then comes a series of optional suffixes, which can co-occur (rarely more than three) in any combination, as long as their order is respected. These suffixes are causative -l- or -m-, slot 34; factitive -ka- or transitivizer -tu-, slot 33; reflexive/reciprocal -w-, slot 31; stative -le-, slot 28; hither -pa-, slot 17; and non-realised situation -a-, slot 9. Then comes the obligatory nominaliser, which may be inflectional (see section 3.1.7) or derivational (see section 3.1.8). A verbaliser, slot 36, completes the stem.
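The ordering constraint expressed by formCXVBROOT (one obligatory root, a chain of optional suffixes in fixed slot order, then an obligatory nominaliser and a verbaliser) amounts to a simple regular pattern. The Python sketch below encodes it as a regular expression over space-separated tags; the tag spellings follow those used in our analyses, but the nominaliser list is only a subset and the helper is_complex_stem is hypothetical.

```python
import re

# Ordered slots of formCXVBROOT as (pattern, optional) pairs over tag tokens.
# The nominaliser alternatives listed are only a subset, for illustration.
SLOTS = [
    (r"-(?:IV|TV)", False),                  # intransitive or transitive verb root
    (r"\+CA", True),                         # causative, slot 34
    (r"\+(?:TR|FAC)", True),                 # transitivizer or factitive, slot 33
    (r"\+REF", True),                        # reflexive/reciprocal, slot 31
    (r"\+ST", True),                         # stative, slot 28
    (r"\+HH", True),                         # hither, slot 17
    (r"\+NRLD", True),                       # non-realised situation, slot 9
    (r"\+(?:PVN|SVN|NOMAG|NOMPI)", False),   # obligatory nominaliser
    (r"\+VRB", False),                       # verbaliser, slot 36
]

# Each tag is followed by one space, so an optional slot becomes "(?:tag )?".
STEM = re.compile("".join(f"(?:{p} )?" if opt else f"{p} " for p, opt in SLOTS))


def is_complex_stem(tags):
    """True if the tag sequence fits the slot template, in order."""
    return STEM.fullmatch(" ".join(tags) + " ") is not None


print(is_complex_stem(["-TV", "+CA", "+TR", "+NOMAG", "+VRB"]))  # True
print(is_complex_stem(["-TV", "+TR", "+CA", "+NOMAG", "+VRB"]))  # False
```

The second call fails only because the causative appears after the transitivizer, i.e. the slot order is violated; the interaction rules RuCxV02 to RuCxV05 then add constraints that a plain template cannot express compactly.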

Rule 22

Complex nominalised verb stem: 2nd step
define neutCXVb [NeutAdjdo .o. NeutAdjqe .o.
NeutCa .o. NeutFac .o. NeutHh .o. NeutNrld .o. NeutNomag .o. NeutPvn .o. NeutRef .o. NeutSt
.o. NeutSvn .o. NeutTr .o. NeutIv .o. NeutTv];
define CXVBROOT [neutCXVb .o. formCXVBROOT];
define RuCxV01 [~$[["-iv0"|"-tv0"] ?* ["+OVN"
|"+IVN"|"+TVN"|"+AVN"|{.Ø4}|"+CSVN"|"+NOMPI"
|"+NOM"]]];

Tag neutralisation of all the members of the stem, and its application to the stem form, come in the second and third rules (neutCXVb and CXVBROOT). Rule RuCxV01 specifies which nominalising suffixes cannot form part of this stem. These are not neutralised because, being forbidden, they need no further interaction rules.

Rule 23

Complex nominalised verb stem: 3rd step
define RuCxV02 [["+pvn0"|"+nomag0"|"+adjqe0"]
=> _ ?* [{.Ø36}|{.nge36}]];
define RuCxV03 [[{.lu4}|"+adjdo0"]
=> _ ?* {.Ø36}];
define RuCxV04 [["+ca0" => _ ?* "+adjdo0"]
.o. ["+ref0" => _ $["+pvn0"] {.nge36}] .o.
["+tr0" => _ $["+nomag0"|"+pvn0"] {.nge36}]
.o. ["+st0" => _ ?* "+svn0"]];
define RuCxV05 [~$["+ca0" ?* ["+tr0"|"+fac0"|
"+ref0"|"+st0"|"+nrld0"|{.Ø4}]]] .o.
[~$[["+tr0"|"+ref0"] ?* ["+st0"|"+nrld0"|
{.Ø4}]]];

Rules RuCxV02, RuCxV03, RuCxV04 and RuCxV05 regulate the interaction of all possible suffixes in the stem, including the verbalisers.

Rule 24

Complex nominalised verb stem: 4th step
define CXVBSTEM [RuCxV01 .o. RuCxV02 .o.
RuCxV03 .o. RuCxV04 .o. RuCxV05 .o. RuCCXVbSt
.o. RuPr50 .o. CXVBROOT];

The final rule, CXVBSTEM, brings everything together, producing the possible forms for this type of stem. All complex single root stems are also collected in VERBSTEM (see D28).

Complex reduplicated root stem.

As explained with example E58, this stem is listed as a complex single root stem because a single root, together with its suffix, reduplicates as one unit, i.e., the whole stem reduplicates. This form was encoded neither as a compound nor as a single root, but in the section that deals with reduplicated roots. The rule for this case differs from the one presented in R19 for nominal reduplication in that the root is encoded together with the suffix, and the identifying tag matches the category of the stem:

Rule 25

Verbal root reduplication
define VBROOTNT [[IVROOT|TVROOT] (CA)];
define VBROOTx2 [%< [IVROOT|TVROOT] (CA)
%>"-RVBR"];
define RuVbCA [~$[["-IV"|"-TV"] ?* {.l34}]];
define InsVBRoot [[..] -> %< VBROOTNT %> ||
%> $[_] "-RVBR"];
define APPLYVB [RuVbCA .o. VBROOTx2 .o.
InsVBRoot];
define REDVBROOT [0 <- %<|%> .o. _eq(APPLYVB,
%<,%>) .o. %<|%>|"-RVBR" -> 0];

The differences mentioned in the paragraph above are found in the line starting with "define VBROOTx2", which contains an optional causative suffix (CA) and the corresponding tag for the reduplicated verb root, "-RVBR". The causative suffix -üm- is the only one found in a reduplicated stem, at least in Smeets’ texts, which is why RuVbCA excludes the causative form -l- ({.l34}) in this construction.

Complex compound stems.

This type of stem is formed in the same way as the complex single root stem, but involves two roots. The complex compound stems (see section 3.2) that we encode are the following (parentheses express optionality):

  • adjective (+ suffixes) + noun (+ suffixes), see E192;

  • adjective (+ suffixes) + verb + nominaliser, see E193;

  • adverb (+ suffixes) + verb, see E188;

  • noun (+ suffixes) + noun (+ suffixes), see E194;

  • noun (+ suffixes) + verb, see E189;

  • verb (+ suffixes) + noun, see E191;

  • verb (+ suffixes) + verb, see E190.

All complex compound stems are collected, together with simple compounds, in CMPVBVAL (see D20). As with the previous types of stems, these are also collected in rule VERBSTEM (D28), which is summarised in table 7:

Definition 28

Verb stems
define VERBSTEM [IVROOT|TVROOT|CXVBSTEM|
CXNNSTEM|CXNNSTEM2|CXAJSTEM|CXAJSTEM2|
CXAVSTEM|CXNUSTEM|CXQUSTEM|RDROOTSTEM|
CMPVBVAL|CPNVBSTEM|SPNVBSTEM];

  • IVROOT: Intransitive verb root

  • TVROOT: Transitive verb root

  • CXVBSTEM: Complex verb root stem (R21)

  • CXNNSTEM: Complex noun root stem

  • CXNNSTEM2: Complex noun root stem (form 2)

  • CXAJSTEM: Complex adjective root stem

  • CXAJSTEM2: Complex adjective root stem (form 2)

  • CXAVSTEM: Complex adverb root stem

  • CXNUSTEM: Complex numeral root stem

  • CXQUSTEM: Complex question root stem

  • RDROOTSTEM: Reduplicated root stems (R20)

  • CMPVBVAL: Verbal compound stem with valence (D20)

  • CPNVBSTEM: Verbalised non-verbal compounds (D27)

  • SPNVBSTEM: Verbalised single non-verbal roots (D26)

Stem                              | Verbalisers (slot 36) | Suffixes
Verbal root                       | -                     | +Suffixes
Verbal root + Verbal root         | -                     | +Suffixes
Verbal root + Non-verbal root     | -                     | +Suffixes
Non-verbal root + Verbal root     | -                     | +Suffixes
Non-verbal root                   | +VRB                  | +Suffixes
Non-verbal compound               | +VRB                  | +Suffixes
Reduplicated root                 | +SFR                  | +Suffixes
Root + suffixes                   | +VRB                  | +Suffixes
Root + suffixes + Root            | +VRB                  | +Suffixes
(Root + suffix) reduplicated      | +SFR                  | +Suffixes
Root + suffixes + Root + suffixes | +VRB                  | +Suffixes
Table 7: Mapudüngun stems

The different configurations of stems were identified in section 3.2, Verb stems; here we present the rules that regulate the interaction among the elements introduced above, roots (section 4.7) and suffixes (section 4.8), which take part in the different types of stems.

Complex compound stems

"Adverb + optional causative + optional transitivizer + verb"
Rule: AVROOT (CA) (TR) [IVROOT2|TVROOT2];

Note 43. Rules are presented here in a simplified way, just to show the elements involved; in the system they are much more complex, because they have to deal with the generation of the compounds, the addition of tags to carry out the processes, and the elimination of these tags once used. See this example of one of the simplest rules in the FST script, which does not have to deal with the addition of a verbaliser because a verb root is involved:
### Question / Verb
define formCQtVbROOT
[[%< QROOT %# %< [IVROOT2|TVROOT2]]"-QVCR"];
define neutCQtVb
[NeutIv .o. NeutQc .o. NeutQt .o. NeutTv];
define preCQtVbROOT [_eq(formCQtVbROOT, %< , %#)];
define CQtVbSTEM [RuIVCQtVb .o. RuTVCQtVb .o. RuCCXVbSt .o. [neutCQtVb .o. formCQtVbROOT]];
define CMPVBVAL [CLEANu .o. CQtVbSTEM .o. CLEANd];

Example 188

[Smeets, I. 2008: 387 (26)] RefB:21
ñi pülle-tu-pe-lu ’he came close to see’
-SP.ñi_mi_su
-AV.pülle_cerca+TR.tu33-TV.pe_ver+SVN.lu4

"Noun + optional transitivizer or factitive + verb"
Rule: NROOT (TRFAC) [IVROOT2|TVROOT2];

Example 189

[Smeets, I. 2008: 358 (5)] RefB:21
trari-ntuku-künu-nge-ke-fu-y ’they were caught and left tied up’
-NN.trari_amarra-TV.tuku_poner+PFPS.künu32
+PASS.nge23+CF.ke14+IPD.fu8+IND.y4+3.Ø3

"Verb + optional experiencer + optional causative + optional transitivizer or factitive + optional reflexive + optional hither or locative + verb"
Rule: [TVROOT|IVROOT] (EXPOO) (CA) (TRFAC)
(REF) (HHLOC) [IVROOT2|TVROOT2];

Example 190

[Smeets, I. 2008: 408 (28)] RefB:21
ñi ru-pa-aku-lu ’he has gone by’
-SP.ñi_mi_su
-IV.ru_pasar+HH.pa17-IV.aku_llegar-CR.IV
+SVN.lu4

"Verb + optional causative + optional transitivizer or factitive + optional reflexive + optional hither + noun"
Rule: [TVROOT|IVROOT] (CA) (TRFAC) (REF)
(HH) NROOT2;

Example 191

[Smeets, I. 2008: 456 (8)] RefB:21
kim-el-tu-che-ke-fu-y ’he used to teach people’
-TV.kim_saber+CA.l34+TR.tu33-NN.che_persona
+CF.ke14+IPD.fu8+IND.y4+3.Ø3

"Adjective + optional transitivizer or factitive + noun + optional derivational nominaliser +VRB"
Rule: AJROOT (TRFAC) NROOT2 (NMZ) SVRB;

Example 192

[Smeets, I. 2008: 90 (36)] RefB:21
wisa-ka-sungu-n, ta eymi ’what a dirty talker you [are]!’
-AJ.weda_malo+FAC.ka33-NN.düngu_palabra
+VRB.Ø36+PVN.n4
-AP.ta_el
-PP.eymi_tu

"Adjective + optional transitivizer or factitive + verb + derivational nominaliser +VRB"
Rule: AJROOT (TRFAC) [IVROOT2|TVROOT2] NMZ SVRB;

Example 193

küme-ka-puru-fe-nge-y ’he is (always) a good dancer’
-AJ.küme_bueno+FAC.ka33-IV.puru_bailar
+NOMAG.fe+VRB.nge36-IV+IND.y4+3.Ø3

"Noun + optional transitivizer or factitive or non class-change suffixes + noun + optional non class-change suffixes or derivational nominalisers +VRB"
Rule: NROOT (TRFAC|NCC) NROOT2 (NCC|NMZ) SVRB;

Example 194

[Smeets, I. 2008: 459 (36)] RefB:21
ta-yiñ pu peñi-wen-lamngen-wen-nge-n ’we are all related as brothers and sisters’ lit: ’this is our brothers relation sisters relation’
-AP.ta_este-SP.yiñ_nuestro-COLL.pu
-NN.peñi_hermano+REL.wen-NN.lamngen_hermana
+REL.wen+VRB.nge36-IV+PVN.n4

5.3 Morphotactics of verb suffixes

In section 3.1, Verb suffixes, we explained that suffixes belonging to the same slot are mutually exclusive. There are about eighty verbal suffixes spread over thirty-six slots. Some suffixes exclude others for grammatical or semantic reasons; for example, once a verb has taken an inflectional nominaliser (slot 4), it cannot take suffixes of mood (slot 4), person (slot 3) or number (slot 2).

To start handling suffix co-occurrence, we first established the suffix sequence with all the possible variants generated by suffix mobility (see sections 3.1.5 and 5.3.4), as shown in the next definition:

Definition 29

Verb suffixes
define VERBSUFFIX [(REF) (EXPOO) (PASS) (REF)
(TR) (CA) (REF) (TRFAC) (FORCE) (BEN) (FORCE)
(PRPSPFPS) (REF) (HH) (CIRCINT) (PLAYSIM)
(MIO) (STPR) (BEN) (OS) (IMMSUD) (PLR) (IO)
(PASS) (FORCESAT) (PLR) (FORCE) (TH)
(PASS1A2A) (PLAYSIM) (IMMSUD) (TH) (PS) (ITR)
(HHLOC) (TH) (PS) (REF) (RE) (RECONT) (PLPF15) (CF14) (PX) (REP) (RE) (AFF) (NEG) (NRLD)
(IPD) (PLPF07) (EIDO) (CF05)
[[[(MOOD) [PERSON|PTMT] (NUMBER)] (DS)]|
[[FLECNOM|NMZ] (DS) (NCC) (CC) (INST)]]];

The names or tags appearing in D29 encode the suffixes assigned to each slot (see section 4.8, Suffixes encoding).

The first thing that may catch the reader’s attention is the repetition of some tags in different positions, e.g. PASS, REF, FORCE, IMMSUD, etc. This is how suffix mobility is handled (see sections 3.1.5 and 5.3.4).

Also note that almost all suffixes are marked as optional (they are enclosed in parentheses), except for PERSON, PTMT, FLECNOM and NMZ. The Mapuche verb is either finite (PERSON or PTMT) or nominalised (FLECNOM or NMZ), and these are the obligatory suffixes for those forms.
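The way D29 handles mobility, by declaring a mobile suffix in every position where it may appear, can be sketched with an ordinary regular expression. In the hypothetical Python fragment below, REF is declared both before and after PASS, and the chain must close with either a finite ending (optional mood, person, optional number) or a nominaliser; the tag names and the short slot list are simplifications, not the full D29.

```python
import re

# Toy fragment of D29: REF is declared twice (before and after PASS) to model
# mobility; the chain must end in a finite or a nominalised ending.
OPTIONAL = [r"\+REF", r"\+PASS", r"\+REF", r"\+NEG"]
FINITE = r"(?:\+IND )?\+[123] (?:\+(?:SG|DU|PL) )?"   # (MOOD) PERSON (NUMBER)
NOMINALISED = r"\+(?:PVN|SVN|TVN) "                  # a few inflectional nominalisers

CHAIN = re.compile(
    "".join(f"(?:{p} )?" for p in OPTIONAL) + f"(?:{FINITE}|{NOMINALISED})"
)


def accepts(tags):
    """True if the tag sequence fits the toy suffix chain."""
    return CHAIN.fullmatch(" ".join(tags) + " ") is not None


print(accepts(["+REF", "+PASS", "+IND", "+3"]))          # True: REF in first position
print(accepts(["+PASS", "+REF", "+SVN"]))                # True: REF in second position
print(accepts(["+REF", "+PASS", "+REF", "+IND", "+3"]))  # True: REF read twice
print(accepts(["+REF", "+PASS"]))                        # False: no obligatory ending
```

The third call shows the side effect of this strategy: the same suffix can be recognised twice, which is what prohibition rules such as R32 and R34 are meant to exclude.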

Methodology.

To encode the occurrence of suffixes in the Mapuche verb form, we started by incorporating the minimal verb form, i.e., an intransitive verb root plus the suffixes expressing mood, person and number (see annex 11.3, "Conjugation of the intransitive verb küpa- ’to come’"), which are obligatory in a finite verb form. We then added the suffixes related to transitive verbs. So, we first established a set of rules dealing with the minimal forms of both intransitive and transitive verbs (see annex 11.4, "Conjugation of the transitive verb pi- ’to say (to tell)’").

5.3.1 Verb paradigms

In D29 above, the last two lines reflect the two forms a verb may take. The penultimate line corresponds to finite forms: mood is in slot 4; person, or a portmanteau morph (see note 44 and slot-03PTMT.aff in annex 11.2.1), is in slot 3; number is in slot 2; and the dative subject (used in transitive forms) is in slot 1.

Note 44. "Portmanteau morphs which include a subject marker are assigned subject position (slot 3)" [Smeets 2008: 152]. We have also seen that assigning portmanteau morphs to this position allows the conditional marker, obligatory in negative imperative forms, to appear in its natural position, slot 4 for mood.

It was also necessary to incorporate the suffixes assigned to slots 23 and 6, as they complete the transitive verb paradigm (see section 3.1.6, Verb paradigms), and the negation suffixes in slot 10. Although the latter are not strictly obligatory and occur in both transitive and intransitive forms, they interact with mood suffixes and have a particular effect in negative imperative forms (see annex 11.5, "Negative imperative forms of the transitive verb pi- ’to say (to tell)’").

Slot | 23    | 10   | 6    | 4    | 3          | 2    | 1
Itr  | -     | neg. | -    | mood | pers./ptmt | num. | -
Tr   | agent | neg. | obj. | mood | pers./ptmt | num. | dative subj.
Table 8: Intransitive and transitive suffixes per slot

Table 8 shows the suffixes per slot involved in transitive and intransitive Mapuche verbs (see note 45). Not all suffixes in table 8 co-occur in a transitive form; for instance, agent markers (slot 23) do not co-occur with direct objects (slot 6) or dative subjects (slot 1).

Note 45. "It is remarkable that the subject-object paradigm is completed with suffixes which occupy a position in between derivational suffixes, away from the inflectional block at the end of a verb form. The suffixes -mu- +2A and -w- +1A share their position, slot 23, with the passive marker -nge-" [Smeets, I. 2008: 161].

To regulate the verbal paradigms, thirty-three rules were necessary, some of them containing sub-rules and some including the interaction with inflectional nominalisers (slot 4). No reference to mood, person or number may be made when a verb takes one of these nominalisers, but nominalised verbs may include agents (E195, E196) or objects with the corresponding dative subject (E197, E198):

Example 195

[Smeets, I. 2008: 269 (11)] RefB:21
mütrüm-uw-lu ’his calling to’
-TV.mütrüm_llamar+1A.w23+SVN.lu4

Example 196

[Smeets, I. 2008: 269 (14)] RefB:21
fey-pi-mu-a-fiel ’what you will tell me’
-TV.feypi_decir+2A.mu23+NRLD.a9+TVN.fiel4

Example 197

[Smeets, I. 2008: 394 (38)] RefB:21
chem-pi-e-t-ew ’what they were told by’
-QC.chem_qué-TV.pi_decir-CR.TV
+IDO.e6+AVN.t4+DS3A.ew1

Example 198

[Smeets, I. 2008: 485 (5)] RefB:21
pe-fi-lu iñche ’at my seeing her’
-TV.pe_ver+EDO.fi6+SVN.lu4
-PP.iñche_yo

Rule 26

Dependency rule 1
define RuDp01
[["+DS3A"|"+DS12A"] => "+IDO" ?* _ ];

R26 is what we call a "dependency" rule: it states that the suffixes +DS3A and +DS12A can occur only if the suffix +IDO occurs before them, i.e., +DS3A and +DS12A depend on the occurrence of +IDO.
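A dependency rule such as R26 can be read as a check on the left context of each target tag. The following Python sketch assumes that analyses are lists of tags (morph symbols such as .e6 are omitted) and uses a hypothetical helper name; it illustrates the meaning of `[targets] => required ?* _`, not foma's implementation.

```python
def dependency(targets, required):
    """Predicate imitating `[targets] => required ?* _`: each target tag must
    have `required` somewhere to its left; lists without targets pass."""
    def holds(tags):
        return all(required in tags[:i] for i, t in enumerate(tags) if t in targets)
    return holds


ru_dp01 = dependency({"+DS3A", "+DS12A"}, "+IDO")  # R26

print(ru_dp01(["-TV", "+IDO", "+AVN", "+DS3A"]))  # True (tags as in E197)
print(ru_dp01(["-TV", "+AVN", "+DS3A"]))          # False: no +IDO before +DS3A
```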

Rule 27

Prohibition rule 10
define RuPr10 [~$["+CND" ?* [["+1"{.Ø3}]
| ["+3"[{.Ø3}|{.ng3}]]]]];

R27 is a prohibition rule. The combination of symbols ~$ (see note 46) may be read as "it can not be the case that", and the rest of this regexp is read as "the conditional is followed by a 1st person suffix in its null form -Ø-, or by the 3rd person suffix in its null form or its form -ng-".

Note 46. ~X calculates the complement of X, i.e. finds all the strings that are not part of X. $X denotes the language that contains a sub-string drawn from the language X [Hulden, M. in https://code.google.com/archive/p/foma/wikis/RegularExpressionReference.wiki].
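The prohibition reading of R27 ("it can not be the case that" the conditional is followed, anywhere later, by one of the listed person suffixes) can be checked on tag lists with a few lines of Python. In this hypothetical sketch the person tag and its allomorph are fused into a single token (e.g. "+3.ng3"), a simplification of the two foma symbols "+3" and {.ng3}.

```python
def prohibition(left, right):
    """Predicate imitating foma's ~$[left ?* right]: no tag from `left` may be
    followed, anywhere to its right, by a tag from `right`."""
    def holds(tags):
        return not any(
            t in left and any(r in right for r in tags[i + 1:])
            for i, t in enumerate(tags)
        )
    return holds


# R27 (RuPr10), with person tag and allomorph fused into one token.
ru_pr10 = prohibition({"+CND"}, {"+1.Ø3", "+3.Ø3", "+3.ng3"})

print(ru_pr10(["-IV", "+CND", "+3.ng3"]))  # False: forbidden sequence
print(ru_pr10(["-IV", "+IND", "+3.Ø3"]))   # True: no conditional
```

Note that order matters: a list in which the person token precedes +CND is not affected, just as ~$[A ?* B] only excludes A followed by B.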

Rule 28

Obligation rule 9
define RuOb09 [["+NEG" [{.ki10}|{.kino10}]]
=> _ ?* "+CNI"];

R28 is an obligation rule which regulates the obligatory occurrence of the conditional marker when there is a negation in the imperative form (see e.r. 11.2.6 and annex 11.5, "Negative imperative forms of the transitive verb pi- ’to say (to tell)’").

5.3.2 Nominalised verbs

The last line of D29 reflects the form of a nominalised verb, nominalised either by inflectional (see section 3.1.7, Verb inflectional nominalisation) or by derivational (see section 3.1.8, Verb derivational nominalisation) nominalisers. In both cases a nominalised verb may be followed by a dative subject (see E197), a non class-changing suffix (see NCC.aff in annex 11.2.4), a class-changing suffix (see CC.aff in annex 11.2.4), or the instrumental suffix (see section 3.1.10, Instrumental object suffix -mew).

To regulate verb nominalisation, twelve rules were added. These rules regulate co-occurrence among the suffixes of "[FLECNOM|NMZ] (DS) (NCC) (CC) (INST)", and sometimes with suffixes from other slots; in general, however, the co-occurrence of these suffixes with those of the transitive and intransitive paradigms, and with the derivational ones, is handled by other rules.

Rule 29

Nominalisation prohibition for completive subjective verbal noun
define RuPr19 [~$["+CSVN" ?* ["+DS3A"|
"+DS12A"|"+INST"|"+ADJ"]]];

R29 forbids the dative subject suffixes (slot 1), the instrumental and the adjectivizer (a class-changing suffix) from appearing when the verb has been nominalised by the "completive subjective verbal noun" (slot 4).

Rule 30

Obligation for agentive verbal noun
define RuOb12 ["+AVN" =>
"+IDO" $[_] ["+DS3A".ew1]];

R30 forces the "agentive verbal noun" (slot 4) to occur together with the "internal direct object" (slot 6) and the "dative subject for 3rd person agent" (slot 1) in its form -ew (see E22 and E197).

Rule 31

Only plain verbal noun may be adverbialized
define RuDp05 ["+ADV" => "+PVN" ?* _ ];

R31 states that the class-changing suffix -tu may only adverbialize a verb nominalised by the "plain verbal noun" -n- (see E34). In other words, the adverbializer depends on the "plain verbal noun" to occur with a verb.

Examples of inflectionally nominalised verbs may be found in e.r. 11.2.6.

Examples of derivationally nominalised verbs are: E29, E30, E31, E32, E87, and the following ones:

Example 199

[Smeets, I. 2008: 314 ()] RefB:21
anü-m-ka ’planting’
-IV.anü_sentar+CA.m34+FAC.ka33+NOM.Ø

Example 200

[Smeets, I. 2008: 314 ()] RefB:21
ül-kantu ’song’
-NN.ül_canto+VRB.Ø36+PLAY.kantu22+NOM.Ø

Example 201

[Smeets, I. 2008: 314 ()] RefB:21
yall-tuku ’illegitimate child’
-NN.yall_hijo-de-un-hombre
-TV.tuku_poner-CR.TV+NOM.Ø

Example 202

[Smeets, I. 2008: 314 ()] RefB:21
ru-pa ’time’
-IV.ru_pasar+HH.pa17+NOM.Ø

Example 203

[Smeets, I. 2008: 312 (8)] RefB:21
angkü-m-tu-we ’poison’, ’device to dry things’
-IV.angkü_secar+CA.m34+TR.tu33+NOMPI.we

5.3.3 Occurrence of suffixes between slots 5 and 35

There are thirty more rules to regulate the occurrence of suffixes that are not obligatory in the minimal transitive or intransitive forms. Most of these rules come from Smeets’ descriptions of the suffixes; for example, the rule for the reflexive/reciprocal -w-: "The suffix -w- does not combine with a suffix in slot 23, 6 or 1. The reflexive morpheme -w- may occur with intransitive verbs, i.e. with verbs which do not take a suffix in slot 6" [Smeets, I. 2008: 291]. R32 reflects this description:

Rule 32

Reflexive does not occur in transitive forms
define RuPr48 [~$["+REF" ?* ["+REF"|"+PASS"
|"+1A"|"+2A"|"+IDO"]]];

In the description of the (non-)combinations of +REF, Smeets also mentions suffixes of slot 1. These are not included in R32 because a previous dependency rule (R26) states that dative subjects (slot 1) need the +IDO suffix (slot 6) to occur; since +IDO is forbidden with +REF, the condition for a dative subject (+DS, slot 1) to occur is never met.

Rule 33

More involved object obligatory contexts
define RuOb16 ["+MIO" => "+CIRC" ?* _ ,
_ ?* ["+PASS"|"+EDO"|"+TVN"]];

The rule presented in R33 derives from what we found in Smeets’ examples, since there are no explicit combination rules for the more involved object suffix, labelled +MIO. It is important to constrain this suffix because of its forms: -l- after a vowel, -ül- after a consonant or semivowel, and sometimes -el- after r. These forms coincide with those of other suffixes, such as the stative or the causative, which occur in nearby positions and could therefore be erroneously identified.

Smeets gives fourteen examples in which +MIO (slot 29) is present. In six of them it is preceded by +CIRC (slot 30), the circular (erratic) movement suffix -iaw-; in another six it co-occurs with +EDO (slot 6), the external direct object suffix -fi-; in one it co-occurs with +PASS (slot 23), the passive -nge-; and in one more with +TVN (slot 4), the transitive verbal noun -fiel-.

+MIO "indicates a more direct, intense or complete involvement of the patient in the event" [Smeets, I. 2008: 287] RefB:21 ; +CIRC "denotes an ongoing event which involves movement in no particular direction" [Smeets, I. 2008: 288] RefB:21 .

The semantic or grammatical relation between +MIO and +CIRC is not clear to us, but when these two suffixes co-occur, neither +PASS nor +EDO occurs. On the other hand, the other three suffixes in the rule have a grammatical relation with +MIO. With +EDO, the external direct object, +MIO gives a further degree of prominence to the object, as in the following example:

Example 204

[Smeets, I. 2008: 288 (6)] RefB:21
koyla-tu-künu-l-fi ’I lied to him’
-NN.koyla_mentira+VRB.tu36+PFPS.künu32
+MIO.l29+EDO.fi6+IND1SG.n3

The same happens with the objects denoted in a passive (+PASS) construction (E205) or in a transitive (+TVN) clause (E206). In a verb nominalised by the transitive verbal noun suffix, +CIRC and +TVN may also co-occur (E207); rule R33 does not prevent this.

Example 205

[Smeets, I. 2008: 397 (62)] RefB:21
yiñ ngünen-ka-l-nge-we-no-a-m ’we are no longer deceived’
-SP.yiñ_nuestro
-NN.ngünen_engaño+VRB.Ø36+FAC.ka33+MIO.l29
+PASS.nge23+PS.we19+NEG.no10+NRLD.a9+IVN.m4

Example 206

[Smeets, I. 2008: 288 (5)] RefB:21
eymi mi wirar-ül-meke-ke-fiel-mew iñche ’you are always shouting at me’
-PP.eymi_tu -SP.mi_tu_tuyo
-IV.wirar_gritar+MIO.l29+PR.meke28+CF.ke14
+TVN.fiel4+INST.mew

Example 207

[Smeets, I. 2008: 398 (5)] RefB:21
ñi küdaw-kiaw-ül-el-fiel pu ülmen ’he worked around for the rich people’
-SP.ñi_mi_su
-IV.küdaw_trabajar+CIRC.iaw30+MIO.l29
+BEN.el27+TVN.fiel4
-COLL.pu -NN.ülmen_adinerado
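R33 has two alternative contexts separated by a comma, and +MIO is licensed if either one holds. The Python sketch below models a restriction with a list of (left, right) contexts, where None leaves a side unconstrained; the tag lists in the usage lines are abbreviated from E204 and E207 (morph symbols dropped), and the helper names are hypothetical.

```python
def restrict(target, contexts):
    """Predicate imitating `target => L1 ?* _ ?* R1 , L2 ?* _ ?* R2 , ...`.

    `contexts` is a list of (left, right) pairs of tag sets; None stands for an
    unconstrained side. Each occurrence of `target` must satisfy at least one
    context: a tag of `left` somewhere before it and a tag of `right` after it.
    """
    def licensed(tags, i, left, right):
        ok_left = left is None or any(t in left for t in tags[:i])
        ok_right = right is None or any(t in right for t in tags[i + 1:])
        return ok_left and ok_right

    def holds(tags):
        return all(
            any(licensed(tags, i, left, right) for left, right in contexts)
            for i, t in enumerate(tags) if t == target
        )
    return holds


# R33 (RuOb16): +MIO after +CIRC, or followed by +PASS, +EDO or +TVN.
ru_ob16 = restrict("+MIO", [({"+CIRC"}, None), (None, {"+PASS", "+EDO", "+TVN"})])

print(ru_ob16(["-NN", "+VRB", "+PFPS", "+MIO", "+EDO", "+IND1SG"]))  # True (E204)
print(ru_ob16(["-IV", "+CIRC", "+MIO", "+BEN", "+TVN"]))             # True (E207)
print(ru_ob16(["-IV", "+MIO", "+PR", "+CF"]))                        # False
```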

Rule 34

Circular movement context restrictions
define RuPr41 [~$[["+CIRC"|"+INT"|"+ST"|"+PR"] ?* ["+ST"|"+PR"]]];

5.3.4 Treating suffix mobility

As explained in section 3.1.5, some suffixes may occur in different positions; this is called "mobility". To deal with it, we have declared the slot containing the suffix in every position where it may appear along the suffix chain (see D29). When a slot holds more than one suffix and only one of them is mobile, we have created a new file containing only the mobile suffix. For example, slot 17, encoded in file slot-17.aff, holds hither -pa- and locative -pu-, and only +HH is mobile. We created the file slot-17M.aff, which holds only the hither suffix and is declared as HH in the script. In D29, HH is found once alone and once together with LOC, as HHLOC; these are the two positions where it may occur. The same holds for the other mobile suffixes.

In R34 above, as well as in R32, some suffixes are repeated on both sides of the expression, to the left and to the right of the ?* symbols. These are prohibition rules, interpreted as "it can not be the case that", as explained for R27. So R34 reads "it can not be the case that +ST co-occurs with +ST, or +PR with +PR", and R32 reads "it can not be the case that +REF co-occurs with +REF". The aim is to prevent the same suffix from being recognised twice in a verb form as a result of being declared in two different positions. These rules are also used to exclude the co-occurrence of these suffixes with different ones for grammatical or semantic reasons; e.g., R34 states that it can not be the case that +CIRC co-occurs with +ST, since one expresses the opposite idea of the other (circular movement / stative), or that +INT (intensifier) co-occurs with +PR (progressive).

5.3.5 Over-generation

One problem with encoding homographic suffixes, or suffixes that have no phonetic realisation (null suffixes, Ø), is that the former may be mistakenly recognised, while the latter could be recognised at virtually any position. To solve these issues we have written some rules that have nothing to do with Mapudüngun morphotactics but help prevent wrong recognitions. These rules derive from observing the analysis results (see section 8.2, Ambiguity).

Rule 35

Forbidden null morpheme sequence occurrence
define RuPr46 [$["+NOM" ?* [["+OVN"{.Ø4}]|
["+SVN"{.Ø4}]|"+IMP"|["+1"{.Ø3}]|["+3"{.Ø3}]|
["+SG"{.Ø2}]|"+DS12A"]]];

Rule R35 forbids the nominaliser +NOM, which has a null form -Ø-, from being followed by other suffixes that also have a null realisation: +OVN, +SVN, +IMP, +1, +3, +SG and +DS12A.

Complete forms.

There is another set of eight rules that act upon entire verb forms when the stems are verbalised roots, complex stems, compounds or complex compounds. Some suffixes in the stem condition the entire form.

Rule 36

Nominalisation of verbalised nouns
define RuPr51 [$[["-nn0"|"-pn0"] $["+VRB"]
$["+OVN"|"+IVN"|"+TVN"|"+AVN"|"+SVN"
|"+CSVN"] "+ADV"]];

The prohibition rule R36 states that a noun or proper noun, once verbalised, may be nominalised by any of the inflectional nominalisers, but the resulting form may not then be adverbialised (+ADV). Note that this rule combines neutralised tags belonging to the stem with non-neutralised ones belonging to the suffix chain of the verb.

5.3.6 Special roots

In this section we explain how the roots described in sections 3.3.2 (Deictic verbs) and 3.3.3 (Defective verbs) are encoded. Some other special cases are also explained:

  • List of special roots

  • ["-AV""@SC01"{.fül_cerca}]: {fül}

  • ["-AV""@SC01"{.pülle_cerca}]: {pülle}

  • ["-IV""@SC01"{.llekü_acercar}]: {llekü}

  • ["-IV""@SC02"{.chekod_encuclillar}]: {chekod}

  • ["-IV""@SC02"{.kopüd_yacer-boca-abajo}]: {kopüd}

  • ["-IV""@SC02"{.kudu_yacer}]: {kudu}

  • ["-IV""@SC02"{.külü_apoyar}]: {külü}

  • ["-IV""@SC02"{.llikosh_encuclillar}]: {llikosh}

  • ["-IV""@SC02"{.payla_yacer-de-espalda}]: {payla}

  • ["-IV""@SC02"{.potrong_inclinar-la-cabeza}]:{potrong}

  • ["-IV""@SC02"{.potrü_inclinar}]: [{potrü}|{potri}]

  • ["-IV""@SC02"{.rekül_apoyar}]: {rekül}

  • ["-TV""@SC02"{.ünif_extender}]: ["@G"[{ünif}|{üñif}]]

  • ["-IV""@SC02"{.wira_sentar-en-ancas}]: {wira}

  • ["-IV""@SC03"{.trem_crecer}]: {trem}

  • ["-TV""@SC03"{.kim_saber}]: {kim}

  • ["-IV""@SC04"{.kon_entrar}]: {kon}

  • ["-IV""@SC04"{.tripa_salir}]: [{tripa}|{chipa}]

  • ["-IV""@SC05"{.püra_subir}]: [{püra}|{ñpüra}]

  • ["-IV""@SC06"{.müle_estar_vivir}]: [{müle}|{müli}]

  • ["-TV""@SC06"{.meke_ocupar}]: [{mek}"@EI"]

  • ["-TV""@SC06"{.nie_tener}]: [{nee}|{ne}|[{ni}"@E0"]]

  • ["-IV""@SC07"{.miaw_merodear}]: {miaw}

  • ["-IV""@SC08"{.nge_ser_estar}]: {nge}

  • ["-IV""@SC09"{.pepi_ser-capaz-de}]: {pepi}

  • ["-IV""@SC10"{.ru_pasar}]: {ru}

  • ["-IV""@FA"{.fa_ser-esto}]: {fa}

  • ["-IV""@FE"{.fe_ser-eso}]: {fe}

  • ["-AJ""@CF"{.küme_bueno}]: {küme}

  • ["-AJ""@CF"{.weda_malo}]: [{weda}|{wesha}]

  • ["-IV""@CF"{.aye_reír}]: ["@G"{aye}]

  • ["-IV""@CF"{.lladkü_entristecer_enojar}]: {lladkü}

  • ["-IV""@CF"{.llüka_asustar_temer}]: {llüka}

  • ["-IV""@CF"{.welu_intercambiar}]: {welu}

  • ["-NN""@CF"{.lofo_lobo_salvaje}]: {lofo}

  • ["-TV""@CF"{.yewe_avergonzar_respetar}]: {yewe}

Rule 37

Realisation contexts for 1st type of defective verbs

define RuVSC01 ["@SC01" =>
_ $["+CA"|"+TR"] ["+HH"|"+TH"|"+LOC"],
_ ?* ["+CA"|"+ca0"|"+TR"|"+tr0"|"+ST"],
_ ?* ["+HH"|"+TH"|"+LOC"]];

Defective roots.

Roots marked @SC01 have three possible contexts of realisation, one of which must be met. In the first context they have to be followed by causatives -l-, -m- (slot 34) or transitivizer -tu- (slot 33), which in turn have to co-occur with hither -pa-, locative -pu- (slot 17), or thither -me- (slot 20).

In the second context, roots tagged @SC01 have to be followed by causatives -l-, -m- (slot 34), transitivizer -tu- (slot 33) or stative -le- (slot 28). Note that this second context also contains the neutralised tags for causative, +ca0, and transitivizer, +tr0, because the rule also applies when these roots form a complex stem or a complex compound stem:

Example 208

[Smeets, I. 2008: 316 (5)] RefB:21
fül-üm-tuku-fi-n ’I put it closer to’
-AV.fül_cerca+CA.m34-TV.tuku_poner-CR.TV
+EDO.fi6+IND1SG.n3

In the third possible context, these roots must co-occur with hither -pa-, locative -pu- (slot 17), or thither -me- (slot 20):

Example 209

[Smeets, I. 2008: 419 (46)] RefB:21
pülle-pu-el ’going to a near place’
-AV.pülle_cerca+VRB.Ø36+LOC.pu17+OVN.el4
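The restriction operator => in RuVSC01 can also be approximated. This hypothetical Python sketch is not the authors' implementation. It encodes each right context of R37 as an ordered list of tag sets that must be found after @SC01. It deliberately ignores the left-context side of FOMA's => and its exact treatment of overlapping matches. The tag lists for E208 and E209 are simplified from the analyses above, and the third form is an invented counter-example.

```python
# Hypothetical model of the right contexts of RuVSC01 (R37).
# Each context is a list of tag sets to be matched, in order, after "@SC01".
SC01_CONTEXTS = [
    [{"+CA", "+TR"}, {"+HH", "+TH", "+LOC"}],   # context 1
    [{"+CA", "+ca0", "+TR", "+tr0", "+ST"}],    # context 2
    [{"+HH", "+TH", "+LOC"}],                   # context 3
]


def matches_in_order(rest, steps):
    """True if every tag set in `steps` is matched, in order, by some tag in `rest`.

    Greedy left-to-right matching is enough to decide whether such an
    ordered subsequence exists.
    """
    i = 0
    for tag in rest:
        if i < len(steps) and tag in steps[i]:
            i += 1
    return i == len(steps)


def restriction_holds(tags, trigger, contexts):
    """Every occurrence of `trigger` must be followed by at least one context."""
    for pos, tag in enumerate(tags):
        if tag == trigger:
            rest = tags[pos + 1:]
            if not any(matches_in_order(rest, ctx) for ctx in contexts):
                return False
    return True


e208 = ["-AV", "@SC01", "+CA", "-TV", "-CR.TV", "+EDO", "+IND1SG"]  # fül-üm-tuku-fi-n
e209 = ["-AV", "@SC01", "+VRB", "+LOC", "+OVN"]                     # pülle-pu-el
bare = ["-AV", "@SC01", "+VRB", "+IND", "+3"]                       # invented
print([restriction_holds(t, "@SC01", SC01_CONTEXTS) for t in (e208, e209, bare)])
# [True, True, False]
```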

Rule 38

Realisation contexts for 2nd type of defective verbs

define RuVSC02 ["@SC02" =>
_ ?* ["-CR.IV"|"-CR.TV"],
_ ?* ["+ST"|"+PRPS"|"+PFPS"|"+LOC"|{.no10}|
"+NRLD"]] .o.
$["@SC02" ?* ["-CR.IV"|"-CR.TV"]
?* ["+ST"|"+PRPS"|"+PFPS"]];

Roots tagged @SC02 also have several contexts of realisation, but they additionally have forbidden contexts, which is why R38 is the composition (.o.) of two rules. The first context covers @SC02 roots in compounds: when these roots precede the tags -CR.IV or -CR.TV, it is because they form part of a compound (-CR stands for "compound root"):

Example 210

[Smeets, I. 2008: 522 (külü-)] RefB:21
külü-ru-pa-n antü ’when the sun is going down’
-IV.külü_apoyar-IV.ru_pasar-CR.IV+HH.pa17
+PVN.n4 -NN.antü_sol

The second context requires @SC02 roots to co-occur with stative -le- (slot 28), progressive persistent -nie- or perfect persistent -künu- (slot 32), locative -pu- (slot 17), negation for conditional -no- (slot 10) or non-realised situation -a- (slot 9):

Example 211

[Smeets, I. 2008: 281] RefB:21
rekül-künu-w-üy ’then, he leaned over’
-IV.rekül_apoyar+PFPS.künu32+REF.w31
+IND.y4+3.Ø3

The composed prohibition rule states that an @SC02 root cannot both form part of a compound and take the suffixes stative -le- (slot 28), progressive persistent -nie- or perfect persistent -künu- (slot 32).

Rule 39

Realisation contexts for 1st type of compounded defective verbs
define RuVSC03 ["@SC03" ?* "@SC05"] =>
_ ?* ["+TH"|"+HH"];

R39 requires compounds made of an @SC03 root followed by an @SC05 root to co-occur with the markers for thither -me- (slot 20) or hither -pa- (slot 17):

Example 212

[Smeets, I. 2008: 262 (9)] RefB:21
kim-püra-me-pa-n ’there I realised’
-TV.kim_saber-IV.püra_subir-CR.IV
+TH.me20+HH.pa17+IND1SG.n3
(Note: in this example both suffixes appear, but only one is obligatory.)

Example 213

[Smeets, I. 2008: 381 (1)] RefB:21
kim-püra-me-n ’I came to appreciate’
-TV.kim_saber-IV.püra_subir-CR.IV+TH.me20
+IND1SG.n3

Rule 40

Realisation contexts for 3rd type of defective verbs

define RuVSC04
$["@SC06" ?* ["+ST"|"+PR"|"+PRPS"|"+PFPS"]];

Prohibition rule R40 forbids roots tagged @SC06 from taking the suffixes stative -le- or progressive -meke- (slot 28), progressive persistent -nie- or perfect persistent -künu- (slot 32).

Rule 41

Realisation contexts for 4th type of defective verbs

define RuVSC05 $["@SC07" ?* ["+CIRC"|"+ST"|
"+PR"|"+PRPS"|"+PFPS"]];

Prohibition rule R41 forbids roots tagged @SC07 from taking the suffixes of circular movement -iaw- (slot 30), stative -le- or progressive -meke- (slot 28), progressive persistent -nie- or perfect persistent -künu- (slot 32).

Rule 42

Realisation contexts for verb nge- ’to be’
define RuVSC06 ["@SC08" => _ ?* ["+CA"|"+TR"],
_ ?* ["+HH"|"+TH"],
_ ?* "+NEG"] .o.
$["@SC08"?*["+HH"|"+TH"]?*["+NEG"{.la10}]];

R42 is another rule made by composition. The first sub-rule defines the contexts in which the verb root nge- ’to be / to have’ must occur. The first context for this root, tagged @SC08, requires it to be followed either by the causative +CA -l- or -m- (slot 34), or by the transitivizer +TR -tu- (slot 33):

Example 214

[Smeets, I. 2008: 126 (28)] RefB:21
nge-l-me-fi-ñ ’I have taken them’
-IV.nge_ser+CA.l34+TH.me20+EDO.fi6+IND1SG.n3
(Note: the obligatory suffixes of this context and of the next one co-occur in this verb, which is another possible context.)

The second possible context for nge- makes it occur with hither -pa- (slot 17) or thither -me- (slot 20):

Example 215

[Smeets, I. 2008: 231 (1)] RefB:21
nge-me-fu-n ’I was there’
-IV.nge_ser+TH.me20+IPD.fu8+IND1SG.n3

Example 216

[Smeets, I. 2008: 534 (mungel)] RefB:21
nge-pa-yaw-ki-y-m-i ’You were hanging around here’
-IV.nge_ser+HH.pa17+CIRC.iaw30+CF.ke14
+IND.y4+2.m3+SG.i2

The last possible context for nge- requires it to be followed by a negation marker (slot 10). This context may be read as an exception to the previous ones: nge- must be followed by -l-, -m-, -tu-, -pa- or -me-, except when it takes a negation suffix, which makes those suffixes optional:

Example 217

[Smeets, I. 2008: 407 (17)] RefB:21
nge-ke-la-fu-y ’they were not’
-IV.nge_ser+CF.ke14+NEG.la10+IPD.fu8
+IND.y4+3.Ø3

Example 218

[Smeets, I. 2008: 194 (64)] RefB:21
nge-nu-n ’there was not’
-IV.nge_ser+NEG.no10+PVN.n4

Finally, the composed prohibition rule for nge- ’to be’ prevents +HH or +TH from co-occurring with the negation for indicative -la- in the same form.
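R42 combines a restriction with a prohibition. The following hypothetical Python sketch checks both halves on a tag list. Tags are written as strings such as +NEG.la10, so the prohibition can single out the indicative negation -la- while the restriction accepts any +NEG. The representation and function names are assumptions made for illustration. The last two forms are invented; they are not attested words.

```python
# Hypothetical model of RuVSC06 (R42) for the root nge- (tagged @SC08).
# Tags are strings like "+NEG.la10"; this representation is an assumption.
LICENSING = {"+CA", "+TR", "+HH", "+TH", "+NEG"}  # union of the three contexts
DIRECTION = {"+HH", "+TH"}


def tag_name(tag):
    """'+NEG.la10' -> '+NEG'; a tag without a form is returned unchanged."""
    return tag.split(".", 1)[0]


def nge_form_ok(tags):
    """Restriction sub-rule holds AND prohibition sub-rule does not match."""
    if "@SC08" not in tags:
        return True
    rest = tags[tags.index("@SC08") + 1:]
    # Restriction: some licensing suffix must follow the root.
    licensed = any(tag_name(t) in LICENSING for t in rest)
    # Prohibition: hither/thither followed later by the indicative negation -la-.
    seen_direction = False
    forbidden = False
    for t in rest:
        if tag_name(t) in DIRECTION:
            seen_direction = True
        elif seen_direction and t == "+NEG.la10":
            forbidden = True
    return licensed and not forbidden


e215 = ["-IV.nge_ser", "@SC08", "+TH.me20", "+IPD.fu8", "+IND1SG.n3"]
e217 = ["-IV.nge_ser", "@SC08", "+CF.ke14", "+NEG.la10", "+IPD.fu8", "+IND.y4", "+3.Ø3"]
bare = ["-IV.nge_ser", "@SC08", "+IND.y4", "+3.Ø3"]                          # invented
me_la = ["-IV.nge_ser", "@SC08", "+TH.me20", "+NEG.la10", "+IND.y4", "+3.Ø3"]  # invented
print([nge_form_ok(t) for t in (e215, e217, bare, me_la)])
# [True, True, False, False]
```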

Rule 43

Realisation contexts for verb pepi- ’to be able to’
define RuVSC07 ["@SC09" =>
_ ?* [["+CA"{.ül34}]|"+FAC"]];

The root pepi- ’to be able to’ in a single-root stem must always be followed by the causative form -l- or by factitive -ka-:

Example 219

[Smeets, I. 2008: 402 (45)] RefB:21
pepi-ka-w-ün ’the setting of preparations’
-TV.pepi_poder-hacer+FAC.ka33+REF.w31+PVN.n4

Example 220

[Smeets, I. 2008: 545 (pepi)] RefB:21
pepi-l-fal-la-y ’it can not be done’
-TV.pepi_poder-hacer+CA.l34+FORCE.fal25
+NEG.la10+IND.y4+3.Ø3

Rule 44

Realisation contexts for verb ru- ’to go through’
define RuVSC08 ["@SC10" =>
_ ?* ["+HH"|"+hh0"|"+TH"|"+th0"]];

Root ru- ’to go through’ does not occur without the direction markers -me- (thither, slot 20) or -pa- (hither, slot 17), even in complex stems or compounds.

Example 221

[Smeets, I. 2008: 247 (3)] RefB:21
amu-ru-me-y mawün-mew ’he went through the rain’
-IV.amu_ir-IV.ru_pasar-CR.IV+TH.me20
+IND.y4+3.Ø3
-NN.mawün_lluvia+INST.mew

Example 222

[Smeets, I. 2008: 515 (kata-)] RefB:21
kata-ru-l-me-y ’it pierced through’
-TV.kata_perforar-IV.ru_pasar-CR.IV
+CA.l34+TH.me20+IND.y4+3.Ø3

Example 223

[Smeets, I. 2008: 555 (ru-)] RefB:21
külü-ru-pa-n antü ’after noon’
-IV.külü_apoyar-IV.ru_pasar-CR.IV
+HH.pa17+PVN.n4

Example 224

[Smeets, I. 2008: 462 (63)] RefB:21
ru-l-pa-antü-le-y-iñ ’we spent the day’
-IV.ru_pasar+CA.l34+HH.pa17-NN.antü_día-CR.IV
+ST.le28+IND.y4+1.Ø3+PL.iñ2

Deictic roots.

See section 3.3.2. Root fa- is tagged @FA and root fe- is tagged @FE; this allows us to apply the rules that verbs formed from these roots require.

Rule 45

Realisation contexts for deictic verbs
define RuVSC09 ["@FA" => _ ?* ["+CA"|"+ST"]]
.o. ["@FE" => _ ?* ["+CA"|"+ST"],
_ $["+IND"] "+3"];

R45 states that verbs containing fa- ’to be like this’ or fe- ’to be like that’ must also contain the causative suffix -l- or -m-, or the stative suffix -le-. However, verbs derived from fe- need not fit this rule when they end in -y, which corresponds to indicative mood, 3rd person (see examples E67 to E72).
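The extra context that R45 grants to fe- can be sketched in the same hypothetical style. The sketch is Python; the tag lists and function names are assumptions. Both roots share one requirement: a causative or stative tag must follow the root. Only for @FE is an alternative also accepted: an indicative tag followed later by a 3rd person tag. The forms below are illustrative tag sequences, not examples taken from the grammar.

```python
def tag_name(tag):
    """'+IND.y4' -> '+IND'; a tag without a form is returned unchanged."""
    return tag.split(".", 1)[0]


def deictic_ok(tags):
    """Hypothetical model of RuVSC09 (R45) for roots tagged @FA or @FE."""
    for pos, tag in enumerate(tags):
        if tag not in ("@FA", "@FE"):
            continue
        rest = [tag_name(t) for t in tags[pos + 1:]]
        # Context shared by fa- and fe-: a causative or stative suffix follows.
        if "+CA" in rest or "+ST" in rest:
            continue
        # Extra context for fe- only: +IND followed later by +3.
        if tag == "@FE" and "+IND" in rest and "+3" in rest[rest.index("+IND") + 1:]:
            continue
        return False
    return True


fe_y = ["-IV", "@FE", "+IND.y4", "+3.Ø3"]
fa_y = ["-IV", "@FA", "+IND.y4", "+3.Ø3"]
fa_st = ["-IV", "@FA", "+ST.le28", "+IND1SG.n3"]
fe_n = ["-IV", "@FE", "+IND1SG.n3"]
print([deictic_ok(t) for t in (fe_y, fa_y, fa_st, fe_n)])
# [True, False, True, False]
```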

Verbs with causative and factitive.

Finally, there are a number of verbs, whose roots are tagged @CF, that "do not take the causative suffix -l- +CA (slot 34) without simultaneously taking the factitive morpheme -ka- +FAC (slot 33)" [Smeets, I. 2008: 301] RefB:21 :

Rule 46

Realisation contexts for @CF verbs occurring with causative
define RuVSC10 [["@CF" ?*[["+CA"|"+ca0"]{.l34}]]
=> _ ?* ["+FAC"|"+fac0"]];

R46 does not force roots marked @CF to be followed by a causative suffix; instead, it states that the sequence "@CF-l-" must co-occur with factitive -ka-. This may be read as "if a @CF root is followed by +CA, it must also be followed by +FAC". The rule also covers complex stems and complex compounds by means of "neutralised" tags (see Neutralisation of tags, section 5.2.1):

Example 225

[Smeets, I. 2008: 66 (42)] RefB:21
küme-y ’it is good’
-AJ.küme_bueno+VRB.Ø36+IND.y4+3.Ø3

Example 226

[Smeets, I. 2008: 255 (3)] RefB:21
küme-l-ka-le-tu-n ’I am well’
-AJ.küme_bueno+VRB.Ø36+CA.l34+FAC.ka33
+ST.le28+RE.tu16+IND1SG.n3

Example 227

[Smeets, I. 2008: 349 (17)] RefB:21
llüka-le-n ’I am afraid’
-IV.llüka_temer+ST.le28+IND1SG.n3

Example 228

[Smeets, I. 2008: 375 (25)] RefB:21
llüka-l-ka-che-ke-y ’he frightens people’
-IV.llüka_asustar+CA.l34+FAC.ka33
-NN.che_persona-CR.IV+CF.ke14+IND.y4+3.Ø3

Example 229

[Smeets, I. 2008: 572 (welu2)] RefB:21
ti lifru welu-y ’the book was exchanged’
-AP.ti_el -NN.lifru_libro
-IV.welu_intercambiar+IND.y4+3.Ø3

Example 230

[Smeets, I. 2008: 572 (welu2)] RefB:21
welu-l-ka-ñma-fi-ñ ’I exchanged it’
-IV.welu_intercambiar+CA.l34+FAC.ka33
+IO.ñma26+EDO.fi6+IND1SG.n3
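R46 restricts the sequence "@CF ... -l-" rather than the root alone. A model therefore has to look for the causative first and only then demand the factitive. The hypothetical Python sketch below does this on tag lists built from E225, E226 and E228, plus an invented form without -ka-. It checks for the factitive after the last -l-, because a FOMA restriction applies to every match of its left-hand pattern.

```python
def cf_ok(tags):
    """Hypothetical model of RuVSC10 (R46) on a tag list."""
    if "@CF" not in tags:
        return True
    rest = tags[tags.index("@CF") + 1:]
    # Positions of causative -l- (plain or neutralised tag).
    causative_l = [i for i, t in enumerate(rest) if t in ("+CA.l34", "+ca0.l34")]
    if not causative_l:
        return True  # no "@CF ... -l-" sequence, so nothing is required
    # The restriction applies to every match, so check after the last -l-.
    after = rest[causative_l[-1] + 1:]
    return any(t.split(".", 1)[0] in ("+FAC", "+fac0") for t in after)


e225 = ["-AJ", "@CF", "+VRB.Ø36", "+IND.y4", "+3.Ø3"]
e226 = ["-AJ", "@CF", "+VRB.Ø36", "+CA.l34", "+FAC.ka33", "+ST.le28",
        "+RE.tu16", "+IND1SG.n3"]
e228 = ["-IV", "@CF", "+CA.l34", "+FAC.ka33", "-NN", "-CR.IV", "+CF.ke14",
        "+IND.y4", "+3.Ø3"]
no_fac = ["-AJ", "@CF", "+VRB.Ø36", "+CA.l34", "+ST.le28", "+IND1SG.n3"]  # invented
print([cf_ok(t) for t in (e225, e226, e228, no_fac)])
# [True, True, True, False]
```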

6 Beyond "A grammar of Mapuche"

"A grammar of Mapuche" [Smeets, I. 2008] RefB:21 , our development base, describes the central Mapuche dialect. We have added to the analyser words that are not in Smeets’ work, compounds that she did not find in her study but that other authors mention, and some minor dialectal variations.

6.1 The spelling unifier

There is significant variation in Mapudüngun spelling, mainly due to the existence of different spelling proposals, together with the strong influence of Spanish orthography. Some texts may present a mixture of these orthographic proposals along with Castilianized orthography. This has to be sorted out before analysing a text, with either rule-based or statistical analysers, because divergent input means more rules for the former and poorer results for the latter.

The task of the unifier is to replace characters: from a series of possible inputs, a single output is returned. This process is called "unification". Initially, the idea was to convert the different graphemic proposals for the Mapuche language into one single spelling, but along the way we noticed the strong influence of Spanish orthography on written Mapudüngun. The final implementation therefore mainly reflects this fact, while still including a couple of rules related to some of the graphemic proposals. This process is embedded in the analyser.

Mapudüngun vowels should not carry accent marks; any accented vowel is transformed into its unaccented version:

á → a, é → e, etc.

The morphological analysis is performed on lowercase words. The unifier section makes the main FST interpret every uppercase letter as a lowercase one, which does not mean that it produces lowercase output, as shown below:

Kasinta → -PN.Kasinta_Jacinta

Other changes are:

b → f
ca → ka, co → ko, cu → ku
ce → se, ci → si
gue → ge, gui → gi
hua → wa, hue → we, hui → wi, huo → wo, huu → wu, when h is not preceded by c
ha → a, he → e, hi → i, ho → o, hu → u when h is not preceded by c (che)
j → k, qu → k
v → ü when v is between consonants or semivowels, v → f otherwise
q → g when q is not followed by u
tx → tr, x → tr
z → d

Of all these changes, only the last three and the first half of the fourth from the end (v → ü) are related not to Spanish but to graphemic proposals for Mapudüngun.
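The replacements listed above are context-sensitive and order-dependent. The following Python sketch approximates them with ordered regular-expression substitutions; it is not the analyser's FOMA unifier. Several points are our own assumptions. The output is simply lowercased, whereas the real FST preserves case, as the Kasinta example shows. The h-deletion rules also skip h after s, so that the digraph sh survives; the list above does not state this. The set of letters counted as consonants or semivowels for v → ü is likewise our own choice.

```python
import re

# Letters treated as consonants or semivowels for the v -> ü rule (assumption).
CONS = "cdfghjklmnñpqrstwxyz"

# Ordered (pattern, replacement) pairs; order matters:
# "hu"+vowel before bare "h"+vowel, "qu" before "q", "tx" before "x".
RULES = [
    (r"b", "f"),
    (r"c(?=[aou])", "k"),
    (r"c(?=[ei])", "s"),
    (r"gu(?=[ei])", "g"),
    (r"(?<![cs])hu(?=[aeiou])", "w"),   # not after c (ch) or s (sh)
    (r"(?<![cs])h(?=[aeiou])", ""),
    (r"j", "k"),
    (r"qu", "k"),
    (rf"(?<=[{CONS}])v(?=[{CONS}])", "ü"),
    (r"v", "f"),
    (r"q(?!u)", "g"),
    (r"tx", "tr"),
    (r"x", "tr"),
    (r"z", "d"),
]

ACCENTS = str.maketrans("áéíóú", "aeiou")  # ü is deliberately left untouched


def unify(word):
    """Map a spelling variant to a single spelling (lowercased in this sketch)."""
    s = word.lower().translate(ACCENTS)
    for pattern, repl in RULES:
        s = re.sub(pattern, repl, s)
    return s


for w in ("Mapudvngun", "Curaca", "Hueque", "Kuxan"):
    print(w, "->", unify(w))
# Mapudvngun -> mapudüngun
# Curaca -> kuraka
# Hueque -> weke
# Kuxan -> kutran
```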

The Mapuche alphabet we encode has twenty-five graphemes, five of which are digraphs. There are six vowels, three semivowels and sixteen consonants:

a, ch, d, e, f, g, i, k, l, ll, m, n, ng, ñ, o, p, r, s, sh, t, tr, u, ü, w, y

There are additional graphemes that we accept for analysis, which belong to other graphemic proposals for Mapudüngun or to Spanish:

b, c, h, j, l’, n’, q, t’, tx, x, v, z
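Once a text is unified, a word can be split into the graphemes of this alphabet by preferring digraphs. The sketch below is hypothetical Python. It assumes that g, w and y are the three semivowels. Under any partition into six vowels and three semivowels, the remaining graphemes number 25 − 6 − 3 = 16. Greedy digraph matching is a simplification: n followed by g in different morphemes would always be read as the digraph ng.

```python
# The 25-grapheme alphabet listed above.
ALPHABET = ["a", "ch", "d", "e", "f", "g", "i", "k", "l", "ll", "m", "n",
            "ng", "ñ", "o", "p", "r", "s", "sh", "t", "tr", "u", "ü", "w", "y"]
DIGRAPHS = {g for g in ALPHABET if len(g) == 2}
VOWELS = {"a", "e", "i", "o", "u", "ü"}
SEMIVOWELS = {"g", "w", "y"}  # assumption: which three graphemes are semivowels
CONSONANTS = set(ALPHABET) - VOWELS - SEMIVOWELS


def graphemes(word):
    """Split a unified, lowercase word into graphemes, preferring digraphs."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            out.append(word[i:i + 2])
            i += 2
        elif word[i] in ALPHABET:
            out.append(word[i])
            i += 1
        else:
            raise ValueError(f"{word[i]!r} is not in the alphabet")
    return out


print(len(ALPHABET), len(DIGRAPHS), len(VOWELS), len(SEMIVOWELS), len(CONSONANTS))
# 25 5 6 3 16
print(graphemes("mapudüngun"))
# ['m', 'a', 'p', 'u', 'd', 'ü', 'ng', 'u', 'n']
```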

6.2 Lexicon

We have augmented our lexicon mostly from Augusta’s dictionaries RefB:03 . Not only have new words been introduced, but also many variants of the words already collected from Smeets, bearing in mind that Mapudüngun spelling is not yet fixed. Most variants, however, arise from differences in the pronunciation of some sounds: for instance, final -n is often interchanged with final -ñ; ü in any position is commonly interchanged with u, i and sometimes e, or the other way around; tr alternates with ch, d with s, etc. See Table 9 for some examples (a dash in Smeets’ column marks a term not found in her work):

PoS   Smeets            Variant                   Meaning
-AJ   arken             arkeñ                     evaporated
-AJ   kolü, kollü       koli, kolli               brown, reddish brown, beige
-AJ   -                 liuke                     clean, clear, pure (water)
-AV   kisu, kishu       kidu                      alone, self, own
-AV   -                 kashill                   near
-AV   kütu              küto                      even, also
-IV   -                 yawa                      make noise
-IV   witra             wütra                     get up
-IV   chekod, llikosh   llikod                    to squat, to sit down on one’s heels
-NN   achawall          achaw, achawüll           chicken
-NN   chafo             chafa                     cough, catarrh, cold
-NN   -                 dagllu, dawllu            river shrimp
-TV   ingka             inka                      to defend
-TV   -                 kedin, kediñ              to shear
-TV   ütrüf             itrüf                     to throw
Table 9: Spelling variants in lexicon
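The combinatorics of these alternations help explain why the lexicon lists attested variants rather than generating them. This hypothetical Python sketch models only the ü alternation mentioned above (ü with u, i or e). It applies that alternation freely at every position, which is an assumption. Even so, ütrüf alone yields sixteen candidate spellings, among them the attested itrüf. Generating such candidates wholesale would enlarge the lexicon and the room for spurious matches.

```python
from itertools import product

# Only the ü alternation mentioned in the text is modelled (assumption:
# it applies freely at every position).
ALTERNATIVES = {"ü": ["ü", "u", "i", "e"]}


def variants(word):
    """Return every spelling obtained by independently replacing each ü."""
    options = [ALTERNATIVES.get(ch, [ch]) for ch in word]
    return {"".join(choice) for choice in product(*options)}


candidates = variants("ütrüf")
print(len(candidates))         # 16
print("itrüf" in candidates)   # True
```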

6.3 Williche verb forms

"In Williche [-NN.willi_sur-NN.che_persona ’Southern people’], the southernmost dialect of Mapudüngun, transitive verbs expressing the 1 → 2 relationship (with a total number of participants greater than two) is indicated by the combination of -e- and a second person subject marker in slot 3 (Mösbach 1962: 80, and Augusta 1903: 84–86 (cited by Salas 1979a: 307)), e.g. pe-e-y-m-i ’I saw you (sg)’, pe-e-y-m-u ’I saw you (dl)’; pe-e-y-m-ün ’I saw you (pl)’" [Smeets, I. 2008: 160] RefB:21 . In central Mapudüngun, these same relations are expressed by a single form when there are more than two participants (+1A.w23+IND.y4+1.Ø3+PL.iñ2), which may be disambiguated by means of personal or possessive pronouns. However, Smeets also gives pe-e-y-m-i ’I saw you (sg)’ as an example, which actually involves only two participants, so central Mapudüngun and Williche also differ in this form. Note that the Williche forms end in the null morpheme of dative subject +DS12A, demanded by the +IDO marker -e-; compare the following examples:

Example 231

central Mapudüngun 1s → 2s [Smeets, I. 2008: 157 (20)] RefB:21
pe-e-y-u ’I see you (sg)’
-TV.pe_ver+IDO.e6+IND.y4+1.Ø3+DL.u2+DS12A.Ø1

Example 232

Williche 1s → 2s [Smeets, I. 2008: 160] RefB:21
pe-e-y-m-i ’I see you (sg)’
-TV.pe_ver+IDO.e6+IND.y4+2.m3+SG.i2+DS12A.Ø1

Example 233

central Mapudüngun 1 → 2 (more than two participants) [Smeets, I. 2008: 572 ()] RefB:21
kellu-w-y-iñ ’I helped you (d/p), we (d/p) helped you (s/d/p)’
-NN.kellu_ayuda+VRB.Ø36
+1A.w23+IND.y4+1.Ø3+PL.iñ2

Example 234

Williche 1s → 2d [Smeets, I. 2008: 160] RefB:21
pe-e-y-m-u ’I see you (dl)’
-TV.pe_ver+IDO.e6+IND.y4+2.m3+DL.u2+DS12A.Ø1

Example 235

Williche 1s → 2p [Smeets, I. 2008: 160] RefB:21
pe-e-y-m-ün ’I see you (pl)’
-TV.pe_ver+IDO.e6+IND.y4+2.m3+PL.ün2+DS12A.Ø1

Rules generated for central Mapudüngun correctly analyse the Williche form pe-e-y-m-u ’I saw you (dl)’. For the other two forms we had to change two rules in the system; compare:

Rule 47

Central Mapudüngun imperative/plural form
define RuPr06 $[["+IMP1SG"|"+PL"] ?*
[["+DS3A"{.ew1}]|"+DS12A"]];

Rule 48

Williche imperative/plural form
define RuPr06 $[["+IMP1SG"|"+PL"] ?*
["+DS3A"{.ew1}]] .o. $["+IMP1SG" ?* "+DS12A"];

In central Mapudüngun, the plural +PL -ün- cannot be followed by the dative subject for 1st or 2nd person agent -Ø-. In the Williche dialect, on the contrary, this sequence is necessary, and it is correctly analysed, as shown in E235.

Rule 49

Central Mapudüngun dative subject occurrence
define RuPr12 $[["+EDO"|"+PVN"|"+TVN"
|["+SG"{.i2}]] ?* ["+DS3A"|"+DS12A"]];

Rule 50

Williche dative subject occurrence
define RuPr12 $[["+EDO"|"+PVN"|"+TVN"
|["+SG"{.i2}]] ?* "+DS3A"] .o.
$[["+EDO"|"+PVN"|"+TVN"] ?* "+DS12A"];

As in the previous case, in central Mapudüngun the singular +SG -i- cannot be followed by the dative subject for 1st or 2nd person agent -Ø-. This sequence is again necessary in the Williche dialect, and it is correctly analysed, as shown in E232.
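The difference between the central and the Williche versions of R47 to R50 can be checked on the tag sequences of E231, E232 and E235. In this hypothetical Python sketch, each $[... ?* ...] pattern becomes a "left tag followed later by right tag" test. Each pair of patterns joined by .o. is read as two separate prohibitions, as the prose above describes them. A pattern written with a form, such as +SG.i2 or +DS3A.ew1, must match that form exactly.

```python
def matches(tag, patterns):
    """A pattern with a form ('+SG.i2') must match exactly; otherwise compare names."""
    name = tag.split(".", 1)[0]
    return any(tag == p if "." in p else name == p for p in patterns)


def has_sequence(tags, left, right):
    """True if a tag matching `left` is followed, at any distance, by one matching `right`."""
    seen = False
    for tag in tags:
        if seen and matches(tag, right):
            return True
        if matches(tag, left):
            seen = True
    return False


def central_forbids(tags):
    # R47 and R49
    return (has_sequence(tags, {"+IMP1SG", "+PL"}, {"+DS3A.ew1", "+DS12A"})
            or has_sequence(tags, {"+EDO", "+PVN", "+TVN", "+SG.i2"}, {"+DS3A", "+DS12A"}))


def williche_forbids(tags):
    # R48 and R50: +PL and +SG.i2 may now be followed by +DS12A
    return (has_sequence(tags, {"+IMP1SG", "+PL"}, {"+DS3A.ew1"})
            or has_sequence(tags, {"+IMP1SG"}, {"+DS12A"})
            or has_sequence(tags, {"+EDO", "+PVN", "+TVN", "+SG.i2"}, {"+DS3A"})
            or has_sequence(tags, {"+EDO", "+PVN", "+TVN"}, {"+DS12A"}))


e231 = ["-TV.pe_ver", "+IDO.e6", "+IND.y4", "+1.Ø3", "+DL.u2", "+DS12A.Ø1"]
e232 = ["-TV.pe_ver", "+IDO.e6", "+IND.y4", "+2.m3", "+SG.i2", "+DS12A.Ø1"]
e235 = ["-TV.pe_ver", "+IDO.e6", "+IND.y4", "+2.m3", "+PL.ün2", "+DS12A.Ø1"]
for label, tags in (("E231", e231), ("E232", e232), ("E235", e235)):
    print(label, central_forbids(tags), williche_forbids(tags))
# E231 False False
# E232 True False
# E235 True False
```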

6.4 Following Zúñiga

Even though Zúñiga seems to base "Mapudüngun. El habla mapuche" [Zúñiga, F. 2006] RefB:24 on central Mapudüngun, the texts included in his work present some variations with respect to Smeets. We have included these divergences in the analyser.

6.4.1 Different indicative form

"The mark for indicative mood is -i-. It appears as a vowel if the root ends in a consonant, as a semivowel -y- if the root ends in a vowel other than i-, and it does not appear if the root ends in i-" [Zúñiga, F. 2006: 105] RefB:24 (our translation; the original is in Spanish). Rather than "the root ending", it should say "the preceding sound", because that sound may belong to the root, but also to a previous suffix. We have found that it is realised as -i- after a semivowel too (E237), and that it is a null suffix when either preceded (E240) or followed (E239) by i. In Smeets’ work, the indicative is either -y- after a vowel or semivowel, or -üy- after a consonant, with an -iy- variant for the latter.

To handle the variant presented by Zúñiga, we have added an @IZ tag to the existing encoding of the +IND suffix (D30), and a new rule (R51) that defines the contexts in which the intermediate representation @IZ is converted:

Definition 30

["+IND"{.y4}]: [[["@ÜI"|"@Ü"]y]|"@IZ"]

Rule 51

Zúñiga’s indicative form
define RuIndZu ["@IZ" -> 0 || i _ , _ i]
.o. ["@IZ" -> i];

D30 encodes the indicative mood suffix; we have introduced the @IZ tag to handle Zúñiga’s variant. Preceding this new tag is the suffix as Smeets presents it (E238): @Ü is for "CON-üy-" and "VOW|SVW-y-"; @ÜI is for "CON-iy-".

R51 defines, by context, the form @IZ should take: it is always transformed into i, except when it occurs either before (E239) or after (E240) i, in which case it is transformed into 0 and acts as a null suffix. Otherwise, the tag remains until the end of the process, when it is cleared out (E236, E237 and E238).

Example 236

[Zúñiga, F. 2006: 105 (Cuadro III-3a / 2a)] RefB:24
kon-i-m-i ’you enter’
-IV.kon_entrar+IND.y4+2.m3+SG.i2

Example 237

[Zúñiga, F. 2006: 283 (pewma)] RefB:24
chum-yaw-i-m-i ’what are you doing around?’
-QC.chum_cómo+VRB.Ø36+CIRC.iaw30
+IND.y4+2.m3+SG.i2

Example 238

[Zúñiga, F. 2006: 105 (Cuadro III-3b / 2a)] RefB:24
tripa-y-m-i ’you leave’
-IV.tripa_salir+IND.y4+2.m3+SG.i2

Example 239

[Zúñiga, F. 2006: 227 (84)] RefB:24
amu-a-iñ ’let us go’
-IV.amu_ir+NRLD.a9+IND.Ø4+1.Ø3+PL.iñ2

Example 240

[Zúñiga, F. 2006: 105 (Cuadro III-3c / 2a)] RefB:24
pi-m-i ’you say’
-TV.pi_decir+IND.Ø4+2.m3+SG.i2
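R51 can be mimicked with two ordered string replacements that mirror the two rules joined by .o.: first delete @IZ next to i, then turn any remaining @IZ into i. The Python sketch below is hypothetical. It works on simplified intermediate strings from which all other tags have already been removed. That is an assumption; in the real FST other symbols may be present.

```python
import re


def realise_iz(form):
    """Hypothetical model of RuIndZu (R51) on a simplified intermediate string."""
    # First rule of the composition: @IZ -> 0 || i _ , _ i
    form = re.sub(r"(?<=i)@IZ|@IZ(?=i)", "", form)
    # Second rule: every remaining @IZ -> i
    return form.replace("@IZ", "i")


# Expected: konimi (E236), chumyawimi (E237), amuaiñ (E239), pimi (E240)
for intermediate in ("kon@IZmi", "chumyaw@IZmi", "amua@IZiñ", "pi@IZmi"):
    print(realise_iz(intermediate))
```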

6.4.2 Glottal stop before +IDO

When the internal direct object -e- is preceded by a-, a glottal stop may optionally be inserted between them, spelled -g-. Since there is already a rule that treats glottal stop epenthesis in compounds whose second root starts with a vowel, we just added the appropriate tag @G to the +IDO suffix, D31 (see E116, E117, E118, R1 and R2):

Definition 31

Encoding of the +IDO suffix
["+IDO"{.e6}]: ["@G""@ID"]

Example 241

[Zúñiga, F. 2006: 274 (56)] RefB:24
kulli-a-g-e-y-u ’I will pay you both’
-TV.kulli_pagar+NRLD.a9+IDO.e6
+IND.y4+1.Ø3+DL.u2+DS12A.Ø1

Example 242

[Zúñiga, F. 2006: 279 (109)] RefB:24
elu-tua-g-e-n ’give it back to me’
-TV.elu_dar+RE.tu16+NRLD.a9+IDO.e6
+IND1SG.n3+DS12A.Ø1

Example 243

[Zúñiga, F. 2006: 130 (note 10)] RefB:24
elu-la-g-e-n ’you did not give me’
-TV.elu_dar+NEG.la10+IDO.e6+IND1SG.n3+DS12A.Ø1
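Because the epenthesis encoded through @G is optional, both spellings must be accepted after a. The hypothetical Python sketch below only illustrates that optionality on plain strings, and its function name is invented. In the analyser, the @G tag is resolved by the existing epenthesis rules R1 and R2.

```python
def ido_variants(stem):
    """Hypothetical sketch: accepted spellings of stem + internal direct object -e-."""
    plain = stem + "e"
    if stem.endswith("a"):
        # After a, the glottal stop (spelled g) is optional: accept both.
        return [plain, stem + "ge"]
    return [plain]


print(ido_variants("kullia"))  # ['kulliae', 'kulliage']
print(ido_variants("elula"))   # ['elulae', 'elulage']
print(ido_variants("pe"))      # ['pee']
```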

6.4.3 Nominal compounds

Throughout this article we have only explained the Mapuche verb. We have not introduced nominal forms as such, only as one of the possible verbal stems. Nominal forms are much simpler than verbal ones. To avoid making this article too long with encoding details already explained for the verb form, it suffices to say that there is nominal compounding in Mapudüngun.

Zúñiga defines püle ’by, towards’ as a post-position [Zúñiga, F. 2006: 195] RefB:24 . Smeets also defines it as a post-position, ’side’, but additionally as a noun [Smeets, I. 2008: 69 (10.4)] RefB:21 , which is how we have incorporated püle ’side, direction’ into our lexicon.

Even though Zúñiga classifies püle as a post-position, on many occasions this word forms compounds in his texts, and these are not recognised by the analyser’s Smeets-derived rules.

Rule 52

Nominal compound (püle)
define formCXNN
[[%<[DEMPR|IVROOT|INTPR]%# %<NROOT%#]"-CNN"];
define preCXNN
[_eq(formCXNN, %< , %#)];
define RuleCXNN1
["-NN" => ["@TYA"|"@FA"|"@T"] ?* _ ];
define CXNN
[CLEANu .o. RuleCXNN1 .o.
formCXNN - RuleCXNN1 .o. CLEANNVFd];

The compounding mechanism was already explained in Compounds encoding (section 5.2.1), so here we only add that R52 defines the form of these nominal compounds as having a first member that may be a demonstrative pronoun, an intransitive verb or an interrogative pronoun. RuleCXNN1 specifies which forms within these categories are actually accepted to form the compound with the noun as second member. The specific forms have been tagged so that they can be filtered: @TYA, @FA (the verb root tagged @FA is also identified as a deictic verb by the same tag; see Deictic roots, section 5.3.6) and @T, respectively; in fact, only one member of each category is tagged. The examples show compounds that are not recognised following Smeets, namely compounds that take püle as their second member (in Smeets, püle may be the second noun in a nominal compound):

Example 244

[Zúñiga, F. 2006: 275 (65)] RefB:24
fa-püle ’around here’
-IV.fa_ser-esto-NN.püle_lugar

Example 245

[Zúñiga, F. 2006: 191 (Cuadro III-17)] RefB:24
kañ-püle ’somewhere else’ (the ñ is epenthetic)
-AJ.ka_otro-NN.püle_lugar
(Note: there is a colloquial expression in Chile, ’salta pal lao’, which means something like ’I don’t believe you’, ’are you kidding’ or ’you better not…’, depending on the situation. Payllafilu says that this expression is equivalent to kañpüle in Mapudüngun.)

Example 246

[Zúñiga, F. 2006: 273 (49)] RefB:24
tie-püle ’over there’
-DP.tüye_aquel-de-allá-NN.püle_lugar

Example 247

[Zúñiga, F. 2006: 182 (95.d)] RefB:24
tuchi-püle ’wherever’
-IP.chuchi_qué_cuál-NN.püle_lugar

Example 248

[Zúñiga, F. 2006: 275 (67)] RefB:24
üye-püle ’over there’
-AV.üye_allí_allá-NN.püle_lugar

Augusta’s nominal compound.

Another nominal compound integrated into the analyser was found in Augusta, F. RefB:03 . It has a numeral as first member and a noun as second member:

Example 249

[Augusta, F. (epuange)] RefB:03
epu-ange ’two faces’
-NU.epu_dos-NN.ange_cara
(Note: Two faces, (be of) two faces: a designation of a wekufü which owns the sea or lake and is also called Millalongko ’golden head’ or Kawekufü ’water daemon’. Kutranelenew Epuange ’Epuange has made me sick’. An epithet that in some places the Mapuche give as a first name to a god, e.g. Epuange ngünechen ’two faces father regulator’, either because they represent two sexes, or because with this expression they allude to the benign and serene heaven and to the unfavourable heaven, or to the severity and benignity that the supreme being can show towards men. In addition, the idea is applied to both God and Mayorwekufu ’major daemon’. V. Augusta (1910, p. 227) [Augusta, F. epuange] RefB:03)

Example 250

[Zúñiga, F. 2006: 319 (kiñepüle)] RefB:24
kiñe-püle ’by/towards this side/place’
-NU.kiñe_uno-NN.püle_lugar

6.4.4 Instrumental and ad-position mew

Smeets classifies -mew as an instrumental suffix that follows nouns, deverbal nouns and pronouns (see Smeets, I. 2008: 61 - 67, "10.1 The instrumental -mew -mu" RefB:21 ).

Zúñiga defines mew as an ad-position that is realised separately from the noun or deverbal noun it follows, but attached to pronouns or adverbs (see Zúñiga, F. 2006: 194 - 197. "4.1 Las adposiciones y los sustantivos relacionales" RefB:24 ).

In order to recognise mew as an independent form, but still as the instrumental suffix, it was declared as such in the non-verbal section of the script:

Example 251

[Zúñiga, F. 2006: 195 (105.a)] RefB:24
müle-ka-n ruka mew ’I am still at home’
-IV.müle_estar+CONT.ka16+IND1SG.n3
-NN.ruka_casa +INST.mew

Example 252

[Zúñiga, F. 2006: 195 (105.b)] RefB:24
amu-tu-n waria mew ’I went back to the city’
-IV.amu_ir+RE.tu16+IND1SG.n3
-NN.wariya_ciudad +INST.mew

Example 253

[Zúñiga, F. 2006: 195 (105.c)] RefB:24
waria mew küpa-n ’I came from the city’
-NN.wariya_ciudad +INST.mew
-IV.küpa_venir+IND1SG.n3

Example 254

[Zúñiga, F. 2006: 201 (2.b)] RefB:24
fey-mew kintu-ke-y-ng-ün meli mamüll ’then they looked for four trees’
-AV.fey_entonces+INST.mew
-TV.kintu_buscar+CF.ke14+IND.y4+3.ng3+PL.ün2
-NU.meli_cuatro -NN.mamüll_árbol

6.5 Wüño as auxiliary

According to Smeets, in Mapudüngun there are five auxiliary verbs. These are elements separated from the main verb. They are verbal stems without inflection, which immediately precede the main verb, without any other element in between [Smeets, I. 2008: 175 (25.4)] RefB:21 . These are:

  • kalli ’enabling’

  • kim ’knowing how to’

  • küpa ’wishing’

  • pepi ’being able’

  • shinge ’moving up/along’.

Lonkon calls these elements "modal prefixes". She identifies four of them, kalli, kim-, küpa- and pepi-, giving them the same values as Smeets. However, Lonkon considers them prefixes, and therefore attached to the verb they are moulding, except for the permissive/enabling kalli [Lonkon, E. 2011: 249] RefB:12 .

Zúñiga identifies two of these modal elements: kim- and pepi-. He says that they form part of "complex verb stems", which means that both verb roots, the modal and the moulded one, form a compound. However, he says, they can also be expressed as separated elements, then he calls them "pre-verbal particles" [Zúñiga, F. 2006: 136] RefB:24 .

Zúñiga also gives a list of verb roots that form complex verb stems and, in fact, devotes a paragraph to explaining that such stems may be formed by radical concatenation and by nominal incorporation. The list includes the verb root wüño- ’re-’, ’return, come back’, as well as kalli- and küpa-, but not shinge- [Zúñiga, F. 2006: 179] RefB:24 .

Zúñiga explains that these forms are frequently found as pre-verbal particles, i.e., separated from the main verb, which is reflected in spelling. He treats them as radical concatenation, though.

Salas calls them "modals", and he identifies kim- ’know’, küpa- ’wish’ and pepi- ’be able’. He says that they function as prefixes of single and complex stems [Salas, A. 2006: 192] RefB:19 .

Augusta defines wüño- as a suffix equivalent to the prefix morpheme ’re-’, which mainly expresses the idea of redoing the action indicated by the following verb. We have realised that wüño also expresses the idea of retrospective and/or backwards action [Augusta, F. wüño] RefB:03 .

Among the examples given by Augusta, there are some that show wüño separated from the main verb, and others, forming a compound with the moulded verb.

Many native speakers of Mapudüngun perceive wüño as separate from the verb it is moulding, and they write it as Zúñiga shows in his examples.

In Mapudüngun an affix is never realised in isolation; it must be attached to a verb or another root (adjective, noun, etc.). Nor can it work as a root to which suffixes are attached. wüño, on the other hand, also works as a verb root that may be inflected by attaching suffixes to it. Therefore, wüño meets the definition of auxiliary given by Smeets.

wüño (or wiño) either forms a compound, gives rise to an inflected verb, or is a separate element. If some native speakers perceive it as a separate element, this may indicate that it is a modal element; thus, following Smeets, this verb would fulfil the "auxiliary" function as she defines it, or the "pre-verbal particle" function, as Zúñiga calls it.

In any case, not all native speakers of Mapudüngun write wüño (or wiño) separately from the main verb. Many of them use it in the verb as a compound, i.e., as a "modal prefix", as Lonkon calls it.

We have added wüño as auxiliary into the lexicon list through the following entry:

Definition 32

["-XV"{.wüño_re-_volver-a}]: [{wüño}|{wiño}]

Example 255

[Zúñiga, F. 2006: 148 (63)] RefB:24
wüño-witra-me-tu-a-fiel ’to go there to recover them’
-IV.wüño_volver-IV.wütra_levantar-CR.IV
+TH.me20+RE.tu16+NRLD.a9+TVN.fiel4

Example 256

[Augusta, F. contestar] RefB:03
wüño fey-pin ’to answer’
-XV.wüño_re-_volver-a -TV.feypi_decir+PVN.n4

Example 257

[Lonkon, E. 2017] RefB:13
wiño wütra-m-püra-m-nge-tu-a-fu-y ’it revitalised’
-XV.wüño_re-_volver-a
-IV.wütra_levantar+CA.m34-IV.püra_subir-CR.IV
+CA.m34+PASS.nge23+RE.tu16+NRLD.a9
+IPD.fu8+IND.y4+3.Ø3

6.6 Proposing -ñma as adverbializer

Smeets lists -ñma as an unproductive suffix [Smeets, I. 2008: 116] RefB:21 , giving a series of examples with this suffix:

  1. fücha-ñma ’very long’ (fücha ’long’)

  2. we-ñma ’very new’ (we ’new’)

  3. wesha-ñma ’very bad’ (wesha ’bad’)

  4. rume-ñma ’extremely’ (rume ’very’)

  5. welu-ñma ’wrong, reversely’ (welu ’but, wrong, reversely’)

  6. alü-ñma ’for a long time’ (alü ’much’), cf. alü-ñma-mew ’much later, a long time after that’

  7. fentre-ñma (fentre-yma) ’for a long time’ (fentre ’much’)

  8. epu-ñma ’with the two of us’ (epu ’two’)

  9. ka-ruka-ñma ’neighbour’ (ka-ruka ’neighbour’)

Except in examples 5, 8 and 9, the part of the meaning contributed by -ñma, ’very’, is quite evident. In other texts we have found:

  1. weda-ñma ’evil, too bad’ (weda ’bad’) [Zúñiga, F. 2006: 270 (15)] RefB:24

  2. pichi-ñma ’just now, recently’ (pichi ’little’) [Zúñiga, F. 2006: 271 (30)] RefB:24

  3. fücha-ñma ’very big’ (fücha ’big, old’) [Augusta, F. (füchañma)] RefB:03

  4. llekü-ñma ’very close’ (llekü ’close, near’) [Augusta, F. (llekü-ñma)] RefB:03

We have not studied this suffix exhaustively, which is why this is only a proposal. It seems quite clear that -ñma contributes the ’very’ part of the meaning, but we have observed that, with this sense, it applies only to adjectives and certain adverbs, and only in nominal compounding. Smeets mentions just one case in which it seems to be part of a verbal stem, but there are no examples of it; so in the forms of E258 it could be either the indirect object marker (slot 26) or the experience suffix (slot 35), both of which have the form -ñma:

Example 258

[Smeets, I. 2008: 574 (wesha)] RefB:21
wesha-ñma-nge- (Vi) to be a bad person;
wesha-ñma-w- (Vi) to break down, to fall apart, to become a bad person;
wesha-ñma-w-küle- (Vi) to be broken/in pieces, to feel awful;

Definition 33

["+ADV"{.ñma}]: ["@ÜÑ"{ma}];

Rule 53

Prevent the adverbializer from appearing as a verbal suffix
define RuPr51 $[$["+ADV"{.ñma}]];

Rule 54

Apply the adverbializer only to adjectives and adverbs
define NvsOpRu01 [["+ADV"{.ñma}] =>
["-AJ"|"-AV"] ?* _ ];

D33 encodes -ñma as an adverbializer. R53 prevents it from appearing in the sequence of verbal suffixes. R54 restricts it to co-occur only with adjectives and adverbs. The remaining forms that take this suffix (nouns, numerals) are listed as lexicalised forms in the lexicon:
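R54 is a context restriction: every occurrence of the adverbializer must have an -AJ or -AV tag somewhere to its left. The following Python sketch checks that condition on an analysis string. It is a string-level analogue of the FOMA rule, not the rule itself, and the two test analyses in the usage note are hypothetical.

```python
import re

ADVERBIALIZER = "+ADV.ñma"
# Root tags accepted by R54 as a licensing left context.
LICENSING_TAGS = ("-AJ.", "-AV.")


def satisfies_r54(analysis: str) -> bool:
    """True if every +ADV.ñma in the analysis is preceded, somewhere to its
    left, by an -AJ or -AV tag (cf. ["+ADV"{.ñma}] => ["-AJ"|"-AV"] ?* _).
    Analyses without the adverbializer satisfy the rule vacuously."""
    for match in re.finditer(re.escape(ADVERBIALIZER), analysis):
        left_context = analysis[: match.start()]
        if not any(tag in left_context for tag in LICENSING_TAGS):
            return False
    return True
```

For instance, a hypothetical analysis "-AJ.wesha_malo+ADV.ñma" passes, while "-NN.antü_sol+ADV.ñma" fails, which is why noun-based forms such as epuñma have to be listed in the lexicon instead.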

Definition 34

["-AV"{.epuñma_con-nosotros-dos}]:["@G"{epuñma}]

7 Analyser dimensions

This section presents the number of elements of each type in the system: lexicon entries, suffixes, rules, states, etc.

A flow chart showing how all the elements of the analyser are interconnected throughout the process is given in annex 11.6 (figures 9, 10 and 11).

  • Roots (verbalisable lexicon): 2,096

    • Adjectives: 128

    • Adverbs: 24

    • Intransitive verbs: 257

    • Proper nouns: 68

    • Nouns: 1,325

    • Numerals: 14

    • Onomatopoeia: 12

    • Questions: 4

    • Transitive verbs: 264

  • Non-verbalisable lexicon: 266

    • Adverbs: 88

    • Anaphoric pronouns: 5

    • Auxiliaries: 8

    • Conjunctions: 9

    • Demonstrative pronouns: 6

    • Foreign expressions: 7

    • Interrogative pronouns: 8

    • Interjections: 27

    • Negations: 1

    • Numbers: 10 (encoded by the regex ["-NBR"]: [[%0|1|2|3|4|5|6|7|8|9]+]; which declares the digits; the + operator accepts any sequence of one or more digits, all of which are tagged "-NBR")

    • Particles: 19

    • Personal pronouns: 9

    • Possessive pronouns: 6

    • Prepositions: 5

    • Punctuation marks: 58

  • Suffixes: 116

    • Verb suffixes: 101

      • Inflectionals: 56

      • Mobile derivationals: 20

      • Fixed derivationals: 24 (not all derivationals in this category are fixed, but most are; likewise, not all suffixes in the previous category are mobile, but many are)

      • Non-slot assigned: 1

    • Non-verb suffixes: 15 (these suffixes may be added to nominalised verbs, but never to finite verb forms, i.e., verbs marked for mood, person and number)

      • Class-changing: 3

      • Instrumental: 1

      • Non class-changing: 6

      • Nominalisers: 5

  • Rules: 472

    • Regexes (file location): 76 (these files contain the regular expressions encoding the lexicon and suffixes, kept separate from the main script)

    • Character definitions: 3 (the lists of consonants, vowels and semivowels)

    • Phonological: 44

    • Morphological: 345

    • Cleaning: 4 (these rules remove the marker symbols used while processing morphological and phonological changes)

  • Compilation values

    • Size: 200.3 MB

    • States: 2,858,426

    • Arcs: 13,128,696

    • FST type: cyclic
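The subcategory counts above add up to the totals of their parent categories (counting Numbers as 10). The Python check below makes that arithmetic explicit; treating the four non-verb suffix subcategories as siblings is our reading of the original flattened list.

```python
# Counts copied from the list of analyser dimensions.
VERBALISABLE = {"Adjectives": 128, "Adverbs": 24, "Intransitive verbs": 257,
                "Proper nouns": 68, "Nouns": 1325, "Numerals": 14,
                "Onomatopoeia": 12, "Questions": 4, "Transitive verbs": 264}
NON_VERBALISABLE = {"Adverbs": 88, "Anaphoric pronouns": 5, "Auxiliaries": 8,
                    "Conjunctions": 9, "Demonstrative pronouns": 6,
                    "Foreign expressions": 7, "Interrogative pronouns": 8,
                    "Interjections": 27, "Negations": 1, "Numbers": 10,
                    "Particles": 19, "Personal pronouns": 9,
                    "Possessive pronouns": 6, "Prepositions": 5,
                    "Punctuation marks": 58}
VERB_SUFFIXES = {"Inflectionals": 56, "Mobile derivationals": 20,
                 "Fixed derivationals": 24, "Non-slot assigned": 1}
NON_VERB_SUFFIXES = {"Class-changing": 3, "Instrumental": 1,
                     "Non class-changing": 6, "Nominalisers": 5}
RULES = {"Regexes": 76, "Character definitions": 3, "Phonological": 44,
         "Morphological": 345, "Cleaning": 4}


def total(group):
    """Sum the counts of one category's subcategories."""
    return sum(group.values())


TOTALS = {
    "Roots (verbalisable lexicon)": total(VERBALISABLE),
    "Non-verbalisable lexicon": total(NON_VERBALISABLE),
    "Verb suffixes": total(VERB_SUFFIXES),
    "Non-verb suffixes": total(NON_VERB_SUFFIXES),
    "Suffixes": total(VERB_SUFFIXES) + total(NON_VERB_SUFFIXES),
    "Rules": total(RULES),
}
```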

8 Evaluating the analyser

8.1 Corpora in use

8.1.1 Gold standard

Our "Gold standard" corpus is made up of sentences taken from "A grammar of Mapuche" [Smeets, I. 2008] RefB:21. The words in it were analysed and disambiguated by Smeets.

The Gold standard corpus includes all the sentences from chapters 10 to 18 and 21 [Smeets, I. 2008: 61-116, 121-128] RefB:21. These chapters deal with nouns, adjectives, adverbs, numerals, demonstratives and anaphoric pronouns, personal pronouns, possessive pronouns, interrogative pronouns, suffixation and verbalisation. The corpus also contains all seventeen texts of "Part VIII - Texts" [Smeets, I. 2008: 369-487] RefB:21, which is by far the largest source of Mapuche text in the Gold standard. The text titles are:

  • Text 1. Demons

  • Text 2. Work

  • Text 3. Youth

  • Text 4. Missionary

  • Text 5. The war

  • Text 6. An old man

  • Text 7. Olden times

  • Text 8. Conversation about demons

  • Text 9. Conversation about youth

  • Text 10. Conversation about work on big farms

  • Text 11. Conversation about land disappropriation

  • Text 12. Our reservation

  • Text 13. My father

  • Text 14. Brick

  • Text 15. Song 1

  • Text 16. Song 2

  • Text 17. Song 3

8.1.2 Control corpus

Out of the Gold standard we have extracted a control corpus of 240 sentences containing a total of 1,671 words, which correspond to 650 forms.

Example 259

ñi trewa, ñi ñarki ka ñi kawell ’my dog, my cat and my horse’

Example E259 contains seven words but only five forms: ka, kawell, ñarki, ñi and trewa. The three occurrences of ñi count as one form.
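The word/form distinction is the usual token/type count. A minimal Python sketch, assuming whitespace tokenisation with surrounding punctuation stripped (and no case folding), reproduces the figures for E259:

```python
import string


def count_words_and_forms(sentence: str) -> tuple[int, int]:
    """Return (number of word tokens, number of distinct forms)."""
    words = [w.strip(string.punctuation) for w in sentence.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    return len(words), len(set(words))


print(count_words_and_forms("ñi trewa, ñi ñarki ka ñi kawell"))  # (7, 5)
```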

The control corpus is used to check results obtained from the analyser. A correct analysis for every word of this corpus must appear in the output.

8.1.3 Comparison corpus

This corpus is drawn from Zúñiga's texts [Zúñiga, F. 2006: 270 (15)] RefB:24. It consists of 170 sentences containing 1,256 words, which correspond to 511 forms (see section 8.1.2 and example E259). The texts are taken from "Mapudüngun. El habla Mapuche", chapter V. Textos en mapudüngun [Zúñiga, F. 2006: 266 - 288] RefB:24. The text titles are:

  1. Feychi ngürü afngünengelu ’That crafty fox’

  2. Mawün ’Rain’

  3. Pewma ’A dream’

  4. Ngillañ mawün ’Asking for rain’

  5. We tripantu ’New year’

  6. Pausa_Historia ’Pause_History’

  7. Abuela_Voz ’Granny_Voice’

8.2 Establishing the ambiguity parameter

Ambiguity.

In any language some words are ambiguous. Such word forms, known as homographs, correspond to different meanings, which are disambiguated by context. For example, the English word drop means ’a small amount of liquid’ in "a drop of coffee stained my letter", but ’to fall’ or ’to let fall’ in "do not drop papers on the floor". The only way to know which meaning drop has, i.e., the only way to "disambiguate" it, is to look at its context.

In Mapudüngun there are quite a few ambiguous words (homophones that become homographs once written) that appear very often in texts, such as ka, fey and most words ending in n, which is the form shared by the "1st person singular" suffix and the "plain verbal noun" suffix. Some examples are given below (see 5.3.5 Over-generation):

  • ka → -AJ.ka_otro

  • ka → -CJ.ka_y

  • ka → -PT.ga_ciertamente_indignación_cinismo

  • fey → -AV.fey_entonces

  • fey → -DP.fey_que_aquel_ese

  • fey → -IV.fe_ser-eso+IND.y4+3.Ø3

  • fey → -PP.fey_él_ella_ellos

  • küdawün → -IV.küdaw_trabajar+IND1SG.n3

  • küdawün → -IV.küdaw_trabajar+PVN.n4

Mapudüngun ambiguity calculus.

Adding up all the analyses of the three words above (9) and dividing by the number of words considered (3) gives the "average Mapudüngun ambiguity" (ama) of these words: 3. We use this measure as a reference parameter when comparing analysis results.
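The ama is simply the total number of analyses divided by the number of word-forms. A short Python sketch of the calculation, applied to the three words listed above:

```python
def average_ambiguity(analyses: dict[str, list[str]]) -> float:
    """Total number of analyses divided by the number of word-forms."""
    total = sum(len(readings) for readings in analyses.values())
    return total / len(analyses)


EXAMPLE = {
    "ka": ["-AJ.ka_otro", "-CJ.ka_y",
           "-PT.ga_ciertamente_indignación_cinismo"],
    "fey": ["-AV.fey_entonces", "-DP.fey_que_aquel_ese",
            "-IV.fe_ser-eso+IND.y4+3.Ø3", "-PP.fey_él_ella_ellos"],
    "küdawün": ["-IV.küdaw_trabajar+IND1SG.n3",
                "-IV.küdaw_trabajar+PVN.n4"],
}

print(average_ambiguity(EXAMPLE))  # 3.0  (9 analyses / 3 word-forms)
```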

As explained earlier, the analyser is based on Smeets' description of Mapudüngun morphology [Smeets, I. 2008] RefB:21. We first developed an analyser that strictly follows Smeets' work and compiled it into an FST, which we call "SmeetsAnalyser". We then extended its lexicon and rules to cover other variants of Mapudüngun; we provisionally call this version Düngupeyem (düngu ’word, speech, language’; pe ’proximity suffix’, indicating physical and/or temporal proximity to the action expressed by the verb; ye ’constant feature suffix’; m ’instrument or location’. Altogether it means something like ’instrument always used to do things with language’; we intended to say ’language tool’. This is one reason why the name is provisional: ’tool’ in Mapudüngun is küdaw-ka-we, so another possibility would be to call our analyser Düngu-ka-we. Another reason is that no native Mapuche speaker has yet validated the name).

To calculate the average Mapudüngun ambiguity, and establish it as a reference parameter, we have analysed the control corpus with the "SmeetsAnalyser".

  • Control corpus analysed with "SmeetsAnalyser"

    • word-forms = 650

    • produced analyses = 2,232

    • unknown words = 2

This calculation gives 3.59 ama, meaning that each word-form has on average 3.59 possible analyses; this is our reference parameter.

Increased ambiguity calculus.

The additions we have made to the system (see section 6) increase the ambiguity of the analysis results, because they allow more ways of analysing each form. Enlarging the lexicon has the same effect.

We now calculate the "average increased ambiguity" (aia) in order to compare the results and verify whether the system is still reliable.

  • Control corpus analysed with Düngupeyem

    • word-forms = 650

    • produced analyses = 2,477

    • unknown words = 2

As expected, ambiguity rises, but only moderately (by less than 0.24 analyses per word-form), which indicates that the analyser remains quite reliable.

8.2.1 Comparison corpus results

To check whether the additions we have made to the FST (see section 6) are really worthwhile, we analyse the comparison corpus.

  • Comparison corpus analysed with "SmeetsAnalyser"

    • word-forms = 511

    • produced analyses = 1,368

    • unknown words = 120

  • Comparison corpus analysed with Düngupeyem

    • word-forms = 511

    • produced analyses = 1,828

    • unknown words = 10

At first glance, these results confirm the good performance of Düngupeyem: the difference in average ambiguity (aa) even decreases from 0.24 to 0.15. However, there is a factor we did not take into account in the calculation for the control corpus, because there it would have made no difference: there were very few unrecognised words (only 2), and the same number for both analysers.

Analysing the comparison corpus, we obtain 120 unrecognised words from "SmeetsAnalyser" and only 10 from Düngupeyem. To measure the impact of this factor on the ambiguity index, we used a "root guesser" to count the possible analyses that the unknown words generate. (The root guesser is a tool derived from the analyser. It has no lexicon of roots, only the possible root structures in terms of consonants, vowels and semivowels; e.g., CVC, CVSV and CVCVCVC are valid Mapuche root structures. It includes the lexicon of suffixes and their combination rules, and it first checks the possible root structures and then the possible suffix combinations. It is not described in this article for reasons of length.) We then add the resulting possible analyses to the known analyses and recalculate the ambiguity index.
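The root guesser itself is a FOMA transducer that is not described here. The Python sketch below only illustrates the two-step idea stated above: accept a prefix as a candidate root if its consonant/vowel/semivowel skeleton is permitted, then parse the remainder as known suffixes. The vowel and semivowel sets, the digraph list and the suffix strings are simplified assumptions; only the three skeletons named in the text are accepted, and no suffix-ordering rules are applied, unlike the real tool.

```python
VOWELS = set("aeiouü")
SEMIVOWELS = set("wy")                        # assumption: simplified inventory
DIGRAPHS = ("ch", "ll", "ng", "sh", "tr")     # assumption: each counts as one C
VALID_ROOT_PATTERNS = {"CVC", "CVSV", "CVCVCVC"}  # only the examples in the text


def cv_pattern(root: str) -> str:
    """Map a candidate root to its C/V/S skeleton."""
    out, i = [], 0
    while i < len(root):
        if root[i:i + 2] in DIGRAPHS:
            out.append("C")
            i += 2
            continue
        ch = root[i]
        out.append("V" if ch in VOWELS else "S" if ch in SEMIVOWELS else "C")
        i += 1
    return "".join(out)


def suffix_sequences(rest, suffixes):
    """All ways to spell `rest` as a sequence of (non-empty) suffix strings."""
    if not rest:
        return [[]]
    found = []
    for suf in suffixes:
        if rest.startswith(suf):
            for tail in suffix_sequences(rest[len(suf):], suffixes):
                found.append([suf] + tail)
    return found


def guess(word, suffixes):
    """Step 1: keep prefixes with a valid root skeleton.
    Step 2: parse the remainder as a sequence of known suffixes."""
    results = []
    for cut in range(1, len(word) + 1):
        root = word[:cut]
        if cv_pattern(root) in VALID_ROOT_PATTERNS:
            for seq in suffix_sequences(word[cut:], suffixes):
                results.append((root, seq))
    return results


print(guess("lefkey", ("ke", "y", "fu")))  # [('lef', ['ke', 'y'])]
```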

  • 120 unrecognised words from "SmeetsAnalyser"

    • words producing no possible analyses = 49

    • words producing possible analyses = 71

    • total number of possible analyses = 636

  • 10 unrecognised words from Düngupeyem

    • words producing no possible analyses = 3

    • words producing possible analyses = 7

    • total number of possible analyses = 56

These calculations confirm that the analyser is reliable, even more so after adding coverage for other dialects of Mapudüngun and a larger lexicon. The average number of analyses (aa) has risen, but too little to be considered a problem: only 0.06 between the calculation that ignores the possible analyses of unknown roots (3.64) and the one that includes them (3.70).

8.3 Comparing against other analysers

For this comparison we used the Gold standard corpus (see 8.1.1) to train two different analysers, and then analysed the comparison corpus (see 8.1.3) with these tools and with our analyser.

8.3.1 Trainable systems to compare against

A trainable system receives a disambiguated corpus to learn from, i.e., it stores in its memory (or database) the correct forms introduced during the training process. Then, using different algorithms, it compares an input text with the stored data and tags the input accordingly. The trainable systems we compared our analyser against are RFTagger [Schmid & Laws 2008] RefB:20 and Morfette [Chrupala et al. 2008] RefB:04. (We also wanted to use a tagger built with neural networks that we had trained the previous year, but we could not get it to work again: the Python modules it relies on seem to have changed too much, and it now only produces errors.)

RFTagger.

This is a tool for the annotation of text with fine-grained part-of-speech tags (this is the developers' own definition, found on the RFTagger site: https://www.cis.uni-muenchen.de/schmid/tools/RFTagger/ <22/08/2020>). It is a Hidden Markov Model tagging method that is particularly suitable for PoS tag sets with numerous detailed tags [Schmid & Laws 2008: 1] RefB:20. An HMM part-of-speech tagger calculates the most likely sequence of PoS tags for a given word sequence. The difference from our analyser is that RFTagger identifies a complete form and tags it as a whole, while our morphological analyser breaks the input form down and gives a tag for each of the parts that make up the full form.
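The core of HMM tagging is a search for the most probable tag sequence. The Python sketch below shows that step as a Viterbi decoder over a plain bigram HMM. It is a generic illustration, not RFTagger's own model (which is designed for large sets of fine-grained tags), and all probabilities are invented toy values for the ambiguous word ka.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unseen=1e-12):
    """Most likely tag sequence for `words` under a bigram HMM.

    Scores are summed in log space; `unseen` is a tiny floor for emissions
    missing from emit_p (a crude stand-in for real smoothing).
    """
    def emit(tag, word):
        return math.log(emit_p[tag].get(word, unseen))

    # best[i][t] = (log-prob of best path ending in tag t at word i, previous tag)
    best = [{t: (math.log(start_p[t]) + emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + math.log(trans_p[p][t]))
            score = best[i - 1][prev][0] + math.log(trans_p[prev][t])
            column[t] = (score + emit(t, words[i]), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy model (invented numbers): ka is either an adjective or a conjunction.
TAGS = ("AJ", "CJ", "NN")
START = {"AJ": 0.3, "CJ": 0.3, "NN": 0.4}
TRANS = {"AJ": {"AJ": 0.1, "CJ": 0.1, "NN": 0.8},
         "CJ": {"AJ": 0.3, "CJ": 0.1, "NN": 0.6},
         "NN": {"AJ": 0.2, "CJ": 0.5, "NN": 0.3}}
EMIT = {"AJ": {"ka": 0.5}, "CJ": {"ka": 0.9},
        "NN": {"trewa": 0.5, "kawell": 0.5}}

print(viterbi(["trewa", "ka", "kawell"], TAGS, START, TRANS, EMIT))  # ['NN', 'CJ', 'NN']
```

Note that such a tagger returns exactly one tag per token, which is why its output is scored as simply right or wrong in 8.3.2.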

Morfette.

This is a tool for supervised learning of inflectional morphology. Given a corpus of sentences annotated with lemmas and morphological labels, and optionally a lexicon, Morfette learns how to morphologically analyse new sentences (this is the developers' own definition, found on the Morfette site: https://hackage.haskell.org/package/morfette <22/08/2020>). It is a data-driven, modular, probabilistic system that learns to perform morphological tagging from morphologically annotated corpora. The system is composed of two learning modules that use a maximum entropy classifier to predict morphological tags. A third module dynamically combines the predictions of the maximum entropy models and generates a probability distribution over the sequences of tag-lemma pairs [Chrupala et al. 2008: 1] RefB:04.

Training RFTagger and Morfette.

The text of the Gold standard corpus has to be tokenised with one token per line, each token followed by a tab and its tag. For verbs, the tag has 38 attribute slots: the 36 verbal slots plus two additional ones for nominal, temporal, instrumental or nominalising suffixes. Each attribute absent from its slot is indicated by a 0 (zero), as shown in the following example. (Grammatical categories are described by attributes, i.e., the features that detail the category. For example, the word "plastic" has the category Noun; a fine-grained tag for it would be NSM, where N stands for "noun", S for "singular" and M for "masculine"; these are the attributes.)

Example 260

amu-ke-fu-y ’he used to go’
amukefuy IV.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.
0.0.0.0.0.+CF@ke14.0.0.0.0.0.+IPD@fu8.0.0.0.
+IND@üy4.+3@Ø3.0.0.0.0

For non-verbal forms, the tag holds four positions for attributes, the first one occupied by the category of the form and the last three by possible nominal suffixes, as shown in the following example:

Example 261

peñi-mu ’at my brother’s’
peñimu NN.0.0.+INST@mew

Morfette requires slightly more information: an additional column between the form and the tag holds the lemma, for which we entered the root. Unlike in RFTagger, the attributes of the tag are not separated by dots; compare E260 with the following example:

Example 262

amu-ke-fu-y ’he used to go’
amukefuy amu IV0000000000000000000000+CF@ke14
00000+IPD@fu8000+IND@üy4+3@Ø30000

Both tools were trained in the standard way, with their default settings, except that for Morfette we chose the vector representation (see E260, E261, E262). In this representation of tags, every possible attribute position holds a 0 (zero), which is replaced by the realised attribute in the appropriate position. It is the only configuration that RFTagger accepts.
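Fixed-width tags like those of E260 and E262 can be generated mechanically. In the Python sketch below, the order of positions (slot 36 first, slot 1 in position 36, then the two extra positions) is inferred from E260 rather than taken from either tool's documentation; the helper names are hypothetical, `sep` switches between the dotted RFTagger style and the unseparated Morfette style, and the tab-separated Morfette line is an assumption.

```python
VERBAL_SLOTS = 36  # two extra positions follow for nominal-type suffixes


def verb_tag(category, fillers, extras=("0", "0"), sep="."):
    """Category followed by 38 attribute positions.

    `fillers` maps a verbal slot number (1-36) to its attribute string;
    slot s is placed at 1-based position 37 - s, as observed in E260.
    Unfilled positions hold "0".
    """
    positions = ["0"] * VERBAL_SLOTS
    for slot, value in fillers.items():
        if not 1 <= slot <= VERBAL_SLOTS:
            raise ValueError(f"slot out of range: {slot}")
        positions[VERBAL_SLOTS - slot] = value
    positions.extend(extras)
    return category + sep + sep.join(positions)


def training_line(form, tag, lemma=None):
    """One training line: form<TAB>tag, or form<TAB>lemma<TAB>tag for Morfette."""
    columns = [form] if lemma is None else [form, lemma]
    return "\t".join(columns + [tag])


AMUKEFUY = {14: "+CF@ke14", 8: "+IPD@fu8", 4: "+IND@üy4", 3: "+3@Ø3"}
print(training_line("amukefuy", verb_tag("IV", AMUKEFUY)))                    # RFTagger style
print(training_line("amukefuy", verb_tag("IV", AMUKEFUY, sep=""), "amu"))     # Morfette style
```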

8.3.2 Results from the three analysers

The results of the two comparison tools are not ambiguous: they output only one tag per form, so each tag assignment is either right or wrong. They always assign a tag to every form, so there are no unknown forms either. We plan to add a disambiguator to our system so that words can be analysed in context and, instead of all possible analyses per word, only the most suitable one is output.

The training corpus, the "Gold standard", consists of 1,220 sentences containing 9,209 tokens, of which 1,998 are punctuation marks; subtracting these leaves 7,211 unambiguously annotated word-forms. This material is insufficient for training that can achieve acceptable results. (In data-driven tools, the more training data a tool receives, the better it performs; such tools are usually trained on thousands of sentences. For the tools we used, RFTagger and Morfette, we believe that about 15,000 sentences would give good-quality results, whereas Smeets' work contains about 1,800 morphologically tagged sentences.) Indeed, this is one of the main reasons for developing a rule-based morphological analyser: there are not enough annotated Mapuche corpora available to follow a data-driven approach with computational tools.

Data type   RFTagger        Morfette        Düngupeyem
Tokens      1,471 (100%)    1,471 (100%)    1,471 (100%)
Right       723 (49.1%)     786 (53.4%)     1,460 (99.2%)
Wrong       749 (50.8%)     685 (46.5%)     0
Unknown     -               -               11 (0.75%)
Analyses    1,471           1,471           4,533
Table 10: Analysis results for the comparison corpus (see 8.1.3)
RFTagger results.

This tool (nor Morfette, for that matter) cannot be blamed for this low performance: the poor results are caused by the insufficient amount of training material. Moreover, RFTagger was not developed specifically for agglutinative languages or for the complexities of Mapudüngun. Keep in mind also that agglutinative languages have a potentially infinite number of words, just as English, for example, has a potentially infinite number of phrases. It is therefore harder for the system to learn to tag accurately from such a variety of forms.

Morfette results.

This tool handles the analysis a little better than RFTagger: more words are analysed correctly, and fewer incorrectly.

While validating the results we noticed that Morfette breaks the forms down in order to apply the generalisations it has inferred about the analysed suffixes. Both tools, however, wrongly tag almost every word starting with a capital letter as a proper noun.

Düngupeyem results.

To make the results comparable, we devised a formula that brings the two types of data closer together and gives us a way to interpret the numbers.

The average ambiguity (aa) of the comparison corpus (see 8.2.1) is 3.64. We subtract from Düngupeyem's correct analyses (1,460) the number of analyses corresponding to that value taken as a percentage, and relate the resulting amount to the right analyses of Morfette and RFTagger. We then recalculate the percentage, obtaining a closer correspondence among the results of the three tools:



These results again show the system's reliability and the high accuracy of its analyses. Even counting unrecognised words as wrong analyses, failed analyses amount to only 0.75%, while correct analyses exceed 95%. The other two tools are at around 50% for both right and wrong analyses.

8.4 Comparing against a Quechua analyser

Our work follows the path that Annette Ríos (https://www.cl.uzh.ch/de/people/team/compling/arios.html) has laid out for Quechua, and this step is no exception. Although there are notable differences, Düngupeyem's evaluation largely follows the process Ríos used for the Quechua FST tools [Ríos, A. 2015: 36-39] RefB:17.

The Quechua system is made up of five cascaded transducers [Ríos, A. 2015: 22] RefB:17, some of which use a disambiguator for specific parts of the forms: for example, to determine whether a root is nominal or verbal, or, in another transducer, to determine which type a suffix belongs to. The final Quechua form has thus been disambiguated during the analysis process itself.

The training material also differs. The Quechua system used about 3,000 disambiguated sentences, whereas the Mapuche training material (see 8.2 and 8.3.1) consists of fewer than half as many, 1,220 to be exact. In both cases, however, this is too little material to train data-driven systems well.

The comparison corpora are similar. Ours contains 170 sentences with 2,207 tokens (1,475 words + 732 punctuation marks; see 8.1.3), while the Quechua one has 322 sentences containing 2,142 tokens [Ríos, A. 2015: 36] RefB:17.

Although the procedure was also carried out differently, the final counts (the sums of the results) can serve as comparison data. In the Quechua case, the two texts of the comparison corpus were tagged separately and then compared with each other; in the Mapuche case, by contrast, the texts were tagged all at once.

8.4.1 Quechua results

For comparison purposes, the Quechua results are presented here in the style used in this article, which differs from their original presentation (the data are taken from table 2.13, "Evaluation: Disambiguated Texts" [Ríos, A. 2015: 38] RefB:17).

Data type   RFTagger        Morfette        Quechua S.
Tokens      2,142 (100%)    2,142 (100%)    2,091 (97.6%)
Right       1,459 (68.1%)   1,505 (70.2%)   2,041 (95.2%)
Wrong       683 (31.8%)     637 (29.7%)     50 (2.33%)
Table 11: Analysis results. RFTagger / Morfette / Quechua system
                     RFTagger            Morfette
                     Right     Wrong     Right     Wrong
Quechua              68.11%    31.88%    70.26%    29.73%
Mapudüngun           49.12%    50.88%    53.43%    46.57%
Difference           18.99%    19%       16.83%    16.84%
Average difference   18.99%              16.83%
Table 12: Results RFTagger / Morfette - Quechua / Mapudüngun
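The Difference row of table 12 is a subtraction in percentage points between the two languages for each tool. The Python sketch below recomputes it from the table's own percentages (right: Quechua minus Mapudüngun; wrong: Mapudüngun minus Quechua).

```python
# Percentages (right, wrong) as printed in Table 12.
QUECHUA = {"RFTagger": (68.11, 31.88), "Morfette": (70.26, 29.73)}
MAPUDUNGUN = {"RFTagger": (49.12, 50.88), "Morfette": (53.43, 46.57)}


def differences(tool):
    """Percentage-point gains on Quechua relative to Mapudüngun."""
    q_right, q_wrong = QUECHUA[tool]
    m_right, m_wrong = MAPUDUNGUN[tool]
    right_gain = round(q_right - m_right, 2)  # more right answers on Quechua
    wrong_drop = round(m_wrong - q_wrong, 2)  # fewer wrong answers on Quechua
    return right_gain, wrong_drop


print(differences("RFTagger"))  # (18.99, 19.0)
print(differences("Morfette"))  # (16.83, 16.84)
```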
RFTagger results.

Table 11 shows that with more training data (3,000 instead of 1,220 sentences) the results of both RFTagger and Morfette improve accordingly.

RFTagger correctly tagged 68% of the Quechua forms, 19 percentage points more than for the Mapuche ones, and its share of wrongly tagged forms is likewise 19 points lower. Thus, RFTagger performed 19 points better on Quechua than on Mapudüngun (see table 12). Bear in mind, however, the differences between the two evaluation processes (see 8.4).

Morfette results.

This tool also performs better on Quechua (compare the results in table 12). As with the Mapuche texts, Morfette outperforms RFTagger in tagging the Quechua texts.

Morfette's results on the Quechua texts were 16.8 percentage points better than on the Mapuche texts (see table 12): it produced 16.8 points more correct analyses and the same amount fewer incorrect ones. The difference is smaller, as it also is in the analysis of the Mapuche texts.

Recall that our analyser's percentage of correctly analysed words was 95.62% (see Düngupeyem results, 8.3.2). Table 11 shows 95.2% correctness for the Quechua system. Despite the differences between the two systems (see 8.4), the results are similar. So, if Ríos considers that her system performs successfully on the basis of her results, we may draw the same conclusion for ours.

8.5 Not analysed!

In the analysis of the control corpus (8.2), made up of Smeets' sentences [Smeets, I. 2008] RefB:21, there were two unknown words. We examine these cases below.

8.5.1 Smeets’ not analysed words.

eluwün-antü ’funeral day’

[Smeets, I. 2008: 402 (74)] RefB:21. Presented this way, this word is a nominal compound. However, we have not listed eluwün ’funeral’ as a noun, because it is actually a nominalised verb:

Example 263

[Smeets, I. 2008: 404 (7)] RefB:21
el-uw-ün ’funeral’, lit: ’the leaving behind/going’
-TV.el_dejar-atrás_partir+REF.w31+PVN.n4

Example 264

[Smeets, I. 2008: 66 (48)] RefB:21
antü ’day’
-NN.antü_sol_día_tiempo

This is another type of compound that we have not encoded in our system. We do not think it is worth including, because so far it has appeared only two or three times. Adding variants of any form increases ambiguity, which degrades the performance of the analyser.

mapuche-domo ’Mapuche woman’

[Smeets, I. 2008: 399 (18)] RefB:21. This is another nominal compound, which Smeets presents as made up of two nouns. It actually contains three noun roots:

Example 265

[Smeets, I. 2008: 117] RefB:21
mapu-che ’land person’
-NN.mapu_tierra-NN.che_persona

Example 266

[Smeets, I. 2008: 76 (20)] RefB:21
domo ’woman’
-NN.domo_mujer

Compounds of three roots are not handled by the system either, for the same reasons as the previous compound. One way to analyse this type of compound successfully might be to cascade a second FST, capable of analysing three-root compounds, that is applied only when the main FST fails; this way, the ambiguity of the main analyser would not increase.
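The proposed cascade amounts to a fallback: the second transducer is consulted only for words that the main one rejects, so words already covered by the main FST keep exactly their current analyses. A minimal Python sketch of that control flow, with both FSTs replaced by hypothetical dictionary lookups (the entry mapuchedomo and its tag string are illustrative only):

```python
from typing import Callable

Analyser = Callable[[str], list[str]]


def cascade(*stages: Analyser) -> Analyser:
    """Return an analyser that tries each stage in order and returns the
    analyses of the first stage that produces any."""
    def run(word: str) -> list[str]:
        for stage in stages:
            result = stage(word)
            if result:
                return result
        return []
    return run


# Hypothetical stand-ins for the main FST and a three-root-compound FST.
MAIN = {"domo": ["-NN.domo_mujer"]}
THREE_ROOT = {"mapuchedomo": ["-NN.mapu_tierra-NN.che_persona-NN.domo_mujer"],
              "domo": ["-NN.never_reached"]}

analyse = cascade(lambda w: MAIN.get(w, []), lambda w: THREE_ROOT.get(w, []))
```

Because the second stage never runs for "domo", its (deliberately bogus) extra entry does not add to that word's ambiguity.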

8.5.2 Zúñiga’s not analysed words.

In the analysis of the comparison corpus (8.2.1), made up of Zúñiga's sentences [Zúñiga, F. 2006: 266 - 288] RefB:24, there were ten unknown words (puru ’to dance’ occurs twice, so there are actually nine). In this section we try to find out why these words could not be analysed.

are-tu ’borrowed’

[Zúñiga, F. 2006: 280 (122)] RefB:24. In Zúñiga and Augusta, F. RefB:03, this root is an adjective; from the perspective of Spanish or English it would be the participle of the verb. Smeets and Augusta list it as the verb are- ’to lend’, and, when followed by the suffix -tu-, are-tu- ’to borrow’.

"An -n form occurs as an adjective denoting an attribute or quality of the modified noun" [Smeets, I. 2008: 190] RefB:21, among other functions of the plain verbal noun. This leads us to think that aretu may be analysed as are-tu-Ø, where the plain verbal noun is elided, or realised as a null suffix.

One solution for analysing this form correctly would then be to add the null form of the plain verbal noun to the list of suffixes. The problem is that this would enormously increase ambiguity, because every non-verbal root allowed to take the verbaliser Ø would be analysed both as root and as root+VRB.Ø36+PVN.Ø4.
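The ambiguity cost can be seen on a toy scale: once a null plain verbal noun is available, every root that accepts the null verbaliser gains a second reading. The Python sketch below uses a hypothetical three-word lexicon; which roots are flagged as verbalisable is invented for the illustration.

```python
NULL_READING = "+VRB.Ø36+PVN.Ø4"

# Hypothetical mini-lexicon: surface root -> (analysis, accepts verbaliser Ø?)
LEXICON = {
    "kawell": ("-NN.kawell_caballo", True),
    "fücha": ("-AJ.fücha_grande_viejo", True),
    "ka": ("-CJ.ka_y", False),
}


def analyses(word, null_pvn):
    """Readings of a bare root, optionally with the null verbaliser + null PVN."""
    if word not in LEXICON:
        return []
    tag, verbalisable = LEXICON[word]
    readings = [tag]
    if null_pvn and verbalisable:
        readings.append(tag + NULL_READING)
    return readings


def ama(words, null_pvn):
    """Average number of analyses per word-form."""
    return sum(len(analyses(w, null_pvn)) for w in words) / len(words)


WORDS = ["kawell", "fücha", "ka"]
print(ama(WORDS, null_pvn=False))  # 1.0
print(ama(WORDS, null_pvn=True))   # 1.666...  (5 analyses / 3 words)
```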

ina-lef-nepe-n ’I woke up startled’

[Zúñiga, F. 2006: 283 (pewma)] RefB:24. From the meaning Zúñiga gives this word, we think it is a short form of:

Example 267

ina-lef-ün nepe-n lit: ’I woke up on the run’
-AV.ina_a-través-IV.lef_correr-CR.IV
+PASS.nge23+PX.pe13+PVN.n4
-IV.nepe_despertar+IND1SG.n3

ngen-ko ’god of the waters’

[Zúñiga, F. 2006: 285 (we tripantü)] RefB:24. This is the same case as eluwün-antü ’funeral day’ (see 8.5.1). In Smeets it appears as in the following example:

Example 268

[Smeets, I. 2008: 138 (41)] RefB:21
nge-n ko lit: ’owner/master of the water’
-TV.nge_tener+PVN.n4
-NN.ko_agua

puru ’to dance’

[Zúñiga, F. 2006: 287 (Pausa_Historia)] RefB:24. We think it is the same case as are-tu (see are-tu ’borrowed’, 8.5.2), even though this is a single root and is not listed as ’danced’ by either Zúñiga or Augusta. So it is probably:

Example 269

puru-Ø ’to dance’
-IV.puru_bailar+PVN.Ø4

purunenutuy

[Zúñiga, F. 2006: 287 (Pausa_Historia)] RefB:24. We are not certain how Zúñiga translates it. The form enu inside this word is unknown to us, but we guess it may be:

Example 270

puru-nentu-tu-y ’get oneself dancing’ (maybe from the Spanish expression ’sacar a bailar’, ’to ask someone to dance’)
-IV.puru_bailar-TV.entu_sacar-CR.TV
+RE.tu16+IND.y4+3.Ø3

taku-tu-mu-tu-y ’she sheltered herself’

[Zúñiga, F. 2006: 279 (108)] RefB:24. If the suffix -mu- in this form is the 2nd person agent marker (slot 23), then the form is not possible according to Smeets' description: "The subject (s3) of a verb which takes the morpheme -mu- s23 indicates first person. The participant which is deleted from the situation indicated by a -mu- form must be second person. It cannot be first person because the subject marker indicates first person. The participant which is deleted from the situation cannot be third person (for then one would have used the passive marker -nge-), nor can it be included in the subject referent (for then one would have used the reflexive marker -w-) … The suffix -mu- is used when the total number of participants is greater than two. The number marker (slot 2) co-refers to the subject marker and may indicate singular, dual or plural" [Smeets, I. 2008: 268] RefB:21.

Following Smeets, and in line with Zúñiga's translation, the verb should have been:

Example 271

taku-tu-nge-tu-y lit: ’she was sheltered by herself’
-TV.taku_cubrir+TR.tu33+PASS.nge23+RE.tu16
+IND.y4+3.Ø3

Another possibility is that -mu- is an alternative form (not included in our system) of another suffix, such as -me- ’thither’ (slot 20) or the causative -m- (slot 34).

Finally, if we add the dual suffix -u at the end of the form, we obtain an analysis, because this form implies the first person in its null form -Ø-; however, this departs from the meaning given by Zúñiga:

Example 272

taku-tu-mu-tu-y-u ’you sheltered us both’
-TV.taku_cubrir+TR.tu33+2A.mu23+RE.tu16
+IND.y4+1.Ø3+DL.u2

uf-kün-tuku-pa-y ’they camped in memory’

[Zúñiga, F. 2006: 288 (Abuela_Voz)] RefB:24. First, note that this word occurs in a poem. Following Zúñiga's translation, our interpretation of its roots and suffixes is as follows:

Example 273

uf-kün-tuku-pa-y lit: ’they tight up and put memory/knowledge there’
-TV.uf_apretar_afirmar-TV.kim_saber_recordar
-TV.tuku_poner+HH.pa17+IND.y4+3.Ø3

The first reason our system does not produce this analysis is the three-root stem, as explained for mapuche-domo ’Mapuche woman’ (8.5.1). Moreover, we are only guessing that the stem is composed as uf-kün-tuku-.

Smeets lists üfi- ’to become tight, to tighten’ [Smeets, I. 2008: 567] RefB:21. Augusta, F. RefB:03 lists uf-ün ’tighten the straws with bands (when roofing)’ and uf-tüku-n ’tighten with tools’, but also üfü-n ’tighten with something to tie’. So uf- and üfi- are very likely variants of each other.

Smeets’ dictionary includes kim-tuku- ’to have known/understood for some time’ and kim-tu- ’to remember’ [Smeets, I. 2008: 559] [21]. Febrés’ dictionary gives kün-tüku-l-ün ’make someone else remember’; kün-tüku-n ’to remember’; kün-tüku-pe-m ’the memory’ Febrés, A. [3]. It is quite probable that kim- and kün- are also variants of each other.

tuku- and tüku- are undoubtedly variants of each other, and the form cannot be segmented as tü-ku- because there is no attested suffix -ku-.
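The three-root restriction that blocks this analysis can be illustrated by counting the root segments of an analysis string. The Python sketch below is a minimal illustration, not the FOMA implementation, assuming (as the discussion of this example, of Example 274 and of Example 276 suggests) that two-root stems are accepted while three-root stems are not. MAX_ROOTS and the function names are hypothetical.

```python
import re

MAX_ROOTS = 2  # assumption: stems with three roots are rejected, as described in the text


def count_roots(analysis: str) -> int:
    """Count '-POS.root_gloss' segments; bare markers such as '-CR.TV' are skipped."""
    text = "".join(analysis.split())
    segments = re.findall(r"-([^+-]+)", text)
    return sum(1 for seg in segments if "_" in seg)


def stem_allowed(analysis: str) -> bool:
    """True if the analysis does not exceed the assumed root limit."""
    return count_roots(analysis) <= MAX_ROOTS


e273 = "-TV.uf_apretar_afirmar-TV.kim_saber_recordar-TV.tuku_poner+HH.pa17+IND.y4+3.Ø3"
e274 = "-NN.wima_vara+THR.kü35+TR.tu33-TV.ye_traer_llevar+PASS.nge23+IND.y4+3.Ø3"
e276 = "-TV.witra_levantar+PVN.n4-IV.püra_subir-TV.may_asentir-CR.TV+NRLD.a9+IND1SG.n3"
for name, analysis in [("E273", e273), ("E274", e274), ("E276", e276)]:
    print(name, count_roots(analysis), stem_allowed(analysis))
```

Under this assumption Examples 273 and 276 are rejected (three roots each) and Example 274 is accepted (a noun root and a verb root).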

wima-kütu-ye-nge-y ’she was whipped’

[Zúñiga, F. 2006: 281 (123)] [24]. If the verb is made up of the parts we suggest below, a more literal translation would be ’she was taken to be whipped all along’:

wima- ’rod, thin stick’.

-kütu- seems to correspond to a suffix that adds the sense of ’all along’.

Augusta defines it as ’suffix and postposition (variant used in Pangi): from (temporal); kuyfi kütu ’from a long time ago’; conjunction: and even, until’ Augusta, F. [3].

Valdivia recognises a locative sense in it: ’from and to’; fa kütu tüye kütu ’from here to there’ Valdivia, L. [3].

Smeets recognises it only as an adverb, küto kütu ’even, also’. However, Smeets lists a root weñangkü- ’to get sad’ [Smeets, I. 2008: 573] [21], which Augusta records as weñang- ’to have pain, annoyance, desire’, but also as weñang-kü-n ’to get sad’. So -kütu- may be formed by two suffixes, -kü-tu-, where -kü- carries the ’all along’ sense and -tu- is the transitivizer. Such a combination is not unusual in Mapudüngun; see the case of llemay in the note to E67 (p. 67).

In conclusion, the verb possibly has a complex stem formed by ’noun root + one or two suffixes + verb root’:

Example 274

wima-kü-tu-ye-nge-y
-NN.wima_vara+THR.kü35+TR.tu33
-TV.ye_traer_llevar+PASS.nge23+IND.y4+3.Ø3

Example 275

wima-kütu-ye-nge-y
-NN.wima_vara+THR.kütu35
-TV.ye_traer_llevar+PASS.nge23+IND.y4+3.Ø3

We have tagged the new suffix THR, from ’through’, which reflects the idea of ’all along’, and assigned it to slot 35, which seems a suitable position for it. Of course, all of this is tentative.
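Placing THR in slot 35 relies on the slot template: within a stem, suffixes appear in decreasing slot order, from the highest slot next to the root down to slot 1 at the end of the word, as every analysis in this section shows. The Python sketch below is a minimal illustration of that check, assuming that the order restarts after each root of a compound stem. It deliberately ignores the mobile suffixes (the slot-16M, 17M, 23M, 25M, 28M and 33M files in the annex), whose position is not fixed and which a real check would have to accommodate. The function names are hypothetical.

```python
import re
from typing import List


def slot_runs(analysis: str) -> List[List[int]]:
    """Group suffix slot numbers into runs, starting a new run at every root."""
    text = "".join(analysis.split())
    runs: List[List[int]] = [[]]
    for sign, body in re.findall(r"([+-])([^+-]+)", text):
        if sign == "-":
            if "_" in body:       # a root such as 'TV.ye_traer_llevar'
                runs.append([])
            continue              # markers such as 'CR.TV' are ignored
        match = re.search(r"(\d+)$", body)
        if match:
            runs[-1].append(int(match.group(1)))
    return [run for run in runs if run]


def slots_well_ordered(analysis: str) -> bool:
    """True if slot numbers strictly decrease within every run."""
    return all(a > b for run in slot_runs(analysis) for a, b in zip(run, run[1:]))


e274 = "-NN.wima_vara+THR.kü35+TR.tu33-TV.ye_traer_llevar+PASS.nge23+IND.y4+3.Ø3"
print(slot_runs(e274), slots_well_ordered(e274))
# A deliberately misordered, hypothetical analysis: slot 16 before slot 33
print(slots_well_ordered("-TV.taku_cubrir+RE.tu16+TR.tu33+IND.y4+3.Ø3"))
```

With THR in slot 35, Example 274 yields the runs [35, 33] and [23, 4, 3], both strictly decreasing.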

witra-n-püra-may-a-n ’That raise it up’

[Zúñiga, F. 2006: 284 (Ngillañmawün)] [24]. The segmentation we propose for this verb would give a meaning like ’I raised it up in assent’ or ’I raised up my assenting X’. Under this segmentation the stem is complex and formed by three roots, an analysis our system does not allow. The analysis would be:

Example 276

witra-n-püra-may-a-n
-TV.witra_levantar+PVN.n4-IV.püra_subir
-TV.may_asentir-CR.TV+NRLD.a9+IND1SG.n3

9 Public user interface

In this section we briefly present the web interface we have developed to provide open access to our analyser.

The URL to access it is:
http://www.chandia.net/dungupeyem

Figure 7: Analyser public web interface.
Numbers in figure 7 mark:

  1. A text box to paste or type Mapuche words to be analysed.

  2. An e-mail field where users can enter their address to receive comments about unknown words, and the "analyse" button to submit the text.

  3. A field for uploading a .doc, .docx, .odt or .txt file to be analysed.

  4. A text field for entering analysis glosses in order to generate Mapuche words.

  5. A link to a contact form in case the user needs feedback from us.

Figure 8: Analysis results on screen.

Figure 8 shows the analysis results for the word amu-ke-fu-y ’he used to go’, where all tags are displayed in blue and show the tag name when hovered over.
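The hover behaviour amounts to attaching the full tag name to each tag of an analysis. The Python sketch below is a minimal illustration, assuming HTML output where the hover text is a title attribute; the tag table is a small subset of annex 11.1, and the class name "tag" and the function name are hypothetical, not taken from the Düngupeyem site.

```python
import html
import re

# Small subset of the tag meanings in annex 11.1
TAG_NAMES = {
    "TV": "Transitive verb", "IV": "Intransitive verb", "NN": "Noun",
    "TR": "Transitivizer", "PASS": "Passive", "RE": "Iterative/restorative",
    "IND": "Indicative", "CF": "Constant feature", "IPD": "Impeditive",
    "HH": "Hither", "2A": "2nd person agent", "DL": "Dual",
    "1": "1st person", "3": "3rd person",
}

# sign, tag, and the rest of the segment up to the next '+' or '-'
SEGMENT = re.compile(r"([+-])([^.+-]+)\.([^+-]*)")


def render(analysis: str) -> str:
    """Wrap every tag in a span whose title holds the full tag name."""
    parts = []
    for sign, tag, rest in SEGMENT.findall("".join(analysis.split())):
        name = TAG_NAMES.get(tag, tag)  # fall back to the bare tag
        parts.append(f'{sign}<span class="tag" title="{html.escape(name)}">'
                     f'{html.escape(tag)}</span>.{html.escape(rest)}')
    return "".join(parts)


print(render("-TV.taku_cubrir+TR.tu33+PASS.nge23+RE.tu16+IND.y4+3.Ø3"))
```

Blue colouring would then be a matter of styling the hypothetical "tag" class.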

When the user uploads a file to be analysed, the system gives a link back to download a text file containing all the analyses.

The user can review information about the tags on a complementary page called "Glosas del Düngupeyem" (Düngupeyem glosses). There is also a search block where tags and suffixes can be queried.

Whenever a user submits an unknown word to the system, or uploads a file to analyse, we receive an e-mail with this information so that we can check for issues and improve the system if necessary.

10 Conclusions

We have developed the FST analyser, and it has proved to be a reliable tool. It can of course be improved, and it can serve as the basis for other tools, something we have already started to do but do not describe in this article, which is devoted to the implementation of the analyser and specifically to its treatment of the Mapuche verb.

Throughout this article, the process and results have been described in detail to explain the quality, scale and precision of the system. We continue working on tools derived from the analyser, which is the basic system for current and future work on automatic processing of Mapudüngun.

Acknowledgements.
We deeply and sincerely thank Iñaki Alegria, who has guided us throughout the preparation of this article. He has spent countless hours reviewing it to improve and present this work in a clear, concise and understandable manner.

References

  • (1) Beesley, K. and Karttunen, L., Finite State Morphology. CSLI Studies in Computational Linguistics. CSLI Publications, Stanford, U. S. A. (2003).
  • (2) Castro, R. and Rios, A., Allin Qillqay! A Free Online Web spell checking Service for Quechua, in Ugaz Burga, J. E., Gonzales Sánchez, S. R., and Torres Guerra, C., editors, Memoria - VI Congreso Internacional de Computación y Telecomunicaciones (COMTEL) 2014, Fondo Editorial de la Universidad Inca Garcilaso de la Vega, Lima, Peru, pages 23–30 (2014).
  • (3) Chandía, A. et al. CORLEXIM. Corpus lexicográfico del mapudüngun. <http://corlexim.cl> [April, 2021]
  • (4) Chrupala, G. et al., Learning Morphology with Morfette. In Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., and Tapias, D., editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), European Language Resources Association (ELRA). Marrakech, Morocco, (2008).
  • (5) Corporación Nacional de Desarrollo Indígena. Gobierno de Chile, Temuco, Chile (2005).
  • (6) Febrés, A., Diccionario Araucano-Español ó sea Calepino Chileno-Hispano Por el P. Andrés Febrés de la Compañía de Jesus. Reproducido textualmente de la edición de Lima de 1765, por Juan M. Larsen. Con un Apéndice sobre las lenguas Quíchua, Aimará y Pampa. Buenos Aires: Juan A. Alsina (1882).
  • (7) Fernández-Garay, A. and Malvestitti, M., Formas no finitas del Mapudüngun en dos variedades de la Argentina. In IX Congreso de la Sociedad Argentina Lingüística: 1–14. Universidad Nacional de Córdoba. Córdoba, Argentina, (2002).
  • (8) Gasser, M. Computational Morphology and the Teaching of Indigenous Languages. Proceedings of the First Symposium on Teaching Indigenous Languages of Latin America. CLACS & MLCP, Indiana University Bloomington & Association for Teaching and Learning Indigenous Languages of Latin America (ATLILLA), (2011).
  • (9) Guevara, T. Las últimas familias i costumbres araucanas. Imprenta, litografía i encuadernación "Barcelona". Santiago, Chile, (1913).
  • (10) Hulden, M., Fast approximate string matching with finite automata. Procesamiento del Lenguaje Natural, núm. 43. ISSN 1135-5948, pp. 57-64, (2009).
  • (11) Jurafsky, D., Definition of Minimum Edit Distance. CS 124: From Languages to Information. Stanford University. California, U. S. A. (2012).
  • (12) Lonkon, E., Morfología y Aspectos del Mapudüngun. Universidad Autónoma Metropolitana. Unidad Iztapalapa. México D. F., México, (2011).
  • (13) Lonkon, E., Políticas públicas de lengua y cultura aplicada al mapudüngun. El pueblo mapuche en el siglo XXI. Propuestas para un nuevo entendimiento entre culturas en Chile. Centro de Estudios Públicos. Santiago, Chile (2017).
  • (14) Mösbach, E., Vida y costumbres de los indígenas araucanos en la segunda mitad del siglo XIX. Imprenta Universitaria. Santiago, Chile (1936).
  • (15) Ragileo, A., Gramática del idioma Mapuche. Public domain, (1982).
  • (16) Ríos, A., Spell checking an agglutinative language: Quechua. In: 5th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, 25 November 2011 - 27 November 2011, 51-55, (2011).
  • (17) Ríos, A., A Basic Language Technology Toolkit for Quechua. Faculty of Arts of the University of Zurich. Zurich, Switzerland, (2015).
  • (18) Sadowsky, S. et al., Illustrations of the IPA: Mapudüngun. Journal of the International Phonetic Association 43(1). 87–96. doi: 10.1017/S0025100312000369, (2013).
  • (19) Salas, A., El mapuche o araucano. Centro de Estudios Públicos, 2nd edition. Santiago, Chile, (2006).
  • (20) Schmid, H. and Laws, F., Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. In Proceedings of the 22nd International Conference on Computational Linguistics, volume 1 of COLING ’08, pages 777–784. Association for Computational Linguistics. Stroudsburg, U. S. A., (2008).
  • (21) Smeets, I., A Grammar of Mapuche. Mouton de Gruyter. Berlin, Germany. New York, U. S. A., (2008).
  • (22) Sochil (Sociedad Chilena de Lingüística). Encuentro para la Unificación del Alfabeto Mapuche. Proposiciones y Acuerdos. Arturo Hernández Sallés, Coordinador del Encuentro. Temuco: Pontificia Universidad Católica de Chile, (1986).
  • (23) Sochil (Sociedad Chilena de Lingüística). Alfabeto Mapuche Unificado. Temuco: Pontificia Universidad Católica de Chile, (1988).
  • (24) Zúñiga, F., Mapudüngun. El habla mapuche. Centro de Estudios Públicos. Santiago, Chile, (2006).

11 Annexes

11.1 Tags meaning

  • Parts of Speech (and other elements)

  • -AJ: Adjective

  • -AV: Adverb

  • -AP: Anaphoric pronoun

  • -CJ: Conjunction

  • -COLL: Collectivizer

  • -CR.IV: Intransitive verb compound

  • -CR.TV: Transitive verb compound

  • -DP: Demonstrative pronoun

  • -FE: Foreign expression

  • -IJ: Interjection

  • -IP: Interrogative pronoun

  • -IV: Intransitive verb

  • -LOC: Locative

  • -NN: Noun

  • -NBR: Digit (number)

  • -NG: Negation particle

  • -NU: Numeral

  • -ON: Onomatopoeia

  • -PCT: Punctuation mark

  • -PN: Proper name

  • -PP: Personal pronoun

  • -PPN: Possible proper noun

  • -PR: Preposition

  • -PT: Particle

  • -QC: Interrogative chum

  • -QT: Interrogative tunte

  • -RNNR: Reduplicated noun

  • -RONR: Reduplicated onomatopoeia

  • -RVBR: Reduplicated verb

  • -SP: Possessive pronoun

  • -TV: Transitive verb

  • -UNK: Unknown root or word

  • -XV: Auxiliary

  • Suffixes

  • +1: 1st person

  • +1A: 1st person agent

  • +2: 2nd person

  • +2A: 2nd person agent

  • +3: 3rd person

  • +ADJ: Adjectiviser

  • +ADJDO: Adjectiviser doable

  • +ADJQE: Adjectiviser quick and easy

  • +ADV: Adverbializer

  • +AFF: Affirmative

  • +AIML: Aimless/involuntary

  • +AVN: Agentive verbal noun

  • +BEN: Benefactive

  • +CA: Causative

  • +CF: Constant feature

  • +CIRC: Circular movement

  • +CND: Conditional

  • +CNI: Conditional in imperative

  • +CONT: Continuative

  • +CSVN: Completive subjective verbal noun

  • +DL: Dual

  • +DS12A: Dative subject, 1st or 2nd person agent

  • +DS3A: Dative subject, 3rd person agent

  • +DISTR: Distributive

  • +EDO: External direct object

  • +EX: Discontinuative (ex)

  • +EXP: Experience

  • +FAC: Factitive

  • +FORCE: Force

  • +GR: Group(alizer)

  • +HH: Hither

  • +IDO: Internal direct object

  • +IMM: Immediate

  • +IMP: Imperative

  • +IMP1SG: Imperative, 1st person, singular

  • +IMP2SG: Imperative, 2nd person, singular

  • +IMP3: Imperative, 3rd person

  • +IND: Indicative

  • +IND1SG: Indicative, 1st person, singular

  • +INST: Instrumental object

  • +INT: Intensive

  • +IO: Indirect object

  • +IPD: Impeditive

  • +ITR: Interruptive

  • +IVN: Instrumental verbal noun

  • +LOC: Locative

  • +MIO: More involved object

  • +NEG: Negation

  • +NOM: Nominaliser

  • +NOMAG: Nominaliser agent

  • +NOMPI: Nominaliser place or instrument

  • +NRLD: Non-realised

  • +OO: Oblique object

  • +OVN: Objective verbal noun

  • +PASS: Passive

  • +PFPS: Perfect persistent

  • +PL: Plural

  • +PLAY: Play

  • +PLPF: Pluperfect

  • +PLR: Pluraliser

  • +PRPS: Progressive persistent

  • +PR: Progressive

  • +PS: Persistence

  • +PVN: Plain verbal noun

  • +PX: Proximity

  • +RE: Iterative/restorative

  • +REF: Reflexive/reciprocal

  • +REL: Relative

  • +REP: Reportative

  • +SAT: Satisfaction

  • +SFR: Stem formative

  • +SG: Singular

  • +SIM: Simulative

  • +ST: Stative

  • +SUD: Sudden

  • +SVN: Subjective verbal noun

  • +TEMP: Temporal

  • +TH: Thither

  • +TR: Transitivizer

  • +TVN: Transitive verbal noun

  • +VRB: Verbalizer

11.2 Suffixes by slot

11.2.1 Slots 1 to 15: inflectional suffixes

  • slot-01.aff: Dative Subject

  • -Ø: 1st or 2nd person agent "+DS12A"

  • -ew -mew -mu: 3rd person agent "+DS3A"

  • slot-02.aff: Number

  • -Ø: Singular (1st & 2nd indicative) "+SG"

  • -i: Singular (2nd indicative & conditional) "+SG"

  • -u: Dual "+DL"

  • -iñ: Plural (1st indicative & conditional) "+PL"

  • -ün: Plural (2nd all moods & 3rd indicative) "+PL"

  • slot-03.aff: Person

  • -Ø: 1st non-singular "+1" & 3rd "+3" persons

  • -i: 1st person "+1"

  • -m: 2nd person "+2"

  • -e: 3rd person "+3"

  • -ng: 3rd person non-singular "+3"

  • -y: 1st "+1" & 3rd "+3" persons agent

  • slot-03PTMT.aff: Portmanteau [mood, person, number]

  • -n: Indicative, 1st person, singular "+IND1SG"

  • -chi: Imperative, 1st person, singular "+IMP1SG"

  • -nge: Imperative, 2nd person, singular "+IMP2SG"

  • -pe: Imperative, 3rd person "+IMP3"

  • slot-04.aff: Mood

  • -Ø: Imperative "+IMP"

  • -l: Conditional "+CND" & "+CNI"

  • -y: Indicative "+IND"

  • slot-04NF.aff: Inflectional nominalisers

  • -Ø: Objective "+OVN" & subjective "+SVN" verbal noun

  • -el: Objective verbal noun "+OVN"

  • -fiel: Transitive verbal noun "+TVN"

  • -lu: Subjective verbal noun "+SVN"

  • -m: Instrumental verbal noun "+IVN"

  • -n: Plain verbal noun "+PVN"

  • -t: Agentive verbal noun "+AVN"

  • -wma: Completive subjective verbal noun "+CSVN"

  • slot-05.aff: Constant feature

  • -ye: "+CF"

  • slot-06.aff: Internal & External direct objects

  • -e: Internal direct object "+IDO"

  • -fi: External direct object "+EDO"

  • slot-07.aff: Pluperfect

  • -mu: "+PLPF"

  • slot-08.aff: Impeditive

  • -fu: "+IPD"

  • slot-09.aff: Non-realised situation

  • -a: "+NRLD"

  • slot-10.aff: Negation

  • -ki: Imperative "+NEG"

  • -la: Indicative "+NEG"

  • -no: Conditional "+NEG"

  • slot-11.aff: Affirmative

  • -lle: "+AFF"

  • slot-12.aff: Reportative

  • -rke: "+REP"

  • slot-13.aff: Proximity

  • -pe: "+PX"

  • slot-14.aff: Constant feature

  • -ke: "+CF"

  • slot-15.aff: Pluperfect

  • -wye: "+PLPF"

11.2.2 Slots 16 to 27: mobile derivational suffixes

  • slot-16.aff: Repetitive/restorative & continuative

  • -ka: Continuative "+CONT"

  • -tu: Repetitive/restorative "+RE"

  • slot-16M.aff: Repetitive/restorative mobile

  • -tu: "+RE"

  • slot-17.aff: Hither & locative

  • -pa: Hither "+HH"

  • -pu: Locative "+LOC"

  • slot-17M.aff: Hither mobile

  • -pa: "+HH"

  • slot-18.aff: Interruptive

  • -r: One interruption "+ITR"

  • -yeku: Repeated interruptions "+ITR"

  • slot-19.aff: Persistence

  • -we: "+PS"

  • slot-20.aff: Thither

  • -me: "+TH"

  • slot-21.aff: Immediate & sudden

  • -fem: Immediate "+IMM"

  • -rume: Sudden "+SUD"

  • slot-22.aff: Play & simulation

  • -faluw: Simulation "+SIM"

  • -kantu: Play "+PLAY"

  • slot-23.aff: Passive, 1st & 2nd persons agent

  • -w: 1st person agent "+1A"

  • -mu: 2nd person agent "+2A"

  • -nge: Passive "+PASS"

  • slot-23M.aff: Passive mobile

  • -nge: "+PASS"

  • slot-24.aff: Pluraliser

  • -ye: "+PL"

  • slot-25.aff: Force & satisfaction

  • -fal: Force "+FORCE"

  • -ñmu: Satisfaction "+SAT"

  • slot-25M.aff: Force mobile

  • -fal: "+FORCE"

  • slot-26.aff: Indirect object

  • -ñma: "+IO"

  • slot-27.aff: Beneficiary

  • -el: "+BEN"

11.2.3 Slots 28 to 36: fixed derivational suffixes

  • slot-28.aff: Stative & progressive

  • -le: Stative "+ST"

  • -meke: Progressive "+PR"

  • slot-28M.aff: Stative mobile

  • -le: "+ST"

  • slot-29.aff: More involved object

  • -l: "+MIO"

  • slot-30.aff: Circular movement & intensive

  • -iaw: Circular movement "+CIRC"

  • -tie: Intensive "+INT"

  • slot-31.aff: Reflexive/reciprocal

  • -w: "+REF"

  • slot-32.aff: Progressive persistent & perfect persistent

  • -künu: Perfect persistent "+PFPS"

  • -nie: Progressive persistent "+PRPS"

  • slot-33.aff: Transitivizer & factitive

  • -ka: Factitive "+FAC"

  • -tu: Transitivizer "+TR"

  • slot-33M.aff: Transitivizer mobile

  • -tu: "+TR"

  • slot-34.aff: Causatives

  • -l: "+CA"

  • -m: "+CA"

  • slot-35.aff: Experience & oblique object

  • -ma: Experience "+EXP"

  • -ye: Oblique object "+OO"

  • slot-36S.aff: Stem formative

  • -Ø: "+SFR"

  • -nge: "+SFR"

  • -tu: "+SFR"

  • -ye: "+SFR"

  • slot-36V.aff: Verbalisers

  • -Ø: "+VRB"

  • -l: "+VRB"

  • -nge: "+VRB"

  • -ntu: "+VRB"

  • -tu: "+VRB"

  • -ye: "+VRB"

11.2.4 Nominal suffixes

  • CC.aff: Class-changing

  • -chi: Adjectiviser "+ADJ"

  • -tu: Adverbializer "+ADV"

  • -ñma: Adverbializer "+ADV"

  • INST.aff: Instrumental

  • -mew -mu: "+INST"

  • NCC.aff: Non class-changing

  • -em: Ex (discontinuative) "+EX"

  • -ke: Distributive "+DISTR"

  • -ntu: Group "+GR"

  • -rke: Reportative "+REP"

  • -we: Temporal "+TEMP"

  • -wen: Relative "+REL"

  • NOM.aff: Nominalisers

  • -Ø: Nominaliser "+NOM"

  • -fal: Doable "+ADJDO"

  • -fe: Agentive "+NOMAG"

  • -nten: Quick & easy "+ADJQE"

  • -we: Place or instrument "+NOMPI"

11.2.5 Other suffixes

  • OS.aff: Aimless/involuntary

  • -püda: "+AIML"
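Because one form can fill several slots, the tables above can also be read as an index from a suffix form to its candidate readings, which is the kind of ambiguity discussed for -mu- earlier in this article. The Python sketch below is a minimal illustration built from a hand-copied subset of these tables (only -mu, -me, -m, -tu and -nge, and no mobile files); the data layout and function names are ours and do not reflect how the FOMA source files are organised.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hand-copied subset of annex 11.2: (form, file, tag)
ANNEX_SUBSET = [
    ("mu", "slot-01.aff", "+DS3A"),
    ("mu", "slot-07.aff", "+PLPF"),
    ("mu", "slot-23.aff", "+2A"),
    ("mu", "INST.aff", "+INST"),
    ("me", "slot-20.aff", "+TH"),
    ("m", "slot-03.aff", "+2"),
    ("m", "slot-04NF.aff", "+IVN"),
    ("m", "slot-34.aff", "+CA"),
    ("tu", "slot-16.aff", "+RE"),
    ("tu", "slot-33.aff", "+TR"),
    ("tu", "slot-36S.aff", "+SFR"),
    ("tu", "slot-36V.aff", "+VRB"),
    ("tu", "CC.aff", "+ADV"),
    ("nge", "slot-03PTMT.aff", "+IMP2SG"),
    ("nge", "slot-23.aff", "+PASS"),
    ("nge", "slot-36S.aff", "+SFR"),
    ("nge", "slot-36V.aff", "+VRB"),
]


def build_index(entries) -> Dict[str, List[Tuple[str, str]]]:
    """Map each suffix form to the (file, tag) pairs it can realise."""
    index = defaultdict(list)
    for form, source, tag in entries:
        index[form].append((source, tag))
    return dict(index)


INDEX = build_index(ANNEX_SUBSET)


def candidates(form: str) -> List[Tuple[str, str]]:
    """Candidate readings for a form written with or without hyphens, e.g. '-mu-'."""
    return INDEX.get(form.strip("-"), [])


for source, tag in candidates("-mu-"):
    print(source, tag)
```

In the analyser itself these alternatives are narrowed down by the slot order and by co-occurrence restrictions such as the one Smeets states for -mu-.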

11.2.6 Examples by suffixes

Verbalizer -Ø- (slot 36):

E25, E26, E55, E59, E61, E62, E64, E69, E128, E136, E141, E170, E192, E200, E205, E209, E225, E226, E233, E237.

Verbalizer -l- (slot 36):

E50, E65.

Verbalizer -nge- (slot 36):

E2, E5, E23, E47, E49, E52, E53, E56, E57, E63, E145, E193, E194.

Verbalizer -ntu- (slot 36):

E66.

Verbalizer -tu- (slot 36):

E26, E30, E51, E52, E53, E