Parser Training with Heterogeneous Treebanks

by Sara Stymne et al.

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0--3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.








1 Introduction

In this paper we investigate how to train monolingual parsers in the situation where several treebanks are available for a single language. This is quite a common occurrence; in release 2.1 of the Universal Dependencies (UD) treebanks Nivre et al. (2017), 25 languages have more than one treebank. These treebanks can differ in several respects: they can contain material from different language variants, domains, or genres, and written or spoken material. Even though the UD project provides guidelines for consistent annotation, treebanks can still differ with respect to annotation choices, consistency and quality of annotation. Some treebanks are thoroughly checked by human annotators, whereas others are based entirely on automatic conversions. All this means that it is often far from trivial to combine multiple treebanks for the same language.

The 2017 CoNLL Shared Task on Universal Dependency Parsing Zeman et al. (2017) included 15 languages with multiple treebanks. An additional parallel test set of 1000 sentences, PUD, was also made available for a selection of languages. Most of the participating teams did not take advantage of the multiple treebanks, however, and simply trained one model per treebank instead of one model per language. There were a few exceptions to this rule, but these teams typically did not investigate the effect of their proposed strategies in detail.

In this paper we begin by performing a thorough investigation of previously proposed strategies for training with multiple treebanks for the same language. We then propose a novel method, based on treebank embeddings. Our new technique has the advantage of producing a single flexible model for each language, regardless of the number of treebanks. We show that this method leads to substantial improvements for many languages. Of the competing methods, training on the concatenation of treebanks, followed by fine-tuning for each treebank, also performed well, but this method results in longer training times and necessitates multiple unwieldy models per language.

2 Training with Multiple Treebanks

The most obvious way to combine treebanks for a particular language, provided that they use the same annotation scheme, is simply to concatenate the training sets. This has the advantage that it does not require any modifications to the parser itself, and it produces a single model that can be directly used for any input from the language in question. Björkelund et al. (2017) and Das et al. (2017) used this strategy to parse the PUD test sets in the 2017 CoNLL Shared Task. Few details are given about the results, but while the strategy was successful on dev data for most languages, results were mixed on the actual PUD test sets. For the two Norwegian language variants, concatenation has been proposed by Velldal et al. (2017), but it hurts results unless combined with machine translation.

Training on concatenated treebanks can be improved by a subsequent fine-tuning step. In this set-up, after training the model on concatenated data, it is refined for each treebank by training only on its own training set for a few additional epochs. This enables the models to learn differences between treebanks, but it requires more training and results in separate models for each treebank. When the parser is applied to new data, there is thus a choice of which fine-tuned version to use. This approach was used by Che et al. (2017) and Shi et al. (2017) for languages with multiple treebanks in the CoNLL 2017 Shared Task. Che et al. (2017) apply fine-tuning to all but the largest treebank for each language, and show average gains of 1.8 LAS for a subset of nine treebanks. Shi et al. (2017) show that the choice of treebank for parsing the PUD test set is important, but do not have any specific evaluation of the effect of fine-tuning.

Another approach, not explored in this paper, is shared gated adversarial networks, proposed by Sato et al. (2017) for the CoNLL 2017 Shared Task. They use treebank prediction as an adversarial task. In this model, treebank-specific BiLSTMs are constructed for all treebanks, in addition to a shared BiLSTM that is used both for parsing and for the adversarial task. This method requires knowing at test time which treebank the input belongs to. Sato et al. (2017) show that this strategy can give substantial improvements, especially for small treebanks. For large treebanks, however, there are mostly no or only minor improvements.

Our approach for taking advantage of multiple treebanks is to use a treebank embedding to represent the treebank to which a sentence belongs. In our proposed model, all parameters are shared; the treebank embedding facilitates soft sharing between treebanks at the word level, and allows the parser to learn treebank-specific phenomena. At test time, a treebank identifier has to be given for the input data. A key benefit of using treebank embeddings is that we can train a single model for each language using all available data while remaining sensitive to the differences between treebanks. The addition of treebank embeddings requires only minor modifications to the parser (see Section 3.1). To the best of our knowledge, applying such embeddings in the monolingual case, as treebank embeddings, is novel. The most similar approach we have found in the literature is that of Lim and Poibeau (2017), who used one-hot treebank representations to combine data for improving monolingual parsing for three tiny treebanks, with improvements of 0.6–1.9 LAS. It is also related to work on domain embeddings for machine translation Kobus et al. (2017) and language embeddings for parsing Ammar et al. (2016).

We previously used a similar architecture for combining languages with very small training sets with additional languages de Lhoneux et al. (2017a). Language embeddings have also been explored for other cross-lingual tasks such as language modeling Tsvetkov et al. (2016); Östling and Tiedemann (2017) and POS-tagging Bjerva and Augenstein (2018). Cross-lingual parsing, however, often requires substantially more complex models. They typically include features such as multilingual word embeddings Ammar et al. (2016), linguistic re-write rules Aufrant et al. (2016), or machine translation Tiedemann (2015). Unlike much work on cross-lingual parsing, we do not focus on a low-resource scenario.

3 Experimental Setup

We perform experiments for 24 treebanks from 9 languages, using UUParser (de Lhoneux et al., 2017a, b). We compare concatenation (concat), concatenation with fine-tuning (c+ft), and treebank embeddings (tb-emb). In addition we compare these results to using only single treebanks for training (single). While some of these methods were previously suggested in the literature, no proper evaluation and comparison between them has been performed. For the PUD test data, there is no corresponding training set, so we need to choose a model or set a treebank embedding based on some other treebank. We call this a proxy treebank.

For evaluation we use labeled attachment score (LAS). Significance testing is performed using a randomization test, with the script from the CoNLL 2017 Shared Task.
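The paired randomization test can be sketched as follows. This is a generic implementation over sentence-level scores, not the shared-task script itself; the function name and defaults are our own:

```python
import random

def randomization_test(scores_a, scores_b, trials=10000, seed=42):
    """Paired randomization test: randomly swap the two systems' scores
    per sentence and count how often the mean difference is at least as
    large as the observed one."""
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    hits = 0
    for _ in range(trials):
        diff = sum((a - b) if rng.random() < 0.5 else (b - a)
                   for a, b in zip(scores_a, scores_b))
        if abs(diff) / n >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one smoothing for the p-value

# Identical score lists can never differ significantly:
assert randomization_test([80.0] * 20, [80.0] * 20) == 1.0
```

A low p-value (e.g. below 0.05) indicates that the score difference between the two systems is unlikely to arise from random pairing alone.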

3.1 The Parser

We use UUParser de Lhoneux et al. (2017a), which is based on the transition-based parser of Kiperwasser and Goldberg (2016) and adapted to UD. It uses the arc-hybrid transition system from Kuhlmann et al. (2011), extended with a Swap transition and a static-dynamic oracle, as described in de Lhoneux et al. (2017b). This allows the construction of non-projective dependency trees Nivre (2009).

A configuration is represented by a feature function $\phi(\cdot)$ over a subset of its elements and, for each configuration, transitions are scored by a classifier. In this case, the classifier is a multi-layer perceptron (MLP) and $\phi(\cdot)$ is a concatenation of the BiLSTM vectors $v_i$ of words on top of the stack and at the beginning of the buffer. The MLP scores transitions together with the arc labels for transitions that involve adding an arc.

For an input sentence of length $n$ with words $w_1, \ldots, w_n$, the parser creates a sequence of vectors $x_{1:n}$, where the vector $x_k$ representing $w_k$ is the concatenation of a word embedding $e(w_k)$ and a character vector, obtained by running a BiLSTM over the characters $ch_{1:m}$ of $w_k$:

$x_k = [e(w_k); \mathrm{BiLSTM}(ch_{1:m})]$

Note that no POS-tags or morphological features are used in this parser.

In the tb-emb setup, we also concatenate a treebank embedding $tb(w_k)$ to the representation of $w_k$:

$x_k = [e(w_k); \mathrm{BiLSTM}(ch_{1:m}); tb(w_k)]$

Finally, each input element is represented by a BiLSTM vector, $v_k$:

$v_k = \mathrm{BiLSTM}(x_{1:n}, k)$
All embeddings are initialized randomly and trained together with the BiLSTMs and MLP. For hyperparameter settings we used default values from de Lhoneux et al. (2017a). The dimension of the treebank embedding is set to 12 in our experiments; we saw only small and inconsistent changes when varying the number of dimensions. We train the parser for 30 epochs per setting. For c+ft we apply fine-tuning for an additional 10 epochs for each treebank. We pick the best epoch based on LAS score on the dev set, using average dev scores when training on more than one treebank, and apply the model from this epoch to the test data.
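The epoch-selection rule can be sketched as follows; the function and variable names are illustrative:

```python
def best_epoch(dev_las):
    """dev_las: {treebank: [dev LAS at epoch 1, epoch 2, ...]}.
    Returns the 1-based epoch whose dev LAS, averaged over treebanks,
    is highest; the model from this epoch is applied to the test data."""
    n_epochs = len(next(iter(dev_las.values())))
    averages = [
        sum(scores[e] for scores in dev_las.values()) / len(dev_las)
        for e in range(n_epochs)
    ]
    return max(range(n_epochs), key=lambda e: averages[e]) + 1

# Hypothetical dev curves for two treebanks of one language:
curves = {"talbanken": [70.1, 72.3, 71.8], "lines": [68.0, 69.5, 69.9]}
assert best_epoch(curves) == 2  # epoch averages: 69.05, 70.9, 70.85
```

For single-treebank training the average reduces to that treebank's own dev curve.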

3.2 Data

We performed all experiments on UD version 2.1 treebanks Nivre et al. (2017), using gold sentence and word segmentation. We selected 9 languages, based on the criteria that they should have at least two treebanks with fully available training data and a PUD test set. The sizes of the training corpora for the 9 languages are shown in Table 1. The situation is quite different across languages, with either treebanks of roughly the same size, as for Spanish, or very skewed data sizes with a mix of large and small treebanks, as for Czech. In all cases we use all available data, except for Czech, where we randomly choose a maximum of 15,000 sentences per treebank per epoch for efficiency reasons.
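The per-epoch subsampling used for Czech can be sketched as follows; the function name and toy sizes are illustrative:

```python
import random

def epoch_sample(treebanks, cap=15000, seed=None):
    """Build one epoch's training set: every treebank is used in full,
    except that treebanks larger than `cap` sentences are randomly
    subsampled, as done for Czech for efficiency reasons."""
    rng = random.Random(seed)
    sample = []
    for name, sentences in treebanks.items():
        if len(sentences) > cap:
            sample.extend(rng.sample(sentences, cap))
        else:
            sample.extend(sentences)
    return sample

# Toy sizes mirroring the skewed Czech setting:
toy = {"pdt": list(range(40)), "cltt": list(range(5))}
assert len(epoch_sample(toy, cap=10)) == 15  # 10 sampled + 5 in full
```

Resampling each epoch means the model eventually sees most of the large treebank while each epoch stays bounded in size.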

Same treebank test set PUD test set
Language Treebank Size single concat c+ft tb-emb single concat c+ft tb-emb
Czech PDT 68495 86.7 87.5 88.3 87.2 81.7 81.7 81.6 81.2
CAC 23478 86.0 87.8 88.1 88.5 75.0 81.3 81.1
FicTree 10160 84.3 89.3 89.5 89.2 66.1 79.8 80.3
CLTT 860 72.5 86.2 86.9 86.0 42.1 80.8 80.9
English EWT 12543 82.2 82.1 82.5 83.0 80.7 80.0 81.7 81.9
LinES 2738 72.1 76.7 77.3 77.3 62.6 75.9 74.5
ParTUT 1781 80.5 83.5 85.4 85.7 68.0 78.1 76.9
Finnish FTB 14981 76.4 74.4 80.1 80.6 46.7 73.0 54.6 53.1
TDT 12217 78.1 70.6 80.6 80.3 78.6 81.3 80.9
French FTB 14759 83.2 83.2 83.9 84.1 72.0 79.4 76.7 74.1
GSD 14554 84.5 84.1 85.3 85.6 79.1 80.2 80.3
Sequoia 2231 84.0 86.0 89.8 89.1 69.5 78.1 77.6
ParTUT 803 79.8 80.5 89.1 90.3 63.4 78.8 77.5
Italian ISDT 12838 87.7 87.9 87.7 87.6 85.4 86.0 85.7 86.0
PoSTWITA 2808 71.4 76.7 76.8 77.0 68.5 85.7 85.3
ParTUT 1781 83.4 89.2 89.3 88.8 77.4 85.8 86.1
Portuguese GSD 9664 88.3 87.3 89.0 89.1 74.0 76.8 75.2 74.9
Bosque 8331 84.7 84.2 86.2 86.3 75.2 77.5 77.6
Russian SynTagRus 48814 90.2 89.4 90.4 90.4 66.0 68.7 66.3 66.4
GSD 3850 74.7 73.4 79.8 80.8 70.1 77.6 78.0
Spanish AnCora 14305 87.2 86.2 87.5 87.6 75.2 79.9 77.7 76.4
GSD 14187 84.7 83.0 85.8 86.2 79.8 80.8 80.9
Swedish Talbanken 4303 79.6 79.1 80.2 80.6 70.3 72.0 73.2 73.6
LinES 2738 74.3 76.8 77.3 77.1 64.0 70.0 69.0
Average 81.4 82.7 84.9 84.9 77.9 77.5 80.0 80.1
Table 1: LAS scores when testing on the training treebank and on the PUD test set with training treebank as proxy. For each test set, the best result is marked with bold. Treebank size is given as number of sentences in the training data. Statistically significant differences, at the 0.05-level, from single are marked with +, from concat with and from both these systems with *. For clarity, significance for PUD is only shown for the proxy treebank with the highest score.

Ett vittne berättade för polisen att offret hade attackerat den misstänkte i april .
A witness related for the-police that the-victim had attacked the suspected in April .

[Dependency tree not reproducible in plain text: the correct analysis attaches "attackerat" to "berättade" as ccomp; the erroneous analysis (dashed arc) attaches it to "polisen" as acl:relcl.]

Figure 1: Example sentence from the Swedish PUD treebank with parsing error represented by dashed arc. Translation: “A witness told the police that the victim had attacked the suspect in April.”

4 Results

Table 1 shows the results on the test sets of each training treebank and on the PUD test sets. Overall we observe substantial gains when using either c+ft or tb-emb. On average both c+ft and tb-emb beat single by 3.5 LAS points and concat by over 2.0 LAS points when testing on the test sets of the treebanks used for training, and both methods beat both baselines by over 2.0 LAS points for the PUD test set, if we consider the best proxy treebank.

We see positive gains across many scenarios when using c+ft and tb-emb. First, there are gains for both balanced and unbalanced data sizes, as in the cases of Spanish and French, respectively. Secondly, there are cases with different language variants, as for Portuguese, and different domains, as for Finnish where FTB only contains grammar examples and TDT contains a mix of domains. There are also cases of known differences in annotation choices, as for the Swedish treebanks.

When the data is very skewed, as for Russian, the effect of adding a small treebank to a large one is minor, as expected. While our results are not directly comparable to the adversarial learning of Sato et al. (2017), who used a different parser and test set, the improvements of c+ft and tb-emb are typically at least on par with, and often larger than, theirs. While our improvements are, unsurprisingly, largest for smaller treebanks, we also see some improvements for large treebanks, in contrast to Sato et al. (2017).

Some variation can be observed between languages. In two cases, Italian ISDT and Czech PUD, concat performs marginally better than the more advanced methods, but these differences are not statistically significant. In several cases, especially for small treebanks, concat helps noticeably over single, whereas it actually hurts for Finnish and Russian. It is, however, nearly always better to combine treebanks in some way than to use only a single treebank. The differences between the two best methods, c+ft and tb-emb, are typically small and not statistically significant, with the exception of Czech PDT and some of the small proxy treebanks for PUD.

The PUD test set can be seen as an example of applying the proposed models to unseen data, without matching training data. For all languages, except Czech, the results for c+ft and tb-emb with the best proxy treebank are significantly better than the equivalent result for single, and for six of the nine languages, tb-emb performs significantly better than concat. It is clear that some treebanks are bad fits to PUD, most notably Finnish FTB and Russian SynTagRus. However, even when a treebank is a bad fit, tb-emb and c+ft can still improve substantially over using only the single model for the treebank with the best fit, as for Russian where there is a gain of nearly 8 LAS points for tb-emb over single, when using GSD as a proxy. For some languages, however, most notably Italian, the choice of proxy treebank makes little difference for tb-emb and c+ft. It is also interesting to see that in many cases it is not the largest treebank that is the best proxy for PUD. The large difference in results for PUD, depending on which treebank was used as proxy, also seems to point at potential inconsistencies in the UD annotation for several languages.

5 Error Analysis

To complement the LAS scores, we performed a small manual error analysis for Swedish, looking at the results for the PUD data when parsed using different methods and proxy treebanks. The two Swedish treebanks, Talbanken and LinES, are known to differ in the annotation of a few constructions, notably relative clauses and prepositions that take subordinate clauses as complements. The error analysis reveals that the treebank embedding approach allows the model to learn the distinctive "style" of each treebank, while concatenation, even with fine-tuning, results in more inconsistencies in the output. A typical example is shown in Figure 1. When trained with treebank embeddings (and Talbanken as the proxy treebank), the parser produces the correct tree. When trained with fine-tuning instead, the parser incorrectly analyzes the subordinate clause as a relative clause (shown by the dashed arc), because the mark relation is also used for relative pronouns in the LinES treebank, despite the fact that such structures never occur in Talbanken.

6 Conclusion and Future Work

We have conducted the first large-scale study on how best to combine multiple treebanks for a single language, when all treebanks use the same annotation scheme but may be heterogeneous with respect to domain, genre, size, language variant, annotation style, and quality, as is the case for many languages in the UD project. We propose using treebank embeddings, which represent the treebank a sentence comes from. This method is simple, effective, and flexible, and performs on par with a previously suggested method of using concatenation in combination with fine-tuning, which, however, requires longer training, and produces more models.

We show that both these methods give substantial gains for a variety of languages, including different scenarios with respect to the mix of available treebanks. Our results are also at least on par with a previously proposed, but more complex model, based on adversarial learning Sato et al. (2017). To improve parsing accuracy, it is certainly worth combining multiple treebanks, when available, for a language, using more sophisticated methods than simple concatenation. We recommend the treebank embedding model due to its simplicity.

The proposed methods work well with a transition-based parser with BiLSTM feature extractors without POS-tags or pre-trained embeddings. In future work, we want to investigate how these methods interact with other parsers, and if the combination methods are useful also for tasks like POS-tagging and morphology prediction.

We did not yet investigate methods for choosing a proxy treebank when parsing new data. The results on the PUD test set could indicate which treebank is likely to be the best proxy for the languages explored here. Other factors that could be taken into account when making this choice include degree of domain match and treebank quality. The user may also simply choose the desired annotation style by selecting the corresponding proxy treebank. For the tb-emb approach, interpolation of the various treebank embeddings is another possibility.
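Such interpolation could be sketched as a weighted average of the learned embedding vectors; this is a hypothetical illustration, not something evaluated here:

```python
def interpolate(embeddings, weights):
    """Weighted interpolation of treebank embedding vectors -- a possible
    way to set the embedding for input that matches no single training
    treebank. Weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings))
            for i in range(dim)]

# Two toy 2-dimensional treebank embeddings, mixed equally:
tb_a = [1.0, 0.0]
tb_b = [0.0, 1.0]
assert interpolate([tb_a, tb_b], [0.5, 0.5]) == [0.5, 0.5]
```

The interpolated vector would replace the single treebank embedding $tb(w_k)$ in the input representation at parsing time.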

In the current paper, we explore only the monolingual case, using several treebanks for a single language. Preliminary experiments show that we can combine treebank and language embeddings and add other languages to the mix. Including closely related languages typically gives additional gains, which we will explore in future work.


We gratefully acknowledge funding from the Swedish Research Council (P2016-01817) and computational resources on the Taito-CSC cluster in Helsinki from NeIC-NLPL.


  • Ammar et al. (2016) Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah Smith. 2016. Many languages, one parser. Transactions of the Association of Computational Linguistics, 4:431–444.
  • Aufrant et al. (2016) Lauriane Aufrant, Guillaume Wisniewski, and François Yvon. 2016. Zero-resource dependency parsing: Boosting delexicalized cross-lingual transfer with linguistic knowledge. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), pages 119–130, Osaka, Japan.
  • Bjerva and Augenstein (2018) Johannes Bjerva and Isabelle Augenstein. 2018. Tracking typological traits of Uralic languages in distributed language representations. In Proceedings of the Fourth International Workshop on Computatinal Linguistics of Uralic Languages, pages 78–88, Helsinki, Finland.
  • Björkelund et al. (2017) Anders Björkelund, Agnieszka Falenska, Xiang Yu, and Jonas Kuhn. 2017. IMS at the CoNLL 2017 UD shared task: CRFs and perceptrons meet neural networks. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 40–51, Vancouver, Canada.
  • Che et al. (2017) Wanxiang Che, Jiang Guo, Yuxuan Wang, Bo Zheng, Huaipeng Zhao, Yang Liu, Dechuan Teng, and Ting Liu. 2017. The HIT-SCIR system for end-to-end parsing of Universal Dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 52–62, Vancouver, Canada.
  • Das et al. (2017) Ayan Das, Affan Zaffar, and Sudeshna Sarkar. 2017. Delexicalized transfer parsing for low-resource languages using transformed and combined treebanks. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 182–190, Vancouver, Canada.
  • Kiperwasser and Goldberg (2016) Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association of Computational Linguistics, 4:313–327.
  • Kobus et al. (2017) Catherine Kobus, Josep Crego, and Jean Senellart. 2017. Domain control for neural machine translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP’17, pages 372–378, Varna, Bulgaria.
  • Kuhlmann et al. (2011) Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies, pages 673–682, Portland, Oregon, USA.
  • de Lhoneux et al. (2017a) Miryam de Lhoneux, Yan Shao, Ali Basirat, Eliyahu Kiperwasser, Sara Stymne, Yoav Goldberg, and Joakim Nivre. 2017a. From raw text to universal dependencies - look, no tags! In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 207–217, Vancouver, Canada.
  • de Lhoneux et al. (2017b) Miryam de Lhoneux, Sara Stymne, and Joakim Nivre. 2017b. Arc-hybrid non-projective dependency parsing with a static-dynamic oracle. In Proceedings of the 15th International Conference on Parsing Technologies, pages 99–104, Pisa, Italy.
  • Lim and Poibeau (2017) KyungTae Lim and Thierry Poibeau. 2017. A system for multilingual dependency parsing based on bidirectional LSTM feature representations. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 63–70, Vancouver, Canada.
  • Nivre (2009) Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore.
  • Nivre et al. (2017) Joakim Nivre, Željko Agić, Lars Ahrenberg, Lene Antonsen, Maria Jesus Aranzabe, Masayuki Asahara, Luma Ateyah, Mohammed Attia, Aitziber Atutxa, Liesbeth Augustinus, Elena Badmaeva, Miguel Ballesteros, Esha Banerjee, Sebastian Bank, Verginica Barbu Mititelu, John Bauer, Kepa Bengoetxea, Riyaz Ahmad Bhat, Eckhard Bick, Victoria Bobicev, Carl Börstell, Cristina Bosco, Gosse Bouma, Sam Bowman, Aljoscha Burchardt, Marie Candito, Gauthier Caron, Gülşen Cebiroğlu Eryiğit, Giuseppe G. A. Celano, Savas Cetin, Fabricio Chalub, Jinho Choi, Silvie Cinková, Çağrı Çöltekin, Miriam Connor, Elizabeth Davidson, Marie-Catherine de Marneffe, Valeria de Paiva, Arantza Diaz de Ilarraza, Peter Dirix, Kaja Dobrovoljc, Timothy Dozat, Kira Droganova, Puneet Dwivedi, Marhaba Eli, Ali Elkahky, Tomaž Erjavec, Richárd Farkas, Hector Fernandez Alcalde, Jennifer Foster, Cláudia Freitas, Katarína Gajdošová, Daniel Galbraith, Marcos Garcia, Moa Gärdenfors, Kim Gerdes, Filip Ginter, Iakes Goenaga, Koldo Gojenola, Memduh Gökırmak, Yoav Goldberg, Xavier Gómez Guinovart, Berta Gonzáles Saavedra, Matias Grioni, Normunds Grūzītis, Bruno Guillaume, Nizar Habash, Jan Hajič, Jan Hajič jr., Linh Hà Mỹ, Kim Harris, Dag Haug, Barbora Hladká, Jaroslava Hlaváčová, Florinel Hociung, Petter Hohle, Radu Ion, Elena Irimia, Tomáš Jelínek, Anders Johannsen, Fredrik Jørgensen, Hüner Kaşıkara, Hiroshi Kanayama, Jenna Kanerva, Tolga Kayadelen, Václava Kettnerová, Jesse Kirchner, Natalia Kotsyba, Simon Krek, Veronika Laippala, Lorenzo Lambertino, Tatiana Lando, John Lee, Phương Lê Hồng, Alessandro Lenci, Saran Lertpradit, Herman Leung, Cheuk Ying Li, Josie Li, Keying Li, Nikola Ljubešić, Olga Loginova, Olga Lyashevskaya, Teresa Lynn, Vivien Macketanz, Aibek Makazhanov, Michael Mandl, Christopher Manning, Cătălina Mărănduc, David Mareček, Katrin Marheinecke, Héctor Martínez Alonso, André Martins, Jan Mašek, Yuji Matsumoto, Ryan McDonald, Gustavo Mendonça, Niko Miekka, Anna Missilä, Cătălin 
Mititelu, Yusuke Miyao, Simonetta Montemagni, Amir More, Laura Moreno Romero, Shinsuke Mori, Bohdan Moskalevskyi, Kadri Muischnek, Kaili Müürisep, Pinkey Nainwani, Anna Nedoluzhko, Gunta Nešpore-Bērzkalne, Luong Nguyen Thị, Huyen Nguyen Thị Minh, Vitaly Nikolaev, Hanna Nurmi, Stina Ojala, Petya Osenova, Robert Östling, Lilja Øvrelid, Elena Pascual, Marco Passarotti, Cenel-Augusto Perez, Guy Perrier, Slav Petrov, Jussi Piitulainen, Emily Pitler, Barbara Plank, Martin Popel, Lauma Pretkalniņa, Prokopis Prokopidis, Tiina Puolakainen, Sampo Pyysalo, Alexandre Rademaker, Loganathan Ramasamy, Taraka Rama, Vinit Ravishankar, Livy Real, Siva Reddy, Georg Rehm, Larissa Rinaldi, Laura Rituma, Mykhailo Romanenko, Rudolf Rosa, Davide Rovati, Benoît Sagot, Shadi Saleh, Tanja Samardžić, Manuela Sanguinetti, Baiba Saulīte, Sebastian Schuster, Djamé Seddah, Wolfgang Seeker, Mojgan Seraji, Mo Shen, Atsuko Shimada, Dmitry Sichinava, Natalia Silveira, Maria Simi, Radu Simionescu, Katalin Simkó, Mária Šimková, Kiril Simov, Aaron Smith, Antonio Stella, Milan Straka, Jana Strnadová, Alane Suhr, Umut Sulubacak, Zsolt Szántó, Dima Taji, Takaaki Tanaka, Trond Trosterud, Anna Trukhina, Reut Tsarfaty, Francis Tyers, Sumire Uematsu, Zdeňka Urešová, Larraitz Uria, Hans Uszkoreit, Sowmya Vajjala, Daniel van Niekerk, Gertjan van Noord, Viktor Varga, Eric Villemonte de la Clergerie, Veronika Vincze, Lars Wallin, Jonathan North Washington, Mats Wirén, Tak-sum Wong, Zhuoran Yu, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman, and Hanzhi Zhu. 2017. Universal dependencies 2.1. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
  • Östling and Tiedemann (2017) Robert Östling and Jörg Tiedemann. 2017. Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 644–649, Valencia, Spain.
  • Sato et al. (2017) Motoki Sato, Hitoshi Manabe, Hiroshi Noji, and Yuji Matsumoto. 2017. Adversarial training for cross-domain universal dependency parsing. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 71–79, Vancouver, Canada.
  • Shi et al. (2017) Tianze Shi, Felix G. Wu, Xilun Chen, and Yao Cheng. 2017. Combining global models for parsing universal dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 31–39, Vancouver, Canada.
  • Tiedemann (2015) Jörg Tiedemann. 2015. Cross-lingual dependency parsing with universal dependencies and predicted PoS labels. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 340–349, Uppsala, Sweden. Uppsala University, Uppsala, Sweden.
  • Tsvetkov et al. (2016) Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W Black, Lori Levin, and Chris Dyer. 2016. Polyglot neural language models: A case study in cross-lingual phonetic representation learning. In Proceedings of the 2016 Conference of the NAACL: Human Language Technologies, pages 1357–1366, San Diego, California, USA.
  • Velldal et al. (2017) Erik Velldal, Lilja Øvrelid, and Petter Hohle. 2017. Joint UD parsing of Norwegian Bokmål and Nynorsk. In Proceedings of the 21st Nordic Conference on Computational Linguistics (NODALIDA’17), pages 1–10, Gothenburg, Sweden.
  • Zeman et al. (2017) Daniel Zeman, Martin Popel, Milan Straka, Hajič Jan, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinkova, Jan Hajič jr., Jaroslava Hlavacova, Václava Kettnerová, Zdenka Uresova, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria dePaiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonca, Tatiana Lando, Rattima Nitisaroj, and Josie Li. 2017. CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 1–19, Vancouver, Canada.