
ELIT: Emory Language and Information Toolkit

by   Han He, et al.
Emory University

We introduce ELIT, the Emory Language and Information Toolkit, which is a comprehensive NLP framework providing transformer-based end-to-end models for core tasks with a special focus on memory efficiency while maintaining state-of-the-art accuracy and speed. Compared to existing toolkits, ELIT features an efficient Multi-Task Learning (MTL) model with many downstream tasks that include lemmatization, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic role labeling, and AMR parsing. The backbone of ELIT's MTL framework is a pre-trained transformer encoder that is shared across tasks to speed up their inference. ELIT provides pre-trained models developed on a remix of eight datasets. To scale up its service, ELIT also integrates a RESTful Client/Server combination. On the server side, ELIT extends its functionality to cover other tasks such as tokenization and coreference resolution, providing an end user with agile research experience. All resources including the source codes, documentation, and pre-trained models are publicly available at



1 Introduction

The open source community has contributed many natural language processing (NLP) toolkits to research and industry organizations, lowering the barrier of entry to access computational structures. Despite their wide usage, many NLP toolkits suffer from a major limitation: their architectures are bound to a pipeline design (Manning et al., 2014; Straka and Straková, 2017; Gardner et al., 2018; Akbik et al., 2019; Qi et al., 2020), which leads to error propagation, large memory consumption, and high latency. Although industry toolkits such as spaCy have started to exploit Multi-Task Learning (MTL), key components such as semantic role labeling, constituency parsing, and Abstract Meaning Representation (AMR) parsing are not generally available. Additionally, they cannot serve massive concurrent requests from the web because they lack efficient request batching and a corresponding multi-processing mechanism.

Figure 1: The overview of the MTL framework in ELIT. ELIT takes either plain text or a tokenized document as input, encodes it with a transformer encoder, and decodes multiple NLP tasks in parallel. The outputs of the NLP decoders are aggregated and presented as a JSON structure to the end user. Additionally, ELIT features a RESTful API for agile development.

In the face of these challenges, we introduce ELIT, an efficient yet accurate and fast NLP toolkit supporting the largest number of core NLP tasks, powered by state-of-the-art transformer encoders. Compared to existing popular NLP toolkits, ELIT offers the following advantages:

Figure 2: A sliding window example for the sentence “Emory NLP is a research lab in Atlanta .” with the maximum window size . In each window, the tokens highlighted in green will be used in the ultimate outputs.
  • MTL Framework ELIT is powered by an efficient MTL framework to accommodate many NLP tasks, spanning from surface tasks, syntactic tasks to semantic tasks.

  • State-of-the-Art Performance Backed up by the latest transformer encoder and decoders, ELIT establishes state-of-the-art on most tasks and yields comparable results on the others.

  • Requests Batching for Concurrency In order to scale up its inference, ELIT provides a built-in multi-worker web server featured with an efficient requests batching mechanism.

ELIT embraces the NLP community with a fully open source license and lots of pre-trained models for public download. We hope ELIT can facilitate NLP research and applications, and bring the benefits of the NLP techniques to broader audiences.

2 System Architecture

From the user’s view, ELIT provides two sets of APIs: (1) a native multi-task learning framework coupled with a rule-based tokenizer (see Figure 1); (2) a RESTful client interface to the MTL server. In this section, we introduce their architecture designs.

2.1 Architecture

Multi-Task Learning (MTL)

Given a tokenized sentence (or document), ELIT employs a sub-word tokenizer to further tokenize each token into sub-tokens. These sub-tokens are fed into a fine-tuned Transformer Encoder (TE) to get their contextualized embeddings. For those sentences longer than the maximum input length of the TE, a sliding window routine is invoked to mitigate the issue.

Algorithm 1: Sliding Window

Function sliding_window():
    …           // stride
    … []        // windows
    for … to … by … do
        …
Function restore():
    …           // stride
    …           // start offset
    … []        // restored sequence
    for … to … by … do
        if … then
            …

As in Algorithm 1, the sliding_window sub-routine takes a list of sub-tokens as input and slices it into windows of a bounded maximum size, which are fed into the Transformer Encoder. The hidden states are then restored by the restore sub-routine to match the original sequence, such that only the inner part of each window is used (Figure 2). Next, average pooling is applied to the sub-token embeddings to obtain the corresponding token embeddings. Finally, the token embeddings are fed into the decoder of every task in parallel. This MTL architecture amortizes the cost of the TE across decoders, yielding lower overall latency than running the TE individually for each decoder. Its compact structure also reduces the training cost and deployment effort.
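The two sub-routines can be sketched in Python as follows. This is a minimal sketch, not ELIT's implementation: the stride (half the window size) and the start offset (a quarter of the window size) are assumptions, since the expressions in Algorithm 1 are not recoverable from the listing, and the encoder call between the two functions is omitted.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_window(tokens: Sequence[T], max_len: int) -> List[List[T]]:
    """Slice `tokens` into overlapping windows of at most `max_len` items.

    Assumption: the stride is half the window size, so neighbours overlap by half.
    """
    if max_len < 4 or max_len % 4:
        raise ValueError("this sketch expects max_len to be a multiple of 4")
    stride = max_len // 2
    windows: List[List[T]] = []
    for start in range(0, len(tokens), stride):
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the current window already reaches the end
    return windows


def restore(windows: List[List[T]], max_len: int) -> List[T]:
    """Stitch per-window outputs back into one sequence, keeping inner parts."""
    stride = max_len // 2
    offset = max_len // 4  # assumed start offset of the inner part
    restored: List[T] = []
    last = len(windows) - 1
    for i, window in enumerate(windows):
        begin = 0 if i == 0 else offset  # first window also keeps its head
        end = len(window) if i == last else offset + stride  # last keeps its tail
        restored.extend(window[begin:end])
    return restored


sentence = "Emory NLP is a research lab in Atlanta .".split()
windows = sliding_window(sentence, max_len=4)
recovered = restore(windows, max_len=4)
```

In a real model, `restore` would operate on the hidden states returned by the encoder for each window rather than on the tokens themselves; the round trip above only illustrates that the kept inner parts tile the original sequence exactly once.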


In scenarios with strict latency requirements, such as a dialogue system or a real-time machine translator, not even the native MTL can respond in time to highly concurrent requests due to the Global Interpreter Lock (GIL) of Python¹. To scale up the inference, we also implement a RESTful Client/Server with requests batching in ELIT, as illustrated in Figure 3.

¹ We also investigated serving techniques free of the GIL, such as PyTorch JIT and ONNX. However, these techniques usually convert the model into a symbolic (static) representation, hampering features such as dynamic task scheduling and length-based batching. Thus, we leave them for future work.

Figure 3: System diagram of the ELIT RESTful server.

On the server side, HTTP requests are bucketed into batches according to their arrival time and predicted concurrently by several worker processes spawned over the CPUs and GPUs. Sentences from different requests may be put into the same batch, as long as the requested tasks are the same. When no worker is idle, a First-In-First-Out (FIFO) queue stores newly arriving requests. On the client side, the requests batching mechanism is transparent, and each user can parse documents in the same way as when using the native MTL API exclusively. Note that both APIs share similar semantics and return the same JSON format, so users can switch APIs without breaking changes.
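The batching behaviour described above can be illustrated with a small, single-process sketch. All names here (`Request`, `next_batch`, `run_batch`, `max_sentences`) are hypothetical and are not part of ELIT's server; worker processes and HTTP handling are left out, and `predict` stands in for a model call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, FrozenSet, List


@dataclass
class Request:
    request_id: int
    tasks: FrozenSet[str]
    sentences: List[str]


def next_batch(queue: Deque[Request], max_sentences: int) -> List[Request]:
    """Pop the oldest request plus later ones that ask for the same tasks."""
    if not queue:
        return []
    head = queue.popleft()
    batch, size = [head], len(head.sentences)
    skipped: List[Request] = []
    while queue and size < max_sentences:
        request = queue.popleft()
        fits = size + len(request.sentences) <= max_sentences
        if request.tasks == head.tasks and fits:
            batch.append(request)
            size += len(request.sentences)
        else:
            skipped.append(request)
    queue.extendleft(reversed(skipped))  # keep arrival order for the rest
    return batch


def run_batch(batch: List[Request],
              predict: Callable[[List[str]], List[str]]) -> Dict[int, List[str]]:
    """Run one prediction over all sentences and split results per request."""
    sentences = [s for request in batch for s in request.sentences]
    outputs = predict(sentences)
    results: Dict[int, List[str]] = {}
    start = 0
    for request in batch:
        end = start + len(request.sentences)
        results[request.request_id] = outputs[start:end]
        start = end
    return results


queue: Deque[Request] = deque([
    Request(1, frozenset({"ner"}), ["Emory NLP is in Atlanta", "It is a lab"]),
    Request(2, frozenset({"dep"}), ["Hello world"]),
    Request(3, frozenset({"ner"}), ["ELIT is a toolkit"]),
])
batch = next_batch(queue, max_sentences=8)
results = run_batch(batch, predict=lambda sents: [s.upper() for s in sents])
```

Requests 1 and 3 share a batch because they ask for the same task, while request 2 stays at the front of the queue for the next worker; each client still receives only its own slice of the outputs.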

2.2 Decoders

Within the MTL architecture, ELIT provides a set of decoders for widely used NLP tasks. Each task is supported by a state-of-the-art decoder. In this section, we will briefly introduce each of them.


ELIT reduces the lemmatization problem to a sequence tagging problem by predicting a tag for each token representing an edit script to transform the token form to its lemma (Chrupała, 2006; Müller et al., 2015; Kondratyuk and Straka, 2019).
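To make the idea of an edit script concrete, the sketch below encodes the form-to-lemma transformation as a small tag and applies it to other tokens. The tag layout (lowercase flag, number of trailing characters to strip, suffix to append) is a simplifying assumption for illustration and is not necessarily the scheme used by ELIT or by the cited work.

```python
from typing import Tuple

# (lowercase the form first?, characters to strip from the end, suffix to append)
EditScript = Tuple[bool, int, str]


def make_script(form: str, lemma: str) -> EditScript:
    """Derive a suffix-based edit script that turns `form` into `lemma`."""
    lower = form != form.lower() and lemma == lemma.lower()
    base = form.lower() if lower else form
    prefix = 0
    while prefix < min(len(base), len(lemma)) and base[prefix] == lemma[prefix]:
        prefix += 1
    return lower, len(base) - prefix, lemma[prefix:]


def apply_script(form: str, script: EditScript) -> str:
    """Apply an edit script (for example, a predicted tag) to a token."""
    lower, strip, suffix = script
    base = form.lower() if lower else form
    return base[:len(base) - strip] + suffix


plural = make_script("labs", "lab")    # strip one character, append nothing
copula = make_script("is", "be")       # replace the whole form
lemma = apply_script("tokens", plural)
```

Because many word pairs share the same script, the tagger only needs to choose among a modest set of tags; here the script learned from "labs" also lemmatizes "tokens".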


For efficiency, a linear layer is used to predict the part-of-speech tags. We acknowledge the effectiveness of character-level and case features (Bohnet et al., 2018; Akbik et al., 2018) for POS. However, the accuracy improvements brought by these features are marginal compared to the latency they introduce in our setting, so we do not integrate them.


A biaffine layer (Dozat and Manning, 2017) is used for the NER task. Different from Yu et al. (2020), we avoid using document level features or the variational BiLSTM for faster decoding speed.


Two biaffine layers (Dozat and Manning, 2017) are used to compute the dependent-head score matrix and label distribution. Then the Chu-Liu/Edmonds algorithm (Chu, 1965; Edmonds, 1967) is applied to the score matrix to decode the maximum spanning tree.


The two-stage CRF (Zhang et al., 2020) decoder is used for CON which is optimized using a tree-structure CRF objective on unlabeled constituents. POS features are avoided to remove the dependency on the POS decoder.


The end-to-end span ranking decoder (He et al., 2018) is used for SRL. Their attention-based span representations are replaced by average pooled embeddings for simplicity.


The graph sequence transduction decoder (He and Choi, 2021) is used for AMR. Linguistic features including POS, NER and LEM embeddings are removed for efficiency.

Separated from the MTL architecture, ELIT also provides two coreference resolution components that perform the traditional document-level decoding (DCR) as well as online decoding (OCR).


For DCR, we adapt the implementation from Xu and Choi (2020), which is based on the end-to-end coreference resolution system with transformer encoders (Joshi et al., 2019, 2020).


For OCR, the model takes the current utterance and the conversation context as input, extracts mentions in the current utterance (including singletons), and resolves their coreference with previously predicted mentions from the conversation history.

3 System Usage

3.1 MTL API

The user interface of ELIT is designed to hide the underlying complexity from end users, enabling agile development of NLP models. To fulfill this goal, ELIT packs all the sophisticated work required by other NLP toolkits, such as downloading models, using the right class to create an instance, loading and deploying it to a GPU where possible, and feeding the outputs from one component to another, into the following three lines of code:

import elit
nlp = elit.load('LEM_POS_NER_DEP_SDP_CON_AMR_EN')
doc = nlp(['Emory', 'NLP', 'is', 'in', 'Atlanta'])

As indicated by the identifier passed to the load call, LEM_POS_NER_DEP_SDP_CON_AMR_EN takes input sentences and performs LEM, POS, NER, DEP, SDP, CON and AMR jointly. The interface for coreference resolution is the same as for MTL, except that the order of input sentences is expected to be consistent with the document. The following snippet loads the coreference model with SpanBERT-Large and performs predictions for DCR:

import elit
nlp = elit.load('DOC_COREF_SPANBERT_LARGE_EN')
doc = nlp([['Emory', 'NLP', 'is', 'in', 'Atlanta'],

3.2 RESTful API

In a more common setting where multiple users require their documents to be parsed by ELIT, its web server can be set up to handle concurrent requests efficiently:

!elit serve
import elit
nlp = Client('')
doc = nlp(['Emory NLP is in Atlanta'])

Note that we design the native MTL API and the RESTful API such that they share the same semantics and can be used interchangeably. The only subtle difference is that the native MTL API accepts only tokenized sentences, whereas the RESTful API additionally accepts raw text, which is tokenized and segmented into sentences on the server side.

3.3 Output Format

In both MTL and RESTful APIs, a Document instance will be returned to the user, which is a Python dict storing all annotation results. For each task, its annotations are associated with a key indicated by its task name, e.g., the above doc will have the following structure:

{
  "tok": [
    ["Emory", "NLP", "is", "in", "Atlanta"]
  ],
  "lem": [
    ["emory", "nlp", "be", "in", "atlanta"]
  ],
  "pos": [
    ["NNP", "NNP", "VBZ", "IN", "NNP"]
  ],
  "ner": [
    [["ORG", 0, 2, "Emory NLP"], ["GPE", 4, 5, "Atlanta"]]
  ],
  "srl": [
    [[["ARG1", 0, 2, "Emory NLP"], ["PRED", 2, 3, "is"], ["ARG2", 3, 5, "in Atlanta"]]]
  ],
  "dep": [
    [[1, "com"], [3, "nsbj"], [3, "cop"], [-1, "root"], [3, "obj"]]
  ],
  "con": [
    ["TOP", [["S", [["NP", [["NNP", ["Emory"]], ["NNP", ["NLP"]]]], ["VP", [["VBZ", ["is"]], ["PP", [["IN", ["in"]], ["NP", [["NNP", ["Atlanta"]]]]]]]]]]]]
  ],
  "amr": [
    [["c0", "ARG1", "c1"], ["c0", "ARG2", "c2"], ["c0", "instance", "be-located-at-91"], ["c1", "instance", "emory nlp"], ["c2", "instance", "atlanta"]]
  ]
}

  • tok stores the surface form of each token.

  • lem stores the lemmatization of each token.

  • pos stores the part-of-speech tag of each token. In this component, the Penn Treebank part-of-speech tags (Santorini, 1990) are used.

  • ner stores the (type, start, end, form) of each entity. In this component, the OntoNotes 5 NER annotations (Weischedel et al., 2013) are used.

  • srl stores the (role, start, end, form) of the predicates and arguments corresponding to each flattened predicate-argument structure. In this component, the OntoNotes 5 SRL annotations (Weischedel et al., 2013) are used with an additional role PRED indicating the predicate.

  • dep stores the (head, relation) of each token, with the offset starting from -1 (ROOT). In this component, the primary dependency of the Deep Dependency Graph Representation (Choi, 2017) is used. The full representation with secondary dependencies is provided in another component.

  • con stores the constituency trees, specifically (label, child-constituents) for the non-terminal constituents and form for the terminals. Note that we designed a nested list representation with the round brackets replaced by square brackets to avoid ambiguity and make it compatible with JSON. When not being printed out, the Document class will convert our nested list structure to the conventional bracketed tree.

  • amr stores the logical triples of Abstract Meaning Representation in the format of (source, relation, target). Note that the Document class will convert it to Penman format (Goodman, 2020) when being accessed through code.
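As an illustration of the formats listed above, the sketch below reads an entity span out of tok and turns the nested-list con tree into a conventional bracketed string. It operates on a plain Python dict shaped like the example output; the helper names `span_text` and `to_bracketed` are hypothetical and do not refer to the Document class's own conversion.

```python
from typing import List


def span_text(tokens: List[str], start: int, end: int) -> str:
    """Return the surface form of a span; `end` is exclusive, as in the example."""
    return " ".join(tokens[start:end])


def to_bracketed(node: list) -> str:
    """Convert a [label, children] nested list into a bracketed tree string."""
    label, children = node
    parts = [c if isinstance(c, str) else to_bracketed(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"


doc = {
    "tok": [["Emory", "NLP", "is", "in", "Atlanta"]],
    "ner": [[["ORG", 0, 2, "Emory NLP"], ["GPE", 4, 5, "Atlanta"]]],
    "con": [
        ["TOP", [
            ["S", [
                ["NP", [["NNP", ["Emory"]], ["NNP", ["NLP"]]]],
                ["VP", [
                    ["VBZ", ["is"]],
                    ["PP", [
                        ["IN", ["in"]],
                        ["NP", [["NNP", ["Atlanta"]]]],
                    ]],
                ]],
            ]],
        ]],
    ],
}

entity_type, start, end, form = doc["ner"][0][0]
recovered_form = span_text(doc["tok"][0], start, end)
tree = to_bracketed(doc["con"][0])
print(tree)
```

The printed tree is (TOP (S (NP (NNP Emory) (NNP NLP)) (VP (VBZ is) (PP (IN in) (NP (NNP Atlanta)))))), and the span offsets reproduce the stored entity form.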

For the DCR example shown before, the following output should be generated:

{
  "dcr": [
    [[0, 0, 2, "Emory NLP"], [1, 0, 1, "It"]]
  ]
}

where dcr contains a list of clusters, and each cluster consists of spans referring to the same entity, in the format of (sentence-index, token-start, token-end, text). The input and output formats for OCR are shown in detail on the GitHub page.

3.4 Training

In ELIT, the APIs to train a new model are as straightforward as the inference APIs. Instead of resorting to a configuration file, we opt for native Python APIs, which offer built-in documentation of each parameter and the necessary type checking. Sticking to the Python APIs also means users need not learn another language to write config files. These benefits are illustrated in the following code snippet, which demonstrates how to train a joint NER and DEP component with easy integration of the RoBERTa encoder:

tasks = {
    'ner': BiaffineNamedEntityRecognition(
        SortingSamplerBuilder(batch_size=128, batch_max_tokens=12800),
    'dep': BiaffineDependencyParsing(
        SortingSamplerBuilder(batch_size=128, batch_max_tokens=12800),
mtl = MultiTaskLearning()
        word_dropout=[.2, 'unk'],

For each task, its popular training file format (e.g., TSV, CoNLL-U) is retained so that users can make use of the readily available open access datasets on the web. Note that our SortingSamplerBuilder builds a sampler that groups sequences of similar length into the same batch, which significantly accelerates training and inference. The training of coreference resolution models is performed outside ELIT; users can refer to our documentation for details.
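The effect of length-based batching can be sketched as below. This is not the code of SortingSamplerBuilder: the function name `sorted_batches` is hypothetical, and the sketch assumes that batch_max_tokens bounds the padded size of a batch (longest sequence times number of sequences), which the text does not specify.

```python
from typing import Iterator, List, Sequence


def sorted_batches(lengths: Sequence[int], batch_size: int,
                   batch_max_tokens: int) -> Iterator[List[int]]:
    """Yield batches of sample indices, grouping samples of similar length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch: List[int] = []
    longest = 0
    for index in order:
        new_longest = max(longest, lengths[index])
        # Padded cost if this sample joined the current batch.
        padded = new_longest * (len(batch) + 1)
        if batch and (len(batch) >= batch_size or padded > batch_max_tokens):
            yield batch
            batch, new_longest = [], lengths[index]
        batch.append(index)
        longest = new_longest
    if batch:
        yield batch


lengths = [5, 50, 6, 48, 7, 5]
batches = list(sorted_batches(lengths, batch_size=3, batch_max_tokens=100))
```

Sorting by length keeps padding low, because short sequences are never padded up to the length of a much longer one in the same batch. A training sampler would typically also shuffle the order of the resulting batches between epochs.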

4 Performance Evaluation

4.1 Datasets and Metrics


The ELIT models are trained on a mixture of OntoNotes 5 (Weischedel et al., 2013), BOLT English Treebanks (Song et al., 2019; Tracey et al., 2019), THYME/SHARP/MiPACQ Treebanks (Albright et al., 2013), English Web Treebank (Bies et al., 2012), QuestionBank (Judge et al., 2006) and the AMR 3.0 dataset (Knight et al., 2020). Batches of NLP tasks are mixed together so that even if a corpus offers no annotation for some tasks, it can still be exploited by ELIT. For LEM, POS, NER, DEP, CON and SRL, the mixed corpora are split into training, development and test sets with an 8:1:1 ratio. For AMR, the standard splits are used.


The following evaluation metrics are used for each task. LEM and POS: accuracy; NER: span-level labeled F1; DEP: labeled attachment score; CON: constituency-level labeled F1; SRL: micro-averaged F1 of predicate-argument-label triples; AMR: Smatch (Cai and Knight, 2013); DCR/OCR: averaged F1 of MUC, B³, and CEAF.
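As an example of how the span-based metrics are computed, the sketch below scores predicted entity spans against gold spans by exact match on sentence, label and boundaries. The function name `span_f1` and the span tuple layout are assumptions for illustration, not ELIT's evaluation code.

```python
from typing import Iterable, Tuple

Span = Tuple[int, str, int, int]  # (sentence index, label, start, end)


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching labeled spans."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = [(0, "ORG", 0, 2), (0, "GPE", 4, 5)]
pred = [(0, "ORG", 0, 2), (0, "GPE", 3, 5)]  # second span has a wrong boundary
precision, recall, f1 = span_f1(gold, pred)
```

A span with the right label but a wrong boundary counts as both a false positive and a false negative, so the example yields a precision, recall and F1 of 0.5 each.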

Model                                F1
Stanza (Qi et al., 2020)             88.8
Flair (Akbik et al., 2019)           89.7
spaCy RoBERTa                        89.8
biaffine w/o doc (Yu et al., 2020)   89.8
BART-NER (Yan et al., 2021)          90.4
ELIT BERT-large                      89.7
(a) NER performance on OntoNotes 5. Some scores are from experiments by Yan et al. (2021) excluding document context.

Model                                F1
e2e (Li et al., 2019)                83.1
SpanRel (Jiang et al., 2020)         82.4
ELIT BERT-large                      84.0
(b) Span-based end-to-end SRL results on CoNLL’12.

Model                                F1
SpanBERT (Joshi et al., 2020)        79.6
Higher-Order (Xu and Choi, 2020)     80.2
ELIT SpanBERT-large                  80.2
(c) Coreference resolution (DCR) results on CoNLL’12.

Table 1: Performance on OntoNotes 5, CoNLL-2012.

4.2 Training

Model         LEM    POS    NER    DEP    CON    SRL    AMR   Speed
MTL-RoBERTa   99.65  98.08  89.01  91.21  90.78  77.78  73.9  264
Table 2: Performance and speed of the MTL-RoBERTa in ELIT.

All hyper-parameters are tuned on the development set of OntoNotes 5 and applied to the mixed datasets. For the pre-trained transformer encoder, ELECTRA-base (Clark et al., 2020) and RoBERTa-base (Liu et al., 2019) are compared, and we opt for RoBERTa due to its better results on more tasks.

4.3 Accuracies and Speed

The ELIT MTL model is trained and evaluated using a single TITAN RTX GPU for 30 epochs, which takes roughly 36 hours. The scores on the test set and the decoding speed are listed in Table 2.

To make relatively fair comparisons with existing toolkits and models from published work, we also train single-task learning models on OntoNotes 5 and CoNLL-2012 with the standard splits for NER, SRL and coreference resolution. The results are shown in Table 1. We find that ELIT achieves either higher or comparable performance relative to existing toolkits and models.

5 Conclusion and Future Work

We introduced ELIT, the Emory Language and Information Toolkit, which provides the largest number of NLP tasks within an efficient MTL framework. Our APIs demonstrate the usability of ELIT within several function calls. The interchangeable RESTful API further extends ELIT with extra features and enables it to serve large-scale concurrent requests. For future work, we plan the following:

  • Train multilingual models since the design of ELIT is language-agnostic.

  • Integrate statistical tokenizers to replace the rule-based one and to support non-tokenized languages like Korean and Chinese.

  • Exploit model distillation and compression techniques to reduce the size of transformer encoders.


References

  • A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf (2019) FLAIR: an easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 54–59.
  • A. Akbik, D. Blythe, and R. Vollgraf (2018) Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 1638–1649.
  • D. Albright, A. Lanfranchi, A. Fredriksen, W. F. Styler, C. Warner, J. D. Hwang, J. D. Choi, D. Dligach, R. D. Nielsen, J. Martin, et al. (2013) Towards comprehensive syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association 20 (5), pp. 922–930.
  • A. Bies, J. Mott, C. Warner, and S. Kulick (2012) English Web Treebank. Linguistic Data Consortium, Philadelphia, PA.
  • B. Bohnet, R. McDonald, G. Simões, D. Andor, E. Pitler, and J. Maynez (2018) Morphosyntactic tagging with a meta-BiLSTM model over context sensitive token encodings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 2642–2652.
  • S. Cai and K. Knight (2013) Smatch: an evaluation metric for semantic feature structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 748–752.
  • J. Choi (2017) Deep dependency graph conversion in English. In TLT, pp. 35–62.
  • G. Chrupała (2006) Simple data-driven context-sensitive lemmatization.
  • Y. Chu (1965) On the shortest arborescence of a directed graph. Scientia Sinica 14, pp. 1396–1400.
  • K. Clark, M. Luong, Q. V. Le, and C. D. Manning (2020) ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
  • T. Dozat and C. D. Manning (2017) Deep biaffine attention for neural dependency parsing. In Proceedings of the 5th International Conference on Learning Representations, ICLR’17.
  • J. Edmonds (1967) Optimum branchings. Journal of Research of the National Bureau of Standards B 71 (4), pp. 233–240.
  • M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. Peters, M. Schmitz, and L. Zettlemoyer (2018) AllenNLP: a deep semantic natural language processing platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Melbourne, Australia, pp. 1–6.
  • M. W. Goodman (2020) Penman: an open-source library and tool for AMR graphs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 312–319.
  • H. He and J. D. Choi (2021) Levi graph AMR parser using heterogeneous attention. In Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies.
  • L. He, K. Lee, O. Levy, and L. Zettlemoyer (2018) Jointly predicting predicates and arguments in neural semantic role labeling. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, pp. 364–369.
  • Z. Jiang, W. Xu, J. Araki, and G. Neubig (2020) Generalizing natural language analysis through span-relation representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 2120–2133.
  • M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy (2020) SpanBERT: improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics 8, pp. 64–77.
  • M. Joshi, O. Levy, L. Zettlemoyer, and D. Weld (2019) BERT for coreference resolution: baselines and analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 5803–5808.
  • J. Judge, A. Cahill, and J. Van Genabith (2006) QuestionBank: creating a corpus of parse-annotated questions. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 497–504.
  • K. Knight, B. Badarau, L. Banarescu, C. Bonial, M. Bardocz, K. Griffitt, U. Hermjakob, D. Marcu, M. Palmer, T. O’Gorman, et al. (2020) Abstract Meaning Representation (AMR) annotation release 3.0. Technical Report LDC2020T02, Linguistic Data Consortium, Philadelphia, PA, June.
  • D. Kondratyuk and M. Straka (2019) 75 languages, 1 model: parsing Universal Dependencies universally. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 2779–2795.
  • Z. Li, S. He, H. Zhao, Y. Zhang, Z. Zhang, X. Zhou, and X. Zhou (2019) Dependency or span, end-to-end uniform semantic role labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6730–6737.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky (2014) The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60.
  • T. Müller, R. Cotterell, A. Fraser, and H. Schütze (2015) Joint lemmatization and morphological tagging with Lemming. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2268–2274.
  • P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning (2020) Stanza: a Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, pp. 101–108.
  • B. Santorini (1990) Part-of-speech tagging guidelines for the Penn Treebank project (3rd revision). Technical Reports (CIS), pp. 570.
  • Z. Song, D. Fore, S. Strassel, H. Lee, and J. Wright (2019) BOLT English SMS/chat.
  • M. Straka and J. Straková (2017) Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99.
  • J. Tracey, H. Lee, and S. Strassel (2019) BOLT English discussion forums.
  • R. Weischedel, M. Palmer, M. Marcus, E. Hovy, S. Pradhan, L. Ramshaw, N. Xue, A. Taylor, J. Kaufman, M. Franchini, et al. (2013) OntoNotes release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA.
  • L. Xu and J. D. Choi (2020) Revealing the myth of higher-order inference in coreference resolution. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 8527–8533.
  • H. Yan, T. Gui, J. Dai, Q. Guo, Z. Zhang, and X. Qiu (2021) A unified generative framework for various NER subtasks. arXiv preprint arXiv:2106.01223.
  • J. Yu, B. Bohnet, and M. Poesio (2020) Named entity recognition as dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 6470–6476.
  • Y. Zhang, H. Zhou, and Z. Li (2020) Fast and accurate neural CRF constituency parsing. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, C. Bessiere (Ed.), pp. 4046–4053.