Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

03/16/2020
by   Peng Qi, et al.
0

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionalities to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/09/2021

Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

We introduce Trankit, a light-weight Transformer-based Toolkit for multi...
research
09/24/2020

N-LTP: A Open-source Neural Chinese Language Technology Platform with Pretrained Models

We introduce N-LTP, an open-source Python Chinese natural language proce...
research
01/06/2022

HuSpaCy: an industrial-strength Hungarian natural language processing toolkit

Although there are a couple of open-source language processing pipelines...
research
09/18/2020

fastHan: A BERT-based Joint Many-Task Toolkit for Chinese NLP

We present fastHan, an open-source toolkit for four basic tasks in Chine...
research
12/01/2015

Multilingual Language Processing From Bytes

We describe an LSTM-based model which we call Byte-to-Span (BTS) that re...
research
01/10/2022

Informal Persian Universal Dependency Treebank

This paper presents the phonological, morphological, and syntactic disti...
research
02/22/2021

Subword Pooling Makes a Difference

Contextual word-representations became a standard in modern natural lang...

Please sign up or login with your details

Forgot password? Click here to reset