Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

01/09/2021
by Minh Nguyen, et al.

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing, while maintaining competitive performance on tokenization, multi-word token expansion, and lemmatization across 90 Universal Dependencies treebanks. Despite its use of a large pretrained transformer, our toolkit remains efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters, in which a single multilingual pretrained transformer is shared across the pipelines for different languages. Our toolkit, along with pretrained models and code, is publicly available at: https://github.com/nlp-uoregon/trankit. A demo website for our toolkit is also available at: http://nlp.uoregon.edu/trankit. Finally, we have created a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.
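The plug-and-play idea described above can be illustrated with a small, purely conceptual sketch. It is not Trankit's implementation or API: the class names (`SharedEncoder`, `Adapter`, `AdapterPipeline`), the methods `add` and `set_active`, the sizes, and the random weights are all assumptions made for illustration. The sketch only shows the general pattern in which one large encoder is built once and small per-language adapter modules are swapped in on top of it.

```python
import math
import random

HIDDEN = 16      # hypothetical hidden size of the shared encoder
BOTTLENECK = 2   # hypothetical adapter bottleneck size


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class SharedEncoder:
    """Toy stand-in for the large multilingual transformer (built once)."""

    instances = 0  # counts how many encoders have been constructed

    def __init__(self, seed=0):
        SharedEncoder.instances += 1
        self.weights = random_matrix(HIDDEN, HIDDEN, random.Random(seed))

    def encode(self, features):
        return matvec(self.weights, features)


class Adapter:
    """Small per-language module: down-project, nonlinearity, up-project, residual."""

    def __init__(self, seed):
        rng = random.Random(seed)
        self.down = random_matrix(BOTTLENECK, HIDDEN, rng)
        self.up = random_matrix(HIDDEN, BOTTLENECK, rng)

    def num_params(self):
        return 2 * HIDDEN * BOTTLENECK

    def apply(self, hidden):
        squeezed = [math.tanh(v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, squeezed)
        # Residual connection: the adapter only adjusts the shared representation.
        return [h + d for h, d in zip(hidden, delta)]


class AdapterPipeline:
    """One shared encoder, many language-specific adapters (hypothetical API)."""

    def __init__(self):
        self.encoder = SharedEncoder()
        self.adapters = {}
        self.active = None

    def add(self, lang):
        # Deterministic seed derived from the language name, for reproducibility.
        self.adapters[lang] = Adapter(seed=sum(map(ord, lang)))
        if self.active is None:
            self.active = lang

    def set_active(self, lang):
        if lang not in self.adapters:
            raise KeyError(f"no adapter loaded for {lang!r}")
        self.active = lang

    def __call__(self, features):
        hidden = self.encoder.encode(features)
        return self.adapters[self.active].apply(hidden)


pipeline = AdapterPipeline()
pipeline.add("english")
pipeline.add("arabic")

features = [1.0] * HIDDEN
english_out = pipeline(features)   # uses the English adapter
pipeline.set_active("arabic")
arabic_out = pipeline(features)    # same encoder, different adapter
```

In this toy setup, adding a language costs only the adapter's parameters (`2 * HIDDEN * BOTTLENECK`) rather than a second copy of the encoder (`HIDDEN * HIDDEN` here), which is the kind of memory saving the abstract attributes to sharing one transformer across pipelines.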

