fastHan: A BERT-based Joint Many-Task Toolkit for Chinese NLP

09/18/2020
by Zhichao Geng, et al.

We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing. The kernel of fastHan is a joint many-task model based on a pruned BERT that uses only the first 8 layers of BERT. We also provide a 4-layer base version of the model, compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora covering the four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and SOTA performance in the other three tasks. In addition to its small size and strong performance, fastHan is user-friendly. It is implemented as a Python package that users can easily download and use. Users can get the results they need with one line of code, even if they have little knowledge of deep learning. The project is released on GitHub.
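To make the layer-pruning idea concrete, the following is a minimal, dependency-free sketch of keeping only the first 8 layers of an encoder stack. It is not fastHan's implementation: `Layer`, `prune_encoder`, and `encode` are hypothetical names introduced here for illustration, the toy layer simply adds 1.0 to each value, and the 12-layer starting point assumes a BERT-base-sized encoder, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Layer:
    """Hypothetical stand-in for one Transformer encoder layer."""
    index: int

    def __call__(self, hidden: Vector) -> Vector:
        # Toy transformation only; a real layer would apply
        # self-attention followed by a feed-forward block.
        return [h + 1.0 for h in hidden]


def prune_encoder(layers: List[Layer], keep: int) -> List[Layer]:
    """Return the first `keep` layers, discarding the upper ones."""
    if not 0 < keep <= len(layers):
        raise ValueError("keep must be between 1 and the number of layers")
    return layers[:keep]


def encode(layers: List[Layer], hidden: Vector) -> Vector:
    """Run the hidden state through each remaining layer in order."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


# Assumption: a 12-layer encoder, as in BERT-base.
full_encoder = [Layer(i) for i in range(12)]

# Keep the first 8 layers, mirroring the pruning described in the abstract.
pruned_encoder = prune_encoder(full_encoder, keep=8)

# The 4-layer version in the paper is compressed from the 8-layer model;
# plain truncation is used here only to show the resulting depth.
small_encoder = prune_encoder(pruned_encoder, keep=4)
```

Truncating from the top keeps the lower layers unchanged, so the pruned stack produces the same intermediate representations as the first 8 layers of the full model while skipping the cost of the remaining layers.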


research
09/24/2020

N-LTP: A Open-source Neural Chinese Language Technology Platform with Pretrained Models

We introduce N-LTP, an open-source Python Chinese natural language proce...
research
01/05/2021

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

We present the first multi-task learning model – named PhoNLP – for join...
research
01/11/2021

A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis

We propose a new Named entity recognition (NER) method to effectively ma...
research
05/15/2021

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Lexicon information and pre-trained models, such as BERT, have been comb...
research
03/16/2020

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

We introduce Stanza, an open-source Python natural language processing t...
research
07/11/2018

UniParse: A universal graph-based parsing toolkit

This paper describes the design and use of the graph-based parsing frame...
research
01/20/2018

Building an Ellipsis-aware Chinese Dependency Treebank for Web Text

Web 2.0 has brought with it numerous user-produced data revealing one's ...
