Chinese Lexical Analysis with Deep Bi-GRU-CRF Network

07/05/2018
by Zhenyu Jiao, et al.

Lexical analysis is widely regarded as a crucial step towards natural language understanding and has been studied extensively. In recent years, end-to-end lexical analysis models based on recurrent neural networks have gained increasing attention. In this report, we introduce a deep Bi-GRU-CRF network that jointly models word segmentation, part-of-speech tagging, and named entity recognition. We trained the model on several massive corpora pre-tagged by our best Chinese lexical analysis tool, together with a small yet high-quality human-annotated corpus. We applied balanced sampling across the corpora to preserve the influence of the human annotations, and fine-tuned the CRF decoding layer regularly during training. As evaluated by linguistic experts, the model achieved 95.5% accuracy on the test set, roughly a 13% relative error reduction over our previously best Chinese lexical analysis tool. The model is computationally efficient, processing 2.3K characters per second on a single thread.
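The balanced sampling mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact sampling scheme, so the version below assumes the simplest one: each training example is drawn by first picking a corpus uniformly at random and then picking a sentence uniformly from that corpus. The function name `balanced_batches`, the corpus names, and the corpus sizes are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def balanced_batches(corpora, batch_size, num_batches, seed=0):
    """Yield batches in which every corpus is equally likely per example.

    Assumed scheme (not taken from the report): pick a corpus uniformly,
    then pick one sentence uniformly from that corpus.
    """
    rng = random.Random(seed)
    names = sorted(corpora)  # fixed order keeps runs reproducible
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            name = rng.choice(names)
            batch.append((name, rng.choice(corpora[name])))
        yield batch


# Hypothetical corpora: two large auto-tagged sets and one small human-annotated set.
corpora = {
    "auto_tagged_a": [f"a{i}" for i in range(10000)],
    "auto_tagged_b": [f"b{i}" for i in range(5000)],
    "human": [f"h{i}" for i in range(100)],
}

counts = Counter(
    name
    for batch in balanced_batches(corpora, batch_size=32, num_batches=300)
    for name, _ in batch
)
total = sum(counts.values())
for name in sorted(counts):
    print(name, round(counts[name] / total, 3))
```

Under this assumption the human-annotated corpus supplies about one third of the training examples, even though it holds fewer than 1% of the sentences. Simply concatenating the corpora and sampling uniformly would let the auto-tagged data dominate.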


