BERT Meets Chinese Word Segmentation

09/20/2019
by   Haiqin Yang, et al.

Chinese word segmentation (CWS) is a fundamental task for Chinese language understanding. Recently, neural network-based models have attained superior performance on the in-domain CWS task. Last year, Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, was proposed as a backbone model for many natural language tasks and redefined the performance on those tasks. The excellent performance of BERT motivates us to apply it to the CWS task. By conducting intensive experiments on the benchmark datasets from the second International Chinese Word Segmentation Bake-off, we obtain several key observations. BERT can slightly improve performance even when the datasets suffer from labeling inconsistency. When the features are sufficiently learned, Softmax, a simpler classifier, can attain the same performance as a more complicated classifier, e.g., Conditional Random Field (CRF). The performance of BERT usually increases as the model size increases. The features extracted by BERT can also serve as good candidates for other neural network models.
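To make the Softmax option concrete, the following is a minimal sketch of CWS treated as per-character tagging with an independent Softmax decision at each position. It is an illustration under stated assumptions, not the authors' implementation: the B/M/E/S tag scheme, the function names, and the toy logits (which stand in for scores that a classifier over BERT features would produce) are all hypothetical and do not come from the abstract.

```python
import math

# Assumed tag scheme (not stated in the abstract):
# B = begin of word, M = middle, E = end, S = single-character word.
TAGS = ["B", "M", "E", "S"]


def softmax(logits):
    """Numerically stable softmax over one character's tag scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tag_characters(logits_per_char):
    """Pick the most probable tag for each character independently.

    This is the simple Softmax classifier: unlike a CRF, it does not
    model transitions between neighbouring tags.
    """
    tags = []
    for logits in logits_per_char:
        probs = softmax(logits)
        tags.append(TAGS[probs.index(max(probs))])
    return tags


def tags_to_words(chars, tags):
    """Group characters into words, closing a word at every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # flush a word left open by an ill-formed tag sequence
        words.append(current)
    return words


# Toy logits, hand-written for illustration only (hypothetical values).
chars = list("我爱北京")
logits = [
    [0.1, 0.0, 0.2, 2.0],  # highest score on S
    [0.0, 0.1, 0.3, 1.5],  # highest score on S
    [2.2, 0.1, 0.0, 0.3],  # highest score on B
    [0.1, 0.2, 1.9, 0.0],  # highest score on E
]
tags = tag_characters(logits)
words = tags_to_words(chars, tags)
print(tags, words)
```

Because each position is decided on its own, this decoder can emit ill-formed sequences such as B followed by S; a CRF layer would instead score whole tag sequences, which is the extra complexity the abstract contrasts with Softmax.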


research
10/01/2021

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

In this paper, we propose a span labeling approach to model n-gram infor...
research
07/11/2018

Neural Chinese Word Segmentation with Dictionary Knowledge

Chinese word segmentation (CWS) is an important task for Chinese NLP. Re...
research
10/15/2020

Does Chinese BERT Encode Word Structure?

Contextualized representations give significantly improved results for a...
research
07/26/2019

Investigating Self-Attention Network for Chinese Word Segmentation

Neural network has become the dominant method for Chinese word segmentat...
research
04/13/2020

Unified Multi-Criteria Chinese Word Segmentation with BERT

Multi-Criteria Chinese Word Segmentation (MCCWS) aims at finding word bo...
research
03/11/2019

Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning

The ambiguous annotation criteria bring into the divergence of Chinese W...
research
04/09/2021

BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function

This paper proposes an automatic Chinese text categorization method for ...
