
Pre-trained Model for Chinese Word Segmentation with Meta Learning

by   Zhen Ke, et al.

Recent research shows that pre-trained models such as BERT (Devlin et al., 2019) are beneficial for Chinese Word Segmentation (CWS) tasks. However, existing approaches usually fine-tune pre-trained models directly on a single downstream CWS corpus. These methods neither fully utilize the prior knowledge in existing segmentation corpora nor account for the discrepancy between the pre-training tasks and the downstream CWS tasks. In this work, we propose a Pre-Trained Model for Chinese Word Segmentation, abbreviated as PTM-CWS. PTM-CWS employs a unified architecture for different segmentation criteria and is pre-trained on a joint multi-criteria corpus with a meta-learning algorithm. Empirical results show that PTM-CWS can exploit existing prior segmentation knowledge, reduce the discrepancy between the pre-training tasks and the downstream CWS tasks, and achieve new state-of-the-art performance on twelve CWS corpora.
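The abstract describes pre-training one set of parameters across several segmentation criteria with a meta-learning algorithm, so that the resulting initialization adapts quickly to any single downstream criterion. The paper does not give the algorithm's details here, so the following is only a minimal sketch of one common meta-learning scheme (a Reptile-style outer loop) on toy quadratic losses; the criterion losses, the `inner_sgd` helper, and all hyperparameters are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

# Hedged sketch: a Reptile-style meta-learning loop over several
# segmentation "criteria" (corpora). Each criterion is modeled as a toy
# quadratic loss 0.5*||theta - target||^2 whose minimum stands in for
# criterion-specific optimal weights; the real model would use
# BERT-scale parameters and actual CWS training losses.

rng = np.random.default_rng(0)
dim = 8
# Hypothetical per-criterion optima (e.g. different corpora such as
# PKU, MSR, AS, each with its own segmentation convention).
criteria_optima = [rng.normal(size=dim) for _ in range(3)]

def inner_sgd(theta, target, steps=5, lr=0.1):
    """A few SGD steps on one criterion's quadratic loss."""
    for _ in range(steps):
        theta = theta - lr * (theta - target)  # gradient of the toy loss
    return theta

def reptile(theta, optima, meta_steps=200, meta_lr=0.5):
    """Outer loop: nudge theta toward each criterion-adapted solution."""
    for _ in range(meta_steps):
        target = optima[rng.integers(len(optima))]  # sample a criterion
        adapted = inner_sgd(theta, target)          # adapt to it briefly
        theta = theta + meta_lr * (adapted - theta) # meta-update
    return theta

theta = reptile(np.zeros(dim), criteria_optima)
# theta ends up as a compromise among the per-criterion optima, an
# initialization from which a few inner steps reach any one criterion.
```

The design intuition matches the abstract: instead of fine-tuning directly on one corpus, the outer loop repeatedly simulates adaptation to each criterion, so the shared initialization carries prior knowledge from all of them.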
