Pre-trained Model for Chinese Word Segmentation with Meta Learning

10/23/2020
by Zhen Ke, et al.

Recent research shows that pre-trained models such as BERT (Devlin et al., 2019) are beneficial for Chinese Word Segmentation tasks. However, existing approaches usually fine-tune pre-trained models directly on a separate downstream Chinese Word Segmentation corpus. These methods do not fully utilize the prior knowledge in existing segmentation corpora, and they ignore the discrepancy between the pre-training tasks and the downstream Chinese Word Segmentation tasks. In this work, we propose a Pre-Trained Model for Chinese Word Segmentation, abbreviated as PTM-CWS. The PTM-CWS model employs a unified architecture for different segmentation criteria and is pre-trained on a joint multi-criteria corpus with a meta learning algorithm. Empirical results show that the PTM-CWS model can utilize existing prior segmentation knowledge, reduce the discrepancy between the pre-training tasks and the downstream Chinese Word Segmentation tasks, and achieve new state-of-the-art performance on twelve Chinese Word Segmentation corpora.
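To make "different segmentation criteria" concrete, the sketch below shows how one sentence can receive different character-level labels under two criteria. It rests on assumptions the abstract does not confirm: the B/M/E/S tagging scheme (a common way to cast segmentation as sequence labeling), the idea of prefixing a criterion token to the input, the criterion names `crit_a` and `crit_b`, and the two example segmentations, which are illustrative and not taken from any corpus.

```python
def words_to_bmes(words):
    """Convert a list of segmented words into per-character B/M/E/S tags.

    B = begin, M = middle, E = end of a multi-character word,
    S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def build_example(criterion, words):
    """Build (input tokens, tags) for one criterion.

    Hypothetical input format: a criterion token is prepended so that one
    shared model could, in principle, be told which criterion to follow.
    """
    chars = [ch for word in words for ch in word]
    return [f"[{criterion}]"] + chars, words_to_bmes(words)


# The same sentence segmented under two illustrative (made-up) criteria.
seg_a = ["李娜", "进入", "半决赛"]
seg_b = ["李", "娜", "进入", "半", "决赛"]

tokens_a, tags_a = build_example("crit_a", seg_a)
tokens_b, tags_b = build_example("crit_b", seg_b)

print(tokens_a, tags_a)
print(tokens_b, tags_b)
```

Both calls share the same characters but produce different tag sequences, which is the kind of disagreement a unified multi-criteria model has to reconcile.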


