Pre-Training with Whole Word Masking for Chinese BERT

06/19/2019
by   Yiming Cui, et al.

Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks. Recently, an upgraded version of BERT has been released with Whole Word Masking (WWM), which mitigates the drawback of masking only partial WordPiece tokens in pre-training BERT. In this technical report, we adapt whole word masking to Chinese text: the whole word is masked instead of individual Chinese characters, which poses a greater challenge for the Masked Language Model (MLM) pre-training task. The model was trained on the latest Chinese Wikipedia dump. We aim to provide easy extensibility and better performance for Chinese BERT without changing the neural architecture or even the hyper-parameters. The model is verified on various NLP tasks, ranging from sentence-level to document-level, including sentiment classification (ChnSentiCorp, Sina Weibo), named entity recognition (People Daily, MSRA-NER), natural language inference (XNLI), sentence pair matching (LCQMC, BQ Corpus), and machine reading comprehension (CMRC 2018, DRCD, CAIL RC). Experimental results on these datasets show that whole word masking brings another significant gain. Moreover, we also examine the effectiveness of Chinese pre-trained models: BERT, ERNIE, and BERT-wwm. We release the pre-trained model (both TensorFlow and PyTorch) on GitHub: https://github.com/ymcui/Chinese-BERT-wwm
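To make the idea concrete, below is a minimal sketch of whole word masking for Chinese. Because Chinese BERT tokenizes input into single characters, vanilla MLM may mask only one character of a multi-character word; WWM first obtains word boundaries from a Chinese word segmenter and then masks every character of a selected word together. The use of the jieba segmenter, the function name, and the simplified masking policy (always replacing with [MASK], omitting the standard 80/10/10 replacement scheme) are illustrative assumptions, not the released implementation.

import random
import jieba  # assumed segmenter for illustration; any Chinese word segmenter works


def whole_word_mask(text, mask_prob=0.15, mask_token="[MASK]"):
    """Illustrative whole word masking for Chinese text.

    Words are selected for masking as units; every character of a
    selected word is replaced, and the original characters become
    the MLM prediction targets.
    """
    tokens, labels = [], []
    for word in jieba.cut(text):           # word-level segmentation
        chars = list(word)
        if random.random() < mask_prob:
            tokens.extend([mask_token] * len(chars))  # mask the whole word
            labels.extend(chars)                      # targets to predict
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))        # not a prediction target
    return tokens, labels


# Example: in "使用语言模型", the word "模型" is either fully masked or fully kept,
# never split so that only "模" or only "型" is masked.
print(whole_word_mask("使用语言模型来预测下一个词的概率"))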
