SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

10/31/2022
by Xiaotian Zhang, et al.

Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically use BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We employ BERT to encode both the corrupted sentence and its corresponding correct sentence, and then use a contrastive learning loss to regularize the hidden states of corrupted tokens to be closer to their counterparts in the correct sentence. On three CSC datasets, we confirm that our method provides a significant improvement over baselines.
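
The objective described in the abstract can be pictured as a token-level InfoNCE-style contrastive loss over an aligned corrupted/correct sentence pair. The sketch below is an illustrative assumption rather than the authors' released implementation: the checkpoint name bert-base-chinese, the temperature value, the helper token_contrastive_loss, and the toy sentence pair are all hypothetical choices used only to make the idea concrete.

```python
# Illustrative sketch (not the paper's released code): a token-level
# contrastive loss that pulls each corrupted token's hidden state toward
# its aligned counterpart in the correct sentence.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")


def token_contrastive_loss(corrupt_hidden, correct_hidden, corrupt_mask, temperature=0.1):
    """InfoNCE-style loss for one aligned sentence pair.

    corrupt_hidden, correct_hidden: (seq_len, dim) hidden states.
    corrupt_mask: boolean (seq_len,) marking the corrupted positions.
    """
    anchors = F.normalize(corrupt_hidden[corrupt_mask], dim=-1)   # corrupted tokens only
    candidates = F.normalize(correct_hidden, dim=-1)              # all tokens of the correct sentence
    logits = anchors @ candidates.T / temperature                 # (n_corrupt, seq_len)
    # each anchor's positive is the correct token at the same position;
    # the remaining tokens of the correct sentence act as negatives
    targets = corrupt_mask.nonzero(as_tuple=True)[0]
    return F.cross_entropy(logits, targets)


# toy usage: one corrupted / correct sentence pair (homophone error 原 vs. 远)
corrupt = tokenizer("他明天要出原门", return_tensors="pt")
correct = tokenizer("他明天要出远门", return_tensors="pt")
h_corrupt = encoder(**corrupt).last_hidden_state[0]
h_correct = encoder(**correct).last_hidden_state[0]
mask = corrupt["input_ids"][0] != correct["input_ids"][0]
loss = token_contrastive_loss(h_corrupt, h_correct, mask)
```

Under this reading, only the corrupted positions contribute anchors, while every token of the correct sentence serves as a candidate, so the loss pushes each corrupted hidden state toward its aligned counterpart and away from the rest of the sentence; in training it would be combined with the usual spelling-correction objective.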

Related research

- A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings (03/11/2022). Contrastive learning has shown great potential in unsupervised sentence ...
- CERT: Contrastive Self-supervised Learning for Language Understanding (05/16/2020). Pretrained language models such as BERT, GPT have shown great effectiven...
- A BERT-based Dual Embedding Model for Chinese Idiom Prediction (11/04/2020). Chinese idioms are special fixed phrases usually derived from ancient st...
- A Chinese Spelling Check Framework Based on Reverse Contrastive Learning (10/25/2022). Chinese spelling check is a task to detect and correct spelling mistakes...
- Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation (07/04/2023). Despite the huge progress in myriad generation tasks, pretrained languag...
- On Isotropy and Learning Dynamics of Contrastive-based Sentence Representation Learning (12/18/2022). Incorporating contrastive learning objectives in sentence representation...
