Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition

04/15/2022
by Bo Sun, et al.

Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors. These errors have been studied extensively and are relatively easy for humans to spot. By contrast, Chinese semantic errors are understudied and complex enough that humans cannot easily recognize them. This paper addresses Chinese Semantic Error Recognition (CSER), a binary classification task that determines whether a sentence contains semantic errors. Current research offers no effective method for this task. We inherit the model structure of BERT and design several syntax-related pre-training tasks so that the model can learn syntactic knowledge. Our pre-training tasks consider both the directionality of the dependency structure and the diversity of dependency relations. Because no published dataset exists for CSER, we build the first high-quality dataset for the task, the Corpus of Chinese Linguistic Semantic Acceptability (CoCLSA). Experimental results on CoCLSA show that our methods outperform universal pre-trained models and syntax-infused models.
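
The abstract does not spell out how the syntactic dependency prediction targets are built, so the following is only a minimal sketch of one plausible formulation, not the authors' method. It assumes a sentence has already been parsed into per-token head indices and relation names, and it derives, for every token pair, a direction label (which token is the head) and a relation label (which dependency type links them). The function name `pair_labels`, the label ids, and the relation names in the toy parse are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical label ids for the direction of a dependency between tokens i and j.
NO_ARC, I_HEADS_J, J_HEADS_I = 0, 1, 2


def pair_labels(heads: List[int], rels: List[str]) -> List[Tuple[int, int, int, str]]:
    """Build (i, j, direction, relation) labels for every token pair i < j.

    heads[k] is the index of token k's head, or -1 for the root.
    rels[k] is the relation name on the arc from heads[k] to token k.
    """
    if len(heads) != len(rels):
        raise ValueError("heads and rels must have the same length")
    labels = []
    n = len(heads)
    for i in range(n):
        for j in range(i + 1, n):
            if heads[j] == i:
                # Token i governs token j.
                labels.append((i, j, I_HEADS_J, rels[j]))
            elif heads[i] == j:
                # Token j governs token i.
                labels.append((i, j, J_HEADS_I, rels[i]))
            else:
                # No direct dependency between the two tokens.
                labels.append((i, j, NO_ARC, "none"))
    return labels


# Toy parse (an assumption, not taken from the paper): the verb is the root
# and governs both the subject and the object.
tokens = ["我", "喜欢", "苹果"]
heads = [1, -1, 1]
rels = ["nsubj", "root", "obj"]
labels = pair_labels(heads, rels)
```

In a setup like this, the direction label would capture the directionality of the dependency structure and the relation label would capture the diversity of dependency relations; a BERT-style encoder could then be trained to predict both from the representations of each token pair.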

