MuCPAD: A Multi-Domain Chinese Predicate-Argument Dataset

05/13/2022
by Yahui Liu, et al.

Over the past decade, neural network models have made tremendous progress on in-domain semantic role labeling (SRL). However, performance drops dramatically in the out-of-domain setting. To facilitate research on cross-domain SRL, this paper presents MuCPAD, a multi-domain Chinese predicate-argument dataset consisting of 30,897 sentences and 92,051 predicates from six different domains. MuCPAD exhibits three important features. 1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates. 2) We explicitly annotate omitted core arguments to recover more complete semantic structure, since omission of content words is ubiquitous in multi-domain Chinese texts. 3) We compile 53 pages of annotation guidelines and adopt strict double annotation to improve data quality. This paper describes the annotation methodology and annotation process of MuCPAD in detail and presents an in-depth data analysis. We also give benchmark results on cross-domain SRL based on MuCPAD.

