PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge

10/08/2020
by Yun He, et al.

We present a new benchmark dataset called PARADE for paraphrase identification that requires specialized domain knowledge. PARADE contains paraphrases that overlap very little at the lexical and syntactic levels but are semantically equivalent based on computer science domain knowledge, as well as non-paraphrases that overlap greatly at the lexical and syntactic levels but are not semantically equivalent based on this domain knowledge. Experiments show that both state-of-the-art neural models and non-expert human annotators perform poorly on PARADE. For example, BERT after fine-tuning achieves an F1 score of 0.709, which is much lower than its performance on other paraphrase identification datasets. PARADE can serve as a resource for researchers interested in testing models that incorporate domain knowledge. We make our data and code freely available.
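
As a rough illustration of why lexical overlap is a weak signal on this kind of data, the sketch below scores sentence pairs with token-level Jaccard overlap, thresholds that score to predict "paraphrase", and computes F1 against gold labels. The three sentence pairs are invented for this example and are not drawn from PARADE; the function names and the 0.5 threshold are likewise assumptions. It is not the authors' evaluation code.

```python
def jaccard_overlap(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def f1_score(gold: list, pred: list) -> float:
    """F1 for the positive (paraphrase = 1) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical pairs (NOT from PARADE): (sentence 1, sentence 2, gold label).
pairs = [
    # Paraphrase with little word overlap: needs the knowledge stack = LIFO.
    ("a stack removes the most recently added item first",
     "last in first out data structure", 1),
    # Non-paraphrase with heavy word overlap: stack vs. queue.
    ("a stack removes the most recently added item first",
     "a queue removes the least recently added item first", 0),
    # Paraphrase that also overlaps heavily.
    ("binary search halves the search interval each step",
     "binary search halves the interval at every step", 1),
]

THRESHOLD = 0.5  # assumed cut-off for the overlap baseline

gold = [label for _, _, label in pairs]
pred = [int(jaccard_overlap(s1, s2) >= THRESHOLD) for s1, s2, _ in pairs]
baseline_f1 = f1_score(gold, pred)
```

On these toy pairs the overlap baseline misses the low-overlap paraphrase and accepts the high-overlap non-paraphrase, the two failure modes the dataset is designed to probe.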
