Corpus-Based Paraphrase Detection Experiments and Review

05/31/2021
by Tedo Vrbanec, et al.

Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a large number of experiments, we determined the most appropriate approaches for text pre-processing, hyper-parameters, sub-model selection where alternatives exist (e.g., Skip-gram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings, and those of other researchers who have used deep learning models, show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
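To make the general pipeline concrete, the following is a minimal sketch of threshold-based paraphrase detection using one of the listed model types (TF-IDF) with cosine similarity as the distance measure. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the threshold value of 0.5 are all assumptions made for illustration. In the paper, pre-processing, distance measure, and threshold are chosen experimentally.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    # Assumed pre-processing: lowercase, split on whitespace, strip punctuation.
    tokens = (tok.strip(PUNCTUATION) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document.
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed IDF (an assumption) keeps shared terms at a nonzero weight.
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(text_a, text_b, threshold=0.5):
    # The threshold is a hypothetical placeholder, not a value from the paper.
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine_similarity(vec_a, vec_b) >= threshold
```

Calling `is_paraphrase(text_a, text_b)` returns a boolean for a pair of texts. The other model types in the list would replace `tfidf_vectors` with a different text representation while keeping the same similarity-plus-threshold decision.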


