A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts

12/27/2019
by Muhammad Haroon Shakeel, et al.

Paraphrase detection is an important task in text analytics, with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer-support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy treats the notions of paraphrase and non-paraphrase as binary relations over the set of texts. It then uses graph-theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks, with and without soft attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it achieves comparable or state-of-the-art performance on all three.
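The idea of generating extra labeled pairs from binary relations can be illustrated with a minimal sketch. It assumes that "is a paraphrase of" is treated as an equivalence relation (reflexive, symmetric, transitive). This is an illustration, not necessarily the authors' exact procedure. Under that assumption, texts linked by paraphrase labels form connected components, and every pair inside a component is a paraphrase. A non-paraphrase label also carries over. If a ~ a' and (a', b) is a non-paraphrase, then (a, b) must be a non-paraphrase too, since a ~ b would force a' ~ b. The names `DisjointSet` and `augment_pairs` are hypothetical helpers introduced only for this example.

```python
from itertools import combinations


class DisjointSet:
    """Union-find over text identifiers, used to group mutual paraphrases."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def augment_pairs(paraphrases, non_paraphrases):
    """Expand labeled pairs, ASSUMING paraphrase is an equivalence relation.

    Hypothetical helper. Returns (para, non_para) as sets of frozensets.
    """
    ds = DisjointSet()
    for a, b in paraphrases:
        ds.union(a, b)
    # Register texts that appear only in non-paraphrase pairs.
    for a, b in non_paraphrases:
        ds.find(a)
        ds.find(b)

    # Group texts by component (equivalence class).
    classes = {}
    for text in list(ds.parent):
        classes.setdefault(ds.find(text), set()).add(text)

    # Symmetry + transitivity: every pair inside a class is a paraphrase.
    para = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            para.add(frozenset((a, b)))

    # One non-paraphrase edge between two classes implies all cross pairs.
    non_para = set()
    for a, b in non_paraphrases:
        ra, rb = ds.find(a), ds.find(b)
        if ra == rb:
            raise ValueError(f"contradictory labels for {a!r} and {b!r}")
        for x in classes[ra]:
            for y in classes[rb]:
                non_para.add(frozenset((x, y)))
    return para, non_para
```

Consider paraphrase labels (q1, q2) and (q2, q3) and one non-paraphrase label (q1, q4). The sketch derives three paraphrase pairs: {q1, q2}, {q1, q3}, and {q2, q3}. It also derives three non-paraphrase pairs: {q1, q4}, {q2, q4}, and {q3, q4}. Union-find is used here because it finds the components in near-linear time. Because the sketch raises an error on contradictory labels, the soundness assumption fails loudly rather than silently producing inconsistent pairs.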
