Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs

05/16/2023
by Garima Malik, et al.

This paper explores the use of text data augmentation to improve conflict and duplicate detection in software engineering tasks, framed as sentence pair classification. The study adapts generic augmentation techniques such as shuffling, back translation, and paraphrasing, and proposes new techniques tailored to software requirement texts: Noun-Verb Substitution, Target-Lemma Replacement, and Actor-Action Substitution. A comprehensive empirical analysis is conducted on six software text datasets to identify conflicts and duplicates among sentence pairs. The results demonstrate that data augmentation has a significant impact on classification performance across all of the software sentence-pair datasets. However, when a dataset is already relatively balanced, augmentation may degrade classification performance.
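As a rough illustration of the simplest technique named above, the sketch below applies word-level shuffling to labeled sentence pairs. It is not the authors' implementation: the abstract does not specify how shuffling is performed, so the token-level shuffle, the function names (`shuffle_words`, `augment_pairs`), the `copies` and `seed` parameters, and the example requirement pair are all assumptions made for illustration.

```python
import random


def shuffle_words(sentence, rng):
    """Return the sentence with its whitespace-separated tokens in random order."""
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def augment_pairs(pairs, copies=1, seed=0):
    """Return the original (s1, s2, label) triples plus `copies` shuffled
    variants of each. The label is carried over unchanged, which assumes
    that shuffling does not alter the pair's class."""
    rng = random.Random(seed)  # seeded for reproducibility
    augmented = list(pairs)
    for s1, s2, label in pairs:
        for _ in range(copies):
            augmented.append(
                (shuffle_words(s1, rng), shuffle_words(s2, rng), label)
            )
    return augmented


# Hypothetical requirement pair, not taken from the paper's datasets.
pairs = [
    (
        "The system shall encrypt all stored passwords.",
        "The system shall store passwords in plain text.",
        "conflict",
    ),
]
augmented = augment_pairs(pairs, copies=2)
```

Adding variants only for the minority class would be one way to act on the abstract's observation that augmentation can hurt when a dataset is already relatively balanced.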


