STA: Self-controlled Text Augmentation for Improving Text Classifications

02/24/2023
by Congcong Wang, et al.

Despite recent advancements in Machine Learning, many tasks still involve working in low-data regimes, which can make solving natural language problems difficult. Recently, a number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP) that can enrich the training data with new examples, though they are not without their caveats. For instance, simple rule-based heuristic methods are effective, but lack variation in semantic content and syntactic structure with respect to the original text. On the other hand, more complex deep learning approaches can cause extreme shifts in the intrinsic meaning of the text and introduce unwanted noise into the training data. To more reliably control the quality of the augmented examples, we introduce a state-of-the-art approach for Self-Controlled Text Augmentation (STA). Our approach tightly controls the generation process by introducing a self-checking procedure to ensure that generated examples retain the semantic content of the original text. Experimental results on multiple benchmarking datasets demonstrate that STA substantially outperforms existing state-of-the-art techniques, whilst qualitative analysis reveals that the generated examples are both lexically diverse and semantically reliable.
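
The abstract does not say how the self-checking procedure is implemented, so the following is only a generic generate-then-filter sketch of the idea, not the STA method itself. It assumes candidate augmentations are already available and uses a crude token-overlap score as a stand-in for a real semantic check; the names `jaccard`, `self_checked_augment`, and `threshold` are hypothetical and do not come from the paper.

```python
from typing import Callable, Iterable, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1].

    Assumption: a crude stand-in for a semantic check, not the paper's checker.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def self_checked_augment(
    text: str,
    candidates: Iterable[str],
    check: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,
) -> List[str]:
    """Keep candidates that pass the check and differ from the original."""
    kept = []
    for cand in candidates:
        if cand.strip().lower() == text.strip().lower():
            continue  # an exact copy adds no variation
        if check(text, cand) >= threshold:
            kept.append(cand)  # close enough to the original to be retained
    return kept


original = "the movie was surprisingly good"
candidates = [
    "the movie was surprisingly good",  # duplicate of the original
    "the film was surprisingly good",   # near paraphrase
    "stock prices fell sharply today",  # unrelated text
]
augmented = self_checked_augment(original, candidates)
print(augmented)
```

In this toy run only the near paraphrase survives the filter. Swapping `check` for a stronger scorer, such as a trained classifier or an embedding similarity, would keep the same control flow while changing what counts as retaining the original meaning.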

