RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding

12/12/2022
by Zhengqing Yuan, et al.
This paper presents a new data augmentation algorithm for natural language understanding tasks, called RPN (Random Position Noise). Current text augmentation methods are relatively scarce, and few of the existing ones apply to all sentence-level natural language understanding tasks. RPN moves the traditional augmentation of the original text to the word vector level: the algorithm substitutes values in one or several dimensions of some word vectors. As a result, RPN introduces a certain degree of perturbation to the sample, and the range of perturbation can be adjusted for different tasks. The augmented samples are then used to train the model, which makes it more robust. In subsequent experiments, we found that adding RPN when training or fine-tuning a model resulted in a stable boost on all 8 natural language processing tasks, including the TweetEval, CoLA, and SST-2 datasets, and in more significant improvements than other data augmentation algorithms. The RPN algorithm applies to all sentence-level language understanding tasks and can be used in any deep learning model with a word embedding layer.
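
The abstract describes the core operation only in outline, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes a sentence is already available as a NumPy array of word vectors, and every name and default in it (`rpn_augment`, `word_frac`, `dims_per_word`, `noise_range`) is hypothetical. The choice of uniform noise is also an assumption, since the abstract does not say which distribution the substituted values come from.

```python
import numpy as np


def rpn_augment(embeddings, word_frac=0.3, dims_per_word=2,
                noise_range=0.1, seed=None):
    """Return a perturbed copy of a (seq_len, dim) array of word vectors.

    Hypothetical sketch: a random subset of word vectors is chosen, and in
    each chosen vector a few randomly selected dimensions are replaced by
    noise drawn uniformly from [-noise_range, noise_range).
    """
    rng = np.random.default_rng(seed)
    out = np.array(embeddings, dtype=float, copy=True)  # leave the input intact
    seq_len, dim = out.shape

    # Pick which word positions to perturb (at least one).
    n_words = max(1, int(round(word_frac * seq_len)))
    word_idx = rng.choice(seq_len, size=n_words, replace=False)

    for w in word_idx:
        # Pick the dimensions of this word vector to overwrite.
        n_dims = min(dims_per_word, dim)
        dim_idx = rng.choice(dim, size=n_dims, replace=False)
        # Substitute noise at those positions (assumed uniform distribution).
        out[w, dim_idx] = rng.uniform(-noise_range, noise_range, size=n_dims)

    return out


# Example: a 10-token sentence with 8-dimensional word vectors.
sentence = np.ones((10, 8))
augmented = rpn_augment(sentence, seed=0)
```

In this sketch, `noise_range` and `word_frac` play the role of the adjustable perturbation range mentioned in the abstract: larger values perturb the sample more strongly. In a real model the same operation would be applied to the output of the word embedding layer during training or fine-tuning.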

research
04/22/2018

Word Embedding Perturbation for Sentence Classification

In this technique report, we aim to mitigate the overfitting problem of ...
research
05/12/2022

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

Data augmentation is an effective approach to tackle over-fitting. Many ...
research
10/16/2020

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Data augmentation has been demonstrated as an effective strategy for imp...
research
02/22/2021

MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

MixUp is a computer vision data augmentation technique that uses convex ...
research
09/27/2019

Automatically Learning Data Augmentation Policies for Dialogue Tasks

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches ...
research
05/03/2023

PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer

Recent studies show that prompt tuning can better leverage the power of ...
research
04/11/2020

DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus

This paper focuses on how to extract opinions over each Persian sentence...
