Leveraging GPT-2 for Classifying Spam Reviews with Limited Labeled Data via Adversarial Training

12/24/2020
by Athirai A. Irissappane, et al.

Online reviews are a vital source of information when purchasing a service or a product. Opinion spammers manipulate these reviews, deliberately altering the overall perception of the service. Although a large corpus of online reviews exists, only a few of them have been labeled as spam or non-spam, which makes it difficult to train spam detection models. We propose an adversarial training mechanism that leverages the capabilities of Generative Pre-trained Transformer 2 (GPT-2) to classify opinion spam using limited labeled data and a large set of unlabeled data. Experiments on the TripAdvisor and YelpZip datasets show that the proposed model outperforms state-of-the-art techniques by at least 7% in terms of accuracy when labeled data is limited. The proposed model can also generate synthetic spam/non-spam reviews with reasonable perplexity, thereby providing additional labeled data during training.
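The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way adversarial semi-supervised classifiers combine labeled, unlabeled, and generated data: a discriminator with K+1 classes (here non-spam, spam, and a hypothetical "fake" class for generator output), in the style of semi-supervised GANs. It is not presented as the authors' exact loss. The logits are plain lists standing in for the output of a GPT-2-based classifier, and all names (`discriminator_loss`, `FAKE`, and so on) are hypothetical. A perplexity helper is included because the abstract uses perplexity to assess generated reviews; it assumes natural-log token probabilities.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical class indices: two real classes plus one "generated" class.
NON_SPAM, SPAM, FAKE = 0, 1, 2


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(
    labeled: Sequence[Tuple[Sequence[float], int]],
    unlabeled: Sequence[Sequence[float]],
    generated: Sequence[Sequence[float]],
) -> float:
    """Sketch of a K+1-class semi-supervised discriminator loss (assumption).

    labeled:   (logits, label) pairs with label in {NON_SPAM, SPAM}
    unlabeled: logits for real reviews without labels
    generated: logits for reviews produced by the generator
    """
    # Supervised term: cross-entropy on the few labeled real reviews.
    sup = -sum(math.log(softmax(z)[y]) for z, y in labeled) / len(labeled)
    # Unlabeled real reviews should NOT be assigned to the fake class.
    unsup_real = -sum(math.log(1.0 - softmax(z)[FAKE]) for z in unlabeled) / len(unlabeled)
    # Generated reviews SHOULD be assigned to the fake class.
    unsup_fake = -sum(math.log(softmax(z)[FAKE]) for z in generated) / len(generated)
    return sup + unsup_real + unsup_fake


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    uniform = [0.0, 0.0, 0.0]  # a discriminator with no preference
    loss = discriminator_loss([(uniform, SPAM)], [uniform], [uniform])
    print(round(loss, 4))  # 2.6027 (= ln 13.5)
    # A model assigning probability 0.25 to every token has perplexity 4.
    print(round(perplexity([math.log(0.25)] * 4), 6))  # 4.0
```

The unlabeled term is what lets the large pool of unlabeled reviews shape the classifier: it only asks that real reviews look real, without needing a spam/non-spam label.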

Related research
03/19/2019

GANs for Semi-Supervised Opinion Spam Detection

Online reviews have become a vital source of information in purchasing a...
05/26/2022

Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms

E-commerce is the fastest-growing segment of the economy. Online reviews...
10/22/2020

Self-training and Pre-training are Complementary for Speech Recognition

Self-training and unsupervised pre-training have emerged as effective ap...
12/29/2021

Attention-based Bidirectional LSTM for Deceptive Opinion Spam Classification

Online reviews play a vital role in e-commerce for decision making. Much...
06/15/2020

Improving Adversarial Robustness via Unlabeled Out-of-Domain Data

Data augmentation by incorporating cheap unlabeled data from multiple do...
12/27/2020

Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution

Over the last years, online reviews became very important since they can...
07/07/2019

Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision

Lack of labeled training data is a major bottleneck for neural network b...
