SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder

06/17/2016
by   Noura Al Moubayed, et al.

In this paper we present a novel approach to spam filtering and demonstrate its applicability with respect to SMS messages. Our approach requires minimal feature engineering and a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and then a comprehensive data model is created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability through word-cloud visualisations of the topics. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms, with our proposed approach achieving over 97% accuracy, which compares favourably to the best reported algorithms presented in the literature.
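The two stages described in the abstract, topic features followed by a denoising autoencoder, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy, and the toy messages, the number of topics, the masking-noise corruption, the tied weights, the squared-error loss and all hyperparameters are illustrative choices that the abstract does not specify. Only a single denoising layer is trained here; a stacked model would train further layers on the codes produced by the previous one.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy messages invented for illustration (not taken from the paper's data).
messages = [
    "win a free prize now call this number",
    "free entry claim your cash prize today",
    "are we still meeting for lunch today",
    "call me when you get home tonight",
    "urgent claim your free cash now",
    "see you at home after lunch",
]


def topic_features(texts, n_topics=3, seed=0):
    """Map each message to its LDA topic proportions (each row sums to 1)."""
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_layer(x, n_hidden=2, corruption=0.3, lr=0.5,
                          epochs=200, seed=0):
    """Train one denoising autoencoder layer with tied weights.

    Inputs are corrupted with masking noise; the loss is the squared
    error between the reconstruction and the clean input.
    """
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_in)
    for _ in range(epochs):
        # Randomly zero out a fraction of the input entries.
        mask = rng.random(x.shape) > corruption
        x_noisy = x * mask
        h = sigmoid(x_noisy @ w + b_h)      # encode the corrupted input
        x_hat = sigmoid(h @ w.T + b_o)      # decode with the transposed weights
        # Backpropagate the error measured against the clean input.
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ w) * h * (1.0 - h)
        grad_w = x_noisy.T @ d_hid + d_out.T @ h
        w -= lr * grad_w / len(x)
        b_h -= lr * d_hid.mean(axis=0)
        b_o -= lr * d_out.mean(axis=0)
    return w, b_h


def encode(x, w, b_h):
    """Hidden representation of clean inputs under a trained layer."""
    return sigmoid(x @ w + b_h)


topics = topic_features(messages)            # shape: (n_messages, n_topics)
w, b_h = train_denoising_layer(topics)
codes = encode(topics, w, b_h)               # shape: (n_messages, n_hidden)
```

The resulting `codes` are the learned representation of each message. How the paper separates spam from ham on top of such a representation is not stated in the abstract, so that step is left out of this sketch.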


