SpaML: a Bimodal Ensemble Learning Spam Detector based on NLP Techniques
In this paper, we present SpaML, a spam detection tool that combines a set of supervised and unsupervised classifiers with two Natural Language Processing (NLP) techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). We first describe the NLP techniques used. We then present the individual classifiers and their performance with each technique, followed by the overall Ensemble Learning classifier and the strategy used to combine the individual classifiers. Finally, we report the results obtained by SpaML in terms of accuracy and precision.
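As an illustration of the two text representations named above, the following is a minimal sketch in plain Python. It is not SpaML's implementation: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer, and the unsmoothed weighting tf × log(N/df) are assumptions made for this example, and the abstract does not state which TF-IDF variant the tool uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one raw term-count vector per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    # Assumed variant: tf = count / document length, idf = log(N / df).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        total = sum(row)
        vectors.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return vocab, vectors


if __name__ == "__main__":
    messages = ["free prize now", "meeting now"]
    print(bag_of_words(messages))
    print(tf_idf(messages))
```

In this sketch a term that occurs in every message (here "now") receives a TF-IDF weight of zero, whereas BoW still counts it. This difference is one reason a classifier can behave differently under the two representations.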