Development of email classifier in Brazilian Portuguese using feature selection for automatic response

07/08/2019
by Rogerio Bonatti, et al.

Automatic email categorization is an important application of text classification. We study automatic replies to business email messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and baseline categorization experiments using Naive Bayes and Support Vector Machines. We then discuss the effect of lemmatization and the role of part-of-speech tag filtering on precision and recall. Support Vector Machines classification coupled with nonlemmatized selection of verbs, nouns and adjectives was the best approach, with 87.3 [...]. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3 [...] respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
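
The part-of-speech filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tag names (`VERB`, `NOUN`, `ADJ`), the hand-tagged toy messages, the category labels, and the helper names `pos_filter` and `NaiveBayes` are all assumptions made for illustration, and only a small multinomial Naive Bayes with Laplace smoothing is shown (the Support Vector Machines variant is omitted). Tokens are kept in their surface form, with no lemmatization.

```python
import math
from collections import Counter, defaultdict

# Assumed coarse tag names; the abstract does not say which tagger or tagset was used.
KEPT_TAGS = {"VERB", "NOUN", "ADJ"}


def pos_filter(tagged_tokens, kept_tags=KEPT_TAGS):
    """Keep lowercased surface forms (no lemmatization) whose tag is in kept_tags."""
    return [word.lower() for word, tag in tagged_tokens if tag in kept_tags]


class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-tagged messages; labels are invented category names.
train = [
    ([("Quero", "VERB"), ("cancelar", "VERB"), ("meu", "DET"), ("pedido", "NOUN")], "cancelamento"),
    ([("Preciso", "VERB"), ("cancelar", "VERB"), ("a", "DET"), ("compra", "NOUN")], "cancelamento"),
    ([("Qual", "PRON"), ("o", "DET"), ("prazo", "NOUN"), ("de", "ADP"), ("entrega", "NOUN")], "entrega"),
    ([("A", "DET"), ("entrega", "NOUN"), ("está", "VERB"), ("atrasada", "ADJ")], "entrega"),
]

docs = [pos_filter(tagged) for tagged, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)

query = [("Gostaria", "VERB"), ("de", "ADP"), ("cancelar", "VERB"), ("o", "DET"), ("pedido", "NOUN")]
filtered_query = pos_filter(query)
predicted = model.predict(filtered_query)
```

Here the filter drops determiners, prepositions and pronouns before counting, so the classifier only sees verbs, nouns and adjectives. In a real setting the tags would come from a Portuguese part-of-speech tagger rather than being written by hand.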

research
12/04/2009

Qualitative Robustness of Support Vector Machines

Support vector machines have attracted much attention in theoretical and...
research
08/27/2020

Automatic Speech Summarisation: A Scoping Review

Speech summarisation techniques take human speech as input and then outp...
research
04/26/2022

Automatic Monitoring of Fruit Ripening Rooms by UHF RFID Sensor Network and Machine Learning

Accelerated ripening through the exposure of fruits to controlled enviro...
research
08/25/2010

Machine Learning Approaches for Modeling Spammer Behavior

Spam is commonly known as unsolicited or unwanted email messages in the ...
research
10/28/2019

A Comparison of Neural Network Training Methods for Text Classification

We study the impact of neural networks in text classification. Our focus...
research
06/29/2016

How Many Folders Do You Really Need?

Email classification is still a mostly manual task. Consequently, most W...
