How Many Folders Do You Really Need?

by Mihajlo Grbovic, et al.

Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently, however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approaches, because of the change in the nature of consumer email traffic, which is now dominated by (non-spam) machine-generated email. We propose here a novel approach for (1) automatically distinguishing between personal and machine-generated email and (2) classifying messages into latent categories, without requiring users to have defined any folder. We report how we discovered that a set of 6 "latent" categories (one for human-generated and the others for machine-generated messages) can explain a significant portion of email traffic. We describe in detail the steps involved in building a Web-scale email categorization system, from the collection of ground-truth labels and the selection of features to the training of models. Experimental evaluation was performed on more than 500 billion messages received during a period of six months by users of the Yahoo mail service who elected to be part of such research studies. Our system achieved precision and recall rates close to 90%, and the latent categories we discovered cover 70% of both email traffic and email search queries. We believe that these results pave the way for a change of approach in the Web mail industry, and could support the invention of new large-scale email discovery paradigms that had not been possible before.
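The two-step design described in the abstract (first separate personal from machine-generated mail, then assign a category) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the keyword lists, the category names other than "human", the fallback label `other_machine`, and the function names are hypothetical and are not the features, categories, or models used by the authors, who train models on labeled data rather than applying keyword rules. The last function shows how per-category precision and recall, the metrics the abstract reports, are computed.

```python
from collections import Counter

# Hypothetical keyword lists: illustrative only, not the paper's features.
MACHINE_HINTS = ("noreply", "no-reply", "unsubscribe", "newsletter")
CATEGORY_KEYWORDS = {
    "shopping": ("order", "shipped", "receipt"),
    "travel": ("flight", "itinerary", "booking"),
    "finance": ("statement", "payment", "invoice"),
}


def is_machine_generated(sender: str, body: str) -> bool:
    """Stage 1: toy rule separating machine-generated from personal mail."""
    text = f"{sender} {body}".lower()
    return any(hint in text for hint in MACHINE_HINTS)


def categorize(sender: str, subject: str, body: str) -> str:
    """Stage 2: personal mail gets the single 'human' label; machine mail
    gets the category whose (assumed) keywords occur most often."""
    if not is_machine_generated(sender, body):
        return "human"
    text = f"{subject} {body}".lower()
    scores = Counter(
        {cat: sum(text.count(k) for k in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    )
    best, hits = scores.most_common(1)[0]
    return best if hits > 0 else "other_machine"


def precision_recall(predicted, actual, label):
    """Precision and recall for one category over paired label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Calling `categorize("noreply@shop.example", "Your order has shipped", "Track your order here")` exercises both stages: the sender matches a machine hint, and the shopping keywords then decide the category. Note that no user-defined folder is involved at any point; every user gets the same fixed label set.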


