Classifying Emails into Human vs Machine Category

12/14/2021
by Changsung Kang, et al.

It is an essential product requirement of Yahoo Mail to distinguish between personal and machine-generated emails. The old production classifier in Yahoo Mail was based on a simple logistic regression model, trained on features aggregated at the SMTP address level. We propose building deep learning models at the message level instead. We built and trained four individual CNN models: (1) a content model with the subject and content as input; (2) a sender model with the sender's email address and name as input; (3) an action model, which analyzes email recipients' action patterns and generates target labels from their opening/deleting behavior on each sender's emails; and (4) a salutation model, which uses senders' "explicit salutation" signal as positive labels. Next, we built a final full model after exploring different combinations of these four models. Experimental results on editorial data show that our full model improves the adjusted recall over the production model's 70.5%, while at the same time lifting the precision from 94.7% to 96.0% on this task. This full model has been deployed into the current production system (Yahoo Mail 6).
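
The abstract does not spell out how the action and salutation labels are produced. Below is a minimal Python sketch of the general idea, weak labeling from user signals, under loudly stated assumptions: the `SenderStats` record, the `min_count` and open-rate thresholds, the greeting word list, and the choice of "human" as the positive class are all hypothetical illustrations, not details taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderStats:
    """Hypothetical per-sender aggregate of recipients' actions."""
    opened: int = 0
    deleted_unopened: int = 0

    @property
    def total(self) -> int:
        return self.opened + self.deleted_unopened


def action_label(stats: SenderStats, min_count: int = 20,
                 human_open_rate: float = 0.8,
                 machine_open_rate: float = 0.2) -> Optional[str]:
    """Weak label from open/delete behavior: "human", "machine", or None.

    All thresholds are illustrative assumptions. Senders with too few
    observations, or with an ambiguous open rate, get no label.
    """
    if stats.total < min_count:
        return None
    open_rate = stats.opened / stats.total
    if open_rate >= human_open_rate:
        return "human"
    if open_rate <= machine_open_rate:
        return "machine"
    return None


def has_explicit_salutation(body: str, recipient_first_name: str) -> bool:
    """Hypothetical detector: body opens with a greeting plus the recipient's name."""
    pattern = re.compile(
        r"^\s*(?:hi|hello|hey|dear)\s+" + re.escape(recipient_first_name) + r"\b",
        re.IGNORECASE,
    )
    return pattern.match(body) is not None


def salutation_label(body: str, recipient_first_name: str) -> Optional[str]:
    """Emit only positive labels (assumed here to mean "human"), never negatives."""
    return "human" if has_explicit_salutation(body, recipient_first_name) else None


if __name__ == "__main__":
    print(action_label(SenderStats(opened=45, deleted_unopened=5)))   # human
    print(action_label(SenderStats(opened=2, deleted_unopened=38)))   # machine
    print(action_label(SenderStats(opened=3, deleted_unopened=2)))    # None
    print(salutation_label("Hi Alice,\nAre we still on?", "Alice"))   # human
```

In this sketch, messages that receive no weak label (`None`) would simply be left out of the corresponding training set; the two-sided open-rate thresholds trade label coverage for label purity.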

research
05/01/2022

Deep vs. Shallow Learning: A Benchmark Study in Low Magnitude Earthquake Detection

While deep learning models have seen recent high uptake in the geoscienc...
research
04/13/2018

Language Recognition using Time Delay Deep Neural Network

This work explores the use of a monolingual Deep Neural Network (DNN) mo...
research
12/16/2022

Shapley variable importance cloud for machine learning models

Current practice in interpretable machine learning often focuses on expl...
research
04/27/2018

Auto-Detection of Safety Issues in Baby Products

Every year, thousands of people receive consumer product related injurie...
research
06/07/2019

Building a Production Model for Retrieval-Based Chatbots

Response suggestion is an important task for building human-computer con...
research
12/27/2018

Classification of radiology reports by modality and anatomy: A comparative study

Data labeling is currently a time-consuming task that often requires exp...
research
04/19/2021

Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction

This paper asks whether a distinction between production-based and perce...