Secure and Scalable Data Classification

06/25/2020
by Paulo Tanaka, et al.

Content-based data classification is an open challenge. Traditional Data Loss Prevention (DLP)-like systems solve this problem by fingerprinting the data in question and monitoring endpoints for the fingerprinted data. With trillions of constantly changing data assets at Facebook, this approach is neither scalable nor effective at discovering what data is where. This paper describes an end-to-end system built to detect sensitive semantic types within Facebook at scale and to enforce data retention and access controls automatically. The approach described here is our first end-to-end privacy system that attempts to solve this problem by incorporating data signals, machine learning, and traditional fingerprinting techniques to map out and classify all data within Facebook. The described system is in production, achieving average F2 scores of 0.9+ across various privacy classes while handling trillions of data assets across dozens of data stores.
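The abstract reports F2 scores without defining them. As a minimal sketch, assuming the standard F-beta definition with beta = 2 (which weights recall more heavily than precision), the metric can be computed as shown here. The function name `f_beta` and the precision/recall values are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 favors recall over precision.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when the classifier finds nothing correct.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs (assumed values): the same pair of numbers scores
# very differently depending on which one is the recall.
high_recall = f_beta(precision=0.5, recall=1.0)   # recall-heavy classifier
high_precision = f_beta(precision=1.0, recall=0.5)  # precision-heavy classifier
print(round(high_recall, 4), round(high_precision, 4))
```

Under this definition, a classifier that misses sensitive data is penalized more than one that over-flags, which is a plausible reason to prefer F2 over F1 for privacy classification.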
