Privacy at Facebook Scale

06/25/2020
by Paulo Tanaka, et al.

Most organizations today collect data across every facet of their business. There is no shortage of data in these businesses, as it eventually gets copied, transformed, and scattered across the organization's data warehouse. During privacy-related audits, organizations are required to locate all instances of a certain type of data in order to enforce privacy and security policies around it. In these cases, it becomes crucial to have insight into the data so that access controls and data retention policies can be applied automatically to specific data assets within the data stores. This paper describes an end-to-end system built to detect sensitive semantic types within Facebook at scale and to enforce data retention and access controls automatically. Content-based data classification is an open challenge. Traditional Data Loss Prevention (DLP)-like systems solve this problem by fingerprinting the data in question and monitoring endpoints for the fingerprinted data. With trillions of constantly changing data assets in Facebook, this approach is neither scalable nor effective in discovering what data is where. Instead, the approach described here is our first end-to-end privacy system that attempts to solve this problem by incorporating data signals, machine learning, and traditional fingerprinting techniques to map out and classify all data within Facebook. The described system is in production, achieving an average F2 score of 0.9+ across various privacy classes while handling trillions of data assets.
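For readers unfamiliar with the F2 metric quoted in the abstract, the following is a minimal sketch of how an F-beta score is computed and averaged across classes. With beta = 2, recall is weighted more heavily than precision, which suits a setting where missing a sensitive data asset is costlier than a false alarm. The class names and the precision/recall values are hypothetical, and the use of an unweighted (macro) average is an assumption; the abstract does not state how the average is taken.

```python
from typing import Dict, Tuple


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Return the F-beta score for one class.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    beta > 1 favours recall; beta = 1 gives the usual F1 score.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1.0 + b2) * precision * recall / denom


def macro_f_beta(per_class: Dict[str, Tuple[float, float]], beta: float = 2.0) -> float:
    """Unweighted mean of per-class F-beta scores (assumed averaging scheme)."""
    if not per_class:
        raise ValueError("per_class must not be empty")
    scores = [f_beta(p, r, beta) for p, r in per_class.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (precision, recall) pairs for illustrative privacy classes.
    example = {
        "email_address": (0.80, 0.95),
        "phone_number": (0.90, 0.90),
    }
    for name, (p, r) in example.items():
        print(f"{name}: F2 = {f_beta(p, r):.3f}")
    print(f"macro F2 = {macro_f_beta(example):.3f}")
```

Because beta = 2 multiplies precision by four in the denominator, a class with high recall and moderate precision scores higher under F2 than under F1.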

