Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge

12/05/2018
by Jinyuan Jia, et al.

Estimating the frequencies of certain items among a population is a basic step in data analytics. It enables more advanced analytics (e.g., heavy hitter identification, frequent pattern mining), client software optimization, and detection of unwanted or malicious hijacking of user settings in browsers. Frequency estimation and heavy hitter identification with local differential privacy (LDP) protect both users' privacy and the data collector. Existing LDP algorithms cannot leverage 1) prior knowledge about the noise in the estimated item frequencies and 2) prior knowledge about the true item frequencies. As a result, they achieve suboptimal performance in practice. In this work, we aim to design LDP algorithms that can leverage such prior knowledge. Specifically, we design Calibrate to incorporate the prior knowledge via statistical inference. Calibrate can be appended to an existing LDP algorithm to reduce its estimation errors. We model the prior knowledge about the noise and about the true item frequencies as two probability distributions. Given these two distributions and an item's estimated frequency produced by an existing LDP algorithm, Calibrate computes the conditional probability distribution of the item's true frequency and uses the mean of that distribution as the item's calibrated frequency. Estimating the two probability distributions is challenging because of data sparsity. We address this challenge by integrating techniques from statistics and machine learning. Our empirical results on two real-world datasets show that Calibrate significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
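
The calibration step described in the abstract (take the mean of the conditional distribution of the true frequency given the noisy estimate) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the noise is zero-mean Gaussian with a known standard deviation and that the prior over true frequencies is given as weights on a discrete grid. The names `calibrate`, `grid`, `prior`, and `noise_std` are hypothetical, and the abstract does not say how the paper estimates either distribution.

```python
import math


def calibrate(estimate, grid, prior, noise_std):
    """Return the posterior mean of the true frequency given a noisy estimate.

    estimate  -- frequency reported by some existing LDP algorithm
    grid      -- candidate true frequencies (assumed discretization)
    prior     -- prior weight for each grid point (need not be normalized)
    noise_std -- standard deviation of the assumed Gaussian noise
    """
    # Posterior weight of each candidate f is prior(f) * p_noise(estimate - f).
    weights = []
    for f, p in zip(grid, prior):
        z = (estimate - f) / noise_std
        weights.append(p * math.exp(-0.5 * z * z))

    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed (estimate far outside the grid):
        # fall back to the uncalibrated estimate.
        return estimate

    # Mean of the conditional distribution over the grid.
    return sum(f * w for f, w in zip(grid, weights)) / total


# Hypothetical setup: frequencies on [0, 1] in steps of 0.01.
grid = [i / 100 for i in range(101)]

# Assumed heavy-tailed prior: most items are rare (illustrative choice only).
skewed_prior = [(f + 0.01) ** -2.0 for f in grid]

# A negative raw estimate is pulled back into the valid range.
calibrated_negative = calibrate(-0.02, grid, skewed_prior, noise_std=0.01)

# With a uniform prior and a symmetric grid, the estimate is left unchanged.
uniform_prior = [1.0] * len(grid)
calibrated_uniform = calibrate(0.5, grid, uniform_prior, noise_std=0.05)
```

Under these assumptions, a raw estimate that falls outside the plausible range, such as a negative frequency, is mapped back onto the support of the prior, while a flat prior leaves a well-supported estimate essentially untouched.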

