
Differentially Private Weighted Sampling

by Edith Cohen, et al.

Common datasets consist of elements with keys (e.g., transactions and products), and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluation of query statistics. We propose private weighted sampling (PWS): a method that ensures element-level differential privacy while retaining, to the extent possible, the utility of the corresponding non-private weighted sample. PWS maximizes the reporting probabilities of keys and also improves over the state of the art for the well-studied special case of private histograms, in which no sampling is performed. We empirically demonstrate significant performance gains over prior baselines: a 20%-300% increase in key reporting for common Zipfian frequency distributions, and accuracy for 2×-8× lower frequencies in estimation tasks. Moreover, PWS is applied as a simple post-processing of a non-private sample, without requiring the original data. This allows seamless integration with existing implementations of non-private schemes and retains the efficiency of schemes designed for resource-constrained settings such as massive distributed or streamed data. We believe that, owing to its practicality and performance, PWS may become a method of choice in applications where privacy is desired.
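To make the setting concrete, here is a minimal sketch of the non-private starting point the abstract refers to: elements are aggregated into key and frequency pairs, and keys are then sampled with probability proportional to frequency. It is not the PWS method itself, and it provides no privacy guarantee. The function names (`aggregate`, `pps_sample`, `estimate_sum`), the threshold parameter `tau`, and the choice of Poisson probability-proportional-to-size sampling with inverse-probability estimates are all assumptions made for illustration.

```python
import random
from collections import Counter


def aggregate(elements):
    """Aggregate a stream of keys into key -> frequency pairs."""
    return Counter(elements)


def pps_sample(freqs, tau, rng):
    """Non-private Poisson PPS sample (illustrative, not PWS).

    Each key is included independently with probability
    min(1, frequency / tau). The inclusion probability is stored
    alongside the frequency so estimates can be computed later.
    """
    sample = {}
    for key, freq in freqs.items():
        p = min(1.0, freq / tau)
        if rng.random() < p:
            sample[key] = (freq, p)
    return sample


def estimate_sum(sample, predicate=lambda key: True):
    """Inverse-probability estimate of the total frequency of the
    keys that satisfy `predicate`, computed from the sample alone."""
    return sum(freq / p for key, (freq, p) in sample.items() if predicate(key))


if __name__ == "__main__":
    elements = ["a"] * 50 + ["b"] * 20 + ["c"] * 5 + ["d"]
    freqs = aggregate(elements)
    sample = pps_sample(freqs, tau=10.0, rng=random.Random(0))
    print("sampled keys:", sorted(sample))
    print("estimated total frequency:", estimate_sum(sample))
```

A larger `tau` gives a sparser sample, and keys with frequency at least `tau` are always included. According to the abstract, PWS would be applied as a post-processing step to a sample of this kind, without access to the original elements.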




Sampling Sketches for Concave Sublinear Functions of Frequencies

We consider massive distributed datasets that consist of elements modele...

Improved Utility Analysis of Private CountSketch

Sketching is an important tool for dealing with high-dimensional vectors...

PCKV: Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility

Data collection under local differential privacy (LDP) has been mostly s...

On the Round Complexity of the Shuffle Model

The shuffle model of differential privacy was proposed as a viable model...

Controlling Privacy Loss in Survey Sampling (Working Paper)

Social science and economics research is often based on data collected i...

Differential Private Stream Processing of Energy Consumption

A number of applications benefit from continuously releasing streams of ...