Calibrating Noise to Variance in Adaptive Data Analysis

by Vitaly Feldman, et al.

Datasets are often used multiple times, and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data distribution, where each query may depend arbitrarily on answers to previous queries. The strongest results obtained for this problem rely on differential privacy, a strong notion of algorithmic stability with the important property that it "composes" well when data is reused. However, the notion is rather strict, as it requires stability under replacement of an arbitrary data element. The simplest algorithm is to add Gaussian (or Laplace) noise to distort the empirical answers, but analysing this technique using differential privacy yields suboptimal accuracy guarantees when the queries have low variance. Here we propose a relaxed notion of stability that also composes adaptively. We demonstrate that a simple and natural algorithm based on adding noise scaled to the standard deviation of the query provides our notion of stability. This implies an algorithm that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries. The only previous approach that provides such accuracy guarantees is based on a more involved differentially private median-of-means algorithm, and its analysis exploits stronger "group" stability of the algorithm.
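As a rough illustration of the idea of "noise scaled to the standard deviation of the query", the following minimal Python sketch answers a statistical query with its empirical mean plus Gaussian noise proportional to the query's empirical standard deviation. The function name `answer_query`, the `noise_scale` constant, and the division by the square root of the sample size are assumptions made for illustration only; the abstract does not specify the calibration, and this is not the authors' algorithm or analysis.

```python
import math
import random
import statistics


def answer_query(data, query, noise_scale=1.0, rng=None):
    """Answer a statistical query with variance-calibrated Gaussian noise.

    Hypothetical sketch: `noise_scale` and the 1/sqrt(n) factor are
    illustrative assumptions, not the calibration from the paper.
    Returns (noisy_answer, sigma), where sigma is the noise std used.
    """
    rng = rng or random.Random()
    values = [query(x) for x in data]
    n = len(values)
    empirical_mean = statistics.fmean(values)
    # Population standard deviation of the query on this dataset.
    empirical_std = statistics.pstdev(values)
    # Noise shrinks with the query's spread: low-variance queries
    # receive less distortion than a worst-case (range-based) scale.
    sigma = noise_scale * empirical_std / math.sqrt(n)
    return empirical_mean + rng.gauss(0.0, sigma), sigma


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.0] * 50 + [1.0] * 50
    # High-variance query: the identity on a balanced 0/1 sample.
    _, sigma_high = answer_query(data, lambda x: x, rng=rng)
    # Low-variance query: a rarely-triggered indicator.
    rare = [0.0] * 99 + [1.0]
    _, sigma_low = answer_query(rare, lambda x: x, rng=rng)
    print(sigma_high, sigma_low)
```

In this sketch a query whose values barely vary over the dataset receives correspondingly little noise, whereas a scheme calibrated to the query's full range would add the same noise to both queries above.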




Generalization for Adaptively-chosen Estimators via Stable Median

Datasets are often reused to perform multiple statistical analyses in an...

Generalization in the Face of Adaptivity: A Bayesian Perspective

Repeated use of a data sample via adaptively chosen queries can rapidly ...

The Everlasting Database: Statistical Validity at a Fair Price

The problem of handling adaptivity in data analysis, intentional or not,...

A New Analysis of Differential Privacy's Generalization Guarantees

We give a new proof of the "transfer theorem" underlying adaptive data a...

Bayesian Adaptive Data Analysis Guarantees from Subgaussianity

The new field of adaptive data analysis seeks to provide algorithms and ...

A necessary and sufficient stability notion for adaptive generalization

We introduce a new notion of the stability of computations, which holds ...

The Sparse Vector Technique, Revisited

We revisit one of the most basic and widely applicable techniques in the...