Calibrating Noise to Variance in Adaptive Data Analysis

12/19/2017
by   Vitaly Feldman, et al.

Datasets are often used multiple times, and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data distribution, where each query may depend arbitrarily on answers to previous queries. The strongest results obtained for this problem rely on differential privacy, a strong notion of algorithmic stability with the important property that it "composes" well when data is reused. However, the notion is rather strict, as it requires stability under replacement of an arbitrary data element. The simplest algorithm is to add Gaussian (or Laplace) noise to distort the empirical answers, but analysing this technique using differential privacy yields suboptimal accuracy guarantees when the queries have low variance. Here we propose a relaxed notion of stability that also composes adaptively. We demonstrate that a simple and natural algorithm based on adding noise scaled to the standard deviation of the query provides our notion of stability. This implies an algorithm that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries. The only previous approach that provides such accuracy guarantees is based on a more involved differentially private median-of-means algorithm, and its analysis exploits stronger "group" stability of the algorithm.
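
To make the idea of "noise scaled to the standard deviation of the query" concrete, here is a minimal sketch in Python. It is only an illustration of the general idea described in the abstract, not the paper's algorithm: the function name `answer_query`, the multiplier `noise_scale`, and the choice to use the empirical standard deviation directly as the noise scale are all assumptions made for this example. The abstract does not state the exact scaling or constants.

```python
import numpy as np

def answer_query(values, noise_scale, rng):
    """Hypothetical helper: return a noisy empirical mean of one query.

    values      -- the query evaluated on each data point in the sample
    noise_scale -- illustrative multiplier (an assumption, not from the paper)
    rng         -- a numpy random Generator, e.g. np.random.default_rng()
    """
    values = np.asarray(values, dtype=float)
    empirical_mean = values.mean()
    empirical_std = values.std()
    # Gaussian noise whose standard deviation is proportional to the
    # query's empirical standard deviation: low-variance queries are
    # perturbed less than high-variance ones.
    sigma = noise_scale * empirical_std
    if sigma == 0.0:
        return float(empirical_mean)
    return float(empirical_mean + rng.normal(0.0, sigma))

# Usage: a nearly constant query versus a balanced 0/1 query.
rng = np.random.default_rng(0)
low_var = answer_query([0.50, 0.50, 0.51, 0.50], noise_scale=0.1, rng=rng)
high_var = answer_query([0.0, 1.0, 0.0, 1.0], noise_scale=0.1, rng=rng)
```

Under these assumptions, a query that is constant on the sample is answered exactly, while a query with values spread between 0 and 1 receives proportionally more noise. A fixed noise level, by contrast, would perturb both queries equally.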


