The power of private likelihood-ratio tests for goodness-of-fit in frequency tables

09/20/2021
by Emanuele Dolera et al.

Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a rising challenge in modern statistics, as the achievement of confidentiality guarantees, which typically occurs through suitable perturbations of the data, may entail a loss in the statistical utility of the data. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, arguably the most common form of data release. Under the popular framework of (ε,δ)-differential privacy for perturbed data, we introduce a private likelihood-ratio (LR) test for goodness-of-fit and study its large sample properties, showing the importance of taking the perturbation into account to avoid a loss in the statistical significance of the test. Our main contribution provides a quantitative characterization of the trade-off between confidentiality, measured via the differential privacy parameters ε and δ, and utility, measured via the power of the test. In particular, we establish a precise Bahadur-Rao type large deviation expansion for the power of the private LR test, which allows us to: i) identify a critical quantity, depending on the sample size and on (ε,δ), that governs the loss in power of the private LR test; ii) quantify the sample cost of (ε,δ)-differential privacy in the private LR test, namely the additional sample size required to recover the power of the LR test in the absence of perturbation. This result relies on a novel multidimensional large deviation principle for sums of i.i.d. random vectors, which is of independent interest. Our work presents the first rigorous treatment of privacy-protecting LR tests for goodness-of-fit in frequency tables, using the power of the test to quantify the trade-off between confidentiality and utility.
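To make the setting concrete, below is a minimal sketch of a goodness-of-fit LR test computed on frequency counts released under (ε,δ)-differential privacy. It is not the authors' construction: the perturbation here is the standard Gaussian mechanism (one common way to achieve (ε,δ)-DP), the L2 sensitivity assumes one individual can move one unit of mass between two cells, the naive chi-square calibration ignores the added noise (precisely the issue the paper analyzes), and names such as `private_lr_test` are illustrative only.

```python
# Sketch: LR goodness-of-fit test on (eps, delta)-DP perturbed frequency counts.
# Assumptions (not from the paper): Gaussian mechanism, L2 sensitivity sqrt(2),
# naive chi-square calibration that ignores the perturbation.
import numpy as np
from scipy import stats


def gaussian_mechanism(counts, eps, delta, l2_sensitivity=np.sqrt(2.0)):
    """Release counts with i.i.d. Gaussian noise calibrated to (eps, delta)-DP."""
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return counts + np.random.normal(0.0, sigma, size=counts.shape)


def private_lr_test(noisy_counts, p0, alpha=0.05):
    """LR statistic computed from the perturbed counts, with chi-square cutoff.

    Noisy counts are clipped and renormalized so the plug-in frequencies form a
    valid probability vector; the calibration ignores the noise, which is the
    naive use the paper warns can cost statistical significance and power.
    """
    clipped = np.clip(noisy_counts, 1e-8, None)
    p_hat = clipped / clipped.sum()
    # LR statistic: 2 * sum_j N_j * log(p_hat_j / p0_j)
    lr_stat = 2.0 * np.sum(clipped * np.log(p_hat / p0))
    crit = stats.chi2.ppf(1.0 - alpha, df=len(p0) - 1)
    return lr_stat, lr_stat > crit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p0 = np.array([0.25, 0.25, 0.25, 0.25])      # null hypothesis
    p_true = np.array([0.30, 0.25, 0.25, 0.20])  # true data-generating law
    counts = rng.multinomial(5000, p_true).astype(float)
    noisy = gaussian_mechanism(counts, eps=1.0, delta=1e-5)
    stat, reject = private_lr_test(noisy, p0)
    print(f"private LR statistic = {stat:.2f}, reject H0: {reject}")
```

In this toy setup the power of the test depends jointly on the sample size and on (ε,δ) through the noise scale, which is the trade-off the paper quantifies via its Bahadur-Rao type expansion.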


