The power of private likelihood-ratio tests for goodness-of-fit in frequency tables

09/20/2021
by   Emanuele Dolera, et al.

Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a growing challenge in modern statistics, as confidentiality guarantees are typically achieved through suitable perturbations of the data, which may cause a loss in statistical utility. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, arguably the most common form of data release. Under the popular framework of (ε,δ)-differential privacy for perturbed data, we introduce a private likelihood-ratio (LR) test for goodness-of-fit and study its large-sample properties, showing the importance of taking the perturbation into account to avoid a loss in the statistical significance of the test. Our main contribution is a quantitative characterization of the trade-off between confidentiality, measured via the differential privacy parameters ε and δ, and utility, measured via the power of the test. In particular, we establish a precise Bahadur–Rao-type large deviation expansion for the power of the private LR test, which leads us to: i) identify a critical quantity, depending on the sample size and on (ε,δ), that determines a loss in the power of the private LR test; ii) quantify the sample cost of (ε,δ)-differential privacy in the private LR test, namely the additional sample size required to recover the power of the LR test in the absence of perturbation. This result relies on a novel multidimensional large deviation principle for sums of i.i.d. random vectors, which is of independent interest. Our work presents the first rigorous treatment of privacy-protecting LR tests for goodness-of-fit in frequency tables, using the power of the test to quantify the trade-off between confidentiality and utility.
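To make the setting concrete, the following is a minimal sketch of a private LR goodness-of-fit test on a frequency table, using the classical Gaussian mechanism to obtain (ε,δ)-differential privacy and a naive chi-square calibration of the rejection threshold. This is an illustration under stated assumptions, not the paper's construction: the noise scale σ = Δ·√(2 ln(1.25/δ))/ε with L2 sensitivity Δ = √2 is the textbook Gaussian-mechanism calibration, the clipping of noisy counts and the chi-square threshold are simplifications, and the function name `private_lr_gof_test` is hypothetical. The abstract's point is precisely that ignoring the perturbation in the calibration, as done here, can distort the significance and power of the test.

```python
import math
import numpy as np


def private_lr_gof_test(counts, null_probs, eps, delta, alpha=0.05, rng=None):
    """Hypothetical sketch: LR goodness-of-fit test on counts released
    via the Gaussian mechanism for (eps, delta)-differential privacy.

    counts     : observed cell counts of the frequency table
    null_probs : cell probabilities under the null hypothesis (sum to 1)
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    null_probs = np.asarray(null_probs, dtype=float)

    # Gaussian-mechanism noise scale; L2 sensitivity sqrt(2) since
    # changing one record moves one unit between two cells.
    sigma = math.sqrt(2.0) * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    noisy = np.clip(noisy, 1e-9, None)  # LR statistic needs positive cells

    # LR statistic G^2 = 2 * sum O_i log(O_i / E_i), computed on the
    # perturbed counts as if they were the true ones.
    n = noisy.sum()
    expected = n * null_probs
    lr_stat = float(2.0 * np.sum(noisy * np.log(noisy / expected)))

    # Naive chi-square(k-1) critical value via the Wilson-Hilferty
    # approximation; ignoring the perturbation here is exactly what can
    # cost statistical significance, per the abstract.
    df = len(counts) - 1
    z = 1.6449 if alpha == 0.05 else abs(np.quantile(rng.normal(0, 1, 200000), 1 - alpha))
    threshold = df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3
    return lr_stat, bool(lr_stat > threshold)
```

The statistic is 2n times the Kullback–Leibler divergence between the (noisy) empirical frequencies and the null probabilities, so it is always nonnegative; larger ε (weaker privacy) shrinks σ and recovers the non-private test, which is the trade-off the paper quantifies through the power.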

