A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data

10/22/2021
by Andres F. Barrientos, et al.

Federal administrative tax data are invaluable for research, but because of privacy concerns, access to these data is typically limited to select agencies and a few individuals. An alternative to sharing microlevel data is a validation server, which allows individuals to query statistics without accessing the confidential data. This paper studies the feasibility of using differentially private (DP) methods to implement such a server. We provide an extensive study of existing DP methods for releasing tabular statistics, means, quantiles, and regression estimates. We also include new methodological adaptations of existing DP regression algorithms that handle new data types and return standard error estimates. We evaluate the selected methods based on the accuracy of their output for statistical analyses, using real administrative tax data obtained from the Internal Revenue Service Statistics of Income (SOI) Division. Our findings show that a validation server would be feasible for simple statistics but would struggle to produce accurate regression estimates and confidence intervals. We outline challenges and offer recommendations for future work on validation servers. This is the first comprehensive statistical study of DP methodology on a real, complex dataset, and it has significant implications for the direction of a growing research field.
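To make the idea of a DP query concrete, the following is a minimal sketch of one of the simplest releases mentioned above: a mean with Laplace noise. It is not taken from the paper and is not claimed to be one of the methods the authors evaluate. The function name `dp_mean`, the clipping bounds, and the assumption that the record count is public are all illustrative choices.

```python
import random


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release a mean with Laplace noise (illustrative sketch only).

    Assumptions, none of which come from the paper: the number of records
    is public, neighboring datasets differ by replacing one record, and
    `lower`/`upper` are chosen without looking at the data.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if not values:
        raise ValueError("values must be non-empty")
    rng = rng or random.Random()

    # Clip each record so that one record can move the mean by at most
    # (upper - lower) / n; this bound is the sensitivity of the query.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n

    # Laplace(0, b) noise, sampled as the difference of two independent
    # exponentials with rate 1/b. Plain floating-point sampling like this
    # is not hardened against known precision attacks.
    b = sensitivity / epsilon
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return sum(clipped) / n + noise
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the accuracy trade-off that a feasibility study of this kind has to measure.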


research
09/19/2023

DPpack: An R Package for Differentially Private Statistical Analysis and Machine Learning

Differential privacy (DP) is the state-of-the-art framework for guarante...
research
10/12/2022

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

Differential private (DP) mechanisms protect individual-level informatio...
research
08/12/2022

Differentially Private Kolmogorov-Smirnov-Type Tests

The test statistics for many nonparametric hypothesis tests can be expre...
research
01/16/2022

Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases

Organizations often collect private data and release aggregate statistic...
research
07/26/2022

Differentially Private Estimation via Statistical Depth

Constructing a differentially private (DP) estimator requires deriving t...
research
04/30/2020

A Primer on Private Statistics

Differentially private statistical estimation has seen a flurry of devel...
