On the Existence of Densities for Functional Data and their Link to Statistical Privacy
In statistical privacy (or statistical disclosure control) the goal is to minimize the potential for identification of individual records or sensitive characteristics while at the same time ensuring that the released information provides accurate and valid statistical inference. Differential privacy (DP) has emerged as a mathematically rigorous definition of disclosure risk and, more broadly, as a framework for releasing privacy-enhanced versions of a statistical summary. This work develops an extensive theory for achieving DP with functional data or, more generally, function-valued parameters. Functional data analysis (FDA), as well as other branches of statistics and machine learning, often deals with function-valued parameters. Functional data and functional parameters may contain unexpectedly large amounts of personally identifying information, so developing a privacy framework for these areas is critical in the era of big data. Our theoretical framework is based on densities over function spaces, which is of independent interest to FDA researchers, as densities have proven challenging to define and utilize for FDA models. Of particular interest to researchers working in statistical disclosure control, we demonstrate how even small amounts of over-smoothing or regularization can produce releases with substantially improved utility. We carry out extensive simulations to examine the utility of privacy-enhanced releases and consider an application to Multiple Sclerosis and Diffusion Tensor Imaging.
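To make the idea of a privacy-enhanced functional release concrete, the following is a minimal sketch that perturbs a pointwise mean curve with Gaussian-process noise. Everything here is an illustrative assumption, not the paper's construction: the names `private_mean_function` and `sq_exp_kernel` are hypothetical, and the squared-exponential covariance, the assumed sup-norm bound on each individual's curve, and the classical Gaussian-mechanism noise scale stand in for the density-based calibration developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp_kernel(grid, length_scale=0.1):
    """Squared-exponential covariance matrix on a 1-D grid (assumed kernel)."""
    d = grid[:, None] - grid[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def private_mean_function(curves, grid, epsilon, delta=1e-5, length_scale=0.1):
    """Release a privatized mean curve by adding Gaussian-process noise.

    Assumed calibration (not the paper's): the classical Gaussian-mechanism
    scale sigma = sensitivity * sqrt(2 log(1.25/delta)) / epsilon, with the
    sensitivity of the mean taken as 2B/n under an assumed sup-norm bound
    B = 1 on each individual's curve.
    """
    n = curves.shape[0]
    mean_curve = curves.mean(axis=0)
    sensitivity = 2.0 / n  # assumes each curve is bounded in [-1, 1]
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    cov = sigma**2 * sq_exp_kernel(grid, length_scale)
    cov += 1e-12 * np.eye(grid.size)  # jitter for numerical positive-definiteness
    noise = rng.multivariate_normal(np.zeros(grid.size), cov)
    return mean_curve + noise

# Toy usage: 200 smooth curves observed on a common grid.
grid = np.linspace(0.0, 1.0, 101)
curves = np.clip(
    np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((200, grid.size)),
    -1.0, 1.0,
)
release = private_mean_function(curves, grid, epsilon=1.0)
```

In this toy setting, the abstract's over-smoothing remark corresponds roughly to choices such as a larger `length_scale` (smoother noise) or pre-smoothing the estimate before release; the paper's actual utility analysis via densities over function spaces is not reproduced here.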