Data blurring: sample splitting a single sample

12/21/2021
by James Leiner, et al.

Suppose we observe a random vector X drawn from some distribution P that belongs to a known family with unknown parameters. We ask the following question: when is it possible to split X into two parts, f(X) and g(X), such that neither part alone suffices to reconstruct X, both parts together recover X fully, and the joint distribution of (f(X), g(X)) is tractable? As one example, if X = (X_1, …, X_n) and P is a product distribution, then for any m < n we can split the sample by defining f(X) = (X_1, …, X_m) and g(X) = (X_{m+1}, …, X_n). Rasines and Young (2021) offer an alternative way to accomplish this task: randomizing X with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples. It borrows ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving, and p-value masking. We illustrate the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
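To make the two kinds of split concrete, here is a minimal Python sketch. The first function is the classical sample split from the abstract. The second is a standard additive-Gaussian-noise split in the spirit of the randomization approach attributed to Rasines and Young (2021). It is our illustrative assumption, not necessarily their exact construction and not the paper's data blurring procedure. It assumes X_i ~ N(mu_i, sigma^2) with sigma known and draws external noise Z_i ~ N(0, sigma^2). Then f = X + tau*Z and g = X - Z/tau are jointly Gaussian with covariance sigma^2 - sigma^2 = 0, hence independent, and X = (f + tau^2 g) / (1 + tau^2) recovers X exactly from both parts. The names sample_split, gaussian_split, reconstruct, sample_cov, and the parameter tau are hypothetical.

```python
import random


def sample_split(x, m):
    """Classical sample splitting: f(X) = (X_1..X_m), g(X) = (X_{m+1}..X_n)."""
    if not 0 < m < len(x):
        raise ValueError("need 0 < m < n")
    return x[:m], x[m:]


def gaussian_split(x, sigma, tau, rng):
    """Hypothetical additive-noise split (illustrative assumption).

    Assumes each X_i ~ N(mu_i, sigma^2) with sigma known. Drawing
    Z_i ~ N(0, sigma^2) independent of X, the pair f_i = X_i + tau*Z_i,
    g_i = X_i - Z_i/tau is jointly Gaussian with Cov = sigma^2 - sigma^2 = 0,
    so f and g are independent; tau trades information between the parts.
    """
    z = [rng.gauss(0.0, sigma) for _ in x]
    f = [xi + tau * zi for xi, zi in zip(x, z)]
    g = [xi - zi / tau for xi, zi in zip(x, z)]
    return f, g


def reconstruct(f, g, tau):
    """Both parts together recover X: f + tau^2 * g = (1 + tau^2) * X."""
    return [(fi + tau**2 * gi) / (1 + tau**2) for fi, gi in zip(f, g)]


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, tau = 2.0, 1.0, 0.5
    x = [rng.gauss(mu, sigma) for _ in range(20000)]

    first, rest = sample_split(x[:5], 2)
    print(len(first), len(rest))

    f, g = gaussian_split(x, sigma, tau, rng)
    x_hat = reconstruct(f, g, tau)
    print(max(abs(a - b) for a, b in zip(x, x_hat)))  # floating-point error only
    print(sample_cov(f, g))  # should be near 0 (f and g independent)
    print(sample_cov(f, x))  # should be near sigma^2 = 1 (f alone carries signal)
```

Unlike the sample split, the noise-based split keeps every coordinate in both parts, which is why it is natural to view it as a "continuous" alternative: tau controls how much of each observation's information goes to f versus g rather than assigning whole observations to one side.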
