A parametric distribution for exact post-selection inference with data carving

05/21/2023
by Erik Drysdale, et al.

Post-selection inference (PoSI) is a statistical technique for obtaining valid confidence intervals and p-values when the same data are used both to generate hypotheses and to test them. PoSI can be applied to a range of popular algorithms, including the Lasso. Data carving is a variant of PoSI in which a portion of held-out data is combined with the hypothesis-generating data at inference time. Although data carving has attractive theoretical and empirical properties, existing approaches rely on computationally expensive MCMC methods to carry out inference. This paper's key contribution is to show that pivotal quantities for the data carving procedure can be constructed from a known parametric distribution. Specifically, when the selection event is characterized by a set of polyhedral constraints on a Gaussian response, the data carving statistic follows the distribution of the sum of a normal and a truncated normal (SNTN), a variant of the truncated bivariate normal distribution. The main practical consequence is that exact inference for data carving becomes computationally trivial, since the CDF of the SNTN distribution can be evaluated using the CDF of a standard bivariate normal. A Python package, sntn, has been released to facilitate the adoption of data carving with PoSI.
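The reduction from the SNTN CDF to a bivariate normal CDF can be illustrated with a short sketch. Assume the statistic is Z = X + W, where X ~ N(mu1, tau1^2) (for example, from the held-out data) is independent of W ~ N(mu2, tau2^2) truncated to [a, b] (from the selection data). Writing sigma^2 = tau1^2 + tau2^2, rho = tau2/sigma, m = (z - mu1 - mu2)/sigma, alpha = (a - mu2)/tau2 and beta = (b - mu2)/tau2, the CDF is [Phi2(m, beta; rho) - Phi2(m, alpha; rho)] / [Phi(beta) - Phi(alpha)], where Phi2 is the standard bivariate normal CDF with correlation rho. The function names and parameterization below (sntn_cdf, bvn_cdf) are hypothetical and are not the API of the sntn package. To stay dependency-free, the sketch evaluates the bivariate normal CDF with simple Simpson quadrature; a production implementation would more likely call a dedicated bivariate normal routine.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function (also handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(x: float, y: float, rho: float, n: int = 2000) -> float:
    """P(S <= x, U <= y) for a standard bivariate normal (S, U) with corr rho, |rho| < 1.

    Uses Phi2(x, y; rho) = int_{-inf}^{y} phi(u) * Phi((x - rho*u) / sqrt(1 - rho^2)) du,
    evaluated with composite Simpson's rule on [-10, min(y, 10)].
    """
    if x == -math.inf or y == -math.inf:
        return 0.0
    if y == math.inf:
        return norm_cdf(x)
    if x == math.inf:
        return norm_cdf(y)
    lo, hi = -10.0, min(y, 10.0)  # mass outside [-10, 10] is negligible
    if hi <= lo:
        return 0.0
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    s = math.sqrt(1.0 - rho * rho)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * norm_pdf(u) * norm_cdf((x - rho * u) / s)
    return total * h / 3.0


def sntn_cdf(z, mu1, tau1, mu2, tau2, a, b):
    """Hypothetical helper: CDF of Z = X + W, with X ~ N(mu1, tau1^2) independent of
    W ~ N(mu2, tau2^2) truncated to [a, b]. Requires tau1 > 0 and tau2 > 0."""
    sigma = math.sqrt(tau1 ** 2 + tau2 ** 2)
    # Correlation between the standardized untruncated sum and the standardized W.
    rho = tau2 / sigma
    m = (z - mu1 - mu2) / sigma   # standardized evaluation point
    alpha = (a - mu2) / tau2      # standardized truncation bounds
    beta = (b - mu2) / tau2
    denom = norm_cdf(beta) - norm_cdf(alpha)
    return (bvn_cdf(m, beta, rho) - bvn_cdf(m, alpha, rho)) / denom
```

With no truncation (a = -inf, b = inf) the sketch reduces to the N(mu1 + mu2, sigma^2) CDF, which is a convenient sanity check. Under hypothesized parameter values, plugging the observed statistic into a CDF of this kind gives a quantity that is uniform on (0, 1) by the probability integral transform; this is the general mechanism by which such a pivot could yield p-values or, by inverting over the mean, confidence intervals.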
