Selective Inference with Distributed Data

01/15/2023
by   Sifan Liu, et al.

Large datasets are now routinely spread over many machines that compute in parallel and communicate with a central machine through short messages. We consider a sparse regression setting and develop a new procedure for selective inference with distributed data. While many distributed procedures exist for point estimation in the sparse setting, few options exist for estimating uncertainties or conducting hypothesis tests in models based on the estimated sparsity. We solve a generalized linear regression on each machine, which then communicates a selected set of predictors to the central machine. The central machine forms a generalized linear model with the selected predictors. How do we conduct selective inference for the selected regression coefficients? Is it possible to reuse distributed data, in an aggregated form, for selective inference? Our proposed procedure bases approximately valid selective inference on an asymptotic likelihood. The proposal seeks only aggregated information, in relatively few dimensions, from each machine, which is merged at the central machine to construct selective inference. Our procedure is also broadly applicable as a solution to the p-value lottery problem that arises with model selection on random splits of data.
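
The selection and aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a Gaussian linear model (one special case of a generalized linear model), a lasso solved by proximal gradient descent on each machine, and a union rule for combining the selected sets, and all function names are hypothetical. It also stops at a naive pooled refit, which ignores the selection step and therefore does not by itself give valid selective inference; correcting for that is the contribution of the paper.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # inverse Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

def local_selection(X, y, lam):
    """Run on each machine: return only the indices of the selected predictors."""
    return set(np.flatnonzero(lasso_ista(X, y, lam)).tolist())

def local_summary(X, y, selected):
    """Run on each machine: low-dimensional aggregates for the selected model."""
    XE = X[:, selected]
    return XE.T @ XE, XE.T @ y

# Simulated data split across K machines (illustrative assumption).
rng = np.random.default_rng(0)
K, n, p = 4, 200, 30
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
machines = []
for _ in range(K):
    X = rng.standard_normal((n, p))
    machines.append((X, X @ beta + rng.standard_normal(n)))

# Step 1: each machine sends a short message, its selected set.
# The central machine combines them; taking the union is an assumption here.
selected = sorted(set().union(*(local_selection(X, y, lam=0.25) for X, y in machines)))

# Step 2: each machine sends |E| x |E| and |E|-dimensional aggregates, not raw data.
summaries = [local_summary(X, y, selected) for X, y in machines]
gram = sum(G for G, _ in summaries)
xty = sum(v for _, v in summaries)

# Step 3: naive pooled refit at the central machine (no selection adjustment).
beta_hat = np.linalg.solve(gram, xty)
```

The design point this illustrates is the communication pattern: the messages scale with the number of selected predictors rather than with the sample size or the full number of predictors.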

research
07/15/2020

Selective Inference for Additive and Linear Mixed Models

This work addresses the problem of conducting valid inference for additi...
research
03/29/2022

A new procedure for Selective Inference with the Generalized Linear Lasso

This article investigates the distribution of the solutions of the gene...
research
06/26/2019

Selective Inference via Marginal Screening for High Dimensional Classification

Post-selection inference is a statistical technique for determining sali...
research
05/02/2021

Selective Inference in Propensity Score Analysis

Selective inference (post-selection inference) is a methodology that has...
research
01/02/2023

Selective Conformal Inference with FCR Control

Conformal inference is a popular tool for constructing prediction interv...
research
09/30/2018

Distributed linear regression by averaging

Modern massive datasets pose an enormous computational burden to practit...
research
02/13/2019

Selective Inference for Testing Trees and Edges in Phylogenetics

Selective inference is considered for testing trees and edges in phyloge...
