Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes

01/18/2017
by Abhishek Chakrabortty, et al.

We consider the recovery of regression coefficients, denoted by β_0, for a single index model (SIM) relating a binary outcome Y to a set of possibly high-dimensional covariates X, based on a large but 'unlabeled' dataset U in which Y is never observed. On U, we fully observe X and, additionally, a surrogate S which, while not strongly predictive of Y throughout its entire support, can forecast it with high accuracy when it takes extreme values. Such datasets arise naturally in modern studies involving large databases such as electronic medical records (EMR), where Y, unlike (X, S), is difficult and/or expensive to obtain. In EMR studies, examples of Y and S would be the true disease phenotype and the count of the associated diagnostic codes, respectively. Assuming another SIM for S given X, we show that under sparsity assumptions we can recover β_0 up to a proportionality constant simply by fitting a least squares LASSO estimator to the subset of the observed data on (X, S) restricted to the extreme sets of S, with Y imputed using the surrogacy of S. We obtain sharp finite-sample performance bounds for our estimator, including deterministic deviation bounds and probabilistic guarantees. We demonstrate the effectiveness of our approach through multiple simulation studies, as well as by application to real data from an EMR study conducted at Partners HealthCare Systems.
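The estimator described in the abstract (restrict to the extreme sets of S, impute Y from S, fit a least squares LASSO) can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: the data-generating process in the hypothetical `simulate` helper (a probit SIM for Y and a surrogate S = X'β_0 + 2Y + noise), the tail fraction `q`, the penalty `lam`, and all helper names are illustrative choices, and the LASSO is solved with plain cyclic coordinate descent. Because β_0 is recovered only up to scale, the sketch compares directions via cosine similarity.

```python
import math
import random


def soft_threshold(z, t):
    # Proximal operator of t*|.|: shrinks z toward zero by t
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Least squares LASSO via cyclic coordinate descent.

    Minimizes (1/(2n))*||y - a - X b||^2 + lam*||b||_1 with an unpenalized
    intercept a, handled by centering the columns of X and the response y.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    r = [yi - ybar for yi in y]  # residual y - Xb at b = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (excluding b_j)
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= delta * Xc[i][j]  # keep r equal to y - Xb
                b[j] = new_bj
    return b


def simulate(n, beta, rng):
    """Hypothetical data-generating process (illustrative assumption only)."""
    X, S = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        t = sum(xj * bj for xj, bj in zip(x, beta))  # single index x'beta
        y = 1 if t + rng.gauss(0.0, 1.0) > 0 else 0  # probit SIM for Y
        X.append(x)
        # y is used only to build S and is not returned: Y is never observed
        S.append(t + 2.0 * y + rng.gauss(0.0, 1.0))
    return X, S


def extreme_set_lasso(X, S, q=0.1, lam=0.005):
    """Fit the LASSO on the lower/upper q-tails of S with imputed labels."""
    s_sorted = sorted(S)
    n = len(S)
    lo = s_sorted[int(q * n)]
    hi = s_sorted[int((1 - q) * n) - 1]
    Xe, ye = [], []
    for x, s in zip(X, S):
        if s <= lo:
            Xe.append(x)
            ye.append(0.0)  # impute Y = 0 in the lower tail of S
        elif s >= hi:
            Xe.append(x)
            ye.append(1.0)  # impute Y = 1 in the upper tail of S
    return lasso_cd(Xe, ye, lam)


def cosine(a, b):
    # Scale-free comparison, since beta_0 is recovered only up to a constant
    dot = sum(u * v for u, v in zip(a, b))
    return dot / (math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b)))


if __name__ == "__main__":
    rng = random.Random(0)
    beta0 = [1.0, -1.0, 0.5] + [0.0] * 7  # sparse true direction
    X, S = simulate(4000, beta0, rng)
    b_hat = extreme_set_lasso(X, S)
    print([round(v, 3) for v in b_hat])
    print("cosine with beta0:", round(cosine(b_hat, beta0), 3))
```

Shrinking `q` makes the imputed labels more reliable but leaves fewer observations for the LASSO fit; the sketch does not attempt to tune this trade-off or `lam`.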

research
03/11/2022

Single-index models for extreme value index regression

Since the extreme value index (EVI) controls the tail behaviour of the d...
research
09/10/2022

Slice Weighted Average Regression

It has previously been shown that ordinary least squares can be used to ...
research
05/17/2021

Convergence guarantee for the sparse monotone single index model

We consider a high-dimensional monotone single index model (hdSIM), whic...
research
07/08/2020

Sparse Regression for Extreme Values

We study the problem of selecting features associated with extreme value...
research
04/20/2019

Learning Sparse Dynamical Systems from a Single Sample Trajectory

This paper addresses the problem of identifying sparse linear time-invar...
research
12/28/2018

Predicting with Proxies

Predictive analytics is increasingly used to guide decision-making in ma...
research
10/12/2018

Spherical Regression under Mismatch Corruption with Application to Automated Knowledge Translation

Motivated by a series of applications in data integration, language tran...