Statistical Inference in High-dimensional Generalized Linear Models with Streaming Data

08/10/2021
by Lan Luo, et al.

We develop an online statistical inference approach for high-dimensional generalized linear models (GLMs) with streaming data, enabling real-time estimation and inference. We propose an online debiased lasso (ODL) method that accommodates the special structure of streaming data. ODL differs from the offline debiased lasso in two important respects. First, when computing the estimate at the current stage, it uses only summary statistics of the historical data. Second, in addition to debiasing an online lasso estimator, ODL corrects an approximation error term that arises from nonlinear online updating with streaming data. We show that the proposed online debiased estimators for GLMs are consistent and asymptotically normal, which provides a theoretical basis for real-time interim statistical inference with streaming data. Extensive numerical experiments evaluate the performance of ODL, demonstrate the effectiveness of the algorithm, and support the theoretical results. We illustrate the method by analyzing a streaming dataset from the National Automotive Sampling System-Crashworthiness Data System.
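To make the idea of "using only summary statistics of the historical data" concrete, here is a minimal sketch for the simplest special case, a linear model, where the running sums X^T X and X^T y carry all the information the lasso objective needs. This is not the authors' ODL algorithm: it has no debiasing step, and it sidesteps the approximation error that the abstract says arises from nonlinear online updating in GLMs. The class name `StreamingLasso`, the penalty level `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

class StreamingLasso:
    """Hypothetical linear-model lasso fitted from running summary statistics."""

    def __init__(self, p):
        self.gram = np.zeros((p, p))  # running sum of X^T X over all batches
        self.xty = np.zeros(p)        # running sum of X^T y over all batches
        self.n = 0                    # total number of observations seen
        self.beta = np.zeros(p)       # current estimate (warm start for next batch)

    def update(self, X, y, lam, n_sweeps=200):
        # Fold the new batch into the summaries; the raw batch can then be discarded.
        self.gram += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]
        # Coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1,
        # written purely in terms of gram, xty, and n.
        for _ in range(n_sweeps):
            for j in range(len(self.beta)):
                # Correlation of column j with the residual that excludes coordinate j.
                rho = (self.xty[j] - self.gram[j] @ self.beta
                       + self.gram[j, j] * self.beta[j])
                self.beta[j] = soft_threshold(rho / self.n, lam) / (self.gram[j, j] / self.n)
        return self.beta

# Simulated stream: 5 batches of 50 observations, 20 covariates, 2 true signals.
rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]
model = StreamingLasso(p)
batches = []  # stored here only so the result can be checked afterwards
for _ in range(5):
    X = rng.normal(size=(50, p))
    y = X @ beta_true + 0.5 * rng.normal(size=50)
    batches.append((X, y))
    beta_hat = model.update(X, y, lam=0.1)
```

In the linear case the summaries are exact, so the streamed estimate solves the same lasso problem as a fit on the pooled data. For a nonlinear GLM the analogous summaries are only approximate, which is the error term the abstract says ODL corrects.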


research
06/10/2021

Online Debiased Lasso

We propose an online debiased lasso (ODL) method for statistical inferen...
research
10/03/2022

Inference on High-dimensional Single-index Models with Streaming Data

Traditional statistical methods are faced with new challenges due to str...
research
01/29/2020

Adaptive Estimation and Statistical Inference for High-Dimensional Graph-Based Linear Models

We consider adaptive estimation and statistical inference for high-dimen...
research
06/30/2021

Real-Time Regression Analysis of Streaming Clustered Data With Possible Abnormal Data Batches

This paper develops an incremental learning algorithm based on quadratic...
research
10/11/2022

Renewable Learning for Multiplicative Regression with Streaming Datasets

When large amounts of data continuously arrive in streams, online updati...
research
03/14/2023

Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

The Internet of Things (IoT) system generates massive high-speed tempora...
research
08/29/2023

Streaming Compression of Scientific Data via weak-SINDy

In this paper a streaming weak-SINDy algorithm is developed specifically...
