A Two-stage Online Monitoring Procedure for High-Dimensional Data Streams

12/14/2017
by Jun Li, et al.

Advanced computing and data acquisition technologies have made it possible to collect high-dimensional data streams in many fields. Efficient online monitoring tools that can correctly identify abnormal data streams in such data are highly sought after. However, most existing monitoring procedures directly apply a false discovery rate (FDR) controlling procedure to the data at each time point, and the FDR at each time point (the point-wise FDR) is either specified by users or determined by the in-control (IC) average run length (ARL). If the point-wise FDR is specified by users, the resulting procedure lacks control of the global FDR and leaves users unaware of the resulting IC-ARL. If the point-wise FDR is determined by the IC-ARL, the resulting procedure does not give users the flexibility to choose the number of false alarms (Type-I errors) they can tolerate when identifying abnormal data streams, which often makes the procedure too conservative. To address these limitations, we propose a two-stage monitoring procedure that can control both the IC-ARL and Type-I errors at levels specified by users. As a result, the proposed procedure allows users to choose not only how often they expect a false alarm when all data streams are IC, but also how many false alarms they can tolerate when identifying abnormal data streams. With this extra flexibility, the proposed two-stage monitoring procedure is shown, in a simulation study and a real data analysis, to outperform existing methods.
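To make the criticized baseline concrete, the following is a minimal sketch of the "point-wise FDR" approach the abstract describes: at each time point, the Benjamini-Hochberg (BH) step-up rule is applied to the p-values of all streams at a user-chosen level `q`. It is not the authors' two-stage procedure, whose details are not given here. It assumes, as a hypothetical setup, that a per-stream p-value is already available at every time point; the function names `benjamini_hochberg` and `pointwise_fdr_monitor` are illustrative only.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Return the indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(p_values)
    if m == 0:
        return []
    # Rank stream indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject (flag) the k_max streams with the smallest p-values.
    return sorted(order[:k_max])


def pointwise_fdr_monitor(
    p_value_stream: Iterable[Sequence[float]], q: float
) -> Optional[Tuple[int, List[int]]]:
    """Hypothetical baseline: apply BH separately at each time point.

    Returns (t, flagged stream indices) at the first time point where any
    stream is flagged, or None if no alarm is raised. Note that q fixes only
    the point-wise FDR; the resulting IC-ARL is not directly controlled.
    """
    for t, p_values in enumerate(p_value_stream, start=1):
        flagged = benjamini_hochberg(p_values, q)
        if flagged:
            return t, flagged
    return None


if __name__ == "__main__":
    # Two time points, three streams; stream 0 shifts at t = 2.
    print(pointwise_fdr_monitor([[0.5, 0.6, 0.7], [0.001, 0.4, 0.9]], q=0.05))
```

Because `q` is chosen per time point, nothing in this sketch tells the user how long the chart runs before a false alarm when all streams are IC. That gap is the one the abstract says the two-stage procedure is designed to close.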
