Asymptotic distribution-free change-point detection for data with repeated observations
In change-point detection, a nonparametric framework based on scan statistics that use graphs representing similarities among observations is gaining attention because of its flexibility and good performance on high-dimensional and non-Euclidean data sequences, which are ubiquitous in the big data era. However, this graph-based framework runs into problems when the sequence contains repeated observations, which often happens with discrete data such as network data. In this work, we extend the graph-based framework to solve this problem by averaging over, or taking the union of, all possible "optimal" graphs resulting from the repeated observations. We consider both the single change-point alternative and the changed-interval alternative, and we derive analytic formulas to control the type I error of the new methods, making them quickly applicable to large data sets. The extended methods are illustrated with an application to detecting changes in a sequence of dynamic networks over time.
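To make the difficulty with repeated observations concrete, here is a minimal toy sketch. It rests on assumptions that the abstract does not state: the "optimal" graph is taken to be a minimum spanning tree under absolute difference, and the scan statistic is reduced to a raw count of edges crossing a candidate split. All function names are hypothetical, and the brute-force enumeration is only feasible for a handful of points; it does not reproduce the analytic formulas mentioned above.

```python
from itertools import combinations


def crossing_edges(edges, t):
    """Count edges joining an observation with index < t to one with index >= t."""
    return sum(1 for i, j in edges if min(i, j) < t <= max(i, j))


def is_spanning_tree(n, edges):
    """Check that n - 1 given edges on n nodes contain no cycle (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # this edge would close a cycle
        parent[ri] = rj
    return True  # n - 1 acyclic edges on n nodes form a spanning tree


def all_minimum_spanning_trees(values):
    """Brute-force every minimum spanning tree (toy sizes only)."""
    n = len(values)
    pairs = list(combinations(range(n), 2))

    def total_weight(tree):
        return sum(abs(values[i] - values[j]) for i, j in tree)

    trees = [t for t in combinations(pairs, n - 1) if is_spanning_tree(n, t)]
    best = min(total_weight(t) for t in trees)
    return [t for t in trees if total_weight(t) == best]


# Toy sequence with repeated observations: two copies of 0, then two copies of 1.
values = [0, 0, 1, 1]
msts = all_minimum_spanning_trees(values)

# Ties make the tree non-unique, so the crossing count depends on the tree chosen.
counts = sorted({crossing_edges(tree, 1) for tree in msts})
print(len(msts), counts)  # 4 [1, 2]

# Taking the union of all optimal trees removes the arbitrary choice.
union_graph = set().union(*msts)
print(len(union_graph), crossing_edges(union_graph, 1))  # 6 3
```

In this toy case the four equally good trees give two different edge counts at the same split, which is the ambiguity the abstract refers to; the union graph yields a single well-defined count.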