Hierarchical Quickest Change Detection via Surrogates

03/31/2016 ∙ by Prithwish Chakraborty, et al. ∙ The University of Arizona ∙ Virginia Polytechnic Institute and State University

Change detection (CD) in time series data is a critical problem as it reveals changes in the underlying generative processes driving the time series. Despite having received significant attention, one important unexplored aspect is how to efficiently utilize additional correlated information to improve the detection and the understanding of changepoints. We propose hierarchical quickest change detection (HQCD), a framework that formalizes the process of incorporating additional correlated sources for early changepoint detection. The core ideas behind HQCD are rooted in the theory of quickest detection, and HQCD can be regarded as its novel generalization to a hierarchical setting. The sources are classified into targets and surrogates, and HQCD leverages this structure to systematically assimilate observed data and update changepoint statistics across layers. Decisions on actual changepoints are made by minimizing the delay while still maintaining reliability bounds. In addition, HQCD uncovers interesting relations between changes at targets and changes across surrogates. We validate HQCD for reliability and performance against several state-of-the-art methods on both a synthetic dataset (known changepoints) and several real-life examples (unknown changepoints). Our experiments indicate that HQCD gains significant robustness without loss of detection delay. Our real-life experiments also showcase the usefulness of the hierarchical setting by connecting surrogate sources (such as Twitter chatter) to target sources (such as Employment related protests that ultimately lead to major uprisings).




1 Introduction

With the increasing availability of digital data sources, there is a concomitant interest in using such sources to understand and detect events of interest, reliably and rapidly. For instance, protest uprisings in unstable countries can be better analyzed by considering a variety of sources such as economic indicators (e.g. inflation, food prices) and social media indicators (e.g. Twitter and news activity). Concurrently, detecting the onset of such events with minimal delay is of critical importance. For instance, detecting a disease outbreak (Painter et al, 2013) in real time can help in triggering preventive measures to control the outbreak. Similarly, early alerts about possible protest uprisings can help in designing traffic diversions and enhanced security to ensure peaceful protests.


Figure 1: Illustrative example showing surrogate sources which could have led to an early detection of the onset of the 2013 Brazilian Spring protests. (Top) Total protest counts in Brazil over Apr'13 to Aug'13, exhibiting a sharp increase around mid-June. (Middle) Employment and Wages related protests and (bottom) aggregated counts of a cluster of protest related keywords in Twitter.

Table 1 compares the methods along two desirable properties: a bounded false alarm rate / detection delay, and support for non-i.i.d. data. The methods compared are GLRT (1963), Sequential GLRT (1995), Window-Limited GLRT (1995), Bayesian Online CPD (2007), Relative Density-ratio Estimation (RuLSIF, 2013), Hierarchical Bayesian Analysis of Change Point Problems (2010), and HQCD (this paper).
Table 1: Comparison of state-of-the-art methods vs Hierarchical Quickest Change Detection

Consider the evolution of the Brazilian Spring protests during mid June 2013, shown in terms of the total number of protests per week in Fig. 1 (top panel). This uprising can be further analyzed by looking at individual categories of protests as shown in Fig. 1 (middle panel). As seen, during the Brazilian Spring there was a sharp increase in employment and wages related protests. It is noteworthy that similar observations can be made from Fig. 1 by observing the increase in Twitter activity for protest related keywords (bottom panel) such as “Aborto, Agravar, Central Dos Trabalhadores e Trabalhadoras Do Brasil” during early June. This example leads to the following important observations: there is potentially significant correlated information that can be leveraged to reduce detection delay, and identifying the informative data source(s) can help reduce false positives. Thus, appropriate usage of such surrogate information can potentially lead to change detection with improved performance, as well as offer an interpretation of the cause of the changepoint.

Motivated by the aforementioned observations, we propose Hierarchical Quickest Change Detection (HQCD) for online change detection across multiple sources, viz. targets and surrogates. Typically, targets are sources of imminent interest (such as disease outbreaks or civil unrest), whereas surrogates (such as counts of the word ‘protesta’ in Twitter) by themselves are not of significant interest. Thus, HQCD is aimed at continuously utilizing both categories, while focusing on early (or quickest) detection of significant changes across the target sources. Traditional event (or change) detection approaches are not suitable for such problems. They are either a) offline approaches (Page, 1954; Wald, 1945; Shewhart, 1925; Carlin et al, 1992) that use the entire data retrospectively - and are thus not applicable to real-time scenarios - or b) online detection approaches (Shiryaev, 1963; Siegmund and Venkatraman, 1995; Lai, 1995; Lai and Xing, 2010; Adams and MacKay, 2007; Liu et al, 2013) that focus primarily on the target source of interest and do not utilize other correlated sources. Table 1 shows a comparison of HQCD and several state-of-the-art methods in terms of the desirable attributes.

The main contributions of this paper are:
HQCD formalizes a hierarchical structure which, in addition to the observed set of target sources, incorporates additional surrogates and encodes the propagation of change from surrogate to target sources.
HQCD presents a specialized change detection metric that guarantees a maximum false alarm rate while reducing the detection delay in the quickest detection framework. In addition, HQCD yields a natural methodology for analyzing the causality of change in a particular target source through a sequence of change propagations in other sources.
HQCD presents a specialized sequential Monte Carlo based change detection framework that, along with the specialized change detection metrics, enables hierarchical data to be analyzed in an online fashion.
We extensively test HQCD on both synthetic and real world data. We compare against state-of-the-art methods and illustrate the robustness of our methods and the usefulness of surrogates. We also present a detailed analysis of three protest uprisings using real world data and show that the uprisings could have been predicted a few weeks in advance by incorporating surrogate data such as Twitter chatter. Moreover, we analyze target-surrogate relationships and uncover important propagation patterns that led to such uprisings.

2 HQCD - Hierarchical Quickest Change Detection

We first provide a brief overview of the classical QCD problem and then present the HQCD framework.

2.1 Quickest Change Detection (QCD)

Let us consider a data source changing over time and following different stochastic processes before and after an unknown time (changepoint). The task of QCD is to produce an estimate of this time in an online setting (i.e., at each time, only the data observed so far is available). Figure 2 illustrates the two fundamental performance metrics related to this problem. An estimate earlier than the actual changepoint leads to a false alarm, whereas a later estimate leads to an ‘additive delay’. The goal of QCD is to design an online detection strategy which minimizes the expected additive detection delay (EADD) while not exceeding a maximum pre-specified probability of false alarm (PFA). QCD has been studied in various contexts. Some of the foremost methods have considered i.i.d. distributions with known (or unknown) parameters before and after unknown changepoints (Veeravalli and Banerjee, 2013). Some of the more popular methods have used CUSUM (cumulative sum of likelihood ratio) based tests, while more general approaches are adopted in GLRT (generalized likelihood ratio test) based methods (Dessein and Cont, 2013).
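As a concrete (non-hierarchical) illustration of this family of tests, a minimal Gaussian CUSUM sketch follows; the means, variance, and threshold are illustrative choices, not values from the paper:

```python
import numpy as np

def cusum_detect(x, mu0, mu1, sigma, threshold):
    """Online CUSUM test for a shift in the mean of Gaussian data.

    Accumulates the log-likelihood ratio of the post- vs. pre-change
    distributions, resetting at zero, and declares a change the first
    time the statistic exceeds `threshold`. Returns the declared time
    index, or None if no change is declared.
    """
    s = 0.0
    for t, xt in enumerate(x):
        # log-likelihood ratio of N(mu1, sigma^2) vs. N(mu0, sigma^2)
        llr = (xt - mu0) ** 2 / (2 * sigma**2) - (xt - mu1) ** 2 / (2 * sigma**2)
        s = max(0.0, s + llr)
        if s > threshold:
            return t
    return None

rng = np.random.default_rng(0)
# Mean shifts from 0 to 2 at t = 100
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
t_hat = cusum_detect(x, mu0=0.0, mu1=2.0, sigma=1.0, threshold=10.0)
```

Raising the threshold lowers the false alarm rate at the cost of a longer detection delay, which is exactly the tradeoff QCD formalizes.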


Figure 2: Illustration of Quickest Change Detection (QCD): blue colored line represents the actual changepoint at time . (a) declaring a change at leads to a false alarm, whereas (b) declaring the change at leads to detection delay. QCD can strike a tradeoff between false alarm and detection delay.

2.2 Changepoint detection in Hierarchical Data

We next present our approach to generalize QCD to a hierarchical setting. We first describe a generic hierarchical model and then propose the QCD statistics for such models in Section 2.2.2. For computational feasibility, we present a bounded approximation of it and our multilevel changepoint algorithm in Section 2.2.3.


Figure 3: Generative process for HQCD. As an example consider civil unrest protests. In the framework, different protest types (such as Education- and Housing-related protests) form the targets denoted by ’s. The total number of protests will be denoted by the top-most variable . Finally, the set of surrogates, such as counts of Twitter keywords, stock price data, weather data, network usage data etc. are denoted by ’s.

2.2.1 Generic Hierarchical Model

Let us consider a set of correlated temporal sequences, where each represents a target data sequence collected up until some time. The cumulative sum of the target sources at each time is also tracked. Concurrently with the target sources, we also observe a set of surrogate sources, which may have either a causal or an effectual relationship with the target source set (see Figure 3). We assume that targets and surrogates follow a stochastic Markov process as follows:

The binary variables capture the notion of significant changes in events through changes in the distribution of the generative process as follows: if a surrogate source undergoes a change in distribution at some time, its generative distribution switches from the pre-change distribution to the post-change distribution of that surrogate source. Similarly, if a target source undergoes a change in distribution at some time, its conditional distribution switches from the pre-change to the post-change conditional distribution of that target data source. We denote by random variables the times at which these switches occur, and write the collective sets of changepoints in the surrogate and target sources, respectively. Finally, we denote the changepoint random variable for the top layer, which represents the sum total of all target sources.

2.2.2 From QCD to HQCD

We extend the concepts of QCD presented in Section 2.1 to the multilevel setting by formalizing the problem as the earliest detection of the set of all changepoints, having observed the target and surrogate sources. Let there be a vector of decision variables for the changepoints. To measure detection performance, we define the following two novel performance criteria:

Multi-Level Probability-of-False-Alarm (ML-PFA):


where for any two length vectors , the notation implies , for . For instance, consider the example of target, and surrogate. Then and , and the probability of multi-level false alarm is given by . This definition of ML-PFA declares a false alarm only if all the change decision variables are smaller than the true changepoints.
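The componentwise false-alarm condition above can be illustrated with a small sketch; the function name and example values are ours, not the paper's notation:

```python
import numpy as np

def ml_false_alarm(tau_hat, gamma):
    """Multi-level false-alarm indicator: per the definition of ML-PFA,
    a false alarm is declared only when every change decision variable
    precedes its true changepoint, i.e. tau_hat < gamma componentwise."""
    tau_hat = np.asarray(tau_hat)
    gamma = np.asarray(gamma)
    return bool(np.all(tau_hat < gamma))

# One target (true change at 50) and one surrogate (true change at 40)
premature = ml_false_alarm([45, 30], [50, 40])   # both declared early
mixed = ml_false_alarm([55, 30], [50, 40])       # target declared late
```

Averaging this indicator over the randomness of the data and changepoints would give a Monte Carlo estimate of ML-PFA.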

Expected Additive Detection Delay (EADD):


Given the observations, i.e., all target and surrogate sources till time governed by unknown changepoints , we aim to make an optimal decision about these changepoints under the following criterion


In other words, the optimal change decision vector minimizes the EADD while guaranteeing that the ML-PFA is no more than a tolerable threshold. We note that the above optimal test is challenging to implement for real-world data sets due to the following issues: a) it requires the knowledge of pre- and post-change distributions (for all sources) and the distribution of the changepoint random vector, b) unlike single source QCD, finding the optimal decision requires a multi-dimensional search over multiple sources, making it computationally expensive, and c) it does not discriminate between false alarms across different sources. For instance, declaring a false alarm at a target source (such as premature declaration of the onset of protests or disease outbreaks) must be penalized more than declaring a false alarm at a surrogate source (such as incorrectly declaring a rise in Twitter activity).

2.2.3 Bounded approximation of HQCD

We can circumvent problem (b) of the original definition of ML-PFA as given in equation 1 by upper bounding it, as in Theorem 2.1.

Theorem 2.1 (Modified-PFA).

Let there be a set of estimates of the true changepoints for targets, surrogates, and sum-of-targets, respectively. Then, under the condition that accurate target-layer detections are of greater importance, ML-PFA (see 1) is upper-bounded by Modified-PFA, where:


We can prove the upper bound of ML-PFA with the following reductions:


where the inequalities follow from the union bound on probability and from the fact that the joint probability of a set of events is no greater than the probability of any one event, taking the minimum over all of them. The resulting upper bound in (5) forms the basis of the modification of the multi-level PFA:

The Modified-PFA expression leads to intuitive interpretations as follows: (i) as false alarms at targets can have a higher impact, it is desirable to keep the worst-case PFA across them as small as possible, or equivalently, the maximum over targets should be minimized; (ii) false alarms at surrogates are not as impactful, and a false alarm is declared only if all of the surrogate level detections are unreliable, or equivalently, the minimum over surrogates needs to be minimized; (iii) notably, the above modification leads to a low-complexity change detection approach across multiple sources using locally optimal detection strategies, avoiding a multi-dimensional search.
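A sketch of one plausible reading of this combination follows: the worst-case (max) false-alarm probability over targets combined with the best-case (min) over surrogates and the sum-of-targets term. The additive form and the example values are our assumptions, not the paper's equation:

```python
def modified_pfa(pfa_targets, pfa_surrogates, pfa_total):
    """Sketch of a Modified-PFA-style combination: worst case over
    targets (interpretation (i)), best case over surrogates
    (interpretation (ii)), plus a sum-of-targets term. The exact
    weighting in the paper's equation (4) is assumed here."""
    return max(pfa_targets) + min(pfa_surrogates) + pfa_total

# Hypothetical per-source false-alarm probabilities
bound = modified_pfa([0.01, 0.03], [0.2, 0.05, 0.4], 0.02)
```

Each term can be bounded by a per-source detection rule, which is what makes the locally optimal strategies in interpretation (iii) possible.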

Based on Modified-PFA, we next present a compact test suite to declare changes at pre-specified levels of maximum PFA, as given in Theorem 2.2, incorporating the specificity issue pointed out in problem (c) of the original formulation of PFA.

Theorem 2.2 (Multi-level Change Detection).

Let be the true change point random variable for the th target source, . Let and represent the same for the th surrogate and the sum-of-targets, respectively. Let the data observed till time be and denote the estimate of the conditional distribution (see Section 3.2). Then, if represent the PFA thresholds for the , the changepoint tests can be given as:



where the quantity above is the test statistic (TS) for a source



In quickest change detection, our goal at each time is to decide whether a change should be declared for a particular data source. To this end, we can use the following change detection test:

which is equivalent to the following test:


Intuitively, the above test declares the change for the th target source at the smallest time for which the test statistic (i.e., the posterior probability of the changepoint random variable being less than that time) exceeds a threshold. The probability of false alarm for the above test can be bounded in terms of the threshold as:


where the inequality follows from the fact that, given the observed data and the event that the change is declared at that time, the bound follows from (equation 7).

Let us denote the test statistic (TS) for a data source as:

Then the multi-level change detection test is:
From Theorem 2.2, we can infer the following boundedness property of Modified-PFA as expressed in the following Lemma.

Lemma 2.3.

If we define and , then Modified-PFA in equation 4 can be bounded as:



3 HQCD for Protest Detection via Surrogates




Figure 4: Histogram fit of (a) surrogate source (Twitter keyword counts) and (b) target source (Number of protests of different categories), for various temporal windows, under i.i.d. assumptions. These assumptions lead to satisfactory distribution fit, at a batch level, for both sources. The top-most row corresponds to the period before the Brazilian spring (pre 2013-05-25), the second row is for the period 2013-05-25 to 2013-10-20, and the third is for the period after 2013-10-20. The last row shows the fit for the entire period. These temporal fits are indicative of significant changes in distribution along the Brazilian Spring timeline, for both target and surrogates.

In this section we discuss the HQCD framework for early detection of protest uprisings via surrogate sources. Protests can happen in civil society for various reasons, such as protests against fare hikes or protests demanding more job opportunities. Such protests, especially major changes in protest base levels, are potentially interlinked; however, explaining such interactions is a non-trivial process. Ramakrishnan et al (2014) found several social sources, especially Twitter chatter, to capture protest related information. We apply HQCD to find significant changes in protests concurrent with changes in Twitter chatter, where accurately detecting changes in protests is of primary importance, in contrast to the chatter, which can be influenced by a range of factors, including protests. In general, HQCD can be applied to similar events, such as disease outbreaks, to find significant changes in targets using information from noisy surrogates.

3.1 Hierarchical Model for Protest Count Data

For the case of protest uprisings, we first note that surrogate sources such as Twitter are in general noisy and involve a complex interplay of several factors - one of which could be protest uprisings. Furthermore, for protest uprisings, we are more concerned with using the surrogates (Twitter chatter) to help declare changes at the target level (protest counts) than with accurately identifying the changes in the surrogates. Thus, without loss of generality, we model the surrogates as i.i.d. distributed variables. Figure 4 evaluates the i.i.d. assumptions for both protest counts and Twitter chatter. Our results indicate that the Log-normal is a reasonable fit for Twitter chatter.

Surrogate Sources: Formally, we assume that the surrogate source is generated i.i.d. from a distribution w.r.t. the associated changepoint as:


where, and are the pre- and post-change parameters. Following our earlier discussion, we select as Log-normal (with location and scale parameters ) for Twitter counts.
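As a toy illustration of this pre/post-change Log-normal surrogate model, one might simulate a single source as follows; the parameter values and changepoint are illustrative, not fitted values from the paper:

```python
import numpy as np

def simulate_surrogate(T, gamma, pre=(0.0, 0.5), post=(1.0, 0.5), seed=0):
    """Simulate one i.i.d. Log-normal surrogate source whose (location,
    scale) parameters switch from `pre` to `post` at changepoint `gamma`.
    Parameter values here are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    x[:gamma] = rng.lognormal(*pre, size=gamma)      # pre-change samples
    x[gamma:] = rng.lognormal(*post, size=T - gamma)  # post-change samples
    return x

x = simulate_surrogate(200, gamma=120)
```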

Target Sources: Target sources can in general be dependent on both the past values of targets as well as the surrogates. Here, we restrict the target source process to be a first order Markov process. Under this assumption, we formalize the target source to follow a Markov process w.r.t. its changepoint as:


where the pre- and post-change parameters of the process appear. A Poisson process with dynamic rate parameters has been shown (Carlin et al, 1992) to be effective for modeling hierarchical count data w.r.t. changepoints. Here, we model the rate parameters as a nested autoregressive process (Fokianos et al, 2009; Carlin et al, 1992) given as:


Here, the first term captures the latent rate, the second denotes the error variance, and the last captures the variation due to the observed values of target and surrogate sources.

Changepoint Priors: Following our prior discussion, surrogate changepoints can be assumed to have an uninformative prior and we model via a memoryless arrival distribution (static probability of observing change given it hasn’t occurred earlier) as:


Conversely, target changepoints can be influenced by surrogate changepoints as their generative process is dependent on the surrogates. Specifically, whenever we observe a changepoint in the surrogates, we assume that the base rate of a changepoint for a target increases for a certain period of time. Formally, target changepoint priors are assumed to follow a dynamic process as:


where the indicator function and the nominal base rate for the changepoint appear. As can be seen, a change in the th surrogate source is modeled as an exponentially decaying ‘impulse’. The summation of targets is known deterministically given the targets. Moreover, the sum can be considered a summation of independent Poisson processes following dynamics similar to equation 13, which is omitted due to limited space. Similarly, the dependence of its changepoint can be modeled similarly to equation 15.
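The exponentially decaying ‘impulse’ prior on target changepoints can be sketched as a hazard function; the amplitude and decay values below are illustrative, not fitted:

```python
import numpy as np

def target_change_hazard(t, rho0, surrogate_changes, amp=0.2, decay=0.5):
    """Sketch of the target changepoint prior described in the text:
    a nominal base rate rho0 plus an exponentially decaying 'impulse'
    of amplitude `amp` triggered by each past surrogate changepoint.
    `amp` and `decay` are illustrative values, not the paper's."""
    h = rho0
    for s in surrogate_changes:
        if t >= s:  # impulse only after the surrogate change occurs
            h += amp * np.exp(-decay * (t - s))
    return min(h, 1.0)  # keep the hazard a valid probability

h_before = target_change_hazard(4, rho0=0.01, surrogate_changes=[5])
h_after = target_change_hazard(5, rho0=0.01, surrogate_changes=[5])
```

Before the surrogate change the hazard stays at the base rate; at the change it jumps by the impulse amplitude and then decays back.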

3.2 Changepoint Posterior Estimation

Algorithm LABEL:al:main involves posterior estimation of the changepoints given the data at a particular time point. Earlier works have focused mainly on offline methods such as Gibbs Sampling (Carlin et al, 1992). Online posterior estimation for such problems has been studied extensively in the context of Sequential Bayesian Inference (Casella and Berger, 2002), such as Kalman filters (Kalman, 1960; Simon, 2010; Anderson, 2001) (Gaussian transitions) and Particle Filters (Del Moral, 1996; Pitt and Shephard, 1999; Doucet and Johansen, 2009). Recently, Chopin et al (2013) proposed a robust Particle Filter, SMC, which is ideally suited for fitting the parameters of the non-linear hierarchical model described in Section 3.1. In this section we formulate a Sequential Bayesian Algorithm that makes HQCD tractable under real world constraints (see Figure 5).


Figure 5: Computation time for one complete run of changepoint detection (in minutes) on a 1.6 GHz quad-core Intel i5 processor with 8 GB RAM: Gibbs sampling (Carlin et al, 1992) vs HQCD vs HQCD without surrogates. Gibbs sampling computation times are unsuitable for online detection.

To find the posterior at any time using SMC, we first cast the model parameters and variables into the following three categories:

Observations : In the context of SMC these are the parameters that correspond to observed variables at each time point . For HQCD we can model as:


Hidden States: SMC estimates the observations based on interaction with hidden states which are dynamic, unobserved, and sufficient to describe the observations at each time. For HQCD, we can express the hidden states as follows:


Static Parameters : Finally, SMC also accommodates the concept of static parameters which do not change over time such as the base probabilities of changepoint and the noise matrix in HQCD. We can express as:


For a given set of such parameters, SMC works by first generating samples of the static parameters using their prior distribution. For each of these samples, SMC then draws samples of the hidden states from their prior. Following standard practices, we use conjugate distributions (Casella and Berger, 2002) for the priors.

At each time point, the samples are perturbed using the model equations given in Section 3.1 and associated with weights to estimate the joint posteriors as:


where the Kronecker delta function appears. Algorithm LABEL:al:smc outlines the steps involved in this process. For more details on SMC see the Appendix.
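The delta-function posterior estimate above amounts to summing normalized particle weights; a minimal sketch follows, with hypothetical particle values and weights:

```python
import numpy as np

def weighted_posterior_prob(samples, weights, t):
    """Estimate P(changepoint <= t | data) from weighted particles,
    i.e. the delta-function mixture in the text: sum the normalized
    weights of particles whose changepoint sample is <= t."""
    samples = np.asarray(samples)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the importance weights
    return float(w[samples <= t].sum())

# Particles carrying candidate changepoint times and importance weights
p = weighted_posterior_prob([10, 12, 30, 45], [0.1, 0.3, 0.4, 0.2], t=15)
```

This posterior probability is exactly the test statistic that the multi-level change detection test of Theorem 2.2 thresholds.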

4 Experiments

We present experimental results for both synthetic and real-world datasets, and compare HQCD against several state-of-the-art online change detection methods (see Table 1), specifically, GLRT (Siegmund and Venkatraman, 1995), W-GLRT (Lai and Xing, 2010), BOCPD (Adams and MacKay, 2007) and RuLSIF (Liu et al, 2013). To further analyze the effects of surrogates in detecting changepoints, we compare against HQCD without surrogates, where the surrogate term is dropped from equation 13 and the target changepoint prior is made static (i.e., independent of changepoints from surrogates) in equation 15.

Table 2: (Synthetic data) Comparison of the true changepoint for each target against the changepoints detected by HQCD and state-of-the-art methods, in terms of false alarms (FA) and additive detection delay (ADD). Each row represents a target; the best detected changepoint is shown in bold and false alarms are shown in red.


Figure 6: Comparison of HQCD against state-of-the-art on simulated target sources. X-axis represents time and Y-axis represents actual value. Solid blue lines refer to the true changepoint, solid green refers to the ones detected by HQCD and brown refers to HQCD without surrogates. Dashed red, magenta, purple and gold lines refer to changepoints detected by RuLSIF, WGLRT, BOCPD and GLRT, respectively. HQCD shows better detection for most targets with low overall detection delay and false alarms.

4.1 Synthetic Data

In this section, we validate against synthetic datasets with known changepoint parameters. For this, we pick 5 targets and 10 surrogates. The surrogates were generated from i.i.d. Log-normal distributions (see equation 11) while the targets were generated using a Poisson process (see equation 12). The changepoints for surrogates were sampled from a fixed Gamma distribution (see 14) while the associated changepoints for target sources were simulated via equation 15.

4.1.1 Comparisons with state-of-the-art

As true changepoints are known for the synthetic dataset, we can compare HQCD against the state-of-the-art methods on the detected changepoints, as shown in Figure 6. Table 2 presents the results in terms of the false alarm (FA) and additive detection delay (ADD). From the table, we can see that HQCD is able to detect the changepoints with fewer false alarms. HQCD also has the lowest delay across all methods for all targets except Target-1, for which HQCD without surrogates achieved a better delay, indicating that the surrogates are not informative for this target source.

4.1.2 Usefulness of Surrogates

Our comparisons with the state-of-the-art show that HQCD achieves significant improvements, both in terms of FA and ADD, and showcase the importance of systematically admitting surrogate information to attain quicker change detection with a low false alarm rate. We compare HQCD with surrogates against HQCD without surrogates (Table 2) and find that admitting surrogates significantly improves the average delay. We also plot the average false alarm rate against the detection delay in Figure 7 and find that the HQCD results in general achieve the best tradeoff between FA and ADD.


Figure 7: False Alarm vs Delay trade-off for different methods. HQCD shows the best trade-off.


(a) Civil Unrest Protests


(b) Twitter chatter
Figure 8: Illustration of civil unrest data: (a) shows an example where relevant news articles (top) are scanned to produce an annotated dataset of protest activities (bottom). (b) Geo-fenced Twitter data (top). Twitter chatter can uncover various socio-political factors, some of which could be civil unrest events (bottom).

4.2 Analysis of Protest Uprisings

In real-life scenarios, the true changepoint is typically unknown. One representative example could be seen w.r.t. the onset of major civil unrest related protests and uprisings. We present a detailed analysis of three major uprisings: (i) in Brazil around mid 2013 (often termed as the Brazilian Spring), (ii) in Venezuela around early 2014 and, (iii) in Uruguay around late 2013. We first describe the data collection procedure (Figure 8) and followup with a comparative analysis of detected changepoints.

Weekly counts of civil unrest events from Nov. 2012 to Dec. 2014 were obtained as part of a database of discrete unrest events (Gold Standard Report - GSR) prepared by human analysts by parsing news articles for civil unrest content. Among other annotations, the GSR also classifies each event into one of 6 possible event types based on the reason (‘why’) behind the protest. Each of these event types, namely a) Employment and Wages, b) Housing, c) Energy and Resources, d) Other Government, e) Other Economic and f) Other, bears certain societal importance. We treat the weekly counts of each of these event types as target sources and the sum total of all protests for a week as the sum-of-targets. We also collected geo-fenced tweets for each country over the same time period. We used a human-annotated dictionary of 962 keywords/phrases that contains several identifiers of protest in the languages spoken in the countries of interest (similar to Ramakrishnan et al, 2014). As most of these keywords could have similar trends, we cluster them using k-means into 30 clusters (i.e., we have 30 surrogates). To account for scaling effects while preserving temporal coherence, each keyword time-series was normalized to zero mean and unit variance.
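The preprocessing just described (z-normalization of keyword series followed by k-means grouping) can be sketched as follows; the toy series and the minimal k-means implementation are illustrative stand-ins for the authors' pipeline, and k=2 replaces the paper's k=30:

```python
import numpy as np

def znorm(series):
    """Normalize each keyword time series to zero mean and unit
    variance, matching the preprocessing described in the text."""
    series = np.asarray(series, dtype=float)
    mu = series.mean(axis=1, keepdims=True)
    sd = series.std(axis=1, keepdims=True)
    return (series - mu) / np.where(sd == 0, 1.0, sd)

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means for grouping keyword series into
    surrogate clusters (a stand-in for off-the-shelf clustering)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # squared Euclidean distance of every series to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep empty clusters' centers fixed
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Six toy keyword series: three trending up, three trending down
raw = np.vstack([np.arange(12.0)] * 3 + [np.arange(12.0)[::-1]] * 3)
raw += np.random.default_rng(1).normal(0, 0.1, raw.shape)
X = znorm(raw)
labels = kmeans(X, k=2)
```

Each resulting cluster's aggregated series then serves as one surrogate source for HQCD.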

| Country | Event Type | Detected changepoints (four baseline methods) | HQCD | EADD |
|---|---|---|---|---|
| Brazil | Employment & Wages | 02/10, 03/17, 06/16, 05/26 | 08/18 | 4 |
| Brazil | Energy & Resources | 02/10, 03/17, 06/09, 05/19 | 06/02 | 6 |
| Brazil | Housing | 03/24, 03/31, 07/28, 05/19 | 06/16 | 8 |
| Brazil | Other Economic | 03/24, 03/24, 06/23, 05/19 | 06/30 | 5 |
| Brazil | Other Government | 02/17, 06/23, 04/07, 05/19 | 06/16 | 4 |
| Brazil | Other | 03/03, 03/17, 06/30, 05/19 | 06/23 | 6 |
| Brazil | All | 02/17, 04/28, 05/19, 06/16 | 06/16 | 8 |
| Venezuela | Employment & Wages | 01/14, 01/13, 01/28, 01/25 | 01/27 | 3 |
| Venezuela | Energy & Resources | 01/20, 01/11, 02/28, 01/20 | 02/24 | 7 |
| Venezuela | Housing | -, -, -, - | - | - |
| Venezuela | Other Economic | 01/31, 01/31, 01/28, - | 01/27 | 9 |
| Venezuela | Other Government | 01/22, 01/11, 02/03, 01/20 | 02/10 | 4 |
| Venezuela | Other | 01/14, 01/12, 01/25, 01/30 | 01/24 | 5 |
| Venezuela | All | 01/26, 01/11, 01/30, 01/20 | 02/12 | 3 |
| Uruguay | Employment & Wages | 12/06, 12/08, 12/13, 12/03 | 12/10 | 3 |
| Uruguay | Energy & Resources | 12/04, 12/05, 12/10, - | 12/09 | 4 |
| Uruguay | Housing | 12/21, 12/06, 11/30, - | 11/28 | 2 |
| Uruguay | Other Economic | 12/20, 12/06, -, - | 11/26 | 2 |
| Uruguay | Other Government | 11/25, 12/05, 12/16, 11/29 | 12/15 | 3 |
| Uruguay | Other | 12/05, 12/09, 12/03, - | 01/14 | 10 |
| Uruguay | All | 12/05, 12/09, 12/03, 11/29 | 12/10 | 3 |
Table 3: (Protest uprisings) Comparison of HQCD vs state-of-the-art with respect to detected changepoints.


(a) Brazil Total Protests


(b) Venezuela Total Protests


(c) Uruguay Total Protests
Figure 9: Comparison of detected changepoints at the sum-of-targets (all Protests). HQCD detections are shown in solid green while those from the state-of-the-art methods, i.e. RuLSIF (red), WGLRT (magenta), BOCPD (purple) and GLRT (gold), are shown with dashed lines. The HQCD detection is the closest to the traditional start date of mass protests in the three countries studied.


(a) Brazil Subtypes


(b) Venezuela Subtypes


(c) Uruguay Subtypes
Figure 10: Comparison of detected changepoints at the target sources (Protest types) HQCD detections are shown in solid green while those from the state-of-the-art methods i.e. RuLSIF (red), WGLRT (magenta), BOCPD (purple) and GLRT (gold) are shown with dashed lines.

4.2.1 Changepoint Across layers

We show the changepoints detected by HQCD (solid green) and the state-of-the-art methods (dashed lines) for the sum-of-all protests in Figure 9 and individual protest types in Figure 10. We can observe that HQCD, which uses the surrogate information sources and exploits the hierarchical structure, finds indicators of changes that are visually better as well as more aligned to the dates of major events (see demo at https://prithwi.github.io/hqcd_supplementary). In contrast, the state-of-the-art methods arguably show a significantly higher false alarm rate. For such real-world data sources the notion of a true changepoint is difficult to ascertain; we can instead consider, for example, the onset of the Brazilian Spring protests (2013-06-01) as an underlying changepoint to compare at the sum-of-targets and interpret notions of false alarm. Table 3 tabulates these inferences for the targets as well as the sum-of-targets. Although a true changepoint is unknown, we note that for HQCD the expected additive detection delay (EADD) can be estimated according to equation 2 (from Algorithm LABEL:al:smc).


Figure 11: (Brazilian Spring) Heatmap of changepoint influences of (a) lagged targets on current targets and (b) lagged surrogates on current targets. Darker (lighter) shades indicate higher (lower) changepoint influence. Panel (a) shows strong off-diagonal elements, indicating strong cross-target changepoint information; panel (b) shows a mixture of uninformative and informative surrogates.

4.2.2 Changepoint influence analysis

The experiments presented in the previous section can be further analyzed to ascertain how the significant events leading up to a protest progressed. Here we present our analysis for the Brazilian Spring. As a preliminary step, Table 3 shows that the detected changepoints for Brazil reveal an interesting progression: significant changes in Energy related unrest (06/02) propagated to Housing/Other Govt. unrest (06/16) and culminated in mass Employment related unrest (08/18). Interestingly, we can analyze the fitted weight vector of the rate updates (see 13) to quantify the changepoint influence of a source (target/surrogate) on a target over time. For each target, we can compute the average value of the weight-vector component of each target/surrogate separately, before and after the detected change. The pre-change average measures the effect of a source on the target before the change, while the post-change average captures the same effect after the change. Their percentage relative change can then be used as a measure of the changepoint influence of a particular target/surrogate source on that target. We plot a heatmap of these percentages in Figure 11 for targets and surrogates, separately. From Figure 11(a), we can see that ‘Other Economic’ and ‘Employment’ related protests had strong influences from ‘Housing’ related protests. Furthermore, from Figure 11(b) we can see that ‘Housing’ and ‘Employment’ related protests were influenced by similar Twitter chatter clusters (cluster-01 and cluster-26), indicating that the interaction between these protest subtypes can be inferred from the social domain. Conversely, ‘Housing’ and ‘Other Economic’ related protests are only weakly correlated through Twitter chatter, exhibiting the robustness of HQCD, which can still detect interactions between targets when surrogates fail to explain them.
In general, for a particular target we can see linked precursors in other targets (strong off-diagonal elements in Figure 11(a)) and highly specific informative surrogates (few strong cells per row in Figure 11(b)).
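The pre/post-change comparison above can be sketched as a small computation; the array layout and function name here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def influence_matrix(weights, cp_index):
    """weights: array of shape (T, n_targets, n_sources) holding the
    fitted weight-vector components over time; cp_index: index of the
    detected changepoint.  Returns the percentage relative change of
    each source's average weight on each target, pre vs. post change."""
    pre = weights[:cp_index].mean(axis=0)     # (n_targets, n_sources)
    post = weights[cp_index:].mean(axis=0)
    return 100.0 * (post - pre) / np.abs(pre)

# Toy example: 50 time steps, 3 targets, 5 sources
rng = np.random.default_rng(0)
W = rng.random((50, 3, 5)) + 0.1   # keep weights positive
M = influence_matrix(W, 25)        # heatmap-ready (3, 5) matrix
```

Rows of the resulting matrix index targets and columns index sources, so strong off-diagonal cells correspond directly to the cross-source influences discussed above.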

5 Conclusion

We have presented HQCD, a framework for online change detection across multiple data sources that can assimilate additional surrogate information sources in a hierarchical manner. Key properties of our framework are the following: a) it is computationally inexpensive, requiring only a search over local changepoints (for each data layer), which makes it applicable to a large number of data/surrogate sources; b) the change detection algorithms are readily tunable to account for different false alarm requirements at different data layers; and c) it provides a systematic approach to integrating surrogate information. As shown through a variety of experiments on both synthetic and real-world datasets, the proposed approach uncovers interesting relationships and significantly outperforms state-of-the-art methods that do not account for surrogate information, both in terms of event detection reliability (probability of false alarm) and delay in detection.

Supporting Information A demo of HQCD and the datasets used in this paper can be found at https://prithwi.github.io/hqcd_supplementary. The attached appendix provides additional details on SMC.

Acknowledgements Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC000337. The US Government is authorized to reproduce and distribute reprints of this work for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.


  • Adams and MacKay (2007) Adams RP, MacKay DJ (2007) Bayesian Online Changepoint Detection. arXiv preprint arXiv:07103742
  • Anderson (2001) Anderson JL (2001) An Ensemble Adjustment Kalman Filter for Data Assimilation. Monthly weather review
  • Carlin et al (1992) Carlin BP, Gelfand AE, Smith AF (1992) Hierarchical Bayesian analysis of Changepoint Problems. Applied statistics
  • Casella and Berger (2002) Casella G, Berger RL (2002) Statistical Inference. Duxbury Pacific Grove, CA
  • Chopin et al (2013) Chopin N, Jacob PE, Papaspiliopoulos O (2013) SMC2: An Efficient Algorithm for Sequential Analysis of State Space Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • Del Moral (1996) Del Moral P (1996) Non-linear Filtering: Interacting Particle Resolution. Markov processes and related fields
  • Dessein and Cont (2013) Dessein A, Cont A (2013) Online Change Detection in Exponential Families with Unknown Parameters. In: Geometric Science of Information, Lecture Notes in Computer Science
  • Doucet and Johansen (2009) Doucet A, Johansen AM (2009) A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later. Handbook of Nonlinear Filtering
  • Fokianos et al (2009) Fokianos K, Rahbek A, Tjøstheim D (2009) Poisson Autoregression. Journal of the American Statistical Association
  • Kalman (1960) Kalman RE (1960) A New approach to Linear Filtering and Prediction Problems. Journal of Fluids Engineering
  • Lai (1995) Lai TL (1995) Sequential Changepoint Detection in Quality Control and Dynamical Systems. Journal of the Royal Statistical Society Series B (Methodological)
  • Lai and Xing (2010) Lai TL, Xing H (2010) Sequential Change-point Detection when the pre-and post-change Parameters are Unknown. Sequential Analysis
  • Liu et al (2013) Liu S, Yamada M, Collier N, Sugiyama M (2013) Change-point Detection in Time-series Data by Relative Density-Ratio Estimation. Neural Networks
  • Page (1954) Page E (1954) Continuous Inspection Schemes. Biometrika
  • Painter et al (2013) Painter I, Eaton J, Lober B (2013) Using Change Point Detection for Monitoring the Quality of Aggregate Data. Online journal of public health informatics
  • Pitt and Shephard (1999) Pitt MK, Shephard N (1999) Filtering via simulation: Auxiliary particle filters. Journal of the American statistical association
  • Ramakrishnan et al (2014) Ramakrishnan N, Butler P, Muthiah S, et al (2014) ‘Beating the News’ with EMBERS: Forecasting Civil Unrest Using Open Source Indicators. In: Proceedings of the 20th ACM SIGKDD
  • Shewhart (1925) Shewhart WA (1925) The Application of Statistics as an Aid in Maintaining Quality of a Manufactured Product. Journal of the American Statistical Association
  • Shiryaev (1963) Shiryaev AN (1963) On Optimum Methods in Quickest Detection Problems. Theory of Probability & Its Applications
  • Siegmund and Venkatraman (1995) Siegmund D, Venkatraman E (1995) Using the Generalized Likelihood Ratio Statistic for Sequential Detection of a Change-point. The Annals of Statistics
  • Simon (2010) Simon D (2010) Kalman Filtering with State Constraints: A Survey of Linear and Nonlinear Algorithms. IET Control Theory & Applications
  • Veeravalli and Banerjee (2013) Veeravalli VV, Banerjee T (2013) Quickest Change Detection. Academic Press Library in Signal Processing: Array and Statistical Signal Processing
  • Wald (1945) Wald A (1945) Sequential tests of Statistical Hypotheses. The Annals of Mathematical Statistics

Appendix A Sequential Bayesian Inference

Consider a stochastic process in which an observed temporal data sequence y_1, y_2, … depends on unobserved latent states x_1, x_2, … such that the following state-space formulation holds:

y_t ∼ g_θ(y_t | x_t),    x_t ∼ f_θ(x_t | x_{t−1}),

i.e. the observation y_t depends only on the current estimate of the state x_t. On the other hand, x_t depends only on x_{t−1}, thus exhibiting a first-order Markov property. θ denotes the set of parameters for the described process, which are constant over time. For each t, g_θ and f_θ describe the observation probability and the state transition probability, respectively. p(θ) is the prior distribution for the static parameter θ, while p(x_0 | θ) is the same for the initial state given a particular θ. Typically, at any time point the observation values are known but the latent states and the parameter θ are unknown. The problem of interest is then to estimate the posterior probability p(θ, x_{1:t} | y_{1:t}).

This problem has been studied extensively in the context of Sequential Bayesian Inference (Casella and Berger, 2002). Kalman filters (Kalman, 1960), a class of such algorithms, are very popular when the observation and state transition models are linear Gaussian. There have been efforts (Simon, 2010; Anderson, 2001) at relaxing these restrictions using methods such as Taylor series expansions and ensemble averages. However, for arbitrary forms of the observation and transition models, Sequential Monte Carlo (SMC) methods, and more specifically Particle Filters, are more popular. Particle Filters (Del Moral, 1996) estimate the posteriors using a large number of Monte Carlo samples drawn from the observation and state transition models. At any time, these algorithms only need to draw new samples for the current time step using data from the previous one; thus these methods are ideally suited for online learning. Standard Particle Filters, however, are known to suffer from premature convergence (particle degeneracy) (Doucet and Johansen, 2009) and are unsuitable for unknown static parameters (Pitt and Shephard, 1999; Doucet and Johansen, 2009). Recently, Chopin et al (2013) proposed a hybrid particle filter, SMC2, which interleaves iterated batch importance sampling with particle filter updates to handle both static and state parameters. Given an observed sequence, SMC can be used to find the best posterior fit of the static and state parameters, as traced below.
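As a minimal illustration of the particle filter cycle described above (propagate, weight, resample), consider the following sketch for a toy linear-Gaussian random walk; the model and all names here are illustrative, not the HQCD model:

```python
import numpy as np

def bootstrap_pf(y, n_particles=500, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Minimal bootstrap particle filter for a toy random-walk model
    (x_t = x_{t-1} + N(0, sigma_x^2), y_t = x_t + N(0, sigma_y^2)),
    returning the filtered posterior mean of x_t at each step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)      # particles from the prior
    means = []
    for y_t in y:
        x = x + rng.normal(0.0, sigma_x, n_particles)     # propagate
        logw = -0.5 * ((y_t - x) / sigma_y) ** 2          # observation likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * x)))                # filtered mean
        x = x[rng.choice(n_particles, n_particles, p=w)]  # resample
    return means

est = bootstrap_pf([0.1, 0.3, 0.2, 0.5])   # one filtered mean per observation
```

Note that only the current observation and the previous particle set are needed at each step, which is what makes the method online.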

A.1 SMC algorithm traces

We present the traces of the SMC algorithm below. For a more detailed treatment (including theoretical proofs of convergence) we refer the reader to Chopin et al (2013).

SMC typically starts with two parameters: (a) N_θ, the number of static parameter particles sampled from the prior, and (b) N_x, the number of state particles initialized for each parameter particle.

The algorithm can then be given as follows:

  1. Sample N_θ parameter particles from the prior of the static parameter.

  2. For each parameter particle, run the following particle filter:

    1. Initialization: draw N_x state particles from the initial state distribution; then, at each time step:

      1. Auxiliary variable: resample ancestor indices for the state particles.

      2. State proposal: propagate each resampled particle through the state transition model.

      3. Weight update: weight each proposed particle by its observation likelihood.

      4. Observation probability: average the state-particle weights to estimate the likelihood increment of the new observation given the parameter particle.

  3. Update importance weights: multiply each parameter particle's importance weight by its estimated likelihood increment.

  4. Under a degeneracy criterion:
    Move the parameter particles using a Markov kernel.

  5. Weight exchange: reset the importance weights to be equal after the move step.
Here, the move kernel is a Markov kernel targeting the posterior distribution. It can be shown that such Markov moves do not change the target distribution and can alleviate the problem of particle degeneracy.
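A common degeneracy criterion is the effective sample size (ESS) of the importance weights falling below a threshold such as half the number of particles; a minimal sketch (names and threshold illustrative):

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized importance weights: equals N
    when all weights are uniform and approaches 1 as they degenerate."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()               # normalize defensively
    return 1.0 / np.sum(w ** 2)

ess_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])  # 4.0
ess_skewed = effective_sample_size([0.97, 0.01, 0.01, 0.01])   # close to 1
trigger_move = ess_skewed < 0.5 * 4   # e.g. move particles when ESS < N/2
```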

A.2 SMC priors

We used conjugate distributions to model the priors. For the static parameter, we used a mixture of Latin hypercube sampling (LHS) and conjugate priors.
Similar to the static parameter, we model the initial state distribution via LHS sampling for the base values and by using the model equations as presented in Section 3.1.
The parameters of these prior distributions are called hyperparameters in the general domain of Bayesian Inference and, following standard practice, are found via cross-validation.
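Latin hypercube sampling spreads the initial draws so that each parameter dimension is stratified evenly; the following sketch of the idea on the unit cube is illustrative and not tied to any particular prior:

```python
import numpy as np

def latin_hypercube(n, d, seed=0):
    """Draw n points in [0, 1]^d such that, in every dimension, each of
    the n equal-width strata contains exactly one point."""
    rng = np.random.default_rng(seed)
    # One uniform draw per stratum in each dimension:
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for j in range(d):
        rng.shuffle(u[:, j])      # decouple the strata across dimensions
    return u

samples = latin_hypercube(10, 2)  # 10 stratified draws in 2 dimensions
```

The stratification gives better coverage of the parameter space than plain i.i.d. sampling for the same number of draws, which is why it is attractive for initializing parameter particles.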