Time series data that consists of a sequence of observations collected at regular intervals of time is ubiquitous to many real-world applications. Mining relationships in time series data is of immense interest to several disciplines such as neuroscience, climate science, and transportation. For example, in climate science, relationships are studied between time series of physical variables such as Sea Level Pressure, temperature, etc., observed at different locations on the globe. Such relationships, commonly known as ‘teleconnections’ capture the underlying processes of the Earth’s climate system [kawale2013graph]. Similarly, in neuroscience, relationships are studied between activities recorded at different regions of the brain over time [atluri2016brain, atluri2015connectivity]. Studying such relationships can help us improve our understanding of real-world systems, which in turn, could play a crucial role in devising solutions for problems such as mental disorders or climate change.
Most of the existing work on mining time series relationships assume the relation to be present for the entire duration of the two time series. However, many interesting relationships in real-world applications often are intermittent in nature, i.e., they are highly prominent only in certain sub-intervals of time and absent or occur feebly in rest of the sub-intervals. As a motivating example, consider a pair of monthly Sea Level Pressure anomaly time series during 1979-2014 in Figure 0(b) that are observed at two distinct regions on the globe in Figure 0(a)). The full-length correlation between the two time series is -0.25. However, as shown in the lower panel of Figure 0(b), there exists multiple sub-intervals where the correlation between the two time series is stronger than -0.7. As we discuss later in Section 5.5, this example is the outcome of a well-known climate phenomena called ENSO (El Nino Southern Oscillation) [glantz2001currents], that is characterized by negative correlations between the surface temperatures observed near Australia and Pacific Ocean [glantz2001currents] and is known for impacting various weather events such as floods, droughts, and forest fires [siegert2001increased, ward2014strong]. The sub-intervals shown in the lower panel correspond to the two extreme phases, ‘El-Nino’ and ‘La-nina’, of ENSO, when its impact on global climate is amplified. Similar examples are also known to exist in other domains such as neuroscience, [atluri2014discovering] and stock market data [li2016efficient].
Inspired by such examples, we propose to formally study sub-interval relationship (SIR) between two time-series. Studying SIRs in time series data poses several challenges. First, we need to formally define the notion of an SIR and devise necessary interestingness measures to characterize them. Second, given a pair of time series, a computationally efficient method needs to be designed in order to find all the sub-intervals that show strong relationships. Third, the validity of discovered relationships needs to be evaluated.
For a given pair of time series, we define SIR as the set of all the disjoint time intervals each of which are sufficiently long and capture strong relationships between the two time series. The constraint on the length of the interval is important to filter out very small intervals that are more likely to capture spurious signals. An SIR with longer and multiple such intervals can be treated to be more interesting. Thus, the interestingness of an SIR could be measured as the total sum-length of all included intervals.
The next challenge is then to obtain most interesting SIR in a given pair of time series. As we describe later in Section 4.1, the above problem can exactly be solved by a standard dynamic programming (DP) approach. However, the time complexity of this approach is quadratic in the length of time series, which makes it computationally infeasible for long time series. To the best of our knowledge, none of the existing work in the literature solves our exact problem formulation and are therefore not applicable (discussed further in Section 3). In this work, we propose a novel and efficient approach called Partitioned Dynamic Programming(PDP) to find the most interesting SIR in a given pair of time series. The basic idea of our approach is to partition the original problem into multiple sub-problems, each of which could be independently solved by standard dynamic programming methods. We show that our approach is guaranteed to find the optimal solution and has time complexity that is practically linear in the length of the time series. Finally, to evaluate the validity of discovered SIRs, we propose a randomization based procedure to assess their statistical significance.
2 Definitions and Problem Formulation
2.1 Definitions and Notations
We start with presenting some basic terms and notations that will be used throughout the paper.
Time series is a sequence of observations made at consecutive and regular timestamps.
Time interval represents a set of consecutive timestamps . Time interval with a single timestamp is given by .
Length of a time interval represents the number of timestamps included in the interval and denoted by . The length of single timestamp is equal to 1 and is denoted by .
Segment of a time series in a time interval contains all the observations of corresponding to the timestamps of given interval.
Full-length relationship between two time series and refers to the strength of relationship computed using all the observations of the two time series and is denoted by .
Local relationship between two time series and in a time interval refers to the strength of relationship computed using all the observations in the time interval and is denoted by . For brevity, we also denote it by . Certain relationship measures (e.g., Euclidean distance) could be computed over singleton timestamp , which would be denoted by .
Following these ideas, we present our formal definition of sub-interval relationships (SIR) between a pair of time series.
A Sub-Interval Relationship (SIR) between two time series and refers to the set of non-overlapping time intervals such that every interval in
captures strong relationship, i.e.
is of length at least , i.e.
where and are user-specified thresholds.
The choice of thresholds depends on the type of SIRs that are of interest to a user. For instance, setting higher and lower results in SIRs with longer intervals of mild relationships and vice-versa.
Note that our definition of SIR is quite general and not specific to any relationship measure. Further, the above definition could also be used with distance measures (e.g. euclidean distance) except that the inequality constraint in the first bullet flips, i.e. .
2.2 Problem Formulation
Our next goal is to capture the most interesting SIR for a given pair of time series. Intuitively, an SIR is likely to be more interesting if the set of selected intervals covers a large fraction of the timestamps. Therefore, we propose to measure interestingness of an SIR as the sum-length (), which is equal to sum of lengths of all the selected time intervals. Then the problem require us to find the set of long and strong time-intervals with maximum sum-length.
Formally, for a given pair of time series , our goal is to determine the most interesting SIR as the set of time-intervals with maximum sum-length such that for every interval , , and .
where and are user-specified thresholds. In the remainder of the paper, we will refer to the set with maximum sum-length as the ‘optimal set’.
3 Related Work
Learning the similarity between two time series has been well studied in many different settings. However, quite surprisingly, our problem formulation has never been studied in any of the existing works. The most prevalent work has been done in designing various similarity measures (e.g., euclidean distance, Pearson correlation, dynamic time warping) for analyzing full-length time series [kawale2013graph, keogh2002exact, liao2005clustering]. Another part of the related work goes into devising various largest common subsequence (LCS) matching problems [das1997finding, chen2007spade, faloutsos1994fast]. Other related works focus on all-pairs-similarity-search and motif discovery [yeh2016matrix, zhu2016matrix] in which the fundamental problem is to efficiently find all the subsequences of a time series that are the closest match (or match within a threshold) to a given query sequence, that represents a pattern of high interest (e.g. motif) or obtained as a subsequence from another time series. The key difference between LCS/motif-discovery and our problem formulation is that the former allows matching between observations of two time series at different timestamps by compromising on the computational complexity. In contrast, we restrict ourselves to lock-step relationship measures (i.e. time-point in a time series can be compared with only time-point in other time series) that can be solved much more efficiently and has direct relevance to various domains including climate science and neuroscience.
The closest setting related to our work was formulated in [li2013discovering, li2016efficient], where a longest subsequence of atleast threshold (overall) correlation was considered but disregards any constraints on the sub-intervals. Like ours, their work also stress on the importance of efficiently computing the subsequence correlation but rely heavily on bringing down the cost of constant factors involved in correlation computation and therefore, still bounded byliang2011improved, kim2006segmental]. However, such approaches often lack optimal guarantees due to the non-convexity of the objective function.
Our problem formulation can potentially be solved by two approaches: (i) a classical approach based on the dynamic programming, and (ii) our proposed approach – Partitioned Dynamic Programming, that is an extension of the classical dynamic programming. We now elaborate the two approaches in further details.
4.1 Classical Approach: Dynamic Programming
The problem of finding the optimal set can be treated as a classical DP problem of weighted interval scheduling [kleinberg2005algorithm] where the goal is to determine a schedule of jobs such that no two jobs conflict in time and the total sum of the weights of all the selected jobs is maximized. In our problem, we can treat every time-interval that meets the minimum strength and length criteria as a job and the interval length as the weight of the job. We could then use DP to find the set of intervals with maximum possible sum-length.
The key condition for dynamic programming is to decompose a larger problem into sub-problems such that each sub-problem can further be solved recursively. Let denote the optimal set to the problem spanning timestamps , and let denote the sum-length of . Then the recursive relation between sum-lengths of the optimal sets of different sub-problems is given as,
In other words, the optimal set is either exactly same as or can be obtained by adding the interval to the optimal set for one of the values of . By applying above recursions, the optimal sets for all the sub-problems and the original problem can be obtained as described in Algorithm 1.
4.1.1 Time Complexity
For any pair of time series, dynamic programming needs to solve all the sub-problems of sizes , each of which takes time, where is the length of the sub-problem. Therefore, both the average-case and worst-case time complexity of dynamic programming is in time.
4.2 Proposed Approach (Partitioned Dynamic Programming)
As discussed above, the time complexity of DP is , which could be computationally infeasible for longer time series. A potential approach to reduce computational cost could be to partition the original problem into multiple sub-problems of constant sizes and solving each of them independently using DP. The optimal set to the original problem could then be obtained by taking union of the optimal sets obtained for all sub-problems. The computational cost of this approach would depend on the sizes of the different sub-problems. If the size of each sub-problem is smaller than a constant , then the computational cost would be , which would be faster than DP by an order of . However, a key challenge in this approach is to partition the problem prudently such that no interesting interval gets fragmented across the two partitions, otherwise it could potentially get lost if its fragmented parts are not sufficiently long or strong enough to meet the user-specified thresholds.
To this end, in this section we propose a novel approach called Partitioned Dynamic Programming (PDP) that is significantly efficient than Dynamic Programming (DP) and is guaranteed to find the optimal set. PDP follows the above idea and breaks the original problem into multiple sub-problems such that each one of them can be solved by using DP independently. The key step in PDP is to identify safe points of partition, where the problem could be partitioned without compromising on the optimality of the solution. However, we would need to mention that unlike DP, PDP is applicable only to those relationship measures that satisfy the following three properties:
Property 1: The relationship measure could be computed over a single timestamp.
Property 2: If is known, then and could be computed in constant time.
Property 3: For a given pair of time series, let and be two adjacent time-intervals, , , and , , then .
The above three properties are satisfied by various measures that we discuss in more detail in section 4.3.
From Property 3, it follows that an interval formed by union of two adjacent weak intervals and could never be strong. Thus, a timestamp can be considered as a ‘point of partition’ if:
none of the intervals ending at are strong, i.e. . We refer to this condition as left-weakness condition.
none of the intervals beginning from are strong, i.e. . We refer to this condition as right-weakness condition.
The two conditions above ensure that all the intervals ending at or beginning from are weak, therefore no strong interval that subsumes could possibly exist. Therefore, no interesting interval would be in danger of getting fragmented, if the problem is partitioned at . Following this idea, we propose a partitioning scheme to find the points of partition before applying dynamic programming module to each of the partitions.
As summarized in Algorithm 2, PDP comprises of three major steps: In step 1, we find all timestamps such that they satisfy the left-weakness property. In step 2, we identify all the timestamps that satisfy the right-weakness property. Finally, in step 3, all the timestamps that satisfy both left-weakness and right-weakness are considered as the points of partition. The original problem is then partitioned at the obtained points of partition and the resulting sub-problems are solved independently using the dynamic programming module described in Algorithm 1.
We next explain the two procedures used in Step 1 and Step 2 to obtain points that satisfy the left-weakness and right-weakness respectively.
4.2.1 Finding timestamps with left-weakness
To find timestamps with left-weakness, we perform a left-to-right scan of timestamps as summarized in Algorithm 3. We begin our scan from the leftmost timestamp to find the first timestamp s such that is strong, i.e. . We next show that all the timestamps will satisfy left-weakness using the following lemma. Consider a timestamp that satisfies left-weakness. If , then would also satisfy left-weakness. (Proof in supplementary) Since there are no timestamps to the left of first timestamp, it trivially satisfies left-weakness. By recursively applying Lemma 4.2.1, to timestamps , we get each of them to satisfy left-weakness.
We then continue our scan beyond to find the first timestamp such that is weak, i.e. (lines 15-21). This also means that for all the timestamps , interval is strong, and therefore violates left-weakness. We next show that timestamp satisfies the left-weakness as follows,
Consider a set of timestamps such that , while . If satisfies left-weakness, then timestamp would also satisfy left-weakness. (Proof in supplementary)
We further continue our scan and repeat the above steps (lines 6-20) to find all the timestamps that satisfy left-weakness. In summary, the above procedure essentially finds streaks of points that satisfy or violate left- weakness in a single scan. Similar procedure could be followed to find timestamps that satisfy right-weakness except that the scan would proceed leftwards starting from the rightmost timestamp. An illustration of PDP is provided in supplementary.
4.2.2 Time Complexity
There are three major steps in PDP. In the first step, we use Algorithm 3 to find points that satisfy left-weakness. Notice that each timestamp is visited exactly once in the scan and under the assumption of Property 2, the computation of in line 15 of Algorithm 3 takes constant time. Therefore, the time complexity of Algorithm 3 is . By similar arguments, the complexity of Step 2 is also . Thus, the time complexity of finding points of partition is .
The total time complexity of PDP is therefore , where K is the average computational cost of the sub-problems corresponding to every partition. The worst-case complexity is , that corresponds to the cases where either no partition could be obtained or the largest partition obtained is of in size. However, the best case complexity is , when all the partitions obtained are of constant size and invariant to the overall length of the time series. In practice, this brings down the average computational complexity of PDP close to , as we will demonstrate in evaluation section.
4.3 Measures that qualify for PDP
In this section, we discuss the relationship measures that satisfy the three properties shown in section 4.2 and qualify for PDP:
Mean Square Error(MSE) This measures distances between two time series as the mean of squares of the differences in their observations. Specifically, MSE between two time series and on an interval is given by
Average Product (AP) Average Product computes the mean of element-wise product between the observations of the two time series. Specifically, the Average Product of two time series and in an interval is given by
satisfies all the three properties (proofs in supplementary). Notice that the AP shares similarities with covariance measure. Specifically, for two normalized time series (zero mean, unit variance), AP in an interval measures covariance of two time series with respect to their full-length means. Further, under the assumption that the two time series are the outcome of a stationary process, the mean and variance of the two time series remain constant in any sub-interval, in which case AP would also be equal to Pearson correlation in any sub-interval.
5 Results and Evaluation
5.1 Data and Pre-processing
5.1.1 Global Sea Level Pressure (SLP) Data
We used monthly SLP dataset provided by NCEP/National Center for Atmospheric Research (NCAR) Reanalysis Project [kistler2001ncep] which is available from 1979-2014 (36 years x 12 = 432 timestamps) at a spatial resolution of 2.5 2.5 degree (10512 grid points, also referred to as locations). For each time series, we followed the standard pre-processing steps followed in climate science to remove the annual seasonality and linear trends [kawale2013graph]. Relationships in spatio-temporal data are preferably studied between regions (sets of spatially contiguous locations) as opposed to individual locations, since they are more reliable and stable over time. Therefore, for every location , we grew a homogeneous and spatially contiguous region around it by including all the locations that showed a strong positive correlation of at least 0.85 to , thereby resulting in 10512 regions. For every region
, we then generated its representative time series as the z-scored average time series of the constituent locations, thus getting 10512 time series of regions.
5.1.2 Brain fMRI Data
Functional Magnetic Resonance Image (fMRI) data measures the amount of oxygen consumed at every 2x2x2 mm voxel in the brain and is known to indicate the amount of activity occurring at any location. We used a neuro-imaging data that has been collected in a study at [ramsay2013affective] on 50 subjects. In this study, participants viewed 30 sec video clips interleaved with 30 sec resting period while fMRI scans are being acquired. The temporal resolution of the scan is two seconds and the total duration of the scan was 480 seconds, thereby resulting in the length of every time series to be 240 observations. A number of fMRI pre-processing steps including motion correction, unwarping, and filtering have been performed that are been elaborately described in [ramsay2013affective]. In addition, we grouped the 2x2x2 mm voxels into 90 anatomical regions of the brain based on an Automated Anatomical Labeling Atlas [tzourio2002automated]. The resultant data matrix for each subject, was of size 240 90. Furthermore, we observed that some of the time series show much higher variability within the intervals of resting states compared to that of video-watching states. Note that the similarity measures proposed in section 4.3 are variant to scaling of the time series. Therefore, to avoid unwanted bias to either of the two states, we performed z-scoring on every resting and video-watching time interval.
5.2 Experimental Setup
We used measure Average Product (AP) to capture sub-interval positive correlations in fMRI dataset, whereas for SLP dataset, we studied sub-interval negative correlations using the measure negative Average Product (nAP), which is exactly equal to the negative of measure .
5.2.1 Choice of parameters
Our problem formulation requires inputs for two parameters: , the minimum length and , the minimum strength of relationship in a sub-interval that could be selected in an SIR. The combination of two parameters determines the type of SIRs obtained in the search. In climate science, a physical phenomenon typically shows up as a strong signal that lasts for at least six months, hence we chose for SLP data. Similarly, in fMRI data, we are interested in seeking relationships that might be prominent only during the resting or video-watching time periods. Since the length of each time period was 30 seconds, was set to 20 seconds (equivalent to 10 timestamps), a slightly smaller value to avoid noise factors. The other parameter was set to a high value of 1 for both experiments. Further analysis on parameter sensitivity shows robustness (see supplementary).
5.2.2 Selecting Candidate Pairs
The pairs of time series with strong full-length correlations are likely to show strong correlations for most of the observations and thus, the concept of SIR has a limited relevance for such pairs. Therefore, in this paper, we limit our analysis to the pairs of time series that have weaker full-length relationships. We refer to such pairs as ’candidate pairs’ to which we applied our proposed approach. For brain fMRI dataset, all those pairs of regions that have full-length correlation weaker than 0.25 in magnitude in one of the subjects were selected as candidate pairs. Out of pairs of regions, we selected 1331 candidate pairs in this fashion. Similarly for SLP dataset, we first obtained a set of candidate pairs that have a full-length correlation weaker than 0.25 in magnitude. Due to spatial autocorrelation, a lot of pairs in are redundant with each other and need to be discarded. We treated a pair of time series to be redundant with if their corresponding time series are highly similar, specifically . We removed redundant pairs by following a simple procedure described in supplementary material.
5.3 Computational Evaluation
We evaluated evaluated PDP against DP based on their scalability, i.e. how their computational costs vary with the length of the given time series on the 14837 candidate pairs of SLP dataset. To obtain longer time series, we used global Sea Level Pressure data simulated by GFDL-CM3, a coupled physical model that provides simulations for entire century. We obtained nine time-windows of different sizes, each starting from 1901 and ending at 1910, 1920,…,1990 and for each time window, we obtained nine sets of 14837 pairs of time series. On each of these nine sets, we obtained the total computational time taken by DP and PDP as shown in Figure 2. X-axis on this figure indicates the total length of the time series (in months) for all the nine time-windows, whereas Y-axis indicates the total computational time in seconds (log scale) taken to obtain SIRs in all the candidate pairs. As expected, the computational cost of DP (blue curve) follows a quadratic function of the length of time series. However, the same for PDP (red curve) increases linearly with the length of time series. As explained earlier in section 4.2.2, the complexity of PDP is in time, where is the average cost of solving the sub-problems corresponding to each partition that are further solved using DP. With all the partitions to be of a constant size, the computational time of DP for solving every sub-problem would also be constant, and therefore the resultant complexity of PDP is reduced to , that makes it scalable for longer time series without losing the optimality.
5.4 Evaluation of Discovered SIRs
The lack of ground truth prohibits us to do a systematic domain evaluation on all the candidate pairs. Therefore, we evaluate the validity of discovered SIRs by analyzing the statistical significance of the sum-lengths of the obtained SIRs. We followed a randomization-based approach that addresses the question: Can the sum-length of the given SIR between and be achieved by replacing one of the two time series with a random time series whose full-length correlation with the fixed time series is preserved? If this happens for a very few random time series, then the SIR obtained between and can potentially considered to be statistically significant. The approach was setup in the following fashion: For each of the discovered SIR in a pair of time series and , without loss of generality we replaced time series by a random time series and recomputed the sum-length of the optimal set corresponding to the SIR formed between and . The random time series is generated such that its full-length correlation with the other fixed time series is preserved, i.e. . This process was repeated 1000 times and a distribution of sum-lengths was obtained that was used to compute the p-value of the original sum-length of the optimal set corresponding to the SIR between and . Specifically, the p-value was computed as the fraction of the random time series that formed an SIR of higher sum-lengths with .
Using above procedure, we evaluated the statistical significance of all of the SIRs discovered in SLP and brain fMRI datasets. Figure 3 shows two scatter plots between strength of full-length correlations (X-axis) and the , the normalized sum-length of the SIRs in all the candidate pairs of SLP and brain fMRI dataset. To emphasize the statistical significance at different levels, we depict each pair with either blue (p-value ) or red (p-value ) or black (p-value ) scatter in the Figure 3. The red and black scatter together constitute total of statistically significant SIRs. Further, there are at least (black) pairs of highly significant pair with p-value . Similarly, Figure 2(b) shows 648 out of 1331 candidate pairs that were found to be statistically significant in fMRI dataset. Notice that the significance of an SIR depends on both its sum-length and the full-length correlation of the pair. In fMRI (SLP) dataset, pairs with weaker positive (negative) full-length correlations showed significant SIRs at smaller sum-lengths compared to the pairs with stronger positive (negative) full-length correlations. This is expected because in a pair with positive (negative) full-length correlations, the timestamps with high negative (positive) AP are expected to be rarer, and finding them together as long sub-intervals is furthermore unlikely, compared to the cases where such timestamps are more in number.
5.5 Applications and Domain Insights
While SIRs are interesting by themselves in finding non-trivial relationships that are otherwise difficult to be revealed using full-length relationship measures, studying them collectively in a time series data could provide further insights about the data. Here we discuss two potential approaches for their collective analysis.
5.5.1 Finding Anomalous Intervals
A potential application of this work could be to detect anomalous time intervals that experience unusually high number of relationships. Specifically, for every interval , one can obtain a score that indicates the proportion of candidate pairs that were ’active’ during entire . Intervals included in unusually high number of SIRs could potentially indicate the occurrence of a special event. Applying this idea to SIRs of SLP dataset, we obtained the scores for all possible intervals of size 6 months as shown in Figure 5. It can be seen that the scores are anomalously high for the intervals 1982 Sept-83 Mar, 1988 Sept -89 Mar, 1997 Aug -98 Feb, and 2009 Sept -10 Mar. All of the above intervals are known to have experienced the strongest el-nino and la-nina events since 1979 [ensoyears]. During these events, climate behaves quite differently compared to the general climatology. New wave patterns emerge that synchronize regions with each other that are otherwise unrelated to each other.
5.5.2 Discovery of Associated SIRs
Another potential utility of this work could be to explore interesting connections that could exist between different pairs of regions that showed significant SIRs. For instance, there could exist certain candidate pairs that show SIRs simultaneously on multiple occasions. We refer to such sets as ”Associated SIRs” due to their strong association in multiple intervals of time. Such associated SIRs can be found by applying frequent pattern mining to a given pool of pairs with SIRs, where each pair is as an item and each timestamp is a transaction.
Using this approach, we discovered a set of four SIRs (shown in Figure 4) that share highly similar intervals in the block-design fMRI dataset where the subject is exposed to interleaved video and rest periods. It is interesting to note that these SIRs are formed among parietal and temporal regions that are known to be activated by the visual and auditory stimulus [downar2001effect], respectively. It is also worth noting that one of the brain regions Cingulum participates in three of the four SIRs, which is known to be involved in integrating memory with other regions of the brain [de2005fmri]. While the regions involved in these SIRs are related to the video task at hand, it is surprising that these SIRs mostly capture intervals from the ‘rest’ segments, and not ‘video’ segments that is known to simultaneously activate the involved regions. While these observations need to be validated by domain scientists, this example underscores the utility of the proposed approach for discovering SIRs in brain fMRI data.
In this paper, we defined a notion of sub-interval relationship to capture interactions between two time series that are intermittent in nature and are prominent only in certain sub-intervals of time. We proposed a novel approach to find most interesting SIR in a pair of time series that guarantees to find the optimal SIR. We also demonstrated the scalability of our approach and showed it to be efficient by an order of compared to standard dynamic programming approach in practice. We further exhibited the utility of SIR in two real-world applications: climate and neuroscience to obtain useful domain insights.