Many pattern recognition methods strive towards deriving models from complex and noisy data. Such models try to describe the prototypical normal behavior of the system being observed, which is hard to model manually and whose state is often not even directly observable, but only reflected by the data. They allow reasoning about the properties of the system, predicting unseen data, and assessing the “normality” of new data. In such a scenario, any deviation from the normal behavior present in the data is distracting and may impair the accuracy of the model. An entire arsenal of techniques has therefore been developed to eliminate abnormal observations prior to learning or to learn models in a robust way not affected by a few anomalies.
Such practices may easily lead to the perception of anomalies as being intrinsically bad and worthless. Though that is true for random noise and erroneous measurements, there may also be anomalies caused by rare events and complex processes. Embracing the anomalies in the data and studying the information buried in them can therefore lead to a deeper understanding of the system being analyzed and to the insight that the models hitherto employed were incomplete or—in the case of non-stationary processes—outdated. A well-known example for this is the discovery of the correlation between the El Niño weather phenomenon and extreme surface pressures over the equator by Gilbert Walker  during the early 20 century through the analysis of extreme events in time-series of climate data.
Thus, the use of anomaly detection techniques is not limited to outlier removal as a pre-processing step. In contrast, anomaly detection also is an important taskper se, since only the deviations from normal behavior are the actual object of interest in many applications. Besides the scenario of knowledge discovery mentioned above, fraud detection (e.g., credit card fraud or identity theft), intrusion detection in cyber-security, fault detection in industrial processes, anomaly detection in healthcare (e.g., monitoring patient condition or detecting disease outbreaks), and early detection of environmental disasters are other important examples. Automated methods for anomaly detection are especially crucial nowadays, where huge amounts of data are available that cannot be analyzed by humans.
In this article, we introduce a novel unsupervised method called “Maximally Divergent Intervals” (MDI), which can be employed to point the expert analysts to the interesting parts of the data, i.e., the anomalies. In contrast to most existing anomaly detection techniques (e.g., [2, 3, 4, 5]), we do not analyze the data on a point-wise basis, but search for contiguous intervals of time and regions in space that contain the anomalous event. This is motivated by the fact that anomalies driven by natural processes rather occur over a space of time and, in the case of spatio-temporal data, in a spatial region rather than at a single location at a single time. Moreover, the individual samples making up such a so-called collective anomaly do not have to be anomalous when considered in isolation, but may be an anomaly only as a whole. Thus, analysts will intuitively be searching for anomalous regions in the data instead of anomalous points and the algorithm assisting them should do so as well.
We achieve this by searching for anomalous blocks
in multivariate spatio-temporal data tensors, i.e., regions and time frames whose data distribution deviates most from the distribution of the remaining time-series. To this end, we compare several existing measures for the divergence of distributions and derive a new one that is invariant against varying length of the intervals being compared. A fast novel interval proposal technique allows us to reduce the computational cost of this procedure by just analyzing a small portion of particularly interesting parts of the data. Experiments on climate data, videos, and text corpora will demonstrate that our method can be applied to a variety of applications without major adaptations.
Despite the importance of this task across domains, there has been very limited research on the detection of anomalous intervals in multivariate time-series data, though this problem has been known for a couple of years: Keogh et al.  have already tackled this task in 2005 with a method they called “HOT SAX”. They try to find anomalous sub-sequences (“discords”) of time-series by representing all possible sub-sequences of length as a
-dimensional vector and using the Euclidean distance to the nearest neighbor in that space as anomaly score. More recently, Ren et al. use hand-crafted interval features based on the frequency of extreme values and search for intervals whose features are maximally different from all other intervals. However, both methods are limited to univariate data and a fixed length of the intervals must be specified in advance.
The latter is also true for a multivariate approach proposed by Liu et al.  who compare two consecutive intervals of fixed size in a time-series using the Kullback-Leibler or the Pearson divergence for detecting change-point anomalies, i.e., points where a permanent change of the distribution of the data occurs. This is a different task than finding intervals that are anomalous with regard to all the remaining data. In addition, their method does not scale well for detecting anomalous intervals of dynamic size and is hence not applicable for detecting other types of anomalies, for which a broader context has to be taken into account.
The task of detecting anomalous intervals of dynamic size has recently been tackled by Senin et al. , who search for typical and anomalous patterns in time-series by inducing a grammar on a symbolic discretization of the data. As opposed to our approach, their method cannot handle multivariate or spatio-temporal data.
Similar to our approach, Jiang et al. 
search for anomalous blocks in higher-order tensors using the Kullback-Leibler divergence, but apply their method to discrete data only (e.g., relations in social networks) and use a Poisson distribution for modeling the data. Since their search strategy is very specific to applications dealing with graph data, it is not applicable in the general case for multivariate continuous data dealt with in our work.
Regarding spatio-temporal data, Wu et al.  follow a sequential approach for detecting anomalies first spatially, then temporally and apply a merge-strategy afterwards. However, the time needed for merging grows exponentially with the length of the time-series and their divergence measure is limited to binary-valued data. In contrast to this, our approach is able to deal with multivariate real-valued data efficiently and treats time and space jointly.
The remainder of this article is organized as follows: Section 2 will introduce our novel “Maximally Divergent Intervals” algorithm for off-line detection of collective anomalies in multivariate spatio-temporal data. Its performance will be evaluated quantitatively on artificial data in Section 3 and its suitability for practical applications will be demonstrated by means of experiments on real data from various different domains in Section 4. Section 5 will summarize the progress made so far and mention directions for future research.
2 Maximally Divergent Intervals
This section formally introduces our MDI algorithm for off-line detection of anomalous intervals in spatio-temporal data. After a set of definitions that we are going to make use of, we start by giving a very rough overview of the basic idea behind the algorithm, which is also illustrated schematically in Figure 1. The subsequent sub-sections will go into more detail on the individual aspects and components of our approach.
Our implementation of the MDI algorithm is available as open source at: https://cvjena.github.io/libmaxdiv/
Let be a multivariate spatio-temporal time-series given as 5-order tensor with 4 contextual attributes (point of time and spatial location) and behavioral attributes for all samples. We will index individual samples using 4-tuples like in .
The usual interval notation will be used in the following for discrete intervals . Furthermore, the set of all intervals with size between and along an axis of size is denoted by
The set of all sub-blocks of a data tensor complying with given size constraints can then be defined as
In the following, we will often omit the indices for simplicity and just refer to it as .
Given any sub-block , the remaining part of the time-series excluding that specific range can be defined as
and we will often simply refer to it as if the corresponding range is obvious from the context.
2.2 Idea and Algorithm Overview
The approach pursued by the MDI algorithm to compute anomaly scores for all intervals can be motivated by a long-standing definition of anomalies given by Douglas Hawkins  in 1980, who defines an anomaly as “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. In analogy to this definition, the MDI algorithm assumes that there is a sub-block of the given time-series that has been generated according to “a different mechanism” than the rest of the time-series in (cf. the schematic illustration in Figure 1
). The algorithm tries to capture these mechanisms by modelling the probability densityof the data in the inner interval and the distribution in the outer interval Section 2.3.
Moreover, a measure for the degree of “deviation” of from has to be defined. Like some other works on collective anomaly detection [8, 10], we use—among others—the Kullback-Leiber (KL) divergence for this purpose. However, Section 2.5 will show that this is a sub-optimal choice when used without a slight modification and discuss alternative divergence measures.
Given these ingredients, the underlying optimization problem for finding the most anomalous interval can be described as
Various possible choices for the divergence measure will be discussed in Section 2.5.
In order to actually locate this “maximally divergent interval” , the MDI algorithm scans over all intervals , estimates the distributions and and computes the divergence between them, which becomes the anomaly score of the interval . The parameters and , which define the minimum and the maximum size of the intervals in question, have to be specified by the user in advance. This is not a severe restriction, since extreme values may be chosen for these parameters in exchange for increased computation time. But depending on the application and the focus of the analysis, there is often prior knowledge about reasonable limits for the size of possible intervals.
After the anomaly scores have been obtained for all intervals, they are sorted in descending order and non-maximum suppression is applied to obtain non-overlapping intervals only. For large time-series with more than 10k samples, we apply an approximative non-maximum suppression that avoids storing all interval scores by maintaining a fixed-size list of currently best-scoring non-overlapping intervals.
Finally, the algorithm returns a ranking of intervals, so that a user-specified number of top intervals can be selected as output.
2.3 Probability Density Estimation
The divergence measure used in (4) requires the notion of the distribution of the data in the intervals and . We will hence discuss in the following, which models we employ to estimate these distributions and how this can be done efficiently.
The choice of a specific model for the distributions and imposes some assumptions about the data which may not conform to reality. However, since the MDI algorithm estimates the parameters of those distributions for all possible intervals in the time-series, the use of models that can be updated efficiently is crucial. One such model is Kernel Density Estimation (KDE) with
using a Gaussian kernel
On the one hand, KDE is a very flexible model, but on the other hand, it does not scale well to long time-series and does not take correlations between attributes into account. The second proposed model does not expose these problems: It assumes that both the data in the anomalous interval and in the remaining time-series are distributed according to multivariate normal distributions (Gaussians) and , respectively.
2.3.2 Efficient Estimation with Cumulative Sums
Both distribution models described above involve a summation over all samples in the respective interval. Performing this summation for multiple intervals is redundant, because some of them overlap with each other. Such a naïve approach of finding the maximally divergent interval has a time complexity of with KDE and
with Gaussian distributions. This is due to the number ofintervals (with being the maximum volume of an interval), each of them requiring a summation over samples for the evaluation of one of the divergence measures described later in Section 2.5. For KDE,
distance computations are necessary for the evaluation of the probability density function for each sample, while for Gaussian distributions a summation over allsamples has to be performed for each interval to estimate the parameters of the distributions.
This would be clearly infeasible for large-scale data. However, these computations can be sped up significantly by using cumulative sums . For the sake of clarity, we first consider the special case of a non-spatial time-series . With regard to KDE, a matrix of cumulative sums of kernelized distances can be used:
This matrix has to be computed only once, which requires distance calculations, and can then be used to estimate the probability density functions of the data in the intervals and in constant time:
In analogy, a matrix of cumulative sums over the samples and a tensor of cumulative sums over the outer products of the samples can be used to speed up the estimation of the parameters of Gaussian distributions:
where and are the -th column of and the -th matrix of , respectively. Using these matrices, the mean vectors and covariance matrices can be estimated in constant time.
This technique can be generalized to the spatio-temporal scenario using higher order tensors for storing the cumulative sums. The reconstruction of a sum over a given range from such a cumulative tensor follows the Inclusion-Exclusion Principle and the number of summands involved in the computation grows, thus, exponentially with the order of the tensor, being 16 for a 4-order tensor, compared to only 2 summands in the non-spatial case. The exact equation describing the reconstruction in the general case of an -order tensor is given in Appendix A.
Thanks to the use of cumulative sums, the computational complexity of the MDI algorithm is reduced to for the case of KDE and to for Gaussian distributions.
2.4 Incorporation of Context
The models used for probability density estimation described in the previous section are based on the assumption of independent samples. However, this assumption is almost never true for real data, since the value at a specific point of time and spatial location is likely to be strongly correlated with the values at previous times and nearby locations. To mitigate this issue, we apply two kinds of embeddings that incorporate context into each sample as pre-processing step.
2.4.1 Time-Delay Embedding
Aiming to make combinations of observed values more representative of the hidden state of the system being observed, time-delay embedding  incorporates context from previous time-steps into each sample by transforming a given time-series , into another time-series , given by
where the embedding dimension specifies the number of samples to stack together and the time lag specifies the gap between two consecutive time-steps to be included as context. An illustrative example is given in Figure 2.
This method is often motivated by Takens’ theorem , which, roughly, states that for a certain embedding dimension the hidden state of the system can be reconstructed given the observations of the last time-steps.
2.4.2 Spatial-Neighbor Embedding
Correlations between nearby spatial locations are handled similarly: In addition to time-delay embedding, each sample of a spatio-temporal time-series can be augmented by the features of its spatial neighbors (cf. Figure 3) to enable the detection of spatial or spatio-temporal anomalies. This pre-processing step, which we refer to as spatial-neighbor embedding, is parametrized with 3 parameters for the embedding dimension along each spatial axis and 3 parameters for the lag along each axis.
Note that, in contrast to time-delay embedding, neighbors from both directions are aggregated, since spatial context is bilinear. For example, would mean to consider 4 neighbors along the -axis, 2 in each direction.
Spatial-neighbor embedding can either be applied before or after time-delay embedding. As opposed to many spatio-temporal anomaly detection approaches that perform temporal and spatial anomaly detection sequentially (e.g., [11, 16, 17]), the MDI algorithm in combination with the two embeddings allows for a joint optimization. However, it implies a much more drastic multiplication of the data size.
A suitable measure for the deviation of the distribution from is an essential part of the MDI algorithm. The following sub-sections introduce several divergence measures we have investigated and propose a modification to the well-known Kullback-Leibler (KL) divergence that is necessary for being able to compare divergences of distributions estimated from intervals of different size.
2.5.1 Cross Entropy
Numerous divergence measures, including those described in the following, have been derived from the domain of information theory. Being one of the most basic information theoretic concepts, the cross entropy between two distributions given by their probability density functions and may already be used as a divergence measure:
Cross entropy measures how surprising a sample drawn from is, assuming that it would have been drawn from , and is hence already eligible as a divergence measure, since the unexpectedness grows when and are very different.
Since the MDI algorithm assumes, that the data in the intervals and have been sampled from the distributions corresponding to and , respectively, the cross entropy of the two distributions can be approximated empirically from the data:
This approximation has the advantage of having to estimate only one probability density, , explicitly. This is particularly beneficial, since the possibly anomalous intervals often contain only few samples, so that an accurate estimation of the probability density is difficult.
2.5.2 Kullback-Leibler Divergence
The Kullback-Leibler (KL) divergence is a popular divergence measure that builds upon the fundamental concept of cross entropy. Given two distributions and , the KL divergence can be defined as follows:
As opposed to the pure cross entropy of and , the KL divergence does not only take into account how well is explained by , but also the intrinsic entropy of , so that an interval with a stable distribution would get a higher score than an oscillating one if they had the same cross entropy with the rest of the time-series.
Like cross entropy, the KL divergence can be approximated empirically from the data, but in contrast to cross entropy, this requires estimating the probability densities of both distributions, and :
When used in combination with the Gaussian distribution model, the KL divergence comes with an additional advantage from a computational point of view, since there is a known closed-form solution for the KL divergence of two Gaussians :
This allows evaluating the KL divergence in constant time for a given interval, which reduces the computational complexity of the MDI algorithm using the KL divergence in combination with Gaussian models to the number of possible intervals: .
Given this explicit solution for the KL divergence and the closed-form solution for the entropy of a Gaussian distribution  with mean vector and covariance matrix , which is given by
one can easily derive a closed-form solution for the cross entropy of those two distributions as well:
Compared with the KL divergence, this does not assign extremely high scores to small intervals
with a low variance, due to the subtraction of. This may be an explanation for the evaluation results in Section 3, where cross entropy in combination with Gaussian models is often superior to the KL divergence, although it does not account for intervals of varying entropy.
However, in contrast to the empirical approximation of cross entropy in (12), this requires the estimation of .
2.5.3 Polarity of the KL divergence and its effect on MDI
It is worth noting that the KL divergence is not a metric and, in particular, not symmetric: . Some authors use, thus, a symmetric variant :
This raises the question whether , , or the symmetric version should be used for the detection of anomalous intervals. Quantitative experiments with an early prototype of our method  have shown that neither nor provide good performance, as opposed to .
A visual inspection of the detections resulting from the use of with the assumption of Gaussian distributions shows that all the intervals with the highest anomaly scores have the minimum possible size specified by the user and a very low variance. An example is given in Figure 4. The scores of the top detections in that example are around 100 times higher than those yielded by .
This bias of towards small low-variance intervals can also be explained theoretically. For the sake of simplicity, consider the special case of a univariate time-series. In this case, the closed-form solution for assuming Gaussian distributions given in (15) reduces to
where , are the mean values and , are the variances of the distributions in the inner and in the outer interval, respectively. It can be seen from (19) that, due to the division by , the KL divergence will approach infinity when the variance in the inner interval converges towards . And since the algorithm has to estimate the variance empirically from the given data, it assigns high detection scores to intervals as small as possible, because smaller intervals have a higher chance of having a low empirical variance. The term cannot counterbalance this effect, though it is negative for , since its absolute value grows much more slowly than that of , as can be seen from the fact that , since .
In contrast, , where the roles of and are swapped, does not possess this deficiency, since is estimated from a much larger portion of data and, thus, is a more robust estimate.
The symmetric version is useless as well, since the scores obtained from will just be absorbed by the much higher scores of .
2.5.4 Statistical Analysis and Unbiased KL Divergence
Though does not overestimate the anomalousness of low-variance intervals as extremely as does, the following theoretical analysis will show that it is not unbiased either. In contrast to the previous section, this bias is not related to the data itself, but to the length of the intervals: smaller intervals systematically get higher scores than longer ones. This harms the quality of interval detections, because anomalies will be split up into multiple contiguous small detections (see Figure 4(a) for an example).
Recall that denotes the set of all intervals of length in a time-series with time-steps. Furthermore, let denote a -dimensional vector with all coefficients being 0 and
the identity matrix of dimensionality.
When applying the MDI algorithm to a time-series
, sampled independently and identically from plain white noise, an ideal divergence is supposed to yield constant average scores for all(for some user-defined limits ), i.e., scores independent from the length of the intervals.
For simplicity, we will first analyze the distribution of those scores using the MDI algorithm with Gaussian distributions with the simple, but for this data perfectly valid assumption of identity covariance matrices. In this case, the KL divergence of two Gaussian distributions with the mean vectors in some intervals for some arbitrary is given by . Moreover, since all samples in the time-series are normally distributed, so are their empirical means:
Thus, all dimensions of the mean vectors are independent and identically normally distributed variables. Their difference is, hence, normally distributed too:
is a vector of independent standard normal random variables and
is the sum of the squares of
independent normal variables and, hence, distributed according to the chi-squared distribution withdegrees of freedom, scaled by half the variance of the variables. The mean of a -distributed random variable is and the mean of the scores for all intervals in is, accordingly, , which is inversely proportional to the length of the interval . Thus, the KL divergence is systematically biased towards smaller intervals.
When the length of the time-series is very large, the asymptotic scale of the chi-squared distribution is and the estimated parameters of the outer distribution converge towards the parameters of the true distribution of the data. Thus, if the restriction of the Gaussian model to identity covariance matrices is weakened to a global, shared covariance matrix , the above findings also apply to the case of long time-series with correlated variables and, hence, also when time-delay embedding is applied. Because in this case, the KL divergence reduces to and the subtraction of the true mean followed by the multiplication with the inverse covariance matrix can be considered as a normalization of the time-series, transforming it to standard normal variables with uncorrelated dimensions.
For the general case of two unrestricted Gaussian distributions, the test statistic
has been shown to be asymptotically distributed according to a chi-squared distribution with degrees of freedom . This test statistic is often used for testing the hypothesis that a given set of samples has been drawn from a Gaussian distribution with known parameters . In the scenario of the MDI algorithm, the set of samples is the data in the inner interval and the parameters of the distribution to test that data against are those estimated from the data in the outer interval
. The null hypothesis of the test would be that the data inhas been sampled from the same distribution as the data in . The test statistic may then be used as a measure for how well the data in the interval fit the model established based on the data in the remainder of the time-series.
After some elementary reformulations, the relationship between this test statistic and the KL divergence becomes obvious: . This is exactly the normalization of the KL divergence by the scale factor identified in (20). Thus, we define an unbiased KL divergence as follows:
The distribution of this divergence applied to asymptotically long time-series depends only on the number of attributes and not on the length of the interval any more. However, this correction may also be useful for time-series of finite length. An example of actual detections resulting from the use of the unbiased KL divergence compared with the original one can be seen in Figure 5.
A further advantage of knowing the distribution of the scores is that this knowledge can also be used for normalizing the scores with respect to the number of attributes, in order to make them comparable across time-series of varying dimensionality. Moreover, it allows the selection of a threshold for distinguishing between anomalous and nominal intervals based on a chosen significance level. This may be preferred in some applications over searching for a fixed number of top detections.
Interestingly, Jiang et al.  have derived an equivalent unbiased KL divergence () from a different starting point based on the assumption of a Poisson distribution and the inverse log-likelihood of the interval as anomaly score.
scores and their gradients. The dashed black line indicates the mean of the scores and the dashed blue line marks a threshold that is 1.5 standard deviations above the mean. Time-delay embedding withwas applied before computing the scores.
2.5.5 Jensen-Shannon Divergence
A divergence measure that does not expose the problem of being asymmetric is the Jensen-Shannon (JS) divergence, which builds upon the KL divergence:
where and are probability density functions. is a mixture distribution, so that a sample is drawn either from or from
with equal probability (though a parametrized version of the JS divergence accounting for unequal prior probabilities exists as well, but will not be covered here).
The JS divergence possesses some desirable properties, which the KL divergence does not have: most notably, it is symmetric and bounded between and , so that anomaly scores cannot get infinitely high.
Like the KL divergence, the JS divergence can be approximated empirically from the data in the intervals and . However, there is no closed-form solution for the JS divergence under the assumption of a Gaussian distribution (as opposed to the KL divergence), since
would then be a Gaussian Mixture Model (GMM). Though several approximations of the KL divergence of GMMs have been proposed, they are either computationally expensive or abandon essential properties such as positivity. This lack of a closed-form solution is likely to be the reason why the JS divergence was clearly outperformed by the KL divergence in our quantitative experiments in Section 3 when the Gaussian model is used, despite its desirable theoretic properties.
2.6 Interval Proposals for Large-Scale Data
Exploiting cumulative sums and a closed-form solution for the KL divergence, the asymptotic time complexity of the MDI algorithm with a Gaussian distribution model could already be reduced to be linear in the number of intervals (see Section 2.3.2). If the maximum length of an anomalous interval is independent from the number of samples , the run-time is also linear in . However, due to high constant-time requirements for estimating probability densities and computing the divergence, the algorithm is still too slow for processing large-scale data sets with millions of samples.
Since anomalies are rare by definition, many of the intervals analyzed by a full scan will be uninteresting and irrelevant for the list of the top anomalies detected by the algorithm. In order to focus on the analysis of non-trivial intervals, we employ a simple proposal technique that selects interesting intervals based on point-wise anomaly scores.
Simply grouping contiguous detections of point-wise anomaly detection methods in order to retrieve anomalous intervals is insufficient, because it will most likely lead to split-up detections. However, it is not unreasonable to assume that many samples inside of an anomalous interval will also have a high point-wise score, especially after applying contextual embedding. Figure 6, for example, shows two exemplary time-series from the synthetic data set introduced in Section 3.1 along with the point-wise scores retrieved by applying the Hotelling’s method , after time-delay embedding has been applied to the time-series. Note that even in the case of the very subtle amplitude-change anomaly, the two highest Hotelling’s scores are at the beginning and the end of the anomaly. The idea is to apply a simple threshold operation on the point-wise scores to extract interesting points and then propose all those intervals for detailed scoring by a divergence measure whose first and last samples are among these points if the interval conforms to the size constraints.
This way, the probability density estimation and the computation of the divergence have to be performed for a comparatively small set of interesting intervals only and not for all possible intervals in the time-series. The interval proposal method is not required to have a low false-positive rate, though, because the divergence measure is responsible for the actual scoring. Instead, it has to act as a high-recall system so that truly anomalous intervals are not excluded from the actual analysis.
Since we are only interested in the beginning and end of the anomalies, the point-wise scores are not used directly, but the centralized gradient filter is applied to the scores for reducing them in areas of constant anomalousness and emphasizing changes of the anomaly scores.
The evaluation in Section 3.3 will show that the interval proposal technique can speed-up the MDI algorithm significantly without impairing its performance.
3 Experimental Evaluation
In this section, we evaluate our MDI algorithm on a quantitative basis using synthetic data and compare it with other approaches well-known in the field of anomaly detection.
3.1 Data Set
In contrast to many other established machine learning tasks, there is no widely used standard benchmark for the evaluation of anomaly detection algorithms; not for the detection of anomalous intervals and not even for the very common task of point-wise anomaly detection. This is mainly for the reason that the notion of an “anomaly” is not well defined and varies between different applications and even from analyst to analyst. Moreover, anomalies are, by definition, rare, which makes the collection of large-scale data sets difficult. However, even if a large amount of data were available, it would be nearly impossible to annotate it in an intersubjective way everyone would agree with. But accurate and complete ground-truth information is mandatory for a quantitative evaluation and comparison of machine learning techniques. Therefore, we use a synthetic data set for assessing the performance of different variants of the MDI algorithm.
All time-series in that data set have been sampled from a Gaussian process with a squared-exponential covariance function and zero mean function . The length scale of the GP has been set to and the noise parameter to . denotes Kronecker’s delta. Different types of anomalies have then been injected into these time-series, with a size varying between 5% and 20% of the length of the time-series:
meanshift: A random, but constant value is added to or subtracted from the anomalous samples.
meanshift_hard: A random, but constant value is added to or subtracted from the anomalous samples.
meanshift5: Five meanshift anomalies are inserted into the time-series.
meanshift5_hard: Five meanshift_hard anomalies inserted into the time-series.
amplitude_change: The time-series is multiplied with a Gaussian window with standard deviation whose mean is the centre of the anomalous interval. Here, is the length of the anomalous interval and the amplitude of the Gaussian window is clipped at . This modified time-series is added to the original one.
frequency_change: The time-series is sampled from a non-stationary GP, whose covariance function uses a reduced length scale during the anomalous interval , so that correlations between samples are reduced, which leads to more frequent oscillations .
The values in the anomalous interval are replaced with the values of another function sampled from the Gaussian process. 10 time-steps at the borders of the anomaly are interpolated between the two functions for a smooth transition. This rather difficult test case is supposed to reflect the concept of anomalies as being “generated by a different mechanism” (cf.Section 2.2).
The above test cases are all univariate, but there are as well similar multivariate scenarios meanshift_multvar, amplitude_change_multvar, frequency_change_multvar, and mixed_multvar with 5-dimensional time-series. Regarding the first three of these test cases, the corresponding anomaly is injected into one of the dimensions, while all attributes are replaced with those of the other time-series in the mixed_multvar scenario, which is also a property of many real time-series.
This results in a synthetic test data set with 11 test cases, a total of 1100 time-series and an overall number of 1900 anomalies. Examples for all test cases are shown in Figure 7.
3.2 Performance Comparison
Since the detection of anomalous regions in spatio-temporal data is rather a detection than a classification task, we do not use the Area under the ROC Curve (AUC) as performance criterion like many works on point-wise anomaly detection do, but quantify the performance in terms of Average Precision (AP) with an Intersection over Union (IoU) criterion that allows an overlap between 50% and 100%.
and the Hampel loss function are used. We obtain interval detections from those point-wise baselines by grouping contiguous detections based on multiple thresholds and applying non-maximum suppression afterwards. The overlap threshold for non-maximum suppression is set to 0 in all experiments to obtain non-overlapping intervals only. To be fair, MDI also has to compete with the baselines on the task they have been designed for, i.e., point-wise anomaly detection, by means of AUC. The interval detections can be converted to point-wise detections easily by taking the score of the interval a sample belongs to as score for that sample.
Figure 8 shows that the performance of the MDI algorithm using the Gaussian model is clearly superior on the entire synthetic data set compared to the baselines by means of Mean AP and even on the task of point-wise anomaly detection measured by AUC. The polarity of the KL divergence has been used in all experiments following the argumentation in Section 2.5.3. In addition, the performance of the unbiased variant is reported for the Gaussian model. The parameters of time-delay embedding have been fixed to which we have empirically found to be suitable for this data set. For KDE, we used a Gaussian kernel with bandwidth .
While MDI KDE is already superior to the baselines, it is significantly outperformed by MDI Gaussian, which improves on the best baseline by 286%. This discrepancy between the MDI algorithm using KDE and using Gaussian models is mainly due to time-delay embedding, which is particularly useful for the Gaussian model, because it takes correlations of the variables into account, as opposed to KDE. As can be seen in Figure 9, the Gaussian model would be worse than KDE and on par with the baselines without time-delay embedding.
Considering the Mean AP on this synthetic data set, the unbiased KL divergence did not perform better than the original KL divergence. However, on the test cases meanshift5, meanshift5_hard, and meanshift_hard it achieved an AP twice as high as that of , which was poor on those data sets (see Figure 10). Since real data sets are also likely to contain multiple anomalies, we expect to be a more reliable divergence measure in practice.
Another interesting result is that cross entropy was the best performing divergence measure. This shows the advantage of reducing the impact of the inner distribution , which is estimated from very few samples. However, it may perform less reliably on real data whose entropy varies more widely over time than in this synthetic benchmark.
The Jensen-Shannon divergence performed best for the KDE method, but worst for the Gaussian model. This can be explained by the lack of a closed-form solution for the JS divergence, so that it has to be approximated from the data, while the KL divergence of two Gaussians can be computed exactly. This advantage of the combination of the KL divergence with Gaussians models is, thus, not only beneficial with respect to the run-time of the algorithm, but also with respect to its detection performance.
The differences between the results in Figure 8 are significant on a level of 5% according to the permutation test.
3.3 Interval Proposals
In order not to sacrifice detection performance for the sake of speed, the interval proposal method described in Section 2.6 has to act as a high-recall system proposing the majority of anomalous intervals. This can be controlled to some degree by adjusting the threshold applied to the point-wise scores, where and are the empirical mean and standard deviation of the point-wise scores, respectively. To find a suitable value for the hyper-parameter , we have evaluated the recall of the proposed intervals for different values of using the usual IoU measure for distinguishing between true and false positive detections. The results in Figure 10(a) show that time-delay embedding is of a great benefit in this scenario too. Based on these results, we selected for subsequent experiments, which still provides a recall of 97% and is already able to reduce the number of intervals to be analyzed in detail significantly.
The processing of all the 1100 time-series from the synthetic data set, which took 216 seconds on an Intel Core™ i7-3930K with 3.20GHz and eight virtual cores using the Gaussian model and the unbiased KL divergence after the usual time-delay embedding with , could be reduced to 5.2 seconds using interval proposals. This corresponds to a speed-up by more than 40 times.
Though impressive, the speed-up was expected. What was not expected, however, is that the use of interval proposals also increased the detection performance of the entire algorithm by up to 125%, depending on the divergence. The exact average precision achieved by the algorithm on the synthetic data set with a full scan over all intervals and with interval proposals is shown in Figure 10(b). This improvement is also reflected by the AUC scores not reported here and is, hence, not specific to the evaluation criterion. A possible explanation for this phenomenon is that some intervals that are uninteresting but distracting for the MDI algorithm are not even proposed for detailed analysis.
4 Application Examples on Real Data
The following application examples on real data from various different domains are intended to complement the quantitative results presented above with a demonstration of the feasibility of our approach for real problems.
4.1 Detection of North Sea Storms
To demonstrate the efficiency of the MDI algorithm on long time-series, we apply it to storm detection in climate data: The coastDat-1 hindcast  is a reconstruction of various marine climate variables measured at several locations over the southern North Sea between 51° N, 3° W and 56° N, 10.5° E with an hourly resolution over the 50 years from 1958 to 2007, i.e., approximately 450,000 time steps. Since measurements are not available at locations over land, we select the subset of the data between 53.9° N, 0° E and 56° N, 7.7° E, which results in a regular spatial grid of size located entirely over the sea (cf. Figure 12). Because cyclones and other storms usually have a large spatial extent and move over the region covered by the measurements, we reduce the spatio-temporal data to purely temporal data in this experiment by averaging over all spatial locations. The variables used for this experiment are significant wave height, mean wave period and wind speed.
We apply the MDI algorithm to that data set using the Gaussian model and the unbiased KL divergence. Since North Sea storms lasting longer than 3 days are usually considered two independent storms, the maximum length of the possible intervals is set to 72 hours, while the minimum length is set to 12 hours. The parameters of time-delay embedding are fixed to .
28 out of the top 50 and 7 out of the top 10 detections returned by the algorithm can be associated with well-known historic storms. The highest scoring detection is the so-called “Hamburg-Flut” which flooded one fifth of Hamburg in February 1962 and caused 340 deaths. Also among the top 5 is the “North Frisian Flood”, which was a severe surge in November 1981 and lead to several dike breaches in Denmark.
A visual inspection of the remaining 22 detections revealed, that almost all of them are North Sea storms as well. Only 4 of them are not storms, but the opposite: they span times of extremely calm sea conditions with nearly no wind and very low waves, which is some kind of anomaly as well.
A list of the top 50 detections can be found in Appendix B and animated heatmaps of the three variables during the detected time-frames are shown on our web page: http://www.inf-cv.uni-jena.de/libmaxdiv_applications.html.
Processing this comparatively long time-series using 8 parallel threads took 27 seconds. This time can be reduced further to half a second by using interval proposals without changing the top 10 detections significantly. This supports the assumption, that the novel proposal method does not only perform well on synthetic, but also on real data.
4.2 Spatio-Temporal Detection of Low Pressure Areas
As a genuine spatio-temporal use-case, we have also applied the MDI algorithm to a time-series with daily sea-level pressure (SLP) measurements over the North Atlantic Sea with a much wider spatial coverage than in the previous experiment. For this purpose, we selected a subset of the NCEP/NCAR reanalysis  covering the years from 1957 to 2011. This results in a time-series of about 20,000 days. The spatial resolution of 2.5° degrees is rather coarse and the locations are organized in a regular grid of size covering the area between 25° N, 52.5° W and 65° N, 15° E.
Again, the MDI algorithm with the Gaussian model and the unbiased KL divergence is applied to this time-series to detect low-pressure fields, which are related to storms. Regarding the time dimension, we apply time-delay embedding with and search for intervals of size between 3 and 10 days. Concerning space, we do not apply any embedding for now and set a minimum size of , but no maximum. 7 out of the top 20 detections could be associated with known historic storms.
A visual inspection of the results shows that the MDI algorithm is not only capable of detecting occurrences of anomalous low-pressure fields over time, but also their spatial location. This can be seen in the animations on our web page: http://www.inf-cv.uni-jena.de/libmaxdiv_applications.html. A few snapshots and a list of detections are also shown in Appendix C.
It is not necessary to apply spatial-neighbor embedding in this scenario, since we are not interested in spatial outliers, but only in the location of temporal outliers. We have also experimented with applying spatial-neighbor embedding and it led to the detection of some high-pressure fields surrounded by low-pressure fields. Since high-pressure fields are both larger and more common in this time-series, they are not detected as temporal anomalies.
Since we did not set a maximum spatial extent of anomalous regions, the algorithm took 4 hours to process this spatio-temporal time-series. This could, however, be reduced to 22 seconds using our interval proposal technique, with only a minor loss of localization accuracy.
4.3 Stylistic Anomalies in Texts of Natural Language
By employing a transformation from the domain of natural language to real-valued features, the MDI algorithm can also be applied to written texts. One important task in Natural Language Processing (NLP) is, for example, the identification of paragraphs written in a different language than the remainder of the document. Such a segmentation can be used as a pre-processing step for the actual, language-specific processing.
In order to simulate such a scenario, we use a subset of the Europarl corpus , which is a sentence-aligned parallel corpus extracted from the proceedings of the European Parliament in 21 different languages. The 33,334 English sentences from the COMTRANS subset of Europarl, which is bundled with the Natural Language Toolkit (NLTK) for Python, serve as a basis and 5 random sequences of between 10 and 50 sentences are replaced by their German counterparts to create a semantically coherent mixed-language text.
We employ a simple transformation of sentences to feature vectors: Since the distribution of letter frequencies varies across languages, each sentence is represented by a 27-dimensional vector whose first element is the average word length in the sentence and the remaining 26 components are the absolute frequencies of the letters “a” to “z” (case-insensitive). German umlauts are ignored since they would make the identification of German sentences too easy.
The MDI algorithm using the unbiased KL divergence is then applied in order to search for anomalous sequences of between 10 and 50 sentences in the mixed-language text after sentence-wise transformation to the feature space. Because the number of features is quite high in relation to the number of samples in an interval, we use a global covariance matrix shared among the Gaussian models and do not apply time-delay embedding.
The top 5 detections returned by the algorithm correspond to the 5 German paragraphs that have been injected into the English text. The localization is quite accurate, though not perfect: on average, the boundaries of the detected paragraphs are off by 1.4 sentences from the ground-truth. The next 5 detections are mainly tables and enumerations, which are also an anomaly compared with the usual dialog style of the parliament proceedings.
For this scenario, we had designed the features specifically for the task of language identification. To see what else would be possible with a smaller bias towards a specific application, we have also applied the algorithm to the 1 Book of Moses (Genesis) in the King James Version of the bible, where we use word2vec  for word-wise feature embeddings. word2vec learns real-valued vector representations of words in a way, so that the representations of words that occur more often in similar contexts have a smaller Euclidean distance. The embeddings used for this experiment have been learned from the Brown corpus using the continuous skip-gram model and we have chosen a dimensionality of 50 for the vector space, which is rather low for word2vec models, but still tractable for the Gaussian probability density model. Words which have not been seen by the model during training are treated as missing values.
The top 10 detections of sequences of between 50 and 500 words according to the unbiased KL divergence are provided in Appendix D. The first five of those are, without exception, genealogies, which can indeed be considered as anomalies, because they are long lists of names of fathers, sons and wives, connected by repeating phrases. The 6 detection is a dialog between God and Abraham, where Abraham bargains with God and tries to convince him not to destroy the town Sodom. This episode is another example for stylistic anomalies, since the dialog is a concatenation of very similar question-answer pairs with only slight modifications.
Due to the rather wide limits on the possible size of anomalous intervals, the analysis of the entire book Genesis, a sequence of 44,764 words, took a total of 9 minutes, where we have not yet used interval proposals.
4.4 Anomalies in Videos
The detection of unusual events in videos is another important task, e.g., in the domain of video surveillance or industrial control systems. Though videos are already represented as multivariate spatio-temporal time-series with usually 3 variables (RGB channels), a semantically more meaningful representation can be obtained by extracting features from a Convolutional Neural Network (CNN).
In this experiment, we use a video of a traffic scene from the ViSOR repository . It has a length of 60 seconds (1495 frames) and a rather low resolution of pixels. The video shows a street and a side-walk with a varying frequency of cars crossing the captured area horizontally in both directions. At one point, a group of two pedestrians and one cyclist appears on the side-walk and crosses the area from right to left at a low speed. Another sequence at the end of the video shows a single cyclist riding along the side-walk in the opposite direction at a higher speed. Altogether, 26 seconds of the video contain moving objects and 34 seconds just show an empty street. The nominal state of the scene hence is not unambiguous.
We extract features for each frame of the video from the conv5 layer of CaffeNet , which reduces the spatial resolution to , but increases the number of feature dimensions to 256. This rather large feature space is then reduced to 16 dimensions using PCA and the MDI algorithm is applied to search for anomalous sub-blocks with a minimum spatial extent of cells and a length between 3 and 12 seconds. The time-delay embedding parameters are fixed to for capturing half a second as context without increasing the number of dimensions too much. We apply the MDI algorithm with both the unbiased KL divergence and cross entropy as divergence measures. The Gaussian distribution model is employed in both cases.
The results (some snapshots are shown in Figure 13) exhibit an interesting difference between the two divergence measures: The KL divergence detects a sub-sequence of approximately 10 seconds where absolutely no objects cross the captured area. Thus, car traffic is identified as normal behavior and long spans of time without any traffic are considered as anomalous, because they have a very low entropy and the KL divergence penalizes the entropy of all other intervals, as opposed to cross entropy which does not take the entropy of the detected interval into account. Another detection occurs when the group of pedestrians enters the area. The localization, however, is rather fuzzy and spans nearly the entire frame. Cross entropy, on the other hand, seems to identify the state of low or no traffic as normal behavior and yields two detections at the beginning and the end of the video where the frequency of cars is higher than in the rest of the video. It detects the pedestrians too, but with a better localization accuracy. This detection, however, does not cover the entire side-walk, since the pedestrians are moving from right to left and the algorithm is not designed for tracking moving anomalies.
Without using interval proposals, the comparatively high number of features combined with the large spatial search space would result in a processing time of 13 hours for this video. This can be reduced to 5 minutes using our novel interval proposal technique.
5 Summary and Conclusions
We have introduced a novel unsupervised algorithm for anomaly detection that is suitable for analyzing large multivariate time-series and can detect anomalous regions not only in temporal but also in spatio-temporal data from various domains. The proposed MDI algorithm outperforms existing anomaly detection techniques, while being comparatively time efficient, thanks to an efficient implementation and a novel interval proposal technique that excludes uninteresting parts of the data from in-depth analysis. Moreover, we have exposed a bias of the Kullback-Leibler (KL) divergence towards smaller intervals and proposed an unbiased KL divergence that is superior when applied to real data. We have also investigated other divergence measures and found that the use of cross entropy can result in improved performance for data with a low variability of entropy.
Various experiments on data from different domains, including climate analysis, natural language processing and video surveillance, have shown that the algorithm proposed in this work can serve as a generic, unsupervised anomaly detection technique that can facilitate tasks such as process control, data analysis and knowledge discovery. These application examples emphasize the importance of interval-based anomaly detection techniques, and we hope that our work is able to motivate further research in this area.
For processing data with a large spatial extent or a high number of dimensions, a full scan over all possible sub-blocks of the data would be prohibitively time-consuming. To this end, we have introduced a novel interval proposal technique that can reduce computation time significantly. However, interval proposals usually lead to less accurate detections, which is particularly noticeable with regard to spatial dimensions. Future work might hence investigate applying in-depth analysis not only to the proposed intervals themselves, but also to their neighborhood. An alternative might be a hierarchical approach of successive refinement.
Other open problems to be addressed in the future include efficient probability density estimation in the face of high-dimensional data, the automatic determination of suitable parameters for time-delay embedding, and tracking anomalies moving in space over time. Furthermore, it is often necessary to convince the expert analyst that a detected anomaly really is an anomaly. Thus, future work will include the development of an attribution scheme that can explain which variables or combinations of variables caused a detection and why.
The support of the project EU H2020-EO-2014 project BACI “Detecting changes in essential ecosystem and biodiversity properties-towards a Biosphere Atmosphere Change Index”, contract 640176, is gratefully acknowledged.
-  G. Walker, “World weather,” Quarterly Journal of the Royal Meteorological Society, vol. 54, no. 226, pp. 79–87, 1928.
-  M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “Lof: identifying density-based local outliers,” in ACM sigmod record, vol. 29, no. 2. ACM, 2000, pp. 93–104.
-  J. Kim and C. D. Scott, “Robust kernel density estimation,” Journal of Machine Learning Research, vol. 13, no. Sep, pp. 2529–2565, 2012.
-  J. F. MacGregor and T. Kourti, “Statistical process control of multivariate processes,” Control Engineering Practice, vol. 3, no. 3, pp. 403–414, 1995.
-  B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural computation, vol. 13, no. 7, pp. 1443–1471, 2001.
-  E. Keogh, J. Lin, and A. Fu, “Hot sax: Efficiently finding the most unusual time series subsequence,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2005, pp. 8–pp.
-  H. Ren, M. Liu, X. Liao, L. Liang, Z. Ye, and Z. Li, “Anomaly detection in time series based on interval sets,” IEEJ Transactions on Electrical and Electronic Engineering, 2018.
-  S. Liu, M. Yamada, N. Collier, and M. Sugiyama, “Change-point detection in time-series data by relative density-ratio estimation,” Neural Networks, vol. 43, pp. 72–83, 2013.
-  P. Senin, J. Lin, X. Wang, T. Oates, S. Gandhi, A. P. Boedihardjo, C. Chen, and S. Frankenstein, “Grammarviz 3.0: Interactive discovery of variable-length time series patterns,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 12, no. 1, p. 10, 2018.
-  M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos, “A general suspiciousness metric for dense blocks in multimodal data,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2015, pp. 781–786.
E. Wu, W. Liu, and S. Chawla, “Spatio-temporal outlier detection in precipitation data,” inKnowledge discovery from sensor data. Springer, 2010, pp. 115–133.
-  D. M. Hawkins, Identification of outliers. Springer, 1980, vol. 11.
P. Viola and M. J. Jones, “Robust real-time face detection,”International journal of computer vision, vol. 57, no. 2, pp. 137–154, 2004.
-  N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, “Geometry from a time series,” Physical review letters, vol. 45, no. 9, p. 712, 1980.
-  F. Takens, “Detecting strange attractors in turbulence,” in Dynamical systems and turbulence. Springer, 1981, pp. 366–381.
-  A. Kut and D. Birant, “Spatio-temporal outlier detection in large databases,” CIT. Journal of computing and information technology, vol. 14, no. 4, pp. 291–297, 2006.
-  T. Cheng and Z. Li, “A multiscale approach for spatio-temporal outlier detection,” T. GIS, vol. 10, no. 2, pp. 253–263, 2006.
-  J. Duchi, “Derivations for linear algebra and optimization,” Berkeley, California, 2007.
-  N. A. Ahmed and D. Gokhale, “Entropy expressions and their estimators for multivariate distributions,” IEEE Transactions on Information Theory, vol. 35, no. 3, pp. 688–692, 1989.
-  E. Rodner, B. Barz, Y. Guanche, M. Flach, M. Mahecha, P. Bodesheim, M. Reichstein, and J. Denzler, “Maximally divergent intervals for anomaly detection,” in ICML Workshop on Anomaly Detection (ICML-WS), 2016.
-  T. W. Anderson, “An introduction to multivariate statistical analysis,” Wiley New York, Tech. Rep., 1962.
-  T. Kanungo and R. M. Haralick, “Multivariate hypothesis testing for gaussian data: Theory and software,” Citeseer, Tech. Rep., 1995.
-  J. Lin, “Divergence measures based on the shannon entropy,” IEEE Transactions on Information theory, vol. 37, no. 1, pp. 145–151, 1991.
-  J. R. Hershey and P. A. Olsen, “Approximating the kullback leibler divergence between gaussian mixture models,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4. IEEE, 2007, pp. IV–317.
-  C. Paciorek and M. Schervish, “Nonstationary covariance functions for gaussian process regression,” Advances in neural information processing systems (NIPS), vol. 16, pp. 273–280, 2004.
-  Helmholtz-Zentrum Geesthacht, Zentrum für Material- und Küstenforschung GmbH. (2012) coastdat-1 waves north sea wave spectra hindcast (1948-2007). World Data Center for Climate (WDCC). [Online]. Available: https://doi.org/10.1594/WDCC/coastDat-1_Waves
-  E. Kalnay, M. Kanamitsu, R. Kistler, W. Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen et al., “The ncep/ncar 40-year reanalysis project,” Bulletin of the American meteorological Society, vol. 77, no. 3, pp. 437–471, 1996.
-  P. Koehn, “Europarl: A parallel corpus for statistical machine translation,” in MT summit, vol. 5, 2005, pp. 79–86.
-  T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” Proceedings of Workshop at ICLR, 2013.
-  R. Vezzani and R. Cucchiara, “Video surveillance online repository (visor): an integrated framework,” Multimedia Tools and Applications, vol. 50, no. 2, pp. 359–380, 2010, video used in experiments: http://www.openvisor.org/video_details.asp?idvideo=339.
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,”arXiv preprint arXiv:1408.5093, 2014, network architecture and parameters: https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet.
Appendix A Generalized extraction of the sum over certain range from a higher-order tensor of cumulative sums
Let with being an -order data tensor and being a tensor of cumulative sums given by
The sum over all elements of in the range can then be reconstructed from with additions/subtractions according to
For the basic case of it can easily be seen that
Now assume that theorem 1 holds for . Applying it for gives
Since the first indices of are fixed in the scope of the inner sum and only the last index varies, the basic case for can be applied to that inner sum expression, transforming the right-hand side of the equation to
Appendix B North Sea Storm Detections
Each heatmap shows the state of the three variables at the middle of the top 5 detected time frames. The static red box marks the spatial subset of the data that has been used for this experiment described in Section 4.1. Heatmaps are best viewed in color.
Animated heatmaps for more detections can be found on our web page: http://www.inf-cv.uni-jena.de/libmaxdiv_applications.html.
|1||1962-02-16 05:00 – 1962-02-18 06:00||831.635||Hamburg-Flut (Feb 16-17)|
|2||1990-12-11 22:00 – 1990-12-13 10:00||824.840||Storm 1990/Dec (Dec 12)|
|3||1965-02-13 01:00 – 1965-02-15 23:00||797.781|
|4||2000-01-29 12:00 – 2000-01-31 02:00||796.528||Cyclone Kerstin (Jan 29-31)|
|5||1981-11-23 19:00 – 1981-11-25 22:00||745.951||North Frisian Flood (Nov 24)|
|6||1989-02-13 11:00 – 1989-02-16 10:00||714.684||Storm 1989/Feb (Feb 14)|
|7||1988-02-28 04:00 – 1988-03-02 03:00||710.495|
|8||1973-12-12 16:00 – 1973-12-15 15:00||673.886||Storm 1973/Dec (2) (Dec 13-15)|
|9||1998-12-26 00:00 – 1998-12-28 05:00||658.592||Cyclone Stephen (Dec 26-27)|
|10||1984-01-02 15:00 – 1984-01-05 14:00||658.306|
|11||1977-11-13 00:00 – 1977-11-15 23:00||592.562|
|12||1980-02-26 00:00 – 1980-02-28 23:00||573.913|
|13||1999-02-04 15:00 – 1999-02-07 14:00||572.603||Storm 1999/Feb (Feb 05)|
|14||2006-10-31 07:00 – 2006-11-01 21:00||560.806||Cyclone Britta (Oct 31 - Nov 01)|
|15||1995-01-09 21:00 – 1995-01-12 20:00||554.777|
|16||1983-01-17 20:00 – 1983-01-20 14:00||545.856||Storm 1983/Jan (Jan 17-20)|
|17||1991-10-17 00:00 – 1991-10-19 23:00||537.879|
|18||1996-11-05 07:00 – 1996-11-07 05:00||519.837||Storm 1996/Nov (Nov 05-07)|
|19||1976-01-20 10:00 – 1976-01-23 02:00||508.532||Storm 1976/Jan (2) (Jan 21)|
|20||1993-01-24 02:00 – 1993-01-27 01:00||506.250||Storm 1993/Jan (2) (Jan 22-25)|
|21||1973-11-19 05:00 – 1973-11-20 16:00||494.595||Storm 1973/Nov (3) (Nov 19-20)|
|22||1992-12-23 15:00 – 1992-12-26 14:00||491.438|
|23||1977-12-29 22:00 – 1977-12-31 17:00||489.287|
|24||2004-02-07 07:00 – 2004-02-09 23:00||485.346||Cyclone Ursula (Feb 07-08)|
|25||1984-01-12 01:00 – 1984-01-15 00:00||485.231||Storm 1984/Jan (Jan 14)|
|26||1991-01-04 19:00 – 1991-01-07 18:00||471.642||Storm Undine (Jan 02-09)|
|27||1973-11-11 03:00 – 1973-11-14 02:00||471.398||Storm 1973/Nov (1) (Nov 13-14)|
|28||2004-11-17 09:00 – 2004-11-20 08:00||460.155|
|29||1973-12-04 09:00 – 1973-12-07 08:00||445.639||Storm 1973/Dec (1) (Dec 06-07)|
|30||1961-03-26 13:00 – 1961-03-29 01:00||423.340|
|31||1994-01-28 06:00 – 1994-01-31 05:00||420.520||Cyclone Lore (Jan 27-28)|
|32||1980-04-19 03:00 – 1980-04-21 18:00||420.186|
|33||1999-12-23 18:00 – 1999-12-26 05:00||418.482||Cyclone Lothar (Dec 25-26)|
|34||1988-10-02 07:00 – 1988-10-05 02:00||412.905|
|35||1970-10-19 02:00 – 1970-10-22 01:00||411.187|
|36||2007-01-10 20:00 – 2007-01-13 18:00||407.993||Cyclone Franz (Jan 11)|
|37||2007-11-07 11:00 – 2007-11-10 10:00||402.426||Cyclone Tilo (Nov 06-11)|
|38||1990-09-19 07:00 – 1990-09-22 06:00||397.632|
|39||1993-02-19 00:00 – 1993-02-21 23:00||387.598||Storm 1993/Feb (Feb 20-21)|
|40||1998-10-24 11:00 – 1998-10-27 08:00||382.914||Cyclone Xylia (Oct 27-28)|
|41||2003-12-12 15:00 – 2003-12-15 14:00||377.435||Cyclone Fritz (Dec 13-15)|
|42||1991-12-23 06:00 – 1991-12-26 03:00||374.911|
|43||2002-02-19 17:00 – 2002-02-22 16:00||374.026||Storm 2002/Feb (1) (Feb 21-23)|
|44||1997-03-08 12:00 – 1997-03-11 11:00||371.758|
|45||1959-02-17 02:00 – 1959-02-20 01:00||369.272|
|46||1974-12-11 06:00 – 1974-12-14 05:00||365.459|
|47||1994-12-06 02:00 – 1994-12-09 01:00||358.443|
|48||2001-12-20 08:00 – 2001-12-23 07:00||357.280|
|49||1992-11-18 14:00 – 1992-11-21 02:00||343.076|
|50||2006-12-29 22:00 – 2007-01-01 21:00||340.920||Cyclone Karla (Dec 30-31)|
Appendix C Low Pressure Area Detections
Each row shows the heatmap of Sea Level Pressure at the beginning, the middle, and the end of the detection. The red box marks the detected area. Details on this experiment can be found in Section 4.2.
Heatmaps are best viewed in color. An animated video showing all detections can be found on our web-page: http://www.inf-cv.uni-jena.de/libmaxdiv_applications.html.
1996-01-06 – 1996-01-15
1990-01-28 – 1990-02-06
1989-12-22 – 1989-12-31
2009-01-18 – 2009-01-27
|1||1996-01-06 – 1996-01-15||40.0° N, -52.5° E – 65.0° N, -2.5° E||5940.691|
|2||1990-01-28 – 1990-02-06||47.5° N, -52.5° E – 65.0° N, 7.5° E||5551.022||Storm Herta (Feb 01-05)|
|3||1989-12-22 – 1989-12-31||45.0° N, -52.5° E – 65.0° N, -2.5° E||5198.513|
|4||2009-01-18 – 2009-01-27||47.5° N, -52.5° E – 65.0° N, 15.0° E||4959.829||Cyclone Joris (Jan 23)|
|5||1982-12-14 – 1982-12-23||50.0° N, -52.5° E – 65.0° N, 15.0° E||4811.575|
|6||1990-12-25 – 1991-01-03||52.5° N, -52.5° E – 65.0° N, 15.0° E||4703.993||Storm Undine (Jan 02-09)|
|7||1974-01-03 – 1974-01-12||47.5° N, -52.5° E – 65.0° N, -5.0° E||4594.737|
|8||1986-12-08 – 1986-12-17||47.5° N, -52.5° E – 65.0° N, -2.5° E||4417.568||Storm 1986/Dec (Dec 14-15)|
|9||1997-12-30 – 1998-01-08||50.0° N, -52.5° E – 65.0° N, 10.0° E||4377.532||Cyclone Fanny (Jan 03-05)|
|10||1995-01-26 – 1995-02-04||47.5° N, -52.5° E – 65.0° N, 15.0° E||4376.735|
|11||2006-12-03 – 2006-12-12||47.5° N, -52.5° E – 65.0° N, 15.0° E||4306.923|
|12||1997-02-18 – 1997-02-27||52.5° N, -52.5° E – 65.0° N, 15.0° E||4249.087|
|13||1958-01-04 – 1958-01-13||50.0° N, -52.5° E – 65.0° N, 15.0° E||4206.594|
|14||1978-12-06 – 1978-12-15||45.0° N, -52.5° E – 65.0° N, 2.5° E||4151.843|
|15||1976-12-01 – 1976-12-10||47.5° N, -52.5° E – 65.0° N, 15.0° E||4139.642|
|16||1971-01-18 – 1971-01-27||45.0° N, -52.5° E – 65.0° N, 15.0° E||4030.477|
|17||1992-11-29 – 1992-12-08||47.5° N, -52.5° E – 65.0° N, 15.0° E||3962.119|
|18||1994-01-27 – 1994-02-05||50.0° N, -52.5° E – 65.0° N, 15.0° E||3933.832||Cyclone Lore (Jan 27-28)|
|19||2007-12-02 – 2007-12-11||47.5° N, -52.5° E – 65.0° N, 15.0° E||3931.694||Cyclone Fridtjof (Dec 02-03)|
|20||1959-12-18 – 1959-12-27||47.5° N, -52.5° E – 65.0° N, 15.0° E||3910.999|
Appendix D Top 10 Anomalous Paragraphs in the Book Genesis
Each detected word sequence is shown with some context colored in grey before and after the detection. See Section 4.3 for details on this experiment.
The text has been taken from the “genesis” corpus included in the Natural Language Toolkit (NLTK) for Python and is not free of noise.
Detection #1: words 3218 – 3613 (Score: 56462.266)
and called their name Adam , in the day when they were created . And Adam lived an hundred and thirty years , and begat a son in his own likeness , and after his image ; and called his name Se And the days of Adam after he had begotten Seth were eight hundred yea and he begat sons and daughters : And all the days that Adam lived were nine hundred and thirty yea and he died . And Seth lived an hundred and five years , and begat Enos : And Seth lived after he begat Enos eight hundred and seven years , and begat sons and daughte And all the days of Seth were nine hundred and twelve years : and he died . And Enos lived ninety years , and begat Cainan : And Enos lived after he begat Cainan eight hundred and fifteen years , and begat sons and daughte And all the days of Enos were nine hundred and five years : and he died . And Cainan lived seventy years and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hundred and forty years , and begat sons and daughte And all the days of Cainan were nine hundred and ten years : and he died . And Mahalaleel lived sixty and five years , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred and thirty years , and begat sons and daughte And all the days of Mahalaleel were eight hundred ninety and five yea and he died . And Jared lived an hundred sixty and two years , and he begat Eno And Jared lived after he begat Enoch eight hundred years , and begat sons and daughte And all the days of Jared were nine hundred sixty and two yea and he died . And Enoch lived sixty and five years , and begat Methuselah : And Enoch walked with God after he begat Methuselah three hundred years , and begat sons and daughte And all the days of Enoch were three hundred sixty and five yea And Enoch walked with God : and he was not ; for God took him . And Methuselah lived an hundred eighty and seven years , and begat Lamech . And Methuselah lived after he begat Lamech seven hundred eighty and two years , and begat sons and daughte And all the days of Methuselah were nine hundred sixty and nine yea and he died . And Lamech lived an hundred eighty and two years , and begat a s And he called his name Noah , saying , This same shall comfort us concerning our work and toil of our hands , because of the ground which the LORD hath cursed .
Detection #2: words 30098 – 30568 (Score: 41058.093)
his house , and his cattle , and all his beasts , and all his substance , which he had got in the land of Canaan ; and went into the country from the face of his brother Jacob . For their riches were more than that they might dwell together ; and the land wherein they were strangers could not bear them because of their cattle . Thus dwelt Esau in mount Seir : Esau is Edom . And these are the generations of Esau the father of the Edomites in mount Se These are the names of Esau ’ s sons ; Eliphaz the son of Adah the wife of Esau , Reuel the son of Bashemath the wife of Esau . And the sons of Eliphaz were Teman , Omar , Zepho , and Gatam , and Kenaz . And Timna was concubine to Eliphaz Esau ’ s son ; and she bare to Eliphaz Amal these were the sons of Adah Esau ’ s wife . And these are the sons of Reuel ; Nahath , and Zerah , Shammah , and Mizz these were the sons of Bashemath Esau ’ s wife . And these were the sons of Aholibamah , the daughter of Anah the daughter of Zibeon , Esau ’ s wife and she bare to Esau Jeush , and Jaalam , and Korah . These were dukes of the sons of Esau : the sons of Eliphaz the firstborn son of Esau ; duke Teman , duke Omar , duke Zepho , duke Kenaz , Duke Korah , duke Gatam , and duke Amalek : these are the dukes that came of Eliphaz in the land of Edom ; these were the sons of Adah . And these are the sons of Reuel Esau ’ s son ; duke Nahath , duke Zerah , duke Shammah , duke Mizz these are the dukes that came of Reuel in the land of Edom ; these are the sons of Bashemath Esau ’ s wife . And these are the sons of Aholibamah Esau ’ s wife ; duke Jeush , duke Jaalam , duke Kor these were the dukes that came of Aholibamah the daughter of Anah , Esau ’ s wife . These are the sons of Esau , who is Edom , and these are their dukes . These are the sons of Seir the Horite , who inhabited the land ; Lotan , and Shobal , and Zibeon , and Anah , And Dishon , and Ezer , and Dishan : these are the dukes of the Horites , the children of Seir in the land of Edom . And the children of Lotan were Hori and Hemam ; and Lotan ’ s sister was Timna . And the children of Shobal were these ; Alvan , and Manahath , and Ebal , Shepho , and Onam . And these are the children of Zibeon ; both Ajah , and Anah : this was that Anah that found the mules in the wilderness , as he fed the asses of Zibeon his father . And the children of Anah were these ; Dishon , and Aholibamah the daughter of Anah . And these are the children of Dishon ; Hemdan , and Eshban , and Ithran , and Cheran .
Detection #3: words 7347 – 7684 (Score: 39679.642)
may not understand one another ’ s speech . So the LORD scattered them abroad from thence upon the face of all the ear and they left off to build the city . Therefore is the name of it called Babel ; because the LORD did there confound the language of all the ear and from thence did the LORD scatter them abroad upon the face of all the earth . These are the generations of Shem : Shem was an hundred years old , and begat Arphaxad two years after the flo And Shem lived after he begat Arphaxad five hundred years , and begat sons and daughters . And Arphaxad lived five and thirty years , and begat Salah : And Arphaxad lived after he begat Salah four hundred and three years , and begat sons and daughters . And Salah lived thirty years , and begat Eber : And Salah lived after he begat Eber four hundred and three years , and begat sons and daughters . And Eber lived four and thirty years , and begat Peleg : And Eber lived after he begat Peleg four hundred and thirty years , and begat sons and daughters . And Peleg lived thirty years , and begat Reu : And Peleg lived after he begat Reu two hundred and nine years , and begat sons and daughters . And Reu lived two and thirty years , and begat Serug : And Reu lived after he begat Serug two hundred and seven years , and begat sons and daughters . And Serug lived thirty years , and begat Nahor : And Serug lived after he begat Nahor two hundred years , and begat sons and daughters . And Nahor lived nine and twenty years , and begat Terah : And Nahor lived after he begat Terah an hundred and nineteen years , and begat sons and daughters . And Terah lived seventy years , and begat Abram , Nahor , and Haran . Now these are the generations of Terah : Terah begat Abram , Nahor , and Haran ; and Haran begat Lot . And Haran died before his father Terah in the land of his nativity , in Ur of the Chaldees . And Abram and Nahor took them wives : the name of Abram ’ s wife was Sarai ; and the name of Nahor ’ s wife , Milcah , the daughter of Haran , the father of Milcah , and the father of Iscah . But Sarai was barren ; she had no child .
Detection #4: words 30585 – 30993 (Score: 28796.840)
And these are the children of Zibeon ; both Ajah , and Anah : this was that Anah that found the mules in the wilderness , as he fed the asses of Zibeon his father . And the children of Anah were these ; Dishon , and Aholibamah the daughter of Anah . And these are the children of Dishon ; Hemdan , and Eshban , and Ithran , and Cheran . The children of Ezer are these ; Bilhan , and Zaavan , and Akan . The children of Dishan are these ; Uz , and Aran . These are the dukes that came of the Horites ; duke Lotan , duke Shobal , duke Zibeon , duke Anah , Duke Dishon , duke Ezer , duke Dishan : these are the dukes that came of Hori , among their dukes in the land of Seir . And these are the kings that reigned in the land of Edom , before there reigned any king over the children of Israel . And Bela the son of Beor reigned in Edom : and the name of his city was Dinhabah . And Bela died , and Jobab the son of Zerah of Bozrah reigned in his stead . And Jobab died , and Husham of the land of Temani reigned in his stead . And Husham died , and Hadad the son of Bedad , who smote Midian in the field of Moab , reigned in his ste and the name of his city was Avith . And Hadad died , and Samlah of Masrekah reigned in his stead . And Samlah died , and Saul of Rehoboth by the river reigned in his stead . And Saul died , and Baalhanan the son of Achbor reigned in his stead . And Baalhanan the son of Achbor died , and Hadar reigned in his ste and the name of his city was Pau ; and his wife ’ s name was Mehetabel , the daughter of Matred , the daughter of Mezahab . And these are the names of the dukes that came of Esau , according to their families , after their places , by their names ; duke Timnah , duke Alvah , duke Jetheth , Duke Aholibamah , duke Elah , duke Pinon , Duke Kenaz , duke Teman , duke Mibzar , Duke Magdiel , duke Iram : these be the dukes of Edom , according to their habitations in the land of their possessi he is Esau the father of the Edomites . And Jacob dwelt in the land wherein his father was a stranger , in the land of Canaan . These are the generations of Jacob . Joseph , being seventeen years old , was feeding the flock with his brethren ; and the lad was with the sons of Bilhah , and with the sons of Zilpah , his father ’ s wiv and Joseph brought unto his father their evil report .
Detection #5: words 40473 – 40821 (Score: 28436.096)
into Egypt , Jacob , and all his seed with h His sons , and his sons ’ sons with him , his daughters , and his sons ’ daughters , and all his seed brought he with him into Egypt . And these are the names of the children of Israel , which came into Egypt , Jacob and his so Reuben , Jacob ’ s firstborn . And the sons of Reuben ; Hanoch , and Phallu , and Hezron , and Carmi . And the sons of Simeon ; Jemuel , and Jamin , and Ohad , and Jachin , and Zohar , and Shaul the son of a Canaanitish woman . And the sons of Levi ; Gershon , Kohath , and Merari . And the sons of Judah ; Er , and Onan , and Shelah , and Pharez , and Zar but Er and Onan died in the land of Canaan . And the sons of Pharez were Hezron and Hamul . And the sons of Issachar ; Tola , and Phuvah , and Job , and Shimron . And the sons of Zebulun ; Sered , and Elon , and Jahleel . These be the sons of Leah , which she bare unto Jacob in Padanaram , with his daughter Din all the souls of his sons and his daughters were thirty and three . And the sons of Gad ; Ziphion , and Haggi , Shuni , and Ezbon , Eri , and Arodi , and Areli . And the sons of Asher ; Jimnah , and Ishuah , and Isui , and Beriah , and Serah their sist and the sons of Beriah ; Heber , and Malchiel . These are the sons of Zilpah , whom Laban gave to Leah his daughter , and these she bare unto Jacob , even sixteen souls . The sons of Rachel Jacob ’ s wife ; Joseph , and Benjamin . And unto Joseph in the land of Egypt were born Manasseh and Ephraim , which Asenath the daughter of Potipherah priest of On bare unto him . And the sons of Benjamin were Belah , and Becher , and Ashbel , Gera , and Naaman , Ehi , and Rosh , Muppim , and Huppim , and Ard . These are the sons of Rachel , which were born to Jacob : all the souls were fourteen . And the sons of Dan ; Hushim . And the sons of Naphtali ; Jahzeel , and Guni , and Jezer , and Shillem . These are the sons of Bilhah , which Laban gave unto Rachel his daughter , and she
Detection #6: words 12299 – 12486 (Score: 25531.722)
And Abraham answered and said , Behold now , I have taken upon me to speak unto the LORD , which am but dust and ash Peradventure there shall lack five of the fifty righteous : wilt thou destroy all the city for lack of five ? And he said , If I find there forty and five , I will not destroy it . And he spake unto him yet again , and said , Peradventure there shall be forty found there . And he said , I will not do it for forty ’ s sake . And he said unto him , Oh let not the LORD be angry , and I will spe Peradventure there shall thirty be found there . And he said , I will not do it , if I find thirty there . And he said , Behold now , I have taken upon me to speak unto the LO Peradventure there shall be twenty found there . And he said , I will not destroy it for twenty ’ s sake . And he said , Oh let not the LORD be angry , and I will speak yet but this on Peradventure ten shall be found there . And he said , I will not destroy it for ten ’ s sake . And the LORD went his way , as soon as he had left communing with Abrah and Abraham returned unto his place . And there came two angels to Sodom at even ; and Lot sat in the gate of Sod and Lot seeing them rose up
Detection #7: words 9069 – 9287 (Score: 25480.242)
Twelve years they served Chedorlaomer , and in the thirteenth year they rebelled . And in the fourteenth year came Chedorlaomer , and the kings that were with him , and smote the Rephaims in Ashteroth Karnaim , and the Zuzims in Ham , and the Emins in Shaveh Kiriathaim , And the Horites in their mount Seir , unto Elparan , which is by the wilderness . And they returned , and came to Enmishpat , which is Kadesh , and smote all the country of the Amalekites , and also the Amorites , that dwelt in Hazezontamar . And there went out the king of Sodom , and the king of Gomorrah , and the king of Admah , and the king of Zeboiim , and the king of Bela ( the same is Zoar ;) and they joined battle with them in the vale of Siddim ; With Chedorlaomer the king of Elam , and with Tidal king of nations , and Amraphel king of Shinar , and Arioch king of Ellasar ; four kings with five . And the vale of Siddim was full of slimepits ; and the kings of Sodom and Gomorrah fled , and fell there ; and they that remained fled to the mountain . And they took all the goods of Sodom and Gomorrah , and all their victuals , and went their way . And they took Lot , Abram ’ s brother ’ s son , who dwelt in Sodom , and his goods , and departed . And there came one that had escaped , and told Abram the Hebrew ; for he dwelt in the plain of Mamre the Amorite , brother of Eshcol , and brother of An and these were
Detection #8: words 6811 – 7104 (Score: 24522.926)
And the Arvadite , and the Zemarite , and the Hamathite : and afterward were the families of the Canaanites spread abroad . And the border of the Canaanites was from Sidon , as thou comest to Gerar , unto Gaza ; as thou goest , unto Sodom , and Gomorrah , and Admah , and Zeboim , even unto Lasha . These are the sons of Ham , after their families , after their tongues , in their countries , and in their nations . Unto Shem also , the father of all the children of Eber , the brother of Japheth the elder , even to him were children born . The children of Shem ; Elam , and Asshur , and Arphaxad , and Lud , and Aram . And the children of Aram ; Uz , and Hul , and Gether , and Mash . And Arphaxad begat Salah ; and Salah begat Eber . And unto Eber were born two sons : the name of one was Peleg ; for in his days was the earth divided ; and his brother ’ s name was Joktan . And Joktan begat Almodad , and Sheleph , and Hazarmaveth , and Jerah , And Hadoram , and Uzal , and Diklah , And Obal , and Abimael , and Sheba , And Ophir , and Havilah , and Jobab : all these were the sons of Joktan . And their dwelling was from Mesha , as thou goest unto Sephar a mount of the east . These are the sons of Shem , after their families , after their tongues , in their lands , after their nations . These are the families of the sons of Noah , after their generations , in their natio and by these were the nations divided in the earth after the flood . And the whole earth was of one language , and of one speech . And it came to pass , as they journeyed from the east , that they found a plain in the land of Shinar ; and they dwelt there .
Detection #9: words 11352 – 11512 (Score: 21242.191)
and I will make him a great nation . But my covenant will I establish with Isaac , which Sarah shall bear unto thee at this set time in the next year . And he left off talking with him , and God went up from Abraham . And Abraham took Ishmael his son , and all that were born in his house , and all that were bought with his money , every male among the men of Abraham ’ s house ; and circumcised the flesh of their foreskin in the selfsame day , as God had said unto him . And Abraham was ninety years old and nine , when he was circumcised in the flesh of his foreskin . And Ishmael his son was thirteen years old , when he was circumcised in the flesh of his foreskin . In the selfsame day was Abraham circumcised , and Ishmael his son . And all the men of his house , born in the house , and bought with money of the stranger , were circumcised with him . And the LORD appeared unto him in the plains of Mamre : and he sat in the tent door in the heat of the day ; And he lift up his eyes and looked , and , lo , three men stood by
Detection #10: words 22251 – 22399 (Score: 20712.606)
And give thee the blessing of Abraham , to thee , and to thy seed with thee ; that thou mayest inherit the land wherein thou art a stranger , which God gave unto Abraham . And Isaac sent away Jacob : and he went to Padanaram unto Laban , son of Bethuel the Syrian , the brother of Rebekah , Jacob ’ s and Esau ’ s mother . When Esau saw that Isaac had blessed Jacob , and sent him away to Padanaram , to take him a wife from thence ; and that as he blessed him he gave him a charge , saying , Thou shalt not take a wife of the daughers of Canaan ; And that Jacob obeyed his father and his mother , and was gone to Padanaram ; And Esau seeing that the daughters of Canaan pleased not Isaac his father ; Then went Esau unto Ishmael , and took unto the wives which he had Mahalath the daughter of Ishmael Abraham ’ s son , the sister of Nebajoth , to be his wife . And Jacob went out from Beersheba , and went toward Haran .