1 Introduction
Many pattern recognition methods strive towards deriving models from complex and noisy data. Such models try to describe the prototypical normal behavior of the system being observed, which is hard to model manually and whose state is often not even directly observable, but only reflected by the data. They allow reasoning about the properties of the system, predicting unseen data, and assessing the “normality” of new data. In such a scenario, any deviation from the normal behavior present in the data is distracting and may impair the accuracy of the model. An entire arsenal of techniques has therefore been developed to eliminate abnormal observations prior to learning or to learn models in a robust way not affected by a few anomalies.
Such practices may easily lead to the perception of anomalies as being intrinsically bad and worthless. Though that is true for random noise and erroneous measurements, there may also be anomalies caused by rare events and complex processes. Embracing the anomalies in the data and studying the information buried in them can therefore lead to a deeper understanding of the system being analyzed and to the insight that the models hitherto employed were incomplete or, in the case of non-stationary processes, outdated. A well-known example for this is the discovery of the correlation between the El Niño weather phenomenon and extreme surface pressures over the equator by Gilbert Walker [1] during the early 20th century through the analysis of extreme events in time series of climate data.
Thus, the use of anomaly detection techniques is not limited to outlier removal as a preprocessing step. Rather, anomaly detection is also an important task per se, since in many applications only the deviations from normal behavior are the actual object of interest. Besides the scenario of knowledge discovery mentioned above, fraud detection (e.g., credit card fraud or identity theft), intrusion detection in cybersecurity, fault detection in industrial processes, anomaly detection in healthcare (e.g., monitoring patient condition or detecting disease outbreaks), and early detection of environmental disasters are other important examples. Automated methods for anomaly detection are especially crucial nowadays, where huge amounts of data are available that cannot be analyzed by humans.
In this article, we introduce a novel unsupervised method called “Maximally Divergent Intervals” (MDI), which can be employed to point expert analysts to the interesting parts of the data, i.e., the anomalies. In contrast to most existing anomaly detection techniques (e.g., [2, 3, 4, 5]), we do not analyze the data on a pointwise basis, but search for contiguous intervals of time and regions in space that contain the anomalous event. This is motivated by the fact that anomalies driven by natural processes rather occur over a span of time and, in the case of spatiotemporal data, in a spatial region rather than at a single location at a single time. Moreover, the individual samples making up such a so-called collective anomaly do not have to be anomalous when considered in isolation, but may constitute an anomaly only as a whole. Thus, analysts will intuitively search for anomalous regions in the data instead of anomalous points, and the algorithm assisting them should do so as well.
We achieve this by searching for anomalous blocks
in multivariate spatiotemporal data tensors, i.e., regions and time frames whose data distribution deviates most from the distribution of the remaining time series. To this end, we compare several existing measures for the divergence of distributions and derive a new one that is invariant against varying lengths of the intervals being compared. A fast novel interval proposal technique allows us to reduce the computational cost of this procedure by analyzing only a small portion of particularly interesting parts of the data. Experiments on climate data, videos, and text corpora will demonstrate that our method can be applied to a variety of applications without major adaptations.
Despite the importance of this task across domains, there has been very limited research on the detection of anomalous intervals in multivariate time series, though the problem has been known for quite some time: Keogh et al. [6] already tackled it in 2005 with a method they called “HOT SAX”. They try to find anomalous subsequences (“discords”) of a time series by representing all possible subsequences of a fixed length $w$ as $w$-dimensional vectors and using the Euclidean distance to the nearest neighbor in that space as anomaly score. More recently, Ren et al. [7] use hand-crafted interval features based on the frequency of extreme values and search for intervals whose features are maximally different from all other intervals. However, both methods are limited to univariate data and a fixed length of the intervals must be specified in advance.
The latter is also true for a multivariate approach proposed by Liu et al. [8], who compare two consecutive intervals of fixed size in a time series using the Kullback-Leibler or the Pearson divergence for detecting change-point anomalies, i.e., points where a permanent change of the distribution of the data occurs. This is a different task than finding intervals that are anomalous with regard to all the remaining data. In addition, their method does not scale well for detecting anomalous intervals of dynamic size and is hence not applicable for detecting other types of anomalies, for which a broader context has to be taken into account.
The task of detecting anomalous intervals of dynamic size has recently been tackled by Senin et al. [9], who search for typical and anomalous patterns in time series by inducing a grammar on a symbolic discretization of the data. As opposed to our approach, their method cannot handle multivariate or spatiotemporal data.
Similar to our approach, Jiang et al. [10]
search for anomalous blocks in higher-order tensors using the Kullback-Leibler divergence, but apply their method to discrete data only (e.g., relations in social networks) and use a Poisson distribution for modeling the data. Since their search strategy is very specific to applications dealing with graph data, it is not applicable in the general case of multivariate continuous data dealt with in our work.
Regarding spatiotemporal data, Wu et al. [11] follow a sequential approach for detecting anomalies first spatially, then temporally, and apply a merge strategy afterwards. However, the time needed for merging grows exponentially with the length of the time series and their divergence measure is limited to binary-valued data. In contrast to this, our approach is able to deal with multivariate real-valued data efficiently and treats time and space jointly.
The remainder of this article is organized as follows: Section 2 will introduce our novel “Maximally Divergent Intervals” algorithm for offline detection of collective anomalies in multivariate spatiotemporal data. Its performance will be evaluated quantitatively on synthetic data in Section 3 and its suitability for practical applications will be demonstrated by means of experiments on real data from various domains in Section 4. Section 5 will summarize the progress made so far and mention directions for future research.
2 Maximally Divergent Intervals
This section formally introduces our MDI algorithm for offline detection of anomalous intervals in spatiotemporal data. After introducing the definitions we are going to make use of, we give a rough overview of the basic idea behind the algorithm, which is also illustrated schematically in Figure 1. The subsequent subsections will go into more detail on the individual aspects and components of our approach.
Our implementation of the MDI algorithm is available as open source at: https://cvjena.github.io/libmaxdiv/
2.1 Definitions
Let $X \in \mathbb{R}^{T \times L_x \times L_y \times L_z \times n}$ be a multivariate spatiotemporal time series given as a 5th-order tensor with 4 contextual attributes (point of time and spatial location) and $n$ behavioral attributes for all samples. We will index individual samples using 4-tuples $r = (t, x, y, z)$, as in $x_r \in \mathbb{R}^n$.
The usual interval notation $[a, b] = \{a, a+1, \dots, b\}$ will be used in the following for discrete intervals. Furthermore, the set of all intervals with a size between $a$ and $b$ along an axis of size $T$ is denoted by

(1) $\mathcal{I}_{a, b}^{(T)} := \left\{ [s, s+l-1] \;\middle|\; 1 \le s \le T,\; a \le l \le b,\; s+l-1 \le T \right\}.$
The set of all sub-blocks of a data tensor complying with given size constraints can then be defined as

(2) $\mathcal{I} := \left\{ I_t \times I_x \times I_y \times I_z \;\middle|\; I_t \in \mathcal{I}_{a_t, b_t}^{(T)},\; I_x \in \mathcal{I}_{a_x, b_x}^{(L_x)},\; I_y \in \mathcal{I}_{a_y, b_y}^{(L_y)},\; I_z \in \mathcal{I}_{a_z, b_z}^{(L_z)} \right\}$

for given per-axis size limits $a_t, b_t, \dots, a_z, b_z$.
In the following, we will often omit the indices for simplicity and just refer to this set of intervals as $\mathcal{I}$.
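For illustration, the interval sets and their Cartesian products can be enumerated directly. The following Python sketch uses half-open index pairs $(s, e)$ with zero-based indices; the function names are ours and not part of the MDI implementation:

```python
from itertools import product

def intervals(T, a, b):
    """All discrete intervals of length a <= l <= b inside an axis of size T,
    encoded as half-open index pairs (s, e) with 0 <= s < e <= T."""
    return [(s, s + l) for l in range(a, b + 1) for s in range(T - l + 1)]

def sub_blocks(sizes, min_sizes, max_sizes):
    """All sub-blocks of a tensor: Cartesian products of per-axis intervals."""
    per_axis = [intervals(T, a, b) for T, a, b in zip(sizes, min_sizes, max_sizes)]
    return list(product(*per_axis))
```

For an axis of size 5 and lengths between 2 and 3, this yields $4 + 3 = 7$ intervals; the number of sub-blocks is the product of the per-axis counts.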
Given any sub-block $I \in \mathcal{I}$, the remaining part of the time series excluding that specific range can be defined as

(3) $\Omega_I := \left( [1, T] \times [1, L_x] \times [1, L_y] \times [1, L_z] \right) \setminus I,$

and we will often simply refer to it as $\Omega$ if the corresponding range $I$ is obvious from the context.
2.2 Idea and Algorithm Overview
The approach pursued by the MDI algorithm to compute anomaly scores for all intervals $I \in \mathcal{I}$ can be motivated by a long-standing definition of anomalies given by Douglas Hawkins [12] in 1980, who defines an anomaly as “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. In analogy to this definition, the MDI algorithm assumes that there is a sub-block $I$ of the given time series that has been generated according to “a different mechanism” than the rest of the time series in $\Omega$ (cf. the schematic illustration in Figure 1
). The algorithm tries to capture these mechanisms by modelling the probability density $p_I$ of the data in the inner interval $I$ and the distribution $p_\Omega$ in the outer interval $\Omega$. We investigate two different models for these distributions: Kernel Density Estimation (KDE) and multivariate normal distributions (Gaussians), which will be explained in detail in Section 2.3. Moreover, a measure for the degree of “deviation” of $p_I$ from $p_\Omega$ has to be defined. Like some other works on collective anomaly detection [8, 10], we use, among others, the Kullback-Leibler (KL) divergence for this purpose. However, Section 2.5 will show that this is a suboptimal choice when used without a slight modification and discuss alternative divergence measures.
Given these ingredients, the underlying optimization problem for finding the most anomalous interval $I^*$ can be described as

(4) $I^* = \operatorname*{arg\,max}_{I \in \mathcal{I}} \; D\!\left(p_I, p_\Omega\right),$

where $D$ denotes a measure for the divergence of two distributions.
Various possible choices for the divergence measure will be discussed in Section 2.5.
In order to actually locate this “maximally divergent interval” $I^*$, the MDI algorithm scans over all intervals $I \in \mathcal{I}$, estimates the distributions $p_I$ and $p_\Omega$, and computes the divergence between them, which becomes the anomaly score of the interval $I$. The parameters defining the minimum and the maximum size of the intervals in question have to be specified by the user in advance. This is not a severe restriction, since extreme values may be chosen for these parameters in exchange for increased computation time. But depending on the application and the focus of the analysis, there is often prior knowledge about reasonable limits for the size of possible intervals.
After the anomaly scores have been obtained for all intervals, they are sorted in descending order and non-maximum suppression is applied to obtain non-overlapping intervals only. For large time series with more than 10,000 samples, we apply an approximate non-maximum suppression that avoids storing all interval scores by maintaining a fixed-size list of the currently best-scoring non-overlapping intervals.
Finally, the algorithm returns a ranking of intervals, so that a user-specified number of top intervals can be selected as output.
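The sort-and-suppress ranking step can be sketched in a few lines. The following is a simplified illustration on a toy dictionary of pre-computed interval scores, not the actual libmaxdiv implementation (which additionally uses the approximate variant for long time series):

```python
def overlaps(i1, i2):
    """Two half-open intervals (s, e) overlap iff neither ends before the other starts."""
    return i1[0] < i2[1] and i2[0] < i1[1]

def rank_intervals(scores, top_k=None):
    """Sort scored intervals by descending anomaly score and greedily keep only
    intervals that do not overlap a better-scoring one (non-maximum suppression)."""
    kept = []
    for interval, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if all(not overlaps(interval, k) for k, _ in kept):
            kept.append((interval, score))
    return kept[:top_k] if top_k else kept

# toy example: three candidate intervals, two of them overlapping
scores = {(0, 10): 1.5, (5, 12): 2.0, (20, 30): 1.0}
detections = rank_intervals(scores)
```

Here, the interval $(0, 10)$ is suppressed because it overlaps the better-scoring interval $(5, 12)$.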
2.3 Probability Density Estimation
The divergence measure used in (4) requires the notion of the distributions $p_I$ and $p_\Omega$ of the data in the intervals $I$ and $\Omega$. We will hence discuss in the following which models we employ to estimate these distributions and how this can be done efficiently.
2.3.1 Models
The choice of a specific model for the distributions $p_I$ and $p_\Omega$ imposes some assumptions about the data which may not conform to reality. However, since the MDI algorithm estimates the parameters of those distributions for all possible intervals in the time series, the use of models that can be updated efficiently is crucial. One such model is Kernel Density Estimation (KDE) with

(5) $p_I(x) = \frac{1}{|I|} \sum_{r \in I} k_\sigma\!\left(x, x_r\right),$

using a Gaussian kernel

(6) $k_\sigma\!\left(x, x'\right) = \frac{1}{\left(2\pi\sigma^2\right)^{n/2}} \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right).$
On the one hand, KDE is a very flexible model; on the other hand, it does not scale well to long time series and does not take correlations between attributes into account. The second proposed model does not suffer from these problems: It assumes that both the data in the anomalous interval and in the remaining time series are distributed according to multivariate normal distributions (Gaussians) $\mathcal{N}(\mu_I, \Sigma_I)$ and $\mathcal{N}(\mu_\Omega, \Sigma_\Omega)$, respectively.
2.3.2 Efficient Estimation with Cumulative Sums
Both distribution models described above involve a summation over all samples in the respective interval. Performing this summation for multiple intervals is redundant, because some of them overlap with each other. Such a naïve approach of finding the maximally divergent interval has a time complexity of $\mathcal{O}(N^3 \cdot \bar{V})$ with KDE and $\mathcal{O}(N^2 \cdot \bar{V})$ with Gaussian distributions. This is due to the $\mathcal{O}(N \cdot \bar{V})$ intervals (with $\bar{V}$ being the maximum volume of an interval), each of them requiring a summation over $\mathcal{O}(N)$ samples for the evaluation of one of the divergence measures described later in Section 2.5. For KDE, $\mathcal{O}(N)$ distance computations are necessary for the evaluation of the probability density function for each sample, while for Gaussian distributions a summation over all $N$ samples has to be performed for each interval to estimate the parameters of the distributions. This would clearly be infeasible for large-scale data. However, these computations can be sped up significantly by using cumulative sums [13]. For the sake of clarity, we first consider the special case of a non-spatial time series $x_1, \dots, x_N \in \mathbb{R}^n$. With regard to KDE, a matrix $C$ of cumulative sums of kernelized distances can be used:
(7) $C_{i,j} = \sum_{t=1}^{i} k_\sigma\!\left(x_t, x_j\right), \qquad 1 \le i, j \le N, \qquad C_{0,j} := 0.$
This matrix has to be computed only once, which requires $\mathcal{O}(N^2)$ distance calculations, and can then be used to estimate the probability density functions of the data in the intervals $I = [s, e]$ and $\Omega$ in constant time per evaluation:

(8) $p_I\!\left(x_j\right) = \frac{C_{e,j} - C_{s-1,j}}{|I|}, \qquad p_\Omega\!\left(x_j\right) = \frac{C_{N,j} - C_{e,j} + C_{s-1,j}}{N - |I|}.$
In analogy, a matrix $S^{(1)}$ of cumulative sums over the samples and a tensor $S^{(2)}$ of cumulative sums over the outer products of the samples can be used to speed up the estimation of the parameters of Gaussian distributions:

(9) $S^{(1)}_i = \sum_{t=1}^{i} x_t, \qquad S^{(2)}_i = \sum_{t=1}^{i} x_t x_t^\top, \qquad i = 1, \dots, N,$

where $S^{(1)}_i$ and $S^{(2)}_i$ are the $i$-th column of $S^{(1)}$ and the $i$-th matrix of $S^{(2)}$, respectively. Using these matrices, the mean vectors and covariance matrices can be estimated in constant time, e.g., $\mu_I = \frac{1}{|I|}\left(S^{(1)}_e - S^{(1)}_{s-1}\right)$ and $\Sigma_I = \frac{1}{|I|}\left(S^{(2)}_e - S^{(2)}_{s-1}\right) - \mu_I \mu_I^\top$ for an interval $I = [s, e]$.
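For the non-spatial case, the constant-time estimation of Gaussian parameters from cumulative sums can be sketched as follows. This is a numpy illustration under our notation (with a leading zero entry so that interval sums become simple differences), not the actual libmaxdiv implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # non-spatial time series: N = 100 samples, n = 3 attributes
N, n = X.shape

# cumulative sums over the samples and over their outer products
S1 = np.zeros((N + 1, n))
S2 = np.zeros((N + 1, n, n))
S1[1:] = np.cumsum(X, axis=0)
S2[1:] = np.cumsum(X[:, :, None] * X[:, None, :], axis=0)

def gaussian_params(s, e):
    """Mean and maximum-likelihood covariance of X[s:e] in O(1) time."""
    length = e - s
    mu = (S1[e] - S1[s]) / length
    cov = (S2[e] - S2[s]) / length - np.outer(mu, mu)
    return mu, cov

mu, cov = gaussian_params(10, 40)
```

The recovered parameters agree with a direct computation over the interval, but each additional interval costs only a constant number of operations instead of a fresh pass over its samples.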
This technique can be generalized to the spatiotemporal scenario using higher-order tensors for storing the cumulative sums. The reconstruction of a sum over a given range from such a cumulative tensor follows the Inclusion-Exclusion Principle, and the number of summands involved in the computation thus grows exponentially with the order of the tensor, being $2^4 = 16$ for a 4th-order tensor, compared to only 2 summands in the non-spatial case. The exact equation describing the reconstruction in the general case of a tensor of arbitrary order is given in Appendix A.
Thanks to the use of cumulative sums, the computational complexity of the MDI algorithm is reduced to $\mathcal{O}(N^2 + N \cdot \bar{V}^2)$ for the case of KDE and to $\mathcal{O}(N \cdot \bar{V}^2)$ for Gaussian distributions.
2.4 Incorporation of Context
The models used for probability density estimation described in the previous section are based on the assumption of independent samples. However, this assumption is almost never true for real data, since the value at a specific point of time and spatial location is likely to be strongly correlated with the values at previous times and nearby locations. To mitigate this issue, we apply two kinds of embeddings that incorporate context into each sample as a preprocessing step.
2.4.1 TimeDelay Embedding
Aiming to make combinations of observed values more representative of the hidden state of the system being observed, time-delay embedding [14] incorporates context from previous timesteps into each sample by transforming a given time series $x_1, \dots, x_N \in \mathbb{R}^n$ into another time series given by

(10) $x'_t = \left(x_t^\top, x_{t-\tau}^\top, x_{t-2\tau}^\top, \dots, x_{t-(\kappa-1)\tau}^\top\right)^\top \in \mathbb{R}^{\kappa \cdot n},$

where the embedding dimension $\kappa$ specifies the number of samples to stack together and the time lag $\tau$ specifies the gap between two consecutive timesteps to be included as context. An illustrative example is given in Figure 2.
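A compact numpy sketch of time-delay embedding might look as follows (the function name is ours; boundary samples without a full history are simply dropped, which is one of several possible conventions):

```python
import numpy as np

def time_delay_embed(X, kappa, tau):
    """Stack each sample with kappa-1 preceding samples at lag tau, turning an
    (N, n) time series into an (N - (kappa-1)*tau, kappa*n) one."""
    N = len(X)
    offset = (kappa - 1) * tau   # first index with a complete history
    return np.hstack([X[offset - d * tau : N - d * tau] for d in range(kappa)])

X = np.arange(10, dtype=float).reshape(-1, 1)   # univariate toy series 0..9
E = time_delay_embed(X, kappa=3, tau=2)
```

Each embedded sample is $(x_t, x_{t-\tau}, x_{t-2\tau})$; for the toy series, the first valid row is $(4, 2, 0)$.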
This method is often motivated by Takens’ theorem [15], which, roughly, states that for a certain embedding dimension $\kappa$ the hidden state of the system can be reconstructed given the observations of the last $\kappa$ timesteps.
2.4.2 SpatialNeighbor Embedding
Correlations between nearby spatial locations are handled similarly: In addition to time-delay embedding, each sample of a spatiotemporal time series can be augmented by the features of its spatial neighbors (cf. Figure 3) to enable the detection of spatial or spatiotemporal anomalies. This preprocessing step, which we refer to as spatial-neighbor embedding, is parametrized with 3 parameters $\kappa_x, \kappa_y, \kappa_z$ for the embedding dimension along each spatial axis and 3 parameters $\tau_x, \tau_y, \tau_z$ for the lag along each axis.
Note that, in contrast to time-delay embedding, neighbors from both directions are aggregated, since spatial context extends both ways along each axis. For example, $\kappa_x = 3$ would mean to consider 4 neighbors along the $x$-axis, 2 in each direction.
Spatial-neighbor embedding can be applied either before or after time-delay embedding. As opposed to many spatiotemporal anomaly detection approaches that perform temporal and spatial anomaly detection sequentially (e.g., [11, 16, 17]), the MDI algorithm in combination with the two embeddings allows for a joint optimization. However, it increases the size of the data much more drastically.
2.5 Divergences
A suitable measure for the deviation of the distribution $p_I$ from $p_\Omega$ is an essential part of the MDI algorithm. The following subsections introduce several divergence measures we have investigated and propose a modification to the well-known Kullback-Leibler (KL) divergence that is necessary for being able to compare divergences of distributions estimated from intervals of different size.
2.5.1 Cross Entropy
Numerous divergence measures, including those described in the following, have been derived in the domain of information theory. Being one of the most basic information-theoretic concepts, the cross entropy between two distributions given by their probability density functions $p_I$ and $p_\Omega$ may already be used as a divergence measure:

(11) $\mathrm{CE}\!\left(p_I, p_\Omega\right) = -\int p_I(x) \cdot \log p_\Omega(x) \,\mathrm{d}x.$
Cross entropy measures how surprising a sample drawn from $p_I$ is under the assumption that it had been drawn from $p_\Omega$, and is hence already eligible as a divergence measure, since the unexpectedness grows when $p_I$ and $p_\Omega$ are very different.
Since the MDI algorithm assumes that the data in the intervals $I$ and $\Omega$ have been sampled from the distributions corresponding to $p_I$ and $p_\Omega$, respectively, the cross entropy of the two distributions can be approximated empirically from the data:

(12) $\mathrm{CE}\!\left(p_I, p_\Omega\right) \approx -\frac{1}{|I|} \sum_{r \in I} \log p_\Omega\!\left(x_r\right).$
This approximation has the advantage of having to estimate only one probability density, $p_\Omega$, explicitly. This is particularly beneficial, since the possibly anomalous intervals often contain only few samples, so that an accurate estimation of the probability density $p_I$ is difficult.
2.5.2 KullbackLeibler Divergence
The Kullback-Leibler (KL) divergence is a popular divergence measure that builds upon the fundamental concept of cross entropy. Given two distributions $p_I$ and $p_\Omega$, the KL divergence can be defined as follows:

(13) $D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) = \int p_I(x) \cdot \log\frac{p_I(x)}{p_\Omega(x)} \,\mathrm{d}x = \mathrm{CE}\!\left(p_I, p_\Omega\right) - H\!\left(p_I\right),$

where $H(p_I) = -\int p_I(x) \log p_I(x) \,\mathrm{d}x$ denotes the entropy of $p_I$.
As opposed to the pure cross entropy of $p_I$ and $p_\Omega$, the KL divergence does not only take into account how well $p_I$ is explained by $p_\Omega$, but also the intrinsic entropy of $p_I$, so that an interval with a stable distribution would get a higher score than an oscillating one if both had the same cross entropy with the rest of the time series.
Like cross entropy, the KL divergence can be approximated empirically from the data, but in contrast to cross entropy, this requires estimating the probability densities of both distributions, $p_I$ and $p_\Omega$:

(14) $D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) \approx \frac{1}{|I|} \sum_{r \in I} \log\frac{p_I\!\left(x_r\right)}{p_\Omega\!\left(x_r\right)}.$
When used in combination with the Gaussian distribution model, the KL divergence comes with an additional advantage from a computational point of view, since there is a known closed-form solution for the KL divergence of two Gaussians [18]:

(15) $D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) = \frac{1}{2}\left(\operatorname{tr}\!\left(\Sigma_\Omega^{-1}\Sigma_I\right) + \left(\mu_\Omega - \mu_I\right)^\top \Sigma_\Omega^{-1} \left(\mu_\Omega - \mu_I\right) - n + \log\frac{\det\Sigma_\Omega}{\det\Sigma_I}\right).$
This allows evaluating the KL divergence in constant time for a given interval, which reduces the computational complexity of the MDI algorithm using the KL divergence in combination with Gaussian models to the number of possible intervals: $\mathcal{O}(N \cdot \bar{V})$.
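The closed-form solution (15) can be implemented directly. The following sketch is our own helper, not the libmaxdiv API; it verifies that the divergence vanishes for identical Gaussians and is positive for a shifted mean:

```python
import numpy as np

def gaussian_kl(mu_i, cov_i, mu_o, cov_o):
    """Closed-form KL divergence D_KL(N(mu_i, cov_i) || N(mu_o, cov_o))."""
    n = len(mu_i)
    cov_o_inv = np.linalg.inv(cov_o)
    diff = mu_o - mu_i
    return 0.5 * (np.trace(cov_o_inv @ cov_i)
                  + diff @ cov_o_inv @ diff
                  - n
                  + np.log(np.linalg.det(cov_o) / np.linalg.det(cov_i)))

mu = np.zeros(2)
cov = np.eye(2)
kl_same = gaussian_kl(mu, cov, mu, cov)        # identical distributions
kl_diff = gaussian_kl(mu + 1.0, cov, mu, cov)  # mean shifted by (1, 1)
```

For the shifted case with identity covariances, (15) reduces to half the squared Euclidean distance of the means, i.e., $\frac{1}{2} \cdot 2 = 1$.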
Given this explicit solution for the KL divergence and the closed-form solution for the entropy of a Gaussian distribution [19] with mean vector $\mu$ and covariance matrix $\Sigma$, which is given by

(16) $H\!\left(\mathcal{N}(\mu, \Sigma)\right) = \frac{n}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma,$

one can easily derive a closed-form solution for the cross entropy of those two distributions as well:

(17) $\mathrm{CE}\!\left(p_I, p_\Omega\right) = D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) + H\!\left(p_I\right) = \frac{1}{2}\left(\operatorname{tr}\!\left(\Sigma_\Omega^{-1}\Sigma_I\right) + \left(\mu_\Omega - \mu_I\right)^\top \Sigma_\Omega^{-1} \left(\mu_\Omega - \mu_I\right) + n\log(2\pi) + \log\det\Sigma_\Omega\right).$
Compared with the KL divergence, this does not assign extremely high scores to small intervals with a low variance, due to the missing subtraction of the entropy $H(p_I)$, whose term $\frac{1}{2}\log\det\Sigma_I$ tends to $-\infty$ when the variance in the interval vanishes. This may be an explanation for the evaluation results in Section 3, where cross entropy in combination with Gaussian models is often superior to the KL divergence, although it does not account for intervals of varying entropy. However, in contrast to the empirical approximation of cross entropy in (12), the closed-form solution requires the estimation of $\Sigma_I$ as well.
2.5.3 Polarity of the KL divergence and its effect on MDI
It is worth noting that the KL divergence is not a metric and, in particular, not symmetric: in general, $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega) \ne D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$. Some authors thus use a symmetric variant [8]:

(18) $D_{\mathrm{SYM}}\!\left(p_I, p_\Omega\right) = D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) + D_{\mathrm{KL}}\!\left(p_\Omega \,\middle\|\, p_I\right).$
This raises the question whether $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$, $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$, or the symmetric version should be used for the detection of anomalous intervals. Quantitative experiments with an early prototype of our method [20] have shown that neither $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$ nor $D_{\mathrm{SYM}}$ provides good performance, as opposed to $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$.
A visual inspection of the detections resulting from the use of $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$ with the assumption of Gaussian distributions shows that all the intervals with the highest anomaly scores have the minimum possible size specified by the user and a very low variance. An example is given in Figure 4. The scores of the top detections in that example are around 100 times higher than those yielded by $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$.
This bias of $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$ towards small low-variance intervals can also be explained theoretically. For the sake of simplicity, consider the special case of a univariate time series. In this case, the closed-form solution for the KL divergence of two Gaussian distributions given in (15) reduces to

(19) $D_{\mathrm{KL}}\!\left(p_\Omega \,\middle\|\, p_I\right) = \log\frac{\sigma_I}{\sigma_\Omega} + \frac{\sigma_\Omega^2 + \left(\mu_\Omega - \mu_I\right)^2}{2\sigma_I^2} - \frac{1}{2},$

where $\mu_I$, $\mu_\Omega$ are the mean values and $\sigma_I^2$, $\sigma_\Omega^2$ are the variances of the distributions in the inner and in the outer interval, respectively. It can be seen from (19) that, due to the division by $\sigma_I^2$, the KL divergence will approach infinity when the variance in the inner interval converges towards $0$. And since the algorithm has to estimate the variance empirically from the given data, it assigns high detection scores to intervals as small as possible, because smaller intervals have a higher chance of having a low empirical variance. The term $\log\frac{\sigma_I}{\sigma_\Omega}$ cannot counterbalance this effect, though it is negative for $\sigma_I < \sigma_\Omega$, since its absolute value grows much more slowly than that of the term divided by $\sigma_I^2$: $\lim_{\sigma_I \to 0} \left(\frac{c}{2\sigma_I^2} + \log\sigma_I\right) = \infty$ for any constant $c > 0$, since $\frac{c}{2\sigma_I^2}$ grows polynomially in $\frac{1}{\sigma_I}$, but $|\log\sigma_I|$ only logarithmically.
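This blow-up can be observed numerically with the univariate closed form (19). In the following sketch (illustrative code under our notation, not part of the method), the outer distribution is standard normal while the inner variance shrinks towards zero:

```python
import numpy as np

def kl_univariate(mu_a, var_a, mu_b, var_b):
    """D_KL(N(mu_a, var_a) || N(mu_b, var_b)) for univariate Gaussians."""
    return (np.log(np.sqrt(var_b / var_a))
            + (var_a + (mu_a - mu_b) ** 2) / (2.0 * var_b) - 0.5)

# outer distribution fixed; the variance of the inner interval shrinks
mu_o, var_o = 0.0, 1.0
scores = [kl_univariate(mu_o, var_o, 0.0, var_i) for var_i in (1.0, 0.1, 0.001)]
```

Although the means of both distributions are identical, the divergence $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$ grows without bound as the inner variance decreases, which is exactly the bias described above.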
In contrast, $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$, where the roles of $\sigma_I$ and $\sigma_\Omega$ are swapped, does not possess this deficiency, since $\sigma_\Omega^2$ is estimated from a much larger portion of the data and, thus, is a more robust estimate.
The symmetric version $D_{\mathrm{SYM}}$ is useless as well, since the scores obtained from $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$ will just be absorbed by the much higher scores of $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$.
2.5.4 Statistical Analysis and Unbiased KL Divergence
Though $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$ does not overestimate the anomalousness of low-variance intervals as extremely as $D_{\mathrm{KL}}(p_\Omega \,\|\, p_I)$ does, the following theoretical analysis will show that it is not unbiased either. In contrast to the previous section, this bias is not related to the data itself, but to the length of the intervals: smaller intervals systematically get higher scores than longer ones. This harms the quality of interval detections, because anomalies will be split up into multiple contiguous small detections (see Figure 4(a) for an example).
Recall that $\mathcal{I}_L^{(N)}$ denotes the set of all intervals of length $L$ in a time series with $N$ timesteps. Furthermore, let $\mathbf{0}_n$ denote an $n$-dimensional vector with all coefficients being 0 and $\mathbb{I}_n$ the identity matrix of dimensionality $n \times n$.
When applying the MDI algorithm to a time series $x_1, \dots, x_N \sim \mathcal{N}(\mathbf{0}_n, \mathbb{I}_n)$, sampled independently and identically from plain white noise, an ideal divergence is supposed to yield constant average scores for all $I \in \mathcal{I}_L^{(N)}$, $a \le L \le b$ (for some user-defined limits $a, b$), i.e., scores independent from the length of the intervals.
For simplicity, we will first analyze the distribution of those scores using the MDI algorithm with Gaussian distributions under the simple, but for this data perfectly valid, assumption of identity covariance matrices. In this case, the KL divergence of two Gaussian distributions with the empirical mean vectors $\hat{\mu}_I$ and $\hat{\mu}_\Omega$ estimated in the intervals $I \in \mathcal{I}_L^{(N)}$ and $\Omega$ for some arbitrary $L$ is given by $D_{\mathrm{KL}}(p_I \,\|\, p_\Omega) = \frac{1}{2}\|\hat{\mu}_I - \hat{\mu}_\Omega\|^2$. Moreover, since all samples in the time series are normally distributed, so are their empirical means:

$\hat{\mu}_I \sim \mathcal{N}\!\left(\mathbf{0}_n, \tfrac{1}{L}\,\mathbb{I}_n\right), \qquad \hat{\mu}_\Omega \sim \mathcal{N}\!\left(\mathbf{0}_n, \tfrac{1}{N-L}\,\mathbb{I}_n\right).$

Thus, all dimensions of the mean vectors are independent and identically normally distributed variables. Their difference is, hence, normally distributed too:

$\hat{\mu}_I - \hat{\mu}_\Omega \sim \mathcal{N}\!\left(\mathbf{0}_n, \sigma^2\,\mathbb{I}_n\right), \qquad \sigma^2 = \tfrac{1}{L} + \tfrac{1}{N-L}.$

Thus, $\frac{1}{\sigma}\left(\hat{\mu}_I - \hat{\mu}_\Omega\right)$ is a vector of independent standard normal random variables and

(20) $D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right) = \frac{\sigma^2}{2} \left\| \frac{\hat{\mu}_I - \hat{\mu}_\Omega}{\sigma} \right\|^2 \sim \frac{\sigma^2}{2}\,\chi_n^2$

is the sum of the squares of $n$ independent normal variables and, hence, distributed according to the chi-squared distribution with $n$ degrees of freedom, scaled by half the variance of the variables. The mean of a $\chi_n^2$-distributed random variable is $n$ and the mean of the scores for all intervals in $\mathcal{I}_L^{(N)}$ is, accordingly, $\frac{n}{2}\left(\frac{1}{L} + \frac{1}{N-L}\right)$, which is approximately inversely proportional to the length $L$ of the interval. Thus, the KL divergence is systematically biased towards smaller intervals.
When the length $N$ of the time series is very large, the asymptotic scale of the chi-squared distribution is $\frac{1}{2L}$ and the estimated parameters of the outer distribution converge towards the parameters of the true distribution of the data. Thus, if the restriction of the Gaussian model to identity covariance matrices is weakened to a global, shared covariance matrix $\Sigma$, the above findings also apply to the case of long time series with correlated variables and, hence, also when time-delay embedding is applied. Because in this case, the KL divergence reduces to $\frac{1}{2}\left(\hat{\mu}_I - \hat{\mu}_\Omega\right)^\top \Sigma^{-1} \left(\hat{\mu}_I - \hat{\mu}_\Omega\right)$, and the subtraction of the true mean followed by the multiplication with $\Sigma^{-1/2}$ can be considered as a normalization of the time series, transforming it to standard normal variables with uncorrelated dimensions.
For the general case of two unrestricted Gaussian distributions, the test statistic

(21) $\lambda = |I| \cdot \left(\operatorname{tr}\!\left(\Sigma_\Omega^{-1}\Sigma_I\right) + \left(\mu_I - \mu_\Omega\right)^\top \Sigma_\Omega^{-1} \left(\mu_I - \mu_\Omega\right) - n + \log\frac{\det\Sigma_\Omega}{\det\Sigma_I}\right)$

has been shown to be asymptotically distributed according to a chi-squared distribution with $\frac{n(n+3)}{2}$ degrees of freedom [21]. This test statistic is often used for testing the hypothesis that a given set of samples has been drawn from a Gaussian distribution with known parameters [22]. In the scenario of the MDI algorithm, the set of samples is the data in the inner interval $I$ and the parameters of the distribution to test that data against are those estimated from the data in the outer interval $\Omega$. The null hypothesis of the test would be that the data in $I$ has been sampled from the same distribution as the data in $\Omega$. The test statistic may then be used as a measure for how well the data in the interval $I$ fit the model established based on the data in the remainder of the time series.
After some elementary reformulations, the relationship between this test statistic and the KL divergence becomes obvious: $\lambda = 2 \cdot |I| \cdot D_{\mathrm{KL}}(p_I \,\|\, p_\Omega)$. This is exactly the normalization of the KL divergence by the scale factor identified in (20). Thus, we define an unbiased KL divergence as follows:

(22) $\widetilde{D}_{\mathrm{KL}}\!\left(p_I, p_\Omega\right) := 2 \cdot |I| \cdot D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, p_\Omega\right).$
The distribution of this divergence applied to asymptotically long time series depends only on the number of attributes $n$ and not on the length of the interval any more. However, this correction may also be useful for time series of finite length. An example of actual detections resulting from the use of the unbiased KL divergence compared with the original one can be seen in Figure 5.
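The effect of this correction can be illustrated with a small simulation under the identity-covariance white-noise model analyzed above (our own numpy sketch, not part of libmaxdiv). For a univariate series, the empirical mean of a length-$L$ interval of standard normal noise is distributed as $\mathcal{N}(0, \frac{1}{L})$; here, the outer mean is approximated by the true mean $0$ of the noise:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 4000

def mean_scores(L):
    """Average KL score of length-L white-noise intervals under the
    identity-covariance model, D_KL = 0.5 * (mu_I - mu_Omega)^2, with the
    outer mean approximated by the true mean 0."""
    mu_i = rng.normal(size=n_trials) / np.sqrt(L)   # empirical mean of L iid N(0,1) samples
    kl = 0.5 * mu_i ** 2
    return kl.mean(), (2 * L * kl).mean()           # raw score vs. unbiased (scaled) score

raw_10, unb_10 = mean_scores(10)
raw_50, unb_50 = mean_scores(50)
```

The raw KL scores decay roughly like $\frac{1}{2L}$, whereas the scaled scores have an average close to $n = 1$ for both interval lengths, i.e., they are no longer biased towards short intervals.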
A further advantage of knowing the distribution of the scores is that this knowledge can also be used for normalizing the scores with respect to the number of attributes, in order to make them comparable across timeseries of varying dimensionality. Moreover, it allows the selection of a threshold for distinguishing between anomalous and nominal intervals based on a chosen significance level. This may be preferred in some applications over searching for a fixed number of top detections.
Interestingly, Jiang et al. [10] have derived an equivalent unbiased KL divergence from a different starting point, based on the assumption of a Poisson distribution and the inverse log-likelihood of the interval as anomaly score.
[Figure caption fragment: pointwise scores and their gradients. The dashed black line indicates the mean of the scores and the dashed blue line marks a threshold that is 1.5 standard deviations above the mean. Time-delay embedding was applied before computing the scores.]

2.5.5 Jensen-Shannon Divergence
A divergence measure that does not suffer from the problem of being asymmetric is the Jensen-Shannon (JS) divergence, which builds upon the KL divergence:

(23) $D_{\mathrm{JS}}\!\left(p_I, p_\Omega\right) = \frac{1}{2} D_{\mathrm{KL}}\!\left(p_I \,\middle\|\, M\right) + \frac{1}{2} D_{\mathrm{KL}}\!\left(p_\Omega \,\middle\|\, M\right), \qquad M = \frac{1}{2}\left(p_I + p_\Omega\right),$

where $p_I$ and $p_\Omega$ are probability density functions. $M$ is a mixture distribution, so that a sample is drawn either from $p_I$ or from $p_\Omega$ with equal probability (a parametrized version of the JS divergence accounting for unequal prior probabilities exists as well, but will not be covered here).
The JS divergence possesses some desirable properties which the KL divergence does not have: most notably, it is symmetric and bounded between $0$ and $\log 2$ [23], so that anomaly scores cannot get infinitely high.
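As a sanity check of these properties, the JS divergence can be estimated by Monte-Carlo sampling from two univariate Gaussians. In this sketch (our own illustration, with hypothetical helper names), the estimate vanishes for identical distributions and approaches the upper bound $\log 2$ for well-separated ones:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def empirical_js(samples_i, samples_o, pdf_i, pdf_o):
    """Monte-Carlo estimate of the JS divergence from samples of both
    distributions, using the mixture M = (p_I + p_O) / 2."""
    def term(samples, pdf):
        m = 0.5 * (pdf_i(samples) + pdf_o(samples))
        return np.mean(np.log(pdf(samples) / m))
    return 0.5 * term(samples_i, pdf_i) + 0.5 * term(samples_o, pdf_o)

rng = np.random.default_rng(1)
p = lambda x: gauss_pdf(x, 0.0, 1.0)
q = lambda x: gauss_pdf(x, 8.0, 1.0)
x_p, x_q = rng.normal(0.0, 1.0, 2000), rng.normal(8.0, 1.0, 2000)

js_same = empirical_js(x_p, x_p, p, p)   # identical distributions -> 0
js_far  = empirical_js(x_p, x_q, p, q)   # well-separated -> close to log(2)
```

Since $M \ge \frac{1}{2} p$ pointwise, each log-ratio term is bounded by $\log 2$, so the empirical estimate never exceeds this bound either.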
Like the KL divergence, the JS divergence can be approximated empirically from the data in the intervals $I$ and $\Omega$. However, there is no closed-form solution for the JS divergence under the assumption of a Gaussian distribution (as opposed to the KL divergence), since $M$ would then be a Gaussian mixture model (GMM). Though several approximations of the KL divergence of GMMs have been proposed, they are either computationally expensive or abandon essential properties such as positivity [24]. This lack of a closed-form solution is likely to be the reason why the JS divergence was clearly outperformed by the KL divergence in our quantitative experiments in Section 3 when the Gaussian model is used, despite its desirable theoretical properties.

2.6 Interval Proposals for Large-Scale Data
Exploiting cumulative sums and a closed-form solution for the KL divergence, the asymptotic time complexity of the MDI algorithm with a Gaussian distribution model could already be reduced to be linear in the number of intervals (see Section 2.3.2). If the maximum length of an anomalous interval is independent from the number of samples $N$, the runtime is also linear in $N$. However, due to high constant costs for estimating probability densities and computing the divergence, the algorithm is still too slow for processing large-scale data sets with millions of samples.
Since anomalies are rare by definition, many of the intervals analyzed by a full scan will be uninteresting and irrelevant for the list of the top anomalies detected by the algorithm. In order to focus on the analysis of nontrivial intervals, we employ a simple proposal technique that selects interesting intervals based on pointwise anomaly scores.
Simply grouping contiguous detections of pointwise anomaly detection methods in order to retrieve anomalous intervals is insufficient, because it will most likely lead to split-up detections. However, it is not unreasonable to assume that many samples inside of an anomalous interval will also have a high pointwise score, especially after applying contextual embedding. Figure 6, for example, shows two exemplary time series from the synthetic data set introduced in Section 3.1 along with the pointwise scores retrieved by applying Hotelling’s $T^2$ method [4] after time-delay embedding has been applied to the time series. Note that even in the case of the very subtle amplitude-change anomaly, the two highest Hotelling’s $T^2$ scores are at the beginning and the end of the anomaly. The idea is to apply a simple threshold operation on the pointwise scores to extract interesting points and then propose all those intervals for detailed scoring by a divergence measure whose first and last samples are among these points, provided that the interval conforms to the size constraints.
This way, probability density estimation and computation of the divergence have to be performed only for a comparatively small set of interesting intervals instead of all possible intervals in the time-series. The interval proposal method is not required to have a low false-positive rate, because the divergence measure is responsible for the actual scoring. Instead, it has to act as a high-recall system so that truly anomalous intervals are not excluded from the actual analysis.
Since we are only interested in the beginning and the end of the anomalies, the pointwise scores are not used directly; instead, a centralized gradient filter is applied to the scores in order to suppress them in areas of constant anomalousness and to emphasize changes of the anomaly scores.
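A minimal sketch of such a gradient filter, assuming a central-difference filter with absolute values (the exact filter used in the implementation may differ):

```python
def central_gradient(scores):
    """Central difference |s[i+1] - s[i-1]| / 2 with replicated borders:
    flat plateaus of constant anomalousness map to (near) zero, while
    rising and falling edges of the scores are emphasized."""
    n = len(scores)
    out = []
    for i in range(n):
        left = scores[max(i - 1, 0)]
        right = scores[min(i + 1, n - 1)]
        out.append(abs(right - left) / 2.0)
    return out

print(central_gradient([0.0, 0.0, 1.0, 1.0, 0.0]))  # [0.0, 0.5, 0.5, 0.5, 0.5]
```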
The evaluation in Section 3.3 will show that the interval proposal technique can speed up the MDI algorithm significantly without impairing its performance.
3 Experimental Evaluation
In this section, we evaluate our MDI algorithm quantitatively on synthetic data and compare it with other well-known approaches from the field of anomaly detection.
3.1 Data Set
In contrast to many other established machine learning tasks, there is no widely used standard benchmark for the evaluation of anomaly detection algorithms; neither for the detection of anomalous intervals nor even for the very common task of pointwise anomaly detection. This is mainly because the notion of an “anomaly” is not well defined and varies between different applications and even from analyst to analyst. Moreover, anomalies are, by definition, rare, which makes the collection of large-scale data sets difficult. However, even if a large amount of data were available, it would be nearly impossible to annotate it in an inter-subjective way everyone would agree with. But accurate and complete ground-truth information is mandatory for a quantitative evaluation and comparison of machine learning techniques. Therefore, we use a synthetic data set for assessing the performance of different variants of the MDI algorithm.
All time-series in this data set have been sampled from a Gaussian process with a squared-exponential covariance function and zero mean function. The length scale of the GP has been set to and the noise parameter to ; δ denotes Kronecker’s delta. Different types of anomalies have then been injected into these time-series, with a size varying between 5% and 20% of the length of the time-series:
meanshift: A random but constant value is added to or subtracted from the anomalous samples.
meanshift_hard: A random but constant value is added to or subtracted from the anomalous samples.
meanshift5: Five meanshift anomalies are inserted into the time-series.
meanshift5_hard: Five meanshift_hard anomalies are inserted into the time-series.
amplitude_change: The time-series is multiplied with a Gaussian window with standard deviation whose mean is the centre of the anomalous interval. Here, is the length of the anomalous interval and the amplitude of the Gaussian window is clipped at . This modified time-series is added to the original one.
frequency_change: The time-series is sampled from a non-stationary GP whose covariance function uses a reduced length scale during the anomalous interval, so that correlations between samples are reduced, which leads to more frequent oscillations [25].
mixed: The values in the anomalous interval are replaced with the values of another function sampled from the Gaussian process. 10 time steps at the borders of the anomaly are interpolated between the two functions for a smooth transition. This rather difficult test case is supposed to reflect the concept of anomalies as being “generated by a different mechanism” (cf. Section 2.2).
The above test cases are all univariate, but there are similar multivariate scenarios meanshift_multvar, amplitude_change_multvar, frequency_change_multvar, and mixed_multvar with 5-dimensional time-series. In the first three of these test cases, the corresponding anomaly is injected into only one of the dimensions, which is also a property of many real time-series, while in the mixed_multvar scenario all attributes are replaced with those of the other time-series.
This results in a synthetic test data set with 11 test cases, a total of 1100 time-series, and an overall number of 1900 anomalies. Examples of all test cases are shown in Figure 7.
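For readers who want to experiment, a rough approximation of such a test case can be generated as follows. This sketch smooths white noise with a Gaussian kernel instead of sampling a GP exactly, so it only mimics the squared-exponential covariance structure; all names and parameter values are illustrative:

```python
import math
import random

def smooth_series(n, length_scale, seed=0):
    """Smooth, GP-like series: white noise convolved with a Gaussian kernel."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    half = int(3 * length_scale)
    kernel = [math.exp(-0.5 * (k / length_scale) ** 2) for k in range(-half, half + 1)]
    z = sum(kernel)
    return [
        sum(kernel[j] * noise[min(max(i + j - half, 0), n - 1)] for j in range(len(kernel))) / z
        for i in range(n)
    ]

def inject_meanshift(series, start, end, shift):
    """A meanshift-style anomaly: add a constant offset inside [start, end)."""
    return [x + shift if start <= i < end else x for i, x in enumerate(series)]

ts = inject_meanshift(smooth_series(250, 10.0), 100, 140, 2.0)
```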
3.2 Performance Comparison
Since the detection of anomalous regions in spatio-temporal data is a detection rather than a classification task, we do not use the Area under the ROC Curve (AUC) as performance criterion, as many works on pointwise anomaly detection do, but quantify the performance in terms of Average Precision (AP) with an Intersection over Union (IoU) criterion that requires an overlap of at least 50% between a detection and the ground-truth interval.
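The IoU between two temporal intervals, used here as the correctness criterion, is straightforward to compute. A minimal sketch with half-open intervals:

```python
def interval_iou(a1, b1, a2, b2):
    """IoU of the intervals [a1, b1) and [a2, b2)."""
    inter = max(0, min(b1, b2) - max(a1, a2))
    union = (b1 - a1) + (b2 - a2) - inter
    return inter / union if union > 0 else 0.0

print(interval_iou(10, 20, 15, 25))  # overlap 5, union 15 -> 1/3
```

A detection is then counted as a true positive if its IoU with some ground-truth interval is at least 0.5, and AP is computed from the resulting precision-recall curve as usual.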
Hotelling’s T² [4] and Robust Kernel Density Estimation (RKDE) [3] are used as baselines for the comparison. For RKDE, a Gaussian kernel with a standard deviation of and the Hampel loss function are used. We obtain interval detections from these pointwise baselines by grouping contiguous detections based on multiple thresholds and applying non-maximum suppression afterwards. The overlap threshold for non-maximum suppression is set to 0 in all experiments so that only non-overlapping intervals are obtained. To be fair, MDI also has to compete with the baselines on the task they have been designed for, i.e., pointwise anomaly detection, measured by AUC. Interval detections can easily be converted to pointwise detections by assigning each sample the score of the interval it belongs to.
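One way to implement the grouping of contiguous pointwise detections into scored intervals is sketched below; names are illustrative, and scoring each interval by its maximum pointwise score is our assumption, not necessarily the exact baseline protocol:

```python
def group_detections(scores, threshold):
    """Group contiguous samples with score > threshold into intervals
    (start, end, interval_score), end exclusive."""
    intervals, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel flushes the tail
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            intervals.append((start, i, max(scores[start:i])))
            start = None
    return intervals

print(group_detections([0.1, 0.7, 0.9, 0.2, 0.8, 0.1], 0.5))
# [(1, 3, 0.9), (4, 5, 0.8)]
```

Repeating this for multiple thresholds and suppressing overlapping intervals (overlap threshold 0) yields the baseline interval detections described above.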
Figure 8 shows that the performance of the MDI algorithm using the Gaussian model is clearly superior to the baselines on the entire synthetic data set in terms of Mean AP and even on the task of pointwise anomaly detection measured by AUC. The polarity of the KL divergence has been chosen in all experiments following the argumentation in Section 2.5.3. In addition, the performance of the unbiased variant is reported for the Gaussian model. The parameters of time-delay embedding have been fixed to values which we have empirically found to be suitable for this data set. For KDE, we used a Gaussian kernel with a fixed bandwidth.
While MDI KDE is already superior to the baselines, it is significantly outperformed by MDI Gaussian, which improves on the best baseline by 286%. This discrepancy between the MDI algorithm using KDE and using Gaussian models is mainly due to time-delay embedding, which is particularly useful for the Gaussian model because, in contrast to KDE, it takes correlations between the variables into account. As can be seen in Figure 9, the Gaussian model would be worse than KDE and merely on par with the baselines without time-delay embedding.
Considering the Mean AP on this synthetic data set, the unbiased KL divergence did not perform better than the original KL divergence. However, on the test cases meanshift5, meanshift5_hard, and meanshift_hard it achieved an AP twice as high as that of the original KL divergence, which performed poorly on those data sets (see Figure 10). Since real data sets are also likely to contain multiple anomalies, we expect the unbiased variant to be a more reliable divergence measure in practice.
Another interesting result is that cross entropy was the best-performing divergence measure. This shows the advantage of reducing the impact of the inner distribution, which is estimated from very few samples. However, cross entropy may perform less reliably on real data whose entropy varies more widely over time than in this synthetic benchmark.
The Jensen-Shannon divergence performed best for the KDE method, but worst for the Gaussian model. This can be explained by the lack of a closed-form solution for the JS divergence, so that it has to be approximated from the data, while the KL divergence of two Gaussians can be computed exactly. The combination of the KL divergence with Gaussian models is thus not only beneficial with respect to the runtime of the algorithm, but also with respect to its detection performance.
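For reference, the closed-form KL divergence is easily written down in the univariate Gaussian case (the multivariate case is analogous, with the log-determinant and trace of the covariance matrices taking the place of the scalar variances):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians:
    0.5 * (log(var2/var1) + (var1 + (mu1 - mu2)^2) / var2 - 1)."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
```

No such closed form exists for the JS divergence of two Gaussians, which is why it has to be approximated from samples.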
The differences between the results in Figure 8 are significant at a level of 5% according to a permutation test.
3.3 Interval Proposals
In order not to sacrifice detection performance for the sake of speed, the interval proposal method described in Section 2.6 has to act as a high-recall system proposing the majority of anomalous intervals. This can be controlled to some degree by adjusting the threshold applied to the pointwise scores, which is defined in terms of the empirical mean and standard deviation of the pointwise scores and a single hyperparameter. To find a suitable value for this hyperparameter, we have evaluated the recall of the proposed intervals for different values, using the usual IoU measure for distinguishing between true and false positive detections. The results in Figure 10(a) show that time-delay embedding is of great benefit in this scenario too. Based on these results, we selected a value for the subsequent experiments that still provides a recall of 97% and already reduces the number of intervals to be analyzed in detail significantly.
Processing all 1100 time-series from the synthetic data set, which took 216 seconds on an Intel Core™ i7-3930K with 3.20 GHz and eight virtual cores using the Gaussian model and the unbiased KL divergence after the usual time-delay embedding, could be reduced to 5.2 seconds using interval proposals. This corresponds to a speed-up by a factor of more than 40.
Though impressive, this speed-up was expected. What was not expected, however, is that the use of interval proposals also increased the detection performance of the entire algorithm by up to 125%, depending on the divergence. The exact average precision achieved by the algorithm on the synthetic data set with a full scan over all intervals and with interval proposals is shown in Figure 10(b). This improvement is also reflected by the AUC scores (not reported here) and is, hence, not specific to the evaluation criterion. A possible explanation for this phenomenon is that some intervals that are uninteresting but distracting for the MDI algorithm are not even proposed for detailed analysis.
4 Application Examples on Real Data
The following application examples on real data from various domains are intended to complement the quantitative results presented above with a demonstration of the feasibility of our approach for real problems.
4.1 Detection of North Sea Storms
To demonstrate the efficiency of the MDI algorithm on long time-series, we apply it to storm detection in climate data: the coastDat1 hindcast [26] is a reconstruction of various marine climate variables measured at several locations over the southern North Sea between 51° N, 3° W and 56° N, 10.5° E with an hourly resolution over the 50 years from 1958 to 2007, i.e., approximately 450,000 time steps. Since measurements are not available at locations over land, we select the subset of the data between 53.9° N, 0° E and 56° N, 7.7° E, which results in a regular spatial grid located entirely over the sea (cf. Figure 12). Because cyclones and other storms usually have a large spatial extent and move over the region covered by the measurements, we reduce the spatio-temporal data to purely temporal data in this experiment by averaging over all spatial locations. The variables used for this experiment are significant wave height, mean wave period, and wind speed.
We apply the MDI algorithm to this data set using the Gaussian model and the unbiased KL divergence. Since North Sea storms lasting longer than 3 days are usually considered two independent storms, the maximum length of the possible intervals is set to 72 hours, while the minimum length is set to 12 hours. The parameters of time-delay embedding are kept fixed.
28 out of the top 50 and 7 out of the top 10 detections returned by the algorithm can be associated with well-known historic storms. The highest-scoring detection is the so-called “Hamburg-Flut”, which flooded one fifth of Hamburg in February 1962 and caused 340 deaths. Also among the top 5 is the “North Frisian Flood”, a severe surge in November 1981 that led to several dike breaches in Denmark.
A visual inspection of the remaining 22 detections revealed that almost all of them are North Sea storms as well. Only 4 of them are not storms, but the opposite: they span times of extremely calm sea conditions with nearly no wind and very low waves, which is a kind of anomaly as well.
A list of the top 50 detections can be found in Appendix B and animated heatmaps of the three variables during the detected timeframes are shown on our web page: http://www.infcv.unijena.de/libmaxdiv_applications.html.
Processing this comparatively long time-series using 8 parallel threads took 27 seconds. This time can be reduced further to half a second by using interval proposals, without changing the top 10 detections significantly. This supports the assumption that the novel proposal method performs well not only on synthetic, but also on real data.
4.2 SpatioTemporal Detection of Low Pressure Areas
As a genuine spatio-temporal use case, we have also applied the MDI algorithm to a time-series of daily sea-level pressure (SLP) measurements over the North Atlantic with a much wider spatial coverage than in the previous experiment. For this purpose, we selected a subset of the NCEP/NCAR reanalysis [27] covering the years from 1957 to 2011. This results in a time-series of about 20,000 days. The spatial resolution of 2.5° is rather coarse and the locations are organized in a regular grid covering the area between 25° N, 52.5° W and 65° N, 15° E.
Again, the MDI algorithm with the Gaussian model and the unbiased KL divergence is applied to this time-series to detect low-pressure fields, which are related to storms. Along the time dimension, we apply time-delay embedding and search for intervals of a size between 3 and 10 days. Concerning space, we do not apply any embedding for now and set a minimum spatial size, but no maximum. 7 out of the top 20 detections could be associated with known historic storms.
A visual inspection of the results shows that the MDI algorithm is not only capable of detecting occurrences of anomalous low-pressure fields over time, but also of locating them in space. This can be seen in the animations on our web page: http://www.infcv.unijena.de/libmaxdiv_applications.html. A few snapshots and a list of detections are also shown in Appendix C.
It is not necessary to apply spatial-neighbor embedding in this scenario, since we are not interested in spatial outliers, but only in the location of temporal outliers. We have also experimented with spatial-neighbor embedding, and it led to the detection of some high-pressure fields surrounded by low-pressure fields. Since high-pressure fields are both larger and more common in this time-series, they are not detected as temporal anomalies.
Since we did not set a maximum spatial extent for anomalous regions, the algorithm took 4 hours to process this spatio-temporal time-series. This could, however, be reduced to 22 seconds using our interval proposal technique, with only a minor loss of localization accuracy.
4.3 Stylistic Anomalies in Texts of Natural Language
By employing a transformation from the domain of natural language to real-valued features, the MDI algorithm can also be applied to written texts. One important task in Natural Language Processing (NLP) is, for example, the identification of paragraphs written in a different language than the remainder of the document. Such a segmentation can be used as a preprocessing step for the actual, language-specific processing.
In order to simulate such a scenario, we use a subset of the Europarl corpus [28], a sentence-aligned parallel corpus extracted from the proceedings of the European Parliament in 21 languages. The 33,334 English sentences from the COMTRANS subset of Europarl, which is bundled with the Natural Language Toolkit (NLTK) for Python, serve as a basis, and 5 random sequences of between 10 and 50 sentences are replaced by their German counterparts to create a semantically coherent mixed-language text.
We employ a simple transformation of sentences to feature vectors: since the distribution of letter frequencies varies across languages, each sentence is represented by a 27-dimensional vector whose first element is the average word length in the sentence and whose remaining 26 components are the absolute frequencies of the letters “a” to “z” (case-insensitive). German umlauts are ignored, since they would make the identification of German sentences too easy.
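This transformation is simple enough to sketch directly; the function name is ours:

```python
def sentence_features(sentence):
    """27-dimensional feature vector: average word length followed by
    the 26 absolute letter frequencies for 'a' to 'z' (case-insensitive).
    Non-ASCII letters such as German umlauts are ignored."""
    words = sentence.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    counts = [0] * 26
    for ch in sentence.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return [avg_len] + counts

v = sentence_features("The quick brown fox")
print(len(v), v[0])  # 27-dimensional vector, average word length 4.0
```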
The MDI algorithm using the unbiased KL divergence is then applied to the mixed-language text after the sentence-wise transformation to the feature space, searching for anomalous sequences of between 10 and 50 sentences. Because the number of features is quite high in relation to the number of samples in an interval, we use a global covariance matrix shared among the Gaussian models and do not apply time-delay embedding.
The top 5 detections returned by the algorithm correspond to the 5 German paragraphs that had been injected into the English text. The localization is quite accurate, though not perfect: on average, the boundaries of the detected paragraphs are off by 1.4 sentences from the ground truth. The next 5 detections are mainly tables and enumerations, which are also an anomaly compared with the usual dialog style of the parliament proceedings.
For this scenario, we had designed the features specifically for the task of language identification. To see what else would be possible with a smaller bias towards a specific application, we have also applied the algorithm to the 1st Book of Moses (Genesis) in the King James Version of the Bible, where we use word2vec [29] for word-wise feature embeddings. word2vec learns real-valued vector representations of words in such a way that the representations of words that occur more often in similar contexts have a smaller Euclidean distance. The embeddings used for this experiment have been learned from the Brown corpus using the continuous skip-gram model, and we have chosen a dimensionality of 50 for the vector space, which is rather low for word2vec models, but still tractable for the Gaussian probability density model. Words that have not been seen by the model during training are treated as missing values.
The top 10 detections of sequences of between 50 and 500 words according to the unbiased KL divergence are provided in Appendix D. The first five of those are, without exception, genealogies, which can indeed be considered anomalies, because they are long lists of names of fathers, sons, and wives, connected by repeating phrases. The 6th detection is a dialog between God and Abraham, in which Abraham bargains with God and tries to convince him not to destroy the town of Sodom. This episode is another example of a stylistic anomaly, since the dialog is a concatenation of very similar question-answer pairs with only slight modifications.
Due to the rather wide limits on the possible size of anomalous intervals, the analysis of the entire Book of Genesis, a sequence of 44,764 words, took a total of 9 minutes; interval proposals have not been used here.
4.4 Anomalies in Videos
The detection of unusual events in videos is another important task, e.g., in the domain of video surveillance or industrial control systems. Though videos are already represented as multivariate spatio-temporal time-series with usually 3 variables (the RGB channels), a semantically more meaningful representation can be obtained by extracting features from a Convolutional Neural Network (CNN).
In this experiment, we use a video of a traffic scene from the ViSOR repository [30]. It has a length of 60 seconds (1495 frames) and a rather low resolution. The video shows a street and a sidewalk with a varying frequency of cars crossing the captured area horizontally in both directions. At one point, a group of two pedestrians and one cyclist appears on the sidewalk and crosses the area from right to left at a low speed. Another sequence at the end of the video shows a single cyclist riding along the sidewalk in the opposite direction at a higher speed. Altogether, 26 seconds of the video contain moving objects and 34 seconds just show an empty street. The nominal state of the scene is hence ambiguous.
We extract features for each frame of the video from the conv5 layer of CaffeNet [31], which reduces the spatial resolution but increases the number of feature dimensions to 256. This rather large feature space is then reduced to 16 dimensions using PCA, and the MDI algorithm is applied to search for anomalous sub-blocks with a minimum spatial extent and a length between 3 and 12 seconds. The time-delay embedding parameters are chosen to capture half a second of context without increasing the number of dimensions too much. We apply the MDI algorithm with both the unbiased KL divergence and cross entropy as divergence measures. The Gaussian distribution model is employed in both cases.
The results (some snapshots are shown in Figure 13) exhibit an interesting difference between the two divergence measures: The KL divergence detects a subsequence of approximately 10 seconds where absolutely no objects cross the captured area. Thus, car traffic is identified as normal behavior and long spans of time without any traffic are considered as anomalous, because they have a very low entropy and the KL divergence penalizes the entropy of all other intervals, as opposed to cross entropy which does not take the entropy of the detected interval into account. Another detection occurs when the group of pedestrians enters the area. The localization, however, is rather fuzzy and spans nearly the entire frame. Cross entropy, on the other hand, seems to identify the state of low or no traffic as normal behavior and yields two detections at the beginning and the end of the video where the frequency of cars is higher than in the rest of the video. It detects the pedestrians too, but with a better localization accuracy. This detection, however, does not cover the entire sidewalk, since the pedestrians are moving from right to left and the algorithm is not designed for tracking moving anomalies.
Without using interval proposals, the comparatively high number of features combined with the large spatial search space would result in a processing time of 13 hours for this video. This can be reduced to 5 minutes using our novel interval proposal technique.
5 Summary and Conclusions
We have introduced a novel unsupervised algorithm for anomaly detection that is suitable for analyzing large multivariate time-series and can detect anomalous regions not only in temporal, but also in spatio-temporal data from various domains. The proposed MDI algorithm outperforms existing anomaly detection techniques, while being comparatively time-efficient thanks to an efficient implementation and a novel interval proposal technique that excludes uninteresting parts of the data from in-depth analysis. Moreover, we have exposed a bias of the Kullback-Leibler (KL) divergence towards smaller intervals and proposed an unbiased KL divergence that is superior when applied to real data. We have also investigated other divergence measures and found that the use of cross entropy can result in improved performance for data with a low variability of entropy.
Various experiments on data from different domains, including climate analysis, natural language processing and video surveillance, have shown that the algorithm proposed in this work can serve as a generic, unsupervised anomaly detection technique that can facilitate tasks such as process control, data analysis and knowledge discovery. These application examples emphasize the importance of intervalbased anomaly detection techniques, and we hope that our work is able to motivate further research in this area.
For processing data with a large spatial extent or a high number of dimensions, a full scan over all possible sub-blocks of the data would be prohibitively time-consuming. Therefore, we have introduced a novel interval proposal technique that can reduce computation time significantly. However, interval proposals usually lead to less accurate detections, which is particularly noticeable with regard to the spatial dimensions. Future work might hence investigate applying in-depth analysis not only to the proposed intervals themselves, but also to their neighborhood. An alternative might be a hierarchical approach of successive refinement.
Other open problems to be addressed in the future include efficient probability density estimation in the face of highdimensional data, the automatic determination of suitable parameters for timedelay embedding, and tracking anomalies moving in space over time. Furthermore, it is often necessary to convince the expert analyst that a detected anomaly really is an anomaly. Thus, future work will include the development of an attribution scheme that can explain which variables or combinations of variables caused a detection and why.
Acknowledgements
The support of the EU H2020-EO-2014 project BACI “Detecting changes in essential ecosystem and biodiversity properties – towards a Biosphere-Atmosphere Change Index”, contract 640176, is gratefully acknowledged.
References
 [1] G. Walker, “World weather,” Quarterly Journal of the Royal Meteorological Society, vol. 54, no. 226, pp. 79–87, 1928.
 [2] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” in ACM SIGMOD Record, vol. 29, no. 2. ACM, 2000, pp. 93–104.
 [3] J. Kim and C. D. Scott, “Robust kernel density estimation,” Journal of Machine Learning Research, vol. 13, no. Sep, pp. 2529–2565, 2012.
 [4] J. F. MacGregor and T. Kourti, “Statistical process control of multivariate processes,” Control Engineering Practice, vol. 3, no. 3, pp. 403–414, 1995.
 [5] B. Schölkopf, J. C. Platt, J. ShaweTaylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a highdimensional distribution,” Neural computation, vol. 13, no. 7, pp. 1443–1471, 2001.
 [6] E. Keogh, J. Lin, and A. Fu, “HOT SAX: Efficiently finding the most unusual time series subsequence,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2005.
 [7] H. Ren, M. Liu, X. Liao, L. Liang, Z. Ye, and Z. Li, “Anomaly detection in time series based on interval sets,” IEEJ Transactions on Electrical and Electronic Engineering, 2018.
 [8] S. Liu, M. Yamada, N. Collier, and M. Sugiyama, “Changepoint detection in timeseries data by relative densityratio estimation,” Neural Networks, vol. 43, pp. 72–83, 2013.
 [9] P. Senin, J. Lin, X. Wang, T. Oates, S. Gandhi, A. P. Boedihardjo, C. Chen, and S. Frankenstein, “Grammarviz 3.0: Interactive discovery of variablelength time series patterns,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 12, no. 1, p. 10, 2018.
 [10] M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos, “A general suspiciousness metric for dense blocks in multimodal data,” in IEEE International Conference on Data Mining (ICDM). IEEE, 2015, pp. 781–786.

 [11] E. Wu, W. Liu, and S. Chawla, “Spatio-temporal outlier detection in precipitation data,” in Knowledge Discovery from Sensor Data. Springer, 2010, pp. 115–133.
 [12] D. M. Hawkins, Identification of Outliers. Springer, 1980, vol. 11.

 [13] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
 [14] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, “Geometry from a time series,” Physical Review Letters, vol. 45, no. 9, p. 712, 1980.
 [15] F. Takens, “Detecting strange attractors in turbulence,” in Dynamical systems and turbulence. Springer, 1981, pp. 366–381.
 [16] A. Kut and D. Birant, “Spatiotemporal outlier detection in large databases,” CIT. Journal of computing and information technology, vol. 14, no. 4, pp. 291–297, 2006.
 [17] T. Cheng and Z. Li, “A multiscale approach for spatiotemporal outlier detection,” T. GIS, vol. 10, no. 2, pp. 253–263, 2006.
 [18] J. Duchi, “Derivations for linear algebra and optimization,” Berkeley, California, 2007.
 [19] N. A. Ahmed and D. Gokhale, “Entropy expressions and their estimators for multivariate distributions,” IEEE Transactions on Information Theory, vol. 35, no. 3, pp. 688–692, 1989.
 [20] E. Rodner, B. Barz, Y. Guanche, M. Flach, M. Mahecha, P. Bodesheim, M. Reichstein, and J. Denzler, “Maximally divergent intervals for anomaly detection,” in ICML Workshop on Anomaly Detection (ICMLWS), 2016.
 [21] T. W. Anderson, “An introduction to multivariate statistical analysis,” Wiley New York, Tech. Rep., 1962.
 [22] T. Kanungo and R. M. Haralick, “Multivariate hypothesis testing for gaussian data: Theory and software,” Citeseer, Tech. Rep., 1995.
 [23] J. Lin, “Divergence measures based on the shannon entropy,” IEEE Transactions on Information theory, vol. 37, no. 1, pp. 145–151, 1991.
 [24] J. R. Hershey and P. A. Olsen, “Approximating the Kullback-Leibler divergence between Gaussian mixture models,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4. IEEE, 2007, pp. IV–317.
 [25] C. Paciorek and M. Schervish, “Nonstationary covariance functions for gaussian process regression,” Advances in neural information processing systems (NIPS), vol. 16, pp. 273–280, 2004.
 [26] Helmholtz-Zentrum Geesthacht, Zentrum für Material- und Küstenforschung GmbH. (2012) coastDat1 waves north sea wave spectra hindcast (1948-2007). World Data Center for Climate (WDCC). [Online]. Available: https://doi.org/10.1594/WDCC/coastDat1_Waves
 [27] E. Kalnay, M. Kanamitsu, R. Kistler, W. Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen et al., “The ncep/ncar 40year reanalysis project,” Bulletin of the American meteorological Society, vol. 77, no. 3, pp. 437–471, 1996.
 [28] P. Koehn, “Europarl: A parallel corpus for statistical machine translation,” in MT summit, vol. 5, 2005, pp. 79–86.
 [29] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” Proceedings of Workshop at ICLR, 2013.
 [30] R. Vezzani and R. Cucchiara, “Video surveillance online repository (visor): an integrated framework,” Multimedia Tools and Applications, vol. 50, no. 2, pp. 359–380, 2010, video used in experiments: http://www.openvisor.org/video_details.asp?idvideo=339.

 [31] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014. Network architecture and parameters: https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet.
Appendix A Generalized extraction of the sum over a certain range from a higher-order tensor of cumulative sums
Theorem 1.
Let $X \in \mathbb{R}^{N_1 \times \dots \times N_D}$ be an order-$D$ data tensor and $C$ be the tensor of cumulative sums given by

$$C(i_1, \dots, i_D) = \sum_{j_1=1}^{i_1} \cdots \sum_{j_D=1}^{i_D} X(j_1, \dots, j_D), \qquad (24)$$

with the convention that $C(i_1, \dots, i_D) = 0$ if any index equals $0$. The sum over all elements of $X$ in the range $[a_1, b_1] \times \dots \times [a_D, b_D]$ can then be reconstructed from $C$ with $2^D - 1$ additions/subtractions according to

$$\sum_{j_1=a_1}^{b_1} \cdots \sum_{j_D=a_D}^{b_D} X(j_1, \dots, j_D) = \sum_{S \subseteq \{1, \dots, D\}} (-1)^{|S|} \, C(t_1^S, \dots, t_D^S), \quad t_d^S = \begin{cases} a_d - 1, & d \in S, \\ b_d, & d \notin S. \end{cases} \qquad (25)$$
Proof.
For the base case $D = 1$ it can easily be seen that $\sum_{j=a}^{b} X(j) = C(b) - C(a-1)$.

Now assume that Theorem 1 holds for $D - 1$. Applying it to the first $D - 1$ dimensions gives

$$\sum_{j_1=a_1}^{b_1} \cdots \sum_{j_D=a_D}^{b_D} X(j_1, \dots, j_D) = \sum_{S \subseteq \{1, \dots, D-1\}} (-1)^{|S|} \sum_{j_D=a_D}^{b_D} \tilde{C}(t_1^S, \dots, t_{D-1}^S, j_D),$$

where $\tilde{C}$ denotes the tensor of cumulative sums over the first $D - 1$ dimensions only. Since the first $D - 1$ indices of $\tilde{C}$ are fixed in the scope of the inner sum and only the last index varies, the base case for $D = 1$ can be applied to that inner sum expression, transforming the right-hand side of the equation into the right-hand side of (25).
∎
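The theorem can be checked numerically. The following sketch implements the two-dimensional case (the classic integral image, with 2² = 4 lookups per range sum); all names are illustrative:

```python
def cumsum2d(x):
    """Integral image with a zero-padded first row/column:
    c[i][j] = sum of x[:i][:j]."""
    rows, cols = len(x), len(x[0])
    c = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            c[i + 1][j + 1] = x[i][j] + c[i][j + 1] + c[i + 1][j] - c[i][j]
    return c

def range_sum2d(c, a1, b1, a2, b2):
    """Sum over x[a1:b1][a2:b2] via inclusion-exclusion: 4 lookups."""
    return c[b1][b2] - c[a1][b2] - c[b1][a2] + c[a1][a2]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
c = cumsum2d(x)
print(range_sum2d(c, 1, 3, 1, 3))  # 5 + 6 + 8 + 9 = 28
```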
Appendix B North Sea Storm Detections
Each heatmap shows the state of the three variables at the middle of the top 5 detected time frames. The static red box marks the spatial subset of the data that has been used for the experiment described in Section 4.1. Heatmaps are best viewed in color.
Animated heatmaps for more detections can be found on our web page: http://www.infcv.unijena.de/libmaxdiv_applications.html.
#  Timeframe  Score  Historical Storm 

1  1962-02-16 05:00 – 1962-02-18 06:00  831.635  Hamburg-Flut (Feb 16-17) 
2  1990-12-11 22:00 – 1990-12-13 10:00  824.840  Storm 1990/Dec (Dec 12) 
3  1965-02-13 01:00 – 1965-02-15 23:00  797.781  
4  2000-01-29 12:00 – 2000-01-31 02:00  796.528  Cyclone Kerstin (Jan 29-31) 
5  1981-11-23 19:00 – 1981-11-25 22:00  745.951  North Frisian Flood (Nov 24) 
6  1989-02-13 11:00 – 1989-02-16 10:00  714.684  Storm 1989/Feb (Feb 14) 
7  1988-02-28 04:00 – 1988-03-02 03:00  710.495  
8  1973-12-12 16:00 – 1973-12-15 15:00  673.886  Storm 1973/Dec (2) (Dec 13-15) 
9  1998-12-26 00:00 – 1998-12-28 05:00  658.592  Cyclone Stephen (Dec 26-27) 
10  1984-01-02 15:00 – 1984-01-05 14:00  658.306  
11  1977-11-13 00:00 – 1977-11-15 23:00  592.562  
12  1980-02-26 00:00 – 1980-02-28 23:00  573.913  
13  1999-02-04 15:00 – 1999-02-07 14:00  572.603  Storm 1999/Feb (Feb 05) 
14  2006-10-31 07:00 – 2006-11-01 21:00  560.806  Cyclone Britta (Oct 31 – Nov 01) 
15  1995-01-09 21:00 – 1995-01-12 20:00  554.777  
16  1983-01-17 20:00 – 1983-01-20 14:00  545.856  Storm 1983/Jan (Jan 17-20) 
17  1991-10-17 00:00 – 1991-10-19 23:00  537.879  
18  1996-11-05 07:00 – 1996-11-07 05:00  519.837  Storm 1996/Nov (Nov 05-07) 
19  1976-01-20 10:00 – 1976-01-23 02:00  508.532  Storm 1976/Jan (2) (Jan 21) 
20  1993-01-24 02:00 – 1993-01-27 01:00  506.250  Storm 1993/Jan (2) (Jan 22-25) 
21  1973-11-19 05:00 – 1973-11-20 16:00  494.595  Storm 1973/Nov (3) (Nov 19-20) 
22  1992-12-23 15:00 – 1992-12-26 14:00  491.438  
23  1977-12-29 22:00 – 1977-12-31 17:00  489.287  
24  2004-02-07 07:00 – 2004-02-09 23:00  485.346  Cyclone Ursula (Feb 07-08) 
25  1984-01-12 01:00 – 1984-01-15 00:00  485.231  Storm 1984/Jan (Jan 14) 
26  1991-01-04 19:00 – 1991-01-07 18:00  471.642  Storm Undine (Jan 02-09) 
27  1973-11-11 03:00 – 1973-11-14 02:00  471.398  Storm 1973/Nov (1) (Nov 13-14) 
28  2004-11-17 09:00 – 2004-11-20 08:00  460.155  
29  1973-12-04 09:00 – 1973-12-07 08:00  445.639  Storm 1973/Dec (1) (Dec 06-07) 
30  1961-03-26 13:00 – 1961-03-29 01:00  423.340  
31  1994-01-28 06:00 – 1994-01-31 05:00  420.520  Cyclone Lore (Jan 27-28) 
32  1980-04-19 03:00 – 1980-04-21 18:00  420.186  
33  1999-12-23 18:00 – 1999-12-26 05:00  418.482  Cyclone Lothar (Dec 25-26) 
34  1988-10-02 07:00 – 1988-10-05 02:00  412.905  
35  1970-10-19 02:00 – 1970-10-22 01:00  411.187  
36  2007-01-10 20:00 – 2007-01-13 18:00  407.993  Cyclone Franz (Jan 11) 
37  2007-11-07 11:00 – 2007-11-10 10:00  402.426  Cyclone Tilo (Nov 06-11) 
38  1990-09-19 07:00 – 1990-09-22 06:00  397.632  
39  1993-02-19 00:00 – 1993-02-21 23:00  387.598  Storm 1993/Feb (Feb 20-21) 
40  1998-10-24 11:00 – 1998-10-27 08:00  382.914  Cyclone Xylia (Oct 27-28) 
41  2003-12-12 15:00 – 2003-12-15 14:00  377.435  Cyclone Fritz (Dec 13-15) 
42  1991-12-23 06:00 – 1991-12-26 03:00  374.911  
43  2002-02-19 17:00 – 2002-02-22 16:00  374.026  Storm 2002/Feb (1) (Feb 21-23) 
44  1997-03-08 12:00 – 1997-03-11 11:00  371.758  
45  1959-02-17 02:00 – 1959-02-20 01:00  369.272  
46  1974-12-11 06:00 – 1974-12-14 05:00  365.459  
47  1994-12-06 02:00 – 1994-12-09 01:00  358.443  
48  2001-12-20 08:00 – 2001-12-23 07:00  357.280  
49  1992-11-18 14:00 – 1992-11-21 02:00  343.076  
50  2006-12-29 22:00 – 2007-01-01 21:00  340.920  Cyclone Karla (Dec 30-31) 
Appendix C Low Pressure Area Detections
Each row shows the heatmap of Sea Level Pressure at the beginning, the middle, and the end of the detection. The red box marks the detected area. Details on this experiment can be found in Section 4.2.
Heatmaps are best viewed in color. An animated video showing all detections can be found on our web page: http://www.inf-cv.uni-jena.de/libmaxdiv_applications.html.
#  Timeframe  Location  Score  Historical Storm 

1  1996-01-06 – 1996-01-15  40.0° N, −52.5° E – 65.0° N, 2.5° E  5940.691  
2  1990-01-28 – 1990-02-06  47.5° N, −52.5° E – 65.0° N, 7.5° E  5551.022  Storm Herta (Feb 01-05) 
3  1989-12-22 – 1989-12-31  45.0° N, −52.5° E – 65.0° N, 2.5° E  5198.513  
4  2009-01-18 – 2009-01-27  47.5° N, −52.5° E – 65.0° N, 15.0° E  4959.829  Cyclone Joris (Jan 23) 
5  1982-12-14 – 1982-12-23  50.0° N, −52.5° E – 65.0° N, 15.0° E  4811.575  
6  1990-12-25 – 1991-01-03  52.5° N, −52.5° E – 65.0° N, 15.0° E  4703.993  Storm Undine (Jan 02-09) 
7  1974-01-03 – 1974-01-12  47.5° N, −52.5° E – 65.0° N, 5.0° E  4594.737  
8  1986-12-08 – 1986-12-17  47.5° N, −52.5° E – 65.0° N, 2.5° E  4417.568  Storm 1986/Dec (Dec 14-15) 
9  1997-12-30 – 1998-01-08  50.0° N, −52.5° E – 65.0° N, 10.0° E  4377.532  Cyclone Fanny (Jan 03-05) 
10  1995-01-26 – 1995-02-04  47.5° N, −52.5° E – 65.0° N, 15.0° E  4376.735  
11  2006-12-03 – 2006-12-12  47.5° N, −52.5° E – 65.0° N, 15.0° E  4306.923  
12  1997-02-18 – 1997-02-27  52.5° N, −52.5° E – 65.0° N, 15.0° E  4249.087  
13  1958-01-04 – 1958-01-13  50.0° N, −52.5° E – 65.0° N, 15.0° E  4206.594  
14  1978-12-06 – 1978-12-15  45.0° N, −52.5° E – 65.0° N, 2.5° E  4151.843  
15  1976-12-01 – 1976-12-10  47.5° N, −52.5° E – 65.0° N, 15.0° E  4139.642  
16  1971-01-18 – 1971-01-27  45.0° N, −52.5° E – 65.0° N, 15.0° E  4030.477  
17  1992-11-29 – 1992-12-08  47.5° N, −52.5° E – 65.0° N, 15.0° E  3962.119  
18  1994-01-27 – 1994-02-05  50.0° N, −52.5° E – 65.0° N, 15.0° E  3933.832  Cyclone Lore (Jan 27-28) 
19  2007-12-02 – 2007-12-11  47.5° N, −52.5° E – 65.0° N, 15.0° E  3931.694  Cyclone Fridtjof (Dec 02-03) 
20  1959-12-18 – 1959-12-27  47.5° N, −52.5° E – 65.0° N, 15.0° E  3910.999 
Appendix D Top 10 Anomalous Paragraphs in the Book of Genesis
Each detected word sequence is shown with some context, colored in grey, before and after the detection. See Section 4.3 for details on this experiment.
The text has been taken from the “genesis” corpus included in the Natural Language Toolkit (NLTK) for Python and is not free of noise.
Detection #1: words 3218 – 3613 (Score: 56462.266)
and called their name Adam , in the day when they were created . And Adam lived an hundred and thirty years , and begat a son in his own likeness , and after his image ; and called his name Se And the days of Adam after he had begotten Seth were eight hundred yea and he begat sons and daughters : And all the days that Adam lived were nine hundred and thirty yea and he died . And Seth lived an hundred and five years , and begat Enos : And Seth lived after he begat Enos eight hundred and seven years , and begat sons and daughte And all the days of Seth were nine hundred and twelve years : and he died . And Enos lived ninety years , and begat Cainan : And Enos lived after he begat Cainan eight hundred and fifteen years , and begat sons and daughte And all the days of Enos were nine hundred and five years : and he died . And Cainan lived seventy years and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hundred and forty years , and begat sons and daughte And all the days of Cainan were nine hundred and ten years : and he died . And Mahalaleel lived sixty and five years , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred and thirty years , and begat sons and daughte And all the days of Mahalaleel were eight hundred ninety and five yea and he died . And Jared lived an hundred sixty and two years , and he begat Eno And Jared lived after he begat Enoch eight hundred years , and begat sons and daughte And all the days of Jared were nine hundred sixty and two yea and he died . And Enoch lived sixty and five years , and begat Methuselah : And Enoch walked with God after he begat Methuselah three hundred years , and begat sons and daughte And all the days of Enoch were three hundred sixty and five yea And Enoch walked with God : and he was not ; for God took him . And Methuselah lived an hundred eighty and seven years , and begat Lamech . 
And Methuselah lived after he begat Lamech seven hundred eighty and two years , and begat sons and daughte And all the days of Methuselah were nine hundred sixty and nine yea and he died . And Lamech lived an hundred eighty and two years , and begat a s And he called his name Noah , saying , This same shall comfort us concerning our work and toil of our hands , because of the ground which the LORD hath cursed .
Detection #2: words 30098 – 30568 (Score: 41058.093)
his house , and his cattle , and all his beasts , and all his substance , which he had got in the land of Canaan ; and went into the country from the face of his brother Jacob . For their riches were more than that they might dwell together ; and the land wherein they were strangers could not bear them because of their cattle . Thus dwelt Esau in mount Seir : Esau is Edom . And these are the generations of Esau the father of the Edomites in mount Se These are the names of Esau ’ s sons ; Eliphaz the son of Adah the wife of Esau , Reuel the son of Bashemath the wife of Esau . And the sons of Eliphaz were Teman , Omar , Zepho , and Gatam , and Kenaz . And Timna was concubine to Eliphaz Esau ’ s son ; and she bare to Eliphaz Amal these were the sons of Adah Esau ’ s wife . And these are the sons of Reuel ; Nahath , and Zerah , Shammah , and Mizz these were the sons of Bashemath Esau ’ s wife . And these were the sons of Aholibamah , the daughter of Anah the daughter of Zibeon , Esau ’ s wife and she bare to Esau Jeush , and Jaalam , and Korah . These were dukes of the sons of Esau : the sons of Eliphaz the firstborn son of Esau ; duke Teman , duke Omar , duke Zepho , duke Kenaz , Duke Korah , duke Gatam , and duke Amalek : these are the dukes that came of Eliphaz in the land of Edom ; these were the sons of Adah . And these are the sons of Reuel Esau ’ s son ; duke Nahath , duke Zerah , duke Shammah , duke Mizz these are the dukes that came of Reuel in the land of Edom ; these are the sons of Bashemath Esau ’ s wife . And these are the sons of Aholibamah Esau ’ s wife ; duke Jeush , duke Jaalam , duke Kor these were the dukes that came of Aholibamah the daughter of Anah , Esau ’ s wife . These are the sons of Esau , who is Edom , and these are their dukes . 
These are the sons of Seir the Horite , who inhabited the land ; Lotan , and Shobal , and Zibeon , and Anah , And Dishon , and Ezer , and Dishan : these are the dukes of the Horites , the children of Seir in the land of Edom . And the children of Lotan were Hori and Hemam ; and Lotan ’ s sister was Timna . And the children of Shobal were these ; Alvan , and Manahath , and Ebal , Shepho , and Onam . And these are the children of Zibeon ; both Ajah , and Anah : this was that Anah that found the mules in the wilderness , as he fed the asses of Zibeon his father . And the children of Anah were these ; Dishon , and Aholibamah the daughter of Anah . And these are the children of Dishon ; Hemdan , and Eshban , and Ithran , and Cheran .
Detection #3: words 7347 – 7684 (Score: 39679.642)
may not understand one another ’ s speech . So the LORD scattered them abroad from thence upon the face of all the ear and they left off to build the city . Therefore is the name of it called Babel ; because the LORD did there confound the language of all the ear and from thence did the LORD scatter them abroad upon the face of all the earth . These are the generations of Shem : Shem was an hundred years old , and begat Arphaxad two years after the flo And Shem lived after he begat Arphaxad five hundred years , and begat sons and daughters . And Arphaxad lived five and thirty years , and begat Salah : And Arphaxad lived after he begat Salah four hundred and three years , and begat sons and daughters . And Salah lived thirty years , and begat Eber : And Salah lived after he begat Eber four hundred and three years , and begat sons and daughters . And Eber lived four and thirty years , and begat Peleg : And Eber lived after he begat Peleg four hundred and thirty years , and begat sons and daughters . And Peleg lived thirty years , and begat Reu : And Peleg lived after he begat Reu two hundred and nine years , and begat sons and daughters . And Reu lived two and thirty years , and begat Serug : And Reu lived after he begat Serug two hundred and seven years , and begat sons and daughters . And Serug lived thirty years , and begat Nahor : And Serug lived after he begat Nahor two hundred years , and begat sons and daughters . And Nahor lived nine and twenty years , and begat Terah : And Nahor lived after he begat Terah an hundred and nineteen years , and begat sons and daughters . And Terah lived seventy years , and begat Abram , Nahor , and Haran . Now these are the generations of Terah : Terah begat Abram , Nahor , and Haran ; and Haran begat Lot . And Haran died before his father Terah in the land of his nativity , in Ur of the Chaldees . 
And Abram and Nahor took them wives : the name of Abram ’ s wife was Sarai ; and the name of Nahor ’ s wife , Milcah , the daughter of Haran , the father of Milcah , and the father of Iscah . But Sarai was barren ; she had no child .
Detection #4: words 30585 – 30993 (Score: 28796.840)
And these are the children of Zibeon ; both Ajah , and Anah : this was that Anah that found the mules in the wilderness , as he fed the asses of Zibeon his father . And the children of Anah were these ; Dishon , and Aholibamah the daughter of Anah . And these are the children of Dishon ; Hemdan , and Eshban , and Ithran , and Cheran . The children of Ezer are these ; Bilhan , and Zaavan , and Akan . The children of Dishan are these ; Uz , and Aran . These are the dukes that came of the Horites ; duke Lotan , duke Shobal , duke Zibeon , duke Anah , Duke Dishon , duke Ezer , duke Dishan : these are the dukes that came of Hori , among their dukes in the land of Seir . And these are the kings that reigned in the land of Edom , before there reigned any king over the children of Israel . And Bela the son of Beor reigned in Edom : and the name of his city was Dinhabah . And Bela died , and Jobab the son of Zerah of Bozrah reigned in his stead . And Jobab died , and Husham of the land of Temani reigned in his stead . And Husham died , and Hadad the son of Bedad , who smote Midian in the field of Moab , reigned in his ste and the name of his city was Avith . And Hadad died , and Samlah of Masrekah reigned in his stead . And Samlah died , and Saul of Rehoboth by the river reigned in his stead . And Saul died , and Baalhanan the son of Achbor reigned in his stead . And Baalhanan the son of Achbor died , and Hadar reigned in his ste and the name of his city was Pau ; and his wife ’ s name was Mehetabel , the daughter of Matred , the daughter of Mezahab . And these are the names of the dukes that came of Esau , according to their families , after their places , by their names ; duke Timnah , duke Alvah , duke Jetheth , Duke Aholibamah , duke Elah , duke Pinon , Duke Kenaz , duke Teman , duke Mibzar , Duke Magdiel , duke Iram : these be the dukes of Edom , according to their habitations in the land of their possessi he is Esau the father of the Edomites . 
And Jacob dwelt in the land wherein his father was a stranger , in the land of Canaan . These are the generations of Jacob . Joseph , being seventeen years old , was feeding the flock with his brethren ; and the lad was with the sons of Bilhah , and with the sons of Zilpah , his father ’ s wiv and Joseph brought unto his father their evil report .
Detection #5: words 40473 – 40821 (Score: 28436.096)
into Egypt , Jacob , and all his seed with h His sons , and his sons ’ sons with him , his daughters , and his sons ’ daughters , and all his seed brought he with him into Egypt . And these are the names of the children of Israel , which came into Egypt , Jacob and his so Reuben , Jacob ’ s firstborn . And the sons of Reuben ; Hanoch , and Phallu , and Hezron , and Carmi . And the sons of Simeon ; Jemuel , and Jamin , and Ohad , and Jachin , and Zohar , and Shaul the son of a Canaanitish woman . And the sons of Levi ; Gershon , Kohath , and Merari . And the sons of Judah ; Er , and Onan , and Shelah , and Pharez , and Zar but Er and Onan died in the land of Canaan . And the sons of Pharez were Hezron and Hamul . And the sons of Issachar ; Tola , and Phuvah , and Job , and Shimron . And the sons of Zebulun ; Sered , and Elon , and Jahleel . These be the sons of Leah , which she bare unto Jacob in Padanaram , with his daughter Din all the souls of his sons and his daughters were thirty and three . And the sons of Gad ; Ziphion , and Haggi , Shuni , and Ezbon , Eri , and Arodi , and Areli . And the sons of Asher ; Jimnah , and Ishuah , and Isui , and Beriah , and Serah their sist and the sons of Beriah ; Heber , and Malchiel . These are the sons of Zilpah , whom Laban gave to Leah his daughter , and these she bare unto Jacob , even sixteen souls . The sons of Rachel Jacob ’ s wife ; Joseph , and Benjamin . And unto Joseph in the land of Egypt were born Manasseh and Ephraim , which Asenath the daughter of Potipherah priest of On bare unto him . And the sons of Benjamin were Belah , and Becher , and Ashbel , Gera , and Naaman , Ehi , and Rosh , Muppim , and Huppim , and Ard . These are the sons of Rachel , which were born to Jacob : all the souls were fourteen . And the sons of Dan ; Hushim . And the sons of Naphtali ; Jahzeel , and Guni , and Jezer , and Shillem . These are the sons of Bilhah , which Laban gave unto Rachel his daughter , and she
Detection #6: words 12299 – 12486 (Score: 25531.722)
And Abraham answered and said , Behold now , I have taken upon me to speak unto the LORD , which am but dust and ash Peradventure there shall lack five of the fifty righteous : wilt thou destroy all the city for lack of five ? And he said , If I find there forty and five , I will not destroy it . And he spake unto him yet again , and said , Peradventure there shall be forty found there . And he said , I will not do it for forty ’ s sake . And he said unto him , Oh let not the LORD be angry , and I will spe Peradventure there shall thirty be found there . And he said , I will not do it , if I find thirty there . And he said , Behold now , I have taken upon me to speak unto the LO Peradventure there shall be twenty found there . And he said , I will not destroy it for twenty ’ s sake . And he said , Oh let not the LORD be angry , and I will speak yet but this on Peradventure ten shall be found there . And he said , I will not destroy it for ten ’ s sake . And the LORD went his way , as soon as he had left communing with Abrah and Abraham returned unto his place . And there came two angels to Sodom at even ; and Lot sat in the gate of Sod and Lot seeing them rose up
Detection #7: words 9069 – 9287 (Score: 25480.242)
Twelve years they served Chedorlaomer , and in the thirteenth year they rebelled . And in the fourteenth year came Chedorlaomer , and the kings that were with him , and smote the Rephaims in Ashteroth Karnaim , and the Zuzims in Ham , and the Emins in Shaveh Kiriathaim , And the Horites in their mount Seir , unto Elparan , which is by the wilderness . And they returned , and came to Enmishpat , which is Kadesh , and smote all the country of the Amalekites , and also the Amorites , that dwelt in Hazezontamar . And there went out the king of Sodom , and the king of Gomorrah , and the king of Admah , and the king of Zeboiim , and the king of Bela ( the same is Zoar ;) and they joined battle with them in the vale of Siddim ; With Chedorlaomer the king of Elam , and with Tidal king of nations , and Amraphel king of Shinar , and Arioch king of Ellasar ; four kings with five . And the vale of Siddim was full of slimepits ; and the kings of Sodom and Gomorrah fled , and fell there ; and they that remained fled to the mountain . And they took all the goods of Sodom and Gomorrah , and all their victuals , and went their way . And they took Lot , Abram ’ s brother ’ s son , who dwelt in Sodom , and his goods , and departed . And there came one that had escaped , and told Abram the Hebrew ; for he dwelt in the plain of Mamre the Amorite , brother of Eshcol , and brother of An and these were
Detection #8: words 6811 – 7104 (Score: 24522.926)
And the Arvadite , and the Zemarite , and the Hamathite : and afterward were the families of the Canaanites spread abroad . And the border of the Canaanites was from Sidon , as thou comest to Gerar , unto Gaza ; as thou goest , unto Sodom , and Gomorrah , and Admah , and Zeboim , even unto Lasha . These are the sons of Ham , after their families , after their tongues , in their countries , and in their nations . Unto Shem also , the father of all the children of Eber , the brother of Japheth the elder , even to him were children born . The children of Shem ; Elam , and Asshur , and Arphaxad , and Lud , and Aram . And the children of Aram ; Uz , and Hul , and Gether , and Mash . And Arphaxad begat Salah ; and Salah begat Eber . And unto Eber were born two sons : the name of one was Peleg ; for in his days was the earth divided ; and his brother ’ s name was Joktan . And Joktan begat Almodad , and Sheleph , and Hazarmaveth , and Jerah , And Hadoram , and Uzal , and Diklah , And Obal , and Abimael , and Sheba , And Ophir , and Havilah , and Jobab : all these were the sons of Joktan . And their dwelling was from Mesha , as thou goest unto Sephar a mount of the east . These are the sons of Shem , after their families , after their tongues , in their lands , after their nations . These are the families of the sons of Noah , after their generations , in their natio and by these were the nations divided in the earth after the flood . And the whole earth was of one language , and of one speech . And it came to pass , as they journeyed from the east , that they found a plain in the land of Shinar ; and they dwelt there .
Detection #9: words 11352 – 11512 (Score: 21242.191)
and I will make him a great nation . But my covenant will I establish with Isaac , which Sarah shall bear unto thee at this set time in the next year . And he left off talking with him , and God went up from Abraham . And Abraham took Ishmael his son , and all that were born in his house , and all that were bought with his money , every male among the men of Abraham ’ s house ; and circumcised the flesh of their foreskin in the selfsame day , as God had said unto him . And Abraham was ninety years old and nine , when he was circumcised in the flesh of his foreskin . And Ishmael his son was thirteen years old , when he was circumcised in the flesh of his foreskin . In the selfsame day was Abraham circumcised , and Ishmael his son . And all the men of his house , born in the house , and bought with money of the stranger , were circumcised with him . And the LORD appeared unto him in the plains of Mamre : and he sat in the tent door in the heat of the day ; And he lift up his eyes and looked , and , lo , three men stood by
Detection #10: words 22251 – 22399 (Score: 20712.606)
And give thee the blessing of Abraham , to thee , and to thy seed with thee ; that thou mayest inherit the land wherein thou art a stranger , which God gave unto Abraham . And Isaac sent away Jacob : and he went to Padanaram unto Laban , son of Bethuel the Syrian , the brother of Rebekah , Jacob ’ s and Esau ’ s mother . When Esau saw that Isaac had blessed Jacob , and sent him away to Padanaram , to take him a wife from thence ; and that as he blessed him he gave him a charge , saying , Thou shalt not take a wife of the daughers of Canaan ; And that Jacob obeyed his father and his mother , and was gone to Padanaram ; And Esau seeing that the daughters of Canaan pleased not Isaac his father ; Then went Esau unto Ishmael , and took unto the wives which he had Mahalath the daughter of Ishmael Abraham ’ s son , the sister of Nebajoth , to be his wife . And Jacob went out from Beersheba , and went toward Haran .