Time series clustering based on the characterisation of segment typologies

10/27/2018
by David Guijo-Rubio et al.
Universidad Pablo de Olavide

Time series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series with a standard clustering method. However, these approaches do not take into account the similarity of the different subsequences of each time series, which can be used to better compare the time series objects of the dataset. In this paper, we propose a novel time series clustering technique based on two clustering stages. In a first step, a least squares polynomial segmentation procedure is applied to each time series, based on a growing window technique that returns segments of different lengths. Then, all the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After this mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against two state-of-the-art methods, showing that the performance of this methodology is very promising.

I Related works

In this section, we begin by reviewing time series clustering methodologies and the main problems associated with them. Then, we analyse existing clustering evaluation metrics, defining those which are going to be used in our proposal. Finally, time series segmentation methods are also briefly reviewed, given that one such method is used as the first step of the proposed methodology.

I-A Time series clustering

Many works have been proposed for time series clustering, although their objectives can be very different. Indeed, time series clustering can be classified into three categories [4]:

  • Whole time series clustering defines each time series as a discrete object and clusters a set of time series measuring their similarity and applying a conventional clustering algorithm.

  • Subsequence clustering is considered as the clustering of segments obtained from a time series segmentation algorithm. One of its main advantages is that it can discover patterns within each time series.

  • Time point clustering combines the temporal proximity of time points with the similarity between their corresponding values.

We focus on whole time series clustering, which can be applied in three different ways [4]:

  • Shape-based approach: This method works with the raw time series data, matching the shapes of the different time series as well as possible. An appropriate distance measure, specifically adapted to time series, has to be used. Then, a conventional clustering algorithm is applied. An example of this approach is the one proposed by Paparrizos et al. [27], which uses a normalised version of the cross-correlation measure (in order to consider the time series shapes) and a method to compute cluster centroids based on the properties of this distance. Policker et al. [28] presented a model and a set of algorithms for estimating the parameters of a non-stationary time series. This model uses a time-varying mixture of stationary sources, similar to hidden Markov models (HMMs). Also, Asadi et al. [29] proposed a new method based on HMM ensembles, addressing the problem that HMM-based methods have in separating models of distinct classes.

  • Feature-based approach: In this case, time series are transformed into a set of statistical characteristics, whose length is smaller than that of the original time series. Each time series is converted into a feature vector of the same length, a standard distance measure is computed, and a clustering algorithm is applied. An example of this approach was presented by Räsänen et al. [30], based on an efficient computational method for statistical feature-based clustering. Möller-Levet et al. [31] developed a fuzzy clustering algorithm based on the short time series (STS) distance, this method being highly sensitive to scale. Hautamaki et al. [32] proposed clustering raw time series using the dynamic time warping (DTW) distance with hierarchical and partitional clustering algorithms. The problem of DTW is that it can be sensitive to noise.

  • Model-based approach: Raw time series are converted into a set of model parameters, followed by a model distance measurement and a classic clustering algorithm. McDowell et al. [33] presented a model-based method, the Dirichlet process Gaussian process mixture model (DPGP), which jointly models the cluster number with a Dirichlet process and temporal dependencies with Gaussian processes, demonstrating its accuracy on simulated gene expression datasets. Xiong et al. [34] used a model consisting of mixtures of autoregressive moving average (ARMA) models. This method involves a difficult parameter initialisation for the expectation maximisation (EM) algorithm. In general, model-based approaches suffer from scalability issues [35]. Yang et al. [36] presented an unsupervised ensemble learning approach to time series clustering using a combination of RPCL (rival penalized competitive learning) with other representations.

Many of the proposals for time series clustering are based on the combination of a distance measure and a clustering algorithm. First, we will analyse the most important distance measures proposed for time series comparison, and then we will introduce the clustering methods that can be applied based on them (further information about time series clustering can be found in [3] or [4]).

I-A1 Distance measures for time series

Two of the most important distance metrics for time series comparison are the Euclidean distance (ED) [13] and dynamic time warping (DTW) [14, 15]. The first one, ED, compares two time series, $Q = (q_1, \ldots, q_n)$ and $C = (c_1, \ldots, c_n)$, of length $n$, as follows:

$\mathrm{ED}(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^{2}}.$   (1)

As can be seen, ED forces both series to have the same length. In contrast, DTW follows the main idea of ED, but applies a local non-linear alignment. This alignment is achieved by deriving an $n \times n$ matrix whose $(i, j)$-th element is the ED between the points $q_i$ and $c_j$. Then, a warping path, $W = \{w_1, \ldots, w_K\}$, is calculated from this matrix. Using dynamic programming [37], the warping path can be computed on the matrix such that the following condition is satisfied [14]:

$\mathrm{DTW}(Q, C) = \min_{W} \sqrt{\sum_{k=1}^{K} w_k}.$   (2)

A popular and widely applied alternative is to constrain the warping path so that it only visits a reduced number of cells of the matrix [16].

Recently, Wang et al. [12] evaluated distance measures and showed that DTW is the most accurate of the measures considered, while ED is the most efficient one.
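As an illustration (not part of the original paper), the following Python sketch computes ED and an unconstrained dynamic-programming DTW; the function names and the squared local cost are our own choices.

```python
import numpy as np

def euclidean_distance(q, c):
    """ED between two equal-length series (Eq. 1)."""
    q, c = np.asarray(q, float), np.asarray(c, float)
    return np.sqrt(np.sum((q - c) ** 2))

def dtw_distance(q, c):
    """Classic O(n*m) dynamic-programming DTW (Eq. 2), without a warping-window constraint."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2          # local distance between points
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```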

Moreover, new distance measures have arisen in recent years. Łuczak et al. [18] constructed a new parametric distance function, combining DTW and the derivative DTW distance (DDTW) [38] (which is computed as the DTW distance over the derivatives of the time series), where a single real-valued parameter, $\alpha$, controls the contribution of each of the two measures to the combined distance. This distance between time series $Q$ and $C$ is defined as follows:

$D_{\alpha}(Q, C) = (1 - \alpha)\,\mathrm{DTW}(Q, C) + \alpha\,\mathrm{DDTW}(Q, C),$   (3)

where $\alpha \in [0, 1]$ is a parameter selected by considering the best value of an internal evaluation measure known as inter-group variance. This combined metric is shown to outperform both DTW and DDTW, because it retains the advantages of both.

Another state-of-the-art distance measure, based on invariance to the scale and translation of the time series, was proposed by Yang et al. [19]. This distance between time series $x$ and $y$ is defined as follows:

$\hat{d}(x, y) = \min_{\alpha, q} \frac{\lVert x - \alpha\, y_{(q)} \rVert}{\lVert x \rVert},$   (4)

where $y_{(q)}$ is the time series $y$ shifted by $q$ time units, $\lVert \cdot \rVert$ is the $\ell_2$ norm, and $\alpha$ is the scaling coefficient, which can be adjusted to its optimal value by setting the gradient to zero.

I-A2 Clustering algorithms

Clustering is a field of data mining based on discovering groups of objects without any form of supervision.

Among the most used methodologies, hierarchical clustering [39] is based on an agglomerative or a divisive algorithm. The agglomerative approach starts by considering each element as a single cluster and, at each iteration, the most similar pair of clusters is merged. On the contrary, the divisive algorithm starts with all elements in a single cluster and, at each iteration, clusters are divided into smaller subgroups.

On the other hand, partitional clustering [39] divides the data into k clusters, where each cluster contains at least one element of the dataset. The idea behind this kind of clustering is to minimise the average distance of the elements to the cluster centre (also called prototype). Depending on the prototype, there are different algorithms: (1) k-means [40] uses centroids, i.e. the prototype is the average of the cluster objects and does not have to be an object belonging to the dataset; (2) k-medoids [32, 41] uses an object of the cluster as the prototype.
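For illustration only, both families can be exercised in a few lines of Python with SciPy and scikit-learn; the toy data, linkage criterion and number of clusters below are arbitrary choices of this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.rand(100, 8)                # toy feature vectors (e.g. mapped time series)

# Agglomerative hierarchical clustering: the most similar pair of clusters is merged at each step.
Z = linkage(X, method='average')
hier_labels = fcluster(Z, t=3, criterion='maxclust')

# Partitional clustering with k-means: prototypes are centroids (averages of the cluster members).
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```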

There are also some specific proposals for time series clustering. For example, Wang et al. [42] proposed a method for clustering time series based on their structural characteristics, introducing the following set of features: trend, seasonality, serial correlation, chaos, non-linearity and self-similarity.

I-B Clustering evaluation measures

Evaluating the extracted clusters is not a trivial task and has been extensively researched [43]. In this paper, we focus on numerical measures, which are applied to judge various aspects of cluster validity [44].

Different clustering algorithms produce different clusters and different clustering structures, so it is quite important to evaluate clustering results objectively and quantitatively. There are two different testing criteria [45]: external criteria and internal criteria. External criteria use class labels (also known as the ground truth) to evaluate the assigned labels; note that the ground truth is not used during the clustering itself. On the other hand, internal criteria evaluate the goodness of a clustering structure without reference to external information.

I-B1 Internal metrics

Among the different internal criteria, the most important ones are [46]:

  • Sum of squared error (SSE): This index measures the compactness of a given clustering, independently of the distance to other clusters. “Better” clusterings have lower values of SSE. It is defined as:

    $\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^{2},$   (5)

    where $C_k$ is the $k$-th cluster and $c_k$ is its centroid.
  • Normalised sum of squared error (NSSE): This measure looks for compact and well-separated groups, also taking the separation between clusters into account. This can be done by considering the following expression:

    (6)
  • Caliński and Harabasz (CH) [47]: This index is defined as the ratio between the dispersion between clusters and the dispersion within clusters:

    $\mathrm{CH} = \frac{B / (K - 1)}{W / (N - K)},$   (7)

    where $N$ is the number of time series and $K$ is the number of clusters. Moreover, $B$ and $W$ are given by:

    $B = \sum_{k=1}^{K} N_k \, \lVert c_k - \bar{c} \rVert^{2},$   (8)
    $W = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^{2},$   (9)

    where $N_k$ is the number of time series that belong to cluster $C_k$, $c_k$ is the centroid of cluster $C_k$, and $\bar{c}$ is the mean of all the time series.

  • Silhouette index (SI) [48]: This measure combines both cohesion and separation, so it is based on the intra-cluster ($a$) and inter-cluster ($b$) distances, respectively. These distances are given as follows:

    $a(i) = \frac{1}{N_k - 1} \sum_{x_j \in C_k,\, j \neq i} \mathrm{ED}(x_i, x_j), \quad x_i \in C_k,$   (10)
    $b(i) = \min_{l \neq k} \frac{1}{N_l} \sum_{x_j \in C_l} \mathrm{ED}(x_i, x_j),$   (11)

    where $\mathrm{ED}(x_i, x_j)$ is the Euclidean distance between the $i$-th and $j$-th time series, as defined before. Finally, the SI index is defined as:

    $\mathrm{SI} = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}.$   (12)
  • Davies-Bouldin (DB) [49]: The validation of clustering following this measure tries to find compact clusters whose centroids are far away from each other. This index is defined as:

    $\mathrm{DB} = \frac{1}{K} \sum_{k=1}^{K} \max_{l \neq k} \frac{s_k + s_l}{\mathrm{ED}(c_k, c_l)},$   (13)

    where $s_k$ is the average distance of all elements in cluster $C_k$ to its centroid $c_k$, and $\mathrm{ED}(c_k, c_l)$ is the Euclidean distance between the centroids $c_k$ and $c_l$.

  • Dunn index (DU) [50]: The Dunn index rewards compact and well-separated clusters. For a set of $K$ clusters, it is defined as:

    $\mathrm{DU} = \min_{1 \leq k \leq K} \left\{ \min_{l \neq k} \frac{\delta(C_k, C_l)}{\max_{1 \leq m \leq K} \Delta(C_m)} \right\},$   (14)-(15)

    where $\delta(C_k, C_l)$ is the dissimilarity between clusters $C_k$ and $C_l$, and $\Delta(C_m)$ is the diameter of cluster $C_m$, which are given as follows:

    $\delta(C_k, C_l) = \min_{x_i \in C_k,\, x_j \in C_l} \mathrm{ED}(x_i, x_j),$   (16)
    $\Delta(C_m) = \max_{x_i, x_j \in C_m} \mathrm{ED}(x_i, x_j).$   (17)

    The Dunn index is very sensitive to noise, and different variants have been considered. We chose the three variants that obtained the best results in [46], where they are referred to as GD33, GD43 and GD53. These variants redefine $\delta(C_k, C_l)$ as follows:

    (18)
    (19)
    (20)

    For the last variant (GD53), a new definition of $\Delta(C_m)$ is also included:

    (21)

    which is based on the Point Symmetry-Distance between an object and a cluster (for further information, see [46]).

  • COP index (COP): This index uses the distance from the points to their cluster centroids and the furthest neighbour distance. The equation is the following:

    (22)

CH, SI, COP, DU and its variants have to be maximised. Conversely, DB, SSE and NSSE have to be minimised. The most common measures in the literature are CH, DU and SSE. The work of Arbelaitz et al. [46] compares 30 cluster validity indices in many different environments and shows that CH and DU behave better than the other indices.
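As an illustration (not part of the original paper), several of these internal indices are directly available in scikit-learn; the toy data and the choice of four clusters below are arbitrary.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score, silhouette_score, davies_bouldin_score

X = np.random.rand(200, 10)                             # toy data
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)

print('CH:', calinski_harabasz_score(X, labels))        # higher is better
print('SI:', silhouette_score(X, labels))               # higher is better
print('DB:', davies_bouldin_score(X, labels))           # lower is better
```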

I-B2 External metrics

On the other hand, external indices measure the similarity between the cluster assignment and the ground truth, which has to be available for evaluation but must not be used during the clustering. There are many such metrics in the literature [51], although the most widely used is the Rand index (RI) [52]. This measure penalises false positive and false negative decisions during clustering. RI is given as:

$\mathrm{RI} = \frac{TP + TN}{TP + TN + FP + FN},$   (23)

where $TP$ is the number of pairs of time series that are assigned to the same cluster and belong to the same class (according to the ground truth), $TN$ is the number of pairs that are assigned to different clusters and belong to different classes, $FN$ is the number of pairs that are assigned to different clusters but belong to the same class, and $FP$ is the number of pairs that are assigned to the same cluster but belong to different classes.
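A minimal Python sketch of this pair-counting definition (our own illustration) is:

```python
from itertools import combinations

def rand_index(labels_pred, labels_true):
    """Plain Rand index (Eq. 23): fraction of pairs on which the clustering and the ground truth agree."""
    agree, total = 0, 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_cluster = labels_pred[i] == labels_pred[j]
        same_class = labels_true[i] == labels_true[j]
        agree += (same_cluster == same_class)     # counts both TP and TN pairs
        total += 1
    return agree / total

# Example: rand_index([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
```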

I-C Time series segmentation

One of the steps of our proposal is based on dividing each time series into a sequence of segments. This is known as time series segmentation, which consists in cutting the time series at some specific points, trying to achieve different objectives, where, as mentioned before, the two main points of view are:

  • Discovering similar patterns: The main objective is the discovery and characterisation of important events in the time series, by obtaining similar segments. The methods of Chung et al. [20], Tseng et al. [21] and Nikolaou et al. [10] are all based on evolutionary algorithms, given the large size of the search space when deciding the cut points.

  • Approximating the time series by a set of simple models, e.g. linear interpolation or polynomial regression: These methods could also be considered as representation methods. Their main goal is to summarise a single time series, in order to reduce the difficulty of processing, analysing or exploring large time series, approximating the segments obtained by simple models. Keogh et al. [26] proposed some methods which use linear interpolations between the cut points. Oliver et al. [23, 24] developed a method that detects points with high variation and then replaces each segment with the corresponding approximation. Finally, the method proposed by Fuchs et al. [25] is a growing-window procedure (known as SwiftSeg), which returns unequal-length segments based on an online method. SwiftSeg is very fast, simultaneously obtaining a segmentation of the time series and the coefficients of the polynomial least squares approximation, with a computational cost that depends only on the degree of the polynomial instead of the window length. When compared to many other segmentation methods, SwiftSeg is shown to be very accurate while involving a low computational cost [25].

II A two-stage statistical segmentation-clustering time series procedure (TS3C)

Given a time series clustering dataset, $D = \{s_1, \ldots, s_N\}$, where $s_i$ is a time series of length $l_i$, the objective of the proposed algorithm is to organise the time series into $k$ groups, $\{G_1, \ldots, G_k\}$, optimising the clustering quality, where $k < N$ and each time series is assigned to exactly one group.

The algorithm is based on two well-identified stages. The first stage is applied individually to each time series and acts as a dimensionality reduction. It consists of a segmentation procedure and a clustering of segments, discovering common patterns of each time series. The second clustering stage is applied to the mapped time series to discover the groups. The main steps of the algorithm are summarized in Fig. 1.

Algorithm: Time series clustering (TS3C)
Require: Time series dataset
Ensure: Best quality clustering
1:  for Each time series do
2:     Apply time series segmentation
3:     for Each segment do
4:        Extract the coefficients of the segment
5:        Compute the statistical features
6:        Combine the coefficients and the statistical features into a single array
7:     end for
8:     Cluster all the mapped segments
9:     Based on the previous clustering, map each time series
10:  end for
11:  Cluster mapped time series
12:  Evaluate the goodness of the clustering
13:  return  Best quality clustering
Fig. 1: Main steps of the TS3C algorithm.

II-A First stage

The first stage of TS3C consists of a time series segmentation, the extraction of statistical features of each segment, and the clustering of the segments for each time series. The steps of the first stage can be checked in Figure 2.

Fig. 2: The first stage consists of three steps, applied to each time series of the database: firstly, a segmentation procedure is applied to the time series. Then, the extracted segments are mapped into a common feature space. Finally, these arrays are clustered into groups.

II-A1 Time series segmentation

In general, segmentation problems consist in discovering cut points in the time series in order to achieve different objectives. For a given time series $s$ of length $l$, the segmentation consists in finding $m$ segments defined by $m - 1$ cut points. Specifically, in this paper, we apply SwiftSeg, a growing-window procedure proposed in [25]. The algorithm iteratively introduces points of the time series into a growing window and simultaneously updates the corresponding least-squares polynomial approximation of the segment and its error. The window grows until an error threshold is exceeded. When this happens, a cut point is included and the segment is finished. The process is repeated until reaching the end of the time series. We consider the following error function (standard error of prediction, SEP):

$\mathrm{SEP}_j = \frac{\sqrt{\mathrm{SSE}_j / n_j}}{\lvert \bar{y}_j \rvert},$   (24)

where $\mathrm{SSE}_j$ stands for the sum of squared errors of segment $j$, $n_j$ is its number of points, and $\bar{y}_j$ is the average value of segment $j$. $\mathrm{SSE}_j$ and $\bar{y}_j$ are defined as:

$\mathrm{SSE}_j = \sum_{t \in s_j} (y_t - \hat{y}_t)^{2},$   (25)
$\bar{y}_j = \frac{1}{n_j} \sum_{t \in s_j} y_t,$   (26)

where $y_t$ is the time series value at time $t$, and $\hat{y}_t$ is its corresponding least-squares polynomial approximation. The advantage of this error function is that it does not take into account the scale of the values of each segment. The maximum error from which the window is not further grown is denoted as $\varepsilon$ and has to be defined by the user.
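The Python sketch below (not in the original paper) illustrates the growing-window idea with this SEP criterion; unlike SwiftSeg, it naively re-fits the polynomial at every step instead of using recursive updates, and the names `grow_window_segmentation`, `max_sep` and `min_len` are ours.

```python
import numpy as np

def grow_window_segmentation(y, max_sep, degree=2, min_len=5):
    """Grow a window until the scale-independent SEP of its polynomial fit exceeds max_sep."""
    y = np.asarray(y, float)
    n = len(y)
    cut_points, start = [], 0
    end = start + min_len
    while end < n:
        t = np.arange(start, end)
        coefs = np.polyfit(t, y[start:end], degree)            # least-squares polynomial fit
        residuals = y[start:end] - np.polyval(coefs, t)
        sse = np.sum(residuals ** 2)
        sep = np.sqrt(sse / len(t)) / (abs(np.mean(y[start:end])) + 1e-12)   # Eq. (24)
        if sep > max_sep:
            cut_points.append(end)                             # close the current segment
            start, end = end, end + min_len                    # start a new window
        else:
            end += 1                                           # grow the window by one point
    return cut_points                                          # the last (partial) segment is implicit
```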

II-A2 Segment mapping

After the segmentation process, each segment is mapped to an array including the polynomial coefficients of the least-squares approximation of the segment and a set of statistical features. Thus, each segment is projected into a space of fixed dimension, given by the length of the mapped segment.

The coefficients are directly obtained from the update procedure of the growing-window segmentation specified in [25]. We discard the intercept, given that we are interested in the shape of the segment, not in its relative value.

Moreover, we compute the following statistical features:

  1. The variance ($S_j^{2}$) measures the variability of the segment:

    $S_j^{2} = \frac{1}{n_j} \sum_{t \in s_j} (y_t - \bar{y}_j)^{2},$   (27)

    where $y_t$ are the time series values of the segment, and $\bar{y}_j$ is the average of the values of segment $j$.

  2. The skewness ($\gamma_j$) represents the asymmetry of the distribution of the time series values in the segment with respect to the arithmetic mean:

    $\gamma_j = \frac{\frac{1}{n_j} \sum_{t \in s_j} (y_t - \bar{y}_j)^{3}}{S_j^{3}},$   (28)

    where $S_j$ is the standard deviation of the $j$-th segment.

  3. The autocorrelation coefficient ($\mathrm{AC}_j$) is a measure of the correlation between the current values of the time series and the previous ones:

    $\mathrm{AC}_j = \frac{\sum_{t} (y_t - \bar{y}_j)(y_{t+1} - \bar{y}_j)}{\sum_{t} (y_t - \bar{y}_j)^{2}}.$   (29)

Using these statistical features and the coefficients previously extracted, each segment is mapped into a $(d + v)$-dimensional array, which is used as the segment representation, where $d$ is the degree of the polynomial and $v$ is the number of statistical features ($v = 3$, in our case). The mapping is then defined by:

$s_j \mapsto (a_1, \ldots, a_d, S_j^{2}, \gamma_j, \mathrm{AC}_j),$   (30)

where $a_1, \ldots, a_d$ are the coefficients of the polynomial approximation of segment $s_j$ (excluding the intercept). This procedure reduces the length of the segment from $n_j$ to $d + v$.
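A minimal Python sketch of this per-segment mapping (our own illustration; the helper name `map_segment` is ours) could look as follows. It assumes the intercept is the last coefficient returned by `np.polyfit` and uses a lag-1 autocorrelation.

```python
import numpy as np
from scipy.stats import skew

def map_segment(y_seg, degree=2):
    """Map one segment to polynomial coefficients (intercept dropped) plus variance,
    skewness and lag-1 autocorrelation, as in Eq. (30)."""
    y_seg = np.asarray(y_seg, float)
    t = np.arange(len(y_seg))
    coefs = np.polyfit(t, y_seg, degree)[:-1]          # drop the constant term (intercept)
    var = np.var(y_seg)                                # Eq. (27)
    skw = skew(y_seg)                                  # Eq. (28)
    centred = y_seg - y_seg.mean()
    ac1 = np.sum(centred[:-1] * centred[1:]) / np.sum(centred ** 2)   # Eq. (29), lag 1
    return np.concatenate([coefs, [var, skw, ac1]])
```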

II-A3 Segment clustering

A hierarchical clustering is subsequently applied to group all the segments of a time series, represented by their mapped arrays. The main goal is to represent all the time series with arrays of the same length, significantly reducing the size of the representation.

The hierarchical clustering used is agglomerative, with the Ward distance defined in [53] as the similarity measure. The number of clusters considered for segment grouping is fixed to the same value for all the datasets and time series. This value is found to be robust enough for extracting a minimum amount of information about the internal characteristics of the series.
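For illustration only, such a Ward-linkage clustering of the mapped segments of one series could be written as follows; the value `n_clusters=2` is an arbitrary default of this sketch, not necessarily the setting used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_segments(mapped_segments, n_clusters=2):
    """Agglomerative clustering of the mapped segments of one time series using Ward linkage."""
    S = np.asarray(mapped_segments, float)
    Z = linkage(S, method='ward')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```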

II-B Second stage

The second stage of the proposed method consists of mapping the time series to a common representation, clustering them and evaluating the quality of the resulting clustering. The steps of the second stage are summarised in Figure 3.

Fig. 3: The second stage consists of four steps: firstly, each cluster is represented by a set of statistical features, which, in conjunction, represent the mapped time series. Then, a clustering process is applied to the mapped time series. After that, the clustering quality is measured, using different strategies based on internal indices to choose the best configuration of the error threshold. Finally, an external index compares our approach to the ground truth.

II-B1 Time series mapping

The first stage transforms each time series into a set of clustered segments. Now, a specific mapping process is used to represent all time series in the same dimensional space.

For each time series, $s_i$, we extract the corresponding centroids from the clustering process described in Section II-A3, with $i = 1, \ldots, N$ and $j = 1, \ldots, b$, $b$ being the number of clusters and $N$ being the number of time series. For each cluster, $G_{ij}$, we extract:

  • Its centroid, $c_{ij}$, i.e. the average of all the cluster points.

  • The mapping of the segment with the highest variance, denoted as $e_{ij}$ (in order to represent the extreme segments, i.e. the most characteristic segment of the cluster $G_{ij}$).

In this way, the length of the mapped cluster is $2(d + v)$, where $d + v$ is the length of both the centroid and the extreme segment. This process is applied to each cluster of each time series. The mapping process of a cluster can be formally specified as:

$G_{ij} \mapsto (c_{ij}, e_{ij}).$   (31)

Apart from the representation of each cluster, two more characteristics of the time series are also considered:

  • The error difference between the segment least similar to its centroid (farthest segment) and the segment most similar to its centroid (closest segment). We evaluate the error of a segment by using the mean squared error (MSE) of the corresponding polynomial approximation.

  • The number of segments of the time series.

The order in which the clusters are arranged in the mapping is important and has to be consistent along all the time series. This is done by a simple matching procedure, where the centroids of one time series are used as reference, and, for the rest of time series, the closest centroids with respect to the reference ones are matched together.

Once the matching is defined, each time series $s_i$ is transformed into a mapped time series, composed of the characteristics of the extracted clusters. Thus, the length of a mapped time series is $2(d + v)\, b + e$, $b$ being the number of clusters and $e$ being the number of extra characteristics of the time series, which is 2 in our case.
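The sketch below (our own, not the paper's code) illustrates this second-stage mapping for one series. It assumes the per-segment layout produced by `map_segment` above (variance in the third-from-last column), picks the extreme segment through that variance feature, matches cluster order against a reference series by nearest centroid, and omits the two extra characteristics (error difference and number of segments).

```python
import numpy as np

def map_time_series(segment_features, labels, reference_centroids=None):
    """Represent one time series by (centroid, extreme segment) per cluster, in a matched order."""
    segment_features = np.asarray(segment_features, float)
    labels = np.asarray(labels)
    clusters = sorted(set(labels))
    centroids = np.array([segment_features[labels == c].mean(axis=0) for c in clusters])

    order = list(range(len(clusters)))
    if reference_centroids is not None:
        # keep cluster positions comparable across series: each reference centroid is
        # matched with the closest centroid of this series
        order = [int(np.argmin(np.linalg.norm(centroids - r, axis=1)))
                 for r in reference_centroids]

    parts = []
    for idx in order:
        members = segment_features[labels == clusters[idx]]
        extreme = members[np.argmax(members[:, -3])]    # segment with the highest variance feature
        parts.extend([centroids[idx], extreme])
    return np.concatenate(parts), centroids
```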

II-B2 Time series clustering

In this step, the algorithm receives the mapped time series and the clustering is performed, again choosing an agglomerative hierarchical methodology. The idea is to group similar time series in the same cluster. In our experiments, the number of clusters to be found is defined as the number of classes of the dataset (given that we consider time series classification datasets for the evaluation). In a real scenario, this number should be given by the user. This advantage is given to all the methods compared.

II-C Parameter adjustment

The TS3C algorithm previously defined involves only one important parameter that has to be adjusted by the user: the error threshold for the segmentation procedure, $\varepsilon$ (see Section II-A1). We propose to adjust it considering internal clustering evaluation metrics (see Section I-B), which can be used without knowing the ground truth labels.

In this way, the algorithm is run using a set of values for this parameter, and all these cases are evaluated in terms of the internal measures. Two different strategies are proposed to select the best parameter value (a minimal sketch of this selection loop is given after the list):

  • Selecting the $\varepsilon$ leading to the best Caliński and Harabasz index (CH), given that this index has been shown to be very robust [46].

  • Selecting the $\varepsilon$ which obtains the best value for the highest number of internal measures. All the internal metrics defined in Section I-B are used in this case. We refer to this option as majority voting.
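A hedged Python sketch of the CH-based strategy follows (our illustration; `run_ts3c` is a hypothetical callable that runs the whole pipeline for a given threshold and returns the mapped series and their cluster labels).

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

def select_error_threshold(candidate_eps, run_ts3c):
    """Run the pipeline for every candidate error threshold and keep the one maximising CH."""
    best_eps, best_ch = None, -np.inf
    for eps in candidate_eps:
        mapped, labels = run_ts3c(eps)                 # hypothetical pipeline call
        ch = calinski_harabasz_score(mapped, labels)   # internal index, no ground truth needed
        if ch > best_ch:
            best_eps, best_ch = eps, ch
    return best_eps
```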

III Experimental results and discussion

In this section, the experimental results are presented and discussed. Firstly, we detail the characteristics of the datasets used in the experiments. Secondly, we explain the experimental setting. Then, we show the results and discuss them. Finally, a statistical analysis of the results is performed.

III-A Datasets

84 datasets from the UCR Time Series Classification Archive [54] have been considered. This benchmark repository (last updated in Summer 2015) consists of synthetic and real datasets from different domains. The repository was originally proposed for time series classification, so each dataset was split into training and test subsets. However, for time series clustering, where the class label is only considered for evaluating the clustering quality, we can safely merge these subsets. The details of the datasets are included in Table I. Also, we have computed the imbalance ratio (IR) for each dataset, as the ratio of the number of instances in the majority class to the number of instances in the minority class [55]. Although the length of the time series is the same for all elements of each dataset of the repository, the TS3C algorithm could be applied to datasets with different-length time series.

Dataset #CL #EL LEN %IR Dataset #CL #EL LEN %IR
50words (50W) 50 905 270 MedicalImages (MED) 10 1141 99
Adiac (ADI) 37 781 176 MiddlePhalanxOutlineAgeGroup (MPA) 3 554 80
ArrowHead (ARR) 3 211 251 MiddlePhalanxOutlineCorrect (MPC) 2 891 80
Beef (BEE) 5 60 470 MiddlePhalanxTW (MPT) 6 553 80
BeetleFly (BFL) 2 40 512 MoteStrain (MOT) 2 1272 84
BirdChicken (BIR) 2 40 512 NonInvasiveFatalECG_Thorax1 (NO1) 42 3765 750
Car (CAR) 4 120 577 NonInvasiveFatalECG_Thorax2 (NO2) 42 3765 750
CBF (CBF) 3 930 128 OliveOil (OLI) 4 60 570
ChlorineConcentration (CHL) 3 4307 166 OSULeaf (OSU) 6 442 427
CinC_ECG_torso (CIN) 4 1420 1639 PhalangesOutlinesCorrect (PHA) 2 2658 80
Coffee (COF) 2 56 286 Phoneme (PH0) 39 2110 1024
Computers (COM) 2 500 720 Plane (PLA) 7 210 144
Cricket_X (CRX) 12 780 300 ProximalPhalanxOutlineAgeGroup (PPA) 3 605 80
Cricket_Y (CRY) 12 780 300 ProximalPhalanxOutlineCorrect (PPC) 2 891 80
Cricket_Z (CRZ) 12 780 300 ProximalPhalanxTW (PPT) 6 605 80
DiatomSizeReduction (DIA) 4 322 345 RefrigerationDevices (REF) 3 750 720
DistalPhalanxOutlineAgeGroup (DPA) 3 539 80 ScreenType (SCR) 3 750 720
DistalPhalanxOutlineCorrect (DPC) 2 876 80 ShapeletSim (SHS) 2 200 500
DistalPhalanxTW (DPT) 6 539 80 ShapesAll (SHA) 60 1200 512
Earthquakes (EAR) 2 461 512 SmallKitchenAppliances (SMA) 3 750 720
ECG200 (EC2) 2 200 96 SonyAIBORobotSurface (SO1) 2 621 70
ECG5000 (EC5) 5 5000 140 SonyAIBORobotSurfaceII (SO2) 2 980 65
ECGFiveDays (ECF) 2 884 136 StarLightCurves (STA) 3 9236 1024
ElectricDevices (ELE) 7 16637 96 Strawberry (STR) 2 983 235
FaceAll (FAA) 14 2250 131 SwedishLeaf (SWE) 15 1125 128
FaceFour (FAF) 4 112 350 Symbols (SYM) 6 1020 398
FISH (FIS) 7 350 463 synthetic_control (SYN) 6 600 60
FordA (FOA) 2 4921 500 ToeSegmentation1 (TO1) 2 268 277
FordB (FOB) 2 4446 500 ToeSegmentation2 (TO2) 2 166 343
Gun_Point (GUN) 2 200 150 Trace (TRA) 4 200 275
Ham (HAM) 2 214 431 Two_Patterns (TWP) 4 5000 128
HandOutlines (HAN) 2 1370 2709 TwoLeadECG (TWE) 2 1162 82
Haptics (HAP) 5 463 1092 uWaveGestureLibrary_X (UWX) 8 4478 315
Herring (HER) 2 128 512 uWaveGestureLibrary_Y (UWY) 8 4478 315
InlineSkate (INL) 7 650 1882 uWaveGestureLibrary_Z (UWZ) 8 4478 315
InsectWingbeatSound (INS) 11 2200 256 UWaveGestureLibraryAll (UWA) 8 4478 945
ItalyPowerDemand (ITA) 2 1096 24 wafer (WAF) 2 7164 152
LargeKitchenAppliances (LAR) 3 750 720 Wine (WIN) 2 111 234
Lighting2 (LI2) 2 121 637 WordsSynonyms (WOS) 25 905 270
Lighting7 (LI7) 7 143 319 Worms (WOR) 5 258 900
MALLAT (MAL) 8 2400 1024 WormsTwoClass (WOT) 2 258 900
Meat (MEA) 3 120 448 yoga (YOG) 2 3300 426
TABLE I: Characteristics of the datasets used in the experiments. #CL: number of classes, #EL: number of elements (size), LEN: time series length, %IR: imbalanced ratio.

III-B Experimental setting

The experimental design for the datasets under study is presented in this subsection.

The degree of the polynomial of the least-squares approximation and the number of groups for the segment clustering are kept fixed for all datasets, given that higher-order polynomials led to worse results and that the nature of the different time series datasets seems to be very similar. The other parameter of the algorithm, the error threshold $\varepsilon$, has been adjusted using the two options described in Section II-C: (1) directly selecting the clustering leading to the best Caliński and Harabasz (CH) measure (TS3C$_{CH}$), and (2) considering all the internal measures in Section I-B and applying a majority voting procedure to select the best one (TS3C$_{MV}$). A fixed range of candidate values is considered for the $\varepsilon$ parameter.

The Rand index (RI) is used as the external measure for evaluating the results. The number of clusters (for the time series clustering stage) is set to the number of real labels in each dataset.

We compare our method against two state-of-the-art algorithms:

  • The $DD_{DTW}$ distance metric together with a hierarchical clustering algorithm ($DD_{DTW}$-HC) [18]. This method considers the negative inter-group variance as the internal cluster validation measure to set the $\alpha$ value (see Section I-A1). This is the best technique from those proposed in [18].

  • K-Spectral Centroid (KSC). This algorithm, proposed by Yang et al. [19], is able to find clusters of time series that share a distinct temporal pattern. See more details in Sections I-A1 and I-A2.

Because the KSC algorithm is stochastic, it was run several times, while the rest of the methods (TS3C$_{CH}$, TS3C$_{MV}$ and $DD_{DTW}$-HC) are deterministic (and they have been run once). The computational time needed by all the algorithms will also be analysed in this section.

III-C Results

The results of TS3C$_{CH}$ and TS3C$_{MV}$ are shown in Table II, including both the RI performance and the computational time needed by the algorithms (average computational time in the case of KSC). Note that for some datasets, the running time of $DD_{DTW}$-HC was higher than the maximum time of the rest of the methods, so they have been marked accordingly and their results have been taken from [18]. As can be seen, we have included, as a subscript, the error threshold for the segmentation algorithm ($\varepsilon$) of the best clustering configuration for the TS3C$_{CH}$ and TS3C$_{MV}$ methods (obtained using internal criteria).

From the results in Table II, the following facts can be highlighted:

  • Compared with $DD_{DTW}$-HC, TS3C obtains better solutions for  datasets, slightly worse results for , and the same solution for the remaining datasets. If $DD_{DTW}$-HC is compared with TS3C, our approach obtains better solutions in  datasets, worse results for , and similar results for the remaining datasets.

  • Compared with KSC, TS3C leads to better solutions in datasets, while in the results are slightly worse. Finally, for the remaining dataset, the result is the same. When this method is compared with TS3C, better solutions are obtained in cases, slightly worse solutions are found for datasets, and no differences for datasets.

Analysing average performance, the mean RI values are , , and , for TS3C$_{CH}$, TS3C$_{MV}$, $DD_{DTW}$-HC and KSC, respectively.

Rand Index Time (seconds)
Dataset TS3C TS3C -HC KSC TS3C TS3C -HC KSC
50W
ADI
ARR
BEE
BFL
BIR
CAR
CBF
CHL
CIN
COF
COM
CRX
CRY
CRZ
DIA
DPA
DPC
DPT
EAR
EC2
EC5
ECF
ELE
FAA
FAF
FIS
FOA
FOB
GUN
HAM
HAN
HAP
HER
INL
INS
ITA
LAR
LI2
LI7
MAL
MEA
The best result is highlighted in bold face, while the second one is shown in italics
TABLE II: RI performance and computational time of the different algorithms for all the datasets. TS3C$_{CH}$: TS3C algorithm using CH as the strategy, TS3C$_{MV}$: TS3C algorithm using MV as the strategy, $DD_{DTW}$-HC: hierarchical clustering using the $DD_{DTW}$ distance, KSC: K-Spectral Centroid clustering algorithm. The best result for each dataset is highlighted in bold face. The $\varepsilon$ parameter for each dataset is shown as a subscript for the TS3C$_{CH}$ and TS3C$_{MV}$ methods. (1/2)
Rand Index Time (seconds)
Dataset TS3C TS3C -HC KSC TS3C TS3C -HC KSC
MED
MPA
MPC
MPT
MOT
NO1
NO2
OLI
OSU
PHA
PHO
PLA
PPA
PPC
PPT
REF
SCR
SHS
SHA
SMA
SO1
SO2
STA
STR
SWE
SYM
SYN
TO1
TO2
TRA
TWP
TWE
UWX
UWY
UWZ
UWA
WAF
WIN
WOS
WOR
WOT
YOG
The best result is highlighted in bold face, while the second one is shown in italics
TABLE II: RI performance and computational time of the different algorithms for all the datasets. TS3C$_{CH}$: TS3C algorithm using CH as the strategy, TS3C$_{MV}$: TS3C algorithm using MV as the strategy, $DD_{DTW}$-HC: hierarchical clustering using the $DD_{DTW}$ distance, KSC: K-Spectral Centroid clustering algorithm. The best result for each dataset is highlighted in bold face. The $\varepsilon$ parameter for each dataset is shown as a subscript for the TS3C$_{CH}$ and TS3C$_{MV}$ methods. (2/2)

III-D Statistical analysis

Based on the previous results, we consider all datasets to apply a set of non-parametric statistical tests, in order to determine whether the differences found could have been obtained by chance. Given that the mean values across all datasets do not follow a normal distribution, we run the Wilcoxon signed-rank test, which is a non-parametric test that can be used to determine whether two dependent samples were selected from populations having the same distribution [56, 57]. This design for the statistical tests makes it possible to compare the deterministic methods (TS3C$_{CH}$, TS3C$_{MV}$ and $DD_{DTW}$-HC) with the stochastic method (KSC, for which the average RI over the runs is used).
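For reference only, such a paired Wilcoxon test can be run in Python with SciPy; the RI vectors below are randomly generated placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
ri_method_a = rng.uniform(0.5, 1.0, 84)    # hypothetical per-dataset average RI of one method
ri_method_b = rng.uniform(0.5, 1.0, 84)    # hypothetical per-dataset average RI of another method

stat, p_value = wilcoxon(ri_method_a, ri_method_b)   # paired, non-parametric test
print(f"statistic = {stat:.3f}, p-value = {p_value:.4f}")
```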

Results of the tests made using average RI are shown in TABLE III. As can be observed, the differences between TS3C$_{CH}$ and $DD_{DTW}$-HC, and between TS3C$_{MV}$ and $DD_{DTW}$-HC, are statistically significant for $\alpha = 0.05$. Also, if we consider $\alpha = 0.10$, the TS3C methodology is statistically better than KSC. Consequently, these results show that the proposed methodology obtains more robust results than these state-of-the-art alternatives.

TS3C$_{CH}$ vs TS3C$_{MV}$ | $DD_{DTW}$-HC vs TS3C$_{CH}$ | KSC vs TS3C$_{CH}$ | $DD_{DTW}$-HC vs TS3C$_{MV}$ | KSC vs TS3C$_{MV}$ | KSC vs $DD_{DTW}$-HC
z-score
p-value (*) (+) (*) (+)
* : Significant differences were found for 0.05.
+: Significant differences were found for 0.10.
TABLE III: Wilcoxon tests for the comparison of the different algorithms: adjusted $p$-values, using average RI as the test variable.

On the other hand, results of the tests made using average computational time are shown in TABLE IV. In this case, considering $\alpha = 0.05$, there are statistically significant differences between: TS3C$_{CH}$ and TS3C$_{MV}$, TS3C$_{CH}$ and $DD_{DTW}$-HC, TS3C$_{MV}$ and $DD_{DTW}$-HC, and KSC and $DD_{DTW}$-HC. This means that both TS3C variants are more efficient than $DD_{DTW}$-HC, and that there are no significant differences when comparing them to KSC.

TS3C$_{CH}$ vs TS3C$_{MV}$ | $DD_{DTW}$-HC vs TS3C$_{CH}$ | KSC vs TS3C$_{CH}$ | $DD_{DTW}$-HC vs TS3C$_{MV}$ | KSC vs TS3C$_{MV}$ | KSC vs $DD_{DTW}$-HC
z-score
p-value (*) (*) (*) (*)
* : Significant differences were found for 0.05.
TABLE IV: Wilcoxon tests for the comparison of the different algorithms: adjusted $p$-values, using average time as the test variable.

IV Conclusions

In this paper, we have presented and tested a novel time series clustering approach, aimed at exploiting the similarities that can be found in subsequences of the analysed time series. The method is a two-stage statistical segmentation-clustering time series procedure, TS3C, which is based on: (1) a least squares polynomial segmentation procedure, using the growing window method, (2) an extraction of features of each segment (polynomial trend coefficients, variance, skewness and autocorrelation coefficient), and (3) a clustering of these features, grouping the segments of each time series.