I. Introduction
Spatiotemporal data arise in broad areas of engineering and the environmental sciences. Data mining techniques have been used extensively for spatiotemporal analysis [atluri2018spatio]. Georeferenced time series are a subset of spatiotemporal data, where fixed locations over a geographical area observe some features over a time period in a synchronous way. Traffic data is a complex example of georeferenced data: it is multivariate time series data, including the flow, speed and occupancy of a large number of sensors, in which there are correlations and similarities in spatial and temporal neighborhoods. Spatiotemporal analysis of traffic data has a pivotal role in future research to improve the performance of transportation systems [nagy2018survey], such as reducing traffic congestion and air pollution [chowdhury2017data], understanding the behaviour of a transportation network [rempe2016spatio], predicting traffic speed and flow [zang2018long], [asadi2020spatio], and detecting non-recurrent congestion events [anbaroglu2014spatio].
The volume and variety of spatiotemporal data have increased with the advent of new sensing technologies, such as cameras, GPS and sensors [toch2019analyzing]. Increases in the volume of traffic data require the development of large-scale machine learning algorithms, big data analytics [zhu2018big], and data-driven approaches on traffic data [wang2016soft]. Deep learning models have recently been successfully applied in spatial and temporal domains [wang2019deep]; these models especially outperform traditional machine learning and statistical methods on large-scale data. Several studies show the success of deep learning solutions, such as traffic flow forecasting [ma2020daily], missing data imputation [chen2019traffic] and spatiotemporal modelling of traffic flow data [dixon2019deep]. The success of deep learning models in various domains, along with the challenges of applying deep learning models to spatiotemporal traffic data, is the main motivation to further study the problem.

I-A. Clustering of traffic data
Spatiotemporal clustering of traffic data has been broadly studied with various goals. First, congestion detection and prediction can assist travelers and traffic management systems to improve the efficiency of existing systems. Second, detecting similarity in traffic patterns can help machine learning models to find similar regions in a transportation network. This can improve missing data imputation and traffic forecasting models, or can identify anomalies in the data.
In [cheng2018classifying], they propose an improvement of fuzzy k-means clustering to classify traffic states into five groups ranging from mild to extreme traffic. Also, in [celikoglu2014dynamic], a clustering of traffic flow data is obtained based on congestion levels; they describe clusters in the temporal domain based on levels of congestion. While these works cluster traffic data based on traffic congestion, they do not consider spatial domains in their analysis. Moreover, in [wei2020spatio], they propose a method to better understand how traffic conditions are correlated in space-time; they cluster traffic data based on four congestion levels using an improved spatiotemporal Moran scatterplot. These works cluster traffic data based on the level of congestion. However, we consider clustering of traffic data based on similarities of patterns, which can be more generalizable to various machine learning problems, such as traffic flow prediction and anomaly detection. In [anbaroglu2014spatio], they define a measure, called the Link Journey Time, and obtain spatiotemporal clusters of non-recurrent events. Each spatiotemporal cluster is a detected non-recurrent event, which represents neighboring spatial and temporal features. Their model considers a similarity measure to obtain spatiotemporal clusters. However, their model is only designed to find non-recurrent events, and is not generalized to find similar regions or temporal patterns. In [shi2019detection], they consider the problem of clustering traffic flow data to obtain similar spatial and temporal patterns. They propose spatiotemporal clustering of traffic flow by considering the topology of the network and the similarity of time series data, where clusters are made by successive connections of neighbors. This work makes prior assumptions about the data and the topology of the network. A data-driven approach is expected to find spatiotemporal clusters without any prior assumptions, which would be more generalizable to different problems and scenarios [kim2017data]. In [cheng2007mining], similarities of urban traffic flow are explored with a discrete wavelet transform.
In [chunchun2011traffic], they propose a fuzzy clustering method on traffic flow segments. Dynamic Time Warping (DTW), as a temporal similarity function, is used to identify locations with temporal similarities; they consider the problem of clustering time series segments. In [nguyen2019feature], they represent spatiotemporal data as an image-like representation, and propose a point-based and a segment-based clustering of speed to represent classes of traffic congestion in the spatial and temporal domains. Segment-based clustering is a similar approach to our model, where we find the clustering based on time series segments. However, they use filters from computer vision to obtain features, while we consider a temporal similarity distance to represent similarities in traffic flow data. Moreover, they evaluate clusters by visually assessing the model's output, whereas we use such a visualization method to present interesting insights into the clusters of traffic flow data.
Similarity of traffic patterns not only detects traffic congestion, but also detects spatially and temporally heterogeneous neighborhoods. In [tang2019short], a k-means clustering is applied to find traffic flow variations based on spatiotemporal correlations; the clusters of similar locations are the input to a neural network which predicts traffic flow with higher performance. In [ku2016clustering], clusters of similar locations have been used with an autoencoder to impute missing values. In [qiu2019traffic], a clustering method finds road segments based on their features, and missing data imputation is applied to incomplete speed data. In [salamanis2017identifying], a clustering model is used to identify anomalies in traffic flow data. We consider the problem of discovering spatiotemporal similarities in traffic data. Clusters of traffic data, such as speed, flow and occupancy, can represent levels of congestion. However, traffic flow data can also represent locations and time stamps with similar patterns, with the goal of finding heterogeneous spatial and temporal domains.

I-B. Spatiotemporal clustering with deep learning
Since we consider clustering of traffic flow segments, here we first review some of the recent research on time series clustering, and in the rest we review deep learning models for clustering problems. In [aghabozorgi2015time], they describe a broad range of time series clustering applications. The main components of time series clustering are studied, including time series representations, similarity and distance measures, clustering prototypes and time series clustering algorithms. In [soheily2016generalized], they describe the challenges of k-means clustering with time warp measures, and propose weighted and kernel time warp measures for k-means clustering; their method has a faster estimation of clusters. Further investigation of time series clustering is presented in [paparrizos2017fast]. These works illustrate that novelty in time series representations and distance measures are the main approaches to improving temporal clustering.
There is a broad range of clustering models applied to spatiotemporal data, such as k-means [huang2016time], DBSCAN [birant2007st], agglomerative clustering [yao2018stepwise], and matrix factorization based clustering [zhou2018visual]. However, increases in the size of datasets require more scalable models, such as deep learning models. When a huge dataset includes data points with spatial and temporal properties, applying traditional clustering methods such as k-means to traffic data is computationally expensive and can have poor performance [huang2016time]. More efficient heuristic methods for k-means clustering of traffic flow data have been studied [tang2015hybrid]. Complex spatiotemporal patterns in traffic data necessitate further consideration of spatial and temporal information in the models. Deep learning models significantly improve the performance of various machine learning problems, such as computer vision and natural language processing, and have been broadly used for various large-scale spatiotemporal problems [wang2019deep], [asadi2019convolution]. Moreover, deep learning models for clustering tasks are broadly studied in [min2018survey]. Deep embedded clustering is primarily introduced in [xie2016unsupervised], and variations of the model have been studied in broad domains. Joint training of the model to preserve the latent feature space structure is proposed in [guo2017improved]. In [yang2017towards], they analyze clustering-friendly latent representations, which jointly optimize dimension reduction using both a neural network and k-means clustering. While most of the research applies deep embedded clustering to images, there are few studies showing its performance on time series data. In [tzirakis2019time], they jointly cluster and train the model, and also segment time series data with agglomerative clustering. In [8970987], they propose a DEC with a cluster tree structure to dynamically obtain the number of clusters, while the original DEC has a fixed number of clusters. However, these models do not consider any prior relation among the clusters. In [ren2019semi], they propose a variation of DEC which considers the pairwise distance between data points; the model uses the prior distances as a measure to classify unlabeled data points. This work considers a relation among the clusters for unsupervised learning, and it is similar to ours, as we consider a prior relation among the clusters based on their geographical information. In [madiraju2018deep], they evaluate various similarity metrics to obtain clusters with DEC. While these models consider temporal similarity in a DEC, there is a lack of deep learning models that not only find clusters based on temporal similarity but also take prior spatial features into account.

I-C. Contributions of the work
The aforementioned works on clustering of time series data show the importance, advantages and applications of applying deep learning models to cluster time series and spatiotemporal traffic data. They also show that several deep learning models have recently been developed for clustering problems, whose goal is to modify the latent feature space. There are various works that develop deep learning models for clustering time series data. Clustering of time series data finds clusters of a transportation network based on the similarity of traffic flow data [asadi2019spatio]. However, considering prior geographical information in designing clusters is a challenging problem.
In this paper, we focus on clustering of time series with prior geographical information. We propose a model that modifies the latent feature representation based on geographical information. The model is a variation of DEC which finds spatiotemporal clusters by adding a new loss function to the model.
The contributions of the paper are as follows:

We formulate spatiotemporal clustering of traffic flow data as the clustering of time series segments.

A spatial deep embedded clustering (SpatialDEC) model is proposed which considers prior geographical information within the latent feature representation. To the best of our knowledge, this is the first work which considers prior geographical information to obtain spatial clusters with the DEC model.

We illustrate applications of clustering of traffic flow data in transportation systems.

The spatiotemporal clusters obtained by deep learning models are evaluated on traffic flow data available in PeMS.
In Section II, we describe the problem definition. In Section III, we describe the technical background of the proposed model. In Section IV, a deep learning model, SpatialDEC, is proposed for spatiotemporal clustering. In Section V, the models are evaluated on traffic flow data. Section VI describes the conclusions and future work.
II. Problem Definition
Spatiotemporal data are represented with a three-dimensional matrix X of size N x T x F, where N is the number of sensors, T is the number of time stamps and F is the number of traffic features, including flow, speed and occupancy. Each fixed location i has its own multivariate time series data X_i of size T x F. A sliding window method, given a time window of size w, generates a sequence of data points. In other words, the function receives input data X and time window size w, and outputs all data points, i.e. time series segments, at time stamp t and location i, represented with x_i^t. Throughout the paper, we represent each data point with two indices, i for the location index and t for the temporal index. A clustering method assigns a data point to a cluster c_j, where j is in {1, ..., K} and K is the given number of clusters. Alternative approaches consider the problem of clustering whole time series X_i, e.g. clustering of trajectory data [sabarish2018clustering], or subsequences of spatiotemporal data. A clustering model finds similar data points based on a distance function, such as Euclidean distance. Here, we call a cluster temporal when its members have high temporal similarity, which can be obtained with a DTW distance function; a more dense and compact temporal cluster is desirable. We define a spatial cluster as one that includes location indices whose data points have high temporal similarity.
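The sliding-window segmentation above can be sketched as follows; the notation (N, T, F, w) follows this section, while the data and array sizes are illustrative placeholders, not PeMS values:

```python
import numpy as np

def sliding_window(X, w):
    """Split X of shape (N, T, F) into segments of length w.

    Returns an array of shape (N, T - w + 1, w, F): one time series
    segment x_i^t per location i and starting time stamp t.
    """
    N, T, F = X.shape
    segments = np.stack(
        [X[:, t:t + w, :] for t in range(T - w + 1)], axis=1
    )
    return segments

X = np.random.rand(3, 288, 1)   # 3 sensors, one day of 5-min stamps, flow only
segs = sliding_window(X, w=12)  # 12 stamps = one hour per segment
print(segs.shape)               # (3, 277, 12, 1)
```

Each row of the output is one data point x_i^t fed to the clustering model.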
Traffic flow data is spatiotemporal data. In Fig. 1, we represent an example of traffic flow for three sensors and three days. Finding temporal similarity across road networks is challenging with pointwise clustering; to prevent fluctuations in the clusters, we consider segment-wise clustering. In Fig. 2, we give a schematic representation of the input and output of the spatiotemporal clustering. The input data points are time series segments for three road segments and two time stamps. The selected data points are from PeMS traffic flow data, but the time stamps and road segments are arbitrary and are presented with the purpose of clarifying the problem definition. Each data point is a time series of length 12 in the figure; with 5-min time stamps, each data point represents one hour of traffic flow data for one road segment. The horizontal axis shows time stamps and the vertical axis shows normalized traffic flow. The three output clusters represent similar data points. The clusters represent similar patterns over different days and hours. They also represent the locations that are similar in a transportation network.
This spatiotemporal clustering problem is challenging. First, the clustering method should consider both temporal and spatial similarities. Second, given a large number of time stamps, e.g. six months, and a large number of road segments of a city, a sliding window method generates on the order of N x T data points, which can be a very large number. While k-means clustering methods have been proposed for time series segments, their performance drops when faced with a large number of data points, and their computational time is expensive. Moreover, a k-means clustering method is difficult to modify so that it considers both spatial and temporal similarities. Hence, in this work, we propose a deep learning model, SpatialDEC, to solve the spatiotemporal clustering problem.
III. Technical Background
III-A. Autoencoders
An autoencoder is primarily proposed in [vincent2010stacked]. It consists of an encoder and a decoder, each with an activation function and a dropout function. The encoder is the first neural network component; it reduces the dimension of the input data to a latent feature space z of size m, where m is smaller than the input dimension. The second neural network component is the decoder, which reconstructs the input data from its latent representation. In a deep autoencoder, the encoder and decoder consist of several layers and form a symmetric, multilayered neural network. The loss function, e.g. mean square error, reduces the difference between the input data and its reconstruction; in other words, the input and target data are both x_i^t. For the given spatiotemporal data, the reconstruction loss function is as follows,

L_r = \frac{1}{NT} \sum_{t=1}^{T} \sum_{i=1}^{N} \lVert y_i^t - \hat{x}_i^t \rVert^2,    (1)

where T is the number of time stamps and N is the number of sensors or locations. Also, y_i^t is the target of the autoencoder, which is the same as the input data x_i^t, and \hat{x}_i^t is the reconstruction. Minimization of this objective function results in learning the latent feature representation of the input data. We consider a weight γ for the reconstruction loss throughout the paper.
III-B. Deep Embedded Clustering
A deep embedded clustering (DEC) neural network is introduced in [xie2016unsupervised]. The encoder transforms each data point x_i^t into its latent feature z_i. The clustering layer is connected to the latent feature layer, and its weights are initialized with the cluster centers obtained by k-means clustering. Cluster center j is represented with μ_j. Given K as the number of clusters and m as the latent feature size, the clustering layer is represented with a dense layer; in other words, it converts latent features into a vector of size K, whose j-th element represents the probability that the data point is assigned to cluster j. Given initial cluster centers μ_j, obtained by k-means clustering, and latent features z_i, a Student's t-distribution measures the similarity between cluster centers and data points as follows,

q_{ij} = \frac{(1 + \lVert z_i - \mu_j \rVert^2)^{-1}}{\sum_{j'} (1 + \lVert z_i - \mu_{j'} \rVert^2)^{-1}},    (2)

where the degree of freedom of the Student's t-distribution is one. The probability of assigning a data point z_i to the cluster with center μ_j is represented with q_{ij}, and the assigned cluster is the one with the highest q_{ij}. The clustering algorithm iteratively adjusts clusters by learning from high-confidence assignments. To learn from high-confidence assignments, an auxiliary target distribution is defined as follows,

p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},    (3)

where f_j = \sum_i q_{ij} is the soft frequency of cluster j. The KL-divergence loss between P and Q learns the high-confidence soft cluster assignments,

L_c = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.    (4)
In [guo2017improved], they train the DEC with joint learning of the clustering loss and the reconstruction loss. In joint training, the loss function of the neural network on spatiotemporal data is as follows,

L = γ L_r + β L_c,    (5)

where γ is the weight of the mean square error term and β is the weight of the clustering loss term. Minimization of the loss function in Equation (5) results in learning the latent feature representation and the output clusters. The model receives input data x_i^t, and the target data are p_{ij} for the clustering layer and x_i^t for the decoder's output. For DEC, the values of γ and β represent the importance of each loss term: a higher value of β reduces the loss for clustering, while a higher value of γ better keeps the structure of the autoencoder's latent features [guo2017improved].
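The soft assignment, target distribution and clustering loss of Equations (2)-(4) can be sketched in NumPy; the array names and sizes here are illustrative placeholders, not the paper's implementation:

```python
import numpy as np

def soft_assign(z, mu):
    """q_ij: Student's t similarity (one degree of freedom) between
    latent features z (n x m) and cluster centers mu (K x m), Eq. (2)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: sharpen q by squaring and normalizing by the soft
    cluster frequency f_j = sum_i q_ij, Eq. (3)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def clustering_loss(p, q):
    """KL(P || Q), the loss minimized by the clustering layer, Eq. (4)."""
    return float((p * np.log(p / q)).sum())

z = np.random.rand(100, 4)   # 100 data points, latent size 4
mu = np.random.rand(3, 4)    # K = 3 cluster centers
q = soft_assign(z, mu)
p = target_distribution(q)
```

The hard cluster assignment is then `q.argmax(axis=1)`, and training alternates between updating p and minimizing the KL loss.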
IV. Spatial Deep Embedded Clustering
Here, we describe the proposed method for spatiotemporal clustering of traffic data. Algorithm 1 gives the procedure for finding spatiotemporal clusters in traffic flow data.
A deep embedded clustering (DEC) receives the data points for all locations i and time stamps t, represented with x_i^t. The encoder transforms each data point into its latent feature representation z_i^t. Given the number of clusters K, a k-means clustering on the latent feature representations finds the means of the clusters. The mean μ_j of each cluster is obtained by k-means clustering and is stored in the clustering layer.
The data points with high temporal similarity are close to each other in the latent feature space, examined in Fig. 7. Hence, each cluster includes data points with high temporal similarity. However, not only should the clusters represent data points with high temporal similarity, they should also consist of data points from spatial neighborhoods. If a cluster represents data points of locations far from each other or distributed over a geographical area, it is not our desired cluster. Hence, our objective is to obtain clusters with both temporal similarity and spatial closeness. In the rest of this section, we describe our modification to deep embedded clustering and introduce SpatialDEC, whose architecture is presented in Fig. 3. The objective is to modify the DEC's loss function so that if locations i and i' are close to (or far from) each other, then their latent representations are also close to (or far from) each other.
First, we make the latent feature representations conditional on the prior geographical location. The proposed model needs the location indices as input data, because it maps the data points to the latent feature space based on both their time series values and their location indices. For N given sensors, we generate a one-hot encoding of the locations, where location i has the input vector l_i, in which the i-th element is one and the rest of the values are zeros. This computation of the one-hot encoding is shown in Algorithm 1. The SpatialDEC (SDEC) receives the segment x_i^t together with l_i as the input data; given the time series segment and its location, the encoder outputs a low-dimensional representation of the data. Next, we add a loss function to the DEC and propose SDEC, whose latent features are constructed given prior geographical information. We add a spatial loss term, L_s, to the latent feature layer. The encoder's inputs are x_i^t and l_i, and the encoder's output is z_i^t. The encoder's output in the last training step is stored in \hat{z}; in other words, the model uses \hat{z} as the target value for the latent feature layer. In Algorithm 1, it is shown that once the DEC obtains p_{ij} as the target value for the clustering layer, we also obtain \hat{z} as the target value for the latent feature layer.
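The one-hot location input can be sketched as follows; the helper names and array sizes are illustrative assumptions, not the paper's code:

```python
import numpy as np

def one_hot_locations(N):
    """Return the N x N one-hot matrix; row i encodes location i."""
    return np.eye(N)

def attach_locations(segments):
    """segments: (N, S, w) flow segments -> (N*S, w + N) encoder inputs,
    each segment concatenated with its location's one-hot vector l_i."""
    N, S, w = segments.shape
    L = one_hot_locations(N)
    flat = segments.reshape(N * S, w)
    loc = np.repeat(L, S, axis=0)   # row i repeated for its S segments
    return np.concatenate([flat, loc], axis=1)

segs = np.random.rand(6, 10, 12)    # 6 sensors, 10 segments each, window 12
inputs = attach_locations(segs)
print(inputs.shape)                 # (60, 18)
```

Each encoder input thus carries both the time series values and the location index of its sensor.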
To implement the new loss function effectively, we first change the size of the input data. The goal is to have input data and target data for each pair of locations i and i'. We repeat each data point, a row of the input matrix, N times, and the new training data are stored accordingly. Moreover, we reshape the latent features of the last training step: each block of N rows is repeated N times, and the reshaped latent features are represented with \hat{z}. The SpatialDEC has x_i^t as an input data point and \hat{z}_{i'} as the target for the latent feature layer. After changing the size of the input and target data, for any given x_i^t and its encoder's output z_i^t, there are target data \hat{z}_{i'} for all locations i'. This modification allows the model to control the distance between the latent features at location i and those at all locations i'. The loss function optimizes the distance of z_i^t to all previously obtained latent features at the same time stamp, represented with \hat{z}_{i'}^t for all sensors i'. The loss function should increase (decrease) the distance between z_i and \hat{z}_{i'} if locations i and i' are far from (close to) each other. In the rest, we define the weight matrix which controls the distance of the latent features.
We define a weight value for the loss function, represented with w_{i,i'}, the weight that represents the distance between two locations i and i'. A transportation network can be represented with a graph, and an adjacency matrix represents the distances of locations. Here we assume, without loss of generality, that all locations are on a line. We define W as the weight matrix, which represents the spatial distance of locations, where w_{i,i'} represents the distance of two locations i and i'. If the value is close to +1, then the two locations i and i' are close to each other, and if the value is close to -1, then the two locations are far from each other. Given w_i as the i-th row of W, we define diag(w_i) as the diagonal matrix of the elements of w_i, that is, all elements except the diagonal elements are zero. We aggregate diag(w_i) over all locations i on the first dimension; this calculation is represented in Algorithm 1. The spatial loss term is as follows,

L_s = \sum_{i=1}^{N} \sum_{i'=1}^{N} w_{i,i'} \lVert z_i - \hat{z}_{i'} \rVert^2.    (6)
In a backpropagation method, the gradients of Equation (6), along with the gradient of the autoencoder's loss function, are propagated through the neural network. We only describe the backpropagation for Equation (6), and refer the reader to [xie2016unsupervised] for further theoretical analysis of gradient propagation for DEC. Given L_s as the spatial loss function, the gradient of Equation (6) is as follows,

\frac{\partial L_s}{\partial z_i} = 2 \sum_{i'=1}^{N} w_{i,i'} (z_i - \hat{z}_{i'}).    (7)

The model finds the gradient with respect to the encoder's output z_i. In a stochastic gradient descent algorithm, the gradient is propagated to update the weights of the neural network. The loss function is similar to that of [ren2019semi], which uses pairwise distances among clusters and applies the model to clustering of images for unsupervised learning. Here we describe why the value of w_{i,i'} directly affects the structure of the latent features and the clustering model. The encoder's output for a given input data point x_i^t is z_i^t. The value of the encoder's output for the last training step and the same time stamp is stored in \hat{z}. Given the data point at location i, the model considers a target value \hat{z}_{i'} for all i'. Since the neural network minimizes the loss function, the value of w_{i,i'} controls the distance given the last estimation of \hat{z}_{i'} for all i'. If w_{i,i'} has a positive value, then the loss value is positive for the distance between z_i and \hat{z}_{i'}; training a neural network with such a loss value reduces the distance between the latent features of locations i and i'. On the other hand, a negative value of w_{i,i'} increases the distance between the latent features of i and i'. In Section V-B, we validate the effect of the new loss function on sample data. The weight of the spatial loss function in Equation (6) is α. In the experimental results, α, β and γ are the weights of the spatial loss (Equation 6), the clustering loss (Equation 4) and the reconstruction loss (Equation 1), respectively.
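A minimal sketch of the spatial loss and its gradient, with a finite-difference check that Equation (7) matches Equation (6); all arrays are random placeholders:

```python
import numpy as np

def spatial_loss(z, z_hat, W):
    """L_s = sum_{i,i'} w_{i,i'} * ||z_i - z_hat_{i'}||^2, Eq. (6)."""
    d2 = ((z[:, None, :] - z_hat[None, :, :]) ** 2).sum(-1)
    return float((W * d2).sum())

def spatial_grad(z, z_hat, W):
    """dL_s/dz_i = 2 * sum_{i'} w_{i,i'} * (z_i - z_hat_{i'}), Eq. (7)."""
    diff = z[:, None, :] - z_hat[None, :, :]
    return 2.0 * (W[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(0)
z, z_hat = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
W = rng.uniform(-1, 1, size=(4, 4))   # +1: close locations, -1: far apart

# Numerical check of Eq. (7) against Eq. (6)
g = spatial_grad(z, z_hat, W)
eps = 1e-6
z2 = z.copy(); z2[0, 0] += eps
num = (spatial_loss(z2, z_hat, W) - spatial_loss(z, z_hat, W)) / eps
print(abs(num - g[0, 0]) < 1e-4)      # True
```

A positive w_{i,i'} makes the loss term positive, so gradient descent pulls z_i toward \hat{z}_{i'}; a negative weight pushes it away, as argued above.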
In Algorithm 1, we describe the procedure of finding spatiotemporal clusters. Lines 2-9 are the preprocessing of a DEC model, which includes pretraining an autoencoder, k-means initialization, and building a DEC model. In line 5, the autoencoder is pretrained with only the reconstruction loss active. In line 11, the model finds the value of the latent features for the last pretraining step, followed by obtaining p_{ij} and \hat{z} as target values. Line 13 is the function in Equation (3), introduced in DEC. Line 14 generates batches of input and output data. To clarify our notation, the input and output arguments of each neural network call are written explicitly in Algorithm 1.
Lastly, we analyze the computational time of the model. In traffic flow data, we have a large number of data points, N x T, the product of the number of locations and the total number of time stamps. A k-means clustering method finds the clusters over many iterations, and in each iteration it calculates DTW distances, which is expensive for time series segments of length w. The DEC model maps time series segments to a lower dimension m; in our experiments, we consider w = 12 and m = 4. A DEC model finds the k-means initialization on a subsample of the data points, and trains the model over a number of epochs with mini-batch gradient descent; based on our experiments, we expect to train the model in fewer than 100 epochs. In each step, a DEC model applies backpropagation, whose computational time depends on the size of the neural network.
V. Experimental Results
Here we illustrate the results for clustering of traffic flow data. The deep learning model is implemented with Keras. We use a fully-connected autoencoder with seven layers. All of the layers have the ReLU activation function and a dropout rate of 0.2. A batch size of 288, one day with 5-min time stamps, and the Adam optimizer are selected. We compare the performance of three models: k-means on the latent features of an autoencoder, DEC and SpatialDEC. We also have three loss terms, the spatial loss, the clustering loss and the reconstruction loss, with corresponding weights α, β and γ, respectively. For k-means with an autoencoder, only the reconstruction loss is active; for DEC, the clustering and reconstruction losses are active; and for SpatialDEC, all three loss terms are active.
V-A. Traffic data
Traffic flow data are obtained from PeMS [californiapems]. Traffic flow data are gathered from mainline loop detector sensors every 30 seconds and aggregated to every 5 minutes. The data are for the US101 South, I280 South and I680 South highways in the Bay Area of California, which include 26 and 16 mainline sensors, respectively, illustrated in Fig. 4. We report the average of the results over these selected data. We select the data for the first five months of 2016. The models are trained on the first three months and evaluated on the next two months. In a preprocessing step, we rescale the data into a fixed range and subtract each time window from its mean value. A time window of size 12, one hour, is selected. In the model we assume that the sensors are on one line of a highway, and the average of the results for these two highways is presented.
V-B. Validation of the spatial loss function
In SpatialDEC, we use the spatial loss function introduced in Equation (6). The loss function decreases (increases) the distance between latent feature representations if two locations are close to (far from) each other. Here, we examine the correctness of the model by visualizing the latent feature representations. We consider the first six successive sensors on the highway US101-S.
We obtain the spatial weights, which represent the distances of locations on a line. Any function that obtains weights in the following way can be used: if two locations are close (far), their weights should be close to +1 (-1). We assume that there are six locations on one line, and use a decreasing function of the distance between locations, set to zero for distant pairs; throughout the experiments, we notice that this better stabilizes the clusters.
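One hypothetical way to construct such a weight matrix for locations on a line is sketched below; the exponential decay and the cutoff are illustrative assumptions, since the exact function is not specified here:

```python
import numpy as np

def line_weights(n, scale=2.0, cutoff=4):
    """Weight matrix W for n locations on a line: values near +1 for
    close pairs, decaying toward -1 for far pairs, and zero beyond a
    cutoff distance. Decay scale and cutoff are illustrative choices."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])   # |i - i'| on the line
    W = 2.0 * np.exp(-dist / scale) - 1.0        # map decay into [-1, 1]
    W[dist > cutoff] = 0.0                       # drop very distant pairs
    return W

W = line_weights(6)
print(W[0, 0], W[0, 1] > W[0, 3])                # 1.0 True
```

Each location gets weight +1 with itself, and the weight shrinks monotonically with distance along the line.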
We train the SpatialDEC on sample data. Fig. 5 represents the latent features in two dimensions for different values of α. We set α to 0, 1.0 and 10.0 in Fig. 5.a, 5.b and 5.c, respectively. With α = 0, the model is an autoencoder without the spatial loss; the data points are scattered in the latent feature space without any prior geographical information. With a higher value of α, we train the SpatialDEC model. The result is shown in Fig. 5.b, where the order of locations from 1 to 6 is preserved in the latent feature space. For a spatial loss weight of α = 10.0, the latent features are completely separable: the data points are mapped into latent features with their corresponding order of distances, from top left to bottom right. In the rest of the implementations, we use the weight α = 1.0; it is more comparable with DEC and k-means based on temporal similarity, and it also finds latent features with spatial closeness. We also notice that early stopping of the deep learning training can prevent completely separable data points as in Fig. 5.c, and instead yields results as in Fig. 5.b.
Clustering evaluation (percentage results)

Models        Sum of squared error to cluster means   Connectivity   Disconnectivity
k-means       0.27                                    0.19           0.21
k-means-DTW   0.28                                    0.17           0.19
DEC           0.22                                    0.19           0.18
SpatialDEC    0.23                                    0.45           0.11
V-C. Analysis of temporal clusters
After pretraining the autoencoders, the first step in training SpatialDEC is to initialize the clusters with k-means clustering. To obtain an appropriate number of clusters, we use an elbow method, i.e. the optimum value can be obtained when the reduction in inertia, the sum of squared distances of the data points, becomes linear, represented in Fig. 6, where we find 80 as the best number of clusters.
To show that the latent features are directly related to temporal features, the latent feature representation of one sensor's traffic states is shown for five weekdays in Fig. 7. A t-distributed stochastic neighbor embedding (t-SNE) [maaten2008visualizing] method is used to represent the latent features in two dimensions, with parameter values of 40, 300 and 500 for the perplexity, the number of iterations and the learning rate, respectively. The color of each data point represents the hour of the day; one day is grouped into 10 colors, for every 2 hours. This visualization of the latent features shows that the data points are distinguishable based on their time stamps and that the latent features preserve the temporal properties of the data.
To represent temporal similarity in traffic flow data, Dynamic Time Warping (DTW) has been broadly used [lv2020temporal]. In our problem, the warping window size can range from 1 to 12, the length of the time series segments, where smaller values reduce computational time. Comparing the Rand Index across warping window sizes is a method to obtain the best value [dau2018optimizing]. We notice no significant change in the clusters obtained by kmeans clustering with the DTW distance function when reducing the warping window size from 12 to 6; hence, we select six as the warping window size. To show that the latent feature space preserves temporal distances, for any pair of data points we compute the Euclidean distance between their latent features and the DTW distance between their time series, and obtain the correlation between the two. For latent feature sizes from 1 to 10, the correlation ranges from 0.9 to 0.98. The maximum correlation between DTW and the Euclidean distance of latent features is 0.98, at a latent feature size of 4. Hence, we select a latent feature of size 4 in our analysis and conclude that the latent feature space preserves the temporal similarity of the data points.
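A minimal windowed DTW (Sakoe-Chiba band) consistent with the description above can be sketched in NumPy; the implementation details and the toy series are our own, not the paper's.

```python
import numpy as np

def dtw(a, b, window):
    """DTW distance between 1-D series a and b with a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # restrict the warping path to |i - j| <= window
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

# two 12-step flow segments with the same shape, shifted by one step
x = np.array([0, 1, 3, 6, 8, 9, 9, 8, 6, 3, 1, 0], dtype=float)
y = np.roll(x, 1)

d6 = dtw(x, y, window=6)    # the window size selected in the text
d12 = dtw(x, y, window=12)  # effectively unconstrained for 12-step segments
```

A wider band can only add admissible warping paths, so the distance with window 12 is never larger than with window 6; when the two agree, the smaller (cheaper) window suffices.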
In the rest of this section, we analyze the clusters obtained by kmeans, DEC and SpatialDEC. Unlike supervised learning, unsupervised learning has no definitive approach for evaluating clusters. Hence, we describe the properties we expect the clusters to have, and define evaluation metrics based on them. We expect compact clusters that contain data points with high temporal similarity. Feature-based clustering of time series data can improve time series forecasting performance [bandara2020forecasting]. Our clustering models are feature-based: the clustering method is applied to the latent feature representations of the data points. Hence, we define a temporal similarity measure as the average distance of a cluster's members to its medoid, where the medoid is the element whose latent feature is closest to the mean of the cluster.
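Under this reading of the measure (the original symbols were lost in extraction), the medoid and the per-cluster spread can be computed as in the following sketch; the Euclidean latent distance, the toy data and the averaging are assumptions.

```python
import numpy as np

def cluster_medoid_and_spread(latent, series, dist):
    """latent: (n, d) latent features of one cluster's members;
    series: (n, T) corresponding time series; dist: distance on series.
    Returns the medoid index and the mean distance of members to the medoid."""
    mean = latent.mean(axis=0)
    # medoid: the member whose latent feature is closest to the cluster mean
    medoid = int(np.linalg.norm(latent - mean, axis=1).argmin())
    spread = float(np.mean([dist(series[i], series[medoid])
                            for i in range(len(series))]))
    return medoid, spread

# toy cluster: three similar daily flow profiles
series = np.array([[0, 1, 2, 3, 2, 1],
                   [0, 1, 2, 4, 2, 1],
                   [0, 2, 2, 3, 2, 1]], dtype=float)
latent = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
euclid = lambda a, b: float(np.linalg.norm(a - b))
medoid, spread = cluster_medoid_and_spread(latent, series, euclid)
```

In the paper's setting, `dist` would be the DTW distance on the raw time series, so the spread measures temporal compactness of the cluster.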
In Fig. 8, we compare the compactness of the clusters using this measure with DTW on the time series, which reflects compactness in terms of temporal similarity. A more compact cluster better represents data points with high temporal similarity. The implemented DEC and SpatialDEC have similar temporal similarity; this is important because it shows that while SpatialDEC finds more connected clusters, it does not significantly reduce the temporal similarity within the clusters. The third model is kmeans, applied to the latent features of the time series.
V-D Analysis of spatial clusters
Here, we evaluate the spatial clusters obtained by SpatialDEC. We define a spatial cluster as a set of locations that all have the same assigned temporal cluster at a given time stamp. In traffic flow data analysis, a spatial cluster represents road segments with similar traffic flow patterns at a given time stamp. We further define a connected spatial cluster as a set of locations that have the same assigned cluster at a given time stamp and are all neighbors. We define spatial connectivity as an evaluation metric for the analysis of spatial clusters: it measures the total size of the connected spatial clusters. If the size of the connected spatial cluster of a location is one, the assigned cluster of that location differs from the assigned clusters of all its neighbors. Such a clustering output is not desirable, as it cannot show the similarity of nearby locations. A desirable spatial cluster should have both high temporal similarity and high spatial connectivity. We also note that high spatial connectivity can reduce temporal similarity, because the cluster then includes larger road segments, which can have lower temporal similarity. Spatial connectivity is defined as follows,
SC = (1 / (|T| |L|)) Σ_{t ∈ T} Σ_{l ∈ L} |G(l, t)|    (8)

where |G(l, t)| is the size of the connected spatial cluster of location l at time stamp t.
On the other hand, if a spatial cluster includes location indices that are disconnected in the geographical area, the cluster is not desirable. We therefore define a second evaluation metric, spatial disconnectivity. For each location l and time stamp t, let D(l, t) be the set of location indices that have the same temporal cluster as l but are not in its connected spatial cluster G(l, t). The spatial disconnectivity is obtained as follows.

SD = (1 / (|T| |L|)) Σ_{t ∈ T} Σ_{l ∈ L} |D(l, t)|    (9)
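Since the exact normalization of Eqs. (8) and (9) did not survive extraction, the following sketch computes per-location averages of the connected and disconnected set sizes under an assumed line topology, where neighbors are adjacent location indices; both choices are assumptions of this sketch.

```python
import numpy as np

def connectivity_stats(labels):
    """labels: (T, L) assigned temporal cluster per time stamp and location.
    Locations are assumed to lie along a road, so neighbors are adjacent
    indices; returns (connectivity, disconnectivity) per-location averages."""
    T, L = labels.shape
    conn = disc = 0.0
    for t in range(T):
        row = labels[t]
        # maximal runs of equal labels = connected spatial clusters
        run_id = np.concatenate([[0], np.cumsum(row[1:] != row[:-1])])
        run_size = np.bincount(run_id)
        for l in range(L):
            same = int((row == row[l]).sum())     # all locations with this label
            connected = int(run_size[run_id[l]])  # those reachable via neighbors
            conn += connected
            disc += same - connected
    return conn / (T * L), disc / (T * L)

# a fully connected labelling vs. an alternating (disconnected) one
c_good, d_good = connectivity_stats(np.array([[0, 0, 1, 1]]))
c_bad, d_bad = connectivity_stats(np.array([[0, 1, 0, 1]]))
```

The first labelling groups adjacent locations (high connectivity, zero disconnectivity); the alternating one assigns the same cluster to non-neighboring locations, which the disconnectivity term penalizes.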
Fig. 9 represents the spatial connectivity and disconnectivity of the obtained clusters. A higher connectivity value shows that data points at nearby locations are assigned to the same temporal clusters, which is more desirable. Conversely, a lower disconnectivity value indicates that the clusters are not disconnected across the geographical area. The figure shows that SpatialDEC significantly increases the connectivity and decreases the disconnectivity of the clusters.
Overall, Fig. 9 shows that the clusters of SpatialDEC are more compact than those of kmeans in terms of temporal similarity; in other words, the data points of each cluster have higher temporal similarity.
We define a spatial metric based on the size of the connected spatial cluster of each location. Here, we show that the mean of this metric obtained by SpatialDEC is significantly higher than that obtained by DEC. The null hypothesis is that the means for SpatialDEC and DEC are equal; the alternative hypothesis is that they differ. Since the metric can be represented by a normal distribution with positive mean for both DEC and SpatialDEC, we apply a t-test to the metric values obtained by DEC and SpatialDEC. The p-value is 0.0012, so we can reject the null hypothesis at conventional significance levels. This shows that the increase in the connectivity of the spatial clusters is statistically significant.

V-E Analysis of traffic flow clusters
Here we visualize and further analyze the clusters of traffic flow data. The spatiotemporal clusters show how road segments are similar over time periods, represented in Fig. 10. The figure shows the time stamps of one day on the y-axis and the location indices on the x-axis; each color represents the assigned cluster. To better visualize the clusters and their similarities, we only consider 8 clusters. Areas with the same color have temporal similarity. The figure shows how 26 locations are similar over time periods.
The above representations show spatiotemporal and spatial clusters. Our clusters are based on temporal similarity: if a data point is far from the means of the clusters, it rarely occurs in the temporal domain, i.e., it is an anomaly. Here, we visualize such an example to clarify this interpretation. Fig. 11 shows a heatmap of the distances of the data points from the cluster centers, where a portion of the values are far from the centers; the time stamps close to 45 and the first four locations have light values. Regardless of the cause of the anomaly, we look at the traffic flow values of the first four locations in Fig. 12. The area close to time stamp 45 includes a large reduction in traffic flow values. This could be the result of an accident; however, in this paper, we do not analyze the causes of anomalies or the performance of anomaly detection. This analysis shows the different applications and the importance of having spatial, temporal and spatiotemporal clusters in a transportation network.
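The distance-to-center view behind Fig. 11 can be sketched as follows; the centers, the toy data and the anomaly threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def anomaly_scores(latent, centers, assign):
    """Distance of each data point's latent feature to its assigned center."""
    return np.linalg.norm(latent - centers[assign], axis=1)

# toy example: two cluster centers, one point far from its assigned center
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
latent = np.array([[0.1, 0.0], [5.1, 4.9], [2.5, 2.5]])
assign = np.array([0, 1, 0])

scores = anomaly_scores(latent, centers, assign)
# flag points whose distance exceeds a chosen threshold as anomalies
anomalies = scores > 1.0
```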
VI Conclusion and Future Work
Spatiotemporal clustering is an important method for transportation systems. One challenging problem is to find spatiotemporal similarities in a transportation network; the problem definition for traffic flow data is presented in Section II. Finding dynamic clusters of locations in a transportation network, illustrated in Section V.E, is necessary for analyzing traffic congestion propagation and for improving traffic flow prediction and missing data imputation. Moreover, finding temporal patterns in traffic flow data enables more efficient prediction and detection of anomalies, also illustrated in Section V.E. While these applications are important in transportation systems, there are few studies in the literature that develop deep learning models for spatiotemporal clustering of traffic flow data.
The increasing availability of traffic data requires the further development of clustering models for complex, high-dimensional data. In this paper, we propose SpatialDEC, a variation of Deep Embedded Clustering, to obtain spatiotemporal clusters, and illustrate its performance in finding dense and compact temporal clusters in Section V.C and spatially connected clusters in Section V.D. The contributions of this work lie both in the model architecture, whose validity is examined in Section V.B, and in the evaluation metrics defined for spatial and temporal clusters in Sections V.C and V.D. The proposed model uses the loss function introduced in Eq. 6 to find spatiotemporal clusters. Such a model can be useful not only for traffic data but also for other spatiotemporal problems, such as those in environmental science and smart cities. We also consider a graph structure for the latent feature representation, which can be further studied in the development of deep learning models for spatiotemporal data.