I Introduction
With the increasing volume of traffic on roads, it is difficult for human operators to monitor and understand traffic behavior in real time. Developing intelligent transportation systems (ITS) can help traffic administrations learn the scene and detect anomalies efficiently.
In order to develop such systems, it is important to learn or model traffic scenes from the data captured through sensors. With the widespread use of surveillance cameras, traffic video feeds are among the most useful sources for understanding a scene. With the advances in machine learning algorithms, it is possible to infer hidden patterns using unsupervised learning techniques. Most of the existing unsupervised learning techniques require the number of patterns to be specified a priori for the learning of traffic patterns [40]. Nonparametric modeling of data is therefore a key feature when the number of hidden patterns is not known a priori. Learning the normal trajectory patterns is the key to understanding traffic behavior on roadways. Typical traffic situations such as congestion, illegal stoppage of vehicles, wrong-way driving, crashes, lane breaking, illegal U-turns, and pedestrian movement on the road can be categorized as anomalies [39], subject to the context and application. If a model of the normal patterns in the data can be learned, it becomes easy to classify new data as normal or abnormal. Moreover, traffic behavior changes over the course of the day. Hence, it is important for the model to be adaptive so that it can discriminate the traffic behavior correctly.
Thus, an ideal system should be able to learn the scene in an unsupervised and nonparametric way. We propose an unsupervised method for learning the scene from video trajectories, with nonparametric modeling based on the Dirichlet Process Mixture Model (DPMM). We utilize the sequential dependencies among the trajectories, which are typically ignored by traditional clustering schemes [10, 14], to achieve better performance without compromising accuracy. The proposed model, which we refer to as the Temporally Incremental Gravity Model (TIGM), is temporally extended as a Dynamic State Model (DSM) to understand the dynamics of different traffic paths. This can help traffic authorities understand how the traffic changes over time so that better transportation strategies can be devised. Knowing the traffic state of roadways can also help the authorities with efficient monitoring, congestion management, traffic flow analysis, traffic signal management, traffic planning for road construction, etc.
I-A Related Work
Trajectory learning and classification are two important components of video analytic methods. Trajectory-based methods have been used for flow analysis [21], trajectory classification [31, 4], activity recognition [32], interaction analysis [12], abnormal object detection [8], and object classification and tracking [26]. The aforementioned works use supervised approaches, which require labeled datasets. Unsupervised approaches have been used for vehicle behavior understanding [29], trajectory classification [3], and trajectory clustering [33, 28, 16, 7]. These methods employ unlabeled datasets to cluster similar trajectories and use the clustered data to train models for classification. Incremental clustering, where data is processed sequentially, has been used in trajectory modeling [16, 35].
Typically, trajectory learning methods use extracted trajectory features for clustering. Mean shift clustering is used in [33] and [2] with Discrete Wavelet Transform (DWT) coefficients and a multi-feature vector, respectively. The particle swarm optimization based clustering used in [18] with Dynamic Time Warping (DTW) uses all the trajectory points. The work proposed in [41] performs trajectory clustering using instantaneous position as the observation and a trajectory as a mixture of topics to extract semantic regions in a scene.
The work presented in [16] performs trajectory clustering using the Time-Sensitive Dirichlet Process Mixture Model (tDPMM) [46]. The authors of [43] have used adaptive multi-kernel-based estimation together with K-means to produce accurate clusters. The work in [13] uses the Dirichlet Process Mixture Model (DPMM) [36] to extract repeating patterns (Motifs) and their occurrence times in a scene.
A learned model representing trajectory patterns can be used for different purposes irrespective of the training method employed. Such models can be used for abnormality detection [35, 23, 19], or for classification and abnormality detection together [21, 8]. Trajectory retrieval [16, 5] is another possible application. Learned models can also be used for online classification and abnormality detection even with partially observed tracks, a problem that has been addressed in [21, 8, 4]. This is important when timely actions are to be taken in response to an observed event.
However, conventional convergence-based methods [36, 27, 9] suffer from lower accuracy, especially when clustering sequential data. Thus, we propose a new clustering method that is accurate and fast compared to state-of-the-art techniques when applied to trajectory data as well as data from other domains.
I-B Motivation and Research Contributions
A careful observation of the start and end points of the tracks shown in Fig. 1 reveals that they follow typical distributions and that the distributions are separated from one another. If the likelihood function can model the points in close proximity as a cluster, tracks can be grouped easily in a nonparametric way. The conventional DPMM, as expressed in (1)-(4), can be an ideal model for nonparametric modeling of data to learn distributions.
(1) 
(2) 
(3) 
(4) 
Here, is a random variable representing the data and corresponds to the latent variable representing cluster labels, where and is the number of data points. takes one of the values from , where is the number of clusters. , referred to as the mixing proportion, is a vector of length representing the probabilities of to be . is the parameter of cluster and denotes the distribution defined by . denotes the concentration parameter, and its value decides the number of clusters formed. Initially, we pick from a Discrete distribution given in (1) and then generate data from a distribution parameterized by as given in (2). Here, is derived from a Dirichlet distribution as given in (3) and is derived from a distribution of priors as represented in (4). The model [22] is graphically presented in Fig. 2(a). In our earlier work [37], we proposed a method to express the concentration parameter () in terms of a distance measure using the deterministic relation , where is referred to as the concentration radius.
We cannot directly use the modified DPMM shown in Fig. 2(b) for representing trajectory data, as timing information is not considered in that model. In our proposed method, rather than representing the trajectories as a collection of unrelated data, we consider their temporal correlations. Moreover, as the concentration parameter is represented in terms of a distance function, it provides valuable intuition on how to choose the parameters for learning a traffic scene. It has been observed that a single iteration of Gibbs sampling [34] is sufficient to associate tracks with clusters by learning the most frequently used paths in a given scene. In the proposed DSM, the learned parameters change temporally, allowing us to detect unusual events without the need to go through the entire data, unlike other statistical methods [20, 10, 14].
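For reference, the generative process that the description of (1)-(4) corresponds to is usually written, in its finite-mixture form, as sketched below. The symbols ($x_i$ for the data, $z_i$ for the cluster label, $\pi$ for the mixing proportion, $\theta_k$ for the cluster parameter, $F$ for the data distribution, $H(\lambda)$ for the prior, $\alpha$ for the concentration parameter, $N$ and $K$ for the numbers of data points and clusters) follow common usage and are assumptions here; the notation of the original equations may differ.

```latex
\begin{align*}
z_i \mid \pi &\sim \mathrm{Discrete}(\pi), & i &= 1,\dots,N \\
x_i \mid z_i, \theta &\sim F(\theta_{z_i}) \\
\pi \mid \alpha &\sim \mathrm{Dirichlet}\!\left(\tfrac{\alpha}{K},\dots,\tfrac{\alpha}{K}\right) \\
\theta_k \mid \lambda &\sim H(\lambda), & k &= 1,\dots,K
\end{align*}
```

The Dirichlet process mixture is obtained from this form in the limit where the number of components $K$ grows without bound.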
What can be categorized as a normal event is highly contextual and is correlated in the temporal domain. For example, during busy traffic, vehicle speeds can be reasonably slow. If a person drives significantly faster than other vehicles, that can be termed abnormal. However, in sparse traffic, a similar speed can be termed normal. This is referred to as scene dynamics, and it is explicitly addressed in our model. Another important characteristic is the temporal aspect, i.e. objects moving in a neighborhood around the same time have similar temporal correlations. To illustrate this, consider traffic movement at a signalized junction. When the signal turns green, vehicles at the front of the queue start moving, followed by the vehicles behind. Similarly, there is a temporal relation between any moving objects entering and exiting the scene. These aspects are considered in the clustering, unlike in other nonparametric, unsupervised methods such as DBSCAN [14] or mean shift [10].
Our main objective is to understand the spatio-temporal characteristics of the frequently occurring patterns in the trajectories obtained from a scene. This can be a building block for traffic behavior analysis. In accomplishing this objective, we have made the following contributions:

We propose a DPMM-guided model, referred to as the Temporally Incremental Gravity Model (TIGM), and an inference scheme to create spatio-temporal clusters using conditional dependencies among the trajectories.

As typical DPMM clustering depends on the concentration parameter (), which is difficult to estimate, we propose a method to represent alpha using a distance-based method.

We also show how the proposed model can be used to estimate the traffic state using the Dynamic State Model (DSM), which is built on TIGM. We also discuss how DSM can be used for designing real-time traffic analysis frameworks.

We show that the proposed TIGM-DSM model can also be used for clustering sequential data from other domains.
The rest of the paper is organized as follows. In Section II, the rationale for the proposed methodology is discussed. In Section III, we discuss the experimental setup, datasets, parameters, and analysis of results. In Section IV, we discuss other relevant methods in comparison to the proposed method, followed by a complexity analysis. We also discuss a few limitations of the proposed method. Section V concludes our work with a look at future directions.
II Proposed Methodology
First, we introduce the terminology used in the paper. The words observation and data are used interchangeably; here, they represent the tracks or trajectories. Similarly, we use cluster or topic to denote a distribution of data, which is represented by a label. Clusters indicate the most frequently used paths. A model is used to represent a real-world phenomenon. We use graphical models [22] for representing the mixture models. A graphical model represents the generative model of the data. A parametric model has a fixed number of parameters, while in nonparametric models the number of parameters grows with the amount of training data. We follow an unsupervised method for learning a scene, and our model is nonparametric. This characteristic is essential to address the scene dynamics, which is defined later.
II-A Temporally Incremental Gravity Model (TIGM)
We make the following observations and assumptions in applying the proposed model to traffic analysis:

The start or end points of the objects in motion constitute a distribution of data in 2D.

If tracks can be clustered based on proximity, frequently used paths can be found.
In order to explain TIGM, we discuss the inference process. Here, represents a trajectory and is the corresponding cluster label. Our goal is to find for all tracks. corresponds to a discrete distribution and each has a set of observations associated with it. The cluster has a proportion of observations. It can be observed that is surely associated with one cluster of unknown distribution parameterized by . Here, and represent the mean and covariance of the distribution.
The inference method can be written using (5)-(6) [34], where corresponds to the parameters excluding data. Here, (5) represents the probability of an observation forming a new cluster, while (6) represents the probability of an observation belonging to one of the K existing clusters. corresponds to the likelihood of in . , when no observation is considered. denotes the number of observations for which a cluster label is already assigned, excluding the observation . represents the concentration parameter of the Dirichlet distribution. These equations can be viewed as the posterior represented in terms of likelihood and prior. In other words, the probability of an observation belonging to a cluster (given the other parameters) is proportional to the probability of the observation being generated from the cluster multiplied by the proportion of the observations present in the cluster.
(5) 
(6) 
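In the Gibbs sampling scheme for Dirichlet process mixtures cited as [34], the two conditional probabilities described above usually take the following form. This is a sketch in assumed notation ($b$ for the normalization constant, $n_{-i,k}$ for the number of observations in cluster $k$ excluding $x_i$, $G_0$ for the prior over cluster parameters, $F$ for the likelihood); the original equations may use different symbols.

```latex
\begin{align*}
P(z_i = k_{\mathrm{new}} \mid z_{-i}, x_i, \theta)
  &= b\,\frac{\alpha}{N - 1 + \alpha}\int F(x_i, \theta)\,dG_0(\theta) \\
P(z_i = k \mid z_{-i}, x_i, \theta)
  &= b\,\frac{n_{-i,k}}{N - 1 + \alpha}\,F(x_i, \theta_k),
  \qquad k = 1,\dots,K
\end{align*}
```

Both expressions share the denominator $N - 1 + \alpha$, so only the numerators matter when comparing a new cluster against the existing ones.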
We select the likelihood function such that the probability of an observation being associated with a cluster increases as the observation gets nearer to the cluster mean and decreases as it moves away from the mean. The exponential decay function has these characteristics [37]. Here, represents the distance of from . The inference equation can be written as (7). The condition denotes the probability of the observation forming a new cluster, and the latter part signifies the probability of it joining one of the existing clusters.
(7) 
When an observation forms a new cluster, the likelihood function as the distance of an observation to itself is . The inference equation can be further simplified to (8) by removing the proportionality symbol and the common denominator , where is the normalization constant and = . We call the concentration radius. This inference process is derived without the assumption , unlike [36, 34, 17].
(8) 
The above equation is the key to finding the value of the concentration parameter (), as can be expressed in terms of the distance from the center to a point at the periphery of the distribution. If an observation forms a new cluster, it has to be at an infinitesimal distance () beyond the distribution’s periphery, measured from the mean (). This implies . Therefore, the value of can be estimated using (9).
(9) 
It may be observed that can be ignored as . , yields . Once the initial clustering is done using an appropriate , the actual can be obtained, and with some fine-tuning of , better clustering results can be produced.
The distance () can be the Euclidean distance, the Mahalanobis distance, dynamic time warping (DTW) [6], etc. We have used the Euclidean distance, even though the Mahalanobis distance can also approximate elliptical distributions. This is done intentionally to limit the distribution variance in the plane. For real-time road traffic applications, DTW can be costly as the distance calculation itself is , where is the number of track points. However, DTW can be useful in applications such as handwriting matching, signature matching, etc. To simplify the model, we learn the posterior probability, not the likelihood function. The model is represented graphically in Fig. 3(a).
We refer to the model as the Temporally Incremental Gravity Model (TIGM) for the following reasons: (i) The clusters are able to attract more observations from the neighborhood. This is due not only to spatial proximity but also to the incremental temporal order. Thus, clustering is not strictly bounded by distance. The gravitational strength (a large group of closely appearing observations attracts more nearby observations) also increases as more observations are added to the cluster. To illustrate this idea, consider a list of points equally spaced along a straight line segment, as shown in Fig. 3(b), and visualize how the mean of the points shifts if the mean is calculated incrementally. If we cluster the points strictly based on a radius of half the length of the line segment, more than one cluster may form, depending on the order of sampling. If done incrementally, exactly one cluster is formed. Moreover, even if we consider a radius smaller than half the line segment, our model can form a single cluster, as the mixing proportion factor attracts more observations to the cluster. As explained earlier, when a signal turns green at a traffic junction, the vehicles at the front usually move first; unless the front vehicles move, the vehicles behind cannot move. In real-life situations, this is exactly how many objects move. This is the rationale behind modeling the conditional dependencies in TIGM. It means that clustering observations in the temporal order of their arrival can yield the best results, as we consider the dependency among observations. In typical statistical approaches, where all observations are considered for clustering, multiple iterations of sampling may be required to achieve similar clustering results. (ii) Temporal information is maintained as inference is done in an incremental fashion.
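The line-segment illustration above can be tried out with a small sketch. It is not the authors' implementation: the function name `assign_incrementally`, the weight `exp(-rho)` for opening a new cluster, the weight `n_k * exp(-distance)` for joining an existing cluster, and the choice of the most probable label instead of a random draw are all assumptions made to keep the example short and deterministic.

```python
import math


def assign_incrementally(points, rho):
    """Single-pass, order-preserving clustering of 2-D points.

    Assumed weights (illustrative, not taken verbatim from the paper):
      new cluster:      exp(-rho)
      existing cluster: n_k * exp(-distance(point, mean_k))
    The label with the largest weight is chosen, so no sampling is involved.
    """
    means, counts, labels = [], [], []
    for x, y in points:
        # Weight of joining each existing cluster: size times exponential decay.
        weights = [n * math.exp(-math.hypot(x - mx, y - my))
                   for (mx, my), n in zip(means, counts)]
        # Last entry: weight of opening a new cluster.
        weights.append(math.exp(-rho))
        k = max(range(len(weights)), key=weights.__getitem__)
        if k == len(means):
            means.append((x, y))
            counts.append(1)
        else:
            # Running mean update keeps the cluster centre following the data.
            n = counts[k] + 1
            mx, my = means[k]
            means[k] = (mx + (x - mx) / n, my + (y - my) / n)
            counts[k] = n
        labels.append(k)
    return labels, means, counts


# Nine equally spaced points on a segment of length 8, radius 3 (< half length).
segment = [(float(i), 0.0) for i in range(9)]
in_order = assign_incrementally(segment, rho=3.0)
alternating = assign_incrementally(
    [segment[i] for i in (0, 8, 1, 7, 2, 6, 3, 5, 4)], rho=3.0)
print(len(in_order[1]), len(alternating[1]))
```

Under these assumptions, visiting the points in their natural order grows a single cluster whose mean drifts along the segment, while visiting them alternately from both ends produces two clusters.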
This model differs from DPMM in that we represent the temporal correlation between observations and the corresponding latent variables using conditional dependence. The generative process can be represented using (10)-(13).
(10) 
(11) 
(12) 
(13) 
II-B Dynamic State Model (DSM)
TIGM cannot give the instantaneous state of a scene (for example, highly crowded, moderately crowded, or less crowded), as scene dynamics is not included in the model. We have extended the model to reflect the temporal states of a scene. Before proceeding further, we quantify what we refer to as scene dynamics. We denote it by subsequently. We define as a parameter vector of length such that . Here, represents the cluster dynamics in the temporal segment of length for the cluster . It is defined as , where denotes the mean of the cluster , denotes its covariance and denotes the number of observations in cluster . To complete the model, we make the following additional assumptions:

When data is captured from a static camera, the cluster statistics do not change significantly over a short duration.

If we take longer intervals, the traffic state can be completely different, and clustering over longer windows can be meaningful in a real-time monitoring system.

If we fix the time segment over which the clustering is done, then, with prior information about the learned clusters, semi-supervised learning can be used for clustering and classification of trajectories in subsequent temporal segments.
Based on the above assumptions, new trajectories can be added and old trajectories that are no longer in the time segment can be removed from the cluster during learning. If Gibbs sampling is performed using as a prior for the frame, cluster labels can be maintained between consecutive frames. The inference can be done using (14) with exactly one iteration of Gibbs sampling at a specified time interval. The rationale behind using a single iteration is to make sure that the temporal aspects are not lost in clustering. If we resample at specified intervals, stabilization of the scene dynamics can be achieved. Here, is different from the discussed earlier. It represents the set of all cluster assignments except for , such that it excludes the observations and corresponding cluster assignments before . is the parameter representing the distribution corresponding to cluster , excluding observation . Here, is the mean of the distribution. denotes the number of observations in and is the normalization constant.
(14) 
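The sliding-window bookkeeping described above (keeping only the trajectories that fall inside the current time segment and summarising each cluster by its mean, covariance, and count) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `cluster_dynamics`, the `(timestamp, label, point)` tuple layout, and the use of a 2-D population covariance are assumptions.

```python
from collections import defaultdict


def cluster_dynamics(observations, t_now, window):
    """Summarise each cluster over the segment (t_now - window, t_now].

    observations: iterable of (timestamp, label, (x, y)) tuples.
    Returns {label: (mean, covariance, count)}; observations that fall
    outside the time segment are ignored.
    """
    groups = defaultdict(list)
    for t, label, point in observations:
        if t_now - window < t <= t_now:
            groups[label].append(point)

    summary = {}
    for label, pts in groups.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Population (divide-by-n) covariance of the 2-D points.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / n
        syy = sum((p[1] - my) ** 2 for p in pts) / n
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
        summary[label] = ((mx, my), ((sxx, sxy), (sxy, syy)), n)
    return summary


# Hypothetical start points with frame indices as timestamps.
obs = [(1, 0, (0.0, 0.0)), (5, 0, (2.0, 0.0)),
       (6, 0, (4.0, 0.0)), (6, 1, (10.0, 10.0))]
print(cluster_dynamics(obs, t_now=6, window=3))
```

Calling the function at regular intervals with an advancing `t_now` gives one summary per time segment, which is the kind of per-segment quantity the state analysis later in the paper operates on.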
Our proposed model is shown in Fig. 4. It can be represented using (15)-(18). Here, corresponds to the observation at time excluding the observations before and corresponds to the latent variable representing cluster labels, taking one of the values from . is the number of observations and is the number of clusters at . is a vector of length . represents the mixing proportion of observations among the clusters. is the parameter of cluster and denotes the distribution defined by . First, we pick from a Discrete distribution given in (15). The data is then generated from a distribution parameterized by as given in (16), where is derived from a Dirichlet distribution as given in (17). is derived from another prior distribution as represented in (18). The model is different from the original TIGM shown in Fig. 3. Here, and . This is meaningful as the distributions do not change significantly within short intervals. For example, vehicle density usually does not vary significantly within a short interval (in a span of 45 minutes) during peak hours. Thus, there is a conditional dependence of both and between successive time slots.
(15) 
(16) 
(17) 
(18) 
The inference method for cluster assignment uses Gibbs sampling [34]. The process is described in Algorithm 1. Here, can be , where and represent the start and end positions and represents the length of the track. The rationale for using only these features (start and end points, and trajectory length) is that, in a road traffic scenario, most vehicles follow typical patterns. Therefore, those following similar patterns will be closely distributed in a multidimensional space. can also be , , or a combination of both. It can then be used to find the tracks originating at a particular position and/or ending in a particular region.
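As an illustration of the feature choices just described, the sketch below builds a feature vector from a track given as a list of (x, y) points. The function name `track_features` and its flags are hypothetical, and the track length is taken as the number of points, which is an assumption; arc length or duration would be equally plausible readings.

```python
def track_features(track, use_start=True, use_end=True, use_length=True):
    """Build a feature vector from a track given as a list of (x, y) points.

    With all flags set, the result has five dimensions:
    (start_x, start_y, end_x, end_y, length).
    """
    if not track:
        raise ValueError("track must contain at least one point")
    features = []
    if use_start:
        features.extend(track[0])
    if use_end:
        features.extend(track[-1])
    if use_length:
        # Assumption: length is the number of points in the track.
        features.append(len(track))
    return tuple(features)


track = [(1, 2), (3, 4), (5, 6)]
print(track_features(track))                                    # all five dimensions
print(track_features(track, use_end=False, use_length=False))   # start point only
```

Restricting the vector to the start point or the end point corresponds to the entry-point and exit-point clustering used in the experiments.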
III Experimental Results
In this section, we present results obtained using the proposed model applied on various public datasets.
III-A Experimental Setup and Datasets
We have used OpenCV to implement the proposed framework. Experiments have been conducted on four publicly available traffic video datasets, namely UCF [1], QML [24], MIT [41], and Grand Central Station, New York (GCS) [45]. UCF is a traffic junction video of 1 minute duration, captured from the top of a building and covering vehicle movements across the different roadways of a complex junction. QML is a video of approximately 50 minutes of a busy traffic junction, containing vehicle and pedestrian movement with many anomalies. We have extracted the trajectories from these two datasets using the context tracker [11]. For the TIGM experiments on QML, we have used the trajectories from the first 10 minutes. MIT is a video trajectory dataset of a parking lot captured from the top of a nearby building. GCS is also a trajectory dataset, containing the tracks of people moving at a busy underground metro station, namely Grand Central Station, New York. The DSM experiments on the QML dataset are conducted by extracting the trajectories for the entire video duration using [44].
III-B Variation
Choosing a suitable value of for clustering should be intuitive for a particular scene. From our analysis, we have observed that for clustering based on start/end points, the initial value can be the approximate radius in the XY plane of the start/end point distribution. More than , choosing the correct value has a bigger influence on the clustering, as is very small compared to . Fine-tuning has been done using the term , depending on the number of trajectories in the clusters. For higher dimensions, the initial value is set to two times (for two additional dimensions) or three times (for three additional dimensions) the fine-tuned value obtained in the previous step. Further fine-tuning is done in the same way as for lower dimensions.
As we have replaced the concentration parameter () with the concentration radius (), it is important to see how variation influences the clustering results. The results shown in Fig. 5 demonstrate that there is an inverse relation between and the number of clusters formed. However, the relation is more intuitive than the relation between and the number of clusters.
III-C Convergence vs. Correctness
The experiments on three datasets, namely QML, UCF, and GCS, reveal that in continuous monitoring applications the temporal correlation of the trajectories is more important than convergence. To illustrate this point, start-point-based clustering is applied to the QML dataset. The results of TIGM without and with convergence are presented in Figs. 7 (a) and (b), respectively. They reveal that, with convergence, some tracks of cluster form a new cluster (), as shown in Fig. 7 (d). Similarly, track becomes part of the new cluster . Though this grouping can be acceptable due to the remoteness of track from cluster , the tracks of cluster should logically belong to cluster . Such incorrect clustering happens due to the loss of temporal information with convergence. The convergence plots for the QML, UCF, and GCS datasets are shown in Fig. 6. Though convergence using multiple iterations is faster, the clustering results are not meaningful. Hence, a single iteration of Gibbs sampling has been adopted.
III-D Experiments on Vehicle Trajectory Datasets
Clustering results are presented in Figs. 8-11. As expected, variation gives different groupings of tracks. However, acceptable values can be derived as discussed earlier. The tracks are clustered incrementally to make sure that subsequent tracks following the same path are grouped together. This happens because new tracks lie close to the previous ones in the multidimensional space. The clusters with a larger number of tracks form Motifs (frequently occurring trajectory patterns) [13]. Individual clusters are shown with color gradients indicating the motion direction. Each track starts in red, changes color along its length, and ends in pink to show the temporal effect. This representation gives a clear idea of the direction of the trajectory and is useful in representing Motifs, as discussed in [13].
In Figs. 8-9, clustering has been performed considering all five dimensions of the tracks. The results clearly show that the trajectories in a cluster are similar in nature. Though a few clusters in Fig. 9 look similar, they have been grouped separately due to temporal variations. It has been observed that the number of clusters formed in the MIT dataset is high. This is due to the large number of entry and exit points in the parking area, where many vehicles did not strictly follow the paths while moving in or out. If there are no restrictions on the paths that drivers may take, clustering may be performed based on entry or exit points to find the most frequently used entry and exit points, as shown in Fig. 11.
III-E Experiments on Crowd Dataset
The applicability of the model to crowd datasets has been tested to understand the underlying patterns of crowd movement. Experiments on a difficult dataset like GCS reveal that the proposed method can produce interesting results, including detection of frequently used paths, unusual patterns of people movement, the most frequently used entry and exit points, etc. We have used the tracks of length larger than to avoid truncated trajectories. The clustering results in Figs. 12-14 reveal interesting patterns, including the most frequently used entry and exit points. In Fig. 12, we present the clusters obtained considering all five dimensions. The strength of a pattern is indicated by the number of tracks present in it. It also indicates whether the pattern is dense or appears less frequently. Clusters with lower strength indicate anomalous or rare behavior. Since the space covered in the scene does not restrict the movement of people, clustering guided by start or end points can reveal important information about the entry and exit points in the scene, as shown in Figs. 13-14.
III-F Experiments using DSM
In order to conduct experiments for detecting scene dynamics, we have used the GCS and QML datasets. To depict the temporal appearance of the tracks, a plot of the frequency of new tracks vs. appearance time is shown in Figs. 17-17 for the GCS dataset. The inference is applied based on start points by taking a time segment () of K frames for the DSM model. The method reveals the scene dynamics representing different underlying states of the scene in the temporal plane. This information can be used for crowd management at the station. For example, the number of service personnel for issuing tickets or a better scheduling of trains can be decided based on scene dynamics. The results of clustering, when observed in conjunction with the video, clearly reflect the scene dynamics. To illustrate the results, we have taken three clusters corresponding to three entry points in the scene. Fig. 17 shows the dynamics of different entry points, i.e. , at four equally spaced time segments. Figs. 17 (b)-(e) reveal that the crowd density reduces by and again increases slightly by at the bottom-right entry point. However, the dynamics do not change significantly for the clusters originating at the middle-bottom entry point, as can be seen from Figs. 17 (f)-(i). Its density is observed to be the highest among all the entry points. Clusters corresponding to the top-left entry point are sparse, and their dynamics do not change over time, as can be seen from Figs. 17 (j)-(m). This reveals that traffic inflow is lower at this entry point irrespective of time. The above-mentioned behavior can be verified from the histograms of arrival frequency, as shown in Fig. 17. This means that our model is able to represent the scene dynamics correctly, as seen in Fig. 17. In this experiment, we have considered entry-point-based clustering on DSM. However, this can be extended by employing a combination of all five dimensions to get the states of traffic.
For the QML dataset, we have used a of K frames. Noisy trajectories are removed so that only the vehicle trajectories remain. The results of the experiments are presented in Figs. 18-19. The results reveal important information on how the traffic changes over time at the junction. This information can be used effectively for better scheduling of traffic signals, and it can also be shared with subsequent signals in a sequential camera network.
Here, we want to highlight how cluster dynamics represented in terms of can be useful for future traffic applications, as it represents a set of features. It can be further grouped using different features such as traffic speed, traffic density, and spread of the trajectory distribution to obtain a meaningful interpretation of the traffic scene. For example, traffic can be classified as low density, medium density, or high density, and traffic on a particular roadway can be observed in terms of how the densities change over time. This can help the authorities plan road traffic better. In other words, cluster dynamics can be mapped to states (low, medium, or high) to reflect the traffic speed in a path, the traffic density, the degree of chaos based on the spread of the distribution, etc. Clusters can be collected for all time segments () of the day based on the trajectories found in a scene. Using as an observation, clustering can be done to learn different states either by using a hard threshold or by using unsupervised techniques such as K-means [20]. Once the state parameters are known, dynamics can be mapped to different states for a meaningful interpretation of the traffic scene. Such analysis can help traffic authorities with traffic situation assessment, transportation planning, scheduling of traffic at junctions, routing of traffic through different roadways, etc.
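The hard-threshold variant of the state mapping mentioned above can be sketched in a few lines. The function name `traffic_state`, the threshold values, and the use of the per-segment track count as the density measure are hypothetical choices for illustration; in practice the thresholds would be chosen per scene or learned, for example with K-means.

```python
from bisect import bisect_right


def traffic_state(count, thresholds=(10, 30), names=("low", "medium", "high")):
    """Map the number of tracks in a time segment to a coarse density state.

    thresholds are hypothetical upper bounds: counts below 10 are "low",
    counts from 10 up to 29 are "medium", and 30 or more are "high".
    """
    return names[bisect_right(thresholds, count)]


# Hypothetical track counts of one cluster over four consecutive segments.
counts_per_segment = (4, 18, 42, 25)
print([traffic_state(n) for n in counts_per_segment])
# ['low', 'medium', 'high', 'medium']
```

The same pattern applies to other components of the cluster dynamics, such as a speed estimate or the spread of the distribution, by swapping the input quantity and the thresholds.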
III-G Model Evaluation and Comparison
Next, we compare the clustering approach with two unsupervised and nonparametric approaches, namely mean shift [10] and DBSCAN [14]. We have applied clustering to the QML dataset based on start points. The results are shown in Fig. 20. Our method outperforms these techniques in most cases. With proper tuning, these methods can perform similarly. However, in some cases, these techniques fail to include all trajectories in the right cluster. For example, as depicted in Fig. 20, track is clustered into cluster 10. Our proposed method resolves such cases. However, we cannot claim that the clustering done by DBSCAN is incorrect, since the criteria used are different. Similarly, in mean shift clustering, tracks and are present in cluster . Even though track can be logically grouped with cluster , in TIGM and DBSCAN it forms a new cluster. It may be wrong to group track with cluster . Ideally, it can be part of cluster , as in TIGM and DBSCAN.
To evaluate these methods quantitatively, we have conducted experiments to calculate learning accuracy, precision and recall. First, an independent evaluator logically grouped the clusters to prepare the ground truths. Ground truths correspond to the unique trajectory patterns that are possible in a typical traffic scene. For example, in the case of roads, the ground truth represents the trajectory patterns on the road, while in GCS it represents the common movement patterns between a pair of source and destination. Wrong inclusion of a trajectory in the predicted group is categorized as a False Positive (FP), and wrong exclusion from a coherent group is categorized as a False Negative (FN). The rest of the trajectories are either True Positives (TP) or True Negatives (TN). The computational analysis on the QML dataset using start-point-based clustering is depicted in Figs. 24  24. Fig. 24 reveals that TIGM takes linear time. The plots are shown to demonstrate the predictability of our model. It is observed that DBSCAN matches TIGM in cluster prediction with respect to the ground truth and in accuracy. Though the cluster prediction of mean shift gets better with more tracks, its accuracies are found to be unpredictable.
Table I summarizes the experimental results in terms of accuracy, precision and recall. For UCF ( trajectories) and QML ( trajectories), all five dimensions are used. In the MIT ( trajectories) and GCS (initial trajectories) datasets, we have omitted the duration (), as it was difficult to clearly discriminate between shorter and longer trajectories with the same start and end positions. The parameter values are adjusted to create common patterns corresponding to source-destination pairs so that the methods can be compared across datasets. It may be observed that the performance of TIGM is close to that of DBSCAN when applied to road trajectories. However, on GCS it performs much better than both DBSCAN and Meanshift. From the analysis, it has been found that DBSCAN performs poorly due to the spread of starting and ending points in the scene. When the trajectories are dense at a particular locality, DBSCAN typically groups the nearby trajectories into bigger groups, as illustrated in Fig. 25. By lowering , such an effect can be avoided. However, this leads to a higher number of clusters, reducing accuracy due to more false negatives.
Dataset  Method (Parameter)  A (%)   P (%)   R (%)
         TIGM ()             97.37   98.67   98.67
         DBSCAN (, )         98.68   100     98.68
         Meanshift ()        97.37   100     97.37

         TIGM ()             92.47   95.74   96.43
         DBSCAN (, )         91.10   97.08   93.66
         Meanshift ()        89.73   97.04   92.25

         TIGM ()             90.00   96.43   93.10
         DBSCAN (, )         91.28   94.18   96.74
         Meanshift ()        80.28   94.31   84.36

         TIGM ()             84.80   94.43   89.26
         DBSCAN (, )         47.20   80.55   53.27
         Meanshift ()        67.80   85.39   76.70
: Quantile). Note: normalized features have been used for DBSCAN and meanshift clustering
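The three measures reported in Table I can be computed from the TP, FP, FN and TN counts as sketched below. The sketch assumes the standard definitions of accuracy, precision and recall, since the formulas are not spelled out in the text; the function name `clustering_metrics` and the counts in the example are hypothetical and are not taken from the experiments.

```python
def clustering_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) as fractions in [0, 1].

    Assumes the standard definitions:
      accuracy  = (TP + TN) / (TP + FP + FN + TN)
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 76 trajectories, one wrongly included, one wrongly excluded.
accuracy, precision, recall = clustering_metrics(tp=74, fp=1, fn=1, tn=0)
percentages = [round(100 * m, 2) for m in (accuracy, precision, recall)]
print(percentages)
```

Under these definitions, a method that forms many small clusters tends to exclude trajectories from their coherent group, which raises FN and lowers both recall and accuracy.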
III-H Discussion of Results
Trajectory-based learning is one of the most common ways to analyze and interpret activities in a scene. From the experiments, it has been observed that the proposed method is able to learn unique trajectory patterns in the scene. Using a semi-supervised approach (by pre-initializing normal clusters with representative trajectories), the learned patterns can be identified as normal or abnormal using the posterior probabilities represented in (8). For a new trajectory, classification and behavior analysis can be done using the learned model. With more specific features such as velocity or curvature, activity detection and classification can be done. Statistics corresponding to unique trajectory patterns can be used by traffic authorities for better planning of road networks, safety analysis, risk assessment, etc. The temporal extension of TIGM, i.e., the DSM, reflects the traffic states in the temporal domain. This is useful in traffic flow analysis, congestion study, traffic density analysis, adaptive traffic signal scheduling and traffic rerouting strategies, as the cluster dynamics provide valuable traffic information about the value at specific paths, entry and exit points. For example, cluster density at a traffic path can be used for better signal duration prediction for a particular lane, while traffic volume information at the start or end points can be used for better traffic routing at the preceding or succeeding traffic signals, with suitable communication through centralized or Internet of Things (IoT) devices.
III-I Performance Comparison
In addition to nonparametric clustering algorithms, we compared the performance of other unsupervised methods, namely Mini-batch K-means (MKmeans) [38], Ward [30], Affinity propagation (AP) [15], Spectral [25], Gaussian Mixture Model (GMM) [27]. TIGM has been found to be very fast as compared to other unsupervised clustering methods as shown in Fig. 26.
III-J Experiments in Other Domains
Due to its generic nature, DSM can be used in different domains. It can be applied wherever data points have temporal relations and the dynamics change over time. Examples include grouping people who discuss similar ideas around the same time in social networks or over phone calls; tracking mobile devices with GPS over a geographical area to give users better directions that avoid crowded roads; analyzing how network bandwidth varies over time; and calculating the radiating power required for handling cell phones based on the number of people in a cell.
We have carried out an experiment on the data usage of different customers of an ISP over time using the proposed DSM. We have used a publicly available dataset in the form of a pcap file [42]. We have mapped the source IP to a customer and the average data usage to the value of the random variable , as this information is readily available to the ISP. Fig. 28 shows a typical representation of the network considered. There are source IPs corresponding to the customers. It may be observed that the data usage changes over time for different customers. We have used a random time s segment length to demonstrate the scene dynamics. Fig. 28 shows how the user behavior changes for a particular cluster over time. ISPs can use this for adaptive scheduling of network bandwidth for users when some customers are not using the bandwidth.
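The preprocessing implied by this experiment can be sketched as below: packets are grouped by time segment and source IP, and the usage of each customer in each segment becomes one observation. The record layout `(timestamp, source_ip, size_bytes)`, the function name `usage_per_segment`, the 10 s segment length, and the choice of bytes per second as the "average data usage" are assumptions made for illustration; parsing of the pcap file itself is not shown.

```python
from collections import defaultdict


def usage_per_segment(packets, segment_length):
    """Aggregate packet sizes into average usage per customer and time segment.

    packets: iterable of (timestamp_seconds, source_ip, size_bytes) tuples.
    Returns {segment_index: {source_ip: average bytes per second}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for timestamp, source_ip, size_bytes in packets:
        segment = int(timestamp // segment_length)
        totals[segment][source_ip] += size_bytes
    return {
        segment: {ip: total / segment_length for ip, total in per_ip.items()}
        for segment, per_ip in sorted(totals.items())
    }


# Hypothetical packet records; real input would come from the pcap file.
packets = [
    (0.5, "10.0.0.1", 1000),
    (3.0, "10.0.0.1", 500),
    (4.0, "10.0.0.2", 2000),
    (12.0, "10.0.0.1", 3000),
]
usage = usage_per_segment(packets, segment_length=10)
print(usage)
```

Each inner dictionary then plays the role of the observations for one time segment, so changes in a customer's usage between segments show up as changes in the corresponding cluster.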
IV Discussion
IV-A Key Aspects: TIGM-DSM vs. Relevant Methods
Although it produces strikingly similar results in terms of coherent groups, the proposed TIGM has other advantages over existing methods. Our method clusters the trajectories in time, while mean shift takes for lower dimensions the time complexity tends towards . Here, is the number of observations and the number of clusters. DBSCAN does clustering in in the average case and in the worst case. DBSCAN and mean shift clustering algorithms are statistical in nature, i.e., they consider all data at a time. Such methods are useful for data-mining-related analysis. Though our method can be used in a statistical way by iterating until convergence, temporal information may be lost in the clustering process. DBSCAN and mean shift are significant in a wide variety of clustering applications. In traffic monitoring applications, however, trajectories arrive in temporal order, one at a time, making the incremental model ideal for representing scene dynamics. Moreover, by introducing the parameter (concentration radius) as a distance, it is intuitive to estimate the parameter, unlike in DBSCAN or mean shift. Also, the proposed clustering starts with no prior (in the inference); rather, we build the prior incrementally during the clustering process. The inference mechanism allows us to unassign any observation without affecting the cluster labels. This is key to building the DSM, which can predict/describe states of the scene across different time frames. A model supporting different states is a key feature of the proposed method, as there can be different rules at different time segments of the day. Thus, abnormal activity detection is more adaptive than following a strict threshold in traffic monitoring applications. Moreover, there is no need to map the cluster labels, as they are maintained in the temporal plane.
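The incremental behavior described above (one observation at a time, a distance-based concentration radius, and unassignment that leaves labels untouched) can be illustrated with the simplified sketch below. It is a deterministic nearest-mean stand-in and not the DPMM-based inference of TIGM; the class name `IncrementalClusterer`, its methods and the radius value are hypothetical.

```python
import math


class IncrementalClusterer:
    """Nearest-mean incremental clustering with a fixed concentration radius."""

    def __init__(self, radius):
        self.radius = radius
        self.sums = {}    # label -> per-dimension running sums
        self.counts = {}  # label -> number of currently assigned observations
        self.next_label = 0

    def _mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def assign(self, point):
        """Assign one observation; open a new cluster if none is within the radius."""
        best_label, best_dist = None, math.inf
        for label, n in self.counts.items():
            if n == 0:
                continue  # emptied clusters keep their label but attract nothing
            d = math.dist(point, self._mean(label))
            if d < best_dist:
                best_label, best_dist = label, d
        if best_label is None or best_dist > self.radius:
            best_label = self.next_label
            self.next_label += 1
            self.sums[best_label] = [0.0] * len(point)
            self.counts[best_label] = 0
        self.sums[best_label] = [s + x for s, x in zip(self.sums[best_label], point)]
        self.counts[best_label] += 1
        return best_label

    def unassign(self, point, label):
        """Remove an older observation; labels of other clusters are unaffected."""
        self.sums[label] = [s - x for s, x in zip(self.sums[label], point)]
        self.counts[label] -= 1


model = IncrementalClusterer(radius=2.0)
stream = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.5, 0.5)]
labels = [model.assign(p) for p in stream]
model.unassign(stream[0], labels[0])      # drop the oldest observation
late_label = model.assign((10.0, 11.0))   # still joins the second cluster
print(labels, late_label)
```

Because labels are never renumbered, clusters observed in different time segments can be compared directly, and only running sums and counts are kept rather than complete trajectories.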
Like DynamicDualHDP [41], the proposed DSM is able to model a scene dynamically. Moreover, only essential information needs to be stored with the proposed method, rather than the complete trajectory. However, DynamicDualHDP looks for convergence, thus losing some important temporal characteristics. Unlike [46], we handle temporal aspects inherently by using observations in temporal order. In [16], though a new model based on DFT features has been introduced, it does not explore trajectories from different contexts much. We have done detailed semantic analysis using only five feature points. Moreover, the proposed method can handle scene dynamics, as done in DSM, with a simple extension of TIGM. Even without handling the shrunk trajectories, we are able to produce meaningful semantic regions on a difficult dataset like GCS, as compared to [43].
IV-B Complexity Analysis
As discussed earlier, TIGM clustering complexity is . Even with DSM, the algorithm complexity does not change. Our algorithm takes one observation at a time and inference is applied to get clusters. At the time of unassigning the older observations from the cluster, exactly one more revisit is required on any observation. Each of the observations is checked against each of the clusters, exactly once. If resampling is involved, it is done at most twice. Thus, if there are trajectories and clusters, worst case complexity of the clustering is . Under normal circumstances, the value will be much smaller than , hence the complexity can be approximated by .
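The counting argument in this subsection can be written out as follows. The symbols $N$ (number of trajectories), $K$ (number of clusters) and the constant $c$ are notation assumed here for illustration, and each observation-to-cluster check is assumed to cost constant time.

```latex
\begin{align}
  T(N, K) &\le c \, N K
    && \text{each observation is checked against each cluster a bounded number of times} \\
  T(N, K) &= O(NK)
    && \text{worst case} \\
  T(N, K) &\approx O(N)
    && \text{when } K \ll N
\end{align}
```

Here $c$ absorbs the first pass, the single revisit needed to unassign an older observation, and the at most two resampling steps, which is consistent with the linear running time reported for TIGM in the evaluation.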
IV-C Limitations
For simplicity of modeling, we have assumed a single parameter for the entire clustering. However, different clusters can have distributions of varying spread. In such cases, needs to be learned to avoid observations going to wrong clusters even though they are spatially apart. This can happen to observations belonging to less frequent patterns whose means are close to a cluster of higher spread. Our method cannot be used in situations where trajectory patterns are random in nature, as accuracy may be affected by the approximation of points to start and end positions along with duration. The proposed method is suitable for applications that monitor road traffic or crowd traffic that are supposed to follow patterns. By incorporating more dimensions such as curvature, slope or direction, better accuracy can be achieved. The proposed model does not prevent the use of other suitable distance measures. If noisy trajectories such as truncated tracks are present, the clusters formed may not be the true representation of the scene, as this method relies on the accuracy of tracking algorithms. Such noisy trajectories typically form new clusters, as their characteristics (or features) are different from the normal ones. With the advances in machine learning, better tracking algorithms are available, so such situations can be avoided.
V Conclusion
The paper introduces a trajectory model based on DPMM from an intuitive perspective, using a distance measure and the temporal correlation of the observations. The model is temporally extended to consider the dynamic behavior of a given scene and reflect the traffic states. An incremental approach has been used to build clusters without a prior and without the indicative number of clusters required by many unsupervised learning methods. The model has been validated on a wide range of video datasets. The proposed model is able to cluster trajectories meaningfully with high accuracy and less computation time. Our model can be applied to videos to build a real-time traffic analysis framework, since it can learn and refine frequently occurring patterns in an unsupervised and nonparametric way. Though the primary focus of the proposed model is visual surveillance, it can be used in applications that require a representation of changing scene dynamics involving observations with temporal dependencies.
We foresee room for improvement at various levels. Firstly, we need to learn the likelihood to cater to distributions of different spreads. Secondly, the Mahalanobis distance can be used for better approximation of elliptical distributions. Lastly, an application for real-time monitoring of road traffic can be built.
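To illustrate why the Mahalanobis distance suits elliptical distributions, the sketch below compares it with the Euclidean distance for a two-dimensional cluster. The function name `mahalanobis_2d`, the covariance matrix and the test points are hypothetical values chosen for the example; they are not taken from the experiments.

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance for 2-D points; cov is a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form v^T cov^{-1} v with cov^{-1} = (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Hypothetical cluster elongated along the x axis (variance 9 in x, 1 in y).
mean = (0.0, 0.0)
cov = [[9.0, 0.0], [0.0, 1.0]]
along_axis = (3.0, 0.0)
across_axis = (0.0, 3.0)

euclid = (math.dist(along_axis, mean), math.dist(across_axis, mean))
mahal = (mahalanobis_2d(along_axis, mean, cov), mahalanobis_2d(across_axis, mean, cov))
print(euclid, mahal)
```

Both points are equally far from the mean in the Euclidean sense, but the Mahalanobis distance treats the point along the elongated axis as much closer, which is the behavior wanted for clusters of elliptical spread.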
References
 [1] S. Ali and M. Shah. A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In CVPR, 2007.
 [2] N. Anjum and A. Cavallaro. Multifeature object trajectory clustering for video analysis. IEEE Transactions on Circuits and Systems for Video Technology, 18(11):1555–1564, Nov 2008.

 [3] V. Bastani, L. Marcenaro, and C. Regazzoni. Unsupervised trajectory pattern classification using hierarchical dirichlet process mixture hidden markov model. In MLSP, 2014.
 [4] V. Bastani, L. Marcenaro, and C. Regazzoni. A particle filter based sequential trajectory classifier for behavior analysis in video surveillance. In ICIP, 2015.
 [5] V. Bastani, L. Marcenaro, and C. S. Regazzoni. Online nonparametric bayesian activity mining and analysis from surveillance video. IEEE Transactions on Image Processing, 25(5):2089–2102, May 2016.
 [6] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD, 1994.
 [7] B. Cancela, A. Iglesias, M. Ortega, and M. G. Penedo. Unsupervised trajectory modelling using temporal information via minimal paths. In CVPR, 2014.

 [8] F. Castaldo, F. A. N. Palmieri, V. Bastani, L. Marcenaro, and C. Regazzoni. Abnormal vessel behavior detection in port areas based on dynamic bayesian networks. In ICIF, 2014.
 [9] D. Comaniciu and P. Meer. Distribution free decomposition of multivariate data. Pattern Anal. Appl., 2(1):22–30, April 1999.
 [10] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on pattern analysis and machine intelligence, 24(5):603–619, 2002.
 [11] T. B. Dinh, N. Vo, and G. Medioni. Context tracker: Exploring supporters and distracters in unconstrained environments. In CVPR, 2011.
 [12] A. Dore and C. Regazzoni. Interaction analysis with a bayesian trajectory model. IEEE Intelligent Systems, 25(3):32–40, May 2010.
 [13] R. Emonet, J. Varadarajan, and J. M. Odobez. Temporal analysis of motif mixtures using dirichlet processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):140–156, Jan 2014.
 [14] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.
 [15] B. J. Frey and D. Dueck. Clustering by passing messages between data points. science, 315(5814):972–976, 2007.
 [16] W. Hu, X. Li, G. Tian, S. Maybank, and Z. Zhang. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5):1051–1065, May 2013.
 [17] H. Ishwaran and M. Zarepour. Exact and approximate sum representations for the dirichlet process. Canadian Journal of Statistics, 30(2):269–283, 2002.
 [18] Z. Izakian, M. S. Mesgari, and A. Abraham. Automated clustering of trajectory data using a particle swarm optimization. Computers, Environment and Urban Systems, 55:55–65, 2016.

 [19] F. Jiang, Y. Wu, and A. K. Katsaggelos. A dynamic hierarchical clustering method for trajectory-based unusual video event detection. IEEE Transactions on Image Processing, 18(4):907–913, April 2009.
 [20] X. Jin and J. Han. K-Means Clustering, pages 563–564. Springer US, Boston, MA, 2010.
 [21] K. Kim, D. Lee, and I. Essa. Gaussian process regression flow for analysis of motion trajectories. In ICCV, 2011.
 [22] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. The MIT Press, 2009.
 [23] R. Laxhammar and G. Falkman. Online learning and sequential anomaly detection in trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1158–1173, June 2014.
 [24] C. C. Loy, T. Xiang, and S. Gong. From local temporal correlation to global anomaly detection. In ECCV, 2008.

 [25] U. V. Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
 [26] L. Marcenaro, L. Marchesotti, and C.S. Regazzoni. Self-organizing shape description for tracking and classifying multiple interacting objects. Image and Vision Computing, 24(11):1179–1191, 2006.

 [27] T. K. Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996.
 [28] B. T. Morris and M. M. Trivedi. Trajectory learning for activity understanding: Unsupervised, multilevel, and long-term adaptive approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11):2287–2301, Nov 2011.
 [29] B. T. Morris and M. M. Trivedi. Understanding vehicular traffic behavior from video: a survey of unsupervised approaches. Journal of Electronic Imaging, 22(4):041113–041113, 2013.
 [30] F. Murtagh and P. Legendre. Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion? J. Classif., 31(3):274–295, October 2014.
 [31] J. C. Nascimento, M. A. T. Figueiredo, and J. S. Marques. Trajectory classification using switched dynamical hidden markov models. IEEE Transactions on Image Processing, 19(5):1338–1348, May 2010.
 [32] J. C. Nascimento, M. A. T. Figueiredo, and J. S. Marques. Activity recognition using a mixture of vector fields. IEEE Transactions on Image Processing, 22(5):1712–1725, May 2013.
 [33] T. Nawaz, A. Cavallaro, and B. Rinner. Trajectory clustering for motion pattern extraction in aerial videos. In ICIP, 2014.
 [34] R. M. Neal. Markov chain sampling methods for dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.
 [35] C. Piciarelli, C. Micheloni, and G. L. Foresti. Trajectory-based anomalous event detection. IEEE Transactions on Circuits and Systems for Video Technology, 18(11):1544–1554, Nov 2008.
 [36] C. E. Rasmussen. The infinite gaussian mixture model. In S. A. Solla, T. K. Leen, and K. Müller, editors, Advances in Neural Information Processing Systems 12, pages 554–560. MIT Press, 2000.
 [37] K. K. Santhosh, D. P. Dogra, and P. P. Roy. Temporal unknown incremental clustering model for analysis of traffic surveillance videos. IEEE Transactions on Intelligent Transportation Systems, pages 1–12, 2018.
 [38] D. Sculley. Web-scale k-means clustering. In WWW, 2010.
 [39] A. A. Sodemann, M. P. Ross, and B. J. Borghetti. A review of anomaly detection in automated surveillance. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):1257–1272, Nov 2012.
 [40] B. Tian, B. T. Morris, M. Tang, Y. Liu, Y. Yao, C. Gou, D. Shen, and S. Tang. Hierarchical and networked vehicle surveillance in its: A survey. IEEE Transactions on Intelligent Transportation Systems, 18(1):25–48, Jan 2017.
 [41] X. Wang, K. T. Ma, G. W. Ng, and W. E. L. Grimson. Trajectory analysis and semantic region modeling using nonparametric hierarchical bayesian models. International journal of computer vision, 95(3):287–312, 2011.
 [42] J. Weber. Wireshark Layer 23 pcap Analysis w/ Challenges. Online: http://www.netresec.com/, 2017. [Online: https://blog.webernetz.net/2017/03/29/wiresharklayer23pcapanalysiswchallengesccnpswitch/; accessed 25-May-2017].
 [43] H. Xu, Y. Zhou, W. Lin, and H. Zha. Unsupervised trajectory clustering via adaptive multi-kernel-based shrinkage. In ICCV, 2015.
 [44] B. Zhou, X. Tang, and X. Wang. Measuring crowd collectiveness. In CVPR, 2013.
 [45] B. Zhou, X. Wang, and X. Tang. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In CVPR, 2012.
 [46] X. Zhu, Z. Ghahramani, and J. Lafferty. Time-sensitive dirichlet process mixture models. Technical report, DTIC Document, 2005.