Online Pedestrian Group Walking Event Detection Using Spectral Analysis of Motion Similarity Graph

by   Vahid Bastani, et al.
Università di Genova

A method for online identification of groups of moving objects in video is proposed in this paper. At each frame, the method identifies groups of tracked objects with similar local instantaneous motion patterns using spectral clustering on a motion similarity graph. The output of the algorithm is then used to detect the event of more than two objects moving together, as required by the PETS2015 challenge. The performance of the algorithm is evaluated on the PETS2015 dataset.




1 Introduction

In video surveillance applications, video analysis and scene understanding usually involve object detection, tracking and behavior recognition. A particularly important task in these applications is crowd analysis, which has been a field of great interest in computer vision and cognitive science. The crowd phenomenon has been identified as a topic of great interest in a large number of applications such as crowd management, public space design, visual surveillance and intelligent environments.


Event detection has gained considerable attention in the video surveillance domain, for example in research on lying pose recognition [14, 12], detection of people running [4, 18], crowd safety [Könnecke2014501, 15], etc. In videos that include civil interactions, modeling the social behaviors of people plays an important role in describing individual and group behaviors in crowded scenes [17].

This work focuses on understanding events in video sequences that involve more than one person and in which groups of people walking together are formed. Group event detection is an important application in automatic video surveillance for understanding situations that involve groups of people engaged in a common activity. Identification of pedestrian groups is a key topic in crowd monitoring for the automatic recognition of anomalous behaviors that can threaten the safety of a place. To define pedestrian groups, it is necessary to define the notion of interaction among people.

The work described in this paper studies the clustering of pedestrians into groups based on their local motion features. The output can be used to detect events in which pedestrian groups of more than two people are present in the scene. Bounding boxes represent the observations of pedestrians in each frame. The position and dimensions of the bounding boxes, and the rate of change of these features, are tracked using a Kalman filter for each object. The filtered state density of the Kalman filter is used to form a motion similarity graph that represents, at each frame, how close the motions of pairs of pedestrians are. Then, using spectral clustering techniques, the groups of people are identified as connected components of the graph. The proposed algorithm is tested on six videos from the dataset provided by PETS 2015 and compared with ground truth.

2 Proposed Method

The task of detecting objects moving together in a video can be viewed as determining whether they follow the same local trajectory pattern. A simple yet effective way of modeling trajectories is through flow functions [10, 9, 2, 6]. In these approaches a trajectory pattern is characterized by a flow function that gives the velocity field at each point in the environment for that pattern. Given a set of flow functions corresponding to a set of trajectory classes, it is possible to determine the trajectory pattern class of a moving object. However, in the group walking event detection task no flow function is available beforehand. In this case the problem is to decide whether the motions of two simultaneously present objects belong to the same flow function or not.

In this section a method is introduced for online clustering of moving pedestrians based on their motion pattern. The clustering output is then used to detect the event in which more than two people are walking together. To this end, it is first shown how the space-dependent flow can be estimated online for each moving object, which is used for measuring the similarity of the instantaneous motion patterns of objects. Then a graph is built to represent the situation at each frame, where nodes are objects and edges are weighted by the similarity of the motion patterns of each pair of objects. Finally, spectral graph analysis techniques are used for clustering the objects into groups with similar motion patterns.

2.1 Object Detection and Tracking

The input to the proposed algorithm at each frame is a set of bounding boxes of detected pedestrians in the scene. It is also assumed that each detection is assigned an identity by an appropriate data association technique. Although this is a strong assumption, recent results on pedestrian detection [3] and multi-object tracking [1] have shown its feasibility. Let $Z_t = \{z_t^i\}_{i=1}^{N_t}$ be the set of observations extracted from frame $t$, where $z_t^i$ is the observation vector of object $i$, consisting of the coordinates and dimensions of its bounding box, and $N_t$ is the number of objects present in frame $t$. The flow (rate of change) of the coordinates and dimensions of the bounding boxes is also of great importance when analyzing the motion of an object. Therefore, it is reasonable to estimate the coordinates and dimensions together with their flow from the sequence of observations. A Kalman filter can be used for this task, with the state $x_t^i$ of the filter consisting of the coordinates and dimensions of the object's bounding box and their respective flows.

At each frame the Kalman filter provides the posterior filtered state density as a Gaussian distribution

$$p(x_t^i \mid z_{1:t}^i) = \mathcal{N}(x_t^i;\, \hat{x}_t^i, P_t^i) \qquad (1)$$

where $\hat{x}_t^i$ and $P_t^i$ are the filtered state and its covariance matrix up to time $t$. Note that the state of the Kalman filter in this case consists of the position and flow of all four components. Each instance of the state vector is one sample from the flow function of a trajectory pattern. For this reason we use the posterior state of the Kalman filter to compute the similarity of the motion of objects at each frame.
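The filtering step described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes a constant-velocity model, a state ordered as (x, y, w, h) followed by the four corresponding flows, and placeholder noise levels `q` and `r`; the function names `make_model` and `kalman_step` are hypothetical.

```python
import numpy as np

def make_model(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity model for a box (x, y, w, h) and its flow (assumed)."""
    I4 = np.eye(4)
    F = np.block([[I4, dt * I4], [np.zeros((4, 4)), I4]])  # state transition
    H = np.hstack([I4, np.zeros((4, 4))])                  # only the box is observed
    Q = q * np.eye(8)  # process noise covariance (placeholder value)
    R = r * np.eye(4)  # measurement noise covariance (placeholder value)
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle; returns the filtered mean and covariance."""
    # Prediction
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the new bounding box observation z
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Usage: a 20x50 box moving 2 pixels per frame to the right.
F, H, Q, R = make_model()
x = np.array([0.0, 0.0, 20.0, 50.0, 0.0, 0.0, 0.0, 0.0])
P = 10.0 * np.eye(8)
for t in range(1, 51):
    z = np.array([2.0 * t, 0.0, 20.0, 50.0])
    x, P = kalman_step(x, P, z, F, H, Q, R)
print("estimated horizontal flow:", x[4])
```

The pair `(x, P)` returned at each frame plays the role of the filtered state and covariance in (1), with one such filter per tracked object.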

2.2 Motion Pattern Similarity Measure

It is possible to use the Euclidean distance between the estimated state vectors of two objects to measure their similarity. However, since the output of the Kalman filter has the form of a probability distribution, it is more effective to use a distance measure specific to probability distributions. The Kullback-Leibler (KL) divergence is a measure of the difference between two probability distributions $p$ and $q$, denoted by $D_{KL}(p \,\|\, q)$. It is available in closed form for Gaussian distributions [5]. However, KL is not a symmetric measure, since in general $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$. Thus, the symmetrized version of the KL divergence is used here to measure the difference between the state distributions $p_t^i$ and $p_t^j$ of two moving objects $i$ and $j$:

$$D_t^{ij} = D_{KL}(p_t^i \,\|\, p_t^j) + D_{KL}(p_t^j \,\|\, p_t^i) \qquad (2)$$

$D_t^{ij}$ is a non-negative value that increases as the difference between $p_t^i$ and $p_t^j$ becomes larger. A normalized similarity $s_t^{ij}$ between the two state distributions is then calculated from $D_t^{ij}$ and a scaling factor $\sigma_t^{ij}$ (3).

The scaling factor $\sigma_t^{ij}$ compensates for the effect of the distance of the objects from the camera. Note that as objects move away from the camera, their speed and their mutual distance become smaller in the image plane. A global measure of similarity should therefore be equalized as far as possible, so that the same situation receives the same score whether it is close to or far from the camera. The scaling factor is defined as

$$\sigma_t^{ij} = \alpha \, \frac{\sqrt{A_t^i} + \sqrt{A_t^j}}{2} + \beta \qquad (4)$$

which is a linear function of the mean of the square roots of the areas $A_t^i$ and $A_t^j$ of the two bounding boxes. Here the area of the objects is used as an indication of their relative distance from the camera. $\alpha$ and $\beta$ are both positive, so that the scaling factor increases as the objects come closer to the camera. In the similarity measure (3), the scaling factor compensates for the larger relative distance between objects when they are close to the camera.

$s_t^{ij}$ is one when $p_t^i$ and $p_t^j$ are identical and decreases monotonically toward zero as they become more different. The pairwise similarity scores are then used to form a similarity graph such as the one shown in Fig.1, which is an undirected graph whose nodes represent moving objects and whose edges are weighted according to their motion similarity. The graph is used later to understand the relations between objects and to detect the event in which more than two people are walking together.
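The construction of the similarity graph can be illustrated with the sketch below. The closed-form KL divergence between two Gaussians is standard; everything else is an assumption made for illustration: the symmetrized divergence is taken as the sum of the two directed divergences, the similarity is taken to be `exp(-D / sigma)` (one possible function that equals one for identical distributions and decays toward zero), the state is ordered as (x, y, w, h, flows), and the values of `alpha` and `beta` are placeholders.

```python
import numpy as np

def kl_gauss(m0, P0, m1, P1):
    """Closed-form KL divergence KL(N(m0, P0) || N(m1, P1))."""
    d = m0.shape[0]
    P1_inv = np.linalg.inv(P1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(P0)
    _, logdet1 = np.linalg.slogdet(P1)
    return 0.5 * (np.trace(P1_inv @ P0) + diff @ P1_inv @ diff
                  - d + logdet1 - logdet0)

def sym_kl(m0, P0, m1, P1):
    """Symmetrized KL, assumed here to be the sum of both directions."""
    return kl_gauss(m0, P0, m1, P1) + kl_gauss(m1, P1, m0, P0)

def scaling_factor(m_i, m_j, alpha, beta):
    """Linear function of the mean square root of the two box areas."""
    mean_sqrt_area = 0.5 * (np.sqrt(m_i[2] * m_i[3]) + np.sqrt(m_j[2] * m_j[3]))
    return alpha * mean_sqrt_area + beta

def similarity(m_i, P_i, m_j, P_j, alpha=1.0, beta=1.0):
    """ASSUMED similarity form exp(-D / sigma); not taken from the paper."""
    sigma = scaling_factor(m_i, m_j, alpha, beta)
    return float(np.exp(-sym_kl(m_i, P_i, m_j, P_j) / sigma))

def similarity_graph(means, covs, alpha=1.0, beta=1.0):
    """Symmetric weighted adjacency matrix with a zero diagonal."""
    n = len(means)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = similarity(means[i], covs[i],
                                           means[j], covs[j], alpha, beta)
    return W

# Usage: two objects moving right together, one moving left.
a = np.array([0.0, 0.0, 20.0, 50.0, 1.0, 0.0, 0.0, 0.0])
b = np.array([5.0, 0.0, 20.0, 50.0, 1.0, 0.0, 0.0, 0.0])
c = np.array([5.0, 0.0, 20.0, 50.0, -9.0, 0.0, 0.0, 0.0])
W = similarity_graph([a, b, c], [np.eye(8)] * 3)
print(W)
```

With these placeholder settings the pair moving in the same direction receives a higher weight than either pair involving the object moving the opposite way.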

Figure 1: Example of similarity graph for 5 moving objects.

2.3 Group Walking Event Detection

The spectral clustering algorithm [8] is a simple yet effective algorithm for clustering data sets that can be represented using similarity graphs. It often outperforms conventional clustering algorithms. In the proposed algorithm we use spectral clustering on the generated motion similarity graph in order to find groups of objects moving in the same way. The adjacency matrix of a graph such as the one in Fig.1 is the matrix $W = [w_{ij}]$ whose entries are the pairwise similarity weights [13]. The main quantity for spectral clustering is the graph Laplacian matrix, which is defined as

$$L = D - W \qquad (5)$$

where $D$ is the graph degree matrix, defined as a diagonal matrix whose $i$th diagonal element is the degree of the $i$th node, $d_i = \sum_j w_{ij}$. The Laplacian matrix can be used to find the connected components of the graph. In the application of this paper the connected components of the graph represent groups of people walking together.
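The reason the Laplacian reveals connected components follows from a standard identity (see [13]). The derivation below is a short sketch that assumes the unnormalized Laplacian $L = D - W$ with symmetric, non-negative weights $w_{ij}$ and degrees $d_i = \sum_j w_{ij}$; this notation is an assumption wherever it is not fixed by the text.

```latex
\begin{aligned}
x^\top L x
  &= \sum_i d_i x_i^2 - \sum_{i,j} w_{ij} x_i x_j \\
  &= \frac{1}{2} \Big( \sum_i d_i x_i^2 - 2 \sum_{i,j} w_{ij} x_i x_j
       + \sum_j d_j x_j^2 \Big) \\
  &= \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2 \;\ge\; 0 .
\end{aligned}
```

Hence $L$ is positive semi-definite, and $x^\top L x = 0$ exactly when $x_i = x_j$ for every pair with $w_{ij} > 0$, that is, when $x$ is constant on each connected component. The multiplicity of the eigenvalue $0$ therefore equals the number of connected components, and the corresponding eigenspace is spanned by the indicator vectors of those components.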


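Let $\lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ be the eigenvalues of the Laplacian matrix $L$ and let $u_1, \dots, u_N$ be the corresponding eigenvectors, i.e. $L u_i = \lambda_i u_i$ for $i = 1, \dots, N$. If the number $k$ of connected components of the graph is known, the connected components and their corresponding nodes can be found using the spectral clustering algorithm [8] as follows:

  • Let $U \in \mathbb{R}^{N \times k}$ be the matrix whose columns are the first $k$ eigenvectors $u_1, \dots, u_k$.

  • For $i = 1, \dots, N$, let $y_i \in \mathbb{R}^k$ be the vector corresponding to the $i$th row of $U$.

  • Cluster the vectors $y_1, \dots, y_N$ into $k$ clusters $C_1, \dots, C_k$ using the $k$-means algorithm.

  • Return the cluster indicator variables $c_i$ for $i = 1, \dots, N$, where $c_i = j$ if $y_i \in C_j$.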

The event of a group of people walking is then triggered when the number of members in at least one connected component (cluster) is greater than or equal to three.
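The clustering steps and the event rule above can be sketched as follows, assuming the number of groups `k` is given and the unnormalized Laplacian $L = D - W$ is used. The small `kmeans` helper with farthest-point initialization is a stand-in chosen to keep the example self-contained; it is not claimed to be the variant used in the paper, and all function names are hypothetical.

```python
import numpy as np
from collections import Counter

def kmeans(Y, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_groups(W, k):
    """Cluster graph nodes using the first k eigenvectors of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)   # eigenvalues are returned in ascending order
    U = vecs[:, :k]               # rows of U are the vectors y_i
    return kmeans(U, k)

def group_walking_event(labels, min_size=3):
    """True if at least one cluster has min_size or more members."""
    return any(n >= min_size for n in Counter(labels.tolist()).values())

# Usage: objects 0-2 move together, objects 3-4 form a pair.
W = np.full((5, 5), 0.01)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)
labels = spectral_groups(W, k=2)
print(labels, group_walking_event(labels))
```

The default `min_size=3` mirrors the rule that the event is triggered by a cluster with three or more members.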

The number of connected components (the number of groups of people in the scene), however, is not known in this problem. One way to estimate the number of connected components is the eigengap heuristic [13]. Here $k$ is chosen such that all eigenvalues $\lambda_1, \dots, \lambda_k$ are small and $\lambda_{k+1}$ is relatively large. This is shown in the example of Fig.2, where the first three eigenvalues are close to zero and there is a big gap between the third and fourth eigenvalues, which shows that there are three connected components in the corresponding graph. Thus, to find $k$ using the eigengap heuristic, the following procedure is used:

  • Calculate $\delta_i = \lambda_{i+1} - \lambda_i$ for $i = 1, \dots, N-1$.

  • Find the smallest $k$ such that $\delta_k$ is a large gap, i.e. $\lambda_{k+1}$ is relatively large compared with $\lambda_k$.

The above procedure searches for the first gap in the sequence of eigenvalues $\lambda_1, \dots, \lambda_N$. Since the first eigenvalue is always zero [13], the first gap shows the number of connected components in the graph.
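A minimal sketch of this search for the first gap is given below. What counts as a gap is an assumption made for illustration: the function compares each difference of consecutive eigenvalues against a hypothetical `gap_threshold` parameter, and the unnormalized Laplacian $L = D - W$ is assumed.

```python
import numpy as np

def estimate_num_groups(W, gap_threshold=0.5):
    """Estimate the number of connected components from the first eigengap."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)   # ascending eigenvalues of the Laplacian
    gaps = np.diff(eigvals)           # gaps[i] is the (i+1)-th gap, 1-based
    for i, gap in enumerate(gaps):
        if gap > gap_threshold:       # hypothetical criterion for a "gap"
            return i + 1
    return len(eigvals)               # no gap: every node is its own component

# Usage: 5 nodes, one group of three and two isolated pedestrians.
W = np.zeros((5, 5))
W[:3, :3] = 0.9
np.fill_diagonal(W, 0.0)
print(estimate_num_groups(W))
```

In this toy graph the Laplacian has three eigenvalues at zero followed by two equal larger ones, which is the same qualitative picture as the one described for Fig.2.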

Figure 2: Example of Laplacian matrix eigenvalues for a graph with 5 nodes and 3 connected components. From motion similarity graph of frame 47 of sequence N1_ARENA-Gp_ENV_RGB_3.

3 Experimental results

The PETS2015 dataset is used to evaluate the proposed algorithm. The dataset consists of three different camera views of two scenarios, P5 and ARENA, in which a group of walking people appears in the scene. In the ARENA sequences a group of three people appears in an environment in which there are some other individual pedestrians. This group eventually splits into three separate pedestrians. In the P5 sequences, a group of six people is shown which, while walking, eventually splits into one individual, a group of two and a group of three people.

The detection of pedestrians is done manually using the Viper-GT tool. The bounding boxes of the objects are then fed sequentially to the algorithm. The IDs of the objects are also passed to the algorithm in order to bypass the data association problem. For evaluation purposes, a ground truth of the group index of each object in each frame of the video is produced by a human observer. The ground-truth group (cluster) indicator variables are compared with the algorithm output. Since the problem is formulated here as a clustering problem, it is possible to use clustering performance measures such as Adjusted Mutual Information (AMI) [11] to evaluate the algorithm. Given the ground truth of the cluster assignments, this metric scores the clustering result with a value nominally in the range [0, 1]. AMI values close to zero mean that the two assignments are largely independent, and AMI values close to one indicate a match between the two assignments. The AMI score is calculated at each frame by comparing the ground-truth indicators with the algorithm output. The mean AMI value over all frames of each sequence is reported here.
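For reference, the per-frame score can be computed as in the sketch below, which implements AMI directly from its definition in [11] using only the standard library. Two choices are assumptions: the normalization uses the arithmetic mean of the two entropies (one of several variants in [11]), and the degenerate case where both assignments are trivial is scored as 1. Because the score is adjusted for chance, it can fall slightly below zero for assignments that agree less than expected at random. The function names are hypothetical.

```python
from collections import Counter
from math import comb, log

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    n = len(u)
    cu, cv, cuv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(nij / n * log(n * nij / (cu[a] * cv[b]))
               for (a, b), nij in cuv.items())

def expected_mutual_info(u, v):
    """Expected MI under the hypergeometric model of random assignments."""
    n = len(u)
    emi = 0.0
    for a in Counter(u).values():
        for b in Counter(v).values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                prob = comb(a, nij) * comb(n - a, b - nij) / comb(n, b)
                emi += prob * nij / n * log(n * nij / (a * b))
    return emi

def ami(u, v):
    """Adjusted Mutual Information with arithmetic-mean normalization."""
    h = 0.5 * (entropy(u) + entropy(v))
    emi = expected_mutual_info(u, v)
    if abs(h - emi) < 1e-12:
        return 1.0  # both assignments are trivial, e.g. a single group
    return (mutual_info(u, v) - emi) / (h - emi)

def mean_ami(truth_per_frame, output_per_frame):
    """Average the per-frame AMI over a whole sequence."""
    scores = [ami(g, c) for g, c in zip(truth_per_frame, output_per_frame)]
    return sum(scores) / len(scores)

# Usage: group labels are compared up to renaming of the groups.
print(ami([0, 0, 0, 1, 1], [1, 1, 1, 0, 0]))
```

If scikit-learn is available, `sklearn.metrics.adjusted_mutual_info_score` provides the same measure and can be used instead of the hand-written functions.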

The measurement noise covariance matrix and the process noise covariance matrix of the Kalman filter are set to fixed values for all experiments.

The algorithm is evaluated for different choices of the two parameters of the scaling factor (4) to see how its behavior changes with the parameter setting. Each parameter is chosen from a predefined set of values. Fig. 3 shows snapshots of the output of the algorithm, where objects in the same group have bounding boxes of the same color. The quantitative results are depicted in Fig. 4, where for each sequence the mean AMI is plotted against the value of one parameter for different values of the other. As can be observed from Fig. 4, the performance depends on the situation, the camera view and the parameter setting. However, the performance is reasonably good in all cases, since the mean AMI is close to one. The best average AMI score over all sequences, obtained for one particular pair of parameter values, is 0.8566.

Figure 3: Snapshots of the output of the algorithm from P5 and ARENA sequences. Same color boxes represent same group. Panels shown: (d) W1_P5-Gp_TH_3, (e) W1_P5-Gp_VS_1, (f) W1_P5-Gp_VS_3.

Figure 4: Mean AMI value versus different choices of the two parameters of (4), one on the horizontal axis and one as a separate curve, for six sequences from the P5 and ARENA datasets of PETS2015. Panels shown: (d) W1_P5-Gp_TH_3, (e) W1_P5-Gp_VS_1, (f) W1_P5-Gp_VS_3.

4 Conclusions

In this paper a method for online clustering of walking people is proposed for detecting the event in which more than two people are walking together. The method is based on measuring the similarity of the motion patterns of pairs of moving objects in the scene to form a motion similarity graph. The graph is then used to cluster the objects with the spectral clustering algorithm. Experimental results show that the proposed method is able to identify separate walking groups effectively. The method, however, is instantaneous and does not take the history of the objects into account. An improvement could be achieved by applying probabilistic filtering, such as a Hidden Markov Model (HMM), to the output group indexes in order to eliminate sporadic joining and splitting of groups.


  • [1] S. Bae and K. Yoon (2014-07) Robust online multiobject tracking with data association and track management. Image Processing, IEEE Transactions on 23 (7), pp. 2820–2833. ISSN 1057-7149.
  • [2] V. Bastani, L. Marcenaro, and C. Regazzoni (2015) A Particle Filter Based Sequential Trajectory Classifier for Behaviour Analysis In Video Surveillance. In 2015 IEEE International Conference on Image Processing ICIP, Quebec City.
  • [3] P. Dollar, C. Wojek, B. Schiele, and P. Perona (2012-04) Pedestrian detection: an evaluation of the state of the art. Pattern Analysis and Machine Intelligence, IEEE Transactions on 34 (4), pp. 743–761. ISSN 0162-8828.
  • [4] S. Fouche, M. Lalonde, and L. Gagnon (2011-06) A system for airport surveillance: detection of people running, abandoned objects, and pointing gestures. In SPIE 8056, Visual Information Processing XX, pp. 8.
  • [5] J.R. Hershey and P.A. Olsen (2007-04) Approximating the Kullback Leibler divergence between Gaussian mixture models. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, Vol. 4, pp. IV–317–IV–320. ISSN 1520-6149.
  • [6] K. Kim, D. Lee, and I. Essa (2011-11) Gaussian process regression flow for analysis of motion trajectories. In Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1164–1171. ISSN 1550-5499.
  • [7] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan (2015-03) Crowded scene analysis: a survey. Circuits and Systems for Video Technology, IEEE Transactions on 25 (3), pp. 367–386. ISSN 1051-8215.
  • [8] J. Malik (2000) Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8), pp. 888–905. ISSN 01628828.
  • [9] J.C. Nascimento, M.A.T. Figueiredo, and J.S. Marques (2013-05) Activity recognition using a mixture of vector fields. Image Processing, IEEE Transactions on 22 (5), pp. 1712–1725. ISSN 1057-7149.
  • [10] J. C. Nascimento, M. Figueiredo, and J. S. Marques (2010-05) Trajectory classification using switched dynamical hidden Markov models. IEEE transactions on image processing : a publication of the IEEE Signal Processing Society 19 (5), pp. 1338–48. ISSN 1941-0042.
  • [11] N. X. Vinh, J. Epps, and J. Bailey (2010-03) Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. The Journal of Machine Learning Research 11, pp. 2837–2854. ISSN 1532-4435.
  • [12] M. Volkhardt, F. Schneemann, and H.-M. Gross (2013-10) Fallen person detection for mobile robots using 3d depth data. In Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on, pp. 3573–3578.
  • [13] U. von Luxburg (2007) A tutorial on spectral clustering. Statistics and Computing 17 (4), pp. 395–416 (English). ISSN 0960-3174.
  • [14] S. Wang, S. Zabir, and B. Leibe (2011-06) Lying pose recognition for elderly fall detection. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA.
  • [15] H. Yin, D. Li, and X. Zheng (2014) An energy based method to measure the crowd safety. Transportation Research Procedia 2 (0), pp. 691–696. The Conference on Pedestrian and Evacuation Dynamics 2014 (PED 2014), 22-24 October 2014, Delft, The Netherlands. ISSN 2352-1465.
  • [16] B. Zhan, D. N. Monekosso, P. Remagnino, S. A. Velastin, and L. Xu (2008) Crowd analysis: a survey. Machine Vision and Applications 19 (5-6), pp. 345–357 (English). ISSN 0932-8092.
  • [17] Y. Zhang, L. Qin, H. Yao, and Q. Huang (2012-09) Abnormal crowd behavior detection based on social attribute-aware force model. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pp. 2689–2692. ISSN 1522-4880.
  • [18] Y. Zhu, Y. Zhu, W. Zhen-Kun, W. Chen, and Q. Huang (2012) Detection and recognition of abnormal running behavior in surveillance video. Mathematical Problems in Engineering 2012, pp. 14.