Semantic segmentation of trajectories with improved agent models for pedestrian behavior analysis

by   Toru Tamaki, et al.

In this paper, we propose a method for semantic segmentation of pedestrian trajectories based on pedestrian behavior models, or agents. The agents model the dynamics of pedestrian movements in two-dimensional space using a linear dynamics model and common start and goal locations of trajectories. First, agent models are estimated from the trajectories obtained from image sequences. Our method is built on top of the Mixture model of Dynamic pedestrian Agents (MDA); however, the MDA's trajectory modeling and estimation are improved. Then, the trajectories are divided into semantically meaningful segments. The subsegments of a trajectory are modeled by applying a hidden Markov model using the estimated agent models. Experimental results with a real trajectory dataset show the effectiveness of the proposed method as compared to the well-known classical Ramer-Douglas-Peucker algorithm and also to the original MDA model.




1 Introduction

The analysis of the behavior and trajectories of pedestrians captured by video cameras is an important topic in the computer vision field, and has been widely studied over the decades [16, 17, 18, 1, 7, 9]. When researchers handle trajectories, they frequently perform segmentation to reduce the computation cost and extract local information. There are three typical approaches [23, 4]:

  • Temporal segmentation: a trajectory is split at points where two consecutive observations are separated by a large time gap.

  • Shape-based segmentation: a trajectory is split at points of larger curvature, which indicate that the target may change its direction at that point. This is used for simplifying the shape of trajectories; the Ramer-Douglas-Peucker (RDP) algorithm [14, 3] is a well-known approach of this type.

  • Semantic segmentation: a whole trajectory is divided into semantically meaningful segments; many methods have been proposed for different tasks [19, 6, 21, 22, 20].

In this paper, we focus on the third approach: semantic segmentation of trajectories based on models of human behavior (or agents). We represent the semantics of segments using agent models that capture the direction in which pedestrians walk when entering and exiting a scene. Hence, our task is to segment a trajectory into several sub-trajectories, each represented by a different agent model. Segments with associated behavior [10, 8] would be beneficial; for example, a long-term temporal change of behavior could be found from a sequence of segments, which is not possible using a series of raw trajectory coordinates. Furthermore, collective analysis of many trajectory segments with associated behavior may be useful for finding potentially crowded regions in a scene. However, no trajectory segmentation method has been proposed to facilitate human behavior analysis. Our proposed method first estimates agent models by using the Mixture model of Dynamic pedestrian Agents (MDA) [24], and then segments trajectories using the learned agent models by applying hidden Markov models (HMM) [2, 15]. (Footnote 1: A conference version of this paper was presented in [11]. This paper extends that version with a fuller description of the method and more extensive evaluations.)

2 Related work

The RDP algorithm [14, 3] is frequently used for trajectory simplification. It segments a trajectory while preserving important points in order to retain the trajectory's shape to the greatest extent possible. First, the start and end points of a trajectory are preserved. Then, RDP finds the point farthest from the line between two preserved points and keeps it if the distance is larger than a threshold $\epsilon$. This process is applied recursively until no further points are preserved. Finally, all the preserved points are used to segment the trajectory. This method is simple and preserves the approximate shape of the trajectory; however, an appropriate value of $\epsilon$ has to be specified.
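The recursive procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments; the function names and the threshold name `epsilon` are assumptions made for this example.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # a and b coincide: fall back to the distance to that single point.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / norm

def rdp_indices(points, epsilon):
    """Return the sorted indices of the points preserved by RDP."""
    if len(points) < 3:
        return list(range(len(points)))
    keep = {0, len(points) - 1}          # start and end points are always kept
    stack = [(0, len(points) - 1)]       # explicit stack instead of recursion
    while stack:
        first, last = stack.pop()
        best_dist, best_idx = 0.0, None
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > best_dist:
                best_dist, best_idx = d, i
        if best_idx is not None and best_dist > epsilon:
            keep.add(best_idx)
            stack.append((first, best_idx))
            stack.append((best_idx, last))
    return sorted(keep)

# Hypothetical L-shaped trajectory: the corner is kept as a segmentation point.
corner_path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(rdp_indices(corner_path, epsilon=0.5))
```

The preserved indices split the trajectory into segments, but they carry no information beyond shape, which is the limitation discussed in the remainder of this section.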

Task-oriented methods have also been proposed. Yuan et al. [19] proposed a system called T-Finder, which recommends to taxi drivers the locations where they are most likely to find potential customers, and to users the locations where they can expect to find taxis. For this purpose, the system estimates the locations of taxis based on their driving trajectories and segments the trajectories as a pre-processing step. Lee et al. [6] proposed the trajectory outlier detection algorithm (TRAOD), which finds outliers in trajectories based on segmentation by using the minimum description length (MDL) principle. Zheng et al. estimated transportation modes [21, 22, 20], such as walking, car, bus, and bicycle, and used them for semantic segmentation in terms of the mode of transportation.

In contrast, our proposed method uses semantic human behavior models, called agent models, learned from pedestrian trajectories in videos. This task is entirely different from that of the RDP algorithm, in which merely the simplified shape of a trajectory is taken into account and segments have no relation to behavior models. Furthermore, the objective of our study differs from that of previous studies on semantic segmentation of trajectories [21, 22, 20], in which a trajectory is composed of parts traveled with different modes of transportation. Our goal is to segment a single person's trajectory by dividing it into parts that follow different behavior (agent) models.

3 Learning agent models

In this section, we describe the MDA model [24] proposed by Zhou et al. and our improved model, called improved MDA (iMDA). The MDA model is a hierarchical Bayesian model for representing pedestrian trajectories by using a mixture model of linear dynamic systems and common start and goal locations (called beliefs). The parameters of the dynamics and beliefs of each agent are estimated by using an expectation maximization (EM) algorithm that alternates the E step (expectation over hidden variables) and M step (parameter estimation). In other words, soft clustering of trajectories to the estimated agents and optimization by using weighted sums are iterated.

It is reasonable to use MDA for the task of semantic segmentation of pedestrian trajectories, because it estimates agent models that reflect pedestrian behaviors: the direction in which the person is walking, his/her speed, and the locations from which and to which the person is moving. These are modeled by agents using dynamics and beliefs. However, MDA was proposed for clustering whole trajectories, and therefore we extend it by adding an HMM for segmentation. Furthermore, the original MDA suffers from a convergence problem, and hence we propose an improved version of MDA.

3.1 Formulation

Let $y_t \in \mathbb{R}^2$ be the two-dimensional coordinates at time $t$ of a pedestrian trajectory $y$, and let $x_t \in \mathbb{R}^2$ be the corresponding state of the linear dynamic system

\[ x_t \sim P(x_t \mid x_{t-1}) = N(x_t \mid A x_{t-1} + b, Q), \]
\[ y_t \sim P(y_t \mid x_t) = N(y_t \mid x_t, R). \]

Here, $N(\cdot)$ represents a normal distribution with covariance matrices $Q$ and $R$, $A$ is the state transition matrix, and $b$ is the translation vector. This means that the state transition is assumed to be a similarity transformation. In this study, we explicitly use the translation vector $b$ for similarity transformations, whereas Zhou et al. [24] used homogeneous coordinates in their formulation.
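To make the generative reading of this linear dynamic system concrete, the following sketch samples states and observations from it. It is an illustration only: the symbol names `A` (state transition matrix) and `b` (translation vector) follow the description above, while the isotropic noise levels `q_std` and `r_std`, which stand in for the full covariance matrices, and all numeric values are assumptions of this example.

```python
import random

def simulate_states(x0, A, b, q_std, r_std, steps, seed=0):
    """Sample states x_t and observations y_t from a 2D linear dynamic system.

    Assumption of this sketch: both covariances are isotropic, so the state
    noise and observation noise are given by scalar standard deviations.
    """
    rng = random.Random(seed)
    states, observations = [], []
    x = list(x0)
    for _ in range(steps):
        # Mean of the next state: A x + b.
        mean = [
            A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1],
        ]
        x = [rng.gauss(m, q_std) for m in mean]   # state transition noise
        y = [rng.gauss(v, r_std) for v in x]      # observation noise
        states.append(tuple(x))
        observations.append(tuple(y))
    return states, observations

# Hypothetical agent that walks to the right by about 5 pixels per step.
identity = [[1.0, 0.0], [0.0, 1.0]]
states, observations = simulate_states(
    (100.0, 200.0), identity, (5.0, 0.0), q_std=0.5, r_std=1.0, steps=10
)
```

With the identity matrix for `A`, the translation vector alone determines the walking direction and speed, which is one reason an explicit translation term is convenient to interpret.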

MDA represents pedestrian trajectories modeled by agents with dynamics $D$ and belief $B$. Here, dynamics $D = (A, b, Q, R)$ describes a pedestrian's movement in the two-dimensional scene. Belief $B$ describes the starting point $x_s$ and end point $x_e$ of the trajectory, each represented by a normal distribution:

\[ x_s \sim N(x_s \mid \mu_s, \Phi_s), \qquad x_e \sim N(x_e \mid \mu_e, \Phi_e), \]

that is, belief is represented as $B = (\mu_s, \Phi_s, \mu_e, \Phi_e)$, describing the common starting and end locations. The mixture weights are written as $\pi_m = p(z = m)$ with hidden variable $z$, which indicates that the trajectory is generated by the $m$-th agent.

Trajectory observation may not start and end at the exact start and end points and . For example, a pedestrian is visible to the camera and tracked over video frames to generate ; however, the common starting point may be occluded by walls or signboards. Therefore, we model states before and after the observed points of the trajectory:


The length of the observation is , and the length of the states before the observation is and after the observation .

Figure 1 shows a graphical model of MDA. Observed trajectory is generated by states , and all states are governed by a single hidden variable that switches agent models and .

Figure 1: Graphical model of the original MDA [24].

3.2 Learning

Given trajectories , where is -th observation, MDA [24] estimates agents by maximizing the log likelihood function


where the joint probability is given by


with respect to parameters .

The abovementioned equations used in [24] have many hidden variables, and we simplify the likelihood by writing the hidden variables as as follows.


The EM algorithm is used for estimation by alternating the E and M steps, as is not observed.

3.2.1 E step of the original MDA

The E step of MDA [24] takes the expectation of the log likelihood with respect to the hidden variables :



is computed by using the modified Kalman filter

[12, 24]. Note that should have the subscript , because it differs for different agents ; however, we omit it for simplicity.


are posterior probabilities given as


By assuming independence among hidden variables , , and , we have


By removing and by assuming them to be uniform, we have


where likelihood is also computed by using the modified Kalman filter [12, 24].

3.2.2 E step of improved MDA

The E step of MDA described above suffers from a convergence problem in practice, because it does not explicitly take belief into account in . In fact, it is implicitly included in the form of ; however, the modified (or ordinary) Kalman filter does not deal with belief parameters.

We solve the problem by introducing two improvements. First, we separate and from the other states and use them as hidden variables. This is because these starting and end states are in fact hidden states when and are non-zero (as is usual). Belief parameters are affected by these starting and end states only, and therefore, it is necessary to include them as the information of beliefs in the E step. Hereafter, denotes the sequence of states , except and .

Second, we explicitly model and

with Poisson distribution:


Uniform distributions are assumed in the E step of the original MDA; however, this may prevent the iteration from converging, because any number of states before and after the observations is allowed with equal probability. This means that a trajectory can start from a location very distant from the beginning of the observation, although this rarely occurs in practice. This is illustrated in Figure 2. An example trajectory observation is shown as white dots; it starts from the bottom right of the scene and ends at the left. This trajectory is expected to leave through the left exit because there are no further observations. The uniform distribution assigns the same probability to both the left-exit and right-exit cases, although the latter is much less likely. In contrast, the Poisson distribution reasonably assigns a higher probability to the left-exit case.
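The difference between the two priors can be checked numerically. The sketch below compares the probability that a uniform prior and a Poisson prior assign to a short and a long run of unobserved states after the last observation. The rate `lam`, the upper bound `k_max`, and the two step counts are hypothetical values chosen for illustration, not values from the experiments.

```python
import math

def poisson_pmf(k, lam):
    """Probability of k unobserved steps under a Poisson prior with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def uniform_pmf(k, k_max):
    """Probability of k unobserved steps under a uniform prior on 0..k_max."""
    return 1.0 / (k_max + 1) if 0 <= k <= k_max else 0.0

lam, k_max = 5.0, 100          # hypothetical rate and search range
near_exit, far_exit = 5, 60    # hypothetical steps to the near and far exits

# The uniform prior cannot distinguish the two cases.
print(uniform_pmf(near_exit, k_max), uniform_pmf(far_exit, k_max))
# The Poisson prior strongly prefers the exit that is a few steps away.
print(poisson_pmf(near_exit, lam), poisson_pmf(far_exit, lam))
```

Under the Poisson prior, a continuation that needs many unobserved steps receives a vanishingly small weight, so it contributes almost nothing to the soft assignment in the E step.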

Figure 2: States after the observations for the case of the left exit are shown in orange, and the states for the case of the right exit are shown in yellow. The image is from [16].

As in the E step above, we take the expectation with hidden variables, including and :


There is an approximation in the third line, because marginalizing with respect to and is computationally expensive. The effect is, however, negligible, because usually the differences between next states are very small: the difference between and , and the difference between and are smaller than the distances to the locations of start and goal .

We further approximate it by omitting and :


The effect of this omission on is very small because only the first and last states from a long sequence of states exist. We compute (as an approximation, again by omitting and ) by using the modified Kalman filter [12, 24], as in the original E step.

There are two differences between above and that in the original E step. First, the original has all states , but our improved version of , shown above, has (except and ). Second, weights are different. Here, we derive as


Note that we assume independence among , and ; however, and are modeled by Poisson distribution, and hence, and remain. In the last line, we approximate it again by omitting and for computing the modified Kalman filter.

3.2.3 M step of improved MDA

In the M step, we find by solving a system of equations obtained by differentiating with respect to .

In the following formulas of the proposed iMDA, we introduce two improvements. First, we derive the formulas for the parameters of Poisson distribution and . Second, the MDA formulas (particularly for and ) are in fact incorrect, and we show the correct formulas with their derivation in the Appendix. In the MDA formulation, homogeneous coordinates are used to express the similarity transformation as a 3×3 matrix, which is inconvenient for differentiation. In our formulation, we explicitly use the translation vector instead of homogeneous coordinates, which leads to


Here, $\mathrm{vec}(\cdot)$ is the vectorization operator and $\otimes$ is the tensor product. Notation

means that is fixed to in the summation over . The ranges of summation are from 1 to for trajectory , from 1 to for agent , and from 1 to for time , if observation is involved, and otherwise from to , i.e., between the start and end points of the observations. Note that the search range of and is reduced by an ad hoc technique used in [24].

Figure 3: Graphical model of the proposed method.

4 Trajectory segmentation with agent models

Figure 3 shows the graphical model of the proposed trajectory segmentation. We propose using agent models obtained by iMDA for segmentation with HMM. In contrast to the MDA model (Fig. 1) that shares the hidden variable across all states , our model has different hidden variables indicating for which agent model the state is generated. However, it is difficult to use a single two-dimensional point for inferring the agent to which the point belongs, because agent models represent dynamics and beliefs, which are difficult to infer using a single point. Instead, we use successive states in the MDA model ( in Fig. 3 as is used in the experiments) as a single state corresponding to a hidden variable . Note that in Fig. 3 we collect three successive states without overlapping; however, the following discussion is effective without modification for the overlapping case (e.g., and , and so on).

Our model, shown in Fig. 3, is considerably more complicated than the MDA model and it is difficult to learn all the parameters of HMM and iMDA jointly. Instead, we propose a two-stage algorithm composed of agent estimation followed by segmentation. First, the agent models ( and ) are estimated with the iMDA described in the previous section. We denote these agents by . As a byproduct, states in the MDA model are also obtained through the modified Kalman filter as for each agent . Therefore, we use these estimated states to construct states .

Second, we fix these states during the segmentation procedure; in other words, states are used as observations for HMM. For HMM training, we use the Baum-Welch algorithm [2] to estimate the state transition matrix , an matrix, the element of which is transition probability from agent to agent . Each state is supposed to be generated based on the output (or emission) probability matrix . We define it as


This is a likelihood representing how the sequence fits the dynamics of agent . Here, we do not use the belief parameters, because state corresponds to a short trajectory segment, and it is not stable to find the start and goal locations from a short segment . Note that we do not estimate the initial distribution of agents during the training, but instead use weights estimated in the agent estimation.

To estimate , the Baum-Welch algorithm performs the EM algorithm to maximize the following log likelihood given trajectories.




When has been estimated, we use the Viterbi algorithm [15] for estimating hidden variables for the test trajectories. This is a MAP estimate that maximizes the following posterior probability of , given new test trajectory state :


When a new test trajectory is given, states are obtained by using the modified Kalman filter with the estimated agent models, and then, the HMM observation sequence is constructed so that and so on. After the Viterbi algorithm has been performed, is obtained and then converted to a sequence of the same length as as a segmentation result (), so that and so on.
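The decoding step can be sketched as a standard log-domain Viterbi pass followed by the expansion of the decoded labels back to the length of the state sequence. This is a generic sketch under stated assumptions: the emission log-likelihoods are taken as already computed from the agent dynamics, the initial distribution is taken from the mixture weights, and the names `viterbi`, `expand_labels`, and `group_size` are hypothetical.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most probable agent sequence for one trajectory.

    log_init[m]     : log initial probability of agent m
    log_trans[i][m] : log transition probability from agent i to agent m
    log_emit[t][m]  : log-likelihood of HMM observation t under agent m
    """
    n_agents = len(log_init)
    score = [log_init[m] + log_emit[0][m] for m in range(n_agents)]
    back = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for m in range(n_agents):
            prev = max(range(n_agents), key=lambda i: score[i] + log_trans[i][m])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][m] + log_emit[t][m])
        score = new_score
        back.append(pointers)
    # Backtrack from the best final agent.
    path = [max(range(n_agents), key=lambda m: score[m])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def expand_labels(path, group_size, length):
    """Give every original state the label of the group that contains it.

    Assumption: groups do not overlap; leftover states at the end take the
    label of the last group.
    """
    return [path[min(t // group_size, len(path) - 1)] for t in range(length)]

# Toy example with two agents and sticky transitions (hypothetical numbers).
log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
emit = [[log(0.8), log(0.2)]] * 3 + [[log(0.2), log(0.8)]] * 3
path = viterbi(init, trans, emit)
labels = expand_labels(path, group_size=3, length=18)
```

Because the diagonal of the transition matrix is close to one, the decoder changes agents only when the emission evidence clearly favors another agent, which suppresses spurious segment boundaries.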

5 Experiments

We compared the proposed method, denoted by iMDA+HMM, with the RDP algorithm [14, 3] in terms of segmentation accuracy. Trajectories in the Pedestrian Walking Path Dataset [16] were used for the experiments. This dataset contains 12684 pedestrian trajectories in videos of size 1920×1080 pixels. We evaluated the methods using real trajectories from the dataset.

Figure 4: The Pedestrian Walking Path Dataset [16]. Left: typical frame. Right: one hundred randomly sampled trajectories. Each trajectory is shown as a series of points with a larger (faint) circle at the first point of the trajectory.

5.1 Metrics

We use two evaluation metrics in the experiments, positional error and step error, as defined in Algorithm 1 and illustrated in Figure 5.

Estimated segments should match actual segments, regardless of the agent models, in terms of segmentation accuracy. Therefore, we manually specified ground truth “segmentation points,” where the trajectory is segmented at these points, for each of the training and test trajectories. Then, we converted the segmentation result of trajectory into the detection result of segmentation points; a sequence of Boolean values of the same length as and element is true (i.e., a segmentation point) if ; otherwise, it is false.

We evaluated the detection results of segmentation points spatially and temporally. The positional error counts the difference in L2 norm in two-dimensional space between segmentation points in the ground truth and segmentation results. The step error counts the time step difference (or index of the sequences). Since we do not know which segmentation points correspond to those in other trajectories, we chose the closest segmentation point for computing errors. To prevent trivial results that minimize these errors (for example, all the points are detected as segmentation points), we added errors by switching the estimated and ground truth sequences.

Figure 5: Positional and step errors. Left: trajectory with ground truth segmentation points in red. Right: estimated segmentation points in red. The difference is measured by distance for positional errors and by the index of point sequence of the trajectory for step errors.
Input: input trajectory , result , ground truth
Output: ,
Function CalcError():
       for i do
             if  then
      return ,
Algorithm 1 Calculation of positional and step errors. Elements of and are assumed to be Boolean ( or is true, if is a (ground truth or estimated) segmentation point). and are the numbers of estimated and ground truth segmentation points in a trajectory, respectively.
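Since the symbols of Algorithm 1 did not survive in this copy, the following Python sketch gives one possible reading of the two metrics based on the prose description in this section. It rests on several assumptions that the text does not confirm: a segmentation point is an index where the agent label differs from the previous one, the closest point is chosen spatially for the positional error and by index for the step error, each direction is averaged over its own number of segmentation points, and a trajectory with no segmentation points on either side contributes zero error.

```python
import math

def segmentation_points(labels):
    """True at index t when the label differs from the one at t - 1."""
    return [t > 0 and labels[t] != labels[t - 1] for t in range(len(labels))]

def calc_error(traj, result, truth):
    """One-directional errors from the points in `result` to those in `truth`."""
    src = [i for i, flag in enumerate(result) if flag]
    dst = [j for j, flag in enumerate(truth) if flag]
    if not src or not dst:
        return 0.0, 0.0  # assumption: the source does not specify this case
    positional = sum(min(math.dist(traj[i], traj[j]) for j in dst) for i in src)
    step = sum(min(abs(i - j) for j in dst) for i in src)
    return positional / len(src), step / len(src)

def symmetric_error(traj, result, truth):
    """Add the errors in both directions to penalize over-detection."""
    p1, s1 = calc_error(traj, result, truth)
    p2, s2 = calc_error(traj, truth, result)
    return p1 + p2, s1 + s2

# Hypothetical trajectory whose estimated boundary is one step late.
traj = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
truth = segmentation_points([0, 0, 1, 1, 1])
result = segmentation_points([0, 0, 0, 1, 1])
positional_error, step_error = symmetric_error(traj, result, truth)
```

Adding the swapped direction means that marking every point as a segmentation point no longer yields a small error, because many estimated points are then far from any ground truth point.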

5.2 Agent estimation with the improved MDA

We propose iMDA because of the convergence problem from which the original MDA suffers. Here, we compare the convergence of the EM algorithm using a subset of the Pedestrian Walking Path Dataset [16]. First, we selected 1874 trajectories corresponding to approximately 10 agents, each of which corresponds to a behavior connecting two exits from the scene. Then, we trained the proposed iMDA with 10 agents. Figure 6 shows the 10 estimated agents. Different agents are shown with arrows in different colors; the arrows connect the start and end locations of each agent, which are represented by Gaussian ellipses. The start and end locations were initialized by k-means clustering of the first and last points of the trajectories.

Figure 6: Ten agents estimated by the improved MDA.

To obtain the agent models, trajectories were clustered into agents. This clustering changes over iterations of the EM algorithm, which is shown in Figure 7. From left to right in the figure, we can see that clustering converges after a small number of iterations, while a few trajectories move from one cluster to another. Figure 8 shows the results of the original MDA. Because of its instability, most trajectories go to a single cluster, even when the initialization is the same as that of iMDA. This clearly shows the effectiveness of the proposed method.

The proposed iMDA has two factors in the weights of the log likelihood in its E step: the Gaussian distributions of the start and end states, and the Poisson distributions of the numbers of states before and after the observation. To observe the effect of these two factors on the results, we omitted one or the other. Figure 9 shows the results without the Poisson distributions, and Figure 10 shows the results without the Gaussian distributions. In both figures, the clustering results are still unstable; hence, both factors are necessary for improving the stability of clustering trajectories and estimating agent models.

Figure 7: Clustering results produced by the improved MDA. Each row shows clusters of trajectories classified into agents. From left to right, the columns show the results at each iteration of the expectation maximization algorithm.

Figure 8: Clustering results produced by the original MDA.
Figure 9: Clustering results produced by the improved MDA without the Poisson distributions.
Figure 10: Clustering results produced by the improved MDA without the Gaussian distributions.

5.3 Real data

To evaluate the methods using a real dataset, we manually annotated the 1874 trajectories used in the experiment by specifying the points at which the destinations of the trajectories appeared to change. Then, we performed 10-fold cross validation on the 1874 trajectories. For each fold, we performed the iMDA model estimation and HMM training with a different number of agents (between 9 and 12) on the training set. The test set was used for segmentation and evaluation. The results in Table 1 (row "iMDA+HMM") show the averages and standard deviations over the 10 folds. For the RDP method, we also performed 10-fold cross validation. For each fold, the best parameter $\epsilon$ (in terms of positional or step errors) was estimated on the training set, and the estimated parameter was used for the test set. The results in Table 1 (row "RDP") show the averages and standard deviations over the 10 folds. The best $\epsilon$ selected using the cross validation is shown in the second column.

The errors of the proposed method and RDP are comparable: both have positional errors of approximately 22 pixels and step errors of approximately 0.9. However, RDP does not provide any semantic information for the segmentation. In contrast, the proposed method can divide trajectories into semantically meaningful segments by using the associated agent models, which facilitates the understanding of pedestrians' behavior in a real-world scene. The row "MDA+HMM" of Table 1 shows the results when the original MDA is used for agent model estimation. For every number of agents tested, iMDA+HMM outperforms MDA+HMM, whose positional errors range from approximately 35 to 43 pixels; this shows that the improved MDA model leads to better segmentation.

Figure 11 shows four segmentation results. In the first result (a–c), the agents of the pedestrian are correctly visualized: the pedestrian started from the top-left entrance, first moved downward, and then turned toward the exit at the right side. Similar results were obtained in the second (d–f) and third (g–i) trajectories.

Figure 11 (j–n) shows the limitations of our approach. The downward trajectory started from the top-right entrance, turned toward the top right, and then turned downward again. We make two observations. First, at the turning points, the agent from the top right to the left side (shown in red in Figure 11 (k, m)) was estimated. This is due to the small step size of the trajectory movement: the pedestrian is barely moving, no agent can describe this behavior, and an agent is therefore assigned almost at random. Procedures for rejecting such cases are needed. Second, when the pedestrian turned to the exit at the right side, the agent from the bottom left to the top right (shown in orange in Figure 11 (l)) was incorrectly estimated. Our HMM uses agent dynamics only and ignores belief parameters (start and end locations, or the direction of the agent arrow in the figure) for computing the output probability matrix. Therefore, agents with similar dynamics are easily confused. The incorporation of beliefs is left as future work.

Method                       No. of agents  Positional error  Step error
MDA+HMM                      9              42.93 ± 4.70      1.70 ± 0.22
MDA+HMM                      10             34.97 ± 6.71      1.40 ± 0.29
MDA+HMM                      11             43.49 ± 8.74      1.63 ± 0.27
MDA+HMM                      12             39.41 ± 6.27      1.52 ± 0.26
iMDA+HMM (proposed)          9              22.32 ± 2.12      0.91 ± 0.13
iMDA+HMM (proposed)          10             22.32 ± 2.51      0.92 ± 0.13
iMDA+HMM (proposed)          11             22.88 ± 2.31      0.94 ± 0.12
iMDA+HMM (proposed)          12             22.38 ± 2.83      0.91 ± 0.13
RDP (best positional error)  -              21.93 ± 3.23      -
RDP (best step error)        -              -                 0.95 ± 0.14
Table 1: Experimental results for real data (average ± standard deviation over the 10-fold cross validation). Positional errors are in pixels.
Figure 11: Examples of segmentation results. The arrow indicates the agent of the current position of the trajectory. There are four trajectories: (a–c), (d–f), (g–i), and (j–n). The points of the trajectories are shown in the colors of the corresponding agent models represented by arrows.

5.4 Behavior analysis

In the previous section, we provided a quantitative performance analysis and showed the segmentation results. In this section, we provide a qualitative behavior analysis to demonstrate the effectiveness of the proposed semantic segmentation method.

5.4.1 Transition between agents

Table 2 shows the transition probability matrix estimated by the HMM training with 1874 trajectories. Diagonal elements represent the probability that successive points in a trajectory have the same agent model; it is reasonable that the probability is very close to 1. To visualize the transition between agents more clearly, we define a normalized transition probability matrix :


where the normalization operator rescales each row of a given matrix so that it sums to one. The normalized matrix (shown in Table 3) represents transitions to other agents (excluding self-transitions), and therefore helps us to understand the relations between agents. Figure 12 visualizes only the transitions having probabilities larger than 0.2. This shows that agents tend to transition to each other if they share start or end locations (similar beliefs) or if their directions are similar (similar dynamics).
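One plausible reading of this normalization, consistent with the zero diagonal and unit row sums of Table 3, is to remove the self-transition probabilities and then rescale each row. The sketch below implements that reading; the function name and the small example matrix are hypothetical and are not taken from the paper.

```python
def normalize_off_diagonal(trans):
    """Zero the diagonal of a transition matrix, then rescale each row to sum to one.

    Rows with no off-diagonal mass are left as all zeros.
    """
    normalized = []
    for i, row in enumerate(trans):
        off_diagonal = [0.0 if j == i else p for j, p in enumerate(row)]
        total = sum(off_diagonal)
        normalized.append([p / total if total > 0 else 0.0 for p in off_diagonal])
    return normalized

# Small hypothetical 3-agent matrix with dominant self-transitions.
example = [
    [0.900, 0.075, 0.025],
    [0.200, 0.800, 0.000],
    [0.000, 0.000, 1.000],
]
for row in normalize_off_diagonal(example):
    print([round(p, 3) for p in row])
```

Applying the same operation to the rows of Table 2 gives values close to those of Table 3, with small differences because Table 2 is rounded to three decimal places.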

We can make two observations from this figure. First, agents are switched if they share the same start or goal point, which is reasonable. Second, interestingly, some of these transitions are asymmetric; for instance, agent 1 to 0, 2 to 3, and 6 to 8. Agent 0 represents the movement from the top-left entrance to the escalator on the right side, and agent 1 represents the movement from the top-left entrance to the exit at the bottom left. Agents 0 and 1 share the same starting point; however, the dominant transition is from 1 to 0. Although some pedestrians who at first follow the flow to the bottom-left exit may turn to the escalator (agent 1 to 0), the opposite rarely happens (agent 0 to 1). This observation might be useful, for example, for suggesting the placement of additional signboards to guide people from the top-left entrance to the right escalator more effectively.

From \ To  0      1      2      3      4      5      6      7      8      9
0          0.972  0.002  0.     0.006  0.     0.     0.002  0.019  0.     0.
1          0.04   0.937  0.001  0.001  0.     0.     0.015  0.001  0.002  0.003
2          0.     0.     0.974  0.008  0.01   0.     0.002  0.     0.003  0.003
3          0.006  0.005  0.006  0.953  0.002  0.     0.003  0.019  0.004  0.001
4          0.     0.002  0.01   0.002  0.979  0.001  0.     0.     0.     0.005
5          0.     0.001  0.     0.001  0.002  0.931  0.     0.     0.022  0.043
6          0.004  0.014  0.     0.001  0.     0.     0.968  0.     0.01   0.002
7          0.058  0.     0.001  0.029  0.001  0.     0.     0.906  0.004  0.001
8          0.001  0.008  0.005  0.002  0.     0.024  0.014  0.004  0.929  0.013
9          0.009  0.014  0.012  0.01   0.032  0.051  0.021  0.005  0.007  0.838
Table 2: Transition matrix obtained from 1874 trajectories. Each row gives the probabilities of transition from the agent of that row to the agent of each column. The order of the agents is the same as in Figure 7.
From \ To  0      1      2      3      4      5      6      7      8      9
0          0.     0.059  0.001  0.196  0.     0.     0.073  0.657  0.002  0.012
1          0.639  0.     0.011  0.009  0.006  0.     0.24   0.015  0.033  0.047
2          0.001  0.009  0.     0.289  0.371  0.005  0.082  0.012  0.126  0.105
3          0.132  0.11   0.122  0.     0.041  0.     0.069  0.403  0.095  0.028
4          0.007  0.076  0.493  0.087  0.     0.07   0.02   0.003  0.     0.245
5          0.     0.018  0.006  0.008  0.034  0.     0.     0.     0.318  0.615
6          0.129  0.42   0.011  0.043  0.008  0.001  0.     0.012  0.301  0.075
7          0.615  0.003  0.01   0.31   0.011  0.     0.     0.     0.041  0.009
8          0.015  0.118  0.075  0.023  0.003  0.335  0.194  0.06   0.     0.176
9          0.058  0.083  0.075  0.064  0.2    0.312  0.129  0.033  0.045  0.
Table 3: Normalized transition matrix obtained from 1874 trajectories.
Figure 12: Transition between agent models. The arrows between rectangles represent the normalized transition probabilities. The agent numbers shown near arrows are the same as in Figure 7.

5.4.2 Agent occurrence map

Figure 13 shows an agent occurrence map. In this experiment, we used all 12684 trajectories in the dataset, applied iMDA with 20 agents, performed HMM training, and then segmented all the trajectories. Then, we counted the number of distinct agents that appear in each of the 10×10 blocks of the scene of size 1920×1080. For example, a block has a count of 1 if all trajectory segments passing through the block are assigned to the same agent. In the figure, a block is shown in red if the trajectory segments of many different agents pass through it.
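The counting procedure can be sketched as follows. This is an illustration under an assumption: the blocks are interpreted here as square cells of `block` pixels on each side, which is one possible reading of the description, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def agent_occurrence_map(trajectories, labels, block=10):
    """Count the distinct agents whose segments pass through each block.

    trajectories : list of trajectories, each a list of (x, y) points
    labels       : per-point agent labels with the same shape as trajectories
    block        : block side length in pixels (assumed interpretation)
    """
    agents_in_cell = defaultdict(set)
    for trajectory, trajectory_labels in zip(trajectories, labels):
        for (x, y), agent in zip(trajectory, trajectory_labels):
            cell = (int(x // block), int(y // block))
            agents_in_cell[cell].add(agent)
    return {cell: len(agents) for cell, agents in agents_in_cell.items()}

# Two hypothetical trajectories that share the same two blocks.
trajectories = [[(1, 1), (12, 3)], [(2, 2), (15, 4)]]
labels = [[0, 0], [1, 0]]
counts = agent_occurrence_map(trajectories, labels)
```

Storing a set of agents per block, rather than a running count, ensures that many segments of the same agent still produce a count of 1, as in the example given in the text.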

There are three main areas with higher counts: the ticket counter just below the top-left entrance, the information booth at the center, and the right exit just above the right-side escalator. This means that many agents appear in these areas. In other words, many people in these areas are coming from and going to different locations. Therefore, these areas may be crowded, and the flow of pedestrians may not be smooth, as is the case for the right exit. The high number of agents in the area in front of the ticket counter may represent queues, because people move slowly in many directions when standing in a queue. Furthermore, the counts on the left and right sides of the information booth are not symmetric, which might suggest an imbalance of activity between the two sides of the booth.

Figure 13: Agent occurrence map showing the number of agents that appear in each of the 10×10 blocks of the scene. The colors indicate that many agents pass through the red blocks, and fewer agents pass through the blue blocks.

5.4.3 Agent density maps of trajectory segments

Here, we show how agents correspond to segments in the scene. We used the segments of the 12684 trajectories obtained during the previous examination of the agent occurrence map. Extracting segments corresponding to a specific agent may explain the behavior of the agent in terms of segments of trajectories, instead of the entire trajectories, as shown in Figure 7. We plotted the density maps of points of segments using kernel density estimation (KDE) because a KDE map conveys the distribution of segments more clearly than a plot of a large number of individual segment points.
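
For readers unfamiliar with KDE, the following minimal sketch evaluates a two-dimensional density on a coarse grid from the points of the segments assigned to one agent. The isotropic Gaussian kernel, the fixed bandwidth, the grid, and the function name are all assumptions chosen for illustration; the kernel and bandwidth used for Figure 14 are not specified in this section.

```python
import math


def kde_density(points, grid_x, grid_y, bandwidth):
    """Evaluate an isotropic 2-D Gaussian KDE on a grid; rows are indexed by y."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for px, py in points:
                sq_dist = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        density.append(row)
    return density


# Hypothetical points of the segments assigned to one agent (pixel coordinates).
segment_points = [(100.0, 100.0), (105.0, 98.0), (110.0, 102.0), (400.0, 300.0)]
grid_x = [0.0, 100.0, 200.0, 300.0, 400.0]
grid_y = [0.0, 100.0, 200.0, 300.0]
density = kde_density(segment_points, grid_x, grid_y, bandwidth=20.0)
```

Grid cells near many segment points receive a high density, so a cluster of short segments shows up as a compact peak while an isolated point contributes only a small bump.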

Figure 14 shows the KDE density maps of segments corresponding to three agents. In Figure 14(a), segments gather around the ticket counter at the left, which means that the corresponding agent is assigned to short segments in front of the counter. Figure 14(b) shows many longer segments from the top-left entrance to the right exit, along with some shorter segments from the top-right entrance. Therefore, this agent represents the dominant pedestrian flow from the top-left entrance. In contrast, Figure 14(c) shows that the agent corresponds to pedestrian flows from three different directions into the top-left exit (entrance). This analysis is made possible by our proposed method for semantic segmentation. The original MDA provides clustering of trajectories only; therefore, the estimated agents are used for analyzing entire trajectories, and not segments. RDP performs segmentation without semantics; thus, segments cannot be classified. On the other hand, the proposed method divides trajectories into segments and classifies segments using the estimated agents, which enables us to perform a behavior analysis by using trajectory segments.

Figure 14: Agent density maps of trajectory segments. In each plot, we plotted the density maps of points of the segments corresponding to different agents using KDE.

6 Conclusions

In this paper, we proposed a semantic trajectory segmentation method in which MDA and HMM are combined to estimate agent models and segment trajectories according to the learned agents. Experimental results using a dataset of real trajectories showed that the proposed method performs comparably with RDP, with only a small difference in performance. Using our improved MDA in the proposed method greatly improves the performance compared to that of the original MDA. Additionally, examples of the type of behavior analysis that is made possible using the semantic segmentation results were also provided.


This work was supported by JSPS KAKENHI grant number JP16H06540.


  • [1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese (2016-06) Social LSTM: human trajectory prediction in crowded spaces. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 961–971. Cited by: §1.
  • [2] L. E. Baum (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Inequalities III: Proceedings of the Third Symposium on Inequalities, O. Shisha (Ed.), University of California, Los Angeles, pp. 1–8. Cited by: §1, §4.
  • [3] D. H. Douglas and T. K. Peucker (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10 (2), pp. 112–122. Cited by: 2nd item, §2, §5.
  • [4] Z. Feng and Y. Zhu (2016) A survey on trajectory data mining: techniques and applications. IEEE Access 4, pp. 2056–2067. Cited by: §1.
  • [5] D. A. Harville (1997) Matrix algebra from a statistician’s perspective. Springer. ISBN 978-0-387-78356-7. Cited by: Appendix A.
  • [6] J. Lee, J. Han, and X. Li (2008) Trajectory outlier detection: a partition-and-detect framework. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pp. 140–149. Cited by: 3rd item, §2.
  • [7] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. S. Torr, and M. Chandraker (2017) DESIRE: distant future prediction in dynamic scenes with interacting agents. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2165–2174. Cited by: §1.
  • [8] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan (2015-03) Crowded scene analysis: a survey. IEEE Transactions on Circuits and Systems for Video Technology 25 (3), pp. 367–386. ISSN 1051-8215. Cited by: §1.
  • [9] B. T. Morris and M. M. Trivedi (2008-08) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology 18 (8), pp. 1114–1127. ISSN 1051-8215. Cited by: §1.
  • [10] B. T. Morris and M. M. Trivedi (2011-11) Trajectory learning for activity understanding: unsupervised, multilevel, and long-term adaptive approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (11), pp. 2287–2301. ISSN 0162-8828. Cited by: §1.
  • [11] D. Ogawa, T. Tamaki, B. Raytchev, and K. Kaneda (2018) Semantic segmentation of trajectories with agent models. In The International Workshop on Frontiers of Computer Vision (FCV2018). Cited by: footnote 1.
  • [12] W. Palma (2007) Long-memory time series: theory and methods. Vol. 662, John Wiley & Sons. Cited by: §3.2.1, §3.2.1, §3.2.2.
  • [13] K. B. Petersen and M. S. Pedersen (2012) Matrix cookbook. Cited by: Appendix A.
  • [14] U. Ramer (1972) An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing 1 (3), pp. 244–256. Cited by: 2nd item, §2, §5.
  • [15] A. Viterbi (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13 (2), pp. 260–269. Cited by: §1, §4.
  • [16] S. Yi, H. Li, and X. Wang (2015) Understanding pedestrian behaviors from stationary crowd groups. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3488–3496. Cited by: §1, Figure 2, Figure 4, §5.2, §5.
  • [17] S. Yi, H. Li, and X. Wang (2016) Pedestrian behavior modeling from stationary crowds with applications to intelligent surveillance. IEEE Transactions on Image Processing 25 (9), pp. 4354–4368. Cited by: §1.
  • [18] S. Yi, H. Li, and X. Wang (2016) Pedestrian behavior understanding and prediction with deep neural networks. In Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Cham, pp. 263–279. ISBN 978-3-319-46448-0. Cited by: §1.
  • [19] N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie (2013) T-finder: a recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and Data Engineering 25 (10), pp. 2390–2403. Cited by: 3rd item, §2.
  • [20] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W. Ma (2010) Understanding transportation modes based on GPS data for web applications. ACM Transactions on the Web (TWEB) 4 (1), pp. 1. Cited by: 3rd item, §2, §2.
  • [21] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W. Ma (2008) Understanding mobility based on GPS data. In Proceedings of the 10th international conference on Ubiquitous computing, pp. 312–321. Cited by: 3rd item, §2, §2.
  • [22] Y. Zheng, L. Liu, L. Wang, and X. Xie (2008) Learning transportation mode from raw GPS data for geographic applications on the web. In Proceedings of the 17th international conference on World Wide Web, pp. 247–256. Cited by: 3rd item, §2, §2.
  • [23] Y. Zheng (2015) Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology (TIST) 6 (3), pp. 29. Cited by: §1.
  • [24] B. Zhou, X. Tang, and X. Wang (2015) Learning collective crowd behaviors with dynamic pedestrian-agents. International Journal of Computer Vision 111 (1), pp. 50–68. Cited by: §1, Figure 1, §3.1, §3.2.1, §3.2.1, §3.2.2, §3.2.3, §3.2, §3.2, §3.

Appendix A and

By using the following formulas [13]


we have (note that we omit subscript from and for simplicity),




Therefore, we have the following system of equations.


By introducing the vectorization operator, we can rewrite it as