Real-time Crowd Tracking using Parameter Optimized Mixture of Motion Models

by   Aniket Bera, et al.

We present a novel, real-time algorithm to track the trajectory of each pedestrian in moderately dense crowded scenes. Our formulation is based on an adaptive particle-filtering scheme that uses a combination of various multi-agent heterogeneous pedestrian simulation models. We automatically compute the optimal parameters for each of these different models based on prior tracked data and use the best model as motion prior for our particle-filter based tracking algorithm. We also use our "mixture of motion models" for adaptive particle selection and accelerate the performance of the online tracking algorithm. The motion model parameter estimation is formulated as an optimization problem, and we use an approach that solves this combinatorial optimization problem in a model-independent manner and is hence scalable to any multi-agent pedestrian motion model. We evaluate the performance of our approach on different crowd video datasets and highlight the improvement in accuracy over homogeneous motion models and a baseline mean-shift based tracker. In practice, our formulation can compute trajectories of tens of pedestrians on a multi-core desktop CPU in real time and offers higher accuracy than prior real-time pedestrian tracking algorithms.




1 Introduction

The tracking of human crowd motion is becoming increasingly ubiquitous. It is a well-studied problem that has many applications in surveillance, behavior modeling, activity recognition, disaster prevention, and the analysis of crowd phenomena. Despite many recent advances, it is still difficult to accurately track pedestrians in real-world scenarios, especially as the crowd density increases. The problem of tracking pedestrians and objects has been studied in computer vision and image processing for three decades. However, tracking pedestrians in a crowded scene is regarded as a hard problem for the following reasons: inter-pedestrian occlusion (one pedestrian blocking another), changes in lighting and pedestrian appearance, and the difficulty of modeling human behavior or the intent of each pedestrian.

Figure 1: Our mixture motion model can accurately compute the trajectories in real time. We highlight the different motion models (Boids, Helbing’s Social Forces, or RVO) used for the same pedestrian (marked in red) over different frames. We adaptively choose the best-fit model for every pedestrian in the scene. This increases the accuracy of our adaptive tracking algorithm by 4%-18%.

One approach that improves the accuracy of tracking algorithms is the use of realistic crowd motion models. These motion models simulate the current behavior of each pedestrian in the crowd in order to predict the pedestrians’ possible future positions. There has been considerable work on developing crowd motion models for pedestrians in the areas of computer graphics, robotics, computer animation, and pedestrian dynamics. Many approaches have been investigated that suggest different principles to model crowds. Most of these models use some kind of parameters to describe the shape and trajectory of each agent. While many approaches have been investigated to model the motion of the agents, relatively little effort has gone into estimating model parameters based on available data, evaluating and comparing the effects of these parameters, and quantifying the improvements that can result from parameter optimization.

Prior realtime or online crowd-tracking algorithms use a single, homogeneous motion model. Every motion model is unique and generally relies upon one or more assumptions: these include the assumption of highly coherent motion in terms of velocity or acceleration, or assumptions about how pedestrian trajectories will change in response to other agents or obstacles.

The simpler motion models assume that agents will ignore any interactions with other pedestrians, instead assuming that they will follow “constant-speed” or “constant-acceleration” paths to their immediate destinations. However, the accuracy of this assumption decreases as crowd density in the environment increases (e.g. to 2-4 pedestrians per square meter). More sophisticated pedestrian motion models take into account interactions between pedestrians, formulated either in terms of attraction or repulsion forces or collision-avoidance constraints.

In real-world scenarios, the trajectory of each pedestrian is governed by its intermediate goal location, intrinsic behaviors, as well as local interactions with other pedestrians and obstacles in the scene. In a dense crowd setting, the behavior of each pedestrian changes in response to the environment, the overall crowd density and flow, and the behavior of other pedestrians. It may not be possible, therefore, to model the overall behavior of each pedestrian with a single, homogeneous motion model. Furthermore, each of these homogeneous models is described using some parameters that may correspond to the size, speed, anticipation period, or local navigation constraints of each pedestrian. The accuracy of each motion model is governed by the choice of these parameters. As the behavior of each pedestrian responds to changes in a dynamic environment, these model parameters should be recomputed or updated to improve the resulting motion model’s accuracy. Overall, we need efficient techniques that can take into account heterogeneous behaviors based on constantly changing models and underlying parameters.

Figure 2: (a) Our tracking on the student003 dataset student003 . (b) Comparing ground truth (in red) with the prediction by Social Forces by Helbing et al. helbing1995social (in blue). (c) Comparing ground truth (in red) with the prediction by our motion model mixture (in blue). The distance between the red and blue points denotes the error in prediction. We can see that the error with our approach is considerably lower.

Main results: We present a method that uses particle filters to perform realtime pedestrian tracking in moderately crowded scenes. Our formulation computes the best-fit or mixture motion model for each pedestrian based on prior tracked data. In order to characterize the heterogeneous, dynamic behavior of each agent, we use an optimization based scheme to perform the following steps:

  • Choose, every few frames, the new motion model that best describes the local behavior of each pedestrian based on tracked data.

  • Compute the optimal set of parameters for that motion model that best fit this tracked data.

  • Compute the adaptive number of particles for each pedestrian based on a combination of metrics for optimizing performance.

We compute the locally-optimal motion model for the pedestrians in realtime and use that, along with a particle-filter based tracker, to compute their trajectories. In our approach, we consider Boids Reynolds1987 , Social Forces helbing1995social and reciprocal velocity obstacles van2011reciprocal as possible models to characterize the motion of a pedestrian during each frame. For videos with a high frame rate (over 50 fps), a constant velocity model may be sufficient to model the motion prior. Furthermore, we use our heterogeneous motion model to adaptively choose the number of particles for each pedestrian in our particle filter. This adaptive formulation can increase the runtime speed of our system based on a reliability measure computed using the mixture motion model. We evaluate our method in comparison with homogeneous motion models on high-definition crowd datasets that include both indoor and outdoor scenes recorded at different locations with 30-150 pedestrians, and also on standard datasets used in the pedestrian tracking community.

In practice, our adaptive particle-filter tracker with adaptive motion model is about 4-18% more accurate than prior interactive tracking algorithms that use homogeneous or simple motion models, and the improvement in accuracy grows as the crowd density increases. Moreover, the adaptive particle selection can increase the runtime frame rate by 2-2.5 times as compared to algorithms that use a constant high number of particles. Overall, our algorithm can track tens of pedestrians at realtime rates (i.e. more than 25 fps) on a multi-core CPU.

The rest of the paper is organized as follows. In Section 2, we give an overview of prior work related to online pedestrian tracking. Section 3 gives an overview of our approach, and Section 4 describes our multi-agent heterogeneous motion model. Section 5 evaluates the different components of our algorithm and compares it with other online tracking methods.

2 Related work

In this section, we briefly review some prior work on pedestrian tracking and motion models. Multi-pedestrian tracking has attracted a lot of research attention in recent years. We refer the reader to some excellent surveys wuonline ; enzweiler2009monocular ; yilmaz2006object .

At a broad level, pedestrian tracking algorithms can be classified as either online or offline trackers. Online trackers use only the present or previous frames for realtime tracking. Zhang et al. zhang2012real proposed an approach that uses non-adaptive random projections to model the structure of the image feature space of objects, and Tyagi et al. tyagi2008context described a technique to track pedestrians using multiple cameras. Offline trackers, on the other hand, use data from future frames as well as current and past data sharma2012unsupervised ; rodriguez2011density . Because these methods require future-state information, they are not useful for realtime applications.

In addition to the online/offline classifications, tracking algorithms can also be classified based on their underlying search mechanisms: as either deterministic or probabilistic trackers. Deterministic trackers iteratively search for a local maximum of a similarity measure between the target candidate (the location of the pedestrian in a frame) and the object model (the initial state of the pedestrian). The most commonly used deterministic trackers are the mean-shift algorithm yilmaz2007object and the Kanade-Lucas-Tomasi algorithm lucas1981iterative . In probabilistic trackers, the movement of the object is modeled based on its underlying dynamics. Two well-known probabilistic trackers are the Kalman filter and the particle filter. Particle filters are more frequently used than Kalman filters in pedestrian tracking, since particle filters are multi-modal and can represent any shape using a discrete probability distribution.

Motion Models: The problem of modeling crowd behaviors and motions has received significant attention in various disciplines. This attention has resulted in a high number of simulation models based on microscopic or macroscopic principles. Several of the proposed motion models represent each individual or pedestrian in a crowd as particles (or as 2D circles in a plane), then model the interactions between these particles. Reynolds’ Reynolds1987 ; Reynolds1999 seminal approach is representative of such models: local interactions, matching an agent’s speed and orientation to those of its neighbors, determine agents’ motions and lead to emergent behaviors. Many popular algorithms model agents as particles which are subjected to repulsive forces helbing1995social and additional behavior-improving rules. More recently, velocity-based algorithms van2011reciprocal ; Pettre2009 ; Karamouzas2009 have been developed, which model agents’ motions in velocity-space to ensure collision-free trajectories over short future time windows. Other approaches that have recently been developed are based on cognitive models chung2010mobile , affordance Fajen2007 , short-term planning using a discrete approach Antonini2006 or Linear Trajectory Avoidance (LTA) pellegrini2009you . A final recent approach uses the virtual optic flow of agents to derive perceptual variables in order to compute collision-free motions Ondvrej2010 . A few tracking algorithms use the Reciprocal Velocity Obstacle (RVO) model as motion prior Liu2014  bera2014 .

Many non-particle-based motion modeling techniques have also been proposed; these techniques are useful mainly for crowded scenes in which pedestrians display similar motion patterns. Song et al. song2013fully proposed an approach that clusters pedestrian trajectories based on the assumption that “persons only appear/disappear at entry/exit.” Ali et al. ali2008floor presented a floor-field based method to determine the probability of motion in densely crowded scenes. Rodriguez et al. rodriguez2011data used a large collection of public crowd videos and learned about crowd motion patterns by extracting global video features. Kratz et al. kratz2012going and Zhao et al. zhao2012tracking used local motion patterns in dense videos for pedestrian tracking. Shu et al. kratz2012going proposed an approach that learns part-based person-specific SVM classifiers which capture dynamically changing pedestrian appearance. Zamir et al. kratz2012going used generalized minimum clique graphs for multiple-person tracking. Leal-Taixé et al. leal2012exploiting used social and grouping behavior as a physical model in their tracking system. Burgos-Artizzu et al. burgos2012social presented a novel method for analyzing social behavior, particularly in mice videos, where the continuous videos are segmented into action ‘bouts’ by building a temporal context model.

These methods are well-suited for modeling motion in dense crowds with few distinct motion patterns; however, they may not work in heterogeneous crowds.

3 Our Approach

In this section, we first give an overview of our method, followed by more detailed explanations of the various components of our realtime tracking algorithm.

3.1 Overview

Figure 3: Overview of our realtime tracking algorithm. The symbols used in this figure are explained in Section 3.2. We use the trajectory computed over prior k frames, expressed as a succession of states, to compute the new motion model; we use our mixture motion model to compute next states using a particle filter.

Our approach can be viewed as a feedback pipeline (Figure 3). We take the most recent states (positions and velocities) of each agent and use them to compute our mixture model. This mixture model is used to predict the next state of the pedestrian for the next frame. In other words, the predicted next state is used as motion prior input for the tracker; it is also combined with a confidence estimation computation to dynamically compute the number of particles. As a final step, the tracker’s definitively estimated next state is fed back into the loop, becoming the most recent agent state.

Data Representation Our algorithm keeps track of the state (i.e. position and velocity) of each pedestrian for the last k timesteps or frames. These are referred to as the k-states of each pedestrian. These k-states are initialized by pre-computing the states from the first k timesteps. The k-states are updated at each timestep by removing the agents’ state from the oldest frame and adding the latest tracker-estimated state.
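The k-states bookkeeping described above can be sketched as a fixed-length sliding window. This is a minimal illustration, not the paper's implementation; the class name `KStates` and its methods are hypothetical.

```python
from collections import deque


class KStates:
    """Sliding window holding the last k tracked states of one pedestrian."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest state automatically on append.
        self.window = deque(maxlen=k)

    def update(self, position, velocity):
        """Add the latest tracker-estimated state (position, velocity)."""
        self.window.append((position, velocity))

    def is_ready(self):
        """True once the first k timesteps have been collected."""
        return len(self.window) == self.window.maxlen


# Usage: after 5 frames with k = 3, only frames 2, 3 and 4 remain.
history = KStates(k=3)
for t in range(5):
    history.update(position=(float(t), 0.0), velocity=(1.0, 0.0))
```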

The mixture motion model is a combination of several independent motion models that are widely used for pedestrian modeling in crowds: Boids, RVO and Social forces. This mixture motion model is used to compute the best motion model for the agents during each frame. First, based on an optimization algorithm, we “configure” (see Section 4) the motion models to “best” match the recent k-states data and select the best model based on a specific metric. Second, we use the “best configured” motion model to make a prediction on the agents’ next state.

The tracker is a particle-filter based tracker which uses the motion prior, obtained from the mixture of motion models, to estimate the agents’ next state. This tracker further uses a confidence estimation stage to dynamically compute the number of particles that balance the tradeoffs between the computation cost and the accuracy.

3.2 Notation and Terminology

We use the following notations in our paper:

  • $s$ represents the state (position and velocity) of an arbitrary pedestrian as computed by the tracker

  • $x$ represents the state (position and velocity) of an arbitrary pedestrian inside a crowd motion model

  • $f$ represents the “best configured” motion model from the mixture of motion models

  • bold fonts are used to represent values for all the pedestrians in the crowd; for example $\mathbf{s}$ represents the states (positions and velocities) of all pedestrians as computed by the tracker

  • subscripts are used to indicate time; for example $f_t$ represents the “best configured” motion model at timestep $t$, and $\mathbf{s}_{t-k:t}$ represents all states of all agents for all successive timesteps between $t-k$ and $t$, as computed by the tracker.

The “best configured” motion model can then be used as follows: $x_{t+1} = f_t(x_t)$ or $\mathbf{x}_{t+1} = f_t(\mathbf{x}_t)$ to compute the motion of one arbitrary pedestrian or all pedestrians, respectively.

3.3 Particle Filter for Tracking

Though any online tracker that requires a motion prior can be used, we use particle filters as the underlying tracker algorithm. The particle filter is a non-parametric method which solves non-Gaussian and non-linear state estimation problems arulampalam2002tutorial . Particle filters are frequently used in object tracking, since they can recover from lost tracks and occlusions. The particle tracker’s tracking uncertainty is represented in a Markovian manner by only considering information from present and past frames.

Here, we consider the “best configured” motion model $f_t$ as well as the error $q_t$ in the prediction that this “best configured” motion model generated. Additionally, the observations of our tracker can be represented by a function $h()$ that projects the state $s_t$ to a previously computed state $z_t$. Moreover, we denote the error between the observed states and the ground truth as $r_t$. We can now phrase them formally in terms of a standard particle filter as below:

$$s_{t+1} = f_t(s_t) + q_t \tag{1}$$

$$z_t = h(s_t) + r_t \tag{2}$$


Particle filtering is a Monte Carlo approximation to the optimal Bayesian filter, which monitors the posterior probability of a first-order Markov process:

$$p(s_t \mid z_{1:t}) = c \, p(z_t \mid s_t) \int p(s_t \mid s_{t-1}) \, p(s_{t-1} \mid z_{1:t-1}) \, ds_{t-1} \tag{3}$$

where $s_t$ is the process state at time $t$, $z_t$ is the observation, $z_{1:t}$ is all of the observations through time $t$, $p(s_t \mid s_{t-1})$ is the process dynamical distribution, $p(z_t \mid s_t)$ is the observation likelihood distribution, and $c$ is the normalization factor. Since the integral does not have a closed form solution in most cases, particle filtering approximates the integration using a set of weighted samples $\{ s_t^{(i)}, w_t^{(i)} \}_{i=1}^{N}$, where $s_t^{(i)}$ is an instantiation of the process state, known as a particle, and the $w_t^{(i)}$ are the corresponding particle weights. With this representation, the Monte Carlo approximation to the Bayesian filtering equation is:

$$p(s_t \mid z_{1:t}) \approx c \, p(z_t \mid s_t) \sum_{i=1}^{N} w_{t-1}^{(i)} \, p(s_t \mid s_{t-1}^{(i)}) \tag{4}$$

where $N$ refers to the number of particles.

In our formulation, we use the motion model $f_t$ to infer the dynamic transition, $p(s_t \mid s_{t-1})$, for particle filtering.
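To make the role of the motion prior concrete, the following is a minimal sketch of one predict/update/resample cycle of a bootstrap particle filter over 2D positions. It is an illustration under stated assumptions rather than the tracker used in the paper: the function name `particle_filter_step`, the Gaussian process and observation noise, and the toy constant-velocity prior are all hypothetical choices.

```python
import math
import random


def particle_filter_step(particles, weights, motion_model, observation,
                         process_noise, obs_noise, rng):
    """One predict / update / resample cycle for 2D position particles."""
    # Predict: the motion prior moves each particle, plus Gaussian noise.
    predicted = []
    for particle in particles:
        px, py = motion_model(particle)
        predicted.append((px + rng.gauss(0.0, process_noise),
                          py + rng.gauss(0.0, process_noise)))
    # Update: scale each weight by an (assumed) Gaussian observation likelihood.
    updated = []
    for (px, py), w in zip(predicted, weights):
        d2 = (px - observation[0]) ** 2 + (py - observation[1]) ** 2
        updated.append(w * math.exp(-0.5 * d2 / obs_noise ** 2))
    total = sum(updated)
    n = len(predicted)
    if total > 0.0:
        updated = [w / total for w in updated]
    else:
        updated = [1.0 / n] * n  # all likelihoods underflowed; fall back to uniform
    # Estimate: weighted mean of the predicted particles.
    estimate = (sum(w * p[0] for w, p in zip(updated, predicted)),
                sum(w * p[1] for w, p in zip(updated, predicted)))
    # Resample in proportion to the weights, then reset to uniform weights.
    resampled = rng.choices(predicted, weights=updated, k=n)
    return resampled, [1.0 / n] * n, estimate


rng = random.Random(0)
particles = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(200)]
weights = [1.0 / 200] * 200
constant_velocity = lambda s: (s[0] + 1.0, s[1])  # toy prior: move 1 unit along x
particles, weights, estimate = particle_filter_step(
    particles, weights, constant_velocity, (1.0, 0.0), 0.1, 0.5, rng)
```

In the pipeline described in this paper, the toy `constant_velocity` prior would be replaced by the prediction of the best configured motion model.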

We optimize our computation speed by adaptively modifying the number of active particles in our system using a combination of confidence metrics. A brief overview is given in Section 4.4.

4 Mixture Motion Model

In this section, we introduce the notion of a parameterized motion model. We then describe the different parameterized motion models that form the basis for the mixture motion model. Finally, we describe the mixture motion model itself.

4.1 Parameterized Motion Model

A motion model is defined as an algorithm $f$ which, from a collection of agent states $\mathbf{x}_t$, derives new states $\mathbf{x}_{t+1}$ for these agents, representing their motion over a timestep towards the agents’ immediate goals $\mathbf{G}$:

$$\mathbf{x}_{t+1} = f(\mathbf{x}_t) \tag{5}$$

Motion algorithms usually have several parameters that can be tuned in order to change the agents’ behaviors. We assume that each parameter can have a different value for each pedestrian. By changing the value of these parameters, we get some variation in the resulting trajectory prediction algorithm. We use $\mathbf{P}$ to denote all the parameters of all the pedestrians. Typically, for a crowd of 50 pedestrians, the dimension of $\mathbf{P}$ could be anywhere in the range 150-300 depending on the motion model. In our formulation, we denote the resulting parameterized motion model as:

$$\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{P}) \tag{6}$$
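As an illustration of what a parameterized motion model looks like in code, the sketch below advances every agent one timestep toward its immediate goal using a single per-agent parameter. The model itself (walk straight to the goal at a comfort speed), the function name `goal_directed_model` and the parameter key `comfort_speed` are assumptions made for this example; they are not one of the three models used in the paper.

```python
import math


def goal_directed_model(states, goals, params, dt=1.0):
    """Map all agent states to their next states, given goals and per-agent parameters.

    states: list of (position, velocity) tuples, one per agent
    goals:  list of goal positions, one per agent
    params: list of dicts, one per agent (here only 'comfort_speed')
    """
    new_states = []
    for (pos, _vel), goal, p in zip(states, goals, params):
        dx, dy = goal[0] - pos[0], goal[1] - pos[1]
        dist = math.hypot(dx, dy)
        if dist < 1e-9:
            vel = (0.0, 0.0)  # already at the goal
        else:
            speed = min(p["comfort_speed"], dist / dt)  # do not overshoot the goal
            vel = (speed * dx / dist, speed * dy / dist)
        new_states.append(((pos[0] + vel[0] * dt, pos[1] + vel[1] * dt), vel))
    return new_states


states = [((0.0, 0.0), (0.0, 0.0))]
next_states = goal_directed_model(states, [(10.0, 0.0)], [{"comfort_speed": 1.5}])
```

Changing `comfort_speed` changes the predicted trajectory, which is exactly the degree of freedom the parameter optimization exploits.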


4.2 Motion Models

Our mixture motion model can include any generic motion model that conforms to Equation (6). Here we describe the three component motion models that make up the mixture motion model in our current implementation.

4.2.1 Reciprocal Velocity Obstacles

RVO is a local collision-avoidance and navigation algorithm. Given each agent’s state at a certain timestep, it computes a collision-free state for the next timestep van2011reciprocal . Each agent is represented as a 2D circle in the plane, and the parameters (used for optimization) for each agent consist of the representative circle’s radius, maximum speed, neighbor distance, and time horizon (only future collisions within this time horizon are considered for local interactions).

Let $v^{\mathrm{pref}}$ be the preferred velocity for a pedestrian that is based on the immediate goal location. The RVO formulation takes into account the position and velocity of each neighboring pedestrian to compute the new velocity. The velocity of the neighbors is used to formulate the ORCA constraints for local collision avoidance van2011reciprocal . The computation of the new velocity is expressed as an optimization problem for each pedestrian. If an agent’s preferred velocity is forbidden by the ORCA constraints, that agent chooses the closest velocity that lies in the feasible region:

$$v^{\mathrm{new}} = \arg\min_{v \in \mathrm{ORCA}} \lVert v - v^{\mathrm{pref}} \rVert \tag{7}$$

More details and mathematical formulations of the ORCA constraints are given in van2011reciprocal . As per Equation (6), $f$ returns the states obtained with the admissible velocity that is closest to the preferred velocity.

4.2.2 The Boids Model

Initially developed to simulate the flocking behavior of birds, this model was later extended to pedestrian motion in a crowd. Broadly, three rules are enforced on Boids agents:

  • Separation: steer to avoid crowding local agents

  • Alignment: steer towards the average heading of local agents

  • Cohesion: steer to move toward the average position (center of mass) of local agents

Thus, as per Equation (6), $f$ is a function of agents’ positions at some specified future time (current time plus a constant). When the predicted distance between the pedestrians gets too low, a separation force is computed and added to the attraction force that is pulling the agents toward their goal. The parameters are radius (size of 2D circle agents) and comfort speed (i.e., speed when no interactions occur).
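The three Boids rules listed above can be sketched as a single steering computation. This is a simplified illustration of the classic rules only: the function name `boids_steering`, the rule weights and the inverse-square separation term are assumptions, and the predictive, goal-directed variant described in this section is not reproduced.

```python
def boids_steering(agent, neighbors, w_sep=1.0, w_ali=1.0, w_coh=1.0):
    """Combine separation, alignment and cohesion into one 2D steering vector.

    agent and each neighbor are (position, velocity) tuples.
    """
    if not neighbors:
        return (0.0, 0.0)
    (px, py), (vx, vy) = agent
    n = len(neighbors)
    # Cohesion: steer toward the neighbors' center of mass.
    coh_x = sum(p[0] for p, _ in neighbors) / n - px
    coh_y = sum(p[1] for p, _ in neighbors) / n - py
    # Alignment: steer toward the neighbors' average velocity.
    ali_x = sum(v[0] for _, v in neighbors) / n - vx
    ali_y = sum(v[1] for _, v in neighbors) / n - vy
    # Separation: push away from each neighbor, more strongly when close.
    sep_x = sep_y = 0.0
    for (qx, qy), _ in neighbors:
        dx, dy = px - qx, py - qy
        d2 = dx * dx + dy * dy
        if d2 > 0.0:
            sep_x += dx / d2
            sep_y += dy / d2
    return (w_sep * sep_x + w_ali * ali_x + w_coh * coh_x,
            w_sep * sep_y + w_ali * ali_y + w_coh * coh_y)


me = ((0.0, 0.0), (1.0, 0.0))
# Two symmetric neighbors moving the same way: all three rules cancel out.
balanced = boids_steering(me, [((1.0, 0.0), (1.0, 0.0)), ((-1.0, 0.0), (1.0, 0.0))])
# One neighbor ahead, cohesion switched off: separation pushes the agent back.
pushed = boids_steering(me, [((1.0, 0.0), (1.0, 0.0))], w_coh=0.0)
```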

4.2.3 Social Forces Model

The social forces model is defined by the combination of three different forces: the personal motivation force, social forces, and physical constraints:

  • Personal motivation force: This is the incentive to move at a certain preferred velocity in a certain direction.

  • Social forces: These are the repulsive forces from other agents and obstacles.

  • Physical constraints: These are the hard constraints other than the environment and other agents.

The net force then defines an agent’s chosen new velocity. For a detailed explanation of the method, refer to helbing1995social .

As per Equation (6), $f$ is a function of the agents’ positions from which all computed forces are derived. The parameters are radius and comfort speed.
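A rough sketch of how the personal motivation force and the social (repulsive) forces combine into a net force is given below. The exponential repulsion and all numeric defaults (`relax_time`, `strength`, `falloff`) are illustrative assumptions rather than values from helbing1995social or from this paper, and the physical constraint term is omitted for brevity.

```python
import math


def social_force(pos, vel, goal, others, comfort_speed=1.3, radius=0.3,
                 relax_time=0.5, strength=2.0, falloff=0.3):
    """Net 2D force on one agent: goal attraction plus repulsion from other agents."""
    # Personal motivation: relax toward the preferred velocity (comfort speed toward goal).
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1.0  # avoid dividing by zero at the goal
    pref = (comfort_speed * gx / dist, comfort_speed * gy / dist)
    fx = (pref[0] - vel[0]) / relax_time
    fy = (pref[1] - vel[1]) / relax_time
    # Social forces: repulsion that decays exponentially with distance (assumed form).
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d > 0.0:
            magnitude = strength * math.exp((2.0 * radius - d) / falloff)
            fx += magnitude * dx / d
            fy += magnitude * dy / d
    return (fx, fy)


alone = social_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [])
blocked = social_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [(0.5, 0.0)])
```

Here `radius` and `comfort_speed` play the role of the two per-agent parameters named above; the remaining arguments are fixed constants of this sketch.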

4.3 Mixture of motion models

Figure 4: Our parameter optimization algorithm used in Figure 3. Based on the error metric, we compute optimal parameters for each motion model. The best motion model (from RVO, Social Forces, Boids or LIN) is used for trajectory extraction and predicting the next state.

We now present the algorithm to compute the mixture motion model, which essentially corresponds to computing the “best” motion model at any given timestep. In this case, the “best” motion model is the one that most accurately matches agents’ immediately past states, as per a given error metric. This “best” motion model is determined by an optimization framework, which automatically finds the parameters that minimize the error metric. Wolinski et al. Wolinski2014 designed an optimization framework for evaluating crowd motion models, but it computes the optimal parameters in an offline manner for a single homogeneous simulation model. Our framework is online and iteratively computes the best heterogeneous motion model every few frames and chooses the most optimized crowd parameters at a given time. The computation cost is considerably lower, and hence the framework is usable for real-time tracking.

4.3.1 Formalization

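Formally, at any timestep $t$, we define the agents’ (k+1)-states (as computed by the tracker) $\mathbf{s}_{t-k:t}$:

$$\mathbf{s}_{t-k:t} = \bigcup_{i=t-k}^{t} \mathbf{s}_i \tag{8}$$

Similarly, a motion model’s corresponding computed agents’ states $f(\mathbf{x}_{t-k:t}, \mathbf{P})$ can be defined as:

$$f(\mathbf{x}_{t-k:t}, \mathbf{P}) = \bigcup_{i=t-k}^{t} f(\mathbf{x}_i, \mathbf{P}) \tag{9}$$

initialized with $\mathbf{x}_{t-k} = \mathbf{s}_{t-k}$ and $\mathbf{G} = \mathbf{s}_t$.

At timestep $t$, considering the agents’ k-states $\mathbf{s}_{t-k:t}$, computed states $f(\mathbf{x}_{t-k:t}, \mathbf{P})$ and a user-defined error metric $\mathrm{metric}()$, our algorithm computes:

$$\mathbf{P}^{\mathrm{opt},f}_t = \arg\min_{\mathbf{P}} \, \mathrm{metric}(\mathbf{s}_{t-k:t}, f(\mathbf{x}_{t-k:t}, \mathbf{P})) \tag{10}$$

where $\mathbf{P}^{\mathrm{opt},f}_t$ is the parameter set which, at timestep $t$, leads to the closest match between the states computed by the motion algorithm $f$ and the agents’ k-states.

For several motion algorithms $f_1, f_2, \ldots$, we can then compute the algorithm $f^{\mathrm{opt}}_t$ which best matches the agents’ k-states $\mathbf{s}_{t-k:t}$ at timestep $t$:

$$f^{\mathrm{opt}}_t = \arg\min_{f} \, \mathrm{metric}(\mathbf{s}_{t-k:t}, f(\mathbf{x}_{t-k:t}, \mathbf{P}^{\mathrm{opt},f}_t)) \tag{11}$$

and consequently, the best (as per the error metric itself) prediction for the agents’ next state obtainable from the motion algorithms for timestep $t+1$ is:

$$\mathbf{x}_{t+1} = f^{\mathrm{opt}}_t(\mathbf{x}_t, \mathbf{P}^{\mathrm{opt},f^{\mathrm{opt}}}_t) \tag{12}$$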
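The two nested minimizations, first over parameters for each model and then over models, can be sketched as follows. This is a toy illustration under several assumptions: states are 1D positions, the two models are made up for the example, and an exhaustive search over a short candidate list stands in for the optimizer. The names `rollout`, `metric` and `select_best_model` are hypothetical.

```python
def rollout(model, start, params, steps):
    """Simulate `steps` transitions from `start`, returning all visited states."""
    states = [start]
    for _ in range(steps):
        states.append(model(states[-1], params))
    return states


def metric(tracked, simulated):
    """Mean absolute difference between tracked and simulated 1D positions."""
    return sum(abs(a - b) for a, b in zip(tracked, simulated)) / len(tracked)


def select_best_model(models, candidate_params, tracked):
    """Return (error, model name, parameters) of the best configured model."""
    best = None
    for name, model in models.items():
        for params in candidate_params[name]:
            # The simulation starts from the oldest tracked state.
            simulated = rollout(model, tracked[0], params, len(tracked) - 1)
            error = metric(tracked, simulated)
            if best is None or error < best[0]:
                best = (error, name, params)
    return best


# Two toy models with one scalar parameter each.
models = {
    "constant_speed": lambda x, p: x + p,
    "decelerating": lambda x, p: x + p / (1.0 + x),
}
candidates = {"constant_speed": [0.5, 1.0, 1.5], "decelerating": [0.5, 1.0, 1.5]}
tracked = [0.0, 1.0, 2.0, 3.0]  # the recent tracked states of one pedestrian
error, name, params = select_best_model(models, candidates, tracked)
```

The selected model and parameters would then be applied to the latest state to produce the motion prior for the next frame.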


4.3.2 Optimization Algorithm and Error Metric

Optimizing crowd parameters is a unique and challenging problem. Because most simulation methods have several parameters to tune for each agent, even moderately sized scenarios with a few dozen agents can become a hundred-dimensional optimization problem.

In total we tested three global optimization approaches: Greedy algorithm, Simulated Annealing, and Genetic Algorithm.

For the greedy approach, we start by choosing random parameters for every agent. The chosen data similarity metric is then evaluated to establish a baseline measure of how well the simulation matches the data. We then run several iterations, where each iteration starts with the best set of simulation parameters seen so far and derives a new set of parameters from it. This new set of parameters is evaluated, and whichever set of parameters has the lowest error metric over all the iterations is chosen as the optimal parameters for the agents.
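A minimal sketch of such a greedy search is shown below. The function name `greedy_search`, the callback signatures, the Gaussian perturbation and the toy one-parameter cost are assumptions made for illustration.

```python
import random


def greedy_search(cost, sample_params, perturb, iterations, rng):
    """Keep the best parameter set seen so far; try one perturbed copy per iteration."""
    best = sample_params(rng)   # random starting parameters
    best_cost = cost(best)      # baseline error
    for _ in range(iterations):
        candidate = perturb(best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


# Toy problem: recover a comfort speed of 1.4 from a one-parameter cost.
rng = random.Random(1)
cost = lambda p: (p[0] - 1.4) ** 2
best, best_cost = greedy_search(
    cost,
    lambda r: [r.uniform(0.0, 3.0)],
    lambda p, r: [p[0] + r.gauss(0.0, 0.1)],
    200,
    rng,
)
```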

The main limitation of a greedy approach is that it can get stuck in a local minimum of the search space, and the final outcome depends on the starting point. Simulated annealing addresses this problem. By analogy with thermodynamics, simulated annealing incorporates a ‘temperature’ parameter into the minimization procedure. At high temperatures, we explore the parameter space, whereas at lower temperatures, we restrict the exploration.

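i ← 0                                    // initialize loop counter
while i < i_max do
       T ← temperature(i, i_max)         // compute temperature
       p_new ← neighbour(p)              // try new neighbor
       e_new ← cost(p_new)               // compute cost
       if accept(e, e_new, T) then       // is new state better?
              p ← p_new; e ← e_new       // yes, change state
       end if
       if e_new < e_best then            // did we find a new minimum?
              p_best ← p_new; e_best ← e_new   // save new optimum
              i ← 0                      // reset loop counter
       end if
       i ← i + 1                         // increase loop counter
end while
Algorithm 1 Simulated annealing.

Algorithm 1 gives the pseudocode for the process, where:

  • neighbour(): picks a new random value for a random parameter according to the parameter’s base distribution.

  • accept(): decides whether to move to the new state, given the current cost, the new cost and the temperature.

  • temperature(): depends on i and i_max, i being the number of iterations with no improvement and i_max the number of such iterations allowed.

  • cost(): the cost as returned by the currently used metric.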
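The loop structure of Algorithm 1 can be sketched in runnable form as follows. The exact acceptance rule and cooling schedule are not recoverable from the text, so this sketch assumes a standard Metropolis acceptance test and a linear temperature schedule; the function name `simulated_annealing` and the toy cost are likewise hypothetical.

```python
import math
import random


def simulated_annealing(cost, initial, neighbour, max_stall, rng):
    """Minimize `cost`, stopping after `max_stall` iterations without a new optimum."""
    state, state_cost = initial, cost(initial)
    best, best_cost = state, state_cost
    i = 0  # iterations since the last improvement
    while i < max_stall:
        temperature = 1.0 - i / max_stall  # assumed linear cooling, always > 0 here
        candidate = neighbour(state, rng)
        candidate_cost = cost(candidate)
        # Assumed Metropolis rule: always accept improvements, sometimes accept worse.
        if (candidate_cost < state_cost or
                rng.random() < math.exp((state_cost - candidate_cost) / temperature)):
            state, state_cost = candidate, candidate_cost
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost  # save new optimum
            i = 0                                        # reset loop counter
        i += 1
    return best, best_cost


rng = random.Random(2)
cost = lambda p: (p[0] - 1.4) ** 2
best, best_cost = simulated_annealing(
    cost, [0.0], lambda p, r: [p[0] + r.gauss(0.0, 0.2)], 200, rng)
```

Unlike the greedy search, the current state may temporarily move to a worse parameter set, which is what lets the search leave a local minimum.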

We use a genetic algorithm holland1992genetic as the underlying optimization technique, as this algorithm offers the best compromise between optimization results and speed. The efficiency component is important as our goal is realtime pedestrian tracking.

Genetic algorithms seek to overcome the problem of local minima in optimization. This is accomplished by keeping a pool of parameter sets and, during each iteration of the optimization process, creating a new pool of potential solutions by combining and modifying these parameter sets.

population ← initialize()                         // initialize population
while true do
       fittest ← selection(population)            // evaluate and select fittest
       if termination(fittest) then               // should we terminate?
              break                               // yes, stop loop
       end if
       population ← reproduction(population)      // new generation
end while
Algorithm 2 Genetic algorithm.

Algorithm 2 provides pseudocode for the method given the following functions:

  • initialize(): parameters are randomly initialized in accordance with the base distribution for each parameter.

  • selection(): individuals are sorted according to their score and divided into 3 groups: Best, Middle and Worst.

  • termination(): the algorithm is terminated after $n$ successive loop iterations without any new optimum.

  • reproduction(): based on which group it belongs to, a parameter set is attributed three probabilities $p_1$, $p_2$ and $p_3$. For each parameter of this individual, $p_1$ decides if the value is changed or not, $p_2$ decides if the value is changed by crossover or mutation and, finally, $p_3$ decides which type of mutation is done.

  • crossover: a crossover is done by copying a value from an individual belonging to the Best group.

  • mutation: a mutation is done by picking a new value at random based on either the base distribution or the current real distribution of an individual from the Best group (according to $p_3$).

At each iteration, this algorithm evaluates and ranks all parameter sets (solutions) currently in the solution pool. If there have been a certain number of successive iterations without any improvement, the process is terminated. Otherwise, individual parameter values in each solution have a probability of being modified. If so, this modification has a probability of being either a crossover or a mutation. If it is a crossover, a value from the corresponding parameter from a better ranked solution is selected; if it is a mutation, a new value is sampled from a probability distribution. This probability distribution can either be the one defined by the user (for instance, a preferred velocity could obey a normal law with a given mean and standard deviation) or one that is computed on parameter values from better ranked solutions.
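The selection, termination and reproduction steps can be sketched as below. This is a simplified illustration, not the configuration used in the paper: it uses a single pair of probabilities for all individuals instead of per-group probabilities, keeps the Best group unchanged between generations, and mutates only from the base distribution. The function name `genetic_search` and its arguments are hypothetical.

```python
import random


def genetic_search(cost, sample_value, n_params, pop_size, max_stall, rng,
                   p_change=0.3, p_crossover=0.5):
    """Evolve a pool of parameter sets; stop after `max_stall` generations without gain."""
    population = [[sample_value(rng) for _ in range(n_params)]
                  for _ in range(pop_size)]
    best, best_cost, stall = None, float("inf"), 0
    while True:
        population.sort(key=cost)                  # evaluate and rank
        top = population[:max(1, pop_size // 3)]   # the "Best" group
        if cost(top[0]) < best_cost:
            best, best_cost, stall = list(top[0]), cost(top[0]), 0
        else:
            stall += 1
        if stall >= max_stall:                     # termination test
            return best, best_cost
        next_population = [list(ind) for ind in top]  # assumption: Best group is kept
        for individual in population[len(top):]:
            child = list(individual)
            for j in range(n_params):
                if rng.random() < p_change:
                    if rng.random() < p_crossover:
                        child[j] = rng.choice(top)[j]   # crossover from the Best group
                    else:
                        child[j] = sample_value(rng)    # mutation from base distribution
            next_population.append(child)
        population = next_population


rng = random.Random(3)
cost = lambda p: sum((v - 1.0) ** 2 for v in p)
best, best_cost = genetic_search(cost, lambda r: r.uniform(0.0, 2.0),
                                 n_params=3, pop_size=12, max_stall=20, rng=rng)
```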

Figure 5: Comparing the score of the different optimization approaches. Each graph is a range of the scores (minimum and maximum) and the black dot is the mean score. We compute the score from the normalized error metric. A lower value indicates better optimization. MMM, or the ‘Motion-Model Mixture’, is our approach.
Figure 6: This graph shows the time taken to compute each set of optimal parameters corresponding to each motion model. MMM is our approach. Time is given in milliseconds. Each graph is a range of the scores (minimum and maximum) and the black dot is the mean score. We compute the score from the normalized error metric.

An error metric is also needed to compute the $\mathrm{metric}()$ term in Equation (10). In our case, we have chosen a metric that simply computes the average 2-norm between the observed agent positions and the tracker-computed positions. Formally, this metric is defined at timestep $t$ as follows:
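An average 2-norm metric of this kind can be sketched as follows. The normalization (a plain mean over every agent and every timestep in the window) is an assumption, as is the function name `average_position_error`.

```python
import math


def average_position_error(reference, computed):
    """Average Euclidean distance between corresponding 2D positions.

    Both arguments are lists over timesteps of lists over agents of (x, y).
    """
    total, count = 0.0, 0
    for frame_a, frame_b in zip(reference, computed):
        for (ax, ay), (bx, by) in zip(frame_a, frame_b):
            total += math.hypot(ax - bx, ay - by)
            count += 1
    return total / count if count else 0.0


reference = [[(0.0, 0.0), (1.0, 1.0)], [(1.0, 0.0), (2.0, 1.0)]]
computed = [[(0.0, 0.0), (1.0, 1.0)], [(4.0, 4.0), (2.0, 1.0)]]
error = average_position_error(reference, computed)  # one agent is off by 5 in one frame
```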

Figure 7: The RMS error of the predicted position compared to the ground truth. For an unbiased comparison, all measurements are in ground-space (meters). We have divided our datasets into 3 categories (see Table 2): (a) low-density datasets, (b) medium-density datasets, (c) high-density datasets. We find that our approach yields considerably lower error for future-state prediction in medium-density crowds.

4.4 Adaptive Particle Selection

The tracking accuracy of a particle filter depends on the number of particles used for each pedestrian, and the process can be expensive for a high number of particles. With more particles, the probability that a pedestrian will be tracked accurately is higher; using fewer particles is computationally less expensive but lowers the tracking accuracy. As a result, we need to use an appropriate number of particles to balance the tradeoff between computation cost and accuracy. Ideally, one would use fewer particles most of the time and increase their number only when needed, for example when there is a large change in motion trajectory, lighting, or appearance, or when partial occlusions occur.

To this end, we estimate the tracker confidence and select the number of particles using the motion model. We analyze the confidence of our tracker for a given number of particles by combining metrics that measure the propagation reliability and the motion model reliability.

The propagation reliability, evaluated at each timestep from the object representation at that time, measures how well the object matches both the initial target candidate and the last tracked object.

The motion model reliability, also evaluated at each timestep, is a normalized difference measure between the tracked state and the state predicted by the motion model. It is computed with a function that varies linearly with the norm of the difference between the actual and simulated trajectories.

The combination of these metrics helps us in optimizing the number of active particles needed in the system. In our mixture of motion models, our system chooses the optimal motion algorithm from all possible motion models (Equation (11)) with the optimal parameter set. Hence the motion model reliability is always higher compared to systems with homogeneous or non-varying motion models.
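One way such a confidence-driven particle budget could look is sketched below. Everything specific in it is an assumption for illustration: both reliabilities are taken to lie in [0, 1], they are combined by a simple product, the linear falloff distance `scale` and the particle limits `n_min` and `n_max` are made-up values, and the function names are hypothetical.

```python
import math

def motion_model_reliability(tracked, predicted, scale=0.8):
    """Reliability in [0, 1] that decreases linearly with the position gap.

    `scale` (meters) is an assumed distance at which reliability reaches 0.
    """
    gap = math.dist(tracked, predicted)
    return max(0.0, 1.0 - gap / scale)

def select_particle_count(propagation_rel, motion_rel, n_min=50, n_max=500):
    """Map the combined confidence to a particle count (assumed linear rule).

    High confidence yields few particles; low confidence yields many.
    """
    confidence = propagation_rel * motion_rel  # assumed combination rule
    confidence = min(1.0, max(0.0, confidence))
    return round(n_max - confidence * (n_max - n_min))

# A prediction 0.4 m away from the tracked state gives reliability of about 0.5.
rel = motion_model_reliability((0.0, 0.0), (0.4, 0.0))
few = select_particle_count(1.0, 1.0)    # confident tracker
many = select_particle_count(0.0, 0.5)   # unreliable appearance match
```

Under this rule, a motion model that predicts the tracked state well directly reduces the number of particles that need to be propagated.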

Model / Parameter          min    max    mean
Boids model
  radius                   0.1    1      0.3
  comfort speed            1      2      1.5
Helbing model
  radius                   0.1    1      0.3
  comfort speed            1      2      1.5
RVO model
  comfort speed            1      2      1.5
  neighbor distance        2      20     11
  radius                   0.2    0.8    0.5
  agent time horizon       0.1    5      2
  obstacle time horizon    0.1    5      2
Table 1: Initial motion model parameters for optimization.
Dataset Challenges Density Agents
NDLS-1 BV, PO, IC High 131
IITF-1 BV, PO, IC, CO High 167
IITF-3 BV, PO, IC, CO High 189
IITF-5 BV, PO, IC, CO High 71
NPLC-1 BV, PO, IC Medium 79
NPLC-3 BV, PO, IC, CO Medium 144
IITF-2 BV, PO, IC, CO Medium 68
IITF-4 BV, PO, IC, CO Medium 116
NDLS-2 BV, PO, IC, CO Low 72
NPLC-2 BV, PO Low 56
seq_hotel IC, PO Low 390
seq_eth BV, IC, PO Low 360
zara01 BV, IC, PO Low 148
zara02 BV, IC, PO Low 204
Table 2: Crowd scenes used as benchmarks. We highlight several attributes of these crowd videos, along with the density and the number of pedestrians tracked. We use the following abbreviations for the underlying scene: Background Variations (BV), Partial Occlusion (PO), Complete Occlusion (CO), Illumination Changes (IC).
LIN Boids Helbing RVO MMM
MOTP 64.42% 52.82% 67.24% 57.10% 43.14% 70.52% 61.33% 49.88% 72.19% 63.17% 51.31% 73.98% 69.23% 54.29%
MOTA 49.42% 35.3% 31.37% 50.59% 26.42% 40.88% 53.28% 44.19% 33.51% 53.95% 48.81% 35.83% 54.18% 50.16% 38.83%
Table 3: We compare the MOTA and MOTP values across the density groups and the different motion models.
High Density Medium Density Low Density
LIN 53 17 63 27 51 35 59 18 67 15 60 29 36 22 52 36 68 23 69 21
Boids 58 15 66 23 56 33 65 14 73 13 65 26 40 19 52 35 70 22 72 19
Helbing 56 16 66 26 52 33 62 15 74 11 68 23 41 19 59 31 75 18 72 14
LTA 54 17 65 22 51 32 60 17 68 11 62 28 42 18 54 32 69 23 70 20
RVO 57 14 69 20 53 29 64 13 71 10 64 26 42 18 53 32 72 20 74 16
MeanShift 27 32 31 38 23 52 34 29 39 36 41 31 22 33 39 45 31 28 45 28
MMM 63 12 73 19 57 27 67 10 77 7 71 20 44 16 63 28 79 17 78 14
Table 4: We compare the percentage of successful tracks (ST) and ID switches (IS) of our mixture of motion models algorithm (MMM) with homogeneous motion models (LIN, Boids, Helbing, LTA, RVO) and a baseline mean-shift tracker. Our method provides higher accuracy and fewer ID switches than the homogeneous motion models. The benefit of our approach grows as the crowd density increases. These datasets are publicly available at
seq_hotel seq_eth zara01 zara02
LIN 182 92 187 58 51 27 49 27
Boids 192 78 202 59 52 27 54 26
Helbing 221 73 232 48 54 26 55 25
LTA 238 70 249 42 60 24 62 25
RVO 241 71 258 37 61 22 65 23
MeanShift 98 171 112 139 32 41 33 39
MMM 252 68 267 34 63 20 68 21
Table 5: We compare the percentage of successful tracks (ST) and ID switches (IS) of our mixture of motion models algorithm (MMM) with homogeneous motion models (LIN, Boids, Helbing, LTA, RVO) and a baseline mean-shift tracker on standard datasets: seq_hotel, seq_eth, zara01, zara02 pellegrini2010improving.
High Density Medium Density
MMM-C 63 11 74 12 57 11 67 12 78 14 71 13 46 13
MMM 63 27 73 28 57 26 67 26 77 28 71 26 44 26
Medium Density Low Density
IITF-4 NDLS-2 NPLC-2 seq_hotel seq_eth zara01 zara02
MMM-C 63 11 80 12 78 11 254 11 267 16 63 14 69 15
MMM 63 27 79 28 78 26 252 28 267 29 63 27 68 28
Table 6: We compare the percentage of successful tracks (ST) and the average tracking frames per second (FPS) of our mixture of motion models algorithm with adaptive particle filtering (MMM) and with a constant number of particles (MMM-C).
Figure 8: Computation cost comparison between the particle filter system and the optimization framework. The x-axis represents the number of people tracked and the y-axis represents the computation time (in milliseconds).

5 Implementation and Results

In this section we present our implementation details and highlight the performance on 10 different crowd video datasets.

5.1 Evaluation

We use the CLEAR MOT evaluation metrics keni2008evaluating to analyze the performance quantitatively. We use the MOTP and the MOTA metrics. MOTP evaluates the alignment of tracks with the ground truth, while MOTA produces a score based on the number of false positives, missed detections, and identity switches. These metrics have become standard for the evaluation of detection and tracking algorithms in the computer vision community, and we refer the interested reader to keni2008evaluating for a more detailed explanation.

We analyze these metrics across the density groups and the different motion models (Table 3).
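For readers unfamiliar with these scores, the following sketch accumulates them from per-frame counts. It uses the standard CLEAR MOT definitions: MOTA is one minus the ratio of all errors (misses, false positives, identity switches) to the number of ground-truth objects, and MOTP is the mean distance over matched pairs. The distance-based form of MOTP (lower is better) is assumed here and may differ from how the percentages in Table 3 are normalized; the dictionary keys and the function name `clear_mot` are hypothetical.

```python
def clear_mot(frames):
    """Accumulate CLEAR MOT scores over a sequence of frames.

    Each frame is a dict with (hypothetical) keys:
      gt         number of ground-truth objects in the frame
      misses     missed detections
      false_pos  false positives
      switches   identity switches
      distances  list of distances for matched object/hypothesis pairs
    """
    gt = sum(f["gt"] for f in frames)
    errors = sum(f["misses"] + f["false_pos"] + f["switches"] for f in frames)
    matched = [d for f in frames for d in f["distances"]]
    if gt == 0 or not matched:
        raise ValueError("need at least one ground-truth object and one match")
    mota = 1.0 - errors / gt
    motp = sum(matched) / len(matched)
    return mota, motp

frames = [
    {"gt": 4, "misses": 1, "false_pos": 0, "switches": 0,
     "distances": [0.2, 0.4, 0.3]},
    {"gt": 4, "misses": 0, "false_pos": 1, "switches": 1,
     "distances": [0.1, 0.2, 0.3, 0.2]},
]
mota, motp = clear_mot(frames)  # 3 errors over 8 ground-truth objects
```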

5.2 Tracking Results

We highlight the performance of our algorithm based on a mixture of motion models on different benchmarks, comparing it with methods that use a single, homogeneous motion model: the constant velocity model (LIN), LTA pellegrini2009you, Social Forces yamaguchi2011you, Boids Reynolds1999, and RVO van2011reciprocal. LIN models the velocities of pedestrians as constant, and is the underlying motion model frequently used in the standard particle filter. The other four models compute the pedestrian states by optimizing functions that model collision avoidance, the destinations of pedestrians, and the desired speed. In our implementation, we replace the state transition process of a standard particle filtering algorithm with the different motion models.
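Replacing the state transition with an interchangeable motion model can be pictured as follows. This is a simplified sketch, not the paper's implementation: the state layout `(x, y, vx, vy)`, the motion model signature, and the Gaussian position noise are assumptions, and only the constant velocity (LIN) model is shown.

```python
import random

def constant_velocity(state, neighbors, dt):
    """LIN model: keep the current velocity; neighbors are ignored."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

def predict(particles, motion_model, neighbors, dt, noise_std, rng):
    """State-transition step of a particle filter.

    Each particle is pushed through `motion_model`, then Gaussian noise is
    added to its position (an assumed process-noise model).
    """
    predicted = []
    for particle in particles:
        x, y, vx, vy = motion_model(particle, neighbors, dt)
        predicted.append((x + rng.gauss(0.0, noise_std),
                          y + rng.gauss(0.0, noise_std), vx, vy))
    return predicted

rng = random.Random(0)
particles = [(0.0, 0.0, 1.0, 2.0)]
# With zero noise the step reduces to the deterministic motion model.
moved = predict(particles, constant_velocity, [], dt=0.5, noise_std=0.0, rng=rng)
```

A model such as RVO or Boids would plug into the same `motion_model` slot while actually using the `neighbors` argument.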

We evaluate on some challenging, publicly available datasets bera2014 and also on some standard datasets from the pedestrian tracking community. These videos were recorded at 24-30 fps. We manually annotated these videos and corrected the perspective effect by camera calibration. We also compare our performance against a baseline mean-shift tracker (Table 4), and we compare the computational overhead of our optimization framework against that of the particle filter system in terms of computation time (see Figure 8).

For our evaluation, we have divided our system into two phases:

Initialization: Here we initialize the motion model estimation and parameter-optimization system with hand-drawn or ground truth data for a few initial frames, which is computed offline. For our experiments, we’ve used the first 10 frames. We compute a score that is used to choose the best-fit model from our motion model set and the associated parameters.


After learning from the initial data, we use the predicted set of parameters to model the state transition part of the standard Bayesian inference framework. We iteratively and incrementally recompute the score and update the motion model. This computation is performed in realtime.
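The best-fit selection over a few initial frames can be sketched as below. This is only an illustration under stated assumptions: the score is taken to be the mean one-step prediction error, velocities are estimated by finite differences, and `STILL` is a made-up toy model included purely so that there is something to compare against LIN. All function names are hypothetical.

```python
import math

def one_step_error(model, trajectory, dt):
    """Mean distance between one-step predictions and the observed positions."""
    errors = []
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        velocity = ((cur[0] - prev[0]) / dt, (cur[1] - prev[1]) / dt)
        errors.append(math.dist(model(cur, velocity, dt), nxt))
    return sum(errors) / len(errors)

def select_best_model(models, trajectory, dt):
    """Return the name of the lowest-error model and all scores."""
    scores = {name: one_step_error(m, trajectory, dt)
              for name, m in models.items()}
    return min(scores, key=scores.get), scores

def constant_velocity(pos, vel, dt):
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def stand_still(pos, vel, dt):
    return pos

# Ten annotated frames of a pedestrian walking at 1 m/s along the x-axis.
trajectory = [(0.1 * i, 0.0) for i in range(10)]
models = {"LIN": constant_velocity, "STILL": stand_still}
best, scores = select_best_model(models, trajectory, dt=0.1)
```

Recomputing the scores on a sliding window of recent frames would give the incremental update described above.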

We show the number of correctly tracked pedestrians and the number of ID switches. A track is counted as “successful” when the estimated mean error between the tracking result and the ground-truth value is less than 0.8 meter in ground-space. The average human stride length is about 0.8 meter, and we consider the tracking to be incorrect if the mean error is more than this value. Our method provides 9-18% higher accuracy over LIN for medium density crowds (Table 4). Moreover, we compare the performance of our adaptive particle tracking algorithm with a particle filter that uses a constant number of particles (Table 6).
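The success criterion can be written down directly. The sketch below assumes that both tracks are lists of ground-space `(x, y)` positions in meters sampled at the same frames; the function name `is_successful_track` is hypothetical.

```python
import math

def is_successful_track(estimated, ground_truth, threshold=0.8):
    """True when the mean position error stays below `threshold` meters."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("tracks must be non-empty and of equal length")
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(errors) / len(errors) < threshold

truth = [(0.0, 0.0), (1.0, 0.0)]
close = [(0.5, 0.0), (1.0, 0.7)]   # errors of 0.5 m and 0.7 m
far = [(1.0, 0.0), (1.0, 1.0)]     # errors of 1.0 m and 1.0 m
ok = is_successful_track(close, truth)
lost = is_successful_track(far, truth)
```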

Figure 9: The results of our approach on some challenging datasets. From top to bottom, left to right: IITF-1, IITF-2, NPLC-1, IITF-3, NDLS-2, NDLS-1, NPLC-2, IITF-4, IITF-5. We are able to achieve a 4-12% increase in accuracy over homogeneous motion models at interactive framerates.

6 Limitations, Conclusions, and Future Work

We present a realtime algorithm for pedestrian tracking in crowded scenes. Our algorithm provides a good balance between accuracy and speed. We highlight its performance on many pedestrian datasets, showing that it can track crowded scenes in realtime on a PC with a multi-core CPU. Compared to prior algorithms of similar accuracy, we obtain a 2-3x speedup.

Our approach has some limitations related to our motion model. Our motion model set does not take into account physiological and psychological pedestrian traits. All pedestrians are modeled with the same sensitivity towards gender and density; our model doesn’t take into account heterogeneous agent characteristics, which affect the final behavior. These behavior characteristics can introduce additional errors in our confidence estimation. In practice, the performance of the algorithm can vary based on various other attributes of the input video.

As part of future work, we would like to incorporate the personality characteristics of the pedestrians, along with other characteristics, such as ‘fundamental diagrams’ from pedestrian dynamics. We would like to parallelize the approach on a GPU to handle more complex pedestrian datasets in realtime. Finally, we would like to use improved learning algorithms to increase the accuracy of our tracker.

7 Acknowledgements

This work was supported by NSF awards 1000579, 1117127, 1305286, Intel Corporation, and a grant from the Boeing Company.


  • (1) Ali, S., Shah, M.: Floor fields for tracking in high density crowd scenes. In: ECCV, pp. 1–14 (2008)
  • (2) Antonini, G., Martinez, S.V., Bierlaire, M., Thiran, J.P.: Behavioral priors for detection and tracking of pedestrians in video sequences. Int. J. Comput. Vis. 69(2), 159–180 (2006)
  • (3) Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. Signal Processing, IEEE Transactions on pp. 174–188 (2002)
  • (4) Bera, A., Manocha, D.: Realtime multilevel crowd tracking using reciprocal velocity obstacles. In: Proceedings of Conference on Pattern Recognition, Sweden (2014)

  • (5) Burgos-Artizzu, X.P., Dollár, P., Lin, D., Anderson, D.J., Perona, P.: Social behavior recognition in continuous video. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 1322–1329. IEEE (2012)
  • (6) Chung, S.Y., Huang, H.P.: A mobile robot that understands pedestrian spatial behaviors. In: IROS, pp. 5861–5866 (2010)
  • (7) Enzweiler, M., Gavrila, D.M.: Monocular pedestrian detection: Survey and experiments. PAMI pp. 2179–2195 (2009)
  • (8) Fajen, B.R.: Affordance-based control of visually guided action. Ecological Psychology 19(4), 383–410 (2007)
  • (9) Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Physical review E (1995)
  • (10) Holland, J.H.: Genetic algorithms. Scientific american 267(1), 66–72 (1992)
  • (11) Karamouzas, I., Heil, P., van Beek, P., Overmars, M.H.: A predictive collision avoidance model for pedestrian simulation. In: Motion in Games, pp. 41–52 (2009)
  • (12) Keni, B., Rainer, S.: Evaluating multiple object tracking performance: the clear mot metrics. EURASIP Journal on Image and Video Processing 2008 (2008)
  • (13) Kratz, L., Nishino, K.: Going with the flow: pedestrian efficiency in crowded scenes. In: ECCV, pp. 558–572 (2012)
  • (14) Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.: Exploiting pedestrian interaction via global optimization and social behaviors. In: Outdoor and Large-Scale Real-World Scene Analysis, pp. 1–26. Springer (2012)
  • (15) Lerner, A.: /crowd-data
  • (16) Liu, W., Chan, A.B., Lau, R.W.H., Manocha, D.: Leveraging long-term predictions and online-learning in agent-based multiple person tracking (2014)
  • (17) Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to stereo vision. (1981)
  • (18) Ondřej, J., Pettré, J., Olivier, A.H., Donikian, S.: A synthetic-vision based steering approach for crowd simulation. ACM Trans. Graph. 29(4), 123–123 (2010). DOI 10.1145/1778765.1778860
  • (19) Pellegrini, S., Ess, A., Schindler, K., Van Gool, L.: You’ll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV, pp. 261–268 (2009)
  • (20) Pellegrini, S., Ess, A., Van Gool, L.: Improving data association by joint modeling of pedestrian trajectories and groupings. In: Computer Vision–ECCV 2010, pp. 452–465. Springer (2010)
  • (21) Pettré, J., Ondřej, J., Olivier, A.H., Cretual, A., Donikian, S.: Experiment-based modeling, simulation and validation of interactions between virtual walkers. In: Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA ’09, pp. 189–198. ACM, New York, NY, USA (2009). DOI 10.1145/1599470.1599495
  • (22) Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. In: SIGGRAPH ’87, pp. 25–34. ACM, New York, NY, USA (1987)
  • (23) Reynolds, C.W.: Steering behaviors for autonomous characters. In: Game Developers Conference 1999 (1999)
  • (24) Rodriguez, M., Laptev, I., Sivic, J., Audibert, J.Y.: Density-aware person detection and tracking in crowds. In: ICCV, pp. 2423–2430 (2011)
  • (25) Rodriguez, M., Sivic, J., et al.: Data-driven crowd analysis in videos. In: ICCV, pp. 1235–1242 (2011)
  • (26) Sharma, P., Huang, C., Nevatia, R.: Unsupervised incremental learning for improved object detection in a video. In: CVPR, pp. 3298–3305 (2012)
  • (27) Song, X., Shao, X., Zhang, Q., Shibasaki, R., Zhao, H., Cui, J., Zha, H.: A fully online and unsupervised system for large and high-density area surveillance: Tracking, semantic scene learning and abnormality detection. TIST (2013)
  • (28) Tyagi, A., Davis, J.W.: A context-based tracker switching framework. In: WMVC, pp. 1–8 (2008)
  • (29) Van Den Berg, J., Guy, S.J., Lin, M., Manocha, D.: Reciprocal n-body collision avoidance. In: Robotics Research (2011)
  • (30) Wolinski, D., Guy, S.J., Olivier, A.H., Lin, M.C., Manocha, D., Pettré, J.: Parameter estimation and comparative evaluation of crowd simulations. In: Eurographics (2014)
  • (31) Wu, Y., Lim, J., Yang, M.H.: Online object tracking: A benchmark pp. 2411–2418 (2013)
  • (32) Yamaguchi, K., Berg, A.C., Ortiz, L.E., Berg, T.L.: Who are you with and where are you going? In: CVPR, pp. 1345–1352 (2011)
  • (33) Yilmaz, A.: Object tracking by asymmetric kernel mean shift with automatic scale and orientation selection. In: CVPR, pp. 1–6 (2007)
  • (34) Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. Acm Computing Surveys (CSUR) (2006)
  • (35) Zhang, K., Zhang, L., Yang, M.H.: Real-time compressive tracking. In: ECCV, pp. 864–877 (2012)
  • (36) Zhao, X., Gong, D., Medioni, G.: Tracking using motion patterns for very crowded scenes. In: ECCV, pp. 315–328 (2012)