Multiple object tracking with context awareness

11/24/2014 ∙ by Laura Leal-Taixé, et al.

Multiple people tracking is a key problem for many applications such as surveillance, animation or car navigation, and a key input for tasks such as activity recognition. In crowded environments occlusions and false detections are common, and although there have been substantial advances in recent years, tracking is still a challenging task. Tracking is typically divided into two steps: detection, i.e., locating the pedestrians in the image, and data association, i.e., linking detections across frames to form complete trajectories. For the data association task, approaches typically aim at developing new, more complex formulations, which in turn put the focus on the optimization techniques required to solve them. However, they still utilize very basic information such as distance between detections. In this thesis, I focus on the data association task and argue that there is contextual information that has not been fully exploited yet in the tracking community, mainly social context and spatial context coming from different views.




1 Motivation

Video cameras are increasingly present in our daily lives: webcams, surveillance cameras and other imaging devices are being used for multiple purposes. As the number of data streams increases, it becomes more and more important to develop methods to automatically analyze this type of data. People are usually the central characters of most videos; therefore, it is particularly interesting to develop techniques to analyze their behavior. Whether for surveillance, animation or activity recognition, multiple people tracking is a key problem to be addressed. In crowded environments occlusions and false detections are common, and although there have been substantial advances in recent years, tracking is still a challenging task. The task is typically divided into two steps: detection and data association. Detectors are nowadays very robust and provide extremely good detection rates for normal scenes, but still struggle with the partial and full occlusions common in crowded scenes. Data association or tracking, on the other hand, is also extremely difficult in crowded scenarios, especially due to the high rate of missing data and common false alarms. In this thesis, we argue that there are two main sources of pedestrian context that have not been fully exploited in the tracking community, namely social context and spatial context coming from different views.

Typically, matching is based solely on appearance and distance information, i.e., the closest detection in the following frame is matched to the detection in the current frame. But this can be completely wrong: imagine a queue of people waiting at a coffee shop and a low-frame-rate camera, as is typical in surveillance scenarios. In one frame we might have four people waiting, while in the next the first person has already left the queue and a new person has joined it. In this case, if we only use distance information, the four people in the first frame might be matched to the four people in the second frame, although they are completely different pedestrians. Though this is an extreme case, it represents an error that is common when tracking in crowded scenarios, and it is caused solely by the assumption that people do not move from one frame to the next, which is clearly inaccurate.
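This failure mode is easy to reproduce. Below is a minimal sketch of purely distance-based greedy matching (the data and function are hypothetical toy illustrations, not the thesis's method); applied to the queue example, every identity is shifted by one:

```python
# Minimal sketch (hypothetical toy data): greedy nearest-neighbour data
# association. Each detection is an (x, y) position; every detection in
# frame t is matched to its closest unused detection in frame t+1,
# ignoring all context.

def nearest_neighbour_match(frame_t, frame_t1):
    """Return (i, j) index pairs linking frame_t[i] -> frame_t1[j]."""
    matches = []
    used = set()
    for i, (x, y) in enumerate(frame_t):
        best_j, best_d = None, float("inf")
        for j, (u, v) in enumerate(frame_t1):
            if j in used:
                continue
            d = (x - u) ** 2 + (y - v) ** 2
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches

# Four people queueing; between frames the first person leaves and a new
# person joins at the back. Pure distance matching shifts every identity.
frame_t  = [(0, 0), (1, 0), (2, 0), (3, 0)]
frame_t1 = [(1, 0), (2, 0), (3, 0), (4, 0)]
print(nearest_neighbour_match(frame_t, frame_t1))
# -> [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Each person in frame t is matched to the detection one queue position ahead in frame t+1, which is exactly the wholesale identity switch described above.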

It is therefore more natural to take into account the context of the pedestrian, which can be the activity they are performing (e.g., queueing) or the interactions that take place in a crowded scenario. It is clear that if a person is walking alone, he/she will follow a straight path towards his/her destination. But what if the environment becomes more crowded, and suddenly the straight path is no longer an option? The pedestrian will then try to find a reasonably short path to the same destination by avoiding other pedestrians and obstacles. All these pedestrian movements and reactions to the environment are governed by what is called the Social Force Model.
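As a rough illustration, the repulsive component of a Social Force Model can be sketched as an exponentially decaying force between pedestrians. The simplified form and the constants below are illustrative assumptions, not the exact model used in this thesis:

```python
import math

# Hedged sketch of a simplified Social Force Model (Helbing-Molnar style):
# each nearby pedestrian exerts an exponentially decaying repulsive force.
# The constants A (strength) and B (range) are illustrative, not fitted.
A, B = 2.0, 0.3

def repulsive_force(p_i, p_j):
    """Force pushing pedestrian i away from pedestrian j (2D positions)."""
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    magnitude = A * math.exp(-dist / B)
    return (magnitude * dx / dist, magnitude * dy / dist)

def total_force(p_i, others, goal_dir, v_desired=1.3):
    """Driving term towards the goal plus repulsion from all neighbours."""
    fx, fy = v_desired * goal_dir[0], v_desired * goal_dir[1]
    for p_j in others:
        rx, ry = repulsive_force(p_i, p_j)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)
```

Intuitively, the driving term pulls a pedestrian towards the destination, while the repulsive terms bend the trajectory around nearby people, producing the collision-avoiding paths observed in real crowds.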

Another source of information that has not been fully exploited in the literature is the spatial context coming from different camera views. It is typical for many applications to observe the same scenario from different viewpoints. In this case, object locations in the images are temporally correlated by the system dynamics and are geometrically constrained by the spatial configuration of the cameras. These two sources of structure have been typically exploited separately, but splitting the problem in two steps has obviously several disadvantages, because the available evidence is not fully exploited. For example, if one object is temporarily occluded in one camera, both data association for reconstruction and tracking become ambiguous and underconstrained when considered separately. If, on the other hand, evidence is considered jointly, temporal correlation can potentially resolve reconstruction ambiguities and vice versa.

In this thesis, we will show that pedestrian context is an incredibly rich source of information that should be included in the tracking procedure.

2 Contributions and Organization

Figure 1: Organization of the thesis

As we motivated in the previous section, tracking methods still fail to capture and fully exploit much of the context of a pedestrian and his/her environment. In this thesis we mainly focus on two sources of context, social and spatial, and provide solutions which are globally optimal. We now detail the organization of the rest of the thesis, shown as a diagram in Figure 1, as well as the main contributions.

Chapter 2. We start by introducing the problem at hand, namely pedestrian tracking or data association, and the basic paradigm we follow, i.e., tracking-by-detection. Since the rest of the thesis is focused on the tracking part, we briefly introduce here a few state-of-the-art detectors we use throughout the thesis. Finally, we discuss some of the major problems of those detectors and how they are handled during the tracking phase.

Chapter 3. In this chapter, we give an introduction to the basics of Linear Programming, which is the optimization technique we will use throughout the entire thesis. We describe the main solver, namely the Simplex algorithm, and its geometric intuition. We then move towards the graphical model representation of a Linear Program, so we can make use of the efficient solvers (e.g., k-shortest paths) present in the network flow community.
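For reference, a Linear Program in standard form can be written as

\[
\min_{x} \; c^{\top} x \quad \text{subject to} \quad A x = b, \; x \geq 0,
\]

where the constraints define a convex polytope. The Simplex algorithm walks along the vertices of this polytope, and the shortest-path-based solvers from the network flow community exploit the special structure of A when the LP encodes a flow problem.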

Chapter 4. Once we have the necessary notions of Linear Programming and graphical models, we formulate the multi-object tracking problem as a Linear Program. Unlike previous methods, this solves the tracking problem for all pedestrian trajectories and all frames at the same time, obtaining a unique and guaranteed global optimum. Furthermore, the problem can be solved in polynomial time. Our first contribution is to propose a small change in the graph structure which allows us to drop two parameters that typically have to be learned in an Expectation-Maximization fashion. Our graph performs tracking depending only on the actual scene information. Here we present an overview of the literature ranging from frame-by-frame matching to global methods similar to Linear Programming.
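To make the objective concrete, here is a deliberately tiny brute-force illustration (with hypothetical costs) of data association as a minimum-cost assignment between two frames. The actual formulation in this chapter solves one global Linear Program over all frames and trajectories at once, which brute force cannot do:

```python
from itertools import permutations

# Toy illustration (hypothetical costs) of linking detections in frame t
# to detections in frame t+1 so that the total link cost is minimal.
cost = [  # cost[i][j]: cost of linking detection i (frame t) to j (frame t+1)
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.9, 0.3],
]

def best_assignment(cost):
    """Exhaustively search all permutations for the cheapest assignment."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(enumerate(best))

print(best_assignment(cost))  # -> [(0, 0), (1, 1), (2, 2)]
```

For realistic numbers of detections and frames, this permutation search is intractable, which is precisely why the LP/network-flow formulation, with its polynomial-time guarantee and global optimum, matters.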

Chapter 5. Given the Linear Programming framework presented in the previous chapter, we propose here to enhance tracking by including the social context. Interaction between pedestrians is modeled using the well-known physical Social Force Model, used extensively in the crowd simulation community. The key insight is that people plan their trajectories in advance in order to avoid collisions; therefore, a graph model which takes into account future and past frames is the perfect framework to include social and grouping behavior. Instead of including social information by creating a complex graph structure, which could then not be solved using classic LP solvers, we propose an iterative solution relying on Expectation-Maximization. Results on several challenging public datasets are presented to show the improvement of the tracking results in crowded environments. An extensive parameter study as well as experiments with missing data, noise and outliers are also shown to test the robustness of the approach. In this chapter, we present an overview of state-of-the-art tracking methods that use social context.

Chapter 6. In this chapter, we describe a method to include yet another source of context, in this case spatial context. As discussed in the previous section, spatial information between cameras and temporal information are still regarded in the literature as two separate problems, namely reconstruction and tracking. In this chapter, we argue that it is not necessary to separate the problem in two parts, and we present a novel formulation to perform 2D-3D assignments (reconstruction) and temporal assignments (tracking) in a single global optimization. When evidence is considered jointly, temporal correlation can potentially resolve reconstruction ambiguities and vice versa. The proposed graph structure contains a huge number of constraints; therefore, it cannot be solved with typical Linear Programming solvers such as Simplex. In order to solve this problem, we rely on multi-commodity flow theory. We propose to exploit the specific structure of our problem and use Dantzig-Wolfe decomposition and branching to find the guaranteed global optimum. Here we present an overview of relevant literature on multiple view multiple target tracking.

Chapter 7. Our final chapter includes the conclusions of the thesis, as well as a discussion for possible future research directions.

Appendix A. In the appendix of this thesis, we present a case study in which Computer Vision proves useful for the fields of marine biology and chemical physics. An automatic method is presented for the tracking and motion analysis of swimming microorganisms. This includes early work done by the author at the beginning of the PhD. A new method for improved tracking, named multi-level Hungarian, is presented and compared with the Linear Programming formulation. Since microorganisms do not act according to the same social forces as humans, a method based on Hidden Markov Models is developed to analyze their motion. The final software for tracking and motion analysis has proven to be a helpful tool for biologists and physicists, as it provides a vast amount of analyzed data in a fast and easy way.

3 Papers of the author

In this section, the publications of the author are listed by topic and in chronological order. The core parts of the thesis are based on four main publications of the author:

[lealcvpr2014] L. Leal-Taixé, M. Fenzi, A. Kuznetsova, B. Rosenhahn, S. Savarese. Learning an image-based motion context for multiple people tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.

In this work, we present a method for multiple people tracking that leverages a generalized model for capturing interactions among individuals. At the core of our model lies a learned dictionary of interaction feature strings which capture relationships between the motions of targets. These feature strings, created from low-level image features, lead to a much richer representation of the physical interactions between targets compared to hand-specified social force models that previous works have introduced for tracking. One disadvantage of using social forces is that all pedestrians must be detected in order for the forces to be applied, while our method is able to encode the effect of undetected targets, making the tracker more robust to partial occlusions. The interaction feature strings are used in a Random Forest framework to track targets according to the features surrounding them.

[lealcvprw2014] L. Leal-Taixé, M. Fenzi, A. Kuznetsova, B. Rosenhahn, S. Savarese. Multi-target tracking with context from interaction feature strings. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). SUNw: Scene Understanding Workshop, June 2014.

This is the two-page abstract version of [lealcvpr2014], presented at the same conference as an invited paper in the Scene Understanding workshop.

[lealbookchapter2012] L. Leal-Taixé, Bodo Rosenhahn. Pedestrian interaction in tracking: the social force model and global optimization methods. Modeling, Simulation and Visual Analysis of Crowds: A multidisciplinary perspective. Springer, 2012.

In this work, we present an approach for multiple people tracking in semi-crowded environments including interactions between pedestrians in two ways: first, considering social and grouping behavior, and second, using a global optimization scheme to solve the data association problem. This is an extended text of the conference paper [lealiccv2011] in book chapter format. It is intended to be an exhaustive introduction to Linear Programming for multiple people tracking, providing the necessary background on both graphical models and optimization to allow students to start programming such a tracking system.

[lealdagstuhl2012] L. Leal-Taixé, G. Pons-Moll, B. Rosenhahn. Exploiting pedestrian interaction via global optimization and social behaviors. Theoretic Foundations of Computer Vision: Outdoor and Large-Scale Real-World Scene Analysis. Springer, 2012.

In this work, we present an approach for multiple people tracking in semi-crowded environments including interactions between pedestrians in two ways: first, considering social and grouping behavior, and second, using a global optimization scheme to solve the data association problem. This is an extended text of the conference paper [lealiccv2011], which includes more experiments, detailed evaluation of the effect of the method’s parameters, detailed implementation details and extended theoretical background on graphical models.

[lealcvpr2012] L. Leal-Taixé, G. Pons-Moll, B. Rosenhahn. Branch-and-price global optimization for multi-view multi-object tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2012.

In this work, we present a new algorithm to jointly track multiple objects in multi-view images. While this has been typically addressed separately in the past, we tackle the problem as a single global optimization. We formulate this assignment problem as a min-cost problem by defining a graph structure that captures both temporal correlations between objects as well as spatial correlations enforced by the configuration of the cameras. This leads to a complex combinatorial optimization problem that we solve using Dantzig-Wolfe decomposition and branching. Our formulation allows us to solve the problem of reconstruction and tracking in a single step by taking all available evidence into account. In several experiments on multiple people tracking and 3D human pose tracking, we show that our method outperforms state-of-the-art approaches.

[lealiccv2011] L. Leal-Taixé, G. Pons-Moll, B. Rosenhahn. Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker. IEEE International Conference on Computer Vision Workshops (ICCV). 1st Workshop on Modeling, Simulation and Visual Analysis of Large Crowds, November 2011.

In this work, we present an approach for multiple people tracking in semi-crowded environments. Most tracking methods make the assumption that each pedestrian’s motion is independent, thereby ignoring the complex and important interaction between subjects. On the contrary, our method includes the interactions between pedestrians in two ways: first, considering social and grouping behavior, and second, using a global optimization scheme to solve the data association problem. Results are presented on three challenging, publicly available datasets to show that our method outperforms several state-of-the-art tracking systems.

The five publications related to the appendix of the thesis are detailed below. This work was partially funded by the German Research Foundation, DFG projects RO 2497/7-1 and RO 2524/2-1, and the EU project AMBIO, and done in collaboration with the Institute of Functional Interfaces of the Karlsruhe Institute of Technology. Digital in-line holography is a microscopy technique that has received increasing attention over the last few years in the fields of microbiology, medicine and physics, as it provides an efficient way of measuring 3D microscopic data over time. In the following works, we explore detection, tracking and motion analysis on this challenging data, as well as ways of extending the method to a multiple camera system.

[maleschlijskibio2012] S. Maleschlijski, G. H. Sendra, A. Di Fino, L. Leal-Taixé, I. Thome, A. Terfort, N. Aldred, M. Grunze, A.S. Clare, B. Rosenhahn, A. Rosenhahn. Three dimensional tracking of exploratory behavior of barnacle cyprids using stereoscopy. Biointerphases. Journal for the Quantitative Biological Interface Data. Springer, 2012.

In this work, we present a low-cost transportable stereoscopic system consisting of two consumer camcorders. We apply this novel apparatus to behavioral analysis of barnacle larvae during surface exploration and extract and analyze the three-dimensional patterns of movement. The resolution of the system and the accuracy of position determination are characterized. In order to demonstrate the biological applicability of the system, three-dimensional swimming trajectories of the cypris larva of the barnacle Semibalanus balanoides are recorded in the vicinity of a glass surface. Parameters such as swimming direction, swimming velocity and swimming angle are analyzed.

[maleschlijskibio2011] S. Maleschlijski, L. Leal-Taixé, S. Weisse, A. Di Fino, N. Aldred, A.S. Clare, G.H. Sendra, B. Rosenhahn, A. Rosenhahn. A stereoscopic approach for three dimensional tracking of marine biofouling microorganisms. Microscopic Image Analysis with Applications in Biology (MIAAB), September 2011.

In this work, we describe a stereoscopic system to track barnacle cyprids and an algorithm to extract 3D swimming patterns for a common marine biofouling organism - Semibalanus balanoides. The details of the hardware setup and the calibration object are presented and discussed. In addition we describe the algorithm for the camera calibration, object matching and stereo triangulation. Several trajectories of living cyprids are presented and analyzed with respect to statistical swimming parameters.

[lealdagstuhl2011] L. Leal-Taixé, M. Heydt, A. Rosenhahn, B. Rosenhahn. Understanding what we cannot see: automatic analysis of 4D digital in-line holography data. Video Processing and Computational Video. Springer, July 2011.

In this work, we present a complete system for the automatic analysis of digital in-line holographic data; we detect the 3D positions of the microorganisms, compute their trajectories over time and finally classify these trajectories according to their motion patterns. This work includes the contributions presented in [lealwmvc2009] and [lealdagm2010], extended experiments, theoretical background and implementation details.

[lealdagm2010] L. Leal-Taixé, M. Heydt, S. Weisse, A. Rosenhahn, B. Rosenhahn. Classification of swimming microorganisms motion patterns in 4D digital in-line holography data. 32nd Annual Symposium of the German Association for Pattern Recognition (DAGM), September 2010.

In this work, we present an approach for automatically classifying complex microorganism motions observed with digital in-line holography. Our main contribution is the use of Hidden Markov Models (HMMs) to classify four different motion patterns of a microorganism and to separate multiple patterns occurring within a trajectory. We perform leave-one-out experiments with the training data to prove the accuracy of our method and to analyze the importance of each trajectory feature for classification. We further present results obtained on four full sequences, a total of 2500 frames. The obtained classification rates range between 83.5% and 100%.
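As a hedged sketch of the kind of HMM machinery involved, Viterbi decoding recovers the most likely sequence of motion patterns from trajectory features. The states, observations and probabilities below are invented for illustration; they are not the actual model of the paper:

```python
# Illustrative Viterbi decoding for a two-state HMM over trajectory features.
# All states, observations and probabilities here are assumptions.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for `obs`."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return path[::-1]

states = ["straight", "turning"]
obs = ["low_curvature", "low_curvature", "high_curvature"]
start_p = {"straight": 0.6, "turning": 0.4}
trans_p = {"straight": {"straight": 0.8, "turning": 0.2},
           "turning": {"straight": 0.3, "turning": 0.7}}
emit_p = {"straight": {"low_curvature": 0.9, "high_curvature": 0.1},
          "turning": {"low_curvature": 0.2, "high_curvature": 0.8}}
print(viterbi(obs, states, start_p, trans_p, emit_p))
# -> ['straight', 'straight', 'turning']
```

Segmenting a trajectory into per-frame pattern labels in this way is what allows multiple motion patterns occurring within a single trajectory to be separated.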

[lealwmvc2009] L. Leal-Taixé, M. Heydt, A. Rosenhahn, B. Rosenhahn. Automatic tracking of swimming microorganisms in 4D digital in-line holography data. IEEE Workshop on Motion and Video Computing (WMVC), December 2009.

In this work, we approach the challenges of a high throughput analysis of holographic microscopy data and present a system for detecting particles in 3D reconstructed holograms and their 3D trajectory estimation over time. Our main contribution is a robust method, which evolves from the Hungarian bipartite weighted graph matching algorithm and allows us to deal with newly entering and leaving particles and compensate for missing data and outliers. In the experiments we compare our fully automatic system with manually labeled ground truth data and we can report an accuracy between 76% and 91%.

Aside from the previous publications related to multiple object tracking and motion analysis, the author was also involved in other projects, mainly related to pose estimation:

[kuznetsovaiccvw2013] A. Kuznetsova, L. Leal-Taixé, B. Rosenhahn. Real-time sign language recognition using a consumer depth camera. IEEE International Conference on Computer Vision Workshops (ICCV). 3rd IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), December 2013.

In this work, we propose a precise method to recognize static hand gestures from depth data provided by a depth sensor. Hand sign recognition is performed using a multi-layered random forest (MLRF), which requires less training time and memory than a simple random forest of equivalent precision. We evaluate our algorithm on synthetic data, on a publicly available Kinect dataset containing 24 signs from American Sign Language (ASL) and on a new dataset collected using the Intel Creative Gesture Camera.

[fenzicvpr2013] M. Fenzi, L. Leal-Taixé, B. Rosenhahn, J. Ostermann. Class generative models based on feature regression for pose estimation of object categories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.

In this work, we propose a method for learning a class representation that can return a continuous value for the pose of an unknown class instance, using only 2D data and weak 3D labelling information. Our method is based on generative feature models, i.e., regression functions learnt from local descriptors of the same patch collected under different viewpoints. We evaluate our approach on two state-of-the-art datasets showing that our method outperforms other methods by 9-10%.

[fenzidagm2012] M. Fenzi, R. Dragon, L. Leal-Taixé, B. Rosenhahn, J. Ostermann. 3D Object Recognition and Pose Estimation for Multiple Objects using Multi-Prioritized RANSAC and Model Updating. 34th Annual Symposium of the German Association for Pattern Recognition, DAGM. August 2012.

In this work, we present a feature-based framework that combines spatial feature clustering, guided sampling for pose generation and model updating for 3D object recognition and pose estimation. We propose to spatially separate the features before matching to create smaller clusters containing the object. Then, hypothesis generation is guided by exploiting cues collected off- and on-line, such as feature repeatability, 3D geometric constraints and feature occurrence frequency. The evaluation of our algorithm on challenging video sequences shows the improvement provided by our contribution.

[ponsdagstuhl2012] G. Pons-Moll, L. Leal-Taixé, B. Rosenhahn. Data-driven manifold for outdoor motion capture. Theoretic Foundations of Computer Vision: Outdoor and Large-Scale Real-World Scene Analysis. Springer, 2012.

In this work, we present a human motion capturing system that combines video input with sparse inertial sensor input under a particle filter optimization scheme. It is an extension of the work presented in [ponsiccv2011] which includes a thorough theoretical introduction, extended experimental section and implementation details.

[ponsiccv2011] G. Pons-Moll, A. Baak, J. Gall, L. Leal-Taixé, M. Mueller, H.-P. Seidel and B. Rosenhahn. Outdoor human motion capture using inverse kinematics and von Mises-Fisher sampling. IEEE International Conference on Computer Vision (ICCV), November 2011.

In this paper, we introduce a novel hybrid human motion capturing system that combines video input with sparse inertial sensor input. Employing an annealing particle-based optimization scheme, our idea is to use orientation cues derived from the inertial input to sample particles from the manifold of valid poses. Then, visual cues derived from the video input are used to weigh these particles and to iteratively derive the final pose. Our method can be used to sample poses that fulfill arbitrary orientation or positional kinematic constraints. In the experiments, we show that our system can track even highly dynamic motions in an outdoor environment with changing illumination, background clutter and shadows.

[ponsdagm2011] G. Pons-Moll, L. Leal-Taixé, T. Truong, B. Rosenhahn. Efficient and robust shape matching for model based human motion capture. 33rd Annual Symposium of the German Association for Pattern Recognition (DAGM), September 2011.

In this work, we present an approach for Markerless Motion Capture using string matching. We find correspondences between the model predictions and image features using global bipartite graph matching on a pruned cost matrix. Extracted features such as contour, gradient orientations and the turning function of the shape are embedded in a string comparison algorithm. The information is used to prune the association cost matrix discarding unlikely correspondences. This results in significant gains in robustness and stability and reduction of computational cost. We show that our approach can stably track fast human motions where standard articulated Iterative Closest Point algorithms fail. This work was done by a Master’s student whom the author co-supervised.

The following publication resulted from the author’s Master’s Thesis, completed at Northeastern University in Boston, USA:

[lealmsc2009] L. Leal-Taixé, A.U. Coskun, B. Rosenhahn, D. Brooks. Automatic segmentation of arteries in multi-stain histology images. World Congress on Medical Physics and Biomedical Engineering, September 2009.

Atherosclerosis is a very common disease that affects millions of people around the world. Currently most studies of this disease use Intravascular Ultrasound (IVUS) imaging to observe plaque formation, but these images cannot provide detailed information on the specific morphological features of the plaque. Microscopic imaging using a variety of stains can provide much more information, although, in order to obtain proper results, millions of images must be analyzed. In this work, we present an automatic way to find the Region of Interest (ROI) of these images, where the atherosclerotic plaque is formed. Once the image is well segmented, the amount of fat and other measurements of interest can also be determined automatically.

Aside from the aforementioned publications, the author edited a post-proceedings book of the Dagstuhl Seminar organized in 2012:

[lealbook2012] F. Dellaert, J.-M. Frahm, M. Pollefeys, B. Rosenhahn, L. Leal-Taixé. Theoretic Foundations of Computer Vision: Outdoor and Large-Scale Real-World Scene Analysis. Springer, April 2012.

The topic of the meeting was Large-Scale Outdoor Scene Analysis, which covers all aspects, applications and open problems regarding the performance or design of computer vision algorithms capable of working in outdoor setups and/or large-scale environments. Developing these methods is important for driver assistance, city modeling and reconstruction, virtual tourism, telepresence and outdoor motion capture. After the meeting, this post-proceedings book was edited with the collaboration of all participants who sent a paper that was peer-reviewed by three reviewers.

4 The scale of tracking

Videos of walking pedestrians can vary in an infinite number of ways. Camera position, camera distance and type of environment are a few of the characteristics that define the type of video that will be created. Before introducing the problem that we are dealing with in this thesis, we first need to introduce the types of scenarios and the types of videos we will be working with.

In Figure 2, we can see four examples of scenarios with varying crowdedness levels. The first example, in Figure 2(a), from the well-known PETS2009 dataset [pets2009], shows a scene with few pedestrians. The small size of the pedestrians, similar clothing and occlusions behind the pole or among the pedestrians themselves make this a challenging scenario. Nonetheless, recent methods have shown excellent results on this video, which is why more difficult datasets have been introduced.

One example from the Town Center dataset [benfoldcvpr2011] is shown in Figure 2(b). This semi-crowded environment is challenging due to the large number of occlusions, but it is well suited to the study of social behaviors, as we will see in Chapter 5. Pedestrian detection is very challenging in these scenarios, since pedestrians are almost never fully visible, and tracking is difficult due to the large number of crossing trajectories.

Even more crowded scenarios, like the one shown in Figure 2(c), can still be analyzed with special methods which take into account the high density of the crowd [rodrigueziccv2011poster]. For this category of videos, either the target to follow is manually initialized [rodrigueziccv2009] or only head positions are tracked, since other parts of the body are rarely visible for a detector to work. Other approaches rely on feature tracking and motion factorization [brostowcvpr2006], conveying the idea that if two points move in a similar way they belong to the same object. In this last case, there is no need for a detection step.

Finally, we have extremely crowded scenarios like marathons, demonstrations, etc. which are filmed from an elevated point of view, as in Figure 2(d). In these cases, individuals cannot be detected and identified, and therefore the task changes from individual person tracking towards analysis of the overall flow of the crowd [alieccv2008, rodrigueziccv2011].

(a) Sparse
(b) Semi-crowded
(c) Crowded
(d) Macroscopic
Figure 2: Scenarios with different crowdedness levels. (a) Sparse: individuals are detected and tracked throughout the scene. (b) Semi-crowded: it is still possible to detect individuals, but occlusions and missed detections are very common, making tracking challenging. (c) Crowded: tracking full-body pedestrians is no longer possible, but detection and tracking of heads can still be performed. Person counting is a common task for videos of this crowdedness level. Image from [rodrigueziccv2011poster]. (d) Macroscopic: individuals cannot be properly detected; therefore, the goal in these scenarios is typically to find the overall flow of the crowd. Image from [alieccv2008].

Therefore, depending on the number of people present in the scene, we can perform two types of video analysis tasks:

  • Microscopic tracking focuses on the detection and tracking of individuals. Behavior analysis is centered around each individual and possibly their interactions. It uses individual motion and appearance features and is not too concerned with the overall motion in the scene.

  • Macroscopic tracking, on the other hand, focuses on capturing the "flow" of the crowd, the global behavior and motion tendencies. It is not focused on observing individual behavior but rather network behavior. Individual tracking can be performed if a target is manually initialized, since detection is not possible in this type of video.

Throughout this thesis, we work on sparse and semi-crowded scenarios as shown in Figures 2(a) and 2(b). We use the tracking-by-detection paradigm, which we detail in the following section.

5 Tracking-by-detection paradigm

As mentioned in the previous section, there are several approaches for pedestrian video analysis. We focus on microscopic tracking, which means we are interested in detecting and tracking individuals. For this task, the tracking-by-detection paradigm has become increasingly popular in recent years, driven by the progress in object detection. Such methods involve two independent steps: (i) object detection on all individual frames and (ii) tracking or association of those detections across frames.

Figure 3: Tracking-by-detection paradigm. Firstly, an independent detector is applied to all image frames to obtain likely pedestrian detections. Secondly, a tracker is run on the set of detections to perform data association, i.e., link the detections to obtain full trajectories.

We can see a diagram of the tracking-by-detection paradigm in Figure 3. There are two main components: the detector and the tracker. State-of-the-art detectors are discussed in Section 6, while the tracker is the main focus of the thesis. As we will see later on in this chapter, detectors are not perfect and often return false alarms or miss pedestrians. This makes tracking or data association a challenging task. Some of the most important challenges include:

  • Missed detections: long-term occlusions are usually present in semi-crowded scenarios, where a detector might lose a pedestrian for 1-2 seconds. In this case, it is very hard for the tracker to re-identify the pedestrian without distinctive appearance information, and therefore, the track is usually lost. That is why in recent literature, researchers are opting for global optimization methods [zhangcvpr2008, berclaztpami2011, lealiccv2011], which are very good at dealing with long-term occlusions.

  • False alarms: the detector can be triggered by regions in the image that actually do not contain any pedestrian, creating false positives. A tracker might follow the false alarms and create what is called a ghost trajectory.

  • Similar appearance: one source of information commonly used for pedestrian identification is appearance. However, in some videos similar clothing can lead to virtually identical appearance models for two different pedestrians. Many methods in recent literature focus on the motion of the pedestrian rather than his/her appearance [pellegriniiccv2009, lealiccv2011].

  • Groups and other special behaviors: when dealing with semi-crowded scenarios, it is very common to observe social behaviors such as walking in a group, waiting at a bus stop or stopping to talk to someone. These behaviors do not fit classic tracking models like the Kalman filter [kalman], which assume pedestrian motion to be roughly constant.

These are a few of the challenges that tracking has to address. In this thesis, we make the observation that there is a lot of context that is not being used for tracking, especially social context or pedestrian interaction and spatial context coming from multiple views of the same scene. The proper use of these two sources of context in a global optimization scheme will be the center of the thesis.
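To make the data-association step concrete, the toy sketch below links each track to its nearest detection within a distance gate. This is our own illustration of the simple frame-to-frame association baseline, not the global-optimization methods developed in this thesis; the function name `associate` and the `gate` parameter are hypothetical choices:

```python
import numpy as np

def associate(tracks, detections, gate=30.0):
    """Greedy nearest-neighbor data association.

    tracks: list of predicted 2D positions, one per existing track.
    detections: list of 2D detection positions in the current frame.
    gate: maximum allowed distance for a match (gating threshold).
    """
    matches, unmatched = [], set(range(len(detections)))
    for t, pos in enumerate(tracks):
        if not unmatched:
            break
        dists = {j: float(np.hypot(*(np.asarray(detections[j]) - np.asarray(pos))))
                 for j in unmatched}
        j = min(dists, key=dists.get)
        if dists[j] < gate:            # reject implausible jumps
            matches.append((t, j))
            unmatched.discard(j)
    return matches, sorted(unmatched)  # unmatched detections may start new tracks
```

Such a greedy matcher fails under exactly the conditions listed above (long occlusions, false alarms, similar appearance), which is why the thesis focuses on global optimization over whole trajectories.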

6 Detectors

There are many pedestrian detectors in the literature. Even though this thesis is focused on the data association part of multiple object tracking, we want to give a brief overview of three of the most widely used methods for pedestrian detection. Such methods can be classified in many ways; we detail one possible classification scheme:

Model-based detectors. A model of the background is created, and then a pixel-wise or block-wise comparison of a new image against the background model is performed to detect regions that do not fit the model [detectionsurvey3]. This method is commonly used for video surveillance, since the camera is static and therefore the background model can be learned accurately. The drawback of these techniques is that they are very sensitive to changes in illumination and occlusions.

Template-based detectors. These detectors use a pre-learned set of templates, based for example on image features such as edges [dalalcvpr2005]. The detector is triggered if the image features inside the local search window meet certain criteria. The drawback of this approach is that its performance can be significantly affected by background clutter and occlusions; if a person is partly occluded, the overall detection score will be very low because part of the image will be completely different from the learned examples.

Part-based detectors. One downside of holistic detectors like the one presented in [dalalcvpr2005] is that they are easily affected by occlusions and local deformations. We would need a lot of training data to cover all the deformations that a body can undergo. In order to reduce the amount of data needed, recent works [felzenszwalbcvpr2008, felzenszwalbcvpr2010] have proposed to use part-based methods, in which a template for each body part is learned separately. This way, deformations can be learned locally for each part and later combined. Another advantage of this method is that it is more robust to occlusions, since if one part is occluded, all the others can still be detected and combined for an overall high detection score. Other detectors based on parts have also been presented and created specifically to address occlusions [shucvpr2012, wojekcvpr2011].

Similar to these are block-based detectors, either based on HOG features [galltpami2011] or SIFT features [leibeijcv2005]. The objective is to learn the appearance of blocks inside the bounding box of a detection. At test time, each block votes for the position of the center of the object to be detected.

There is a different family of detectors, namely online detectors, which formulate the problem of tracking as one of re-detection. The combination of both types of detectors can be very beneficial, as shown in [shucvpr2013], especially to account for appearance variations which might not be captured by the learned templates.

We refer the reader to the survey [detectionsurvey1] regarding AdaBoost- and HOG-based pedestrian detectors for monocular videos, to [detectionsurvey3] for background subtraction techniques based on Mixture-of-Gaussians background modeling, and to [felzenszwalbtpami2010] for a detailed description of part-based models for object detection.

6.1 Background modeling using Mixture-of-Gaussians

While the most basic background subtraction methods are based on a frame-by-frame image difference, we detail here the model-based method presented in [backgroundsubtraction] and used in the OpenCV implementation [opencv]. The basic idea is to model each pixel’s intensity using a Gaussian Mixture Model (GMM). A simple heuristic determines which intensities most probably belong to the background; pixels that do not match these are labeled foreground. Foreground pixels are then grouped using 2D connected component analysis.
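As a rough sketch of this idea, the snippet below maintains a single Gaussian per pixel rather than a full mixture (a simplification of the Mixture-of-Gaussians model, assuming grayscale frames; the class and parameter names are our own):

```python
import numpy as np

class GaussianBackground:
    """Per-pixel background model with one Gaussian per pixel
    (a simplified sketch of the Mixture-of-Gaussians idea)."""

    def __init__(self, shape, lr=0.05, k=2.5):
        self.mean = np.zeros(shape)   # per-pixel intensity mean
        self.var = np.ones(shape)     # per-pixel intensity variance
        self.lr, self.k = lr, k       # learning rate and sigma threshold

    def apply(self, frame):
        # Foreground: pixels deviating more than k sigmas from the model.
        fg = np.abs(frame - self.mean) > self.k * np.sqrt(self.var)
        bg = ~fg
        # Update mean and variance only where the background matched.
        d = frame - self.mean
        self.mean[bg] += self.lr * d[bg]
        self.var[bg] += self.lr * (d[bg] ** 2 - self.var[bg])
        return fg
```

A real implementation keeps several Gaussians per pixel to handle multi-modal backgrounds (e.g., waving trees), which is the core contribution of the MoG model; the connected-component grouping mentioned above is a separate post-processing step.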

Figure 4: (a) Original input image. (b) Background subtraction using the pre-learned model. White pixels are classified as foreground, black pixels as background. (c) Final detected bounding boxes.

An example of this process is shown in Figure 4. As we can see, the background model is not perfect, which often leads to spurious foreground pixels around the scene, as in Figure 4(b). Of course, this method detects all kinds of moving objects, and is therefore prone to false detections. In the experiments for this thesis, we use the homography provided by the camera calibration to determine the approximate size of a pedestrian at each pixel position. This allows us to determine a rough bounding box size and to discard groups of foreground pixels that are too small to be a pedestrian.

6.2 Histogram of Oriented Gradients (HOG)

The essential thought behind the Histogram of Oriented Gradients (HOG) descriptor is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. An overview of the pedestrian detection process as described in [dalalcvpr2005] is shown in Figure 5.

Figure 5: Overview of the feature extraction and object detection chain.

The first step is to compute the image gradients; the image is then divided into small spatial windows or cells. For each cell we accumulate a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The histograms can also be contrast-normalized for better invariance to changes in illumination or shadowing; normalization is done over larger spatial regions called blocks. The detection window is covered with an overlapping grid of HOG descriptors, and the resulting feature vector is used in a conventional SVM classifier [svm, PRML] that learns the appearance of a pedestrian vs. non-pedestrian.

Figure 6: The HOG detector is based mainly on silhouette contours. As we can see, the most active blocks are centered on the image background just outside the contour. (a) Average gradient image over the training samples. (b) Each “pixel" shows the maximum positive SVM weight in the block centered in the pixel. (c) Likewise for the negative SVM weights. (d) Test image. (e) HOG descriptor. (f) HOG descriptor weighted by positive SVM weights. (g) Likewise for negative weights. Images from [dalalcvpr2005].

The HOG descriptor is particularly suited for human detection in images. This is because coarse spatial sampling, fine orientation sampling, and strong local photometric normalization allow the individual body movement of pedestrians to be ignored so long as they maintain a roughly upright position.
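The per-cell histogram computation at the core of HOG can be sketched in a few lines. This is our own simplification (unsigned gradients, hard binning, and no block normalization, unlike the full pipeline of [dalalcvpr2005]); the function name `hog_cells` is ours:

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientations, the core
    of the HOG descriptor (simplified: no block normalization)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch * cell):
        for j in range(cw * cell):
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return hist
```

On a 64×128 detection window with 8×8 cells this yields 8×16 histograms; flattened (and, in the full method, block-normalized), they form the feature vector fed to the SVM.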

6.3 Part-based model

Recent works have shown that modeling objects as a deformable configuration of parts [felzenszwalbijcv2005, felzenszwalbcvpr2010] leads to increased detection performance compared to rigid templates [dalalcvpr2005]. In the case of human detection this is especially useful, as the body can assume a large number of different poses. This model can also be used to estimate the 2D pose of humans [yangcvpr2011].

Figure 7: Deformable part-based model detector. (a) Example of a detection of a person. Green box represents the root filter detection while the yellow boxes represent the part detections. (b) Coarse template or root filter. (c) Templates of the parts. (d) Spatial model for the location of each part. Images from [felzenszwalbcvpr2008].

The basic idea is to have a model based on several HOG feature filters. The model for each object consists of one global root filter (see Figure 7(b)), which is equivalent to the rigid template presented before, and several part models. The features of the part filters are computed at twice the spatial resolution of the root filter in order to capture smaller details. Each part model specifies a spatial model (see Figure 7(d)) and a part filter (see Figure 7(c)). The spatial model defines a set of allowed placements for a part relative to the detection window, and a deformation cost for each placement.

Detection is done using a sliding window approach. The score is computed by adding the score of the root filter and the sum over all parts, taking into account the placement of each part, the filter score and the deformation cost. Usually both part-based and rigid template-based approaches are prone to double detections, therefore a non-maxima suppression step is necessary to avoid too many false detections around one pedestrian. We will show examples of this phenomenon in Section 7.

Training is done using a set of images with an annotated bounding box around each instance of an object (a pedestrian in our case). Learning proceeds similarly to [dalalcvpr2005], except that, apart from the model parameters, the part placements also need to be learned. These are treated as latent variables, and therefore a latent SVM is used to learn the model.

7 Detection results

In this section, we discuss some detection results, show common failure cases and present some further methods proposed in recent literature. In Figure 8, we plot some results of the three methods referenced in the previous section on the publicly available dataset PETS2009 [pets2009].

In Figure 8(a), we show a common failure case of background subtraction methods. As we can see, the three pedestrians in the center of the image are very close to each other, which means the background subtraction method obtains a single blob in that region. In some cases it is possible to determine the presence of more than one pedestrian based on blob size and knowledge of the approximate size of a pedestrian in pixels. In this case, though, the partial occlusion of one of the pedestrians makes it hard to determine exactly how many pedestrians belong to the foreground blob. The resulting detection is therefore placed in the middle of the group of pedestrians, which means we not only have missed detections but also an incorrectly positioned detection, which will in fact be counted as a false alarm. A similar situation occurs for the two pedestrians on the right side of the image: the final detection is positioned midway between them.

(a) Background subtraction
(b) HOG detector
(c) Part-based detector
Figure 8: Example of detection results on one frame of the PETS2009 sequence. (a) Using background subtraction (Section 6.1). (b) Using HOG features and SVM learning (Section 6.2). (c) Using part-based model, HOG features and Latent SVM learning (Section 6.3).

In Figure 8(b), we show results of a HOG detector with SVM learning. Here, the major problems are double detections and choosing the score threshold that determines what is a detection and what is not. As we can see in Figure 8(b), if the threshold is too low we get many false detections. The advantage is that we can detect half-occluded pedestrians, like the orange pedestrian behind the pole.

A part-based detector returns the result shown in Figure 8(c). As we can see, it is successful in finding partially occluded people or people who are close together. It only fails to detect the pedestrian occluded by the pole; this is mainly because one of the most distinctive parts for detection is the head-and-shoulders region, which forms an omega shape.

(a) HOG detector
(b) Part-based detector
(c) HOG detector
(d) Part-based detector
Figure 9: Example of detection results on the Town Center sequence. (a,c) Using HOG features and SVM learning (Section 6.2). (b,d) Using part-based model, HOG features and Latent SVM learning (Section 6.3).

We show results on the Town Center dataset [benfoldcvpr2011] in Figure 9. This is a high resolution dataset of a busy town center, where partial occlusions and false alarms are very common. As we can see, double detections are especially problematic for both the simple HOG detector and the part-based detector. It is common for two pedestrians to trigger a single detection with a bigger bounding box, which makes non-maxima suppression a key step in this case. Nonetheless, these methods still present two key advantages for this dataset: (i) most false detections can be easily removed using camera calibration and an approximate size of a pedestrian; (ii) there are few missed pedestrians. As we will see in Chapter , the Linear Programming algorithms for tracking are capable of handling false alarms better than missing data.

Figure 10: Example of pedestrians walking with objects. This often leads to double detections or missed detections, but pedestrian-object interactions can be a useful source of information to improve tracking.

It is common to see pedestrians walking with objects, either pushing a bicycle, carrying a bag or pushing a trolley or a stroller, as we can see in Figure 10. The close proximity to those objects often leads to double detections or can even lead to the complete misdetection of the pedestrian. Recent works have proposed to include these objects in the tracking system. In [mitzeleccv2012], a tracker for unknown shapes was proposed in order to deal not only with pedestrians but also with carried objects: 3D information was used to create a model of unknown shapes, which was then tracked through time. Furthermore, in [baumgartnercvpr2013], pedestrian-object interactions were included to support tracking hypotheses. This confirms the argument presented in this thesis: context from a pedestrian’s environment (in this case, pedestrian-object interaction) can be extremely useful to improve tracking. Finally, tracking systems for complete scene understanding are becoming more and more important in the literature [wojektpami2013].

8 What is Linear Programming?

A linear program consists of a linear objective function

$$c_1 x_1 + c_2 x_2 + \dots + c_n x_n = c^\top x$$

subject to linear constraints

$$a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n \leq b_i, \qquad i = 1, \dots, m.$$

Solving the program means finding the $x = (x_1, \dots, x_n)$ that maximizes (or minimizes) the objective function while satisfying the linear constraints. The linear program can be expressed compactly as

$$\max_{x} \; c^\top x \quad \text{s.t.} \quad Ax \leq b,$$

where $A \in \mathbb{R}^{m \times n}$ is the matrix of coefficients and $b \in \mathbb{R}^m$ the vector that defines the constraints of the LP. The problem constraints can be written as equalities or inequalities ($\leq$ or $\geq$), as these can always be converted to a standard form without changing the semantics of the problem.

A point $x$ is called feasible if it satisfies all linear constraints, see Figure 11. If there are feasible solutions to a linear program, then it is called a feasible program. A problem is infeasible if its constraints are contradictory, e.g., $x_1 \geq 1$ and $x_1 \leq 0$.

A feasible $x^*$ is an optimal solution to a linear program if $c^\top x^* \geq c^\top x$ for all feasible $x$, see Figure 11.

A linear program is bounded if there exists a constant $M$ such that $c^\top x \leq M$ for all feasible $x$. An example of an unbounded problem can be seen in Figure 11.

Figure 11: Representations of Linear Programs with constraints represented by colored lines and half-spaces represented by colored regions. (a) Representation of a Linear Program with three constraints represented in blue, green and red, and its space of possible solutions (in yellow). (b) Optimal solution of a Linear Program (yellow dot). (c) An unbounded Linear Program: when trying to maximize the objective, the solution space is infinite, as indicated by the arrows pointing towards the unbounded direction.

8.1 Linear Programming forms

A Linear Program can be expressed in different forms, namely Standard Form 1, Inequality Form, Standard Form 2 and General Form. In order to solve a problem with the Simplex method, for example, we need the problem in Standard Form 1; it is therefore useful to know how to easily convert one form into another. All forms share the same objective function, a minimization, but differ in how the constraints are expressed. Remember that we have $n$ variables, $x_1, \dots, x_n$, and $m$ constraints.

Standard Form 1. The constraints are expressed as equalities and it is implied that the variables are nonnegative:

$$\min_x \; c^\top x \quad \text{s.t.} \quad Ax = b, \; x \geq 0.$$
Inequality Form. The constraints are expressed as inequalities and we need to explicitly define the non-negativity constraints (if any):

$$\min_x \; c^\top x \quad \text{s.t.} \quad Ax \leq b.$$
Standard Form 2. The constraints are expressed as inequalities and it is implied that the variables are nonnegative:

$$\min_x \; c^\top x \quad \text{s.t.} \quad Ax \leq b, \; x \geq 0.$$
General Form. The constraints are expressed both as equalities and inequalities. We need to explicitly define the non-negativity constraints (if any):

$$\min_x \; c^\top x \quad \text{s.t.} \quad A_1 x = b_1, \; A_2 x \leq b_2.$$
Once we have all the forms defined, we are interested in knowing how to go from one form to another. We are especially interested in converting a problem to Standard Form 1, which is the one we need to use the Simplex algorithm. Any LP can be converted into Standard Form 1 by performing a series of operations. Let us consider the following example of an LP problem:


In order to express this problem in Standard Form 1, we can follow a set of simple transformations:

  • To convert a maximization problem into a minimization one, we simply negate the objective function.

  • To convert inequalities into equalities, we introduce a set of slack variables, which represent the difference between the two sides of the inequality and are assumed to be nonnegative. The cost of these variables in the objective function is zero.

  • If the lower bound of a variable is not zero, we introduce another variable and perform a substitution.

  • We can replace an unrestricted variable by the difference of two nonnegative variables.
After all the transformations, we obtain the following LP in standard form:


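Since the original worked example is not reproduced above, here is a small LP of our own illustrating the same four transformations:

```latex
\max_{x} \; 2x_1 + x_2
\quad \text{s.t.} \quad
x_1 + x_2 \leq 5, \qquad x_1 \geq 1, \qquad x_2 \ \text{free}.
```

Negating the objective, adding a slack variable $s \geq 0$ to the first constraint, substituting $x_1' = x_1 - 1 \geq 0$, and splitting $x_2 = x_2^{+} - x_2^{-}$ with $x_2^{+}, x_2^{-} \geq 0$ yields the Standard Form 1 program

```latex
\min \; -2x_1' - x_2^{+} + x_2^{-}
\quad \text{s.t.} \quad
x_1' + x_2^{+} - x_2^{-} + s = 4, \qquad x_1', x_2^{+}, x_2^{-}, s \geq 0,
```

where the additive constant $-2$ produced by the substitution has been dropped from the objective, since it does not change the minimizer.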
9 Geometry of a Linear Program

The geometry of a Linear Program (LP) is important since most solvers exploit it in order to obtain the optimal solution of an LP efficiently. A region defined by the LP, like the yellow striped region in Figure 11, has a set of corners, also called vertices. If an LP is feasible and bounded, then an optimal solution lies on a vertex. More formally, a set $P$ of vectors in $\mathbb{R}^n$ is a polyhedron if $P = \{x \in \mathbb{R}^n : Ax \leq b\}$ for some matrix $A$ and some vector $b$. $P$ defines the set of feasible solutions, as shown in Figure 11.

An inequality $a^\top x \leq \beta$ is valid for a polyhedron $P$ if each $x \in P$ satisfies it. The inequality is active at $y \in P$ if $a^\top y = \beta$. For example, in Figure 11 we can see that the green and red constraints are active at the optimal solution, depicted as a yellow dot.

Figure 12: (a) Half-space defined by an inequality constraint. (b) Hyperplane defined by an equality constraint.

We can see the representation of an inequality constraint $a^\top x \leq \beta$ as a half-space in Figure 12(a), and of an equality constraint $a^\top x = \beta$ as a hyperplane in Figure 12(b).

Let us now consider the notion of a vertex. Looking at Figure 11, we can see that the optimal solution in yellow is a point of $P$ where the green and red constraints are active. In this 2D space, we need two constraints to define a vertex. More formally, a point $x$ is a vertex of $P$ if there exist $n$ or more inequalities that are valid for $P$, active at $x$, and not all active at any other point in $P$.

Another interpretation of the definition of vertices is that a point $x$ is a basic solution if $\operatorname{rank}(A_x) = n$, where $A_x x \leq b_x$ is the sub-system of inequalities active at $x$. If $x$ is in addition feasible, then it is a basic feasible solution. In this case, $x$ is a vertex of $P$ iff it is a basic feasible solution.

If a linear program is feasible and bounded and if $\operatorname{rank}(A) = n$, the LP has an optimal solution that is a vertex.

Recall from linear algebra that a system of equations with $m$ constraints and $n$ variables is directly solvable if $m = n$ and the matrix is full-rank, which means it is invertible. If $m < n$, we have an underdetermined system, which can lead to more than one optimal solution; for example, several solutions may lie on an edge instead of a single solution on a vertex. Finally, if $m > n$, we have an overdetermined system, in which case it is possible that no solution exists. Such problems are usually solved using least-squares (see [PRML]).

We can draw an important consequence from Theorem 12: an LP can be solved by enumerating all vertices and picking the best one. However, as the dimensionality of the search space and the number of constraints increase, enumerating all vertices quickly becomes unmanageable. In the following section, we present the Simplex algorithm, developed by George B. Dantzig in 1947, which drastically reduces the number of candidate solutions that must be checked.
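The brute-force strategy suggested above can be sketched in a few lines for a toy 2D problem of our own: every pair of constraints is intersected, infeasible intersections are discarded, and the best feasible vertex wins.

```python
import itertools
import numpy as np

# Toy LP of our own: maximize x1 + 2*x2 subject to A x <= b (n = 2).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0],    # x1 + x2 <= 4
              [-1.0, 0.0],   # -x1 <= 0, i.e. x1 >= 0
              [0.0, -1.0]])  # -x2 <= 0, i.e. x2 >= 0
b = np.array([4.0, 0.0, 0.0])

best_x, best_val = None, -np.inf
# Each vertex is the intersection of n = 2 active constraints.
for rows in itertools.combinations(range(len(A)), 2):
    sub = A[list(rows)]
    if abs(np.linalg.det(sub)) < 1e-9:
        continue                       # parallel constraints: no vertex
    x = np.linalg.solve(sub, b[list(rows)])
    if np.all(A @ x <= b + 1e-9):      # keep only feasible vertices
        val = float(c @ x)
        if val > best_val:
            best_x, best_val = x, val
# best_x is now (0, 4) with best_val = 8.
```

With $n$ variables and $m$ constraints this enumerates up to $\binom{m}{n}$ candidate vertices, which is exactly the combinatorial explosion the Simplex method avoids.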

10 The Simplex method

If we know that the optimal solution lies on a vertex, we could simply evaluate the objective function on each of the vertices and just pick the optimum one. Nonetheless, the number of vertices of an LP is typically too large, therefore we need to find a clever way to move towards the optimum vertex.

The Simplex method is an iterative method to efficiently solve an LP. The basic intuition behind the algorithm is depicted in Figure 13. Starting from a vertex in the feasible region, the idea is to move along the edges of the polyhedron until the optimal solution is reached. Each move from one vertex to another must increase the objective function (in the case of a maximization problem), so that convergence is guaranteed. In other words, the Simplex algorithm maintains a basic feasible solution at every step. Given a basic feasible solution, the method first applies an optimality criterion to test its optimality. If the current solution does not fulfill this condition, the algorithm obtains another solution with a higher value of the objective function (closer to the optimum in the case of a maximization problem). Let us now define some more useful concepts.

Figure 13: Idea of the Simplex algorithm. Starting from a vertex of the polyhedron, we move along the edges until we reach the optimum solution.

Two distinct vertices $x$ and $y$ of $P$ are adjacent if there exist $n-1$ linearly independent inequalities of $Ax \leq b$ that are active at both $x$ and $y$.

$x$ and $y$ are adjacent iff there exists an objective $c'$ such that the set of optimal solutions of $\max\, c'^\top x$ over $P$ is the line segment spanned by $x$ and $y$.

Let us assume we start with a solution vertex $x$. While $x$ is not optimal, the algorithm finds another vertex $y$ adjacent to $x$ with $c^\top y > c^\top x$ and updates $x \leftarrow y$. If no such vertex can be found, we can assert that the LP is unbounded. This is summarized in Algorithm 1.

  Start with vertex $x$
  while $x$ is not optimal do
     if there is a vertex $y$ adjacent to $x$ with $c^\top y > c^\top x$ then
        Update $x \leftarrow y$
     else
        Assert that the LP is unbounded.
     end if
  end while
Algorithm 1 Basic idea of the Simplex algorithm

As we can see, there are two key aspects to be defined: firstly, how to assert that a vertex is optimal, and secondly, how to find an adjacent vertex with a better cost. Both will be detailed in the next subsections.

10.1 Optimality criteria

Again, let us start by defining some concepts, namely bases and degeneracy.

A subset $B$ of the row indices of $A$, with $|B| = n$ and $A_B$ invertible, is called a basis of the LP. If in addition the point $x^*(B) = A_B^{-1} b_B$ is feasible, then $B$ is called a feasible basis.

If a vertex $x$ is represented by a basis $B$, then $x = A_B^{-1} b_B$. But a vertex can be represented by many bases. Let us consider the LP problem depicted in Figure 14, where $n = 2$. There are 4 constraints in this LP, identified by their coefficients and depicted by green lines. Since we are in a 2D space, each pair of linearly independent constraints forms a basis. A possible set of feasible solutions created by two of the constraints is painted in light green. In total, we have 6 bases that represent the vertex.

Figure 14: LP with 4 constraints identified by their coefficients and depicted by green lines. There are 6 bases that represent the central vertex. The half-space defined by two of the constraints is the area depicted in light green.

An LP is degenerate if there exists an $x \in P$ such that more than $n$ constraints of $Ax \leq b$ are active at $x$. The LP depicted in Figure 14 is degenerate, since $n = 2$ and there are 4 active constraints at the vertex.
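The relation between bases and vertices can be checked numerically on a small example of our own: a basis $B$ picks $n$ rows of $A$, the candidate vertex is the solution of $A_B x = b_B$, and the basis is feasible exactly when this point satisfies all constraints (the function name is ours):

```python
import numpy as np

# Constraints A x <= b describing the unit square (our own toy example).
A = np.array([[1.0, 0.0],    # x1 <= 1
              [0.0, 1.0],    # x2 <= 1
              [-1.0, 0.0],   # x1 >= 0
              [0.0, -1.0]])  # x2 >= 0
b = np.array([1.0, 1.0, 0.0, 0.0])

def vertex_of_basis(A, b, B):
    """Return x*(B) = A_B^{-1} b_B and whether the basis B is feasible,
    i.e. whether the resulting point satisfies all constraints."""
    x = np.linalg.solve(A[list(B)], b[list(B)])
    return x, bool(np.all(A @ x <= b + 1e-9))

corner, ok = vertex_of_basis(A, b, [0, 1])  # x1 <= 1 and x2 <= 1 active
# corner = (1, 1), ok = True: the top-right corner of the square.
```

Note that not every index pair is a basis here: rows 0 and 2 are parallel, so $A_B$ is singular and no vertex is defined.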

A basis $B$ is optimal if it is feasible and the unique $\lambda$ with $\lambda^\top A = c^\top$ and $\lambda_i = 0$ for all $i \notin B$ satisfies $\lambda \geq 0$.

If all components of $\lambda$ outside of $B$ are zero, then we can write $\lambda^\top A = \lambda_B^\top A_B$, since all rows of $A$ with indices outside of $B$ do not contribute to the product. Since $A_B$ is invertible, we can then write $\lambda_B^\top = c^\top A_B^{-1}$. From this, two theorems emerge.

If $B$ is an optimal basis, then $x^*(B)$ is an optimal solution of the LP.

Suppose the LP is non-degenerate and $B$ is a feasible but not optimal basis; then $x^*(B)$ is not an optimal solution.

Basically, for every vertex $x$, we can quickly check whether it is an optimal solution by checking whether a basis that represents it is optimal. The proof of the theorem will help us see how to move closer to the optimal solution.


Let us assume that $B$ is a feasible but not optimal basis. We can split the constraints of the LP into active and inactive ones with respect to $x^*(B)$.


For the unique $\lambda$ with $\lambda_i = 0$ for $i \notin B$, we have that $\lambda^\top A = c^\top$. Since $B$ is feasible but not optimal, we know that $\lambda_j < 0$ for some $j \in B$.

We now compute a direction $d$ such that $a_i^\top d = 0$ for all $i \in B \setminus \{j\}$ and $a_j^\top d = -1$. That means $d$ is orthogonal to all rows of $A_B$ except the one that represents constraint $j$.

Now we want to move from $x^*(B)$ in the direction given by $d$, as depicted in Figure 15. Let us first take a look at what happens to the objective function if we move along $d$.

Figure 15: (a) Moving in the direction $d$ as shown in the proof of Theorem 14. (b) Reaching the constraint $\ell$ by moving in the direction $d$.

Given $\lambda_j < 0$ and the conditions under which we defined $d$, we can see that $c^\top d = \lambda^\top A d = \lambda_j (a_j^\top d) = -\lambda_j > 0$. This means that if we move in the direction $d$, we will improve our objective function value.

Let us now consider a given step size $t \geq 0$, which represents how much we move along direction $d$. We are interested in knowing whether the new point $x^*(B) + t\,d$ is feasible, i.e., whether it still satisfies all constraints. The inequalities that are active at $x^*(B)$ are certainly satisfied, since $a_i^\top d \leq 0$ for all $i \in B$ by the definition of $d$. This means that there exists a small $t > 0$ such that $x^*(B) + t\,d$ is feasible. But the value of the objective function at this new point is

$$c^\top (x^*(B) + t\,d) = c^\top x^*(B) - t\,\lambda_j > c^\top x^*(B),$$

which is greater than the objective value of $x^*(B)$, proving that $x^*(B)$ is not an optimal solution. ∎

10.2 Moving to a better neighbor

Now we have a direction $d$ along which we can move from $x^*(B)$ towards a better neighboring vertex. The question now is how large the step $t$ can be. We need to find out how far we can go before we hit a constraint for the first time, because past a constraint the feasible region ends. This is depicted in Figure 15, where the constraint that is hit is represented in orange.

Remember we had $m$ constraints in $Ax \leq b$. We denote by $\Lambda$ the set of indices of the constraints that might be hit when moving along $d$, formally defined as

$$\Lambda = \{\, i : a_i^\top d > 0 \,\}.$$

The product $a_i^\top d$ needs to be larger than zero; otherwise, moving along $d$ would never hit constraint $i$. The set $\Lambda$ contains only constraints not in the basis $B$, since $a_i^\top d \leq 0$ for all $i \in B$. There are now two cases:

  1. $\Lambda = \emptyset$, which means we can move indefinitely in direction $d$, and therefore the LP is unbounded.

  2. $\Lambda \neq \emptyset$, which means there is a constraint with index $i \in \Lambda$ that we will hit while moving in the direction $d$, as depicted in Figure 15. Let us now compute the value of the step at which we hit constraint $i$:

$$t_i = \frac{b_i - a_i^\top x^*(B)}{a_i^\top d}.$$

    We know this division is well defined because the denominator is greater than zero. The optimal step is the smallest of all the $t_i$:

$$t^* = \min_{i \in \Lambda} t_i = t_\ell,$$

    where $\ell$ is the index for which the minimum is attained. The optimal step must be the minimum, because any greater step violates at least the constraint $\ell$, and therefore leaves the feasible region. To know that there is, in fact, a new vertex $y = x^*(B) + t^* d$, adjacent to $x^*(B)$ and with higher objective value, we have to prove that

$$B' = (B \setminus \{j\}) \cup \{\ell\}$$

    is a basis. Note that we are incorporating the new constraint $\ell$ and removing the index $j$ that kept our basis from being optimal (recall that $\lambda_j < 0$). Remember that $a_\ell^\top d > 0$, but $a_i^\top d = 0$ for all $i \in B \setminus \{j\}$. This means that $a_\ell$ is not a linear combination of the rows indexed by $B \setminus \{j\}$, proving that $B'$ is a basis. Furthermore, the inequalities indexed by $B'$ are active at $y$, which means $y$ is a vertex and in fact adjacent to $x^*(B)$.

We have seen so far that the concepts of basic feasible solution and feasible basis are interchangeable, therefore we can rewrite the Simplex algorithm in basis notation, as shown in Algorithm 2.

  Start with a feasible basis $B$
  while $B$ is not optimal do
     Let $j$ be an index with $\lambda_j < 0$ (remember $\lambda^\top A = c^\top$ and $\lambda_i = 0$ for $i \notin B$)
     Compute $d$ with $a_i^\top d = 0$ for $i \in B \setminus \{j\}$ and $a_j^\top d = -1$, and $\Lambda = \{i : a_i^\top d > 0\}$
     if $\Lambda = \emptyset$ then
        Assert that the LP is unbounded.
     else
        Let $\ell$ be the index where $\min_{i \in \Lambda} t_i$ is attained; set $B \leftarrow (B \setminus \{j\}) \cup \{\ell\}$
     end if
  end while
Algorithm 2 The Simplex algorithm

If the Linear Program is non-degenerate, then the Simplex algorithm terminates.

The idea of the Simplex algorithm is to jump from one basis to another (equivalently, from vertex to vertex), making sure no basis is revisited. We have proven before that when we move in direction $d$ from vertex $x$ to vertex $y$, we obtain $c^\top y > c^\top x$, which means we make progress at each iteration of the Simplex; since there are finitely many bases, the algorithm eventually terminates.

10.3 The degenerate case: Bland’s pivot rule

The Simplex algorithm as described in Algorithm 2 can be applied to degenerate Linear Programs, but we may encounter the problem of cycling: moving from one basis to another without progress and eventually returning to a basis we already visited, so that the algorithm never terminates. In order to avoid this, we need to carefully choose the indices that leave and enter the basis at each iteration, an operation called pivoting. In Algorithm 3, we highlight the changes to the Simplex algorithm according to Bland’s pivot rule [bland1977], which allows the Simplex method to solve degenerate LPs.

  Start with a feasible basis B
  while B is not optimal do
     Let α ∈ B be the smallest index with λ_α < 0 (where λ^T = c^T A_B^{-1}; λ ≥ 0 means B is optimal)
     Compute λ* = min { (b_j − A_j v) / (A_j d) : A_j d > 0 }, with v the vertex defined by B and d the direction given by A_B d = −e_α
     if λ* = ∞ (no index j with A_j d > 0) then
        Assert that LP is unbounded.
     else
        Let β be the smallest index where the minimum λ* is attained
        Update B ← (B \ {α}) ∪ {β}
     end if
  end while
Algorithm 3 The Simplex algorithm with Bland’s pivot rule

If Bland’s rule is applied, the Simplex algorithm terminates.
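The two index choices prescribed by Bland's rule can be sketched as small helpers (hypothetical functions for illustration; the tableau bookkeeping they would plug into is assumed to live elsewhere):

```python
def bland_entering(reduced_costs, tol=1e-9):
    """Bland's rule for the entering index: pick the SMALLEST index whose
    reduced cost allows improvement (negative, for a minimization tableau)."""
    for j, rc in enumerate(reduced_costs):
        if rc < -tol:
            return j
    return None  # no improving index: the current basis is optimal

def bland_leaving(ratios, tol=1e-9):
    """Bland's rule for the leaving index: among rows attaining the minimum
    ratio, pick the one whose basic variable has the smallest index.
    `ratios` maps basic-variable index -> ratio; None marks ineligible rows."""
    eligible = [(r, i) for i, r in ratios.items() if r is not None]
    if not eligible:
        return None  # no eligible row: the LP is unbounded
    rmin = min(r for r, _ in eligible)
    return min(i for r, i in eligible if r <= rmin + tol)

# indices 0 and 2 both improve; Bland deterministically picks index 0
assert bland_entering([-1.0, 0.5, -2.0]) == 0
# ratio tie between basic variables 3 and 1; Bland picks the smaller index, 1
assert bland_leaving({3: 1.0, 1: 1.0, 4: 2.0}) == 1
```

The deterministic tie-breaking is the whole point: a greedy rule such as "most negative reduced cost" can cycle on degenerate LPs, while the smallest-index rule provably cannot.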

For the interested reader, the proof of the theorem can be found in [Schrijver1998].

10.4 Finding an initial vertex

In all descriptions of Simplex in Algorithms 1, 2 and 3, it always starts by choosing a feasible initial vertex or basis. But how do we find this initial vertex? Finding a feasible solution of a Linear Program is almost as difficult as finding an optimal solution. Fortunately, by using a simple technique, we can find a feasible solution of a related auxiliary LP and use it to initialize the Simplex method on our LP. Let us consider our initial LP to be in the standard form (2):

    max c^T x   subject to   Ax = b,  x ≥ 0    (22)
We can split the conditions according to whether the right-hand side b_i has a positive or negative value:

    A_i x = b_i  with  b_i ≥ 0,        A_i x = b_i  with  b_i < 0,
and define a new artificial variable z_i ≥ 0 for each constraint. We now create an auxiliary LP where we minimize the sum of the new artificial variables:

    min Σ_i z_i   subject to   A_i x + z_i = b_i (for b_i ≥ 0),   A_i x − z_i = b_i (for b_i < 0),   x ≥ 0,  z ≥ 0    (24)
We can show that this auxiliary problem is always feasible, since we can always exhibit an initial feasible solution, namely x = 0 and z_i = |b_i|: each z_i is the absolute value of the corresponding component of b. This point fulfills all conditions of the auxiliary LP of Equation (24), therefore it is a feasible initial vertex. From here, we can apply the Simplex as described in Algorithm 3 to find the optimal solution. If we find an optimal solution (x*, z*) with z* = 0, which yields the optimal value of the objective function of Equation (24) to be zero, we can assert that the vertex x* is a feasible solution of the original LP problem in Equation (22). We then use this initial vertex to start the Simplex algorithm on the original LP. On the other hand, if we find that the minimum value of the auxiliary problem is larger than zero, we can assert that the original LP is infeasible.
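The trivially feasible starting point of the auxiliary problem is easy to verify in code; a small sketch, assuming the standard form Ax = b, x ≥ 0 and made-up data:

```python
# Phase I starting point: add one artificial z_i per row, signed so that
# x = 0, z_i = |b_i| satisfies every equation A_i x (+/-) z_i = b_i.
A = [[1, 2], [3, -1]]
b = [4, -2]

signs = [1 if bi >= 0 else -1 for bi in b]   # +z_i if b_i >= 0, else -z_i
x = [0.0] * len(A[0])                        # start at the origin
z = [abs(bi) for bi in b]                    # z_i = |b_i|

# check that A x + sign_i * z_i == b_i holds for every row i
for i, row in enumerate(A):
    lhs = sum(a * xj for a, xj in zip(row, x)) + signs[i] * z[i]
    assert abs(lhs - b[i]) < 1e-12

assert all(zi >= 0 for zi in z)   # artificial variables are nonnegative
```

Minimizing Σ z_i then drives the artificial variables to zero exactly when the original system has a feasible point.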

The final complete description of the Simplex algorithm is found in Algorithm 4. Finding the initial vertex is commonly called Phase I of the Simplex algorithm, while the optimization towards the final solution through pivoting is commonly referred to as Phase II. A hands-on example on how to solve a problem practically with Simplex will be presented in Section 12.

  Create the auxiliary problem of the LP and find its optimal solution x* with basis B and objective function value z*.
  if z* = 0 then
     B is a feasible basis of the initial LP. Start with the feasible basis B.
     while B is not optimal do
        Let α ∈ B be the smallest index with λ_α < 0 (where λ^T = c^T A_B^{-1}; λ ≥ 0 means B is optimal)
        Compute λ* = min { (b_j − A_j v) / (A_j d) : A_j d > 0 }, with v the vertex defined by B and d the direction given by A_B d = −e_α
        if λ* = ∞ (no index j with A_j d > 0) then
           Assert that LP is unbounded.
        else
           Let β be the smallest index where the minimum λ* is attained
           Update B ← (B \ {α}) ∪ {β}
        end if
     end while
  else
     Assert that the LP is infeasible.
  end if
Algorithm 4 The complete Simplex algorithm with Bland’s pivot rule
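The Phase II loop can be sketched as a compact tableau implementation with Bland's rule; a minimal, illustrative Python version (this assumes the common form max c^T x s.t. Ax ≤ b, x ≥ 0 with b ≥ 0, so the slack basis is trivially feasible and Phase I is not needed; it is a sketch, not the notes' exact formulation):

```python
def simplex(c, A, b):
    """Tableau Simplex with Bland's pivot rule for
    max c.x  s.t.  A x <= b, x >= 0, assuming b >= 0 (slack start).
    Returns (optimal value, solution); raises ValueError if unbounded."""
    m, n = len(A), len(c)
    # tableau rows: [A | I | b]; objective row: [-c | 0 | 0]
    T = [A[i][:] + [1.0 if j == i else 0.0 for j in range(m)] + [b[i]]
         for i in range(m)]
    obj = [-ci for ci in c] + [0.0] * m + [0.0]
    basis = list(range(n, n + m))           # slack variables start basic

    while True:
        # Bland: smallest column with a negative objective coefficient
        col = next((j for j in range(n + m) if obj[j] < -1e-9), None)
        if col is None:
            break                           # optimal basis reached
        # ratio test; Bland tie-break on the smallest basic-variable index
        best = None
        for i in range(m):
            if T[i][col] > 1e-9:
                r = T[i][-1] / T[i][col]
                if (best is None or r < best[0] - 1e-9
                        or (abs(r - best[0]) <= 1e-9 and basis[i] < best[1])):
                    best = (r, basis[i], i)
        if best is None:
            raise ValueError("LP is unbounded")
        row = best[2]
        # pivot on (row, col)
        p = T[row][col]
        T[row] = [v / p for v in T[row]]
        for i in range(m):
            if i != row and abs(T[i][col]) > 1e-12:
                f = T[i][col]
                T[i] = [v - f * w for v, w in zip(T[i], T[row])]
        f = obj[col]
        obj = [v - f * w for v, w in zip(obj, T[row])]
        basis[row] = col

    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return obj[-1], x

# max x1 + x2  s.t.  x1 <= 1, x2 <= 1  ->  optimum 2 at (1, 1)
val, x = simplex([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
assert abs(val - 2.0) < 1e-9 and x == [1.0, 1.0]
```

For general b, one would first run the Phase I auxiliary problem described above to obtain the starting basis.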

10.5 Complexity

The Simplex method is remarkably efficient, especially compared to earlier methods such as Fourier–Motzkin elimination. However, in 1972 it was proven that the Simplex method has exponential worst-case complexity [simplexworst]. Nonetheless, following the observation that the Simplex algorithm is efficient in practice, it has been shown to have polynomial-time average-case complexity under various probability distributions. In order for the Simplex to perform in polynomial time, we have to use certain pivoting rules that allow us to go from one vertex of the polyhedron to another in a small number of steps. We will better understand this concept when we introduce the graph representation of a polyhedron in a later section.


11 The dual Linear Program

In this section, we introduce a very important property of Linear Programs: duality. Given any general optimization problem, or primal problem, we can always convert it to a dual problem. For LPs the dual problem is also an LP. The motivation to use dualization, depicted in Figure 16, is that the dual problem gives us an upper bound on the objective function of the primal problem.

Figure 16: The motivation to find the dual of a problem is to find an upper bound on the objective function of the primal.

As we saw in Section 10, the Simplex algorithm starts from a suboptimal solution and iteratively moves to solutions of increasing objective value, until the optimum is reached. Duality offers the complementary view: we can find an upper bound and iteratively make it more stringent until it reaches the optimum. For LPs it is guaranteed that the smallest upper bound coincides with the optimal value of the primal problem.

Let us consider the following LP:

We can try to find an upper bound on the value of the objective function. One way to do this is to linearly combine the constraints of the problem, with nonnegative multipliers, so that the left-hand side of the combined inequality equals the objective function. Let us multiply the first constraint by , the fourth by , and sum them up:

Note how we obtained our objective function after the sum, and therefore we can say that the resulting right-hand side is an upper bound. We can also try another combination, summing the fourth constraint and the second multiplied by :

In this case, we obtain an upper bound of , which turns out to be the smallest upper bound and therefore corresponds to the optimum of the objective function.
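The bounding trick can be reproduced mechanically; a sketch with a made-up LP (the numbers below are illustrative and are not the example above):

```python
# Hypothetical LP: max 2*x1 + 3*x2
#   s.t.  x1 +   x2 <= 4    (C1)
#         x1 + 2*x2 <= 6    (C2)     with x1, x2 >= 0
A = [[1, 1], [1, 2]]
b = [4, 6]
c = [2, 3]

def combine(y):
    """Combine the constraints with nonnegative multipliers y; returns
    (coefficients of the combined left-hand side, right-hand-side bound)."""
    coeffs = [sum(y[i] * A[i][j] for i in range(len(A)))
              for j in range(len(c))]
    return coeffs, sum(yi * bi for yi, bi in zip(y, b))

# y = (1, 1): the combined LHS is 2*x1 + 3*x2, exactly the objective,
# so every feasible point satisfies 2*x1 + 3*x2 <= 10.
coeffs, bound = combine([1, 1])
assert coeffs == [2, 3] and bound == 10
```

Every multiplier vector whose combined left-hand side dominates the objective yields a valid upper bound; the dual LP searches for the multipliers that make this bound smallest.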

The general principle to find the dual problem is to multiply each of the constraints by a new nonnegative variable y_i, the dual variable, and to sum the constraints up:

Note that we already used this trick in Section 10.1. The dual variables have to be nonnegative in order not to change the direction of the inequalities. Now we want to make this sum equal to our objective function:

which, by the constraints of the primal problem, is upper bounded by

Recall that our objective is to find the smallest upper bound. Let us express this in matrix notation. To make the notation clearer, we separate the new variables into the ones associated with the constraints of the primal and the ones associated with the implicit positivity constraints x ≥ 0.

We can eliminate by substitution, , obtaining the final equations for the primal and dual problems:


So far, we have seen the relationship between a Linear Program and its dual. This is summarized in the following theorem:

Weak Duality. Consider a Linear Program max { c^T x : Ax ≤ b, x ≥ 0 } and its dual min { b^T y : A^T y ≥ c, y ≥ 0 }. If x and y are primal and dual feasible respectively, then c^T x ≤ b^T y.

This can be easily seen from the chain of inequalities c^T x ≤ (A^T y)^T x = y^T Ax ≤ y^T b: the first inequality comes from the constraints of the dual problem together with x ≥ 0, and the second from the constraints of the primal provided that y ≥ 0.

An even more important theorem is:

Strong Duality. Consider a Linear Program max { c^T x : Ax ≤ b, x ≥ 0 } and its dual min { b^T y : A^T y ≥ c, y ≥ 0 }. If the primal is feasible and bounded, then there exist a primal feasible x* and a dual feasible y* with c^T x* = b^T y*.

This means that with the dual we can find an upper bound that is tight at the optimal solution of the primal. This can be used to prove optimality of primal solutions and, as a consequence, optimality of dual solutions.
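A tiny numerical illustration (with a made-up two-variable LP whose optimum is obvious, not an example from the text):

```python
# Primal: max x1 + x2  s.t.  x1 <= 1, x2 <= 1   -> optimum 2 at (1, 1).
# Dual:   min y1 + y2  s.t.  y1 >= 1, y2 >= 1, y >= 0  -> optimum 2 at (1, 1).
x_opt = (1.0, 1.0)        # primal optimal vertex
y_opt = (1.0, 1.0)        # dual optimal multipliers

primal_value = x_opt[0] + x_opt[1]            # c^T x*
dual_value = 1.0 * y_opt[0] + 1.0 * y_opt[1]  # b^T y*

# weak duality: the dual value bounds the primal value from above ...
assert primal_value <= dual_value
# ... and strong duality: at the optimum the bound is tight
assert primal_value == dual_value == 2.0
```

Exhibiting a dual feasible y whose objective matches a primal feasible x is precisely how optimality of both is certified.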


The proof of Theorem 11 is divided into two cases.

  1. A has full column rank.

    If this is the case, then we can use the Simplex algorithm to obtain an optimal basis B. By the optimality of B, the multipliers λ^T = c^T A_B^{-1} satisfy λ ≥ 0. Extending λ with zeros for the constraints outside B gives a vector y that fulfills the dual constraints, and therefore y is dual feasible.

    Now consider that x* is the primal solution returned by the Simplex. We can compare the value of the objective function at x* with the value of the dual objective function at y and verify that they are, in fact, equal: c^T x* = λ^T A_B x* = λ^T b_B = b^T y.

  2. A does not have full column rank.

    First, we need to make sure our constraint matrix has full column rank, which is why we replace the vector of variables x with x⁺ − x⁻, where x⁺, x⁻ ≥ 0. Now the Linear Program looks like:

    max c^T x⁺ − c^T x⁻   subject to   Ax⁺ − Ax⁻ ≤ b,   x⁺ ≥ 0,   x⁻ ≥ 0

    Note that the new LP is equivalent to the old one, in the sense that any solution (x⁺, x⁻) yields a solution x = x⁺ − x⁻ of the initial LP with the same objective value. If we consider the new variable to be (x⁺, x⁻), the constraint matrix that also incorporates the positivity constraints is [ A  −A ; −I  0 ; 0  −I ], where I is an identity matrix, the objective function vector is (c, −c), and the right-hand-side term is (b, 0, 0). The new constraint matrix does have full column rank, since the column vectors are now all linearly independent thanks to the placement of the identity blocks. We can now use the Simplex algorithm to find a solution. Let us denote the primal solution returned by the Simplex, together with the dual certificate it produces to verify the optimality of the primal solution. Let us write the conditions that should be verified by the dual, taking into account that x = x⁺ − x⁻:

    We have just proven that the candidate dual solution is dual feasible. Now the Simplex algorithm can check the condition of optimality for the primal solution by verifying that:

    And this proves the theorem, because we have found one possible primal feasible solution and one dual feasible solution whose objective function values coincide.

11.1 Proving optimality and infeasibility

So far we have seen that there is a close relationship between the dual and primal problems and between the dual and primal optimum solutions. But what happens, for example, if the dual problem is infeasible? Let us consider the following example:


If we check the primal problem carefully, we can identify the corresponding A, b and c, and therefore the dual problem is defined as shown. Nonetheless, we can quickly see that the dual problem is infeasible, since its constraints are contradictory. An infeasible dual implies that we cannot determine an upper bound for the primal. If we take a closer look at the primal problem, we see that it is, in fact, unbounded: we can construct feasible solutions with arbitrarily large objective value, which means the objective function can be increased to infinity, making the problem unbounded.
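The unbounded-primal / infeasible-dual pattern is easy to exhibit numerically; a sketch with a hypothetical LP (not the example above):

```python
# Hypothetical primal: max x1 + x2  s.t.  x1 - x2 <= 1, x >= 0.
# Along the ray x = t * (1, 1) the constraint reads 0 <= 1 for every t,
# so the LP stays feasible while the objective 2t grows without bound.
# Its dual, min y1 s.t. y1 >= 1 and -y1 >= 1 with y1 >= 0, demands
# y1 >= 1 and y1 <= -1 simultaneously: infeasible, so no upper bound exists.
def objective(t):
    x1, x2 = t, t
    assert x1 - x2 <= 1 and x1 >= 0 and x2 >= 0   # primal feasibility
    return x1 + x2

values = [objective(t) for t in (1, 10, 100)]
assert values == [2, 20, 200]   # objective grows with t: unbounded
```

The feasible ray is exactly the certificate of unboundedness that rules out any dual feasible point.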

We summarize the relationship between primal and dual problems in Table 1.

Primal \ Dual   Optimal   Unbounded   Infeasible
Optimal            X
Unbounded                                 X
Infeasible                    X           X
Table 1: Possible combinations of properties of the primal and dual problems.

In the first case, by the strong duality Theorem 11, if a primal has an optimal solution, the dual will also have an optimal solution. The second case is when the primal is unbounded. In that case,