Efficient pedestrian detection by directly optimizing the partial area under the ROC curve

10/03/2013 ∙ by Sakrapee Paisitkriangkrai, et al.

Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as performance outside the range is irrelevant. This measure is known as the partial area under the ROC curve (pAUC). Effective cascade-based classification, for example, depends on training node classifiers that achieve a maximal detection rate at a moderate false positive rate, e.g., around 40%. In this work we propose a structured ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC. By optimizing for different ranges of false positive rates, the proposed method can be used to train either a single strong classifier or a node classifier forming part of a cascade classifier. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method.




1 Introduction

Object detection is one of the fundamental topics in computer vision. The task of object detection is to identify predefined objects in a given image using knowledge gained through analysis of a set of labelled positive and negative exemplars. Viola and Jones' face detection algorithm [23] forms the basis of many of the state-of-the-art real-time algorithms for object detection tasks.

The most commonly adopted means of comparing the detection performance of different algorithms is the Receiver Operating Characteristic (ROC) curve, which illustrates the varying performance of a binary classifier as its discrimination threshold is altered. In the face and human detection literature researchers are often interested in the low false positive region of the ROC curve, since this region characterizes the performance needed for most real-world vision applications. This is because object detection is a highly asymmetric classification problem: there are only ever a small number of target objects among the millions of background patches in a single test image, so even a small false positive rate per scanning window would result in thousands of false positives in a single image, which is impractical for most applications. For many tasks, and particularly human detection, researchers also report the partial area under the ROC curve (pAUC), typically computed over a fixed range of false positives per image [7]. As the name implies, the pAUC is calculated as the area under the ROC curve between two specified false positive rates (FPRs). It summarizes the practical performance of a detector and is often the primary performance measure of interest.
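As a concrete illustration, the empirical pAUC over an FPR range can be computed directly from detector scores. The following sketch (the function name and the discrete-rank approximation are our own, not from the paper) ranks the negatives by score and averages the detection rate over the negatives whose rank falls inside the range:

```python
import numpy as np

def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Empirical partial AUC over the FPR range [alpha, beta].

    Ranks negatives by score (descending) and, for each negative whose
    rank falls in the FPR range, counts the fraction of positives
    scored above it. Normalized so a perfect detector scores 1.0.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending
    n = len(neg)
    j_lo = int(np.floor(alpha * n))   # first negative inside the range
    j_hi = int(np.ceil(beta * n))     # one past the last negative
    area = 0.0
    for j in range(j_lo, j_hi):
        # detection rate at a threshold placed just above the j-th negative
        area += np.mean(pos > neg[j])
    return area / max(j_hi - j_lo, 1)
```

Setting `alpha=0.0, beta=1.0` recovers the ordinary (full) empirical AUC.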

Although the pAUC is the metric used to evaluate detection performance, most classifiers do not directly optimize this criterion and, as a result, often under-perform. In this paper, we present a principled approach for learning an ensemble classifier which directly optimizes the partial area under the ROC curve, where the range over which the area is calculated may be selected according to the desired application. Building upon the structured learning framework, we propose a novel form of ensemble classifier which directly optimizes the partial AUC score, which we call pAUCEns. As with other boosting algorithms, our approach learns a predictor by building an ensemble of weak classification rules in a greedy fashion, and relies on a sample re-weighting mechanism to pass information between iterations. Unlike traditional boosting, however, at each iteration the proposed approach places greater emphasis on samples which have an incorrect ordering (a positive sample is incorrectly ordered if it is ranked below a negative sample; ideally, all positive samples are ranked above all negative samples) in order to achieve the optimal partial AUC score. The result is an ensemble learning method which yields a scoring function consistent with the correct relative ordering of positive and negative samples and which optimizes the partial AUC score in a false positive rate range $[\alpha, \beta]$, where $0 \le \alpha < \beta \le 1$.

Main contributions

(1) We propose a new ensemble learning approach which explicitly optimizes the partial area under the ROC curve (pAUC) between any two given false positive rates. The method is of particular interest in the wide variety of applications where performance is most important over a particular range of the ROC curve. The approach shares similarities with conventional boosting methods, but differs significantly in that it optimizes a multivariate performance measure using structured learning. Our design is simple, and a conventional boosting-based visual detector can be transformed into a pAUCEns-based detector with very few modifications to existing code. Our approach is efficient since it exploits both efficient weak classifier training and an efficient cutting-plane solver for optimizing the partial AUC score in the structural SVM setting.

(2) We show that our approach is more intuitive and simpler to use than alternative algorithms, such as Asymmetric AdaBoost [22] and Cost-Sensitive AdaBoost [14], where one needs to cross-validate the asymmetric parameter over a fixed set of discrete points. Furthermore, it is unclear how one would set the asymmetric parameter in order to achieve a maximal pAUC score for a specified false positive range. To our knowledge, our approach is the first principled ensemble method that directly optimizes the partial AUC in an arbitrary false positive range $[\alpha, \beta]$.

(3) Experimental results on several data sets, especially challenging human detection data sets, demonstrate the effectiveness of the proposed approach. Our pedestrian detector performs better than or on par with the state-of-the-art, despite using only two standard low-level image features.

Related work

Various ensemble classifiers have been proposed in the literature. Of these, AdaBoost is one of the best known, having achieved tremendous success in computer vision and machine learning applications. In object detection, the cost of missing a true target is often higher than the cost of a false positive. Classifiers that are optimal under a symmetric cost, and thus treat false positives and false negatives equally, cannot exploit this information. Several cost-sensitive learning algorithms, in which the classifier weights the positive class more heavily than the negative class, have thus been proposed.

Viola and Jones introduced the asymmetry property in Asymmetric AdaBoost (AsymBoost) [22]. However, the authors reported that this asymmetry is immediately absorbed by the first weak classifier, and heuristics are then used to avoid this problem. Peng et al. proposed a fully-corrective asymmetric boosting method which does not have this problem [25], though one needs to carefully cross-validate the asymmetric parameter in order to achieve the desired result. Masnadi-Shirazi and Vasconcelos [14] proposed a cost-sensitive boosting algorithm based on the statistical interpretation of boosting; their approach optimizes the cost-sensitive loss by means of gradient descent. Shen et al. proposed LACBoost and FisherBoost to address this asymmetry issue in cascade classifiers [20]. Most works along this line address the pAUC evaluation criterion only indirectly, and one needs to carefully cross-validate the asymmetric parameter in order to maximize the detection rate in a particular false positive range.

Several algorithms that directly optimize the pAUC score have been proposed in bioinformatics [9, 11]. Komori and Eguchi optimize the pAUC using a boosting-based algorithm [11], which is heuristic in nature. Narasimhan and Agarwal developed a structural SVM based method which directly optimizes the pAUC score [16]. They demonstrated that their approach, which uses a support vector method, significantly outperforms several existing algorithms, including pAUCBoost [11] and asymmetric SVM [28]. Building on Narasimhan and Agarwal's work, we propose a principled fully-corrective ensemble method which directly optimizes the pAUC evaluation criterion. The approach is flexible and can be applied to an arbitrary false positive range $[\alpha, \beta]$. To our knowledge, our approach is the first principled ensemble learning method that directly optimizes the partial AUC in a false positive range not bounded by zero. It is important to emphasize the difference between our approach and that of [16]: [16] trains a linear structural SVM, while our approach learns an ensemble of classifiers. For pedestrian detection, HOG with an ensemble of classifiers substantially reduces the average miss-rate over HOG+SVM [2].


Bold lower-case letters, e.g., $\boldsymbol{w}$, denote column vectors and bold upper-case letters, e.g., $\boldsymbol{H}$, denote matrices. Let $S_+ = \{\boldsymbol{x}^+_1, \ldots, \boldsymbol{x}^+_m\}$ be the set of positive training data and $S_- = \{\boldsymbol{x}^-_1, \ldots, \boldsymbol{x}^-_n\}$ the set of negative training data. The set of all training samples can be written as $S = S_+ \cup S_-$, where $|S_+| = m$ and $|S_-| = n$. We denote by $\mathcal{H}$ the set of all possible outputs of weak learners. Assuming that we have $t$ possible weak learners, the outputs of the weak learners on the positive and negative data can be represented as $\boldsymbol{H}^+ \in \{-1, +1\}^{t \times m}$ and $\boldsymbol{H}^- \in \{-1, +1\}^{t \times n}$, respectively. Here $H^+_{ri}$ is the label predicted by the weak learner $\hbar_r(\cdot)$ on the positive training datum $\boldsymbol{x}^+_i$. Each column of the matrix $\boldsymbol{H}$ represents the output of all weak learners when applied to a single training instance, and each row represents the output predicted by a single weak learner on all the training data. The goal is to learn a set of binary weak learners and a scoring function, $f(\boldsymbol{x}) = \boldsymbol{w}^{\!\top} \hbar(\boldsymbol{x})$, that performs well in terms of the pAUC between two specified false positive rates $\alpha$ and $\beta$, where $0 \le \alpha < \beta \le 1$.

Structured learning approach for optimizing pAUC

Before we present our approach, we briefly review the structural SVM for pAUC optimization of [16], upon which our ensemble learning approach is built. Unless otherwise stated, we follow the notation used in [16]. The area under the empirical ROC curve (AUC) can be defined as,
\[
\mathrm{AUC} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \boldsymbol{1}\bigl( f(\boldsymbol{x}^+_i) > f(\boldsymbol{x}^-_j) \bigr),
\]
and the partial AUC in the false positive range $[\alpha, \beta]$ can be written as [5, 16],
\[
\mathrm{pAUC} = \frac{1}{mn(\beta - \alpha)} \sum_{i=1}^{m} \Bigl[ (j_\alpha - n\alpha)\, \boldsymbol{1}\bigl( f(\boldsymbol{x}^+_i) > f(\boldsymbol{x}^-_{(j_\alpha)}) \bigr) + \sum_{j = j_\alpha + 1}^{j_\beta} \boldsymbol{1}\bigl( f(\boldsymbol{x}^+_i) > f(\boldsymbol{x}^-_{(j)}) \bigr) + (n\beta - j_\beta)\, \boldsymbol{1}\bigl( f(\boldsymbol{x}^+_i) > f(\boldsymbol{x}^-_{(j_\beta + 1)}) \bigr) \Bigr],
\]
where $j_\alpha = \lceil n\alpha \rceil$, $j_\beta = \lfloor n\beta \rfloor$, and $\boldsymbol{x}^-_{(j)}$ denotes the negative instance in $S_-$ ranked in the $j$-th position amongst the negative samples in descending order of scores. The three terms correspond to the contributions of the detection rates at the boundary $\alpha$, in the interior of the range, and at the boundary $\beta$, respectively.

Given a training sample $S$, our objective is to find a linear function $f(\boldsymbol{x}) = \boldsymbol{w}^{\!\top} \boldsymbol{x}$ that optimizes the pAUC in an FPR range of $[\alpha, \beta]$. We cast this pAUC optimization as a structured learning task. For any ordering of the training instances, the relative ordering of the positive and negative instances is represented via a matrix $\boldsymbol{\pi} \in \{0, 1\}^{m \times n}$, where
\[
\pi_{ij} =
\begin{cases}
1 & \text{if } \boldsymbol{x}^+_i \text{ is ranked below } \boldsymbol{x}^-_j, \\
0 & \text{otherwise.}
\end{cases}
\]
We define the correct relative ordering of $S$ as $\boldsymbol{\pi}^\ast$, where $\pi^\ast_{ij} = 0,\ \forall i, j$. The pAUC loss in the false positive range $[\alpha, \beta]$ with respect to $\boldsymbol{\pi}^\ast$ can be written as,
\[
\Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi}) = \frac{1}{mn(\beta - \alpha)} \sum_{i=1}^{m} \Bigl[ \sum_{j = j_\alpha + 1}^{j_\beta} \pi_{i,(j)_{\boldsymbol{\pi}}} + (n\beta - j_\beta)\, \pi_{i,(j_\beta + 1)_{\boldsymbol{\pi}}} \Bigr],
\]
where $(j)_{\boldsymbol{\pi}}$ denotes the index of the negative instance ranked in the $j$-th position consistent with the matrix $\boldsymbol{\pi}$.
We define the joint feature map between the training data and an ordering as
\[
\phi(S, \boldsymbol{\pi}) = \frac{1}{mn(\beta - \alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} (1 - \pi_{ij})\, (\boldsymbol{x}^+_i - \boldsymbol{x}^-_j).
\]
This choice of $\phi$ guarantees that the variable $\boldsymbol{w}$ which optimizes $\boldsymbol{w}^{\!\top} \phi(S, \boldsymbol{\pi})$ also produces the scoring function that achieves the optimal partial AUC score. The above problem can be summarized as the following convex optimization problem [16]:
\[
\min_{\boldsymbol{w}, \xi} \;\; \tfrac{1}{2} \|\boldsymbol{w}\|^2 + \nu \xi
\quad \text{s.t.} \quad
\boldsymbol{w}^{\!\top} \bigl( \phi(S, \boldsymbol{\pi}^\ast) - \phi(S, \boldsymbol{\pi}) \bigr) \ge \Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi}) - \xi, \;\; \forall \boldsymbol{\pi} \in \Pi_{m,n},
\]
and $\xi \ge 0$. Here $\boldsymbol{\pi}^\ast$ denotes the correct relative ordering and $\boldsymbol{\pi}$ denotes any arbitrary ordering.

2 Our approach

In order to design an ensemble-like algorithm for the pAUC, we first introduce a projection function, $\hbar(\cdot)$, which projects an instance vector $\boldsymbol{x} \in \mathbb{R}^d$ to $\{-1, +1\}$. This projection function is also known as the weak learner in boosting. In contrast to the previously described structured learning, we learn a scoring function of the form
\[
f(\boldsymbol{x}) = \boldsymbol{w}^{\!\top} \hbar(\boldsymbol{x}),
\]
which optimizes the area under the curve between the two false positive rates, where $\boldsymbol{w}$ is the linear coefficient vector and $\hbar(\cdot) = [\hbar_1(\cdot), \ldots, \hbar_t(\cdot)]$ denotes a set of binary weak learners. Let us assume for the moment that we have already learned the set of all projection functions. By using the same pAUC loss, $\Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi})$, as in (1), and the same form of feature mapping as in (5), the optimization problem we want to solve is:
\[
\min_{\boldsymbol{w}, \xi} \;\; \tfrac{1}{2} \|\boldsymbol{w}\|^2 + \nu \xi
\quad \text{s.t.} \quad
\boldsymbol{w}^{\!\top} \bigl( \phi(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi(\boldsymbol{H}, \boldsymbol{\pi}) \bigr) \ge \Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi}) - \xi, \;\; \forall \boldsymbol{\pi} \in \Pi_{m,n},
\]
and $\xi \ge 0$. Here $\boldsymbol{H}$ is the projected output for the positive and negative training samples, and the feature map over the projected data is defined as,
\[
\phi(\boldsymbol{H}, \boldsymbol{\pi}) = \frac{1}{mn(\beta - \alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} (1 - \pi_{ij})\, \bigl( \hbar(\boldsymbol{x}^+_i) - \hbar(\boldsymbol{x}^-_j) \bigr).
\]
The only difference between (6) and (7) is that the original data is now projected to a new non-linear feature space. We will show how this further improves the pAUC score in the experiment section. The dual problem of (7) can be written as (see supplementary),
\[
\max_{\boldsymbol{\lambda}} \;\; \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi})\, \Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi}) - \frac{1}{2} \Bigl\| \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi}) \bigl( \phi(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi(\boldsymbol{H}, \boldsymbol{\pi}) \bigr) \Bigr\|^2
\quad \text{s.t.} \quad 0 \le \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi}) \le \nu,
\]
where $\boldsymbol{\lambda}$ is the dual variable and $\lambda(\boldsymbol{\pi})$ denotes the dual variable associated with the inequality constraint for the ordering $\boldsymbol{\pi}$. To derive the Lagrange dual problem, the following KKT condition is used,
\[
\boldsymbol{w} = \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi}) \bigl( \phi(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi(\boldsymbol{H}, \boldsymbol{\pi}) \bigr).
\]
Finding the best weak learners

In this section, we show how one can explicitly learn the projection function, $\hbar(\cdot)$. We use the idea of column generation to derive an ensemble-like algorithm similar to LPBoost [4]. The condition for applying column generation is that the duality gap between the primal and dual problems is zero (strong duality). By inspecting the KKT conditions, at optimality, (10) must hold for all weak learners; in other words, $w_r = \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi}) \bigl[ \phi_r(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi_r(\boldsymbol{H}, \boldsymbol{\pi}) \bigr]$ must hold for every weak learner index $r$.

For weak learners in the current working set, the corresponding condition in (10) is satisfied by the current solution. Weak learners that have not yet been selected do not appear in the current restricted optimization problem, and the corresponding coefficient is zero. It is easy to see that if the condition also holds for every weak learner not in the current working set, then the current solution is already the globally optimal one. Hence the subproblem for selecting the best weak learner is:
\[
\hbar^\ast(\cdot) = \operatorname*{argmax}_{\hbar \in \mathcal{H}} \; \Bigl| \sum_{\boldsymbol{\pi}} \lambda(\boldsymbol{\pi}) \bigl[ \phi_{\hbar}(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi_{\hbar}(\boldsymbol{H}, \boldsymbol{\pi}) \bigr] \Bigr|.
\]
In other words, we pick the weak learner whose value deviates most from zero. At each iteration, we pick the most violating weak learner from $\mathcal{H}$. Substituting (8) into (11), the subproblem for generating the optimal weak learner at the current iteration can be written as,
\[
\hbar^\ast(\cdot) = \operatorname*{argmax}_{\hbar \in \mathcal{H}} \; \Bigl| \sum_{i} u_i\, \hbar(\boldsymbol{x}^+_i) - \sum_{j} v_j\, \hbar(\boldsymbol{x}^-_j) \Bigr|,
\]
where $i$ and $j$ index the positive and negative training samples, respectively, and the weights $u_i$ and $v_j$ are derived from the dual variables $\lambda(\boldsymbol{\pi})$ and the orderings $\boldsymbol{\pi}$ in the working set.


For decision stumps, the absolute value in (12) can always be realized, since the weak learner set is negation-closed [12]: if $\hbar \in \mathcal{H}$, then $-\hbar \in \mathcal{H}$, and vice versa. For decision stumps, one can simply flip the inequality sign of the stump to obtain its negation. In fact, any linear classifier of the form $\operatorname{sign}(\boldsymbol{a}^{\!\top}\boldsymbol{x} + b)$ is negation-closed. Using (12) to choose the best weak learner is not a heuristic, as the solution to (11) decreases the duality gap the most for the current solution. See the supplementary material for more details.
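For axis-aligned decision stumps, the subproblem of picking the weak learner whose weighted edge deviates most from zero can be solved exactly by sweeping over sorted feature values. The sketch below is our own illustration (the function name and the signed-weight convention are assumptions): it takes signed per-sample weights that are positive for positive samples and negative for negative samples, so maximizing the absolute edge covers both a stump and its negation:

```python
import numpy as np

def best_stump(X, u):
    """Pick the decision stump maximizing | sum_i u_i * h(x_i) |.

    X : (num_samples, num_dims) feature matrix.
    u : signed per-sample weights (positive for positives, negative
        for negatives). Because stumps are negation-closed, maximizing
        the absolute edge suffices. Candidate thresholds are midpoints
        between consecutive sorted feature values.
    Returns (dim, threshold, sign) describing h(x) = sign * (x[dim] > threshold).
    """
    best = (-np.inf, None)  # (|edge|, (dim, threshold, sign))
    for d in range(X.shape[1]):
        order = np.argsort(X[:, d])
        xs, us = X[order, d], u[order]
        total = us.sum()
        below = 0.0
        for k in range(len(xs) - 1):
            below += us[k]
            theta = 0.5 * (xs[k] + xs[k + 1])
            # edge = sum_i u_i h(x_i) for the stump "x_d > theta"
            edge = (total - below) - below
            if abs(edge) > best[0]:
                sign = 1 if edge > 0 else -1
                best = (abs(edge), (d, theta, sign))
    return best[1]
```

For example, with one feature and weights that favor separating small from large values, the selected threshold lands between the two groups.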

Optimizing weak learners’ coefficients

We solve for the optimal $\boldsymbol{w}$ that minimizes our objective function (7). However, the optimization problem (7) has an exponential number of constraints, one for each matrix $\boldsymbol{\pi} \in \Pi_{m,n}$. As in [16, 10], we use the cutting plane method to solve this problem. The basic idea of the cutting plane method is that a small subset of the constraints is sufficient to find an $\epsilon$-approximate solution to the original problem. The algorithm starts with an empty constraint set and adds the most violated constraint at each iteration. The QP is solved as for a linear SVM, and the process continues until no constraint is violated by more than $\epsilon$. Since the quadratic program is of constant size and the cutting plane method converges in a constant number of iterations, the major bottleneck lies in the combinatorial optimization (over $\Pi_{m,n}$) associated with finding the most violated constraint at each iteration. Narasimhan and Agarwal show how this combinatorial problem can be solved efficiently in polynomial time [16]. We briefly discuss their efficient algorithm in this section.

The combinatorial optimization problem associated with finding the most violated constraint can be written as,
\[
\bar{\boldsymbol{\pi}} = \operatorname*{argmax}_{\boldsymbol{\pi} \in \Pi_{m,n}} \;\; \Delta_{(\alpha,\beta)}(\boldsymbol{\pi}^\ast, \boldsymbol{\pi}) - \boldsymbol{w}^{\!\top} \bigl( \phi(\boldsymbol{H}, \boldsymbol{\pi}^\ast) - \phi(\boldsymbol{H}, \boldsymbol{\pi}) \bigr).
\]
The trick to speeding up this maximization is to note that any ordering of the instances that is consistent with $\boldsymbol{\pi}$ yields the same objective value. In addition, one can break the problem down into smaller maximization problems by restricting the search space from $\Pi_{m,n}$ to the subset of orderings in which the relative ranking of the negative instances is fixed, i.e., all matrices $\boldsymbol{\pi}$ in which the ordering of the scores of any two negative instances is consistent. The new optimization problem is easier to solve, as the set of negative instances over which the loss term is computed is the same for all orderings in the search space. This simplification allows the most violated constraint to be found in polynomial time. Interested readers may refer to [16].

Algorithm 1 The training algorithm for pAUCEns.


Our final ensemble classifier has a similar form to the AdaBoost-based object detector of [23]. In Algorithm 1, steps ① and ② are exactly the same as in [23]. As in AdaBoost, the weights computed in step ① play the role of sample weights associated with each training sample. The major difference between AdaBoost and our approach is in steps ③ and ④, where the weak learner's coefficient is computed and the sample weights are updated. In AdaBoost, the weak learner's coefficient is calculated as $w_t = \frac{1}{2} \log \frac{1 - e_t}{e_t}$, where $e_t = \sum_i u_i \boldsymbol{1}\bigl( y_i \ne \hbar_t(\boldsymbol{x}_i) \bigr)$ is the weighted error, $u_i$ are the sample weights and $\boldsymbol{1}(\cdot)$ is the indicator function. The sample weights are updated with $u_i \leftarrow u_i \exp\bigl( -y_i w_t \hbar_t(\boldsymbol{x}_i) \bigr)$. We point this out here because only a minimal modification is required in order to transform an existing implementation of AdaBoost into pAUCEns: given existing AdaBoost code and the publicly available implementation of [16], our pAUCEns can be implemented in a small number of additional lines of code. A computational complexity analysis of our approach can be found in the supplementary.
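For reference, the AdaBoost coefficient and weight-update rules mentioned above (the two steps that pAUCEns replaces with the structured solver) look like this in a minimal sketch; the function name is our own:

```python
import numpy as np

def adaboost_step(u, y, h_out):
    """One AdaBoost round: coefficient and sample re-weighting.

    u     : current sample weights (non-negative, summing to 1)
    y     : labels in {-1, +1}
    h_out : weak learner outputs in {-1, +1}
    Returns the weak learner's coefficient and the updated, renormalized
    weights. These correspond to steps 3 and 4 of the training loop.
    """
    err = np.sum(u * (h_out != y))             # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / err)    # weak learner coefficient
    u_new = u * np.exp(-alpha * y * h_out)     # up-weight misclassified samples
    return alpha, u_new / u_new.sum()          # renormalize to sum to 1
```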

In the next section, we train two different types of classifiers: the strong classifier [6] and the node classifier [23, 27]. For the strong classifier, we set the values of $\alpha$ and $\beta$ based on the evaluation criterion. For the node classifier, we set the values of $\alpha$ and $\beta$ in each node according to the node learning goal.

3 Experiments

Figure 1: Decision boundaries on the toy data set, where each strong classifier consists of a smaller (top row) and a larger (bottom row) number of weak classifiers. The partial AUC score in the evaluated FPR range is also displayed. Our approach achieves the best pAUC score at both ensemble sizes. With many weak classifiers, we observe that both traditional and asymmetric classifiers start to perform similarly.
Figure 2: Decision boundaries on a toy data set with a small number of weak classifiers at a fixed FPR. The partial AUC score and detection rate at the reference false positive rate are also shown. Our approach performs best on both evaluation criteria and preserves a larger decision boundary near the positive samples.

Synthetic data set

We first illustrate the effectiveness of our approach on a synthetic data set similar to the one used in [22]. We compare pAUCEns against the baselines AdaBoost, Cost-Sensitive AdaBoost (CS-AdaBoost) [14] and Asymmetric AdaBoost (AsymBoost) [22]. We use vertical and horizontal decision stumps as the weak classifiers and evaluate the partial AUC score of the different algorithms at low FPRs. For each algorithm, we train strong classifiers with increasing numbers of weak classifiers. Additional details of the experimental set-up are provided in the supplementary. Fig. 1 illustrates the decision boundaries (with the threshold set to a fixed false positive rate) and the pAUC scores. Our approach outperforms all other asymmetric classifiers. We observe that pAUCEns places more emphasis on positive samples than on negative samples, ensuring the highest detection rate at the left-most part of the ROC curve (low FPR). Even though we choose the asymmetric parameter from a large range of values, both CS-AdaBoost and AsymBoost perform slightly worse than our approach. AdaBoost performs worst on this toy data set since it optimizes the overall classification accuracy. However, as the number of weak classifiers increases, we observe that all algorithms perform similarly on this simple toy data set. This observation could explain the success of AdaBoost in many object detection applications, even though AdaBoost only minimizes the symmetric error rate.

In the next experiment, we train a strong classifier with a small number of weak classifiers and compare the performance of the different classifiers at a fixed FPR. We choose this value since it corresponds to the node learning goal often used in training a cascade classifier; we also learn few weak classifiers since the first node of a cascade typically contains a small number of weak classifiers for real-time performance. For pAUCEns, we set the range $[\alpha, \beta]$ accordingly. In Fig. 2, we display the decision boundary of each algorithm together with its pAUC score and its detection rate at the reference false positive rate. We observe that our approach and AsymBoost achieve the highest detection rate at this false positive rate; however, our approach outperforms AsymBoost on the pAUC score. We also observe that our approach places more emphasis on positive samples near the corners of the positive cluster than the other algorithms.

Table 1: The pAUC score on the protein-protein interaction data set for our approach (pAUCEns) and the baselines: the structural pAUC SVM [16], pAUCBoost [11], asymmetric SVM [28] and SVM-based AUC optimization [10]. The higher the pAUC score, the better the classifier. The result reported here for [16] is better than the one reported in the original paper; we suspect that we tuned the regularization parameter over a finer range. Some baseline results are as reported in [16]. The best classifier is shown in boldface.

Protein-protein interaction prediction

In this experiment, we compare our approach with existing algorithms which optimize the pAUC in bioinformatics. The problem we consider is protein-protein interaction prediction [18], in which the task is to predict whether a pair of proteins interacts or not. We used the data set labelled 'Physical Interaction Task in Detailed feature type', which is publicly available on the internet (http://www.cs.cmu.edu/~qyj/papers_sulp/proteins05_pages/feature-download.html). The data set contains protein pairs known to be interacting (positive) and a random set of protein pairs labelled as non-interacting (negative). We use a subset of the features, as in [16]. We randomly split the data into two groups, one for training/validation and one for evaluation, choose the best regularization parameter from a discrete set of candidate values by cross-validation, and repeat our experiments several times using the same regularization parameter. We train a linear classifier as our weak learner using LIBLINEAR [8], set a maximum number of boosting iterations, and report the pAUC score of our approach in Table 1. Baselines include SVM-based AUC optimization, the structural pAUC SVM, pAUCBoost and asymmetric SVM. Our approach outperforms all existing algorithms which optimize either the AUC or the pAUC. We attribute our improvement over the structural SVM of [16] to the introduction of non-linearity into the original problem; this phenomenon has also been observed in face detection, as reported in [27].

Comparison to other asymmetric boosting

Here we compare pAUCEns against several boosting algorithms previously proposed for the problem of object detection, namely AdaBoost with Fisher LDA post-processing [27], AsymBoost [22] and CS-AdaBoost [14]. The results of AdaBoost are also presented as the baseline. For each algorithm, we train a strong classifier and calculate the pAUC score by varying the threshold value in a low-FPR range. For each algorithm, the experiment is repeated several times and the average pAUC score is reported. For AsymBoost and CS-AdaBoost, the asymmetric parameters are chosen from discrete sets of candidate values by cross-validation. We evaluate the performance of all algorithms on three vision data sets: USPS digits, scenes and faces. See the supplementary for more details on feature extraction. We report the experimental results in Table 2. From the table, pAUCEns demonstrates the best performance on all three vision data sets.

Table 2: Average pAUC scores and their standard deviations on the three vision data sets at various boosting iterations, for pAUCEns (ours), AdaBoost [23], AdaBoost with LDA post-processing [27], AsymBoost [22] and CS-AdaBoost [14]. All experiments are repeated several times. The best average performance is shown in boldface.

Pedestrian detection - Strong classifier

We evaluate our approach on the pedestrian detection task, training on the INRIA pedestrian data set. For the positive training data, we use all INRIA cropped pedestrian images. To generate the negative training data, we first train a cascade classifier using Viola and Jones' approach; we then combine random negative windows generated in the first node with hard negative windows generated in the subsequent nodes. The resulting negative windows are used for training the strong classifier. We generate a large pool of features by combining histogram of oriented gradients (HOG) features [3] and covariance (COV) features [21]. (Covariance features capture the relationship between different image statistics and have been shown to perform well in our previous experiments; other discriminative features could also be used instead, e.g., Haar-like features, Local Binary Patterns (LBP) [15], Sketch Tokens [13] or self-similarity of low-level features (CSS) [24].) Additional details of the HOG and COV parameters are provided in the supplementary. We use weighted linear discriminant analysis (WLDA) as the weak classifier and train the ensemble with multiple exits [17], i.e., we set rejection thresholds at several intermediate numbers of weak classifiers. These exits significantly reduce the evaluation time during testing. The regularization parameter is cross-validated over a small set of candidate values; since we have not carefully cross-validated over a finer range, tuning this parameter could yield a further improvement. The training time of our approach is under two hours on a parallelized quad-core Xeon machine.

During evaluation, each test image is scanned with a fixed step stride and image pyramid scale ratio. Overlapping detection windows are merged using the greedy non-maximum suppression strategy introduced in [6]. We use the continuous AUC evaluation software of Sermanet [19] and report the pAUC score over several FPPI ranges in Table 3. From the table, we observe that setting the value of $\beta$ to be minimal yields the best pAUC score at the smallest FPPI range, while as the FPPI range increases, higher values of $\beta$ tend to perform better. This clearly illustrates the advantage of our approach.

Table 3: The pAUC score of detectors trained to optimize different FPR ranges $[\alpha, \beta]$, evaluated over several FPPI ranges on the INRIA test set. Since we plot FPPI versus miss rate, a smaller pAUC score means a better detector. The best detector at each FPPI range is shown in boldface. Clearly, a large value of $\beta$ is best for a large FPPI range.
Figure 3: ROC curves of our approach and several state-of-the-art detectors on the INRIA test set. We train a strong classifier using HOG and COV features.

Fig. 3 compares the performance of our approach with other state-of-the-art algorithms on the INRIA pedestrian data set. We use the evaluation software of Dollár [7], which computes the AUC from discrete points sampled over a fixed FPPI range. Our approach performs second best on this data set, comparable to VeryFast [1], which trains multiple detectors at multiple scales. Upon closer observation, our pAUCEns performs slightly better than VeryFast at lower FPPI, and VeryFast performs slightly better at higher FPPI. We also evaluated our strong classifier on the TUD-Brussels and ETH pedestrian data sets, but observed that the detection results contain a large number of false positives. Instead of bootstrapping with more negative samples as in [6, 24], we train a cascade classifier in the next section.

Figure 4: From top to bottom: performance on the INRIA, TUD-Brussels and ETH test sets. Algorithms are sorted by their partial AUC score over the evaluated FPPI range. Our pAUCEns consistently performs comparably to the state-of-the-art.

Pedestrian detection - Cascade classifier

In this section, we train a cascade classifier using our pAUCEns. We train our detector on the INRIA training set and evaluate it on the INRIA, TUD-Brussels and ETH test sets. On both the TUD-Brussels and ETH data sets, we upsample the original images before applying our pedestrian detector. We train the detector with a combination of HOG and COV features, as previously described. To achieve the node learning goal of the cascade (each node achieves an extremely high detection rate at a moderate false positive rate), we optimize the pAUC in the corresponding FPR range. We train a multi-exit cascade [17] with multiple exits. In this experiment, we use the software of [19] to compute the continuous AUC score over a fixed FPPI range, sort the different algorithms by their pAUC scores, and report the results in Fig. 4. We compare our proposed approach with the baseline HOG+COV classifier (using AdaBoost) and observe that our approach noticeably reduces the average miss-rate over HOG+COV on the INRIA test set. From Fig. 4, our approach achieves similar performance to the state-of-the-art detectors. We then break down the experimental results under different evaluation settings using the partial AUC score over the FPPI range (0, 0.1] in Table 4. On average, our approach performs best on the large evaluation setting, where pedestrians are at least 100 pixels tall. On the other settings, our approach yields results competitive with the state-of-the-art detector in each category. In summary, our approach performs better than or on par with the state-of-the-art despite its simplicity (in comparison to, e.g., LatSvm, a part-based approach which models unknown parts as latent variables). In addition, the current detector is trained with only two discriminative visual features (HOG and COV); applying additional discriminative features, e.g., LBP [26] or motion features [24], could further improve the overall detection performance.

Ours ChnFtrs ConvNet CrossTalk FPDW FeatSynth FtrMine HOG HikSvm HogLbp LatSvm-V1 LatSvm-V2 MultiFtr Pls PoseInv Shapelet VJ VeryFast
Reasonable (min. 50 pixels tall & no/partial occlusion) - Partial AUC(0,0.1)%
INRIA-Fixed 27.4 31.6 31.9 26.9 30.9 49.3 71.5 59.4 53.9 50.5 62.6 29.1 51.3 50.4 89.4 93.0 83.2 23.9
TudBrussels 65.8 72.2 77.8 68.8 77.0 - - 87.9 92.3 91.3 95.5 80.8 81.6 82.6 94.5 97.8 97.4 -
ETH 62.8 72.4 62.8 67.0 74.5 - - 78.1 85.5 67.4 86.1 61.4 73.1 69.5 98.1 97.3 95.4 68.7
Large (min. 100 pixels tall) - Partial AUC(0,0.1)%
INRIA-Fixed 26.0 29.6 27.6 25.0 28.7 48.6 70.9 59.0 53.1 49.5 61.6 25.9 50.0 49.3 89.4 92.9 82.9 21.7
TudBrussels 47.3 50.0 49.5 52.8 53.8 - - 88.4 85.1 73.9 88.9 67.9 71.4 66.8 94.2 92.8 96.3 -
ETH 42.9 57.6 48.1 48.6 62.7 - - 56.9 66.0 54.2 74.7 45.5 59.4 50.8 96.3 93.4 92.0 48.6
Near (min. 80 pixels tall) - Partial AUC(0,0.1)%
INRIA-Fixed 25.9 30.1 30.8 25.4 29.5 48.3 70.9 58.5 53.0 49.4 62.0 27.6 50.3 49.5 89.2 92.9 82.8 22.7
TudBrussels 49.0 59.1 60.1 58.7 62.1 - - 87.1 88.0 79.5 91.7 70.6 74.5 75.1 93.9 95.1 96.3 -
ETH 51.3 64.8 51.8 55.9 66.5 - - 69.2 74.2 56.7 77.5 51.0 63.2 60.9 97.8 95.2 93.6 55.4
Medium (min. 30 pixels tall and max. 80 pixels tall) - Partial AUC(0,0.1)%
INRIA-Fixed 100.0 100.0 54.9 96.5 100.0 100.0 100.0 100.0 100.0 94.6 88.8 96.5 94.3 100.0 96.5 96.5 89.6 51.3
TudBrussels 75.3 78.0 84.6 74.0 81.3 - - 86.4 93.3 97.0 96.1 85.7 83.7 86.3 94.0 98.6 97.3 -
ETH 67.2 64.8 76.0 65.0 67.1 - - 69.8 80.9 78.7 88.3 76.7 68.6 67.0 96.3 89.8 89.0 73.2
Table 4: Performance comparison of various detectors on several pedestrian test sets. The best detector in each category from each data set is highlighted in bold. The pAUC score is taken over the FPPI range (0, 0.1]. A smaller pAUC score means a better detector. Scores over a wider FPPI range can be found in the supplementary.

4 Conclusion

We have proposed a new ensemble learning method for object detection. The proposed approach is based on directly optimizing the partial AUC score in a prescribed FPR range. Extensive experiments demonstrate the effectiveness of the proposed approach on visual detection tasks. We plan to explore applying the proposed approach to the multi-scale detector of [1] in order to improve detection results on very low resolution pedestrian images.


  • [1] R. Benenson, M. Mathias, R. Timofte, and L. V. Gool. Pedestrian detection at 100 frames per second. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2012.
  • [2] R. Benenson, M. Mathias, T. Tuytelaars, and L. V. Gool. Seeking the strongest rigid detector. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2013.
  • [3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., volume 1, 2005.
  • [4] A. Demiriz, K. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation. Mach. Learn., 46(1–3):225–254, 2002.
  • [5] L. E. Dodd and M. S. Pepe. Partial AUC estimation and regression. Biometrics, 59(3):614–623, 2003.
  • [6] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In Proc. of British Mach. Vis. Conf., 2009.
  • [7] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell., 34(4):743–761, 2012.
  • [8] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res., 9:1871–1874, 2008.
  • [9] M.-J. Hsu and H.-M. Hsueh. The linear combinations of biomarkers which maximize the partial area under the ROC curves. Comp. Stats., 28(2):1–20, 2012.
  • [10] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Mach. Learn., 77(1):27–59, 2009.
  • [11] O. Komori and S. Eguchi. A boosting method for maximizing the partial area under the ROC curve. BMC Bioinformatics, 11(1):314, 2010.
  • [12] O. Komori and S. Eguchi. Boosting learning algorithm for pattern recognition and beyond. IEICE Trans. Infor. and Syst., 94(10):1863–1869, 2011.
  • [13] J. J. Lim, C. L. Zitnick, and P. Dollár. Sketch Tokens: A learned mid-level representation for contour and object detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2013.
  • [14] H. Masnadi-Shirazi and N. Vasconcelos. Cost-sensitive boosting. IEEE Trans. Pattern Anal. Mach. Intell., 33(2):294–309, 2011.
  • [15] Y. Mu, S. Yan, Y. Liu, T. Huang, and B. Zhou. Discriminative local binary patterns for human detection in personal album. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., Anchorage, AK, US, 2008.
  • [16] H. Narasimhan and S. Agarwal. A structural SVM based approach for optimizing partial AUC. In Proc. Int. Conf. Mach. Learn., 2013.
  • [17] M.-T. Pham, V.-D. D. Hoang, and T.-J. Cham. Detection with multi-exit asymmetric boosting. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2008.
  • [18] Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman. Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins: Struct., Func., and Bioinfor., 63(3):490–500, 2006.
  • [19] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2013.
  • [20] C. Shen, P. Wang, S. Paisitkriangkrai, and A. van den Hengel. Training effective node classifiers for cascade classification. Int. J. Computer Vision, 103(3):326–347, 2013.
  • [21] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell., 30(10):1713–1727, 2008.
  • [22] P. Viola and M. Jones. Fast and robust classification using asymmetric AdaBoost and a detector cascade. In Proc. Adv. Neural Inf. Process. Syst., pages 1311–1318. MIT Press, 2002.
  • [23] P. Viola and M. J. Jones. Robust real-time face detection. Int. J. Comp. Vis., 57(2):137–154, 2004.
  • [24] S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., San Francisco, US, 2010.
  • [25] P. Wang, C. Shen, N. Barnes, and H. Zheng. Fast and robust object detection using asymmetric totally-corrective boosting. IEEE Trans. Neural Networks and Learning Systems, 23(1):33–46, 2012.
  • [26] X. Wang, T. X. Han, and S. Yan. An HOG-LBP human detector with partial occlusion handling. In Proc. IEEE Int. Conf. Comp. Vis., 2009.
  • [27] J. Wu, S. C. Brubaker, M. D. Mullin, and J. M. Rehg. Fast asymmetric learning for cascade face detection. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):369–382, 2008.
  • [28] S.-H. Wu, K.-P. Lin, C.-M. Chen, and M.-S. Chen. Asymmetric support vector machines: low false-positive learning under the user tolerance. In Proc. of Intl. Conf. on Knowledge Discovery and Data Mining, 2008.