The main goal of vision-based road detection is to detect traversable road areas ahead of an ego-vehicle using an on-board camera. Detecting the course of the road is a key component for the future development of driver assistance systems and autonomous driving. Road detection using a monocular color camera is challenging since algorithms must deal with a continuously changing background, the presence of different objects such as vehicles and pedestrians, different road types (urban, highways, countryside) and varying illumination and weather conditions. Moreover, these algorithms should run in real time.
A common road detection approach consists of analyzing road homogeneity to group pixels into road and background areas by training a classifier based on road and non-road examples. However, the large diversity of non-road areas and the lack of annotated datasets hinder sampling these classes to create a comprehensive representation. This has motivated the development of algorithms that operate using only information from the current image [3, 4, 5, 6]. These algorithms are usually referred to as online road detection algorithms. The core of these algorithms is a single-class classifier trained on a small set of road (positive) examples collected, for instance, from the bottom part of the image being analyzed. Therefore, these algorithms do not require samples of the background class. They are highly adaptive to new road appearance and hence suited to deal with the constantly changing conditions that may occur in real driving scenarios (Fig. 1). In addition, to meet real-time constraints, these algorithms usually represent pixel values using simple (fast) features such as color [3, 8, 9, 10] or texture [5, 11]. Color offers many advantages over texture since texture varies with the speed of the vehicle and with the distance to the camera due to perspective effects. However, using color cues is a challenging task due to the varying photometric conditions in the acquisition process (e.g., illumination variations or different weather conditions). Therefore, in this paper, we focus on evaluating online road detection algorithms by applying different single-class classification methods to the most common color representations for online road detection. These color representations are evaluated on their robustness to varying imaging conditions and their discriminative power. Moreover, a road dataset with ground truth is provided to enable large-scale experiments for road detection.
The dataset and the annotation tool are made publicly available to the community at http://scrd.josemalvarez.net/.
Hence, the contribution of this paper is twofold. First, we provide the community with a dataset of on-board images for road detection. The dataset consists of more than seven hundred manually annotated images acquired at different times of day and under different weather conditions in real-world driving situations (urban scenarios, highways and secondary roads). Second, we present a comprehensive evaluation of existing single-class classifiers using different color representations. To this end, we devise a simple two-stage algorithm for road detection in a single image. In the first stage, the input image is converted to a given color representation. In the second stage, the converted image is fed to a single-class classifier that provides a per-pixel confidence corresponding to the probability of a pixel belonging to the road. The classifier is trained using road pixels under the sole assumption that a region of interest (ROI) in the bottom part of the image belongs to the road surface, see Fig. 2.
The rest of this paper is organized as follows. First, in Sect. II, related work on road detection is reviewed. Then, in Sect. III, we introduce the road detection algorithm and the survey of color models and single-class classifiers. The road dataset and the annotation tool are introduced in Sect. IV. Experiments are presented in Sect. V. Finally, in Sect. VI, conclusions are drawn.
II Related Work
Common road detection methods analyze road homogeneity by grouping pixels into road and background areas by training a classifier based on road/non-road samples. However, the large diversity of non-road areas and the lack of annotated datasets have motivated the development of online detection algorithms [3, 4, 5, 12, 6, 10]. The core of these algorithms is a single-class classifier trained on a small set of positive examples collected from the bottom part of the image. Therefore, these algorithms do not require examples of the background class. In addition, these algorithms represent pixel values using simple (fast) features such as color [3, 8] or texture [5, 12] to be able to run in real time. Color offers many advantages over texture since texture varies with the distance to the camera. Color provides powerful information about the road independent of the shape of the road or perspective effects. However, using color cues is a challenging task due to the varying photometric conditions in the acquisition process. Different color planes exhibiting different invariant properties have been used to reduce the influence of these photometric variations. Color spaces derived from data that have proved to be, to a certain extent, robust to lighting variations are [3, 13], normalized , CIE-  or their combination [15, 16]. More recently, color constancy has also been used in  to minimize the influence of lighting variations. Algorithms embed these color representations in complex systems that use inference methods such as conditional random fields (CRF), post-processing steps and constraints such as temporal coherence [4, 17] or road shape restrictions. Therefore, it is difficult to compare these algorithms and, more importantly, it is difficult to analyze separately how well the different color representations deal with illumination changes within the road detection context.
III Online Road Detection Algorithm
In this section, we present a simple framework for online road detection. The algorithm, depicted in Fig. 2, is devised for still images and consists of two stages: color conversion (Sect. III-A) and pixel classification (Sect. III-B). In short, the algorithm works as follows: pixel values are converted to a preferred color representation and then used as input to the classification stage. The second stage is a single-class classifier that considers only road samples collected from the bottom part of the image. Thus, the algorithm is based on the assumption that the bottom region of the image belongs to the road class. This area usually corresponds to a distance of about four meters ahead of the camera, and the assumption is reasonable when the car is on the road. The output of the classifier is a road likelihood that gives, for each pixel of a test image, the probability of belonging to the road class: the higher the likelihood, the higher the probability of being a road pixel. State-of-the-art algorithms build upon this road likelihood to obtain the traversable road area by incorporating post-processing steps such as connected components, temporal coherence [4, 17], shape restrictions or even conditional random fields, which results in more robust algorithms. In this paper, for fair comparison, we use a simple threshold to assign pixel labels: if the likelihood of the i-th pixel exceeds the threshold, the pixel is assigned a road label. Otherwise, a background label is assigned.
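The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it works on a single color plane stored as nested lists, the names `bottom_roi_samples`, `detect_road` and `fit_toy` are hypothetical, the ROI geometry and threshold are arbitrary, and `fit_toy` is only a stand-in for any of the single-class classifiers of Sect. III-B.

```python
from typing import Callable, List, Sequence, Tuple

Plane = List[List[float]]  # one color plane, stored row by row


def bottom_roi_samples(plane: Plane, roi_height: int,
                       col_start: int, col_end: int) -> List[float]:
    """Collect training values from a rectangle touching the bottom border."""
    return [v for row in plane[-roi_height:] for v in row[col_start:col_end]]


def detect_road(plane: Plane,
                fit: Callable[[Sequence[float]], Callable[[float], float]],
                roi_height: int, col_start: int, col_end: int,
                threshold: float) -> Tuple[Plane, List[List[bool]]]:
    """Train on the bottom ROI, score every pixel, then threshold the scores."""
    score = fit(bottom_roi_samples(plane, roi_height, col_start, col_end))
    likelihood = [[score(v) for v in row] for row in plane]
    # Assumption: a pixel is labeled road when its likelihood exceeds the threshold.
    labels = [[value > threshold for value in row] for row in likelihood]
    return likelihood, labels


def fit_toy(samples: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical stand-in classifier: score decays with distance to the mean."""
    mean = sum(samples) / len(samples)
    return lambda v: 1.0 / (1.0 + abs(v - mean))


# Toy 3x4 color plane: road-like values (0.3) at the bottom, brighter background.
plane = [
    [0.9, 0.9, 0.8, 0.9],
    [0.9, 0.3, 0.3, 0.9],
    [0.3, 0.3, 0.3, 0.3],
]
likelihood, labels = detect_road(plane, fit_toy, roi_height=1,
                                 col_start=1, col_end=3, threshold=0.8)
for row in labels:
    print("".join("R" if is_road else "." for is_road in row))
```

Passing the classifier as a `fit` callable keeps the thresholding step identical for every classifier, which mirrors the fair-comparison setting described above.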
III-A Color Conversion
The first stage is the color conversion process, which represents pixel values using different color models. Algorithms have exploited several invariant/sensitive properties of existing color spaces to reduce the influence of lighting variations in outdoor scenes. In this paper, we analyze the performance of five different device independent color spaces: and the following four other spaces (Table I): normalized , opponent color space , and the CIE- space. Each of these color spaces has different properties as summarized in Table II. For instance, color channels , , or provide high discriminative power but limited invariance to shadows and lighting variations. On the other hand, using hue or saturation to represent pixels provides higher invariance but lower discriminative power. Three of the color spaces separate the luminance and the chrominance into different signals. For instance, in the color space, the channel provides discriminative power while and components provide different levels of invariance. Similarly, the opponent color space comprises the luminance component and two channels providing chromaticity information. As a result, these color representations are uncorrelated and provide diversified color information.
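To make the invariance discussion concrete, the sketch below converts a single RGB pixel to normalized rgb, an opponent color space, and hue/saturation. The exact definitions of Table I are not reproduced in this text, so the formulas are the commonly used ones and should be read as assumptions. The function names are hypothetical and channel values are assumed to lie in [0, 1].

```python
import colorsys
import math
from typing import Tuple

RGB = Tuple[float, float, float]  # channel values assumed to lie in [0, 1]


def normalized_rgb(pixel: RGB) -> RGB:
    """Divide each channel by the channel sum, which removes the intensity scale."""
    r, g, b = pixel
    total = r + g + b
    if total == 0.0:
        # Convention for black pixels (an assumption of this sketch).
        return (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
    return (r / total, g / total, b / total)


def opponent(pixel: RGB) -> RGB:
    """Commonly used opponent transform: two chromatic channels and one luminance-like channel."""
    r, g, b = pixel
    o1 = (r - g) / math.sqrt(2.0)
    o2 = (r + g - 2.0 * b) / math.sqrt(6.0)
    o3 = (r + g + b) / math.sqrt(3.0)
    return (o1, o2, o3)


def hue_saturation(pixel: RGB) -> Tuple[float, float]:
    """Hue and saturation from the standard-library HSV conversion."""
    h, s, _ = colorsys.rgb_to_hsv(*pixel)
    return (h, s)


bright = (0.2, 0.4, 0.4)
dark = (0.1, 0.2, 0.2)  # same surface color at half the intensity

print(normalized_rgb(bright), normalized_rgb(dark))
print(hue_saturation(bright), hue_saturation(dark))
print(opponent(bright)[2], opponent(dark)[2])
```

In this toy case the normalized rgb and hue/saturation values agree for both pixels, while the luminance-like opponent channel does not, which illustrates the trade-off between invariance and discriminative power discussed above.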
III-B Pixel Classification
The second stage of the algorithmic pipeline takes converted pixel values as input and outputs a pixel-level road likelihood based on one-class classification. One-class classification corresponds to the problem of distinguishing the target class from all other possible classes, which are considered non-targets or outliers. We assume that only examples of the target class are available for training, because non-target samples are either not present or not properly sampled. Consequently, binary classifiers that rely on training samples from both classes are not considered, since without non-target samples they cannot learn a boundary between the two classes during training. We consider one-class classifiers because they characterize the target class and then, given a test sample, decide whether or not it belongs to that class. As a consequence, one-class classifiers assume that a well-sampled training set of the target objects is available. Ideally, the model description of the target class should be large enough to accept most of the new target samples and yet selective enough to reject outliers. However, in online road detection, collecting road samples is an ill-posed problem since the knowledge of the road class is deduced from a finite (and small) set of training samples (in our case, collected in an unsupervised manner from the bottom part of the image). Hence, an additional problem arises: the target class is poorly sampled (we do not have a sufficient number of samples of the target class), which leads to ill-posed representations and distributions.
|Global Illumination changes||-||-||-||-||-||+||+||-||+||+||+|
|Distance to the camera||+||+||+||+||+||+||+||+||+||+||+|
|Road type and shape||+||+||+||+||+||+||+||+||+||+||+|
c) Distribution of road pixels within the training area in the same color space. d) Joint distribution of the training data. e) and f) Gaussian and Mixture of Gaussian representations of training data. Bottom row shows classifiers based on superpixels for training: g) Support vector descriptor; h) k-centers; i) k-means and j) Linear programming data descriptor. As shown, using the centroid of superpixels reduces the variance in the input data. Detailed description of these classifiers can be found in Sect. III-B.
One-class classifiers can be divided into three groups: density-based, reconstruction-based and boundary methods. Density-based methods aim at modeling the probability density function of the target class using training data. Reconstruction and boundary-based methods avoid the explicit estimation of the probability density function. The former are based on assumptions about the underlying structure of the data. The latter aim at defining the boundaries that enclose all the elements from the target class (in the training set).
In the rest of this section, we briefly review the most promising one-class classification algorithms. First, we focus on five different density methods: model-based (histograms), nearest-neighbors, single Gaussian, robustified Gaussian and mixture of Gaussians. Then, two reconstruction methods are discussed: the k-means and Principal Component Analysis algorithms. Finally, boundary methods are outlined: nearest-neighbor, k-centers, linear data description, support vector description, min-max probability and the minimum spanning tree method. The evolution of these methods for a given road image is shown in Fig. 3.
Model-based (histograms). This is a non-parametric classifier that uses a likelihood measure to approximate the conditional probability of having a road pixel given a pixel value. This probability distribution is estimated for each image using the training samples. In particular, we use the normalized histogram of the training samples. Therefore, the road likelihood of a pixel is given by the value of the normalized histogram at the bin that contains the pixel value. The higher the likelihood value, the higher the potential of being a road pixel.
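A minimal sketch of this histogram-based classifier for a single color plane is given below. It assumes pixel values in [0, 1] and equally sized bins. The function names and the number of bins are hypothetical and are not the configurations evaluated in this paper.

```python
from typing import List, Sequence


def fit_histogram(samples: Sequence[float], bins: int) -> List[float]:
    """Normalized histogram of training values assumed to lie in [0, 1]."""
    counts = [0] * bins
    for value in samples:
        # The value 1.0 is folded into the last bin.
        counts[min(int(value * bins), bins - 1)] += 1
    total = len(samples)
    return [count / total for count in counts]


def histogram_likelihood(hist: Sequence[float], value: float) -> float:
    """Road likelihood: normalized-histogram value of the bin containing the pixel."""
    bins = len(hist)
    return hist[min(int(value * bins), bins - 1)]


# Toy training set: four road-like values and one outlier.
training = [0.33, 0.32, 0.35, 0.31, 0.62]
hist = fit_histogram(training, bins=10)

print(histogram_likelihood(hist, 0.34))  # falls in the dominant road bin
print(histogram_likelihood(hist, 0.90))  # bin never observed during training
```

With few training samples many bins stay empty, which is one way to see why extending the training set with noisy samples can help this classifier.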
Single Gaussian (G). This classifier models road training samples using a unique Gaussian distribution. The road likelihood for the i-th pixel is obtained by evaluating this Gaussian at the pixel value, where the mean and covariance of the Gaussian are learned using the training samples. In practice, to avoid numerical instabilities, we do not estimate the density. Instead, we use the Mahalanobis distance between the pixel value and the mean, computed with the covariance matrix estimated using the training set.
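For a single color plane the Mahalanobis distance reduces to a squared difference divided by the variance, which the following sketch illustrates. It is a one-dimensional simplification with a hypothetical function name. For several color planes the variance would be replaced by the covariance matrix and its inverse.

```python
from typing import Callable, Sequence


def fit_single_gaussian(samples: Sequence[float]) -> Callable[[float], float]:
    """Learn mean and variance of one color plane and return a distance function."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    if variance == 0.0:
        raise ValueError("training samples are constant, variance is zero")

    def squared_mahalanobis(value: float) -> float:
        # One-dimensional case of (x - mean)^T Sigma^{-1} (x - mean).
        return (value - mean) ** 2 / variance

    return squared_mahalanobis


road_distance = fit_single_gaussian([0.25, 0.75])
print(road_distance(0.5))  # pixel at the training mean
print(road_distance(1.0))  # pixel two standard deviations away
```

Note that this quantity is a distance, so smaller values indicate more road-like pixels. It would have to be negated or otherwise inverted to follow the "higher is more road-like" convention, and how that mapping is done is an assumption left open here.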
Robustified Gaussian (RG). The single Gaussian classifier is sensitive to outliers and noise in the training samples. In our case, these outliers appear as long tails in the distribution, mainly due to lighting conditions or different road appearances, as shown in Fig. 4. Therefore, the robustified Gaussian classifier is based on a single Gaussian whose parameters are learned using robust statistics. To achieve this, training samples are weighted according to their proximity to the mean value: distant samples are down-weighted to obtain a more robust estimate. Finally, the road likelihood is obtained as in the single Gaussian classifier.
Mixture of Gaussians (MoG). Single Gaussian classifiers have the drawback of modeling a single distribution. This may negatively influence their performance in the presence of shadows and lighting variations. The mixture of Gaussians classifier models the set of training samples using a combination of Gaussians and thus creates a more flexible description of the road class. The road likelihood is given by the weighted sum of the individual Gaussian densities, where each Gaussian has its own mean and covariance and a weight is assigned to the n-th Gaussian. In this paper, we optimize these parameters using the EM algorithm and we also evaluate different numbers of Gaussians.
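For reference, the standard form of a mixture-of-Gaussians density is written out below. The symbols are assumptions, since the original notation is not available in this text: $x_i$ is the value of the i-th pixel, $N$ the number of Gaussians, $w_n$, $\mu_n$ and $\Sigma_n$ the weight, mean and covariance of the n-th Gaussian, and $d$ the number of color planes.

```latex
\begin{align}
p(x_i) &= \sum_{n=1}^{N} w_n \, \mathcal{N}(x_i \mid \mu_n, \Sigma_n),
  \qquad w_n \ge 0, \quad \sum_{n=1}^{N} w_n = 1, \\
\mathcal{N}(x \mid \mu, \Sigma) &=
  \frac{1}{\sqrt{(2\pi)^{d} \, |\Sigma|}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right).
\end{align}
```

With $N = 1$ this reduces to the single Gaussian model, whose exponent is the Mahalanobis distance used by the single Gaussian classifier.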
k-means (km). This classifier does not rely on estimating the probability density function. Instead, the classifier describes the training data using different clusters. These clusters are defined by minimizing the average distance to a cluster center. Then, the road likelihood of a pixel is obtained from its distance to the nearest element of the set of cluster centers.
k-center (kc). This method aims at covering the training set with small balls of equal radius. The centers of these balls are placed on training samples so as to minimize the maximum, over all training pixels, of the distance between a pixel and its nearest center. Once the centers are defined, the road likelihood is obtained as in the k-means method.
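Both the k-means and the k-center classifiers score a test pixel by its distance to the nearest center, and they differ in how the centers are chosen. The sketch below shows that shared scoring step and the quantity that k-center minimizes. The function names are hypothetical, the plain Euclidean distance is an assumption, and the optimization that selects the centers is deliberately left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # pixel value in one or more color planes


def nearest_center_distance(value: Point, centers: Sequence[Point]) -> float:
    """Distance from a pixel value to the closest center (smaller is more road-like)."""
    return min(math.dist(value, center) for center in centers)


def k_center_objective(samples: Sequence[Point], centers: Sequence[Point]) -> float:
    """Largest distance from any training sample to its nearest center."""
    return max(nearest_center_distance(sample, centers) for sample in samples)


training = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.9)]
spread_centers = [(0.1, 0.1), (0.9, 0.9)]   # one center per group of samples
clumped_centers = [(0.1, 0.1), (0.2, 0.1)]  # both centers in the same group

print(k_center_objective(training, spread_centers))
print(k_center_objective(training, clumped_centers))
print(nearest_center_distance((0.15, 0.1), spread_centers))
```

The spread-out placement yields the smaller objective, which is the behavior the k-center criterion rewards.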
Principal Component Analysis (PCA). This classifier describes road data using a linear subspace defined by the eigenvectors of the data covariance matrix. To verify whether a new data instance belongs to the road class, the algorithm analyzes the reconstruction error, defined as the difference between the incoming instance and the projection of that instance onto the road subspace. Therefore, the road likelihood is defined by this reconstruction error. In this paper, we assume the subspace is built using the eigenvectors with the largest eigenvalues, which together represent a fixed fraction of the energy in the original data.
Nearest neighbor (NN). This method avoids explicit density estimation and estimates the road likelihood using the distances between test pixels and the set of training pixels. In this paper, we consider the minimum squared Euclidean distance over the training set. However, the method can use any other metric, such as circular distances over specific color planes.
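The following sketch shows the nearest-neighbor score with a pluggable metric. The circular metric is one possible choice for a hue plane and is an assumption of this example: hue is taken to be normalized to [0, 1) and stored as the first coordinate. All function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def squared_circular_hue(a: Point, b: Point) -> float:
    """Squared distance on the hue circle, so that 0.98 and 0.02 are close."""
    diff = abs(a[0] - b[0]) % 1.0
    diff = min(diff, 1.0 - diff)
    return diff * diff


def nn_distance(value: Point, training: Sequence[Point],
                metric: Metric = squared_euclidean) -> float:
    """Minimum distance from a test pixel to the training set (smaller is more road-like)."""
    return min(metric(value, sample) for sample in training)


training_hues = [(0.02,), (0.05,)]  # reddish hues just above the wrap-around point
test_hue = (0.98,)                  # reddish hue just below the wrap-around point

print(nn_distance(test_hue, training_hues))
print(nn_distance(test_hue, training_hues, metric=squared_circular_hue))
```

With the Euclidean metric the test hue looks far from the training hues, while the circular metric recognizes them as neighbors across the wrap-around point.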
Linear programming distance-data description (dLP). This method aims at describing the road data in terms of distances to other objects. Then, the road likelihood is estimated based on the dissimilarity between the test pixels and the road training samples. This is formulated using a linear proximity function whose weights are optimized to minimize the max-norm distance from the bounding hyperplane to the origin. Furthermore, only a few of these weights are non-zero as a consequence of the linear programming formulation.
Support Vector Descriptor (SVD). This method aims to define the minimum-volume hypersphere covering the entire training set. This is a specific instance of the SVM classifier where only positive examples are used. In our case, we consider a general kernel to fit a hypersphere around the road samples in the training set. Then, the road likelihood is computed as the distance of the test sample to the center of the sphere.
Minimax Probability (MPM). This method aims at computing the linear classifier that separates the data from the origin while rejecting at most a specified fraction of the training data, which is represented as a random variable.
Minimum Spanning Tree (MST). This is a non-parametric classifier that aims at capturing the underlying structure of the data by fitting a minimum spanning tree to the training data. In the ideal case, a test instance belongs to the target class if it lies on one of the edges of the spanning tree. However, since the training set is finite and may not represent all possible instances of the target class, a test instance is considered a target if it lies in the neighborhood of any of the edges. Therefore, the road likelihood is estimated as the minimum distance to one of the edges of the tree, that is, the distance between the test pixel and its projection onto the line defined by two training samples (i.e., the vertices of the tree). In those cases where the projection does not lie between the two vertices, the distance is computed as the nearest-neighbor distance between the test pixel and either vertex.
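The edge-distance rule above can be sketched as follows. The tree is built with a simple version of Prim's algorithm, which is a choice made for this illustration because the construction method is not specified in this text. The function names are hypothetical and the Euclidean distance is assumed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Edge = Tuple[Point, Point]


def minimum_spanning_tree(points: Sequence[Point]) -> List[Edge]:
    """Simple (cubic-time) Prim's algorithm over the complete Euclidean graph."""
    in_tree = [0]
    remaining = set(range(1, len(points)))
    edges: List[Edge] = []
    while remaining:
        i, j = min(((i, j) for i in in_tree for j in remaining),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        edges.append((points[i], points[j]))
        in_tree.append(j)
        remaining.remove(j)
    return edges


def distance_to_edge(x: Point, a: Point, b: Point) -> float:
    """Distance from x to segment ab, falling back to the vertices outside the segment."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ax = [xj - aj for aj, xj in zip(a, x)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:
        return math.dist(x, a)
    t = sum(p * q for p, q in zip(ax, ab)) / length_sq
    if t < 0.0 or t > 1.0:
        # Projection falls outside the edge: nearest-neighbor distance to a vertex.
        return min(math.dist(x, a), math.dist(x, b))
    projection = tuple(aj + t * c for aj, c in zip(a, ab))
    return math.dist(x, projection)


def mst_distance(x: Point, edges: Sequence[Edge]) -> float:
    """Minimum distance from a test pixel to any edge of the tree."""
    return min(distance_to_edge(x, a, b) for a, b in edges)


training = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
tree = minimum_spanning_tree(training)
print(mst_distance((0.5, 0.2), tree))  # projects onto the first edge
print(mst_distance((2.0, 0.0), tree))  # closest to a vertex
```

As with the other distance-based classifiers, a smaller value means the test pixel lies closer to the structure spanned by the road samples.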
IV The Road Dataset
In this section, we introduce a novel dataset for road detection. The dataset consists of still images extracted from different road sequences comprising thousands of images acquired on different days, at different times of day (daybreak, morning, noon and afternoon), under different weather conditions (sunny, cloudy, rainy) and mainly in urban-like scenarios. The set of images has been carefully selected to include the major challenges in real-world driving situations by discarding those images where the road is uniformly illuminated. We also discard those images where the percentage of the image covered by the road surface is too large, leading to the distribution of images shown in Fig. 5. As shown, the dataset consists of images where the road represents approximately the of the image. Images in the dataset contain strong shadows, wet surfaces, sidewalks similar to the road, direct reflections, crowded scenes and a lack of lane markings, as shown in Fig. 6.
Ground truth is provided by a single experienced user providing manual segmentations (Fig. 6). To facilitate the labeling task, we used the annotation tool shown in Fig. 7. This tool allows annotations from multiple users as well as the definition of multiple objects such as cars, road and sky. Once the annotation is completed, the points defining the polygon around the object are stored in an XML file associated with the user. Both the dataset and the annotation tool are made publicly available to the community at http://scrd.josemalvarez.net.
V Experiments
In this section, we present experiments conducted to evaluate different combinations of single-class classifiers and color representations for road detection. In particular, we evaluate each color plane individually and their most common combinations in conjunction with a one-class classifier. The setup of the classifiers is as follows. First, we consider four instances of the model-based classifier. Two of these instances directly use the training samples from the bottom part of the road to build the normalized histogram, each with a different number of bins. The other two instances extend the training set with noisy samples. Extending the training set with synthetic samples is a common practice to improve the robustness of the algorithms. Hence, we duplicate the samples and add zero-mean noise to half of them. Then, two model-based configurations with different numbers of bins are considered. Using different numbers of bins to build the histogram enables the analysis of the stability with respect to this parameter. The single and robustified Gaussian models are learned by rejecting a fraction of the data. Furthermore, we consider three instances of the MoG classifier that differ in the number of Gaussians. The last configuration optimizes the number of Gaussians based on the training set.
Road samples collected from a rectangular area at the bottom part of each image yield the training pixels (Fig. 4). Note that this area is suited for right-hand driving situations and it is not extremely large. Furthermore, the area is fixed and independent of the image. Therefore, as shown in Fig. 4, training pixels may not represent all the road areas in the image for two reasons: the variability within the training set is not significant, or the area does not belong to the road surface. To reduce the computational cost required to train some methods, this area is oversegmented using superpixels and only a single value per superpixel is considered. In particular, we consider the central value of the distribution within each superpixel to reduce the effect of long tails due to noise in the imaging process. This process reduces the training set to a compact set of samples per image, as shown in Fig. 3.
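The reduction to one value per superpixel can be sketched as follows. Two assumptions are made here: the "central value" is interpreted as the median, and the superpixel labels are taken as given from any oversegmentation method. The function name and the toy data are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def superpixel_central_values(values: Sequence[float],
                              labels: Sequence[int]) -> Dict[int, float]:
    """Keep one training value per superpixel: the median of its pixel values."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: statistics.median(group) for label, group in groups.items()}


# Six training pixels in two superpixels; 0.95 mimics a noisy long-tail value.
values = [0.30, 0.31, 0.95, 0.50, 0.52, 0.51]
labels = [0, 0, 0, 1, 1, 1]

print(superpixel_central_values(values, labels))
```

Using the median rather than the mean keeps the single noisy pixel from shifting the representative value of the first superpixel.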
V-A Evaluation Measures
Quantitative evaluations are provided using average ROC curves computed on the pixel-wise comparison between the ground truth and the results obtained by binarizing the road likelihood (Sect. III) with different threshold values. ROC curves represent the trade-off between true positive rate and false positive rate. These two measures provide different insights into the performance of the algorithm. The true positive rate, or sensitivity, refers to the ability of the algorithm to detect road pixels. A low sensitivity corresponds to under-segmented results.
The false positive rate, or fall-out, refers to the ability of the algorithm to detect background pixels. Hence, a high fall-out corresponds to over-segmented results. However, in road images, a low fall-out does not ensure a high discriminative power since the number of false positives that can appear within the road areas is negligible compared to the number of background pixels. Hence, small fall-out variations may correspond to significant variations in the final road detection result. Finally, for performance comparison, we consider the area under the curve (AUC). The higher the AUC, the higher the accuracy. The equal error rate (EER) is defined as the intersection between the curve and the line where the error rates are equal, i.e., where the false positive rate equals the false negative rate.
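These measures can be computed from a list of per-pixel likelihoods and ground-truth labels as sketched below. The function names, the threshold grid and the trapezoidal integration are choices made for this illustration, and the sketch assumes that both road and background pixels are present in the ground truth.

```python
from typing import List, Sequence, Tuple

RocPoint = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(likelihood: Sequence[float], truth: Sequence[bool],
               thresholds: Sequence[float]) -> List[RocPoint]:
    """Binarize the likelihood at each threshold and collect (FPR, TPR) pairs."""
    positives = sum(1 for is_road in truth if is_road)
    negatives = len(truth) - positives
    points = set()
    for threshold in thresholds:
        tp = sum(1 for l, is_road in zip(likelihood, truth) if l > threshold and is_road)
        fp = sum(1 for l, is_road in zip(likelihood, truth) if l > threshold and not is_road)
        points.add((fp / negatives, tp / positives))
    return sorted(points)


def area_under_curve(points: Sequence[RocPoint]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points: Sequence[RocPoint]) -> float:
    """Error rate at the ROC point where FPR and FNR = 1 - TPR are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0


likelihood = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
truth = [True, True, False, True, False, False]
thresholds = [i / 20 for i in range(21)]

points = roc_points(likelihood, truth, thresholds)
print(area_under_curve(points))
print(equal_error_rate(points))
```

In this toy example eight of the nine (road, background) pixel pairs are ranked correctly, which matches the area of 8/9 obtained from the curve.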
Table III: AUC values for individual color representations and for combinations of color representations.
The AUC values resulting from combining the different color representations and instances of single-class classifiers are summarized in Table III. ROC curves for the different instances of the model-based classifier are shown in Fig. 8 and representative ROC curves for the rest of the classifiers are shown in Fig. 9. From Fig. 8, we can observe the stability of the model-based classifier with respect to the number of bins used to build the histogram. The relatively low performance of this model-based classifier tends to improve when the training set is extended with noisy samples. This is probably due to the lack of training samples representing the target (road) class. Therefore, adding noisy samples improves the variety of the training data. Note the performance drop of this model-based classifier when considering multiple color planes. This suggests that the joint distribution of these color planes cannot capture the road appearance using only a few training samples. These results could be improved by considering the likelihood provided by each color plane independently.
As expected, the PCA classifier cannot operate on single color planes. PCA is based on the data covariance matrix and is therefore not suitable for one-dimensional input. Nevertheless, this classifier provides outstanding performance when using three-dimensional input data. Further, besides the model-based classifier on joint distributions, the worst performance corresponds to the linear programming classifier (dLP) in the color space. As shown, the performance of this color space is generally low. This is mainly due to an excess of invariance leading to higher false positive rates (i.e., the model lacks discriminative properties) as shown in Fig. 9. The use of as input to most of the classifiers also provides low performance. This opponent component is a combination of and color planes. In contrast, and color planes provide good performance. It is worth noting that these two color components are combinations of color space and include a certain amount of . This suggests that provides relevant information to balance invariance and discriminative power. As shown in Table III, among color planes, is the one providing the highest performance.
Interestingly, high performance is achieved when the input data contains , and in particular, when the input data is . The performance drops when luminance is included (as ). This is mainly due to the lack of invariance in the luminance color plane. Therefore, we can conclude that, for challenging situations, the use of the luminance color plane decreases the performance of the algorithms. Nevertheless, in real-world driving situations these challenging conditions may only represent a small portion of the data (e.g., the dataset presented here is a subset of a larger dataset with less challenging images) and a different color plane may be more suitable. For instance, when single color planes are evaluated in general sequences outperforms the other color planes.
VI Conclusions
In this paper, we introduced a comprehensive evaluation combining color representations with different single-class classifiers for road detection. Experiments were conducted on a new set of road images comprising manually annotated images. From the results, we conclude that combining multiple color representations using a parametric classifier outperforms single color representations in accuracy. Moreover, on this dataset, learning a robustified Gaussian model in a color space using both saturation and hue yields the highest accuracy.
References
-  J. Fritsch, T. Kuhnl, and F. Kummert, “Monocular road terrain detection by combining visual and spatial information,” IEEE Trans. Intel. Transp. Systems (T-ITS), vol. 15, no. 4, pp. 1586 – 1596, 2014.
-  J Fritsch, T Kühnl, and A Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in ITSC, October 2013.
-  M.A. Sotelo, F.J. Rodriguez, and L. Magdalena, “Virtuous: vision-based road transp. for unmanned operation on urban-like scenarios,” IEEE Trans. Intel. Transp. Systems (T-ITS), vol. 5, no. 2, pp. 69 – 83, June 2004.
-  C. Tan, T. Hong, T. Chang, and M. Shneier, “Color model-based real-time learning for road following,” in ITSC, 2006, pp. 939–944.
-  H. Kong, J. Y. Audibert, and J. Ponce, “General road detection from a single image,” IEEE Trans. on Image Processing (TIP), vol. 19, no. 8, pp. 2211 –2220, 2010.
-  J. M. Alvarez and A.M. Lopez, “Road detection based on illuminant invariance,” IEEE Trans. on Intel. Transp. Systems (T-ITS), vol. 12, no. 1, pp. 184 –193, 2011.
-  D. M. J. Tax, One-class Classification, Ph.D. thesis, Delft University of Technology, 2001.
-  Y. He, H. Wang, and B. Zhang, “Color–based road detection in urban traffic scenes,” IEEE Trans. Intel. Transp. Systems (T-ITS), vol. 5, no. 24, pp. 309 – 318, 2004.
-  X. Hu, S. A. Rodriguez-Florez, and A. Gepperth, “A multi-modal system for road detection and segmentation,” in IV, June 2014.
-  J. M. Alvarez, T. Gevers, and A. M. Lopez, “Learning photometric invariance for object detection,” Intern. Journal of Computer Vision (IJCV), pp. 45 – 61, 2010.
-  P. Lombardi, M. Zanin, and S. Messelodi, “Switching models for vision-based on–board road detection,” in ITSC, 2005, pp. 67 – 72.
-  Jose M. Alvarez, M. Salzmann, and N. Barnes, “Data-driven road detection,” in WACV, March 2014, pp. 1134–1141.
-  C. Rotaru, T. Graf, and J. Zhang, “Color image segmentation in HSI space for automotive applications,” Journal of Real-Time Image Processing, pp. 1164–1173, 2008.
-  A. Ess, T. Mueller, H. Grabner, and L. van Gool, “Segmentation-based urban traffic scene understanding,” in BMVC, Sep. 2009.
-  O. Ramstrom and H. Christensen, “A method for following unmarked roads,” in IV’05, 2005.
-  J. M. Alvarez, Theo Gevers, Yann LeCun, and A. M. Lopez, “Road scene segmentation from a single image,” in ECCV, 2012, vol. 7578, pp. 376–389.
-  T. Michalke, R. Kastner, M. Herbert, J. Fritsch, and C. Goerick, “Adaptive multi-cue fusion for robust detection of unmarked inner-city streets,” in IV, 2009, pp. 1 –8.
-  C. Guo, S. Mita, and D. McAllester, “Mrf-based road detection with unsupervised learning for autonomous driving in changing environments,” in IV, June 2010, pp. 361–368.
-  Elzbieta Pekalska, David M. J. Tax, and Robert P. W. Duin, “One-class lp classifiers for dissimilarity representations,” in NIPS, 2002, pp. 761–768.
-  D.M.J. Tax and R.P.W Duin, “Support vector domain description,” Pattern Recognition Letters, vol. 20, no. 11-13, pp. 1191–1199, 1999.
-  G.R.G. Lanckriet, L. El Ghaoui, and M.I. Jordan, “Robust novelty detection with single-class mpm,” in NIPS, 2003.
-  Piotr Juszczak, David M.J. Tax, Elzbieta Pekalska, and Robert P.W. Duin, “Minimum spanning tree based one-class classifier,” Neurocomputing, vol. 72, no. 7-9, pp. 1859–1869, 2009.
-  J. M. Alvarez, Yann LeCun, Theo Gevers, and A. M. Lopez, “Semantic road segmentation via multi-scale ensembles of learned features,” in ECCVW, 2012, vol. 7584, pp. 586–595.
-  T. Fawcett, “An introduction to roc analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861 – 874, 2006.
-  J. M. Alvarez and A. M. Lopez, “Novel index for objective evaluation of road detection algorithms,” in ITSC, Nov. 2008, pp. 815–820.