Autonomous vehicles on the road are endangered by tiny obstacles, e.g., bricks, stones and lost cargo. Such obstacles, which are only 15-30 cm high and lie 30 m or more away, are hard to discover from the point clouds generated by LIDAR or stereo cameras. Moreover, patterned ground, e.g., zebra crossings or brick paving, is easily mistaken for an obstacle when only appearance cues are used. Hence, discovering tiny obstacles at long distance is challenging.
Some methods define the task of obstacle discovery as segmenting the road scene into semantic labels. They merge appearance and geometry cues by deep learning, which is time-consuming. In addition, some methods use proposals to capture objects in the image and then build a model to classify them as obstacle or non-obstacle. However, the low perceptual ability of such methods for tiny obstacles limits their detection performance. Hence, the methods mentioned above are unable to extract tiny obstacles.
Edges are an important visual element for object perception. Furthermore, the occlusion edge reveals the 3D cue of an object and hence captures object contours better. As a basic feature, it can be effectively applied in tracking, robot following, action recognition, visual homing, etc. However, in some cases like Fig.1(c), the edges of tiny objects at long distance are weak and inconsistent, so the occlusion edges are insufficiently acquired and the proposals fail to enclose the objects.
In this paper, the task of obstacle discovery is based on three goals: 1) detecting the contours of distant obstacles as completely as possible, 2) extracting as many obstacle proposals as possible, 3) ranking the obstacle proposals as high as possible. To achieve these goals, a set of obstacle-aware occlusion edge maps is generated to closely fit the contours of obstacles at various distances. In this generation, the multi-layer regions revealing the distance from the camera, i.e., the pseudo distance, are inferred from 2D images by visual cues. To enhance the perception of tiny obstacles, the edge cues at all pseudo distances are fused. Compared to the previous works shown in Fig.1(c), our method fits the contours of tiny obstacles better, as shown in Fig.1(d). Furthermore, the proposals are extracted from the occlusion edge maps of each layer, ensuring the existence of tiny obstacle proposals. Finally, an obstacle-aware regressor based on random forest is learned to produce an obstacle occupied probability map, and the predicted obstacles are shown in Fig.1(b).
In summary, the main contributions of our method lie in:
A set of novel obstacle-aware occlusion edge maps is proposed to characterize obstacles, which fuses edge cues from each layer in a multi-layer architecture. These maps represent the contours of obstacles at long distance better than previous works.
By combining appearance features and pseudo distance features, an obstacle-aware regressor is proposed to give high scores to the obstacle proposals.
Our method achieves remarkable performance on the Lost and Found dataset, outperforms state-of-the-art algorithms, and significantly improves the discovery of tiny obstacles at long distance.
II Related work
Existing obstacle discovery methods can be roughly divided into three types. The first type compares the relative positions of 3D points in the disparity map and classifies all points into obstacle and road. PHT and FPHT apply statistical hypothesis testing to assess the drivable-area and obstacle hypotheses. However, these methods depend on the accuracy of the disparity map: once the disparity map is inaccurate, discovery fails, especially for tiny obstacles at long distance. The second type segments an RGB image into several regions with different semantics. MergeNet proposes a network architecture for discovering obstacles that makes effective use of limited data. However, tiny obstacles provide little information in the image, so they are hard to discover. The third type discovers obstacles from proposals by classification or regression. For example, plentiful proposals can be produced by Faster R-CNN and then classified by Support Vector Machines. Methods of this type are rarely used to discover tiny obstacles, because tiny objects do not carry sufficient information. None of the methods above focuses on discovering tiny obstacles at long distance.
Unlike the methods mentioned above, this paper focuses on tiny obstacles at long distance. Since our method closely depends on occlusion edges and Edge boxes, both are briefly reviewed below.
II-A Reviewing Edge boxes
To model the observation of objects in an image, Edge boxes densely searches bounding boxes in the image and defines an objectness score for each box based on the edge map of the image.
However, some tiny obstacles have a color distribution similar to that of the road area, so their contours are incomplete and weak. Boxes that intersect these weak edges then obtain a higher score ranking. Meanwhile, since there is no spatial constraint between pixel values in the edge probability map, different edge pixels of the same obstacle can have completely different probabilities, making the obstacle look less like an object. Furthermore, since tiny obstacles at long distance get lower scores than larger objects, the ability to find them drops greatly. Hence, a method that strengthens the edges so that they enclose closed regions is necessary.
II-B Reviewing Occlusion Edge
Occlusion edge detection aims to find the edges revealing the depth discontinuity between obstacle and background. It takes the edges between adjacent regions of an over-segmented image as inputs and classifies them into two subsets: occlusion edges and trivial edges. Compared to other edge cues, the occlusion edge responds more strongly to obstacle contours, especially for tiny obstacles, because the surface cue is additionally taken into account. Hence, the occlusion edge is better suited to obstacle discovery.
However, due to the complexity of the road scene, e.g., an appearance similar to the road plane, some tiny obstacles at long distance still hardly acquire sufficient occlusion edge cues. The intrinsic reason is that the spatial distance between obstacle and camera is not taken into consideration. This issue makes it very difficult to discover tiny obstacles, and it is addressed in this paper.
In some cases where the tiny obstacles are located at long distance, the existing occlusion edge cues cannot be acquired sufficiently, which restricts the discovery of these obstacles. To address this issue, a set of novel obstacle-aware occlusion edge maps is constructed to finely fit the contours of tiny obstacles at various distances. The map of each distance layer is used for proposal extraction, ensuring the existence of tiny obstacle proposals. In addition, to give relatively high scores to the tiny obstacles in the proposal set, an obstacle-aware regressor is learned with novel features related to the pseudo distance, and an obstacle occupied probability map is generated by the regressor.
III-A Obstacle-aware occlusion edge map
To ensure the existence of the tiny obstacle proposals, more reliable occlusion edge maps that finely fit the contours of tiny obstacles need to be generated over multiple distances. To achieve this goal, as shown in Fig.3, a multi-layer framework with dual paths is built. In this framework, a near-to-far pathway considers the various distances at which obstacles appear and estimates the image regions indicating different distances. A far-to-near pathway fuses the edge probability maps at different distances to enhance the edge cues of the tiny obstacles. The lateral connections fit the contours of obstacles at each distance.
Near-to-far pathway. To estimate multi-layer regions revealing various distances, the principle of perspective is used to connect the 3D spatial distance to 2D visual cues. As shown in Fig.2(b), with the camera fixed, the farther an obstacle of fixed size is in 3D space, the smaller it appears in the 2D image and the farther it lies from the image bottom. Hence, two 2D properties of the monocular image are employed to describe the pseudo distance of an obstacle: (i) the pixel distance from the obstacle center to the image bottom, (ii) the number of pixels occupied by the obstacle. All the training obstacles are given by the Lost and Found dataset, as shown in Fig.2(a), and the region enclosed by the green dotted line is used for obstacle discovery.
To be specific, given the training obstacle set $\mathcal{O}$ and the two pseudo distance properties mentioned above, k-means clustering is employed to divide the whole obstacle set into $n$ subsets, i.e., $\mathcal{O} = \{\mathcal{O}_1, \dots, \mathcal{O}_n\}$. Note that each $\mathcal{O}_i$ is a set of obstacles with similar locations and similar areas, and the obstacles in $\mathcal{O}_{i+1}$ are farther than those in $\mathcal{O}_i$. Then, for an image $I$, the region used for obstacle discovery is divided into $n$ sub-regions according to the partition of $\mathcal{O}$: sub-region $R_1$ contains all the obstacles of $\mathcal{O}$, sub-region $R_2$ contains the obstacles of $\mathcal{O}_2 \cup \dots \cup \mathcal{O}_n$, and, following the same way, sub-region $R_n$ contains the obstacles of $\mathcal{O}_n$. Intuitively, the farthest obstacles lie in the sub-region with the smallest range. Each sub-region corresponds to a layer in the framework. Hence, the multi-layer map is denoted as $\{R_1, \dots, R_n\}$.
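The layer construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the plain NumPy k-means, the feature standardization, and the choice of a bounding rectangle as each sub-region are all assumptions made for the example.

```python
import numpy as np

def pseudo_distance_features(boxes, image_height):
    """boxes: (N, 4) array of (x0, y0, x1, y1). Returns (N, 2) features:
    pixel distance from the box center to the image bottom, and box area."""
    boxes = np.asarray(boxes, dtype=float)
    center_y = 0.5 * (boxes[:, 1] + boxes[:, 3])
    dist_to_bottom = image_height - center_y
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return np.stack([dist_to_bottom, area], axis=1)

def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on standardized features; returns cluster labels."""
    rng = np.random.default_rng(seed)
    z = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)
    centers = z[rng.choice(len(z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean(axis=0)
    return labels

def order_near_to_far(features, labels, n_clusters):
    """Relabel clusters so that index 0 is the nearest subset
    (smallest mean distance to the image bottom)."""
    means = [features[labels == k, 0].mean() if np.any(labels == k) else np.inf
             for k in range(n_clusters)]
    order = np.argsort(means)
    remap = np.empty(n_clusters, dtype=int)
    remap[order] = np.arange(n_clusters)
    return remap[labels]

def layer_regions(boxes, ordered_labels, n_clusters):
    """Sub-region i is taken (as an assumption) to be the bounding rectangle
    of all obstacles in subsets i, i+1, ..., so the regions are nested."""
    boxes = np.asarray(boxes, dtype=float)
    regions = []
    for i in range(n_clusters):
        sel = boxes[ordered_labels >= i]
        regions.append((sel[:, 0].min(), sel[:, 1].min(),
                        sel[:, 2].max(), sel[:, 3].max()))
    return regions
```

With this construction the first region covers every training obstacle and the last region covers only the farthest subset, which mirrors the nesting described in the text.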
Far-to-near pathway. To greatly improve the edge probability of tiny obstacles, the edge cues in a distant layer are passed to the nearby layer, so that the tiny obstacles can be clearly observed in each layer. A set of edge probability maps $\{E_1, \dots, E_n\}$, corresponding to the layers from the nearest ($1$) to the farthest ($n$), is generated by structured edge detection. As shown in Fig.3, at the beginning, the enhanced edge probability map $\hat{E}_n$ is equal to $E_n$. Since the edge pixels in $\hat{E}_n$ are also observed in the larger edge probability map $E_{n-1}$, there is a pixel correspondence between $\hat{E}_n$ and $E_{n-1}$. Hence, the pixel values of $\hat{E}_n$ are added to those of the corresponding pixels in $E_{n-1}$ to generate the enhanced map $\hat{E}_{n-1}$. In the same way, the enhanced edge probability map $\hat{E}_{i+1}$ of each layer is passed to the map $E_i$ to generate the enhanced map $\hat{E}_i$. In the enhanced map $\hat{E}_1$, since the edge cues in the long-distance regions are strengthened many times, the tiny obstacles at long distance obtain a high response. Hence, these obstacles are easier to discover.
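The far-to-near accumulation can be illustrated with a short sketch. It assumes that each layer is an axis-aligned rectangle nested inside the previous one and that the maps are summed without clipping; the function name and the region format are hypothetical.

```python
import numpy as np

def fuse_far_to_near(edge_maps, regions):
    """edge_maps[i]: 2D edge probability map of layer i (index 0 = nearest,
    largest region). regions[i]: integer rectangle (x0, y0, x1, y1) of layer i
    in full-image coordinates, each nested inside the previous one.
    Returns the list of enhanced maps with the same shapes as the inputs."""
    n = len(edge_maps)
    enhanced = [None] * n
    # The farthest layer has nothing behind it, so it is used unchanged.
    enhanced[-1] = edge_maps[-1].astype(float).copy()
    for i in range(n - 2, -1, -1):
        out = edge_maps[i].astype(float).copy()
        x0, y0, _, _ = regions[i]
        fx0, fy0, fx1, fy1 = regions[i + 1]
        # Add the enhanced farther map onto the corresponding pixels.
        out[fy0 - y0:fy1 - y0, fx0 - x0:fx1 - x0] += enhanced[i + 1]
        enhanced[i] = out
    return enhanced
```

Because the sum is not renormalized in this sketch, pixels in the farthest region receive one contribution per layer, which is the mechanism that boosts distant edges.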
Lateral connections. The occlusion edges are a subset of the atomic edges between two adjacent regions; however, the edges in the enhanced edge probability maps cannot partition the scene into regions. Thus, while retaining the edge probability gain, the lateral connections aim to fit the contours of obstacles by superpixels.
At the beginning, each pixel in a layer is treated as a superpixel, which is the most detailed partition. Then, an iterative optimization rule from prior work is used to optimize the superpixels, with the enhanced edge cues serving as a critical factor. The pixels inside a superpixel have similar properties. Since the obstacles and the road area belong to different objects, they are divided into different superpixels. Hence, the contours of obstacles are reflected in the atomic edges between two adjacent superpixels. Let $A_i$ denote the set of atomic edges in the $i$-th layer. Due to the fusion of cues from various distances, the contours of the tiny obstacles at long distance are completely fitted by these atomic edges, as shown in the bottom of Fig.4(c).
To capture the contours of tiny obstacles at long distance as completely as possible, each layer generates a set of superpixels $S_i$, and all sets are used for occlusion edge detection. Several cues are jointly used to express an atomic edge $e$, forming a feature vector $x_e$. The occlusion edge classifier can be trained as follows:
$$\min_{w,\, b}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{j} \ell\big(y_j,\ w^\top x_{e_j} + b\big),$$
where $y_j$ is the class label of the corresponding training edge $e_j$, $\ell$ is the classification loss, $C$ is a coefficient that balances the two terms, and $w$ and $b$ are the target classifier and bias, respectively. The atomic edge set in the $i$-th layer is classified to generate the $i$-th occlusion edge map. As shown in Fig.4(d), the contours of tiny obstacles in the bottom row obtain higher scores than those in the row above, making the tiny obstacles easier to find.
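As an illustration of training such a classifier, the sketch below fits a linear soft-margin classifier by subgradient descent. The hinge loss, the optimizer, and all names and hyperparameters are assumptions for this example; the text only states that a coefficient balances two terms and that a classifier and a bias are learned.

```python
import numpy as np

def train_linear_svm(x, y, c=1.0, lr=0.01, n_epochs=500):
    """Minimize 0.5 * ||w||^2 + c * sum(max(0, 1 - y_j * (w . x_j + b)))
    by full-batch subgradient descent. Labels y must be in {-1, +1}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (x @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = w - c * (y[active, None] * x[active]).sum(axis=0)
        grad_b = -c * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def is_occlusion_edge(x, w, b):
    """Classify edge feature vectors: True for occlusion, False for trivial."""
    return (np.asarray(x, dtype=float) @ w + b) > 0.0
```

In this sketch a label of +1 stands for an occlusion edge and -1 for a trivial edge; the real edge features would replace the toy vectors.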
Naturally, by considering the prior information on distance, different layers contain obstacles at different distances, and the tiny obstacles at long distance can be observed in the scenes of all layers. To ensure the existence of tiny obstacle proposals, the occlusion edge map of every layer is used to extract proposals by Object-Level Proposal. Specifically, a full joint set of proposals, i.e., $B = B_1 \cup \dots \cup B_n$, is obtained, where $B_i$ represents the proposal set generated from the $i$-th layer occlusion edge map. However, apart from the tiny obstacles, these proposals also contain many non-obstacles, e.g., brick texture and pedestrian crossings. It is therefore necessary to build an obstacle-aware model for finding the obstacles in the joint set $B$.
III-B Obstacle-aware regressor
An obstacle-aware regressor is expected to give a high score to a real obstacle. Random forest generalizes well, so it is suitable for the complex regression in our task.
Training Data Generation: Training samples are one of the key elements for producing the obstacle occupied probability map. The proposals in the initial set $B$ are assigned to one of three categories: (i) road area, (ii) obstacle, (iii) non-road area. Since harmful obstacles always lie on the road, only the proposals $B' \subseteq B$ belonging to (i) and (ii) are used as training samples. In addition, the overlap between a proposal and the ground truth is employed as its label.
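A minimal sketch of the label generation follows. It assumes that "overlap" means intersection-over-union (IoU) and that a proposal takes its best overlap over all ground-truth obstacles; both choices and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def overlap_labels(proposals, gt_boxes):
    """Regression target per proposal: the best IoU with any ground-truth
    obstacle, or 0.0 when there is no ground truth (e.g. pure road area)."""
    return [max((iou(p, g) for g in gt_boxes), default=0.0) for p in proposals]
```

Road-area proposals then receive labels near 0 and well-localized obstacle proposals receive labels near 1, which gives the regressor a continuous target.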
Features are the second key element for producing the obstacle occupied probability map. In this paper, several features are employed to characterize the proposals. Specifically, (i) Edge and Structure: edge density (ED), the average, maximum and mode of the edge response, and the ratio of the mode capture the statistical information of edges; ED measures the density of edges near the box borders. (ii) Pseudo Distance: following prior work, the size, position, height, width and aspect ratio of the proposal; the combination of these features is associated with the pseudo distance. (iii) Objectness score: following prior work, the objectness score measures the likelihood that a box contains an object. (iv) Color: color contrast (CC) and color variance (CV) of the proposal; CC measures the color dissimilarity of a box to its immediate surrounding area, and CV of a box in the HSV image reflects the color dispersion inside the box. In this work, the cosine distance between the HSV histograms is employed as the metric of CC.
Stacking all the features yields a 20-dimensional feature vector (7 for edge and structure, 6 for pseudo distance, 1 for objectness score, 6 for the HSV color space).
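To make the color contrast feature concrete, the sketch below computes the cosine distance between the HSV histogram of a box and that of a surrounding ring. The bin count, the ring width `margin`, the per-channel histogram layout, and the assumption that HSV channels are scaled to [0, 1] are illustrative choices, not values given in the text.

```python
import numpy as np

def hsv_histogram(hsv_pixels, bins=8):
    """hsv_pixels: (N, 3) array with channels scaled to [0, 1].
    Returns the concatenated per-channel histogram."""
    hsv_pixels = np.asarray(hsv_pixels, dtype=float)
    hists = [np.histogram(hsv_pixels[:, ch], bins=bins, range=(0.0, 1.0))[0]
             for ch in range(3)]
    return np.concatenate(hists).astype(float)

def cosine_distance(h1, h2):
    """1 - cosine similarity; returns 0.0 if either histogram is empty."""
    denom = np.linalg.norm(h1) * np.linalg.norm(h2)
    return 1.0 - float(h1 @ h2) / denom if denom > 0 else 0.0

def color_contrast(hsv_image, box, margin=4):
    """box = (x0, y0, x1, y1) in integer pixels. Compares the box with a
    surrounding ring that is `margin` pixels wide (clipped to the image)."""
    h, w, _ = hsv_image.shape
    x0, y0, x1, y1 = box
    ox0, oy0 = max(0, x0 - margin), max(0, y0 - margin)
    ox1, oy1 = min(w, x1 + margin), min(h, y1 + margin)
    ring = np.ones((oy1 - oy0, ox1 - ox0), dtype=bool)
    ring[y0 - oy0:y1 - oy0, x0 - ox0:x1 - ox0] = False  # exclude the box itself
    inner = hsv_image[y0:y1, x0:x1].reshape(-1, 3)
    outer = hsv_image[oy0:oy1, ox0:ox1][ring]
    return cosine_distance(hsv_histogram(inner), hsv_histogram(outer))
```

A box whose colors match its surroundings gives a contrast near 0, while a box with a clearly different color distribution approaches 1.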
Obstacle-aware Regressor: As shown in Fig.5, the random forest consists of binary trees, and each tree consists of internal nodes and leaf nodes. An internal node splits the proposals that reach it and passes them to its left or right child node until a leaf node is reached. The reached leaf node stores a score, which is assigned to the input proposal. Using the feature vectors generated for training, our regressor learns to regress the overlap between the proposal and the ground truth.
As shown in Fig.6, the usage frequencies of most features are similar, which means that they are sufficiently discriminative for obstacles. The pseudo distance feature is used relatively frequently, indicating its important role in obstacle discovery. The intrinsic reason is that tiny obstacles at long distance have a color similar to that of the road area, so their color, objectness and structure properties differ from those of nearby obstacles. These appearance cues are constrained by the pseudo distance feature. For the obstacles, the combined use of all features is more discriminative than any single feature.
As shown in Fig.5, the training proposals that fall inside the same leaf node of the forest have similar appearance. It is observed that the distant lost cargo with square shape is assigned to the same leaf node. In addition, all the obstacle proposals have clearly higher scores than the proposals containing road area.
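Prediction: The prediction of the forest is formulated as the average of the tree outputs:
$$F(p) = \frac{1}{T} \sum_{t=1}^{T} f_t(p),$$
where $f_t(p)$ denotes the output of the $t$-th tree for proposal $p$ and $T$ is the number of trees; the score $F(p)$ of $p$ is the average of the outputs of all trees.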
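Obstacle Occupied Probability Map: The scores of all proposals in the proposal set $B$ are accumulated in the corresponding pixels to produce a probability map $M$:
$$M(x, y) = \frac{1}{Z} \sum_{p \in B} F(p)\, \mathbb{1}\big[(x, y) \in p\big],$$
where $(x, y)$ denotes the coordinate of a pixel, $F(p)$ is the score of proposal $p$, and $Z$ denotes the normalization term. If $(x, y)$ is inside $p$, the score $F(p)$ is summed into $M(x, y)$. Finally, the tiny obstacles at long distance obtain high probabilities, and more details are shown in the experiments.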
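The prediction and accumulation steps can be sketched together as below. The normalization by the maximum accumulated value is an assumption (the text only mentions a normalization term), and the function names are hypothetical.

```python
import numpy as np

def forest_score(tree_outputs):
    """Score of one proposal: the average of the per-tree outputs."""
    return float(np.mean(tree_outputs))

def occupancy_map(proposals, scores, image_shape):
    """Add each proposal's score to every pixel it covers, then normalize
    by the peak value so the map lies in [0, 1].
    proposals: iterable of integer boxes (x0, y0, x1, y1);
    image_shape: (height, width)."""
    prob = np.zeros(image_shape, dtype=float)
    for (x0, y0, x1, y1), s in zip(proposals, scores):
        prob[y0:y1, x0:x1] += s
    peak = prob.max()
    return prob / peak if peak > 0 else prob
```

Pixels covered by many high-scoring proposals end up with the highest probability, which is why plentiful overlapping proposals help the map follow the obstacle shape.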
IV-A Dataset and parameter setting
Our algorithm is validated by experiments on the Lost and Found dataset, i.e., the only publicly available dataset focusing on discovering small obstacles and lost cargo on the road. The dataset records 13 different challenging street scenarios and 37 different obstacles, and is split into a training subset and a testing subset, in which the obstacle types of the testing subset are more complex than those of the training subset. For the experimental parameters, the cluster number $n$ is set from 1 to 4 for the comparison of variants. To simplify the expression, Ours@n denotes the variant of our method with $n$ layers.
IV-B Evaluation metrics
Our method is evaluated with two metrics: the pixel-level metric and the instance-level metric.
Pixel-level Metric: Following prior work, the pixel-level Receiver-Operator-Characteristic (ROC) curve compares the True-Positive-Rate (TPR) against the False-Positive-Rate (FPR):
$$\mathrm{TPR} = \frac{TP}{N_{obs}}, \qquad \mathrm{FPR} = \frac{FP}{N_{road}},$$
where $TP$ denotes the number of correctly discovered obstacle pixels, and $FP$ denotes the number of road pixels that are incorrectly predicted as obstacle. $N_{obs}$ refers to the total number of pixels of the obstacle class, and $N_{road}$ to that of the road area. In this paper, 100 evenly spaced thresholds from 0 to 1 are used to segment the obstacle occupied probability map; the pixels above the threshold are labeled as obstacle.
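The pixel-level curve can be computed with a few lines of NumPy. This sketch assumes boolean ground-truth masks for the obstacle and road classes, both non-empty, and a strict "greater than threshold" rule; the function name is hypothetical.

```python
import numpy as np

def pixel_roc(prob_map, obstacle_mask, road_mask, n_thresholds=100):
    """Return (thresholds, TPR, FPR) for evenly spaced thresholds in [0, 1].
    obstacle_mask and road_mask are boolean arrays shaped like prob_map."""
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    n_obstacle = obstacle_mask.sum()
    n_road = road_mask.sum()
    tpr, fpr = [], []
    for t in thresholds:
        pred = prob_map > t  # pixels above the threshold are labeled obstacle
        tpr.append((pred & obstacle_mask).sum() / n_obstacle)
        fpr.append((pred & road_mask).sum() / n_road)
    return thresholds, np.array(tpr), np.array(fpr)
```

Reading the TPR at the threshold whose FPR is closest to a target value (for example 2%) gives the kind of operating point reported in the tables.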
Instance-level Metric: Three proposal metrics from prior work are used to compare the recall rate for obstacles. Firstly, the top 1000 proposals are taken and the IoU threshold ranges from 0.5 to 1. Secondly, the IoU threshold is set to 0.7 and the number of proposals ranges from 1 to 1000. Thirdly, the average recall (AR) over IoU thresholds from 0.5 to 1 is reported, with the number of proposals ranging from 10 to 1000.
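The three instance-level measurements reduce to one recall function evaluated under different settings. The sketch below assumes that proposals are already sorted by descending score and approximates AR by averaging recall over IoU thresholds from 0.5 to 1.0 in steps of 0.05; the step size and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def recall_at(proposals, gt_boxes, iou_threshold, top_k):
    """Fraction of ground-truth boxes matched by at least one of the
    top_k proposals at the given IoU threshold."""
    top = proposals[:top_k]
    hit = sum(1 for g in gt_boxes
              if any(iou(p, g) >= iou_threshold for p in top))
    return hit / len(gt_boxes)

def average_recall(proposals, gt_boxes, top_k, thresholds=None):
    """Mean recall over a grid of IoU thresholds (0.50, 0.55, ..., 1.00)."""
    if thresholds is None:
        thresholds = [(50 + 5 * i) / 100 for i in range(11)]
    recalls = [recall_at(proposals, gt_boxes, t, top_k) for t in thresholds]
    return sum(recalls) / len(recalls)
```

Sweeping `iou_threshold` with `top_k=1000`, sweeping `top_k` with `iou_threshold=0.7`, and sweeping `top_k` in `average_recall` correspond to the three comparisons described above.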
IV-C Quantitative results
The comparison between the variants of our method on the pixel-level metric is shown in Table II. Note that Ours@1 has insufficient proposals when FPR is larger than 1.5%. Ours@4 performs favorably against the other variants and achieves an accuracy of 85% when FPR is 2.1%. As for the instance-level metric, the comparison between the variants is shown in Fig.7 and Table I. It is observed that Ours@3 and Ours@4 provide the best results in all experiments, and there is a significant gap between Ours@1 and the other variants. The reason is that Ours@1 and Ours@2 hardly discover the tiny obstacles at long distance, while Ours@3 and Ours@4 use the layers revealing long distance to address this issue. As the best variant, Ours@4 is used for the comparisons with the state-of-the-art methods below.
Using the same pixel-level metric and dataset, Table IV compares our method with other obstacle discovery methods. When FPR is fixed to 2%, our method achieves 16% and 17% accuracy improvements over PHT-CStix and FPHT-CStix, respectively. Similarly, when FPR is lower, our method achieves a considerable improvement in accuracy over these two methods. MergeNet uses deep learning to discover obstacles and achieves an accuracy of 85% when FPR is 2.0%. Although our method is not based on deep learning, it achieves a comparable result.
Another quantitative result is shown in Fig.8 and Table III, which compare our method with existing proposal extraction methods on the instance-level recall rate. In Fig.8(a), given 1000 proposals, when the threshold of IoU overlap between proposals and ground truth is fixed to 0.5, our method achieves an obvious improvement in recall rate over Edge boxes. Meanwhile, our recall rate is also higher than that of MCG and OLP. In Fig.8(b), when the threshold of IoU overlap is fixed to 0.7, our method always obtains the highest recall rate for different numbers of proposals. In Fig.8(c), for average recall (AR) versus the number of proposals, our method is also the best. Two reasons lead to this result. Firstly, the objectness scoring functions in the compared methods are too simple to express tiny obstacles on the road. Secondly, the weak cues of tiny obstacles at long distance lead to a lower likelihood of discovering them.
IV-D Qualitative results
Fig.9 depicts qualitative results of our method on three challenging scenarios from the testing subset. The left column shows cargo discarded in the shadow of buildings. The middle column shows a bobby car parked on a chalk-marked street. The rightmost column shows a baby on the bobby car. Each obstacle in these scenarios is very far from the camera. In the left column, our method completely detects the distant obstacles while maintaining a low false positive rate in the shadow area. In the middle column, our method avoids detecting the ground textures and tree shadows as obstacles. In the rightmost column, the irregularly shaped obstacle is not contained in the training set. All the obstacles are successfully discovered. Furthermore, the probability map fits the shape of the obstacles by accumulating plentiful proposals.
In this paper, a novel obstacle discovery method is introduced. The method proposes a multi-layer framework that uses the pseudo distance to produce a set of novel obstacle-aware occlusion edge maps. Proposals are extracted from the occlusion edge maps of all layers, so that as many tiny obstacles as possible are enclosed. In addition, an obstacle-aware regressor, which fuses the pseudo distance features, is built to find the obstacle proposals. Extensive experiments validate the effectiveness of the proposed method.
-  P. Pinggera, S. Ramos, S. Gehrig, U. Franke, C. Rother, and R. Mester, “Lost and found: detecting small road hazards for self-driving vehicles,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
-  R. Manduchi, A. Castano, A. Talukder, and L. Matthies, “Obstacle detection and terrain classification for autonomous off-road navigation,” in Autonomous Robots, vol. 18, no. 1, 2005, pp. 81–102.
-  K. Gupta, S. A. Javed, V. Gandhi, and K. M. Krishna, “Mergenet: A deep net architecture for small obstacle discovery,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
-  S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, “Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling,” in IEEE Intelligent Vehicles Symposium (IV), 2017.
-  G. Prabhakar, B. Kailath, S. Natarajan, and R. Kumar, “Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving,” in IEEE Region 10 Symposium, 2017.
-  L. Mao, M. Xie, Y. Huang, and Y. Zhang, “Preceding vehicle detection using histograms of oriented gradients,” in International Conference on Communications, Circuits and Systems, 2010.
-  A. Ming, T. Wu, J. Ma, F. Sun, and Y. Zhou, “Monocular depth-ordering reasoning with occlusion edge detection and couple layers inference,” in IEEE Intelligent Systems (IS), vol. 31, no. 2, 2016, pp. 54–65.
-  Y. Lu and L. Shapiro, “Closing the loop for edge detection and object proposals,” in AAAI Conference on Artificial Intelligence (AAAI), 2017.
-  Y. Zhou, J. Ma, A. Ming, and X. Bai, “Learning training samples for occlusion edge detection and its application in depth ordering inference,” in International Conference on Pattern Recognition (ICPR), 2018.
-  Y. Zhou, X. Bai, W. Liu, and L. J. Latecki, “Similarity fusion for visual tracking,” International Journal of Computer Vision (IJCV), vol. 118, no. 3, pp. 337–363, 2016.
-  M. Zhou, J. Ma, A. Ming, and Y. Zhou, “Objectness-aware tracking via double layer model,” in IEEE International Conference on Image Processing (ICIP), 2018.
-  Y. Zhou, X. Bai, W. Liu, and L. J. Latecki, “Fusion with diffusion for robust visual tracking,” in Neural Information Processing Systems (NIPS), 2012.
-  Y. Zhou, Y. Yang, Y. Meng, X. Bai, W. Liu, and L. J. Latecki, “Online multiple targets detection and tracking from mobile robot in cluttered indoor environments with depth camera,” International Journal of Pattern Recognition & Artificial Intelligence (IJPRAI), vol. 28, no. 01, pp. 798–805, 2014.
-  Y. Zhou and A. Ming, “Human action recognition with skeleton induced discriminative approximate rigid part model,” Pattern Recognition Letters, vol. 83, pp. 261–267, 2016.
-  J. Ma, J. Zhao, J. Jiang, H. Zhou, Y. Zhou, Z. Wang, and X. Guo, “Visual homing via guided locality preserving matching,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
-  S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 39, no. 6, 2015, pp. 1137–1149.
-  C. L. Zitnick and P. Dollar, “Edge boxes: Locating object proposals from edges,” in European Conference on Computer Vision (ECCV), 2014.
-  P. Dollar and C. L. Zitnick, “Structured forests for fast edge detection,” in IEEE International Conference on Computer Vision (ICCV), 2013.
-  P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 33, no. 5, 2011, pp. 898–916.
-  J. Ma, A. Ming, Z. Huang, X. Wang, and Y. Zhou, “Object-level proposals,” in IEEE International Conference on Computer Vision (ICCV), 2017.
-  A. Criminisi, J. Shotton, and E. Konukoglu, Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Now Publishers Inc, 2012.
-  B. Alexe, T. Deselaers, and V. Ferrari, “What is an object?” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
-  P. Arbelaez, J. Ponttuset, J. Barron, F. Marques, and J. Malik, “Multiscale combinatorial grouping,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.