In the past few years, autonomous driving has attracted tremendous attention and developed rapidly. Vehicle-mounted sensors, such as LiDAR, radar, and video cameras, are extensively utilized in environmental perception tasks ranging from object detection and tracking to semantic segmentation and lane and curb detection for autonomous driving applications. Recently, a variety of benchmark datasets have been proposed to satisfy the demands of algorithm evaluation and testing. For instance, [Geiger2012, Caesar2020, Sun2020, Wang2019, Cordts2015] collected large amounts of camera and LiDAR data for object detection, tracking and segmentation, and [Tusimple2017], [Xingang2018] were released for lane line detection based on video camera data. However, there are relatively few public benchmarks or datasets available for LiDAR-based curb detection, which plays a critical role in road environmental perception. We aim to address this gap and concentrate on how to build such a curb dataset efficiently.
Existing curb labeling methods (e.g. [Chen2015, Zhang2018, Liang2019]) are mostly manual. In [Chen2015], Chen et al. built a curb dataset containing 2,934 LiDAR scans in various urban scenes, 566 of which were labeled manually. Zhang et al. [Zhang2018] collected about 200 scans in five different scenarios and manually labeled the curbs in each frame. Recently, [Younghwa2021] built and released a curb dataset consisting of about 5,200 scans with BEV labels and encoded images.
Labeling curbs manually is inefficient, costly and error-prone, especially in LiDAR point clouds. Furthermore, due to the sparsity of faraway point clouds and occlusion by road users, labeling curbs in single LiDAR frames often suffers from partial observations, which limits the useful information such labels provide for training DNN-based curb detection methods.
In this paper, we propose an efficient two-stage curb labeling method for LiDAR data. Benefiting from consecutive multi-frame LiDAR data and a curb instance (CI) map, both visible and occluded curbs are labeled simultaneously. The contributions of this paper can be summarized as follows:
We propose an efficient two-stage curb labeling method which can label LiDAR data with point-wise and instance-wise annotations.
We present an annotated curb dataset of LiDAR sequences based on [Behley2019].
We perform curb instance segmentation and semantic segmentation on the labeled dataset and the curb annotations are validated.
II Related Works
II-A Road Map Generation
Curbs and lane lines are two essential components for road map generation on structured roads (urban roads and highways). Benefiting from the high color contrast of lane markers relative to the road surface, most road map generation methods rely on lane line detection using video cameras. [Jeong2017] proposed a Road-SLAM algorithm for road marking mapping and localization. In [Jang2018], Jang et al. proposed an automatic HD map generation algorithm with a monocular camera. Qin et al. [Qin2021] proposed a light-weight mapping and localization solution, which consists of on-vehicle mapping, on-cloud mapping and user-end localization. In contrast, curbs are irregular in color but exhibit robust consistency in spatial distribution, distinct from backgrounds. Inspired by this, some methods take advantage of LiDAR's 3D point clouds to detect curbs and build curb maps. He et al. [He2018] proposed a vector-based road structure mapping method using multi-beam LiDAR, with polylines as the primary mapping element. In [Wang2017], a robust road shape model was proposed and a Gaussian process (GP) was employed to generate smooth curves. [Darms2010] presented two approaches for estimating a road boundary map using a radar sensor and a video camera.
II-B Curb Detection
LiDAR-based curb detection methods can be divided into two categories: traditional methods and DNN-based methods. Generally, the traditional methods use hand-crafted features to extract candidate points, which are subsequently clustered and fitted to obtain parameterized curb results. Due to their convenient deployment on computing platforms, most LiDAR-based curb detection methods applied in autonomous driving systems are still traditional ones (e.g. [Chen2015], [Hata2014], [Zhang2018]). However, traditional methods fail in some complex scenarios, such as cross-roads, roundabouts and low urban curbs with a small height difference above the road surface. Moreover, the hand-crafted features involve a large number of hyperparameters and cannot adapt to different scenarios. DNN-based methods are promising for overcoming these constraints, but only a few works (e.g. [Suleymanov2019], [Younghwa2021]) have been published, due to the lack of datasets with LiDAR data for curb detection.
III Proposed Method
Generally, curbs are the boundaries between a road area and non-road areas such as sidewalks or vegetation. A curb instance is a boundary with continuous spatial distribution along the road, and typically, there are two curb instances on a straight road, and four on cross-roads.
The framework of our two-stage curb labeling method is shown in Fig. 3. A curb instance (CI) map is generated from the LiDAR data sequence and synchronized pose data in the first stage, and then projected back to each LiDAR frame to label curbs in the second stage. The raw LiDAR data and pose data used in the examples and illustrations in this paper are from the dataset in [Behley2019], but our labeling method is general enough to apply to other similar datasets.
As mentioned above, curbs in single-frame LiDAR point clouds are commonly only partially observed in some complex scenarios, due to occlusion or point cloud sparsity. As shown in Fig. 2(a), curbs on the left and right of the road area are occluded by three static cars. Fortunately, curbs are static and smooth; after multi-frame point clouds are accumulated using pose data, as shown in Fig. 2(b)-(c), the boundaries between roads and sidewalks become more complete on account of multiple observations from varied perspectives. Therefore, labeling curbs in multi-frame LiDAR data or in a SHD map can promisingly achieve more accurate results while also being highly efficient.
III-B Stage 1: Curb Instance Map Generation
In this stage, we generate a global CI map from LiDAR data and synchronized pose data to prepare for curb labeling over a LiDAR data sequence. It is not advisable to simply use the pose data to superimpose the multi-frame LiDAR data, as this leads to global inconsistency for a data sequence with loop closures. As mentioned in [Behley2019], revisited streets have different heights if multiple scans are naively superimposed on each other.
III-B1 RHD Map Generation
In the first stage of our method, a semantic segmentation network [Gerdzhev2020] and a LiDAR-based SLAM system [Pan2021] are employed to build a SHD map, as shown in Fig. 1(a). The semantic segmentation network in [Gerdzhev2020] divides raw point clouds into 20 categories; the categories of road users such as vehicles and pedestrians are removed in the RHD map since they are irrelevant for extracting curbs, as shown in Fig. 1(b). We then subdivide the RHD map into several RHD sub-maps in order to parallelize subsequent map processing and labeling.
III-B2 Curb Candidate Points Extraction
For each RHD sub-map, an empty 2D grid map is created and the sub-map's point cloud is projected into it for fast extraction of curb candidate points. Cells in the grid map are divided into four categories: road cells, non-road cells, curb cells and unknown cells. A road cell contains only road points, a non-road cell contains only non-road points, and an unknown cell contains no points. A curb cell must contain both road points and non-road points. However, points from trunks and vegetation often overhang the road area, as shown in Fig. 4. Hence, an extra height condition is introduced for curb cells: the height distributions of road points and non-road points in each cell must be similar. Only cells meeting both of the above conditions are curb cells, and the points in these curb cells are curb candidate points.
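As a concrete illustration, the cell classification above can be sketched as follows. The grid resolution, the height threshold, the use of mean height as the distribution summary, and the handling of mixed cells that fail the height condition are assumptions for this sketch, not the paper's exact implementation.

```python
from collections import defaultdict

CELL = 0.2          # assumed grid resolution
HEIGHT_DIFF = 0.3   # assumed max gap between mean road / non-road heights

def classify_cells(points):
    """points: iterable of (x, y, z, label), label in {"road", "non-road"}.
    Returns {cell: category}; cells absent from the result are 'unknown'."""
    cells = defaultdict(lambda: {"road": [], "non-road": []})
    for x, y, z, label in points:
        cell = (int(x // CELL), int(y // CELL))
        cells[cell][label].append(z)

    categories = {}
    for cell, heights in cells.items():
        road, non_road = heights["road"], heights["non-road"]
        if road and non_road:
            mean_r = sum(road) / len(road)
            mean_n = sum(non_road) / len(non_road)
            # curb cell: both classes present AND similar height distributions;
            # a failing cell (e.g. vegetation overhanging the road) is treated
            # as non-road here -- an assumption of this sketch.
            categories[cell] = ("curb" if abs(mean_r - mean_n) < HEIGHT_DIFF
                                else "non-road")
        elif road:
            categories[cell] = "road"
        else:
            categories[cell] = "non-road"
    return categories
```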
III-B3 Curb Points Growing for Aggregation
Curb candidate points in each RHD sub-map are unordered, and a clustering algorithm is needed to generate curb instances. Since the distribution of curb candidate points is narrow and uneven, classical clustering algorithms, such as K-means [MacQueen1967] and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [Ester1996], often mistakenly divide a curb instance into several clusters or discard sparse curb candidate points as noise. Therefore, we design an algorithm based on the K-nearest neighbors (KNN) algorithm [Altman1992] to simultaneously cluster and sort curb candidate points. The detailed procedure is shown in Alg. 1. A dual growth strategy is designed into the iterations of curb growing. As shown in Fig. 5, if there is no curb candidate point within the first valid range due to obstruction by a parked car or other obstacles, the second growth takes effect, with a larger growing range but a narrower valid range than the first growth. Once the growing iteration of a curb completes, one point array is flipped and combined with the others as the final ordered point array of that curb instance. After the procedure in Alg. 1, all curb candidate points are clustered into multiple sets of curb points, and the points in each set are ordered.
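The dual growth strategy can be sketched roughly as follows, growing in one direction only (the full method also grows backward from a seed and flips and concatenates the resulting arrays). The ranges and valid angles are hypothetical placeholders, not the parameters of Alg. 1.

```python
import math

# Hypothetical (range, valid half-angle) pairs for the two growth attempts:
# the second growth has a larger range but a narrower valid angle.
GROWTHS = [(1.0, math.radians(90.0)),
           (3.0, math.radians(30.0))]

def grow_curb(points, seed_idx):
    """Order curb candidate points into one curb instance from seed_idx.
    points: list of (x, y); returns point indices in growing order."""
    used = {seed_idx}
    order = [seed_idx]
    direction = None                     # heading of the last accepted step
    while True:
        cx, cy = points[order[-1]]
        best = None
        for rng, half_angle in GROWTHS:  # fall back to the second growth
            for i, (px, py) in enumerate(points):
                if i in used:
                    continue
                dx, dy = px - cx, py - cy
                d = math.hypot(dx, dy)
                if d == 0 or d > rng:
                    continue
                if direction is not None:
                    ang = abs(math.atan2(dy, dx) - direction)
                    ang = min(ang, 2 * math.pi - ang)
                    if ang > half_angle:
                        continue          # outside the valid angle range
                if best is None or d < best[0]:
                    best = (d, i, math.atan2(dy, dx))
            if best is not None:
                break                     # this growth succeeded
        if best is None:
            return order                  # nothing reachable: curb complete
        _, i, ang = best
        used.add(i)
        order.append(i)
        direction = ang
```

With the placeholder parameters above, a gap of 2.5 between consecutive points exceeds the first range but is bridged by the second growth.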
Curb candidate points extraction and curb points growing for aggregation can be parallelized across the subdivided RHD sub-maps. When an RHD map is subdivided into multiple sub-maps, curbs crossing adjacent sub-maps are split into several pieces. Therefore, while merging curb points from all sub-maps into one global CI map, it is necessary to integrate the split pieces and re-number all curb point sets. The split curbs to be integrated should meet the following conditions: the distance between the two endpoints in adjacent sub-maps must be less than a distance threshold, and the orientation angle between the curbs must be less than an angle threshold.
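The merging condition can be sketched as below; the threshold values and the use of end-to-end headings as each curb's orientation are assumptions of this sketch.

```python
import math

# Hypothetical thresholds for merging split curbs across adjacent sub-maps.
MAX_ENDPOINT_DIST = 1.0        # distance between the two facing endpoints
MAX_ANGLE = math.radians(20)   # orientation difference between the two curbs

def _heading(curb):
    """Orientation of an ordered curb polyline, first point to last point."""
    (x0, y0), (x1, y1) = curb[0], curb[-1]
    return math.atan2(y1 - y0, x1 - x0)

def should_merge(curb_a, curb_b):
    """curb_a ends near where curb_b starts (ordered lists of (x, y))."""
    (ax, ay), (bx, by) = curb_a[-1], curb_b[0]
    if math.hypot(bx - ax, by - ay) > MAX_ENDPOINT_DIST:
        return False
    diff = abs(_heading(curb_a) - _heading(curb_b))
    diff = min(diff, 2 * math.pi - diff)   # wrap to [0, pi]
    return diff <= MAX_ANGLE

def merge(curb_a, curb_b):
    """Concatenate the two ordered point arrays into one curb instance."""
    return curb_a + curb_b
```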
III-C Stage 2: Curb Labeling in Each LiDAR Frame
III-C1 Coarse Curb Extraction
The CI map is in a global coordinate frame, and it is necessary to transform it to each LiDAR coordinate frame using the corrected pose data. The transformation is presented in Eq. 1: with the rotation and translation of the k-th frame in the pose data, the curb points of the CI map within a circular range around the position of the LiDAR are transformed into the k-th LiDAR frame as the curb annotations for that frame. The circular range is set slightly larger than the ROI of the LiDAR.
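With hypothetical symbols (the original Eq. 1 is not reproduced here), the transformation might take the form

```latex
\tilde{\mathcal{C}}_k \;=\; \bigl\{\, R_k^{-1}\,(p - t_k) \;\bigm|\; p \in \mathcal{M},\ \lVert p - t_k \rVert < r \,\bigr\}
```

where \(\mathcal{M}\) denotes the CI map, \(R_k\) and \(t_k\) the rotation and translation of the k-th frame in the pose data, and \(r\) the circular range slightly larger than the LiDAR's ROI.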
If the transformed annotations contain multiple curb instances in total, we describe these curb instances as in Eq. 2.
III-C2 Fine Curb Extraction
In this step, we determine the lengths and endpoints of curb instances according to the distributions of road points and curb-related (sidewalk or vegetation) points in each single-frame LiDAR point cloud. Eq. 3 is used to judge whether a coarsely extracted curb point satisfies the condition of fine extraction: for each curb point, the numbers of road-category points and curb-related-category points within circular ranges of given radii around it are counted and combined with a weighting factor into a score.
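One plausible form of the Eq. 3 score is sketched below; the radii, the weighting factor, and the combination rule (taking the minimum so that both categories must be present nearby) are assumptions, since the paper's exact formula is not reproduced here.

```python
import math

# Hypothetical parameters of the fine-extraction score.
R_ROAD, R_CURB = 1.0, 1.0   # radii of the circular ranges around a curb point
W = 1.0                     # weighting factor between the two counts

def _count_within(cloud, center, radius):
    """Number of 2D points in `cloud` within `radius` of `center`."""
    cx, cy = center
    return sum(1 for x, y in cloud if math.hypot(x - cx, y - cy) <= radius)

def point_score(curb_pt, road_pts, curb_related_pts):
    """Score of one coarse curb point from nearby road / curb-related points.
    Both categories must be present for a non-zero score (assumed rule)."""
    n_road = _count_within(road_pts, curb_pt, R_ROAD)
    n_rel = _count_within(curb_related_pts, curb_pt, R_CURB)
    return min(n_road, W * n_rel)
```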
Then Eq. 4 is used for fine curb extraction: the curb points whose scores exceed a score threshold form an index set, and the minimum and maximum indexes in this set determine the endpoints of the fine-extracted curb annotation in the k-th frame. Determining the lengths and endpoints of curb annotations by these extreme indexes keeps the curb annotations continuous. Furthermore, we utilize segmental spline curves to fit the curb annotations and re-sample them at equal intervals to obtain the final curb annotations.
IV Experiments
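The final re-sampling step can be sketched as follows; linear interpolation along the polyline stands in for the paper's segmental spline fit, to keep the example dependency-free.

```python
import math

def resample_polyline(pts, interval):
    """Re-sample an ordered curb polyline at equal arc-length intervals
    using linear interpolation between consecutive points."""
    if len(pts) < 2:
        return list(pts)
    out = [pts[0]]
    carry = 0.0   # arc length already covered toward the next sample
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        t = interval - carry          # arc position of next sample on segment
        while t <= seg:
            r = t / seg
            out.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
            t += interval
        carry = seg - (t - interval)  # leftover distance into the next segment
    return out
```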
Our curb dataset supports both semantic segmentation and instance segmentation, and the performance of these tasks, in turn, helps confirm the validity of our dataset. We generate our curb dataset from the SemanticKITTI dataset [Behley2019], a large-scale dataset widely used for semantic segmentation with LiDAR data. It consists of 22 sequences, with sequences 00-10 as the training set and 11-21 as the test set. The file format of our curb dataset follows [Behley2019], and the curbs in sequences 00-10 are labeled by our labeling method, totaling 23,201 frames, 55,013 curb instances, and 23,149,310 curb points.
IV-A Build a Curb Dataset
IV-A1 Stage 1
In the curb candidate points extraction, the resolution of the 2D grid map is set to 0.2 × 0.2. In each cell, we use the average height of points to represent the height distribution, and the threshold on the difference between the average heights of road points and non-road points is set to 0.3. In the curb points growing for aggregation, values are set for the point number threshold, the first growing range (the initial searching range) and its valid angle range, and the second growing range and its valid angle range.
IV-A2 Stage 2
Values are also set for the coarse extraction range and the parameters in fine curb extraction. Some examples of curb annotations with raw point clouds in BEV representation are shown in Fig. 9.
IV-B Curb Instance Segmentation
We formulate curb instance segmentation as a pixel-wise segmentation task and present a curb instance segmentation network. As shown in Fig. 7, U-net [Ronneberger2015] is adopted as the basic architecture. With the preprocessing of LiDAR data and curb labels in [Younghwa2021], each single-frame point cloud is encoded as a density image and multiple sliced height images in BEV representation. Curb annotations are projected into images as labels, and the curb pixels are dilated to balance the proportion of positive and negative pixels for training. Furthermore, an embedding decoding layer is utilized for curb instance segmentation, following the embedding branch in [Neven2018]. After masking the pixels of the embedding layer with the binary segmentation result from the segmentation layer, DBSCAN is used to cluster the embedded points for curb instance extraction.
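The curb-pixel dilation used to balance positive and negative pixels can be sketched as a plain binary dilation; the square kernel shape is an assumption of this sketch.

```python
def dilate(mask, k):
    """Binary dilation of curb label pixels with a k x k square kernel (k odd).
    mask: 2D list of 0/1; returns a new 2D list of the same size."""
    h, w = len(mask), len(mask[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # paint the k x k neighborhood, clipped to the image bounds
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    out[yy][xx] = 1
    return out
```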
We evaluate the performance of curb binary segmentation on our dataset and the datasets in [Younghwa2021], [Suleymanov2019]. The binary segmentation part of our instance segmentation network shares the same basic architecture with theirs. Table I shows that our curb dataset performs as well as the other manually labeled datasets. The resolution of the BEV representation is set to 0.1, and a tolerance of 1 pixel is used, which means that only segmented curb pixels located at a distance of less than 0.1 with respect to the curb labels are considered true positives.
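The tolerance-based evaluation can be sketched as follows; computing recall symmetrically over the label pixels is an assumption of this sketch.

```python
import math

def evaluate(pred_pixels, label_pixels, tol=1.0):
    """Precision/recall/F1 with a pixel tolerance: a predicted curb pixel is
    a true positive if some label pixel lies within `tol` pixels of it."""
    def near(p, pts):
        return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= tol for q in pts)
    tp_pred = sum(1 for p in pred_pixels if near(p, label_pixels))
    tp_label = sum(1 for q in label_pixels if near(q, pred_pixels))
    precision = tp_pred / len(pred_pixels) if pred_pixels else 0.0
    recall = tp_label / len(label_pixels) if label_pixels else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```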
Table I. (Columns: Dataset, Precision, Recall, F-1 score, Image size, Tolerance.)
To ensure a fair comparison with other datasets, we conduct experiments using only sequence 00 (4,541 frames) of our curb dataset. A portion of the frames in sequence 00 is randomly selected as the validation set, and the rest are used as the training set. To train the network, the pixel-wise binary cross-entropy loss is employed as the semantic loss function and the discriminative loss [DeBrabandere2017] is used in our instance loss function.
To investigate the effect of how the training and validation sets are partitioned in curb binary segmentation, a comparative experiment is conducted on sequence 00 with different partition approaches, as shown in Table II, with the curb dilating kernel size set to 7 pixels. In the front-back partition approach, the first 9/10 of the frames in the sequence are selected as the training set and the rest as the validation set. Since randomly selected validation frames have high similarity with their adjacent frames in the training set, the random partition approach performs better in this experiment.
Furthermore, the impact of different dilating kernel sizes is analyzed in Table III. The resolution of the BEV representation is set to 0.1 and the partition approach is set to random partition. A larger kernel size results in a more favorable balance of positive and negative pixels, but greater position errors in the binary segmentation results.
Finally, the performance of curb instance segmentation is shown in Table IV. The partition approach is set to random partition and the curb dilating kernel size is set to 7 pixels. True positive curb instances are those with IoU values beyond the IoU threshold. Curb instance segmentation on our dataset achieves promising results, and a higher IoU threshold leads to lower precision and recall.
IV-C Semantic Segmentation with Curbs
Semantic segmentation of LiDAR data is a point-wise classification task. Raw point clouds are projected into the dilated label image in Fig. 9, and points that fall on positive pixels are labeled as the curb category. These points were previously classified as the road, sidewalk or vegetation category, but are now reclassified as curb. The semantic segmentation network [Gerdzhev2020] is employed again to train on the data with curbs. Ground truths and predictions are shown in Fig. 10. Following the original semantic segmentation challenge in [Behley2019], we train the network on sequences 00-07 and 09-10, and evaluate on sequence 08. As shown in Table V, segmentation results on the dataset with the curb category achieve comparable performance with the original dataset (20 categories), validating our annotations for the semantic segmentation task.
V Conclusion
In this paper, we presented an efficient two-stage curb labeling method which generates point-wise and instance-wise curb annotations on LiDAR data. A range of baseline experiments on both instance segmentation and semantic segmentation have been implemented and evaluated on a dataset generated by our proposed method. The generated curb dataset has been released.