How to Build a Curb Dataset with LiDAR Data for Autonomous Driving

10/08/2021
by Dongfeng Bai, et al.

Curbs are essential elements of urban and highway traffic environments. Robust curb detection provides road structure information for motion planning in an autonomous driving system. Commonly, video cameras and 3D LiDARs are mounted on autonomous vehicles for curb detection. However, camera-based methods suffer under challenging illumination conditions. For a long period before the wide application of Deep Neural Networks (DNNs) to point clouds, LiDAR-based curb detection methods relied on hand-crafted features, which perform poorly in some complex scenes. Recently, DNN-based dynamic object detection using LiDAR data has become prevalent, while few works pay attention to curb detection with a DNN approach due to the lack of labeled data. A dataset with curb annotations, or an efficient curb labeling approach, is hence in high demand...




I Introduction

Fig. 1: In the first stage of our labeling method, a CI map is generated from consecutive LiDAR frames. (a) SHD map: generated by a SLAM framework and a semantic segmentation network. (b) RHD map: generated by removing the dynamic noise in (a). Blue pixels are road areas, green pixels are non-road areas, and red ones are parking areas. (c) CI map: output of the first stage; each curb instance is shown in a different color. (d) and (e) are zoom-in views of the circular regions in (a) and (b).

In the past few years, autonomous driving has attracted tremendous attention and developed rapidly. Vehicle-mounted sensors, such as LiDAR, radar and video cameras, are extensively utilized in environmental perception tasks ranging from object detection and tracking to semantic segmentation and lane and curb detection. Recently, a variety of benchmark datasets have been proposed to satisfy the demands of algorithm evaluation and testing. For instance, [Geiger2012, Caesar2020, Sun2020, Wang2019, Cordts2015] collected large amounts of camera and LiDAR data for object detection, tracking and segmentation, and [Tusimple2017], [Xingang2018] were released for lane line detection based on video camera data. In contrast, there are relatively few public benchmarks or datasets available for LiDAR-based curb detection, which plays a critical role in road environment perception. We aim to address this gap and concentrate on how to build such a curb dataset efficiently.

Existing curb labeling methods (e.g. [Chen2015, Zhang2018, Liang2019]) are mostly manual. In [Chen2015], Chen et al. built a curb dataset containing 2,934 LiDAR scans of various urban scenes, of which 566 scans were labeled manually. Zhang et al. [Zhang2018] collected about 200 scans in five different scenarios and manually labeled the curbs in each frame. Recently, [Younghwa2021] built and released a curb dataset consisting of about 5,200 scans with BEV labels and encoded images.

Labeling curbs manually is inefficient, costly and error-prone, especially in LiDAR point clouds. Furthermore, due to the sparsity of faraway point clouds and occlusion by road users, curbs labeled in single LiDAR frames are often only partially observed, which limits the useful information such labels provide for training DNN-based curb detection methods.

In this paper, we propose an efficient two-stage curb labeling method for LiDAR data. By exploiting consecutive multi-frame LiDAR data and a CI map, both visible and occluded curbs are labeled simultaneously. The contributions of this paper can be summarized as follows:

  • We propose an efficient two-stage curb labeling method which can label LiDAR data with point-wise and instance-wise annotations.

  • We present an annotated curb dataset of LiDAR sequences based on [Behley2019].

  • We perform curb instance segmentation and semantic segmentation on the labeled dataset, validating the curb annotations.

II Related Works

II-A Road Map Generation

Curbs and lane lines act as two essential components for road map generation on structured roads (urban roads and highways). Benefiting from the high color contrast of lane markers relative to the road surface, most road map generation methods rely on lane line detection using video cameras. [Jeong2017] proposed a Road-SLAM algorithm for road marking mapping and localization. In [Jang2018], Jang et al. proposed an automatic HD map generation algorithm with a monocular camera. Qin et al. [Qin2021] proposed a lightweight mapping and localization solution, which consists of on-vehicle mapping, on-cloud mapping and user-end localization. In contrast, curbs are irregular in color but exhibit robust consistency in spatial distribution, distinct from backgrounds. Inspired by this, some methods take advantage of LiDAR's 3D point clouds to detect curbs and build curb maps. He et al. [He2018] proposed a vector-based road structure mapping method using multi-beam LiDAR, with polylines as the primary mapping element. In [Wang2017], a robust road shape model was proposed and a Gaussian process (GP) was employed to generate smooth curves. [Darms2010] presented two approaches for estimating a road boundary map using a radar sensor and a video camera.

II-B Curb Detection

LiDAR-based curb detection methods can be divided into two categories: traditional methods and DNN-based methods. Generally, traditional methods use hand-crafted features to extract candidate points, which are subsequently clustered and fitted to obtain parameterized curb results. Owing to convenient deployment on computing platforms, most LiDAR-based curb detection methods applied in autonomous driving systems are still traditional ones (e.g. [Chen2015], [Hata2014], [Zhang2018]). However, traditional methods fail in some complex scenarios, such as cross-roads, roundabouts and low urban curbs whose height difference above the road surface is small. Moreover, the hand-crafted features involve a large number of hyperparameters and cannot adapt to different scenarios. DNN-based methods are promising to overcome these constraints, but only a few works (e.g. [Suleymanov2019], [Younghwa2021]) have been published, due to the lack of datasets with LiDAR data for curb detection.

Fig. 2: An example of occluded curbs, marked with red rectangles. (a) Single-frame LiDAR data is too sparse to label the occluded curbs. (b) After superimposing multi-frame data, curbs are more complete and blind areas are smaller. (c) Blind areas in the corresponding SHD map are similar to (b).
Fig. 3: Overview of our proposed two-stage curb labeling method. The input data includes a LiDAR data sequence and synchronized pose data. An SHD map is generated by a LiDAR-based SLAM framework together with a semantic segmentation network. A CI map is then generated by the four steps of Stage 1 and subsequently projected back to single frames for curb labeling in Stage 2. The synchronized pose data is used and corrected in the SLAM framework, and the corrected pose data is utilized in the projection to keep consistency between the CI map and the raw LiDAR data. Coarse curb extraction and fine curb extraction are employed to collect and fine-tune curb annotations, respectively.

III Proposed Method

III-A Motivation

Generally, curbs are the boundaries between a road area and non-road areas such as sidewalks or vegetation. A curb instance is a boundary with continuous spatial distribution along the road; typically, there are two curb instances on a straight road and four at a cross-road.

The framework of our two-stage curb labeling method is shown in Fig. 3. A CI map is generated from the LiDAR data sequence and synchronized pose data in the first stage, and then projected back to each LiDAR frame to label curbs in the second stage. The raw LiDAR data and pose data used in the examples and illustrations presented in this paper come from the dataset in [Behley2019], but our labeling method is general enough to apply to other similar datasets.

As mentioned above, curbs in single-frame LiDAR point clouds are commonly only partially observed in some complex scenarios, due to occlusion or point cloud sparsity. As shown in Fig. 2(a), curbs on the left and right of the road area are occluded by three static cars. Fortunately, curbs are immovable and smooth; after multi-frame point clouds are accumulated using pose data, as shown in Fig. 2(b)-(c), the boundaries between roads and sidewalks become more complete thanks to multiple observations from varied perspectives. Therefore, labeling curbs in multi-frame LiDAR data or in an SHD map promises more accurate results as well as higher efficiency.
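As a concrete illustration, the accumulation step can be sketched as follows, assuming KITTI-style 4x4 LiDAR-to-global pose matrices; the function and variable names are ours, not from the paper.

    import numpy as np

    def accumulate_frames(scans, poses):
        """Superimpose per-frame point clouds into one global cloud.

        scans: list of (N_i, 3) arrays, each in its own LiDAR frame.
        poses: list of (4, 4) LiDAR-to-global transforms, one per scan
               (e.g. SLAM-corrected poses).
        """
        merged = []
        for pts, T in zip(scans, poses):
            homo = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
            merged.append((homo @ T.T)[:, :3])               # into the global frame
        return np.vstack(merged)

    # Toy usage: the same scan observed from two poses 1 m apart.
    scan = np.random.rand(100, 3)
    T0, T1 = np.eye(4), np.eye(4)
    T1[0, 3] = 1.0
    cloud = accumulate_frames([scan, scan], [T0, T1])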

III-B Stage 1: Curb Instance Map Generation

In this stage, we generate a global CI map from LiDAR data and synchronized pose data in preparation for labeling a LiDAR data sequence. It is not advisable to naively use the pose data to superimpose multi-frame LiDAR data, as this leads to global inconsistency for a sequence with loop closures. As mentioned in [Behley2019], revisited streets have different heights if multiple scans are simply superimposed on each other.

III-B1 RHD Map Generation

In the first stage of our method, a semantic segmentation network [Gerdzhev2020] and a LiDAR-based SLAM system [Pan2021] are employed to build an SHD map, as shown in Fig. 1(a). The semantic segmentation network of [Gerdzhev2020] divides raw point clouds into 20 categories; the categories of road users such as vehicles and pedestrians are removed in the RHD map since they are irrelevant for extracting curbs, as shown in Fig. 1(b). We then subdivide the RHD map into several RHD sub-maps in order to parallelize subsequent map processing and labeling.
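A minimal sketch of this filtering and tiling, assuming SemanticKITTI-style integer labels; the set of dynamic label ids and the 50 m tile size are illustrative placeholders, not the paper's settings.

    import numpy as np

    # Illustrative ids for dynamic (road-user) classes to drop; the actual
    # ids follow the SemanticKITTI label map used by [Gerdzhev2020].
    DYNAMIC_IDS = [10, 11, 13, 15, 18, 20, 30, 31, 32]

    def build_rhd_map(points, labels):
        """SHD map -> RHD map: keep only static classes."""
        keep = ~np.isin(labels, DYNAMIC_IDS)
        return points[keep], labels[keep]

    def split_submaps(points, labels, tile=50.0):
        """Tile the RHD map into square sub-maps for parallel processing."""
        keys = np.floor(points[:, :2] / tile).astype(int)
        submaps = {}
        for key in np.unique(keys, axis=0):
            mask = np.all(keys == key, axis=1)
            submaps[tuple(key)] = (points[mask], labels[mask])
        return submaps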

Fig. 4: Illustration of how curb candidate points are extracted. Cell A is a curb cell, while Cell B is a road cell.

III-B2 Curb Candidate Points Extraction

For each RHD sub-map, an empty 2D grid map is created and the sub-map's point cloud is projected into it for fast extraction of curb candidate points. Cells in the grid map are divided into four categories: road cells, non-road cells, curb cells and unknown cells. A road cell contains only road points, a non-road cell contains only non-road points, and an unknown cell contains no points. A curb cell must contain both road points and non-road points. However, point clouds of tree trunks and vegetation often invade the airspace of the road area, as shown in Fig. 4. Hence, an extra height condition is introduced for curb cells: the height distributions of road points and non-road points in the cell must be similar. Only cells meeting both conditions are curb cells, and the points in these cells are curb candidate points.
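The cell classification might look like the following sketch, where the mean height stands in for the height-distribution test; the label id, the 0.2 grid resolution and the 0.3 height threshold echo Sec. IV-A1, but the implementation details are our assumptions.

    import numpy as np

    ROAD_ID = 40  # illustrative SemanticKITTI-style id for "road"

    def classify_cells(points, labels, res=0.2, h_thresh=0.3):
        """Label each occupied 2D grid cell as road / non-road / curb.

        Cells absent from the result are unknown cells (no points).
        """
        cells = {}
        ij = np.floor(points[:, :2] / res).astype(int)
        for key, z, lab in zip(map(tuple, ij), points[:, 2], labels):
            c = cells.setdefault(key, {"road": [], "non": []})
            c["road" if lab == ROAD_ID else "non"].append(z)
        out = {}
        for key, c in cells.items():
            if c["road"] and c["non"]:
                # Height condition: similar mean heights suggest a real
                # road/non-road boundary rather than overhanging vegetation.
                if abs(np.mean(c["road"]) - np.mean(c["non"])) < h_thresh:
                    out[key] = "curb"
                else:
                    out[key] = "non-road"
            elif c["road"]:
                out[key] = "road"
            else:
                out[key] = "non-road"
        return out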

Fig. 5: Illustration of Alg. 1. To deal with occluded curbs, we design two growing ranges in the curb-growing function. The first growing range is the same as the initial searching range and has a wide valid range; the second is larger but has a narrower valid range. A valid range is a fan-shaped region whose orientation is determined by the corresponding iteration vector.

III-B3 Curb Points Growing for Aggregation

Curb candidate points in each RHD sub-map are unordered, and a clustering algorithm is needed to generate curb instances. Since the distribution of curb candidate points is narrow and uneven, classical clustering algorithms, such as K-means [MacQueen1967] and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [Ester1996], often mistakenly divide a curb instance into several clusters or treat sparse curb candidate points as noise. Therefore, we design an algorithm based on the K-nearest neighbors (KNN) algorithm [Altman1992] for simultaneously clustering and sorting curb candidate points. The detailed procedure is shown in Alg. 1. The function curbgrow implements a dual growth strategy within the curb-growing iterations. As shown in Fig. 5, if there is no curb candidate point within the first valid range, due to obstruction by a parked car or another obstacle, the second growth takes effect, which has a larger growing range but a narrower valid range than the first. Once the growing iterations of a curb complete, one of its two point arrays is flipped and concatenated with the other to form the final ordered point array of that curb instance. After the procedure in Alg. 1, all curb candidate points are clustered into multiple sets of curb points, and the points in each set are ordered.

Require: Curb candidate point set P in a sub-map
Ensure: Multiple sorted and clustered point sets C = {C_1, ..., C_K}
1:  Build a k-d tree with the input point set P
2:  Un-queried set U <- P
3:  k <- 1
4:  while U is not empty do
5:     Pick an initial point p_0 in U randomly and query its neighbors N within a circular range in P; the point number of N is n
6:     U <- U \ N
7:     if n >= n_min then
8:        Subdivide N into N_l and N_r by azimuths
9:        Sort N_l, N_r in ascending order of 2D distance from p_0 as A_l, A_r
10:       p_l, p_r are the iteration (farthest) points in A_l, A_r
11:       Iteration vectors: v_l <- p_l - p_0, v_r <- p_r - p_0
12:       Iteration sets: S_l <- A_l, S_r <- A_r
13:       Iteration times: t_l <- 0, t_r <- 0
14:       Iteration flags: f_l <- true, f_r <- true
15:       while f_l do
16:          (q, v_l, f_l) <- curbgrow(p_l, v_l, U)
17:          Append q to S_l
18:          U <- U \ {q}; p_l <- q
19:          t_l <- t_l + 1
20:       end while
21:       while f_r do
22:          (q, v_r, f_r) <- curbgrow(p_r, v_r, U)
23:          Append q to S_r
24:          U <- U \ {q}; p_r <- q
25:          t_r <- t_r + 1
26:       end while
27:       C_k <- flip(S_l) + {p_0} + S_r
28:       k <- k + 1
29:    end if
30: end while
31: return C = {C_1, ..., C_K}
Algorithm 1 Curb Candidate Points Clustering
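Since several symbols in the listing did not survive extraction, the following runnable sketch restates the idea in Python; the radii, fan angles and the point-number threshold are placeholders, and the inner helper mirrors curbgrow only loosely.

    import numpy as np
    from scipy.spatial import cKDTree

    def grow_curbs(points, r1=1.0, a1=np.deg2rad(60.0),
                   r2=3.0, a2=np.deg2rad(30.0), n_min=5):
        """Cluster and sort curb candidate points by dual-range growing."""
        tree = cKDTree(points[:, :2])
        unvisited = set(range(len(points)))
        curbs = []

        def grow(idx, direction):
            """Step to the nearest unvisited point inside a fan around
            `direction` (the iteration vector), until no point is found."""
            ordered = []
            while True:
                nxt = None
                for r, a in ((r1, a1), (r2, a2)):   # dual growth strategy
                    best, best_d = None, np.inf
                    for j in tree.query_ball_point(points[idx, :2], r):
                        if j not in unvisited:
                            continue
                        v = points[j, :2] - points[idx, :2]
                        d = np.linalg.norm(v)
                        if d < 1e-9:
                            continue
                        ang = np.arccos(np.clip(v @ direction / d, -1.0, 1.0))
                        if ang < a and d < best_d:
                            best, best_d = j, d
                    if best is not None:
                        nxt = best
                        break                        # first growth succeeded
                if nxt is None:
                    return ordered                   # growing iteration ends
                direction = points[nxt, :2] - points[idx, :2]
                direction /= np.linalg.norm(direction)
                unvisited.discard(nxt)
                ordered.append(nxt)
                idx = nxt

        while unvisited:
            seed = unvisited.pop()
            nbrs = [j for j in tree.query_ball_point(points[seed, :2], r1)
                    if j in unvisited]
            if len(nbrs) < n_min:
                continue
            far = max(nbrs, key=lambda j: np.linalg.norm(
                points[j, :2] - points[seed, :2]))
            d0 = points[far, :2] - points[seed, :2]
            d0 = d0 / np.linalg.norm(d0)
            one_side = grow(seed, d0)
            other_side = grow(seed, -d0)
            # flip one side and concatenate, as in line 27 of Alg. 1
            curbs.append([points[k] for k in other_side[::-1] + [seed] + one_side])
        return curbs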

III-B4 Post-processing

Curb candidate points extraction and curb points growing can be parallelized over the subdivided RHD sub-maps. When an RHD map is subdivided into multiple sub-maps, curbs crossing adjacent sub-maps are split into several pieces. Therefore, while merging curb points from all sub-maps into one global CI map, it is necessary to integrate the split pieces and re-number all curb point sets. Split curbs are integrated only if the distance between their two endpoints in adjacent sub-maps is less than a distance threshold and the orientation angle between the curbs is less than an angle threshold.
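A sketch of this merge test, assuming each curb piece is an ordered array of at least two points; the 0.5 endpoint-distance threshold follows Sec. IV-A1, while the 20-degree angle threshold is a placeholder since the paper's value was lost.

    import numpy as np

    def should_merge(curb_a, curb_b, d_max=0.5, ang_max=np.deg2rad(20.0)):
        """True if two curb pieces from adjacent sub-maps form one curb."""
        curb_a, curb_b = np.asarray(curb_a), np.asarray(curb_b)
        # endpoint distance condition
        if np.linalg.norm(curb_a[-1, :2] - curb_b[0, :2]) > d_max:
            return False
        # orientation condition: end direction of A vs. start direction of B
        va = curb_a[-1, :2] - curb_a[-2, :2]
        vb = curb_b[1, :2] - curb_b[0, :2]
        cos = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
        return np.arccos(np.clip(cos, -1.0, 1.0)) < ang_max

In practice all four endpoint pairings of the two pieces would be tested, flipping one piece when its far endpoint matches.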

III-C Stage 2: Curb Labeling in Each LiDAR Frame

III-C1 Coarse Curb Extraction

The CI map is in a global coordinate frame, and it must be transformed into each LiDAR coordinate frame using the corrected pose data. The transformation is presented in Eq. 1:

A_i = { R_i^{-1} (p - t_i) | p in M_i^r },   (1)

where M is the CI map and p is a curb point in M. R_i and t_i are the rotation and translation of the i-th frame in the pose data. M_i^r denotes the curbs of M within a circular range r around the LiDAR position, and A_i is the set of transformed curb annotations corresponding to the i-th frame of LiDAR data. The range r is slightly larger than the ROI of the LiDAR.
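In code, the coarse extraction of Eq. 1 might look like this; the 80 m range is an assumed stand-in for "slightly larger than the LiDAR ROI".

    import numpy as np

    def coarse_extract(ci_map, R_i, t_i, r=80.0):
        """Transform global CI-map curbs into the i-th LiDAR frame (Eq. 1).

        ci_map: (N, 3) curb points in the global frame.
        R_i, t_i: rotation (3, 3) and translation (3,) from the corrected
                  pose data; r is the circular coarse-extraction range.
        """
        near = ci_map[np.linalg.norm(ci_map[:, :2] - t_i[:2], axis=1) < r]
        # For a rotation matrix, (p - t_i) @ R_i equals R_i^{-1} (p - t_i)
        # applied row-wise, since R_i^{-1} = R_i^T.
        return (near - t_i) @ R_i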

If there are K_i curb instances in A_i in total, we describe them as in Eq. 2:

A_i = { C_i^1, C_i^2, ..., C_i^{K_i} }.   (2)
Fig. 6: Curb labeling in a single LiDAR frame with a CI map.

III-C2 Fine Curb Extraction

In this step, we determine the lengths and endpoints of curb instances according to the distributions of road points and curb-related (sidewalk or vegetation) points in each single-frame LiDAR scan. Eq. 3 is used to judge whether the curb points in C_i^k satisfy the condition of fine extraction:

f(p_j) = N(P_road, p_j, r_1) + w * N(P_curb, p_j, r_2),   (3)

where p_j denotes the j-th point in C_i^k. P_road and P_curb are the point clouds of the road category and the curb-related category, respectively. r_1 and r_2 are the radii of circular ranges around p_j. The function N counts the points of P_road or P_curb within the corresponding circular range around p_j, and w is a weighting factor.

Then Eq. 4 is used for fine curb extraction:

J = { j | f(p_j) > s },   C'_i^k = { p_j in C_i^k | min(J) <= j <= max(J) },   (4)

where J is the index set of the curb points satisfying the condition on f, min(J) and max(J) are the indexes of the fine-extraction endpoints, C'_i^k is the k-th curb annotation in the i-th frame after fine extraction, and s is a score threshold on f.

In Eq. 4, to keep the continuity of curb annotations, we determine the lengths and endpoints of curb annotations by the maximum and minimum indexes in J. Furthermore, we fit segmental spline curves to the curb annotations and re-sample them at an equal interval to obtain the final curb annotations.
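A runnable sketch of this fine extraction, under the same reconstruction assumptions as Eqs. 3-4; all parameter values are placeholders, since the paper's settings did not survive extraction.

    import numpy as np
    from scipy.spatial import cKDTree

    def fine_extract(curb, road_pts, curbrel_pts, r1=0.5, r2=0.5, w=1.0, s=3.0):
        """Trim a coarse curb annotation to its observed extent.

        curb: (M, 3) ordered curb points from coarse extraction.
        road_pts / curbrel_pts: road and curb-related (sidewalk or
        vegetation) points of the current frame.
        """
        road_tree = cKDTree(road_pts[:, :2])
        curb_tree = cKDTree(curbrel_pts[:, :2])
        scores = np.array([
            len(road_tree.query_ball_point(p[:2], r1)) +
            w * len(curb_tree.query_ball_point(p[:2], r2))
            for p in curb
        ])
        idx = np.flatnonzero(scores > s)
        if idx.size == 0:
            return None                       # curb unobserved in this frame
        return curb[idx.min():idx.max() + 1]  # continuous span, as in Eq. 4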

IV Experiment

Our curb dataset supports both semantic segmentation and instance segmentation, and performance on these tasks, in turn, helps confirm the validity of our dataset. We conduct our curb dataset generation experiments on the SemanticKITTI dataset [Behley2019], a large-scale dataset widely used for semantic segmentation with LiDAR data; it consists of 22 sequences, with sequences 00-10 as the training set and 11-21 as the test set. The file format of our curb dataset follows [Behley2019], and the curbs in sequences 00-10 are labeled by our labeling method, totaling 23,201 frames, 55,013 curb instances, and 23,149,310 curb points.

Fig. 7: Overview of our curb instance segmentation network.

IV-A Build a Curb Dataset

IV-A1 Stage 1

In the curb candidate points extraction, the resolution of the 2D grid map is set to (0.2 x 0.2), and in each cell we use the average height of points to represent the height distribution. The height difference threshold between the average heights of road points and non-road points is set to 0.3. In the curb points growing for aggregation, a point number threshold is applied; the first growing range equals the initial searching range and has a wide valid angle range, while the second growing range is larger with a narrower valid angle range.

Fig. 8: Illustrations of CI maps. Different curb instances are shown in different colors in each CI map.

In the post-processing, the endpoint distance threshold is set to 0.5. After merging the CI sub-maps, we interpolate the points of each curb instance at an equal interval. Some CI maps are visualized in Fig. 8.

IV-A2 Stage 2

The coarse extraction range and the parameters of fine curb extraction (r_1, r_2, w and the score threshold s) are configured as described in Sec. III-C. Some examples of curb annotations with raw point clouds in BEV representation are shown in Fig. 9.

Fig. 9: Examples of our curb annotations for curb instance segmentation in single-frame LiDAR data. Dilating kernel pixel size is set to 7.

IV-B Curb Instance Segmentation

We formulate curb instance segmentation as a pixel-wise segmentation task and present a curb instance segmentation network. As shown in Fig. 7, U-Net [Ronneberger2015] is adopted as the basic architecture. Following the preprocessing of LiDAR data and curb labels in [Younghwa2021], a single-frame point cloud is encoded as a density image and multiple sliced height images in BEV representation. Curb annotations are projected into images as labels, and the curb pixels are dilated to balance the proportion of positive and negative pixels for training. Furthermore, an embedding decoding layer is utilized for curb instance segmentation, following the embedding branch in [Neven2018]. After masking the pixels of the embedding layer with the binary segmentation result from the segmentation layer, DBSCAN is used to cluster the embedded pixels into curb instances.
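The instance extraction step might look like the following sketch; the embedding dimensionality and the DBSCAN settings are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def extract_instances(embedding, binary_mask, eps=0.5, min_samples=20):
        """Cluster masked embedding pixels into curb instances.

        embedding: (H, W, D) per-pixel features from the embedding decoder.
        binary_mask: (H, W) boolean output of the segmentation head.
        """
        ys, xs = np.nonzero(binary_mask)
        feats = embedding[ys, xs]            # only curb pixels are clustered
        ids = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
        inst = np.full(binary_mask.shape, -1, dtype=int)
        inst[ys, xs] = ids                   # -1 marks background / noise
        return inst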

Fig. 10: Ground truths and predictions in semantic segmentation with curbs. The first row shows ground truths and the second row shows predictions. Curb points are displayed in indigo blue color.

We evaluate curb binary segmentation on our dataset and on the datasets in [Younghwa2021] and [Suleymanov2019]. The binary segmentation architecture of our instance segmentation network shares the same structure with theirs. Table I shows that our curb dataset performs as well as the manually labeled datasets. The resolution of the BEV representation is set to 0.1, and a 1-pixel tolerance is used, which means that only segmented curb pixels located within a distance of 0.1 of the curb labels are counted as true positives.
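For reference, tolerance-based precision and recall can be computed as below; this is our reading of the 1-pixel tolerance, not the authors' evaluation code.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def seg_metrics(pred, gt, tol_px=1):
        """Pixel-wise precision / recall / F1 with an n-pixel tolerance."""
        grow = np.ones((2 * tol_px + 1, 2 * tol_px + 1), dtype=bool)
        tp_p = np.logical_and(pred, binary_dilation(gt, grow)).sum()
        tp_r = np.logical_and(gt, binary_dilation(pred, grow)).sum()
        precision = tp_p / max(pred.sum(), 1)
        recall = tp_r / max(gt.sum(), 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-9)
        return precision, recall, f1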

Dataset | Precision | Recall | F-1 score | Image size | Tolerance
[Suleymanov2019] | 0.8819 | 0.8921 | 0.8870 | 480x480 | 1 pixel
[Younghwa2021] | 0.9391 | 0.9427 | 0.9408 | 416x320 | 1 pixel
Ours | 0.9861 | 0.9785 | 0.9818 | 512x384 | 1 pixel
TABLE I: Binary segmentation results with different datasets

To ensure a fair comparison with other datasets, we conduct experiments using only sequence 00 (4,541 frames) of our curb dataset. A subset of the frames in sequence 00 is randomly selected as the validation set, and the rest are used as the training set. To train the network, the pixel-wise binary cross-entropy loss is employed as the semantic loss function, and the discriminative loss [DeBrabandere2017] is used in our instance loss function.

To investigate the effect of how the training and validation sets are partitioned for curb binary segmentation, a comparative experiment is conducted on sequence 00 with different partition approaches, as shown in Table II, with the curb dilating kernel pixel size set to 7. In the front-back partition approach, the first 9/10 of the frames in the sequence are selected as the training set and the rest as the validation set. Since validation frames selected randomly from the sequence are highly similar to adjacent frames in the training set, the random partition approach performs better in this experiment.

Partition | Precision | Recall | F-1 score | Tolerance
Front-back | 0.9243 | 0.9041 | 0.9140 | 1 pixel
Random | 0.9861 | 0.9785 | 0.9818 | 1 pixel
TABLE II: Binary segmentation results with different partitions

Furthermore, the impact of different dilating kernel pixel sizes is analyzed in Table III. The resolution of the BEV representation is set to 0.1 and the random partition approach is used. A larger kernel size yields a more favorable balance of positive and negative pixels, but greater position errors in the binary segmentation results.
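Label dilation itself is a one-liner; this sketch assumes a binary BEV label image.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def dilate_labels(label_img, kernel=7):
        """Thicken thin curb labels with a kernel x kernel dilation."""
        return binary_dilation(label_img, np.ones((kernel, kernel), dtype=bool))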

Kernel size | Precision | Recall | F-1 score | Tolerance
5 | 0.9654 | 0.9539 | 0.9596 | 1 pixel
7 | 0.9861 | 0.9785 | 0.9818 | 1 pixel
9 | 0.9906 | 0.9892 | 0.9899 | 1 pixel
TABLE III: Binary segmentation results with different kernel sizes
IoU threshold | Precision | Recall | F-1 score | Tolerance
0.5 | 0.9762 | 0.9875 | 0.9818 | 1 pixel
0.7 | 0.9533 | 0.9643 | 0.9588 | 1 pixel
TABLE IV: Instance segmentation results

Finally, the performance of curb instance segmentation is reported in Table IV. The random partition approach is used and the curb dilating kernel pixel size is set to 7. A curb instance counts as a true positive if its IoU value exceeds the IoU threshold. Curb instance segmentation on our dataset achieves promising results, and a higher IoU threshold leads to lower precision and recall.

IV-C Semantic Segmentation with Curbs

Semantic segmentation of LiDAR data is a point-wise classification task. Raw point clouds are projected into the dilated label image of Fig. 9, and points that fall on positive pixels are relabeled as the curb category; these points were previously classified as road, sidewalk or vegetation. The semantic segmentation network [Gerdzhev2020] is employed again to train on the data with curbs. Ground truths and predictions are shown in Fig. 10. Following the original semantic segmentation challenge in [Behley2019], we train the network on sequences 00-07 and 09-10 and evaluate on sequence 08. As shown in Table V, segmentation on the dataset with the curb category achieves performance comparable to the original dataset (20 categories), proving the validity of our annotations for the semantic segmentation task.
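A sketch of this point-wise relabeling; the label ids, the new curb id and the grid-origin handling are our assumptions.

    import numpy as np

    RELABEL_FROM = [40, 48, 70]   # illustrative ids: road, sidewalk, vegetation
    CURB_ID = 99                  # a new id appended to the label map

    def relabel_curb_points(points, labels, curb_mask, res=0.1, origin=(0.0, 0.0)):
        """Turn BEV curb pixels into point-wise curb labels."""
        ij = np.floor((points[:, :2] - np.asarray(origin)) / res).astype(int)
        inside = ((ij[:, 0] >= 0) & (ij[:, 0] < curb_mask.shape[0]) &
                  (ij[:, 1] >= 0) & (ij[:, 1] < curb_mask.shape[1]))
        hit = inside.copy()
        hit[inside] = curb_mask[ij[inside, 0], ij[inside, 1]]
        out = labels.copy()
        out[hit & np.isin(labels, RELABEL_FROM)] = CURB_ID
        return out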

Categories | mIoU | road | sidewalk | vegetation | curb
20 | 0.611 | 0.908 | 0.753 | 0.841 | -
20+curb | 0.601 | 0.935 | 0.785 | 0.814 | 0.709
TABLE V: Semantic segmentation results

V Conclusions

In this paper, we presented an efficient two-stage curb labeling method that generates point-wise and instance-wise curb annotations on LiDAR data. A range of baseline experiments on both instance segmentation and semantic segmentation have been implemented and evaluated on a dataset generated by our proposed method. The generated curb dataset has been released.

References