Fast Lidar Clustering by Density and Connectivity

03/01/2020 ∙ by Frederik Hasecke, et al. ∙ Aptiv ∙ Bergische Universität Wuppertal

Lidar sensors are widely used in various applications, ranging from scientific research through industrial use to integration in consumer products. With an ever-growing number of driver assistance systems, they have been introduced to automotive series production in recent years and are considered an important building block for the practical realisation of autonomous driving. However, due to the potentially large number of Lidar points per scan, tailored algorithms are required to identify objects (e.g. pedestrians or vehicles) with high precision in a very short time. In this work, we propose an algorithmic approach for real-time instance segmentation of Lidar sensor data. We show how our method leverages the properties of the Euclidean distance to retain three-dimensional measurement information, while being narrowed down to a two-dimensional representation for fast computation. We further introduce what we call skip connections, to make our approach robust against over-segmentation and to improve assignment in cases of partial occlusion. Through detailed evaluation on public data and comparison with established methods, we show how these aspects enable state-of-the-art performance and runtime on a single CPU core.


I Introduction

Precise segmentation of object instances is an important tool for a variety of applications ranging from object detection to computer-aided data labelling. Lidar sensor data is usually represented as a three-dimensional point cloud in Cartesian coordinates, so it is natural to consider clustering algorithms for the task of object segmentation for this type of sensor. Classic spatial or multidimensional point-based approaches [7][1] are able to solve this problem, but suffer from long and fluctuating runtimes. This is especially problematic for automotive applications on embedded devices, which are geared towards reliable and fixed operation timings. Other methods, like the approach by Moosmann et al. [12], are targeted towards Lidar point clouds, but are also too complex for real-time application, since they search for clusters in the three-dimensional domain. More recent works in this field [4][3][17] use lower-dimensional representations to segment clusters, but still work sequentially in several iterations, resulting in a potentially high and/or fluctuating runtime.
This work presents a Lidar point cloud segmentation approach which provides (1) a high level of accuracy in point cloud segmentation, while (2) being able to run in real time, faster than usual sensor recording frequencies, at (3) a constant rate with very little fluctuation independent of the scene’s context. We do so by avoiding the creation of a three-dimensional point cloud from the range measurements provided by the Lidar scanner and working directly on the laser range values of the sensor. Alternatively, the computations can be applied to a cylindrical range image projection of an existing three-dimensional point cloud. This approach circumvents the sparsity of the point cloud by forcing a two-dimensional neighbourhood on each measurement and thus offers the advantage of working with dense, two-dimensional data with clearly defined neighbourhood relationships between adjacent measurements. We avoid a loop-based implementation to prevent runtime fluctuations depending on the number of objects and the complexity of the scene. A Python implementation of our approach runs in real time on a single CPU core.

II Related Work

There is a large number of previous works on Lidar instance segmentation, especially in, but not limited to, automotive applications. The main focus of most clustering approaches is on improving segmentation accuracy and execution time. Most of these separate objects in three-dimensional space, resulting in high accuracy but comparatively long runtimes. Implementations of this kind can be found in the DBSCAN [7], Mean Shift [8][6] and OPTICS [1] algorithms.
Other approaches use voxelisation to reduce the complexity of the point cloud and find clusters in this representation [9], or a bird’s eye view projection coupled with height information to separate overlapping objects [10].
Moosmann et al. [12] proposed the use of a local convexity criterion on the spanned surface of three-dimensional Lidar points in a graph-based approach. While showing good results in urban environments and avoiding the additional step of a ground plane extraction, the algorithm fails to achieve real-time capability. Based on this work, Bogoslavskyi and Stachniss [4] used a similar criterion - the angle spanned between adjacent Lidar measurements in relation to the Lidar sensor origin - to define the convexity of the range image surface as a measure to segment objects. They further utilise the neighbourhood conditions in the range image to achieve a very fast execution time. Zermas et al. [17] exploit the same relationship in the scan lines of Lidar sensors to find break points in each line and merge the separate channel lines into three-dimensional objects in a subsequent step.

Other current methods use machine learning directly on three-dimensional point clouds [11][18][16], on projections into a camera image [14], or on spherical projections of Lidar points in range image space [15] to segment object instances in point clouds. The results look very promising in some cases, but suffer from longer runtimes, which currently prevents application on embedded automotive hardware.

III Methods

The raw data of Lidar sensors is usually provided as a list of range measurements, each coupled with a number relating to the channel of origin and the lateral position. These two values correspond to the row and column position of the measurement in the range image shown in Figure 1. They can also be used to create a three-dimensional representation of the measurements, but doing so would increase the computation time of the proposed method; we therefore work directly on the raw two-dimensional representation. Figure 1 shows an example of a range image as well as a three-dimensional representation of Lidar measurements.

Fig. 1: Top: Range image representation of the raw distance measurements. Bottom: Three-dimensional visualisation of the range measurements coupled with the respective channel of origin and lateral position.

For the application of object segmentation we assume that the Lidar is part of a ground-based vehicle. We therefore want to extract the points belonging to the ground plane and discard them from the computation, which prevents the algorithm from connecting two instances via this plane. A simple height-based threshold is not sufficient here, as the road surface itself can be uneven, and pitching and rolling of the ego vehicle can also influence the way the ground is perceived in the sensor data. Since we have detailed information about the Lidar sensor itself, we can use the angular position of each given channel to determine the angle at which the laser beam would hit a horizontal surface. We use this information to exclude all range image values belonging to a horizontal plane below a certain height. To do so, we compare each cell of the range image with the neighbouring one above using the equation

$$\alpha_{r,c} = \operatorname{atan2}\!\left(\left|R_{r-1,c}\sin\xi_{r-1} - R_{r,c}\sin\xi_{r}\right|,\; \left|R_{r-1,c}\cos\xi_{r-1} - R_{r,c}\cos\xi_{r}\right|\right)$$

in which the selected cell value is $R_{r,c}$ and the one from the cell above is $R_{r-1,c}$, each corresponding to the respective depth measurement. For the implementation, we use the function atan2 instead of the arctan to ensure proper handling of all quadrants of the Euclidean plane. As a result, we obtain an image representing the values of the entrance angle $\alpha$ in relation to the point cloud surface. Together with a lookup table of the channel angles $\xi_r$ with respect to the channel $r$, we can now exclude all range image cells that have a high probability of belonging to the ground. A similar metric has been described by Chu et al. [5] to separate the ground without a ground plane estimation. The left drawing in Figure 2 depicts the relationship of the channel angles $\xi$ to the surface angles $\alpha$.
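The following NumPy sketch illustrates this ground extraction step on a range image. The image shape, the channel angle table and the 10° flatness threshold are assumed placeholder values for illustration, not the exact parameters of our implementation.

```python
import numpy as np

# Placeholder range image (channels x azimuth steps, in metres) and an assumed
# lookup table of vertical channel angles in radians (top channel first).
rng = np.random.default_rng(0)
ranges = rng.uniform(2.0, 50.0, size=(64, 2048))
xi = np.linspace(np.deg2rad(2.0), np.deg2rad(-24.8), 64)

# Vertical (z) and radial (x) components of every measurement per channel.
z = ranges * np.sin(xi)[:, None]
x = ranges * np.cos(xi)[:, None]

# Entrance angle alpha between each cell and the cell above it; atan2 keeps
# the quadrant information, unlike a plain arctan of the ratio.
alpha = np.arctan2(np.abs(z[:-1] - z[1:]), np.abs(x[:-1] - x[1:]))

# Cells whose local surface is close to horizontal are likely ground points.
ground_mask = alpha < np.deg2rad(10.0)  # assumed example threshold
```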

Fig. 2: Trigonometrical relations used in the proposed clustering method. Left: The angle measurement of Lidar points in the vertical direction can be used to define a horizontal orientation for the ground plane extraction. Right: With the Lidar sensor in O, the lines OA and OB show two neighbouring distance measurements. The distance between the two measurements can be calculated using the spanned angle $\beta$ between the points.

After removing the Lidar measurements belonging to the ground plane from the range image, we use a systematic approach to create an object instance segmentation for Lidar sensor data through clustering in the image space of the range image. Inspired by the direct connectivity of Bogoslavskyi and Stachniss [4], we exploit the neighbourhood relationship of adjacent measurements in the range image. As visualised in the right illustration of Figure 2, we compare the given range values $R_A$ and $R_B$ for each pair of Lidar measurements and apply the cosine law

$$d_{A,B} = \sqrt{R_A^2 + R_B^2 - 2\,R_A R_B \cos\beta}$$

to calculate the Euclidean distance of the measured points. The angle $\beta$ spanned between the two beams is required for the calculation and is usually provided by the manufacturer of the Lidar sensor for both the horizontal and vertical direction. Using the physical distance between two measured points, we can define a threshold value separating those which are close enough together to belong to the same object from those which are too far apart to be considered neighbours on the same object. Neighbouring points on a given object generally lie close together, whereas points in the range image belonging to two separate objects are substantially further apart.
By using the Euclidean distance as a threshold value, we provide a single-parameter implementation with a clear physical meaning which is adaptable to different sensors. By exclusively using variables which are given by the range measurements, and by pre-calculating the cosine of the given angles, we can reduce the calculation of the squared Euclidean distance to a total of four multiplications, one addition and one subtraction.
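As a minimal illustration of this operation count, the following sketch evaluates the squared cosine law for a single pair of measurements; the range values, the angular step and the threshold are assumed example numbers.

```python
import numpy as np

# Example range readings of two horizontally adjacent measurements [m] and an
# assumed horizontal angular resolution taken from a sensor datasheet.
r_a, r_b = 12.40, 12.55
beta = np.deg2rad(0.2)

two_cos_beta = 2.0 * np.cos(beta)  # pre-computed once per sensor

# Squared law of cosines: four multiplications, one addition, one subtraction.
d_sq = r_a * r_a + r_b * r_b - r_a * r_b * two_cos_beta

# Comparing against the squared threshold avoids the square root entirely.
same_object = d_sq < 0.5 ** 2  # assumed 0.5 m distance threshold
```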

These operations are available and optimised on most embedded devices and enable an efficient runtime for automotive applications of this approach.
With the calculated threshold between each pair of measurements, we are able to connect all Lidar points in the range image into separate clusters and background points. We intentionally decided against a loop-based implementation that iterates over all rows and columns of the range image to connect or separate the cells, as this tends to be susceptible to runtime disruptions, especially in scenes with multiple small objects, which force many iterations of the outer loop. We recommend the work by Bogoslavskyi and Stachniss [4] for further reading on a loop-based implementation of a threshold calculation between measurements in a range image.
Our approach exploits the vectorised nature of the range image by applying the connection or separation of neighbouring points without looping over the rows and columns of the image. To do so, we create two virtual copies of the range image and shift one copy along the vertical axis and the other along the horizontal axis, thus shifting each measurement by one cell in the row and column direction respectively. These shifted images can be stacked along a third axis to compare each value with its vertical and horizontal neighbour over the whole multidimensional array.
By applying the threshold to the distance values calculated for each measurement and its direct neighbourhood, we end up with the original range image and two binary images representing the connection or separation between adjacent points in the range image. For further use of these images and to reduce the computational expenditure, the range image can be reduced to a binary image representing the presence or absence of a Lidar measurement at each pixel.
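A sketch of this vectorised shift-and-compare step, under the same assumptions as above (placeholder range image, assumed angular steps and threshold), could look as follows; for brevity it ignores the cyclic wrap-around of the azimuth axis.

```python
import numpy as np

rng = np.random.default_rng(0)
ranges = rng.uniform(2.0, 50.0, size=(64, 2048))  # placeholder range image [m]

cos_v = np.cos(np.deg2rad(0.4))  # assumed vertical angular step
cos_h = np.cos(np.deg2rad(0.2))  # assumed horizontal angular step

def dist_sq(a, b, cos_angle):
    # Squared Euclidean distance between two aligned range images (cosine law).
    return a * a + b * b - 2.0 * cos_angle * a * b

# Shifting by one cell via slicing compares every measurement with its
# vertical (next row) and horizontal (next column) neighbour at once.
d_v = dist_sq(ranges[:-1, :], ranges[1:, :], cos_v)
d_h = dist_sq(ranges[:, :-1], ranges[:, 1:], cos_h)

threshold_sq = 0.5 ** 2  # assumed distance threshold, squared
connected_v = d_v < threshold_sq  # binary image of vertical connections
connected_h = d_h < threshold_sq  # binary image of horizontal connections
```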
The three created binary images contain all required information to segment the Lidar measurements of the whole frame into clusters and background points. For this we utilise a simple and efficient image processing algorithm: pixel connectivity. The 4-connected pixel neighbourhood, also known as the von Neumann neighbourhood, is defined on a two-dimensional square lattice as a central cell and its four adjacent cells. To apply pixel connectivity to our data, we combine the binary Lidar measurement image with the binary threshold images of connected measurements. By arranging these three images as shown in Figure 3, we can apply 4-connectivity to the resulting image and label each island of interconnected measurements as a separate cluster. This formulation allows us to avoid a loop-based implementation once again. The resulting segmented image can then be subsampled back to the original range image resolution.
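Continuing the sketch, one way to build the arrangement of Figure 3 is to interleave the measurement mask with the two connection images and label the result with a 4-connected kernel, for instance via scipy.ndimage.label; the shapes below are the same assumed placeholders as before.

```python
import numpy as np
from scipy import ndimage

# Assumed inputs from the previous steps: a validity mask of the range image
# and the binary vertical / horizontal connection images.
H, W = 64, 2048
valid = np.ones((H, W), dtype=bool)            # placeholder measurement mask
connected_v = np.ones((H - 1, W), dtype=bool)  # placeholder vertical links
connected_h = np.ones((H, W - 1), dtype=bool)  # placeholder horizontal links

# A connection is only meaningful if both endpoints hold a valid measurement.
connected_v &= valid[:-1, :] & valid[1:, :]
connected_h &= valid[:, :-1] & valid[:, 1:]

# Interleave as in Figure 3: measurements on even rows/columns, connection
# pixels on the odd rows/columns in between.
combined = np.zeros((2 * H - 1, 2 * W - 1), dtype=bool)
combined[0::2, 0::2] = valid
combined[1::2, 0::2] = connected_v
combined[0::2, 1::2] = connected_h

# 4-connectivity (von Neumann neighbourhood) labelling of the combined image.
kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]])
labels, n_clusters = ndimage.label(combined, structure=kernel)

# Subsample back to the range image grid: one cluster label per measurement.
cluster_image = labels[0::2, 0::2]
```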

Fig. 3: Combination of defined image representations for instance segmentation. The red squares represent the Lidar range values, the yellow and blue squares represent the horizontal and vertical connections of these measurements respectively. The black cross depicts how a 4-connectivity kernel is applied to the resulting combined image.

In a subsequent step, we can apply a threshold to the labelled clusters to discard objects below a certain number of Lidar measurements, reducing false clusters resulting from sensor noise; in our case, we require a minimum of 100 points for a cluster candidate. We have thus segmented the measurements into connected components of separate objects and non-segmented points which correspond to the ground plane and background noise.
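The size filter itself reduces to a handful of vectorised operations; `cluster_image` stands in for the labelled result of the previous sketch, with label 0 as background.

```python
import numpy as np

# Placeholder labelled range image from the previous sketch (0 = background).
cluster_image = np.zeros((64, 2048), dtype=np.int64)
min_points = 100

# Count the points per label and demote too-small clusters to background.
counts = np.bincount(cluster_image.ravel())
small_labels = np.flatnonzero(counts < min_points)
cluster_image[np.isin(cluster_image, small_labels)] = 0
```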

III-A Skip Connections

Due to the characteristics of Lidar sensors, namely their sparsity (especially in the vertical direction) and missing measurements caused by deflected laser beams that return no remission value to the sensor, segmentation algorithms like the proposed one can be prone to under- or over-segmentation. Missing values can break the connection between areas of the same object, causing the direct neighbourhood approach described above to over-segment a single object into multiple clusters. Examples of such challenging instances are shown in Figure 7.
To overcome the limitations of the direct neighbourhood approach and to ensure a more robust segmentation, we have extended the two-dimensional Euclidean clustering by what we have named skip connections. For this, we reduce the combined image shown in Figure 3 to a sparse matrix, connecting only every second point in the vertical and horizontal direction, thus connecting a subset of the original points. The schematic visualisation in Figure 4 displays a connection of each measurement with its second neighbour. Since the angle between any two measurements is known, we can extend the Euclidean distance calculation from each measurement to any other using the cosine law described above. This allows us to robustly connect segments of the same object which have no direct connection due to missing measurements in the range image. An example of this improved segmentation can be seen in Figure 7. The number of skip connections can be adjusted to the utilised sensor, depending on its sparsity and proneness to errors. In the results shown in Section IV we have added two-skip connections as visualised in Figure 4.
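Extending the earlier sketch, a two-skip connection only changes the slicing offset and the pre-computed cosine, since the spanned angle doubles; all numeric values remain assumed placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
ranges = rng.uniform(2.0, 50.0, size=(64, 2048))  # placeholder range image [m]

def dist_sq(a, b, cos_angle):
    # Squared Euclidean distance via the cosine law.
    return a * a + b * b - 2.0 * cos_angle * a * b

# Doubled angular steps for connections that skip one measurement.
cos_2v = np.cos(2.0 * np.deg2rad(0.4))  # assumed vertical step, doubled
cos_2h = np.cos(2.0 * np.deg2rad(0.2))  # assumed horizontal step, doubled

threshold_sq = 0.5 ** 2  # assumed distance threshold, squared

# Compare every measurement with its second-next vertical/horizontal neighbour.
skip_v = dist_sq(ranges[:-2, :], ranges[2:, :], cos_2v) < threshold_sq
skip_h = dist_sq(ranges[:, :-2], ranges[:, 2:], cos_2h) < threshold_sq
# skip_v / skip_h are merged into the combined image as additional links
# bridging gaps that the direct neighbourhood connections cannot close.
```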

Fig. 4: Additional skip connections (dotted lines) between non-neighbouring Lidar points on top of the direct connections via neighbouring points (yellow and blue squares).

The combined use of the direct connectivity of neighbouring measurements and the skip connections enables a pseudo three-dimensional Euclidean clustering while exploiting the fast runtime of two-dimensional pixel connectivity. Thus, we are able to improve the quality of the segmentation without sacrificing our real-time ability.

IV Experimental Evaluation

We have set up the evaluation to support our three claims of (1) a high level of accuracy in point cloud segmentation, (2) real-time operation at usual sensor recording frequencies, and (3) a constant processing rate with very little fluctuation independent of the scene’s context. The first experiment addresses the second and third claim; the second experiment is mainly concerned with the quality of the segmentation results.

IV-A Runtime

Following the experimental setup of [4][3], we designed our first experiment on the data provided by Moosmann et al. [13] to support the claim that the proposed approach can be used for online segmentation. All listed methods have been evaluated on the same Intel® Core™ i7-6820HQ CPU @ 2.70 GHz. Figure 5 shows the execution time of the four methods over the roughly 2,500 frames of the dataset; note that we have removed the empty frames of the dataset from the evaluation. The proposed method runs as fast as the algorithm of [4][3] while exhibiting less fluctuation due to the vectorised implementation. A box plot of the average runtime of [4] and the proposed method can be seen in Figure 6, which shows the fluctuating nature of the loop-based method as opposed to ours.

Fig. 5: Frame-wise execution timings for a 64-beam Velodyne dataset [13]. Please note the logarithmic scale for the runtime.
Fig. 6: Averaged timings for segmenting approximately 2,500 scans from a 64-beam Velodyne dataset [13] with our approach and the method by Bogoslavskyi and Stachniss [4].

As can be seen in Figure 5, the proposed method with the additional skip connections incurs a slightly longer runtime, while still running at a frequency above 30 Hz. This is three times faster than the recording frequency of the sensor used.

IV-B Segmentation Results

For our evaluation, we use the dataset “SemanticKITTI” by Behley et al. [2] as ground truth for object instance segmentation. For each ground truth object with more than 100 Lidar point measurements, we select the cluster of each algorithm with the largest overlap with the ground truth. Using these two point sets, we calculate the Intersection over Union (IoU). We compute this metric for each algorithm listed below over all ten sequences with Lidar instance ground truth in the dataset, and present the mean and standard deviation of the IoUs with the best parameter for each method. With our distance threshold tuned for the direct neighbourhood implementation without skip connections, we outperform [4][3] at the same average runtime. Together with the proposed skip connections and an accordingly tuned distance threshold, we come very close to the more general three-dimensional Euclidean distance clustering algorithm DBSCAN, as shown in Table I, while being on average 17 times faster and, in terms of maximum frame time, 60 times faster due to the fluctuating runtime of DBSCAN (80 and 225 times faster, respectively, without skip connections).
For a further evaluation of the instance-level performance, we compute the average precision in a similar fashion to established benchmarks for instance segmentation on image data. We define ten bins of point-wise overlap between the ground truth and the proposed clusters, ranging from an IoU of 0.5 to 0.95 in steps of 0.05. We average the precision of all bins into one single metric score, the Average Precision (AP), which is shown in Table II for each method. We additionally list the AP for the overlap values of 50%, 75% and 95%, in which the evaluation is restricted to objects above the denoted IoU. Table II shows that we match as many ground truth instances as the DBSCAN algorithm for an IoU over 50%, while still reaching more than 60% AP for all instances with an IoU over 75%. This is particularly important in the context of driver assistance systems, since a missed instance is a bigger problem than an object that was not matched perfectly. We further show that the proposed skip connections improve the results of our algorithm and help to find otherwise missed objects.
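Assuming the per-object IoUs have already been collected as described above, the AP computation reduces to a few lines; the example IoU values below are purely illustrative.

```python
import numpy as np

def average_precision(ious):
    # `ious` holds, per ground truth object, the IoU of the best-overlapping
    # predicted cluster (0 where no cluster was matched at all).
    ious = np.asarray(ious, dtype=float)
    thresholds = np.linspace(0.50, 0.95, 10)  # ten bins: 0.50, 0.55, ..., 0.95
    return float(np.mean([(ious >= t).mean() for t in thresholds]))

# Illustrative IoUs for five ground truth objects of one sequence.
print(average_precision([0.96, 0.81, 0.77, 0.52, 0.10]))
```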

Method                        IoU (mean ± std)
Bogoslavskyi et al. [4][3]    63.75 ± 32.19    81.31 ± 11.58
Ours                          65.29 ± 33.07    83.37 ± 11.72
Ours (Skip Connections)       66.64 ± 32.74    83.70 ± 11.30
DBSCAN                        67.61 ± 34.92    86.59 ± 11.04
TABLE I: Comparison of the Intersection over Union of Bogoslavskyi et al. [4][3], DBSCAN [7], and the proposed method
Fig. 7: Left: Direct connectivity between neighbouring Lidar points. Right: Additional skip connections between every second Lidar measurement. The proposed skip connections enable a more accurate segmentation of the car and reduce the over-segmentation of occluded objects such as the truck in the bottom figures.
Method                        AP       AP50     AP75     AP95
Bogoslavskyi et al. [4][3]    49.66    73.45    54.12     4.66
Ours                          52.80    73.59    57.45     8.66
Ours (Skip Connections)       54.72    75.57    60.11     8.23
DBSCAN                        58.54    74.86    63.27    17.92
TABLE II: Comparison of the Average Precision of Bogoslavskyi et al. [4][3], DBSCAN [7], and the proposed method

V Conclusion

We have presented an algorithm for real-time instance segmentation of Lidar sensor data operating on raw range images. To make this approach more robust against over-segmentation, we introduced what we call skip connections, which use the larger neighbouring context for a more precise assignment of measured points to an instance, especially in cases of partial occlusion. These properties of our method preserve the three-dimensional information of the measurements while reducing them to a two-dimensional representation for fast computation. In a detailed evaluation, we have shown that our approach is as fast as comparable state-of-the-art methods, while being more stable in its runtime and, more importantly, providing an overall better performance in instance segmentation. The experiments show that not only is our accuracy in separating objects higher than that of comparable approaches, but we are also able to match most of the ground truth instances that previously only the far more complex DBSCAN algorithm was able to match. This is particularly important in the context of driver assistance systems, since a missed instance is a bigger problem than an object that was not matched perfectly. We have also shown that the proposed skip connections improve the results of our algorithm and help to find otherwise missed objects. The further use of the segmented point clouds for classification and for removing false positives is outside the scope of this work and will be covered in future publications.

References

  • [1] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander (1999) OPTICS: Ordering Points to Identify the Clustering Structure. SIGMOD Record (ACM Special Interest Group on Management of Data) 28 (2), pp. 49–60. External Links: Document, ISSN 01635808 Cited by: §I, §II.
  • [2] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall (2019) SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. External Links: 1904.01416, Link Cited by: §IV-B.
  • [3] I. Bogoslavskyi and C. Stachniss (2017) Efficient online segmentation for sparse 3d laser scans. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, pp. 1–12. External Links: Link Cited by: §I, §IV-A, §IV-B, TABLE I, TABLE II.
  • [4] I. Bogoslavskyi and C. Stachniss (2016) Fast range image-based segmentation of sparse 3D laser scans for online operation. In IEEE International Conference on Intelligent Robots and Systems (IROS), pp. 163–169. External Links: Document, ISBN 9781509037629, ISSN 21530866 Cited by: §I, §II, §III, Fig. 6, §IV-A, §IV-B, TABLE I, TABLE II.
  • [5] P. Chu, S. Cho, S. Sim, K. Kwak, and K. Cho (2017) A fast ground segmentation method for 3D point cloud. Journal of Information Processing Systems 13 (3), pp. 491–499. External Links: Document, ISSN 2092805X Cited by: §III.
  • [6] D. Comaniciu and P. Meer (2002) Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5), pp. 603–619. External Links: Document, ISSN 01628828 Cited by: §II.
  • [7] M. Ester, H. Kriegel, J. Sander, and X. Xu (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231. Cited by: §I, §II, TABLE I, TABLE II.
  • [8] K. Fukunaga and L. Hostetler (1975-01) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory 21 (1), pp. 32–40. External Links: Document, ISSN 1557-9654 Cited by: §II.
  • [9] M. Himmelsbach, F. v. Hundelshausen, and H.-J. Wuensche (2010-06) Fast segmentation of 3d point clouds for ground vehicles. In 2010 IEEE Intelligent Vehicles Symposium, pp. 560–565. External Links: Document, ISSN 1931-0587 Cited by: §II.
  • [10] D. Korchev, S. Cheng, Y. Owechko, and K. Kim (2013-07) On real-time lidar data segmentation and classification. Cited by: §II.
  • [11] J. Lahoud, B. Ghanem, M. Pollefeys, and M. R. Oswald (2019) 3D Instance Segmentation via Multi-Task Metric Learning. External Links: 1906.08650, Link Cited by: §II.
  • [12] F. Moosmann, O. Pink, and C. Stiller (2009) Segmentation of 3D lidar data in non-flat urban environments using a local convexity criterion. IEEE Intelligent Vehicles Symposium, Proceedings, pp. 215–220. External Links: Document, ISBN 9781424435043 Cited by: §I, §II.
  • [13] F. Moosmann (2013) Interlacing self-localization, moving object tracking and mapping for 3d range sensors. Ph.D. Thesis, KIT Scientific Publishing, Karlsruhe, (english). External Links: Document, ISBN 978-3-86644-977-0, ISSN 1613-4214 Cited by: Fig. 5, Fig. 6, §IV-A.
  • [14] B. H. Wang, W. L. Chao, Y. Wang, B. Hariharan, K. Q. Weinberger, and M. Campbell (2019) LDLS: 3-D Object Segmentation Through Label Diffusion from 2-D Images. IEEE Robotics and Automation Letters 4 (3), pp. 2902–2909. External Links: Document, 1910.13955, ISSN 23773766 Cited by: §II.
  • [15] Y. Wang, Y. Yu, and M. Liu (2019) PointIT: A Fast Tracking Framework Based on 3D Instance Segmentation. External Links: 1902.06379, Link Cited by: §II.
  • [16] B. Yang, J. Wang, R. Clark, Q. Hu, S. Wang, A. Markham, and N. Trigoni (2019) Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds. (NeurIPS), pp. 1–14. External Links: 1906.01140, Link Cited by: §II.
  • [17] D. Zermas, I. Izzat, and N. Papanikolopoulos (2017) Fast segmentation of 3D point clouds: A paradigm on LiDAR data for autonomous vehicle applications. Proceedings - IEEE International Conference on Robotics and Automation (May), pp. 5067–5073. External Links: Document, ISBN 9781509046331, ISSN 10504729 Cited by: §I, §II.
  • [18] F. Zhang, C. Guan, J. Fang, S. Bai, R. Yang, P. Torr, and V. Prisacariu Instance Segmentation of LiDAR Point Clouds. Cited by: §II.