Precise segmentation of object instances is an important tool for a variety of applications ranging from object detection to computer-aided data labelling. Lidar sensor data is usually represented as a three-dimensional point cloud in Cartesian coordinates. Hence it makes sense to consider clustering algorithms to fulfil the task of object segmentation for this type of sensor. Classic spatial or multidimensional point-based approaches 
are suitable to solve this problem, but suffer from long and fluctuating runtimes. This is a problem especially for automotive applications on embedded devices, which are geared towards reliable and fixed operation timing.
Other methods, like the approach by Moosmann et al. , are targeted towards Lidar point clouds, but are also too complex for real-time application, since they work on finding clusters in the three-dimensional domain.
More recent works in this field  use lower dimensional representations to segment clusters, but still work sequentially in several iterations, resulting in a potentially high and/or fluctuating runtime.
This work presents a Lidar point cloud segmentation approach which provides (1) a high level of accuracy in point cloud segmentation, while (2) being able to run in real time, faster than usual sensor recording frequencies, at (3) a constant rate with very little fluctuation independent of the scene’s context. We do so by avoiding the creation of a three-dimensional point cloud from the range measurements provided by the Lidar scanner and by working directly on the laser range values of the sensor. Alternatively, the computations can be applied to a cylindrical range image projection of the three-dimensional point cloud. This approach circumvents the problem of sparsity in the point cloud by forcing a two-dimensional neighbourhood on each measurement and thus offers the advantage of working with dense, two-dimensional data with clearly defined neighbourhood relationships between adjacent measurements. We avoid a loop based implementation of the computation to bypass fluctuations in the runtime depending on the number of objects and the complexity of the scene. A Python implementation of our approach runs in real time on a single CPU core.
II Related Work
There is a large number of previous works on Lidar instance segmentation especially in, but not limited to, automotive applications. The main focus for most clustering approaches is on improving segmentation accuracy and execution time. Most of these separate objects in the three-dimensional space, resulting in high accuracy but comparatively long runtime. Implementations of these can be found in the DBSCAN , Mean Shift  and OPTICS  algorithms.
Other approaches use voxelization to reduce the complexity of the point cloud and find clusters in this representation  or bird’s eye view projection coupled with the height information to separate overlapping objects .
Moosmann et al.  proposed the use of a local convexity criterion on the spanned surface of three-dimensional Lidar points in a graph-based approach. While showing good results in urban environments and avoiding the additional step of a ground plane extraction, the algorithm fails to achieve real-time capability. Based on this work, Bogoslavskyi and Stachniss  used a similar criterion - the angle spanned between adjacent Lidar measurements in relation to the Lidar sensor origin - to define the convexity of the range image surface as a measure to segment objects. They further utilise the neighbourhood conditions in the range image to achieve a very fast execution time. Zermas et al.  exploit the same relationship within the scan lines of Lidar sensors to find break points in each line and merge the separate lines of channels into three-dimensional objects in a subsequent step.
Other current methods use machine learning directly on three-dimensional point clouds, projections into a camera image , or on spherical projections of Lidar points in the range image space , to segment object instances in point clouds. These results look very promising in some cases, but suffer from a longer runtime, which currently prevents application on embedded automotive hardware.
The raw data of Lidar sensors is usually provided as a list of range measurements, each coupled with a number relating to the origin channel and the lateral position. These two values correspond to the vertical and horizontal position of the measurement in the range image shown in Figure 1. They can also be used to create a three-dimensional representation of the measurements, but doing so would increase the computation time of the proposed method. Therefore, we work directly on the raw two-dimensional representation. Figure 1 shows an example of a range image as well as a three-dimensional representation of Lidar measurements.
For the application of object segmentation we assume that the Lidar is mounted on a ground-based vehicle. Therefore we want to extract the points belonging to the ground plane and discard them from the computation. This prevents the algorithm from connecting two instances via this plane. A simple height based threshold is not sufficient here, as the road surface itself can be uneven; pitching and rolling of the ego vehicle can also influence the way the ground is perceived in the sensor data. Since we have detailed information about the Lidar sensor itself, we can use the angle position of each given channel to determine the angle at which the laser beam would hit a horizontal surface. We use this information to exclude all range image values belonging to a horizontal plane below a certain height. To do so, we compare each cell of the range image with the neighbouring cell above it using the equation

α = atan2(|R_{r−1,c} sin ξ_{r−1} − R_{r,c} sin ξ_r|, |R_{r−1,c} cos ξ_{r−1} − R_{r,c} cos ξ_r|),

in which R_{r,c} is the selected cell value, R_{r−1,c} is the value of the cell above, each corresponding to the respective depth measurement, and ξ_r denotes the vertical angle of channel r. For the implementation, we use the function atan2 instead of the arctan to ensure proper handling of all quadrants of the Euclidean plane. As a result, we obtain an image representing the values of the entrance angle α in relation to the point cloud surface. Together with a lookup table of the channel angles ξ_r with respect to the channel r, we can now exclude all range image cells that have a high probability of belonging to the ground. A similar metric has been described by Chu et al.  to separate the ground without a ground plane estimation. The left drawing in Figure 2 depicts the relationship of the channel angles ξ to the surface angles α.
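The ground extraction step can be sketched in a few lines of NumPy. The function name, the array layout, and the assumption that row 0 holds the topmost channel are ours for illustration, not the paper's reference implementation:

```python
import numpy as np

def ground_angle_image(range_img, channel_angles):
    """Surface angle between each range cell and its upper neighbour.

    range_img      -- (H, W) array of range measurements (0 = no return)
    channel_angles -- (H,) vertical beam angles xi per channel, in radians
                      (row 0 assumed to be the topmost channel)
    """
    xi = channel_angles[:, None]          # broadcast channel angles over columns
    z = range_img * np.sin(xi)            # height component of each measurement
    d = range_img * np.cos(xi)            # horizontal component
    # difference between each cell and the neighbouring cell above it
    dz = np.abs(z[:-1, :] - z[1:, :])
    dd = np.abs(d[:-1, :] - d[1:, :])
    # atan2 handles all quadrants, unlike arctan
    return np.arctan2(dz, dd)             # shape (H - 1, W)
```

Cells whose angle stays near zero lie on a near-horizontal surface and can be masked out as ground candidates, while a vertical obstacle produces angles close to π/2.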
After removing the Lidar measurements belonging to the ground plane from the range image, we use a systematic approach to create an object instance segmentation for Lidar sensor data through clustering in the image space of the range image. Inspired by the direct connectivity of Bogoslavskyi and Stachniss , we exploit the neighbourhood relationship of adjacent measurements in the range image. As visualised in the right illustration of Figure 2, we compare the given range values r₁ and r₂ for each pair of Lidar measurements and apply the cosine law

d = √(r₁² + r₂² − 2 r₁ r₂ cos φ)

to calculate the Euclidean distance d of the measured points. The angle φ between the two beams is required for the calculation and is usually provided by the manufacturer of the Lidar sensor for both the horizontal and vertical direction. Using the physical distance between two measured points, we can define a threshold value separating pairs which are close enough together to belong to the same object from those too far apart to be considered neighbours on the same object.
Neighbouring points on a given object are in general relatively close together. The distances between points of two separate objects in the range image are substantially larger.
By using the Euclidean distance as a threshold value, we provide a single-parameter implementation with a clear physical meaning which is adaptable to different sensors. By exclusively using variables which are given by the range measurements and by pre-calculating the cosine of the given angles, we can reduce the calculation of the squared Euclidean distance to a total of four multiplications, one addition and one subtraction.
These operations are present and optimised in most embedded devices and allow a much more efficient runtime for an automotive application of this approach.
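The operation count above can be checked with a small sketch. The angular increment used here is an example value, not a parameter from the paper; 2·cos φ is folded into one precomputed constant:

```python
import numpy as np

# Precompute once per sensor: 2 * cosine of the fixed angular increment
# between neighbouring beams (delta_phi here is an illustrative value).
delta_phi = np.deg2rad(0.2)
two_cos = 2.0 * np.cos(delta_phi)

def squared_distance(r1, r2):
    """Squared Euclidean distance between two adjacent range returns.

    With 2*cos(delta_phi) precomputed this costs exactly four
    multiplications, one addition and one subtraction per pair,
    matching the count stated in the text.
    """
    return r1 * r1 + r2 * r2 - r1 * (two_cos * r2)
```

Comparing the squared distance against a squared threshold avoids the square root entirely, which keeps the per-pair cost at those six scalar operations.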
With the calculated threshold between each measurement, we are able to connect all Lidar points in the range image into separate clusters and background points. Here we intentionally decided against a loop based implementation iterating over all rows and columns of the range image to connect or separate the cells, as this tends to be susceptible to disruptions in the runtime, especially in scenes with multiple small objects, which force many iterations of the outer loop. We recommend the work by Bogoslavskyi and Stachniss  for further reading on a loop based implementation of a threshold calculation between measurements in a range image.
Our approach exploits the vectorised nature of the range image by applying the connection or separation of neighbouring points without the use of a loop over the rows and columns of the image. To do so, we create two virtual copies of the range image and shift one copy along the vertical axis and the other along the horizontal axis, thus shifting each measurement by one cell in the vertical and horizontal direction respectively. These shifted images can be stacked on a third axis, which lets us compare each value with its vertical and horizontal neighbour over the whole multidimensional array.
By applying the threshold on these distance values calculated for each measurement and its direct neighbourhood, we end up with the original range image and two binary images representing the connection or separation between two points in the range image. For further use of these images and to reduce the computational expenditure, the range image can be reduced to a binary image representing the presence and absence of Lidar measurements for the corresponding pixels in the range image.
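The shift-and-compare step can be written loop-free with array slicing. Function name and shapes are our assumptions; the cosine constants correspond to the horizontal and vertical angular increments of the sensor:

```python
import numpy as np

def connection_masks(range_img, cos_h, cos_v, thresh):
    """Binary 'connected' masks for horizontally and vertically adjacent cells.

    range_img    -- (H, W) range image, 0 where no measurement exists
    cos_h, cos_v -- cosine of the horizontal / vertical angular increment
    thresh       -- maximum Euclidean distance (metres) for a connection
    Returns conn_h with shape (H, W-1) and conn_v with shape (H-1, W).
    """
    r = range_img
    # pair each cell with its right neighbour (axis 1) and lower neighbour (axis 0)
    a_h, b_h = r[:, :-1], r[:, 1:]
    a_v, b_v = r[:-1, :], r[1:, :]
    # squared cosine-law distance, computed over the whole array at once
    d2_h = a_h * a_h + b_h * b_h - 2.0 * cos_h * a_h * b_h
    d2_v = a_v * a_v + b_v * b_v - 2.0 * cos_v * a_v * b_v
    # a pair is connected only if both cells hold a measurement
    conn_h = (a_h > 0) & (b_h > 0) & (d2_h < thresh ** 2)
    conn_v = (a_v > 0) & (b_v > 0) & (d2_v < thresh ** 2)
    return conn_h, conn_v
```

The two boolean arrays are exactly the binary images described above: one marks connections to the right neighbour, the other to the neighbour below.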
The three created binary images contain all required information to segment the Lidar measurements of the whole frame into clusters and background points. For this we utilise a simple and efficient image processing algorithm: pixel connectivity. The 4-connected pixel connectivity, also known as the von Neumann neighbourhood, is defined as a two-dimensional square lattice composed of a central cell and its four adjacent cells. To apply the pixel connectivity to our data, we need to combine the binary Lidar measurements with the binary threshold images of connected measurements. By arranging these three images as shown in Figure 3, we are able to apply 4-connectivity on the resulting image to label each island of interconnected measurements as a different cluster. The usage of this condition allows us to avoid a loop based implementation once again. The resulting segmented image can now be subsampled back to the original range image.
In a subsequent step, we can apply a threshold on the labelled clusters to discard objects below a certain number of Lidar measurements and thereby reduce false clusters resulting from sensor noise; in our case, we require a minimum of 100 points for a cluster candidate. We have thus segmented the measurements into connected components of separate objects and non-segmented points which correspond to the ground plane and background noise.
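The labelling and size-filtering steps can be sketched with SciPy's connected-component labelling. The interleaved grid below is our reading of the arrangement in Figure 3 (measurements on even cells, connection bits between them); names and the minimum point count are illustrative:

```python
import numpy as np
from scipy import ndimage

def label_clusters(valid, conn_h, conn_v, min_points=100):
    """Label clusters via 4-connected components on an interleaved grid.

    valid  -- (H, W) bool, True where a range measurement exists
    conn_h -- (H, W-1) bool, cell (i, j) connected to (i, j+1)
    conn_v -- (H-1, W) bool, cell (i, j) connected to (i+1, j)
    """
    H, W = valid.shape
    grid = np.zeros((2 * H - 1, 2 * W - 1), dtype=bool)
    grid[::2, ::2] = valid        # measurements on even rows/columns
    grid[::2, 1::2] = conn_h      # horizontal connection bits between them
    grid[1::2, ::2] = conn_v      # vertical connection bits
    # 4-connectivity (von Neumann neighbourhood), no Python loops
    cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    labels, _ = ndimage.label(grid, structure=cross)
    labels = labels[::2, ::2]     # subsample back to the range image grid
    labels[~valid] = 0
    # drop clusters below the minimum point count (0 = background)
    counts = np.bincount(labels.ravel())
    small = np.flatnonzero(counts < min_points)
    labels[np.isin(labels, small)] = 0
    return labels
```

Because the connection bits sit between the measurement cells, a missing connection bit physically separates two islands under 4-connectivity, which is exactly the behaviour described in the text.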
III-A Skip Connections
Due to the characteristics of Lidar sensors, namely the sparsity (especially in the vertical direction) and missing measurements resulting from deflected laser beams which return no remission value to the sensor, segmentation algorithms like the proposed one can be prone to under- or over-segmentation. Missing values can result in missing connections between areas of the same object, due to which the direct neighbourhood approach described above will over-segment a single object into multiple clusters. Examples of such challenging instances are shown in Figure 7.
To overcome the limitations of the direct neighbourhood approach and to ensure a more robust segmentation, we have extended the two-dimensional Euclidean clustering by what we have named skip connections. For this, we reduce the combined image shown in Figure 3 to a sparse matrix, connecting only every second point in the vertical and horizontal direction, thus connecting a subset of original points. The schematic visualisation in Figure 4 displays a connection of each measurement with its second neighbour. Due to the known angle between all measurements, we can extend the Euclidean distance calculation from each measurement to any other using the cosine law described above. This allows us to robustly connect segments of the same object, which have no direct connection due to missing measurements in the range image. An example of this improved segmentation can be seen in Figure 7. These additional skip connections can be adjusted according to the utilised sensor depending on the sparsity and proneness to errors. In the results shown in Section IV we have added two-skip connections as visualised in Figure 4.
The combined use of the direct connectivity of neighbouring measurements and the skip connections enables a pseudo three-dimensional Euclidean clustering while exploiting the fast runtime of two-dimensional pixel connectivity. Thus, we are able to improve the quality of the segmentation without sacrificing our real-time ability.
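A skip connection across one missing channel can be computed with the same cosine law, only with twice the angular increment. The sketch below covers the vertical direction; the function name and layout are our assumptions:

```python
import numpy as np

def skip_connection_mask(range_img, delta_v, thresh):
    """Connect each cell to its second vertical neighbour (a 'skip connection').

    delta_v -- vertical angular increment between adjacent channels (radians);
               the angle spanned across two channels is simply 2 * delta_v.
    thresh  -- maximum Euclidean distance (metres) for a connection
    """
    r = range_img
    two_cos = 2.0 * np.cos(2.0 * delta_v)     # precomputed, as before
    a, b = r[:-2, :], r[2:, :]                # cell and its second neighbour below
    d2 = a * a + b * b - two_cos * a * b
    return (a > 0) & (b > 0) & (d2 < thresh ** 2)   # shape (H-2, W)
```

Merging this mask into the combined image of Figure 3 bridges single missing rows, so two segments of one object separated by a dropped return end up in the same connected component.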
IV Experimental Evaluation
We have set up the evaluation to prove our three claims of (1) a high level of accuracy in point cloud segmentation, (2) being able to run in real time at usual sensor recording frequencies and (3) offering a constant processing rate with very little fluctuation independent of the scene’s context. The first experiment refers to the second and third claim, while the second experiment is mainly concerned with a quantitative assessment of the segmentation results.
Following the experimental setup of , we designed our first experiment on the data provided by Moosmann et al.  to support the claim that the proposed approach can be used for online segmentation. All listed methods have been evaluated on the same Intel® Core™ i7-6820HQ CPU @ 2.70 GHz. Figure 5 shows the execution time of the four methods over the 2500 frames of the dataset. Please note that we have removed the empty frames of the dataset from the evaluation. The proposed method runs as fast as the algorithm of  while suffering from less fluctuation due to the vectorised implementation. A box plot of the average runtime of  and the proposed method can be seen in Figure 6, which shows the fluctuating nature of the loop based method as opposed to ours.
As can be seen in Figure 5, the proposed method together with the additional skip connections suffers from a slightly longer runtime, while still running at a frequency above 30 Hz. This is three times faster than the recording frequency of the used sensor.
IV-B Segmentation Results
For our evaluation, we use the dataset "SemanticKITTI" by Behley et al.  as ground truth for object instance segmentation. For each ground truth object with more than 100 Lidar point measurements, we select the cluster of each algorithm with the most overlap with the ground truth. Using these two lists of points, we calculate the Intersection over Union (IoU). We compute this metric for each algorithm listed below over all ten sequences with Lidar instance ground truth in the dataset and present the mean and standard deviation of the IoUs with the best parameter for each method. With our distance threshold on the direct neighbourhood implementation without skip connections, we outperform  at the same average runtime. Together with the proposed skip connections and our distance threshold, we manage to come very close to the more general three-dimensional Euclidean distance clustering algorithm DBSCAN, as shown in Table I, while being on average faster by a factor of 17 and faster in the maximum runtime by a factor of 60 due to the fluctuating nature of the DBSCAN algorithm (80 and 225 times faster without skip connections).
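The per-object matching described above can be sketched as follows. The function name and the encoding of instance ids (0 meaning unassigned) are our assumptions:

```python
import numpy as np

def best_cluster_iou(gt_ids, pred_ids, gt_label):
    """IoU of one ground-truth instance with its best-overlapping cluster.

    gt_ids, pred_ids -- per-point instance ids of equal length (0 = unassigned)
    gt_label         -- the ground-truth instance to evaluate
    """
    gt_mask = gt_ids == gt_label
    if gt_mask.sum() == 0:
        return 0.0
    # predicted cluster with the largest overlap with this ground-truth object
    overlap = np.bincount(pred_ids[gt_mask])
    overlap[0] = 0                     # ignore unclustered points
    best = overlap.argmax()
    if overlap[best] == 0:
        return 0.0
    pred_mask = pred_ids == best
    inter = np.logical_and(gt_mask, pred_mask).sum()
    union = np.logical_or(gt_mask, pred_mask).sum()
    return inter / union
```

Averaging this value over all ground-truth objects of a sequence yields the mean IoU reported per method.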
For a further evaluation of the instance-level performance, we compute the average precision in a similar fashion to established benchmarks for instance segmentation on image data. We define 10 bins of point-wise overlap between the ground truth and proposed clusters, ranging from an IoU of 0.5 to 0.95 in steps of 0.05. We average the precision of all bins into one single metric score, the Average Precision (AP), which is shown in Table II for each method. We additionally list the AP for the overlap values of 50%, 75% and 95%, in which the evaluation is restricted to objects above the denoted IoU. Table II shows that we matched as many ground truth instances as the DBSCAN algorithm for an IoU over 50%, while we still achieve above 60% AP for all instances with an IoU over 75%. This is particularly important in the context of driver assistance systems, since a missed instance is a bigger problem than an object that was not matched perfectly. We further show that the proposed skip connections improve the results of our algorithm and help to find otherwise missed objects.
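The binned AP score described above reduces to a few lines; this is our reading of the described metric, not the official benchmark code:

```python
import numpy as np

def average_precision(ious):
    """AP as described: fraction of ground-truth objects matched above each
    IoU threshold from 0.50 to 0.95 (step 0.05), averaged over all 10 bins."""
    ious = np.asarray(ious, dtype=float)
    thresholds = 0.5 + 0.05 * np.arange(10)        # 0.50, 0.55, ..., 0.95
    return float(np.mean([(ious >= t).mean() for t in thresholds]))
```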
Table II (surviving fragment): Bogoslavskyi et al. – 63.75, 32.19, 81.31, 11.58.
We have presented an algorithm for real-time instance segmentation of Lidar sensor data using raw range images. To make this approach more robust against over-segmentation, we introduced what we call skip connections, which use the larger neighbouring context for a more precise assignment of measured points to an instance, especially in cases of partial occlusion. These properties of our method facilitate the preservation of three-dimensional information in the measurements when being reduced to a two-dimensional representation for fast computation. In a detailed evaluation, we have shown that our approach is as fast as comparable state-of-the-art methods, while being more stable in its runtime and, more importantly, providing an overall better performance in instance segmentation. The experiments show that not only is our accuracy in separating objects higher than with comparable approaches, but we are also able to match most ground truth instances that otherwise only the far more complex DBSCAN algorithm was able to match. This is particularly important in the context of driver assistance systems, since a missed instance is a bigger problem than an object that was not matched perfectly. We have also shown that the proposed skip connections improve the results of our algorithm and help to find otherwise missed objects. The further use of the segmented point clouds for classification and for the removal of false positives is outside the scope of this work and will be covered in future publications.
- Ankerst, M., Breunig, M. M., Kriegel, H.-P. and Sander, J. (1999) OPTICS: Ordering Points to Identify the Clustering Structure. SIGMOD Record 28 (2), pp. 49–60. Cited by: §I, §II.
- Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C. and Gall, J. (2019) SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. Cited by: §IV-B.
- Bogoslavskyi, I. and Stachniss, C. (2017) Efficient online segmentation for sparse 3D laser scans. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, pp. 1–12. Cited by: §I, §IV-A, §IV-B, TABLE I, TABLE II.
- Bogoslavskyi, I. and Stachniss, C. (2016) Fast range image-based segmentation of sparse 3D laser scans for online operation. IEEE International Conference on Intelligent Robots and Systems, pp. 163–169. Cited by: §I, §II, §III, Fig. 6, §IV-A, §IV-B, TABLE I, TABLE II.
- Chu, P. et al. (2017) A fast ground segmentation method for 3D point cloud. Journal of Information Processing Systems 13 (3), pp. 491–499. Cited by: §III.
- Comaniciu, D. and Meer, P. (2002) Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5), pp. 603–619. Cited by: §II.
- Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231. Cited by: §I, §II, TABLE I, TABLE II.
- Fukunaga, K. and Hostetler, L. (1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory 21 (1), pp. 32–40. Cited by: §II.
- Himmelsbach, M., Hundelshausen, F. v. and Wuensche, H.-J. (2010) Fast segmentation of 3D point clouds for ground vehicles. In 2010 IEEE Intelligent Vehicles Symposium, pp. 560–565. Cited by: §II.
- (2013) On real-time lidar data segmentation and classification. Cited by: §II.
- Lahoud, J., Ghanem, B., Pollefeys, M. and Oswald, M. R. (2019) 3D Instance Segmentation via Multi-Task Metric Learning. Cited by: §II.
- Moosmann, F., Pink, O. and Stiller, C. (2009) Segmentation of 3D lidar data in non-flat urban environments using a local convexity criterion. IEEE Intelligent Vehicles Symposium, pp. 215–220. Cited by: §I, §II.
- Moosmann, F. (2013) Interlacing self-localization, moving object tracking and mapping for 3D range sensors. Ph.D. Thesis, KIT Scientific Publishing, Karlsruhe. Cited by: Fig. 5, Fig. 6, §IV-A.
- Wang, B. H., Chao, W.-L., Wang, Y., Hariharan, B., Weinberger, K. Q. and Campbell, M. (2019) LDLS: 3-D Object Segmentation Through Label Diffusion from 2-D Images. IEEE Robotics and Automation Letters 4 (3), pp. 2902–2909. Cited by: §II.
- (2019) PointIT: A Fast Tracking Framework Based on 3D Instance Segmentation. Cited by: §II.
- Yang, B., Wang, J., Clark, R., Hu, Q., Wang, S., Markham, A. and Trigoni, N. (2019) Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds. In Advances in Neural Information Processing Systems (NeurIPS), pp. 1–14. Cited by: §II.
- Zermas, D., Izzat, I. and Papanikolopoulos, N. (2017) Fast segmentation of 3D point clouds: A paradigm on LiDAR data for autonomous vehicle applications. Proceedings - IEEE International Conference on Robotics and Automation, pp. 5067–5073. Cited by: §I, §II.
- Instance Segmentation of LiDAR Point Clouds. Cited by: §II.