Ground and Non-Ground Separation Filter for UAV Lidar Point Cloud

11/16/2019 ∙ by Geesara Prathap, et al. ∙ JSC Innopolis

This paper proposes a novel filter for separating the ground plane from non-ground objects in a 3D Lidar point cloud. It is specially designed for real-time applications on unmanned aerial vehicles and works on sparse Lidar point clouds without preliminary mapping. We use this filter as a crucial component of a fast obstacle avoidance system for an agricultural drone operating at low altitude. As the first step, a point cloud is transformed into a depth image, and then places with high density nearest to the vehicle (local maxima) are identified. We then merge the original depth image with the identified locations after maximizing the intensities of the pixels in which local maxima were found. The next step is to calculate a range angle image, which represents the angles between two consecutive laser beams, based on the improved depth image. Once the range angle image is constructed, smoothing is applied to reduce the noise. Finally, we find connected components in the improved depth image while incorporating the smoothed range angle image. This allows the non-ground objects to be separated; the remaining locations of the depth image belong to the ground plane. The filter has been tested in a simulated environment as well as on an actual drone and provides real-time performance. We make our source code and dataset available online.






I Introduction

Over the past two decades, Lidar technology has been used extensively in autonomous vehicles, addressing various directions including Multi-Target Tracking (MTT) [3], road and road-edge detection [17], and pedestrian recognition and tracking [16]. Mass-produced Unmanned Aerial Vehicles (UAVs) were not able to apply Lidar-based technologies because of various constraints, such as computational complexity and limits on the maximum weight and power for longer flight duration. In the recent past, however, the advent of advanced UAVs, the fast growth of computational capabilities (e.g., processors and random access memory (RAM)), and the production of lightweight batteries with high power density have made most Lidar technologies and complex algorithms workable on UAVs as well. UAVs have also started using Lidar as one of their main sources of visual data. This opens a new paradigm of UAV-based robotics applications, where more detailed information is needed for robust navigation and for accomplishing tasks such as surveillance and transportation. The recent works of B. Zhou [18] and S. Liu [8] propose quadrotor motion planning systems for fast flight in 3-D complex environments.

(a) Raw point cloud from Velodyne-16 Lidar
(b) White represents the ground plane and other colors depict non-ground objects in the point cloud shown in Fig. 1(a)
(c) The projection of the point cloud on a camera space with filter result
Fig. 1: Ground and non-ground separation filter results

In this work, we propose a new filter, named after Hagen, for the separation of ground and non-ground, specially designed for real-time UAV applications. It therefore avoids creating a complete map. Instead, it uses a set of consecutive measurements and concatenates them in a sliding-window fashion. The concatenated point cloud is then used for further processing. The proposed filter is designed so that it starts detecting the non-ground objects that are closer to the UAV, going from the top level to the bottom level of the Lidar laser beams. This feature could be used with a path planner to prevent the UAV from flying into a trap from which there is no way out. If Lidar is the only sensor used for reasoning about the environment, the search space of the Lidar eventually becomes the feasible search space of the UAV. That is why the filter detects non-ground from the top level to the bottom level, which helps to avoid steering the UAV into terrain or obstacles. Fig. 1 shows how ground and non-ground separation of a 3D Lidar point cloud works.
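The sliding-window concatenation of consecutive scans described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `SlidingCloud` and the parameter `size` are hypothetical, and the sketch assumes the scans are already expressed in a common frame (no ego-motion compensation is performed).

```python
from collections import deque

import numpy as np


class SlidingCloud:
    """Keep the last `size` scans and expose them as one concatenated cloud."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest scan automatically.
        self.scans = deque(maxlen=size)

    def push(self, scan):
        """Add an (N, 3) scan and return the concatenated (M, 3) point cloud."""
        self.scans.append(np.asarray(scan, dtype=float).reshape(-1, 3))
        return np.vstack(self.scans)


window = SlidingCloud(size=2)
window.push(np.zeros((4, 3)))
window.push(np.ones((3, 3)))
merged = window.push(np.full((5, 3), 2.0))  # the first scan has been dropped
```

Here `merged` contains only the two most recent scans, which is the behaviour a fixed-length window is meant to provide.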

Our Contributions: The main contribution is an end-to-end novel ground and non-ground separation filter. Within it, we

  1. improve the depth image by finding local maxima using persistent homology,

  2. detect non-ground points starting with the objects closest to the Lidar, from the top-level laser beams downwards,

  3. apply singular spectrum analysis (SSA) for range angle smoothing,

  4. optimize the code to reduce the execution time.

II Related Work

In the work of C. Tongtong [15], ground segmentation is performed on a 3D Lidar point cloud by building a polar grid map. It applies a 1D Gaussian Process (GP) regression model and the Incremental Sample Consensus (INSAC) algorithm for segmentation. They achieved good segmentation results in a variety of scenarios (e.g., urban and countryside environments). The authors of [5] present a set of algorithms for the segmentation of dense 3D data. They noticed empirically that prior ground extraction improves the segmentation performance. The work by M. Himmelsbach [7] shows real-time segmentation of long-range 3D point clouds followed by classification of the segmented objects. To reduce the execution time and increase efficiency, their approach was split into two subproblems: ground plane estimation and fast 2D connected components labelling. In our approach, the order is reversed: fast 2D connected components labelling is followed by ground plane extraction. F. Moosmann [9] presents a fast algorithm that works with a high volume of 3D Lidar data in real time. It uses a novel unified generic criterion, based on local convexity measures and a graph data structure, for the separation of ground and non-ground. In contrast, the authors of [4] propose motion-based detection and tracking in 3D LiDAR scans. They rely solely on motion cues and sequentially detect all the motions in the scene. A Bayesian filter is then used to segment the objects, followed by tracking. Paper [14] presents an implementation of a ground detection methodology that filters forest points from a LiDAR-based dense 3D point cloud using the Cloth Simulation Filtering (CSF) algorithm. The methodology requires dense mapping and offline analysis, but it allows the terrain relief to be recovered and a landscape map of a forest region to be created.

Filtering is one of the most fundamental techniques of computer vision. For example, a Gaussian filter computes the weighted average around a given pixel location with a kernel of a given size. A bilateral filter (BF) uses a somewhat similar approach but preserves the edges around each considered pixel. The authors of [13] obtain a dense depth map using local spatial interpolation based on a sliding-window approach in which BF is employed. They modified the conventional BF to achieve proper upsampling that preserves foreground-background discontinuities. That is how they were able to acquire high-resolution depth maps by upsampling 3D Lidar data. In this research, BF is applied to sharpen the locations around the detected local maxima as shown in Fig. 


Fig. 2: Illustration of the superposition of objects between close (part of the car) and far (tree and part of the building), whose distance or depth from Lidar will be in the same neighborhood region of the depth image. Nearest points in point cloud from Lidar are marked with the red crosses.

In this work, the range angle image needs to be smoothed, so a few smoothing techniques were evaluated. The main intuition behind smoothing a signal is to construct an approximate signal that captures the meaningful patterns while leaving out noise and other rapid phenomena. Singular Spectrum Analysis (SSA) is one of the smoothing techniques that gives a highly accurate and consistent result. SSA is used especially in biomedical applications for smoothing signals and finding hidden patterns within them (e.g., [11, 10] and [12]).

Fig. 3: (a) depicts the depth image transformed from a point cloud (it corresponds to the point cloud shown in Fig. 1(a)). The result of local maxima detection is shown in (b). The mean image is shown in (c). The range angle image and the smoothed range angle image are shown in (d) and (e), respectively

III Ground and Non-Ground Separation

Typically, ground and non-ground can be separated simply by removing the points that lie below the location of the Lidar (assuming that the installed location is known) or by RANSAC-based plane fitting. However, neither of these methods works for UAVs because, unlike for ground vehicles, the altitude of a UAV can change at any time. Thus, we designed a complete ground and non-ground separation filter for UAVs equipped with Lidar. Algorithm 1 shows the main steps of the filter at a high level. The inputs of the filter are PC, the angle hyperparameter (explained in Sec. III-C), and W (explained in Sec. III-D).

Input , (PC), (), (W)
Output ,

Algorithm 1 Ground and non-ground separation filter

III-A Depth Image Estimation from Point Cloud

We used a Velodyne VLP16 Lidar for our experiments; however, the proposed algorithm could be used with other Lidars (e.g., Velodyne HDL-64E or HDL-32E). The VLP16 provides a point cloud constructed from 16 laser beams that cover ±15 degrees vertically and 360 degrees horizontally. Processing the point cloud itself is time-consuming and not suited to real-time analysis. Thus, we decided to reduce the space dimension from 3D to 2D, where a disparity map or depth map is employed for the separation of ground and non-ground. The width of the depth image is set to 870 pixels. This value was selected empirically and corresponds to a horizontal resolution of 0.413 degrees/pixel. It can be changed, but this may affect the performance and/or the accuracy. The height of the depth image is set to 16 because the VLP16 Lidar has only 16 channels. To transform a point cloud into a depth image, each point of the point cloud is assigned to a pixel of the depth image. The distance (depth) to the considered point becomes the intensity of the chosen pixel. If multiple points correspond to the same pixel location, we keep the distance closest to the Lidar. Once non-ground is separated in 2D space, the corresponding points of the point cloud have to be recovered from the depth image. Thus, while estimating the depth image, we keep a record of which points belong to which pixel location. Let's call this record the mapper. The following steps are required to create the depth image:

  1. Consider the points of the cloud, each given by its coordinates, together with the total number of points.

  2. Estimate the angles of each point along the horizontal and vertical directions.

  3. The 360 degrees are split into as many pixels as the image is wide (870); the 30 degrees are split into as many pixels as the image is high (16). We then take the row and column that most closely match the vertical and horizontal angles in degrees. This is the pixel location in the depth image, and the depth of the point is its intensity value.

A similar idea was proposed in [2] as well. After performing these steps, the depth image can be constructed as shown in Fig. 3a.
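The projection steps above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `point_cloud_to_depth_image`, the use of `arctan2`/`arcsin` for the two angles, the row assignment by nearest evenly spaced beam, and the dictionary used as the mapper are all choices made for illustration.

```python
import numpy as np


def point_cloud_to_depth_image(points, width=870, height=16,
                               v_fov_deg=(-15.0, 15.0)):
    """Project an (N, 3) array of x, y, z points onto a height x width depth image."""
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    valid = depth > 0
    safe_depth = np.where(valid, depth, 1.0)  # avoid division by zero

    # Horizontal (azimuth) and vertical (elevation) angles in degrees.
    azimuth = np.degrees(np.arctan2(y, x)) % 360.0
    elevation = np.degrees(np.arcsin(np.clip(z / safe_depth, -1.0, 1.0)))

    v_min, v_max = v_fov_deg
    cols = np.floor(azimuth / 360.0 * width).astype(int) % width
    # Assumption: beams are evenly spaced, row 0 is the top beam.
    rows = np.rint((v_max - elevation) / (v_max - v_min) * (height - 1)).astype(int)
    valid &= (rows >= 0) & (rows < height)

    image = np.zeros((height, width))
    mapper = {}  # (row, col) -> indices of all points that fall into the pixel
    for i in np.flatnonzero(valid):
        key = (int(rows[i]), int(cols[i]))
        mapper.setdefault(key, []).append(int(i))
        if image[key] == 0 or depth[i] < image[key]:
            image[key] = depth[i]  # keep the return closest to the Lidar
    return image, mapper
```

Once pixels have been labelled in 2D, `mapper[(row, col)]` gives the indices of the original points to extract. A value of 0 in the image marks a pixel without any return.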

III-B Local Maxima Detection in Depth Image

The main reason for detecting local maxima is illustrated in Fig. 2. We are interested in finding the locations of the local maxima present in the considered images. Extracting local maxima from images in a high-dimensional space is generally challenging because of incompleteness and noise. In applied mathematics, Topological Data Analysis (TDA) is an approach for analyzing datasets using their topological properties, which allows hidden structures or patterns to be found. Persistent Homology (PH) is one of the dominant mathematical tools employed for this purpose. PH provides a principled way of analysing such data: it is less sensitive to the choice of a particular metric, and it transforms the original space into a different space that is robust to noise.

Fig. 4: The blue curve shows how normalized pixel intensities vary over a given row of the image. The red vertical lines represent the barcode of the most significant local maxima

To get a clear intuition of how PH applies to local maxima detection, let's consider Fig. 4. Although we are interested in local maxima, some of them are irrelevant. For example, the local maximum near 155 must be considered, while the local maximum at 165 must be discarded. Such false positives arise because of noise in the image and should be eliminated. Noise can be removed by smoothing the image. Nonetheless, this comes at a cost (e.g., some sharp local maxima vanish or are dampened, and the intensity levels of pixels change if there is no proper normalization). Thus, it is better to operate on the original data itself. In Persistent Homology, there is a concept called the barcode. A barcode represents each persistent generator with a vertical line that begins at the first filtration level, where the generator arises, and ends at the second filtration level, where it disappears. Here, the persistent generator corresponds to the distance between a detected local maximum and the consecutive local minimum, the first filtration level corresponds to the local maximum, and the second filtration level corresponds to the local minimum. Hence, PH finds the relative maxima in a noise-resilient fashion. In Fig. 4, the red lines depict the most persistent 0-dimensional cycles (0-simplices) where the local maxima have been detected.
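The barcode idea can be illustrated on a single image row with a small 0-dimensional persistence computation. The sketch below is an assumption-laden illustration rather than the authors' implementation: it sweeps the values from high to low with a union-find structure, and the function name `persistent_maxima_1d` and the threshold `min_persistence` are hypothetical. A peak is born when it first appears and dies when its component merges with a component that has a higher peak; the difference between these two levels is its persistence.

```python
import numpy as np


def persistent_maxima_1d(signal, min_persistence):
    """Return indices of local maxima whose persistence is >= min_persistence."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    if n == 0:
        return []
    order = np.argsort(-signal, kind="stable")  # sweep from high to low values
    parent = [-1] * n                           # -1 marks samples not yet seen
    persistence = {}

    def find(i):
        # Follow parent links to the surviving (highest) peak of the component.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        i = int(i)
        roots = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and parent[j] != -1}
        if not roots:
            parent[i] = i                        # a new peak is born at this level
            continue
        best = max(roots, key=lambda r: signal[r])
        parent[i] = best
        for r in roots - {best}:                 # the lower peak dies at the merge
            persistence[r] = signal[r] - signal[i]
            parent[r] = best

    top = int(order[0])
    persistence[top] = signal[top] - signal.min()  # the global maximum never dies
    return sorted(p for p, v in persistence.items() if v >= min_persistence)


row = [0.0, 0.2, 0.1, 0.9, 0.3, 0.35, 0.1]
strong_peaks = persistent_maxima_1d(row, min_persistence=0.5)
```

In this toy row, the small bump at index 5 merges with the main peak almost immediately, so it has low persistence and is discarded, which mirrors how the spurious maximum in Fig. 4 is rejected without smoothing the data.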

Fig. 5: Local maxima detection, where the blue plane depicts the pixel locations of the image and the strength of the local maxima varies from blue to red along the direction orthogonal to the image plane

Once the most persistent local maxima have been detected, those locations need to be sharpened in the depth image without affecting the neighbouring pixels. A bilateral filter (BF) is one of the techniques that achieves this, because BF is a non-linear, edge-preserving, and noise-reducing smoothing filter. It replaces the intensity of each pixel with a weighted average of the intensity values of nearby pixels. We modified BF slightly, which gives a more accurate result than the default BF.


The filter also includes a normalization term. In its definition, the filtered image is computed from the depth image I; x are the coordinates of the detected local maxima; the window consists of the 8 neighbours around each x; and the weights are given by a Gaussian function, parameterised by a mean and a standard deviation. The two standard deviations of the filter are constant at all times; their values are set to 1.2 and 1.3 respectively, which were estimated empirically.
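For orientation, the sketch below applies a textbook bilateral filter only at the detected local maxima. It does not reproduce the authors' modified filter, whose exact form is not given here. The function names are hypothetical, the assignment of 1.2 to the range (intensity) kernel and 1.3 to the spatial kernel is an assumption, the centre pixel is included in the window as in the standard formulation, and columns are wrapped on the assumption that the image covers a full 360-degree scan.

```python
import numpy as np


def gaussian(d, sigma):
    """Unnormalised Gaussian weight for a distance or intensity difference d."""
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def bilateral_at(image, maxima, sigma_r=1.2, sigma_s=1.3):
    """Apply a standard bilateral filter at the given (row, col) pixels only."""
    image = np.asarray(image, dtype=float)
    out = image.copy()
    h, w = image.shape
    for r, c in maxima:
        num, norm = 0.0, 0.0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, (c + dc) % w  # assumption: columns wrap around
                if not 0 <= rr < h:
                    continue
                weight = (gaussian(image[rr, cc] - image[r, c], sigma_r)
                          * gaussian(np.hypot(dr, dc), sigma_s))
                num += weight * image[rr, cc]
                norm += weight                 # normalization term
        out[r, c] = num / norm
    return out
```

Because the range kernel gives almost no weight to neighbours whose depth differs strongly from the centre pixel, an isolated sharp maximum keeps its value, while pixels that are not in `maxima` are left untouched.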

III-C Constructing Range Angle Image

In each iteration, the depth image is constructed, followed by the range angle image. Since the laser beams cover ±15 degrees vertically and the depth image has 16 rows, 15 vertical angles need to be calculated, where each angle is estimated as the angle between the horizontal plane of the Lidar and the considered laser beam; these angles vary from Lidar to Lidar. As shown in Fig. 6, two vertical angles correspond to two consecutive rows of a considered column (e.g., r-1,c and r,c). Thus, each range angle of the range angle image is calculated from the depths (depicted as red lines in Fig. 6) at the considered consecutive rows and column of the depth image. Hence, the range angle image is constructed with a resolution of 15x870. Let's consider two consecutive laser beams projected onto a car, shown as red lines in Fig. 6. Since both vertical angles and both depths are known, the range angle can be calculated from them.

Fig. 6: Illustration of how the range angle is calculated. To construct the range angle image, one range angle is calculated for every pair of consecutive rows in every column

In this calculation, the depth at (r,c) is the distance stored at row r and column c of the depth image. As shown in Fig. 6, the two vertical angles belong to two consecutive rows of the same column c.
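One common way to compute such an angle, used in range-image based ground removal such as [2], is to take the arctangent of the height difference over the horizontal distance difference between the two returns. Whether this matches the exact equation used here is an assumption. The function name `range_angle_image` and the argument `beam_angles_deg` (the elevation of each image row) are hypothetical.

```python
import numpy as np


def range_angle_image(depth, beam_angles_deg):
    """Angles (degrees) between returns of consecutive rows.

    depth: (H, W) array of ranges; beam_angles_deg: (H,) elevation per row.
    Returns an (H - 1, W) array.
    """
    depth = np.asarray(depth, dtype=float)
    xi = np.radians(np.asarray(beam_angles_deg, dtype=float))[:, None]
    z = depth * np.sin(xi)  # height of each return relative to the sensor
    x = depth * np.cos(xi)  # horizontal distance of each return
    dz = np.abs(z[:-1] - z[1:])
    dx = np.abs(x[:-1] - x[1:])
    return np.degrees(np.arctan2(dz, dx))
```

With this formulation, consecutive returns on flat ground give angles close to 0 degrees, and returns on a vertical surface give angles close to 90 degrees. For a 16x870 depth image the result has 15x870 entries.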

III-D Smoothing Range Angles with SSA

SSA is a powerful non-parametric spectral estimation technique for time series analysis and forecasting. SSA incorporates elements of classical time series analysis, multivariate statistics, dynamical systems and signal processing. In this research, SSA is applied to smooth the range angle image column-wise. Let us define a series of angles of length N. The SSA algorithm consists of two main steps: decomposition of the given series and reconstruction by adding the desired principal components.

III-D1 Decomposition

In this stage, the series is rearranged into a WxK matrix X, which is called the trajectory matrix. W is the window length, and K = N - W + 1. We fix the value of W at 8, even though it is a configurable parameter of the filter. This constraint is made because each column consists of 16 pixels and two iterations were assumed to be enough; this assumption was made empirically. Furthermore, if a 32 or 64 channel Lidar is used, it is better to increase W.

Then Singular Value Decomposition (SVD) is applied to X, where the j-th component of the SVD is specified by the j-th eigenvalue and eigenvector of XX^T. Since X is a Hankel matrix (or catalecticant matrix) and XX^T is a positive semi-definite matrix, its eigenvalues are non-negative. The eigenvectors are then ordered in decreasing order of the corresponding eigenvalues.

III-D2 Reconstruction

In this stage, a set of SVD components is selected, and the smoothed series is obtained by averaging the entries of the selected components along the anti-diagonals of X (hankelization). More about hankelization can be found in [6]. Fig. 7 shows how the principal components are extracted. The degree of smoothing varies with the chosen parameters.

Fig. 7: Principal components extraction for the 370th column of Fig. 3(d). The first subfigure depicts the intensity changes along the column. The second subfigure shows the corresponding principal components
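The two SSA stages can be sketched with NumPy as follows. This is a basic SSA under stated assumptions rather than the authors' optimized code: the function name `ssa_smooth` and the parameter `n_components` (how many leading components are kept) are hypothetical, and the trajectory matrix uses K = N - W + 1 columns.

```python
import numpy as np


def ssa_smooth(series, window=8, n_components=1):
    """Smooth a 1D series by keeping the leading SSA components."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    k = n - window + 1
    # Decomposition: column j of the trajectory matrix is series[j : j + window].
    X = np.column_stack([series[j:j + window] for j in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # singular values sorted descending

    # Reconstruction: keep the leading components, then hankelize.
    X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    smoothed = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            smoothed[i + j] += X_hat[i, j]  # entries with equal i + j share a sample
            counts[i + j] += 1
    return smoothed / counts


column = np.sin(np.linspace(0.0, 3.0, 15)) + 0.05 * np.cos(np.arange(15) * 2.5)
smooth_column = ssa_smooth(column, window=8, n_components=2)
```

Applied column by column to the range angle image, fewer components give a smoother result, while keeping all components returns the original series unchanged.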

III-E Connected Components Extraction with BFS

This is the last step of the filter. Algorithm 2 gives the basic steps of the process, which finds connected components using breadth-first search (BFS). The idea was initially proposed in [1]; a few modifications have been made to improve performance and accuracy. Once non-ground is labelled, the corresponding cloud points can be extracted using the mapper (explained in Sec. III-A). The result is shown in Fig. 1(b).

1:procedure SeparationGroundandNonGround()
5:     for c = 0, …,  do
8:         for r = -1, …,0 do
10:              if  then
13:              end if
14:              if  then
17:              end if
18:         end for
19:         if  then
21:         end if
22:     end for
23:     for  do
24:         LabelConnectedComponent(, )
25:     end for
26:end procedure
Algorithm 2 Connected components extraction with BFS
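A generic BFS labelling of a 2D mask is sketched below to illustrate the kind of search Algorithm 2 performs. It is not a reconstruction of Algorithm 2: the conditions used there are not reproduced, and the candidate mask (for example, pixels whose smoothed range angle exceeds the angle hyperparameter), the 4-connectivity, the column wrap-around, and the function name `label_components` are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """Label 4-connected regions of True pixels with BFS; columns wrap around."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)  # 0 means "not labelled"
    current = 0
    for c in range(w):
        for r in range(h):  # scan each column from the top row downwards
            if not mask[r, c] or labels[r, c]:
                continue
            current += 1
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                neighbours = ((y - 1, x), (y + 1, x),
                              (y, (x - 1) % w), (y, (x + 1) % w))
                for ny, nx in neighbours:
                    if 0 <= ny < h and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return labels, current
```

Each labelled pixel can then be mapped back to its 3D points through the pixel-to-point record kept while building the depth image.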

IV Experimental Evaluation

(a) Ground truth (manually labeled) ground area
(b) Filter output (ground only) after the camera projection
(c) Manually labeled non-ground area
(d) Filter output (non-ground only) after the camera projection
Fig. 8: Labeled (ground and non-ground) images corresponding to Fig. 1

This section evaluates specific aspects of the filter detailed in Sec. III and assesses the performance of the system. Since the filter is designed as a parametric model, three hyperparameters, PC, the angle and W, need to be fine-tuned to obtain a stable result. To evaluate how they affect the final result, the Lidar point cloud and the filter output (also point clouds) are first projected onto the camera image within the camera's field of view (FOV), as shown in Fig. 1(c). The evaluation is therefore done only on the region covered by the camera's FOV and is treated as an image segmentation problem. Hence, the F1 score, one of the most widely used evaluation metrics in image segmentation applications, is used as the evaluation metric:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$

where P denotes the precision, R the recall, and TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The F1 score is defined as the harmonic mean of P and R. The harmonic mean is well suited to working with rates and ratios; since precision and recall are ratios, it is an appropriate tool for estimating the accuracy of the filter.
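As a quick numerical illustration of the metric, the following sketch computes precision, recall and the F1 score from raw counts, and the F1 score from a given precision and recall. The function names are hypothetical and the counts in the first call are made up for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from_pr(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)  # illustrative counts
non_ground_f1 = f1_from_pr(0.892, 0.738)          # approximately 0.808
```

The second call uses the non-ground precision and recall reported for an angle of 10 degrees in Table I and gives approximately 0.808, in agreement with the F1 value listed there.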

As the first part of the evaluation, a set of images consisting of the projected Lidar point cloud within the camera's FOV was manually selected. We segmented them into two separate classes, ground and non-ground, as shown in Fig. 8(a) and Fig. 8(c). Afterwards, we projected the filter output onto the camera space as ground and non-ground separately and labelled them, as shown in Fig. 8(b) and Fig. 8(d). In this way, 50 images were labelled for each of the classes, 200 images overall for a given parameter configuration (PC, the angle and W). As mentioned in Sec. III-D, W is fixed for this experiment. Initially, we ran the filter on a ROS bag file several times while changing the values of PC and the angle. From these runs, we formed a hypothesis for a suitable combination of the hyperparameters: PC = 5 and an angle of 10 degrees. To test this hypothesis, PC and the angle were varied around the observed values. Table I and Table II show how they affect the final result of the filter.

Fig. 9: The accuracy of our algorithm, computed as the F1 score, for separation of the ground plane (Table I) and non-ground objects (Table II). The angle (in degrees) and PC are the hyperparameters that are varied one at a time while holding the other hyperparameters constant

According to Fig. 9a, the F1 score of non-ground objects increases steeply from an angle of 5 degrees to a peak at 10 degrees, after which it trends downwards. On the other hand, there is a dip in the F1 score for the ground plane. After 10 degrees, both scores decline. As shown in Fig. 9b, the F1 score of the ground plane increases sharply from a PC of 2 to a peak at 5, after which it remains steady. The same holds for the F1 score of non-ground, although with a lower score. There is still room to increase the PC value to achieve a higher score. We decided not to increase PC further because it may lead to an overestimation of the accuracy of the filter.

To assess the performance of the system, we ran the filter on 3000 point clouds in 3 mini-batches (1000 each). We then measured the average running time of the filter and its standard deviation per 360-degree Velodyne VLP16 scan, as shown in Table III.

Angle (deg)   Ground                    Non-ground
              P       R       F1        P       R       F1
5             0.948   0.77    0.856     0.825   0.63    0.715
7.5           0.931   0.748   0.83      0.872   0.717   0.787
10            0.925   0.762   0.82      0.892   0.738   0.808
12.5          0.897   0.7     0.78      0.862   0.735   0.794
TABLE I: Varying the angle while fixing PC at 5, to evaluate how the angle affects the final accuracy of the filter

PC            Ground                    Non-ground
              P       R       F1        P       R       F1
2             0.715   0.547   0.62      0.79    0.678   0.73
4             0.917   0.7253  0.81      0.82    0.708   0.760
5             0.925   0.762   0.82      0.892   0.738   0.808
8             0.908   0.739   0.815     0.905   0.746   0.818
TABLE II: Varying PC while fixing the angle at 10 degrees, to evaluate how PC affects the final accuracy of the filter

Platform        Processor              Average    Std. dev.
Nvidia Xavier   ARMv8.2 @ 2.2GHz       134 ms     12 ms
Acer F5-573g    i5-6200U @2.30 GHz     126 ms     15 ms
TABLE III: Average running time and its standard deviation per 360-degree Lidar scan

V Conclusions

In this paper, we have presented a complete filter for ground and non-ground separation on a Lidar point cloud. The system was designed and implemented with a focus on real-time UAV applications. The filter works on sparse Lidar point clouds without preliminary mapping. In particular, we have presented the main steps of the proposed algorithm: searching for places with high density nearest to the drone (local maxima) in the point cloud transformed into a depth image, merging the original depth image with the identified locations after maximizing the intensities of the local maxima pixels, using the range angle image, and finally searching for connected components in the improved depth image to separate the ground plane and non-ground objects. Finally, we validated our approach in a simulator and on a real drone with a series of experiments and evaluated its accuracy and computational performance.


This research has been supported by the Russian Ministry of Education and Science within the Federal Target Program grant (research grant ID RFMEFI 60917X0100).


References

  • [1] I. Bogoslavskyi and C. Stachniss (2016) Fast range image-based segmentation of sparse 3d laser scans for online operation. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 163–169.
  • [2] I. Bogoslavskyi and C. Stachniss (2017) Efficient online segmentation for sparse 3d laser scans. PFG–Journal of Photogrammetry, Remote Sensing and Geoinformation Science 85 (1), pp. 41–52.
  • [3] J. Choi, S. Ulbrich, B. Lichte, and M. Maurer (2013) Multi-target tracking using a 3d-lidar sensor for autonomous vehicles. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pp. 881–886.
  • [4] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard (2016) Motion-based detection and tracking in 3d lidar scans. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4508–4513.
  • [5] B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, and A. Frenkel (2011) On the segmentation of 3d lidar point clouds. In 2011 IEEE International Conference on Robotics and Automation, pp. 2798–2805.
  • [6] N. Golyandina, V. Nekrutkin, and A. A. Zhigljavsky (2001) Analysis of time series structure: ssa and related techniques. Chapman and Hall/CRC.
  • [7] M. Himmelsbach, F. V. Hundelshausen, and H. Wuensche (2010) Fast segmentation of 3d point clouds for ground vehicles. In 2010 IEEE Intelligent Vehicles Symposium, pp. 560–565.
  • [8] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar (2017) Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments. IEEE Robotics and Automation Letters 2 (3), pp. 1688–1695.
  • [9] F. Moosmann, O. Pink, and C. Stiller (2009) Segmentation of 3d lidar data in non-flat urban environments using a local convexity criterion. In 2009 IEEE Intelligent Vehicles Symposium, pp. 215–220.
  • [10] N. Mourad (2019) ECG denoising algorithm based on group sparsity and singular spectrum analysis. Biomedical Signal Processing and Control 50, pp. 62–71.
  • [11] T. C. Pataky, M. A. Robinson, J. Vanrenterghem, and J. H. Challis (2019) Smoothing can systematically bias small samples of one-dimensional biomechanical continua. Journal of Biomechanics 82, pp. 330–336.
  • [12] G. Prathap, T. N. Kumara, and R. Ragel (2018) Near real-time data labeling using a depth sensor for emg based prosthetic arms. In Proceedings of SAI Intelligent Systems Conference, pp. 310–325.
  • [13] C. Premebida, L. Garrote, A. Asvadi, A. P. Ribeiro, and U. Nunes (2016) High-resolution lidar-based depth mapping using bilateral filter. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 2469–2474.
  • [14] A. Sabirova, M. Rassabin, R. Fedorenko, and I. Afanasyev (2019) Ground Profile Recovery from Aerial 3D LiDAR-Based Maps. In 2019 24th Conference of Open Innovations Association (FRUCT), pp. 367–374.
  • [15] C. Tongtong, D. Bin, L. Daxue, Z. Bo, and L. Qixu (2011) 3D lidar-based ground segmentation. In The First Asian Conference on Pattern Recognition, pp. 446–450.
  • [16] H. Wang, B. Wang, B. Liu, X. Meng, and G. Yang (2017) Pedestrian recognition and tracking using 3d lidar for autonomous vehicle. Robotics and Autonomous Systems 88, pp. 71–78.
  • [17] W. Zhang (2010) Lidar-based road and road-edge detection. In 2010 IEEE Intelligent Vehicles Symposium, pp. 845–848.
  • [18] B. Zhou, F. Gao, L. Wang, C. Liu, and S. Shen (2019) Robust and efficient quadrotor trajectory generation for fast autonomous flight. IEEE Robotics and Automation Letters 4 (4), pp. 3529–3536.