1 Introduction
1.1 Motivation
The increasing prevalence of 3D scanners has resulted in a dramatic explosion in the availability of 3D data, especially raw sensor data often represented in the most basic 3D point cloud format. Such sensors include LIDAR scanners for modelling large outdoor scenes and GIS applications, as well as commercially available and inexpensive solutions for indoor scanning and modelling. Consequently, the processing of point clouds of millions, or even hundreds of millions, of points has become commonplace. Furthermore, new applications of range sensors that require the processing of large point clouds in real time, such as self-driving cars [2], have arisen.
For such datasets and their applications to be useful, or even feasible, there is a demand for salient point selection algorithms based solely on an unorganized point cloud, as opposed to connected-graph and mesh-based algorithms, which are typically more computationally and memory intensive. This has motivated the move towards using simple point cloud processing algorithms to filter a point cloud for salient points before applying complex algorithms, akin to the common usage of image processing filters in 2D computer vision.
One such image processing filter is the Difference of Gaussians (DoG). The DoG is an approximation to the Laplacian of Gaussian (LoG) operator, and is widely used in applications such as image enhancement, blob detection, edge detection, finding points of salience, pre-segmenting images [13], and perhaps most notably in the form of a DoG pyramid for obtaining scale invariance in 2D object recognition [12]. Although the DoG operator easily generalizes to so-called 2.5D data (i.e. depth images) and volumetric images (e.g. in medical imaging) [22], extending it to unorganized data (i.e. point clouds), particularly in a computationally efficient manner, is less straightforward.
1.2 Contributions
In this paper, a multi-scale operator of similar function to the DoG is introduced for unorganized 3D point clouds, namely the Difference of Normals (DoN). Despite the simplicity and efficiency of the operator, DoN is shown to be surprisingly powerful in assigning point saliency according to scale. While DoN is motivated in this paper as a multi-scale saliency feature used in a segmentation and/or object recognition pipeline, it also has applications to oriented 3D edge detection and planar region segmentation. An open source implementation of the DoN operator is made available in the Point Cloud Library (PCL) [15], as a feature in the PCL trunk: http://www.pointclouds.org.
1.3 Organization
The rest of this paper is organized as follows: §2 summarizes previous work on multi-scale operators and segmentation in unorganized point clouds; §3 introduces the Difference of Normals (DoN) operator; §4 introduces a method for parameter selection, shows the application of the DoN operator to the segmentation of real urban LIDAR scenes, and presents a quantitative evaluation of DoN segmentations on a publicly available dataset with ground truth annotations; §5 summarizes the results and explores the potential for future work.
2 Previous Work
2.1 Scale and Unorganized Point Clouds
In 2D images, the concept of scale space is often described with a family of gradually smoothed images created through convolution with a Gaussian kernel. Such a Gaussian scale space has a wide range of applications in image processing and 2D computer vision, such as edge sharpening and interest point selection [12].
Extending the concept of scale space to unorganized 3D point clouds is a non-trivial task due to the lack of a regular lattice from which the points were sampled. One solution is to convert the data into an organized format, such as a dense voxel map, in which the generalization from 2D image processing is straightforward. This, however, is unrealistic for large point clouds, such as outdoor scenes with millions of points, since the required number of voxels would be vast. An octree representation is a possible solution for reducing the memory requirements, but it comes at a computational cost.
Unnikrishnan et al., arguing the need for a multi-scale unorganized point cloud operator, introduced such an operator derived from the Laplace-Beltrami operator, a generalization of the Laplacian to Riemannian manifolds [19]. Their operator was derived in a scale-space-theoretic manner and, being based on a Gaussian convolution kernel, satisfies the scale-space axioms. In application, however, the operator is relatively computationally and memory intensive compared with our proposed method, as it requires the computation of a geodesic distance graph for the point cloud, and the convolution kernel itself is expensive to compute. Furthermore, it is unclear how the operator could be used to detect oriented features such as corners or edges.
2.2 Normal Support Radius in 3D Point Features
Many proposed features for unorganized point clouds have, directly or indirectly, used the support region size in surface normal estimation as a means of scale or saliency detection on the implicit surface of a 3D point cloud.
Rusu et al. proposed Persistent Point Feature Histograms for unorganized point clouds. These features were calculated in part using a series of normals estimated with increasing radii between a fixed minimum and maximum radius [16]. Novatnack and Nishino used surface normals to create scale-dependent geometric features on triangular meshes. The mesh normals were interpolated over a 2D parametrization of the mesh over each vertex, creating a normal map that was used in edge detection. They argued that while a normal map does not satisfy the axioms of a scale-space operator, it is a natural choice, as normals are less affected by noise than higher order derivative quantities such as curvature.
2.3 Point Cloud Segmentation
Various algorithms have been proposed in the area of point cloud segmentation. However, most algorithms for unorganized point clouds require meshing or connectivity [7, 4]. Those that do not typically rely on estimating the normal map as an integral step.
Liu et al. introduced Cell Mean Shift (CMS) [11], which maps the normal map of a point cloud onto a Gaussian sphere, producing a Gaussian image. This spherical image can then be clustered to identify shapes. Woo et al. [21] propose an octree-based method for handling large point clouds, using edges to segment the structures within.
3 Difference of Normals Operator
3.1 Theory
The concept of scale space has a well established theoretical background for continuous and discrete signals, notably in linking scale space to the linear diffusion equation [8] and in establishing the scale-space axioms [8, 20]. A set of axioms [8, 10, 20], the complete review of which is beyond the scope of this paper, has been described to capture the properties of a desirable and useful scale-space representation. Notably, it has been proven that the Gaussian kernel is the only convolutional filter that satisfies the complete set of scale-space axioms [8].
Although the Gaussian kernel is unique in satisfying the complete set of scale-space axioms, for many applications an operator that satisfies the full set of axioms is not required. Such approaches are more generally referred to as multi-scale. Multi-scale operators may be simpler and more computationally efficient, and may have desirable properties such as orientability.
This paper proposes to define a multi-scale operator for unorganized point clouds directly using the estimated surface normal map of the point cloud. The primary motivation behind this is the observation that surface normals estimated at any given radius reflect the underlying geometry of the surface at the scale of the support radius. Although there are many different methods of estimating surface normals (see §3.3), normals are always estimated with a support radius (or via a fixed number of neighbours). This support radius determines the scale of the surface structure that the normal represents.
Fig. 1 illustrates this effect in 1D. Normals and tangents estimated with a small support radius are affected by small-scale surface structure (and similarly by noise). On the other hand, normals and tangent planes estimated with a large support radius are less affected by small-scale structure, and represent the geometry of larger scale surface structures.
The intuition behind the approach being proposed here is that if the direction of the two surface normals is nearly identical, then the structure of the surface does not change significantly from the first radius to the second. By contrast, if the structure of the larger neighborhood around a center point is significantly different from that of the smaller neighborhood, then the directions of the two estimated normals are likely to vary by a larger margin. In that case, a value between the two radii is often representative of the scale of the structure near the center point.
Suppose a multi-scale operator for a point cloud is simply defined as the normal map parameterized by its support radius:

\mathcal{T}_r(p) = \hat{n}(p, r)    (1)

with scale parameter r, effected by the normal map of the point cloud estimated with support radius r. Notice that the response of our operator is a vector, and is thus orientable; the operator's norm, however, provides a more conventional scalar quantity. Just as described by the most basic and intuitive scale-space axiom, the effect of the normals on the implicit surface sampled by a point cloud is to suppress most of the structures in the surface with a characteristic dimension of less than r. Furthermore, with increasing values of the scale parameter r, fine-scale surface structure is increasingly suppressed. Despite this, Eqn. 1 does not satisfy all the scale-space axioms originally outlined by Witkin [20] and more recently enumerated by Lindeberg [10], notably the causality requirement introduced by Koenderink [8].
3.2 Method
When applying the multi-scale operator defined in Eqn. 1, we compare the responses at each point over several radii r. In the most basic case we can compare the response of the operator across two different radii r_1 < r_2. Formally, the Difference of Normals (DoN) operator for any point p in a point cloud P is defined as:

\Delta\hat{n}(p, r_1, r_2) = \frac{\hat{n}(p, r_1) - \hat{n}(p, r_2)}{2}    (2)

where r_1 < r_2, and \hat{n}(p, r) is the surface normal estimate at point p, given the support radius r.
For a given r_1 and r_2, the result of applying the operator to all the points in a point cloud is a vector map where a DoN vector is assigned to each point. Since each DoN is half the difference of two unit normal vectors, the magnitude of each vector is always within [0, 1].
The DoN vectors may be thresholded based on their magnitude, i.e. |\Delta\hat{n}(p, r_1, r_2)|, or based on their component values, i.e. \Delta\hat{n}_x, \Delta\hat{n}_y, or \Delta\hat{n}_z, for orientable surfaces and edges.
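As an illustration, Eqn. 2 and the magnitude thresholding can be sketched in a few lines of NumPy. This is a minimal sketch, not the PCL implementation; the input arrays of precomputed unit normals (`n_small`, `n_large`) and the threshold value are hypothetical:

```python
import numpy as np

def difference_of_normals(n_small, n_large):
    # Eqn. 2: half the difference of the unit normals estimated at r1 < r2.
    return (n_small - n_large) / 2.0

def don_magnitude_mask(don, threshold):
    # Magnitude is always in [0, 1] for unit normals; keep strong responses.
    return np.linalg.norm(don, axis=1) >= threshold

# Toy example: identical normals (flat region) vs. opposite normals.
n1 = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
n2 = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
don = difference_of_normals(n1, n2)
mask = don_magnitude_mask(don, 0.5)  # only the second point survives
```

The first point has magnitude 0 (the surface looks the same at both scales) and the second has magnitude 1 (maximal structural change between the two radii).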
Calculating the two normal maps estimated with support radii r_1 and r_2 for a scene is a highly parallelizable process, and thus benefits greatly from GPU optimization. Consequently, DoN computation, even for very large point clouds, may be performed very efficiently (see §4.4).
3.3 Approximating Normals in Range Data
3.3.1 Normals Estimation
There are many methods for estimating normals (or equivalently tangent planes) in point clouds [5, 6, 1]. However, only those using a fixed support radius, rather than a fixed number of neighbors, are suitable for unorganized data, especially when the point cloud density is highly variable.
Applying a method based on a fixed number of neighbors to a point cloud with high variability in sampling density, e.g. urban LIDAR data, results in each normal being computed using what may be very different support radii, and thus the estimated normals at each point will represent the surface at very different scales. Such normals would be unsuitable for DoN calculations.
In our experiments, the normals were estimated by finding the tangent plane using the principal components of a local neighborhood of fixed support radius around each point. This neighborhood may contain any number of points. The result is that all the normals in the scene are calculated at the same scale. However, due to the highly variable sampling density/resolution of some range data, the accuracy of the normal estimate may vary considerably across a scene with the number of points in the neighborhood.
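The fixed-radius PCA estimate described above can be sketched as follows; `estimate_normal` is a hypothetical helper using a brute-force radius search (a real implementation, such as PCL's, would use a kd-tree):

```python
import numpy as np

def estimate_normal(points, center, radius):
    """Normal at `center` from the PCA of all points within a fixed radius."""
    d = np.linalg.norm(points - center, axis=1)
    nbrs = points[d <= radius]
    if len(nbrs) < 3:
        return None  # too few neighbors for a stable tangent-plane fit
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Eigenvector of the smallest eigenvalue is normal to the fitted plane.
    return eigvecs[:, 0]
```

For points sampled from the plane z = 0, the estimate recovers (0, 0, ±1); the sign ambiguity is the subject of the next subsection.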
3.3.2 Resolving Normal Ambiguity
Surface normals estimated on point clouds exhibit a sign ambiguity in their direction, because any tangent plane to a point has two normals in opposite directions, either of which is mathematically valid. In many applications, this ambiguity is resolved with the sensor context, since the correct normal is always the one pointing into the hemisphere facing the range sensor [18]. For the particular application of DoN operations, the choice of how the sign ambiguity is resolved has no consequence, so long as the normals for the two support radii are disambiguated in the same manner. Thus the disambiguation can simply be achieved by negating one of the normals if \hat{n}(p, r_1) \cdot \hat{n}(p, r_2) < 0, i.e. if the angle between the two normals is greater than 90°. This is under the assumption that the true surface normals must be within an angle of 90° of each other, which is a realistic assumption given the limitations of scanning in the presence of self-occlusion.
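A minimal sketch of this disambiguation step, assuming the two normal maps are given as NumPy arrays of unit vectors:

```python
import numpy as np

def disambiguate(n_small, n_large):
    """Flip the large-radius normal wherever the two estimates point into
    opposite hemispheres (dot product < 0, i.e. angle > 90 degrees)."""
    n_large = n_large.copy()
    flip = np.einsum('ij,ij->i', n_small, n_large) < 0.0  # row-wise dot
    n_large[flip] = -n_large[flip]
    return n_large
```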
4 Experimental Results
4.1 Parameter Selection
Selecting the parameters r_1 and r_2 for DoN is important since, while a wide range of parameters may elicit large responses from the surface of interest, naive parameter choices may also produce large responses in other classes of surfaces. We propose a simple parameter selection algorithm whose objective is to choose parameters maximizing the DoN magnitude for the set of points within the objective class, while minimizing the DoN magnitude for other known classes of surfaces/objects in the scene.
In practice, given a set of ground truth point clouds for our objective object (e.g. cars) and a set of ground truth point clouds for the objects in close vicinity of the objective object (e.g. road, people), we compare the aggregate response statistics (i.e. median, mean, variance) for all points in each class across a selection of DoN parameters.
Fig. 2 shows the mean, median and variance responses for a set of object classes from a single data sequence in the KITTI dataset [2] over a range of parameters r_1, r_2. Using this, for example, we empirically set separate parameters for pedestrians and for cars in order to maximize the inter-class response distance in a scene containing both objects in close proximity.
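The selection procedure can be sketched as a simple grid search; this is our hedged reading of §4.1, not code from the paper, with `responses` a hypothetical precomputed map from candidate (r1, r2) pairs to per-class arrays of DoN magnitudes:

```python
import numpy as np

def select_parameters(responses, target):
    """Pick the (r1, r2) pair whose median DoN magnitude is largest for the
    target class while smallest for the other classes.

    responses: {(r1, r2): {class_name: array of |DoN| values}}
    """
    def separation(stats):
        target_med = np.median(stats[target])
        others = [np.median(v) for k, v in stats.items() if k != target]
        return target_med - max(others) if others else target_med
    return max(responses, key=lambda params: separation(responses[params]))
```

For instance, given responses for two candidate pairs where cars respond strongly only at the larger radii while the road responds weakly at both, the search returns the larger pair.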
4.2 TITAN Urban Mobile LIDAR Data
In GIS applications, there is a focus on the recognition of street furniture (a GIS term describing lamp posts, fire hydrants, curbs, etc.) and the extraction of large-scale infrastructure (e.g. buildings, roads). The DoN operator is an ideal tool for addressing such problems by isolating objects in the scene based on their scale. The following results demonstrate various applications of DoN to real-world data, towards automatic segmentation and as a preprocessing step in object recognition or annotation in urban LIDAR data.
The point clouds used in the experiments reported here are from a TITAN system [3]. The raw data is collected via a LIDAR scanner mounted on a moving vehicle, scanning the urban scenery as the vehicle traverses the street. The individual scans are then registered together to form complete 3D point clouds of large urban areas. As the result of registering many low-resolution scans from a mobile platform, the final point clouds have a highly variable sampling density, governed mostly by the speed of the vehicle, the changing obstructions in the scene, and the registration error. Despite the large amount of data in a typical TITAN scene, individual objects, such as a person, are often composed of only a small number of points. All these factors make TITAN point clouds a particularly challenging domain.
For illustration purposes, the results in this section are demonstrated using small sections of real-world urban LIDAR data from the city of Kingston, ON, Canada, collected by the TITAN mobile terrestrial scanner. Similar results were also observed on much larger datasets of hundreds of millions of points.
4.2.1 DoN Features
The DoN operator has two parameters, a large radius (r_2) and a small radius (r_1). While each structure may exhibit a response at a range of scales, it will generally have a natural scale at which this response is maximized. Empirically, it was found that thresholding the magnitude of the DoN vectors obtained with fixed scale ratios r_2/r_1 provided good results for filtering out points belonging to large-scale planar surfaces. Fig. 3(a) illustrates a typical urban LIDAR scene, for which the magnitudes of the DoN vectors for each scene point at three different scales are shown in Figs. 3(b), 3(c), and 3(d). The magnitudes, which are in the [0, 1] range, are colorized according to the color map shown at the bottom of the image.
For DoN parameters corresponding to small scales, points belonging to smaller-scale objects have strong responses. For example, in Fig. 3(b), the finest scale structure exhibits the strongest response; this includes road curbs, window ledges, and the details in building facades. For DoN parameters corresponding to larger scales, points belonging to larger structures have strong responses. For example, in Fig. 3(d) the building points have a large response, yet very large scale structures (i.e. the road surface) still exhibit a small response.
4.2.2 DoN ScaleBased Filtering
An important application of the DoN operator, as motivated by the results shown in Fig. 3, is as a salience operator to pre-filter point clouds. Fig. 4 shows the results of such a filtering of a typical urban scene, discarding all the points whose DoN magnitude falls below a threshold, with various DoN parameters corresponding to a range of scales.
At the lowest scale, shown in Fig. 4(b), sharp edges are clearly preserved, including building edges (window outlines and pipes) and ground edges (street curbs). Also preserved, however, is artificial structure derived from noise, to be expected at approximately the resolution of the data. By the next, incrementally larger, scale, shown in Fig. 4(c), the noise has been filtered out. As the scale is increased, larger and larger objects are preserved, while smaller objects are increasingly discarded. At the largest scale, shown in Fig. 4(d), larger building fronts and walls are segmented from the rest of the scene.
4.2.3 Segmentation
DoN filtering of a point cloud, such as that described in §4.2.2, was found to result in good isolation of points in urban LIDAR scenes. Applying a simple clustering method to the resulting point cloud yields a clear clustering of many objects of interest in a scene. A simple Euclidean distance threshold based clustering algorithm (Euclidean Cluster Extraction [14]) was applied with a fixed distance tolerance and minimum and maximum cluster sizes. Fig. 5 shows the results of applying such clustering to DoN-filtered scenes with various DoN parameters. Each cluster in the scene is assigned a random (non-unique) color. Figs. 5(c) through 5(g) illustrate various clusters in the scene, corresponding to objects including a person, a traffic light fixture, a window, a car, and a tree.
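For illustration, the clustering step can be sketched as a greedy flood fill with a Euclidean distance tolerance, in the spirit of Euclidean Cluster Extraction [14]. The O(n^2) neighbor search below is for clarity only; a practical implementation would use a kd-tree:

```python
import numpy as np
from collections import deque

def euclidean_clusters(points, tol, min_size, max_size):
    """Group points whose chains of pairwise distances stay within `tol`,
    keeping only clusters with min_size <= size <= max_size."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            d = np.linalg.norm(points - points[i], axis=1)
            for j in [j for j in unvisited if d[j] <= tol]:
                unvisited.discard(j)
                cluster.append(j)
                queue.append(j)
        if min_size <= len(cluster) <= max_size:
            clusters.append(cluster)
    return clusters
```

Two point blobs separated by more than the tolerance come out as two distinct clusters; clusters outside the size bounds are discarded, which is what suppresses isolated noise points.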
Such segmentation might form a fundamental preprocessing step in an object recognition pipeline for finding objects in an urban LIDAR scene [18]. The pipeline would include the described clustering method, followed by feeding individual clusters into an object recognition algorithm. Since the clustering algorithm isolates individual objects (such as cars, people, fire hydrants, etc.), DoN clustering enables the use of global object recognition methods that require pre-segmentation [17].
4.3 KITTI Vision Benchmark Suite
4.3.1 KITTI Dataset
Although the TITAN Urban Mobile LIDAR data was motivational in the initial development of the proposed method, it is not a public dataset and does not have readily available ground truth. Thus, for a repeatable quantitative evaluation, we used the KITTI Vision Benchmark Suite [2]. The KITTI dataset includes a large number of point clouds along with annotated ground truth bounding boxes for objects of interest to driving and navigation. Each sequence consists of a number of frames, where each frame has an inertially corrected 3D Velodyne point cloud (100k points per frame) and manually annotated 3D object bounding boxes for cars, trucks, trams, pedestrians, and cyclists.
Although the KITTI data also consists of unorganized point clouds, it is far sparser than the TITAN point clouds, and was captured with a single sensor rather than an array of line scanners. Fig. 6 illustrates a sample Velodyne point cloud from a single frame.
4.3.2 Method
In order to evaluate DoN-based segmentation, as illustrated in §4.2.3, a set of DoN parameters (r_1, r_2) and DoN magnitude thresholds were chosen using the parameter selection algorithm outlined in §4.1. As in the method outlined in §4.2.1 through §4.2.3, the DoN was calculated for a sequence of frames (point clouds), after which the DoN magnitude was thresholded by a fixed value and Euclidean Cluster Extraction [15] was performed with a distance threshold equivalent to the smallest DoN radius, yielding a set of extracted clusters.
For each frame, the set of clusters was compared with each of the ground truth bounding box labels to identify the cluster with the highest intersection. This candidate cluster was then compared with the ground truth point cloud by collecting various statistics.
The main measures used to evaluate the quality of the segmentation were precision (the ratio of correctly predicted object points to the total number of predicted object points) and recall (the ratio of correctly predicted object points to the number of ground truth object points).
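With points identified by index, these two measures reduce to simple set operations; a minimal sketch:

```python
def precision_recall(predicted, ground_truth):
    """Point-level precision and recall for a predicted cluster against a
    ground-truth object, with points identified by index."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    tp = len(predicted & ground_truth)  # correctly predicted object points
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```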
Due to the nature of the Velodyne data, in many frames the point clouds within ground truth bounding boxes may consist of very few points. It was judged that such extremely sparse ground truth objects were unsuitable for evaluating the segmentation of smaller scale objects; since our clustering algorithm's minimum threshold was 100 points, a minimum of 100 points was required for a ground truth point cloud of any object class to be used in the evaluation.
4.3.3 KITTI Results
Fig. 7 illustrates the results of our evaluation in the form of a precision/recall graph over thousands of ground truth objects on two different sequences in the KITTI dataset, 2011_09_26_drive_0001 (Figs. 7(b), 7(c)) and 2011_09_26_drive_0009 (Fig. 7(d)). Each data point is of a size proportional to its ground truth point cloud's size (note: scale is not preserved inter-class).
The majority of the results exhibit high precision; the recall values, however, depend more on the class of object and the DoN parameters. Smaller scale objects, such as pedestrians and cyclists, have higher recall/precision for smaller radii, as can be seen in Figs. 7(b) and 7(c), while larger scale objects, such as cars and vans, have higher recall/precision for larger radii, as can be seen in Fig. 7(d).
While it is difficult to compare the performance of algorithms evaluated on different test sets (we advocate the usage of the public KITTI dataset), the results appear favourable in comparison with the recall/precision of more computationally intensive mesh and graph-based segmentation methods evaluated on less challenging datasets [4].
4.4 Computational Efficiency
The computation of DoN on a scene requires the calculation of the normal maps, and is bounded by the nearest neighbor radius search for the largest radius parameter r_2. In practice, for large radii, this can involve calculating the normal at a point using hundreds of thousands of points. Instead, an approximation can be calculated by uniformly subsampling the point cloud used for the nearest neighbor search. The current implementation of DoN can do this given a decimation parameter d, where the search point cloud is subsampled using a uniform resampling algorithm, with the point cloud coarsely voxelized into voxels of length r_1/d for the small-radius normals and r_2/d for the large-radius normals. It was found that this approximation results in negligible error, while halving the time for calculating DoN on the point cloud in Fig. 3, with 478,348 points, to 3454.33 ms, compared with 7812.5 ms for the full calculation on a 3.2 GHz i7. A preliminary GPU implementation of DoN was found to be an order of magnitude faster, taking only 565.46 ms on an NVIDIA GTX 480 for the full computation.
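A hedged sketch of the uniform voxel subsampling used for this approximation, keeping one centroid per occupied voxel; the choice of voxel edge length (e.g. r/d for radius r and decimation parameter d) follows the description above:

```python
import numpy as np

def voxel_subsample(points, voxel):
    """Keep one representative point (the centroid) per occupied voxel
    of edge length `voxel`, uniformly thinning the search cloud."""
    keys = np.floor(points / voxel).astype(np.int64)
    buckets = {}
    for key, p in zip(map(tuple, keys), points):
        buckets.setdefault(key, []).append(p)
    return np.array([np.mean(ps, axis=0) for ps in buckets.values()])
```

The normal search for a point then runs over the thinned cloud rather than all the original points, trading a small estimation error for a large reduction in neighbor-search cost.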
5 Conclusion and Future Work
The Difference of Normals was introduced as a multi-scale operator for unorganized 3D point clouds. Illustrated results on dense urban LIDAR data qualitatively showcased the effectiveness of DoN filtering in keeping points belonging to objects at a given scale, while discarding those belonging to structures of other scales. The application of DoN as a scale-based filtering and segmentation tool was highlighted in urban LIDAR scenes. Results on a typical urban street intersection with clustering showed a clear segmentation of points belonging to various objects of interest at different scales, such as cars, road curbs, trees, and buildings, some having as few as 100 points. With urban LIDAR scenes typically containing millions of points, DoN filtering provides a substantial reduction in points for any further processing of the scene.
The quality of DoNbased segmentation was quantitatively evaluated on a large, publicly available dataset of sparse, unorganized urban LIDAR data. Objects such as cars and pedestrians were automatically segmented from the scene and compared with ground truth annotations.
Future work includes the development of a DoN-based surface descriptor to exploit the defined scale operator over several radii, and integration with object recognition methods. The development of an interactive semi-automated tool for annotating large-scale 3D point clouds in particular would go a long way towards simplifying the generation of GIS models from urban LIDAR point clouds.
Acknowledgments
The authors thank Geodigital Inc. for their support and providing us access to the TITAN data. We also acknowledge the financial support of the Geoide Network Center of Excellence of Natural Sciences and Engineering Research Council of Canada, and the Ontario Centres of Excellence.
References
[1] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva. Computing and rendering point set surfaces. IEEE Transactions on Visualization and Computer Graphics, 9(1):3–15, 2003.

[2] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Computer Vision and Pattern Recognition (CVPR), Providence, USA, June 2012.
[3] C. Glennie. Reign of point clouds: A kinematic terrestrial LIDAR scanning system. Inside GNSS, (Fall), 2007.
[4] A. Golovinskiy and T. Funkhouser. Min-cut based segmentation of point clouds. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pages 39–46, 2009.
 [5] H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle. Surface reconstruction from unorganized points. In SIGGRAPH ’92: Proceedings of the 19th annual conference on Computer graphics and interactive techniques, pages 71–78, New York, NY, USA, 1992. ACM.
[6] H. Huang, D. Li, H. Zhang, U. Ascher, and D. Cohen-Or. Consolidation of unorganized point clouds for surface reconstruction. ACM Trans. Graph., 28(5):1–7, 2009.

[7] J. Huang and C.-H. Menq. Automatic data segmentation for geometric feature extraction from unorganized 3D coordinate points. IEEE Transactions on Robotics and Automation, 17(3):268–279, Jun 2001.
[8] J. J. Koenderink. The structure of images. Biological Cybernetics, 50:363–370, 1984.
 [9] B. Li, R. Schnabel, R. Klein, Z. Cheng, G. Dang, and J. Shiyao. Robust normal estimation for point clouds with sharp features. Computers & Graphics, 34(2):94–106, Apr. 2010.
[10] T. Lindeberg. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21:224–270, 1994.
[11] Y. Liu and Y. Xiong. Automatic segmentation of unorganized noisy point clouds based on the Gaussian map. Computer-Aided Design, 40(5):576–594, 2008.
[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[13] D. Marr and E. Hildreth. Theory of edge detection. In Proceedings of the Royal Society of London, pages 187–217, 1980.
 [14] R. B. Rusu. Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments. PhD thesis, Computer Science department, Technische Universitaet Muenchen, Germany, October 2009.
[15] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9–13 2011.
[16] R. B. Rusu, Z. C. Marton, N. Blodow, and M. Beetz. Persistent point feature histograms for 3D point clouds. In Intelligent Autonomous Systems 10, page 119, 2008.
[17] L. Shang and M. Greenspan. Real-time object recognition in sparse range images using error surface embedding. International Journal of Computer Vision, 89(2–3), 2010.
 [18] B. Taati and M. Greenspan. Local shape descriptor selection for object recognition in range data. Computer Vision and Image Understanding, 115(5):681–694, 2011.
[19] R. Unnikrishnan and M. Hebert. Multi-scale interest regions from unorganized point clouds. In Computer Vision and Pattern Recognition Workshops, pages 1–8, 2008.
[20] A. Witkin. Scale-space filtering: A new approach to multi-scale description. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '84, volume 9, pages 150–153, Mar 1984.
 [21] H. Woo, E. Kang, S. Wang, and K. H. Lee. A new segmentation method for point cloud data. International Journal of Machine Tools and Manufacture, 42(2):167–178, 2002.
 [22] C. Wyatt, E. Bayram, and Y. Ge. Minimum reliable scale selection in 3d. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(3):481–487, March 2006.