From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds

01/21/2020 ∙ by Christiane Sommer, et al. ∙ Technische Universität München ∙ Stanford University

We propose a new method for segmentation-free joint estimation of orthogonal planes, their intersection lines, relationship graph and corners lying at the intersection of three orthogonal planes. Such unified scene exploration under orthogonality allows for multitudes of applications such as semantic plane detection or local and global scan alignment, which in turn can aid robot localization or grasping tasks. Our two-stage pipeline involves a rough yet joint estimation of orthogonal planes followed by a subsequent joint refinement of plane parameters respecting their orthogonality relations. We form a graph of these primitives, paving the way to the extraction of further reliable features: lines and corners. Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking, both on synthetic and real data.




I Introduction

Our everyday environments are composed of a large number of man-made structures that are constructed after careful computer-aided design (CAD). As a result, they consist to a large extent of simple geometric primitives.

Many of those primitives are planar, being either parallel or orthogonal to each other [16]. This renders the issue of discovering perpendicularity relationships in 3D a vital task for low level vision algorithms.

Fig. 1: Steps of our algorithm on CuFusion dataset [48]. We can simultaneously detect orthogonal planes and their intersection lines (b,c), compute the orthogonal relation graph and use it to extract reliable corners with 6D local reference frames (d). Step (c) is intentionally stippled as the unoptimized lines fall behind the surface.

In this paper we first propose a geometric voting-driven method to jointly detect pairs of orthogonal planes in oriented 3D point clouds, without explicitly resorting to segmentation or plane-grouping. We introduce a new local parameterization for orthogonal plane pairs. This allows us to cast votes in only a 2D local accumulator space, making our algorithm more efficient than hypothesis validation, as used in RANSAC-based approaches. Our approach is more reliable in detecting orthogonality than the standard “detect-then-build-graph” approach, since orthogonality in our case is directly deduced from data rather than from intermediate results (such as plane parameters).

Fig. 2: Overview of our method: the detected plane pairs can serve for both corner extraction and planar scene abstraction.

We only cast one vote per point pair, which is significantly less computation than one inlier check on the whole point cloud per pair, as done in RANSAC [40]. Unlike region-growing [9], our algorithm can detect orthogonal pairs under occlusion, where planes can become disconnected. The voting closely resembles the Hough transform of lines [23, 12], and the intersection lines are extracted at no additional cost. A dedicated clustering step follows the voting stage; subsequently, we build a relation graph out of the detected pairs, where an edge depicts orthogonality between two planes (nodes). Thanks to this graph structure, we can significantly reduce the dimensionality of the joint parameter estimation problem. We then propose a novel, softly constrained orthogonal refinement loss using this compact re-parameterization to optimize the alignment of planes to their support. Finally, we add the next layer of abstraction: by detecting triangles in the plane graph, we arrive at virtual corners decorated with local reference frames (LRFs) directly computed from robustly fitted planes. The virtual points found at the intersection of three planes are also highly accurate and can thus be used for tracking and ICP registration [1], as we will show. Our method is very efficient and can handle large datasets.

Overall, our algorithm jointly extracts primitives at different abstraction layers (from planes to corners) and makes the following contributions (see Fig. 2 for an overview):

  1. A novel scheme to vote for orthogonal plane pairs (and thus, their lines of intersection) without segmentation;

  2. An efficient minimization scheme for constrained refinement over the reduced graph;

  3. A 6D corner extractor where a corner is composed of a 3D location and an LRF.

II Related Work

We briefly review the literature concerned with extraction of 3D planar structures. For further details and references, we point the reader to a recent, extensive review on the general primitive detection from 3D data [26].

Plane Detection

While detecting 3D planar structures is harder and rather ill-posed in the 2D domain [46, 32], the problem is well studied in the 3D domain. Borrmann et al. use an enhanced Hough accumulator to vote for 3D planes [7]. Yang and Förstner [47] proposed a RANSAC-based scheme, which is widely applied in different computer vision tasks. They use minimum description length (MDL) to deal with several competing hypotheses. Schnabel et al. [40] generalized the RANSAC-based approach to detect different primitives in point clouds, including planes. Deschaud et al. [9] as well as Feng et al. [14] used filtered normals and voxel growing, a 3D analogue of region-growing, to devise a fast and accurate split-and-merge scheme. Further studies incorporated post-clustering and outlier elimination steps to robustify the pipelines [36, 38, 21, 45]. Drost and Ilic [10] introduced a Local Hough Voting scheme to retrieve multiple planes without segmentation, thanks to the use of point pair features. Similar to [2, 3], their algorithm addresses the discovery of other primitives such as spheres and cylinders but not pairs of orthogonal planes. It is also worth mentioning that using planes along with other primitives in a joint manner to approximate objects has been tackled by many [10, 31, 44, 41].

Orthogonality in Action

Besides the Manhattan World reconstruction, a direct and widely accepted application of orthogonality to 3D data is SLAM (simultaneous localization and mapping). Many studies used the orthogonal planes as constraints to perform SLAM [30, 42, 29, 39, 20, 35, 28], or to aid robotic navigation [18, 33]. The common approach is to formulate SLAM to account for the orthogonality and directly use it in the pipeline. These works do not explicitly address the detection of the orthogonalities though.

The methods which consider problems similar to ours are [17, 25, 10]. In [17], Garcia et al. develop a box recognition algorithm. Many works in this family use triplets of points to define a plane. For the case of oriented point sets, this is an over-parameterization. Drost and Ilic [10] used point pair features for primitive detection. While their method is similar to ours, they only detect single primitive instances, without relations between them. Jiang and Xiao [25] detect cuboids in images, but they do not operate on unstructured 3D data, as we do in this paper. Analogous to us, GlobFit [31] and Oesau et al. [37] use geometric regularization terms and relation graphs to position their primitives, but in contrast to us, these methods separate the detection and relation graph building stages. Furthermore, they are designed for clean settings.

III Detection of Orthogonal Plane Pairs

Our method is separated into two stages: detection and refinement. The former extracts simplified point pair features [11, 4, 6] from the data, shows how to define orthogonality and devises a novel voting/clustering scheme for discovering orthogonal plane hypotheses. The latter simultaneously optimizes for all the parameters in the plane graph. Finally, the intersection points of the triangles in this plane graph lead to accurate corners.

Orthogonal Point Pair Features

The input to our method is a point set $\mathcal{P} = \{p_i\}$ together with normals $\mathcal{N} = \{n_i\}$. Note that the normals do not need to be consistently oriented, so if no normals are given, we can easily compute them by fitting planes to local neighborhoods. We parameterize a 3D plane by a point and its normal. To characterize the orthogonal planes, we will speak of a pair of points, which constitutes the minimal set for defining an orthogonal plane pair (OPP). Imagine a pair of points $(p_1, p_2)$ with normals $(n_1, n_2)$, as shown in Fig. 3. Let $d$ be the vector joining the two points, i.e. $d = p_2 - p_1$. If each of the two points lies on a plane, the condition for the two planes to yield an orthogonal configuration is

$$\angle(n_1, n_2) = \frac{\pi}{2}.$$

This can easily be rewritten in terms of the scalar product, $n_1 \cdot n_2 = 0$, allowing for an efficient computation. Yet, the data used in real life, e.g. from RGB-D sensors, never obey strict equality constraints. Hence, we introduce a noise threshold, maintaining a certain tolerance: $|n_1 \cdot n_2| < \tau_n$, where $\tau_n$ trades off noise tolerance vs. accuracy. To make sure the two planes intersect at a meaningful point, one can further introduce a distance constraint governed by a second threshold.
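The relaxed orthogonality test can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `tau_n` and `tau_d`, their default values, and the choice to bound the Euclidean distance between the two points are assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def is_orthogonal_pair(p1, n1, p2, n2, tau_n=0.1, tau_d=float("inf")):
    """Check whether two oriented points may span an orthogonal plane pair.

    tau_n bounds |n1 . n2| (unit normals, sign-agnostic because normals
    need not be consistently oriented). tau_d bounds the point separation;
    this particular distance constraint is an assumption of the sketch.
    """
    d = sub(p2, p1)
    close_enough = math.sqrt(dot(d, d)) <= tau_d
    return abs(dot(n1, n2)) <= tau_n and close_enough
```

Only scalar products are needed, so no trigonometric function is evaluated per pair.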

Having all pair relations, we could now define the used point pair features (PPF) similar to [11, 10]. Since we do not need the actual angles, but can define the two constraints in terms of scalar products, we introduce simplified “features”, that do not need trigonometry operations:


It will become clear in the voting section why we also keep the second and third components of the feature.

Fig. 3: (left) The concave geometric configuration that we are interested in. We jointly detect orthogonal planes and 3D lines, extract their relationship graph and obtain the corners. (right) The voting parameters shown in the 2D Cartesian system. is chosen such that .

Local Hough Voting

Given a point pair, the definition of orthogonal planes is immediate – the normals of two points uniquely define the orthogonal planes. However, finding the best candidate requires care if the scene is cluttered and occluded. The trivial option is to perform RANSAC [15], where random point pairs are tested for the satisfaction of the orthogonal plane constraints. While this is straightforward, it needs an inlier search at each step, making the whole procedure time consuming for large point sets. Local Hough voting, as in [10], circumvents this problem.

Thus, we evaluate the aforementioned constraints and create a voting table, similar to [10]. We sample the scene for reference points, each of which is paired with a bounded number of other points from the sampled scene that lie within a fixed-size neighborhood of the reference point, to compute the PPF. Each pair which satisfies the constraint casts a vote for the parameterization of the orthogonal planes in a local accumulator initialized for each reference point. While [10] uses such local voting for the detection of individual geometric primitives (planes, spheres, cylinders), we propose to port this idea to the detection of geometric relations between planar primitives.

Once an oriented pair of points is found to be an OPP, the reference point defines the first plane. The orthogonal counterpart can freely rotate around the normal of the reference and is also free to slide orthogonally on this infinite reference plane. All such transformations result in the same PPF. Thus, we must resolve two degrees of freedom. We represent the second plane in 2D polar space, with respect to the reference: $(\phi, \rho)$. $\phi$ denotes the normal direction (which, being parallel to the reference plane, only has one degree of freedom), and $\rho$ is the orthogonal distance from the intersection line to the point of reference. The vote $(\phi, \rho)$ can be cast in 2D space by transforming the point pair to the origin and aligning the reference normal with a coordinate axis using a rotation matrix, as in Fig. 3. In analogy to the Hough transform of lines [23, 12], the variables of the local voting space read:


Voting is performed locally for each reference point. For each reference point, two pairs describing the same set of orthogonal planes vote for the same angle and distance. The voting also requires quantization of this local voting space, with bin sizes chosen according to the problem size. The parameters with the maximal vote are taken to represent the most likely OPP and are stored for each reference point if the vote count exceeds a count threshold. This is important to make sure that very noisy reference points are not accepted, which provides implicit noise handling. Note that this approach is semi-global and can recover the parameters even under severe occlusion [11].

In order to ensure that the reference point actually lies on a plane, we additionally track the number of paired points which are co-planar with the reference point, and only insert a plane pair into the list of candidates if this number exceeds a second threshold.
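The local voting step for a single reference point can be sketched as follows. This is an illustrative reading of the scheme, not the paper's code: the in-plane basis construction, the bin sizes, the thresholds `tau_n` and `min_votes`, and the convention of folding votes to a non-negative distance (to cope with unoriented normals) are all assumptions. The partner plane is described by an angle inside the reference plane and the distance from the reference point to the intersection line, exactly like the angle and offset of a 2D Hough line.

```python
import math
from collections import Counter

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def plane_basis(n_ref):
    """Two orthonormal vectors spanning the plane with unit normal n_ref."""
    helper = (1.0, 0.0, 0.0) if abs(n_ref[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(n_ref, helper))
    return u, cross(n_ref, u)

def vote_partner_plane(p_ref, n_ref, neighbors, tau_n=0.1,
                       phi_step=math.radians(10.0), rho_step=0.05, min_votes=3):
    """Return (rho, direction, votes) of the best orthogonal partner, or None."""
    u, v = plane_basis(n_ref)
    n_phi = round(2.0 * math.pi / phi_step)
    accumulator = Counter()
    for p, n in neighbors:
        if abs(dot(n_ref, n)) > tau_n:   # pair is not orthogonal enough
            continue
        a, b = dot(n, u), dot(n, v)      # partner normal inside the reference plane
        scale = math.hypot(a, b)
        d = sub(p, p_ref)
        # Hough line form: rho = x*cos(phi) + y*sin(phi), (x, y) = in-plane offset
        rho = (a * dot(d, u) + b * dot(d, v)) / scale
        phi = math.atan2(b, a)
        if rho < 0.0:                    # unoriented normals: keep the rho >= 0 branch
            rho, phi = -rho, phi + math.pi
        phi_bin = round((phi % (2.0 * math.pi)) / phi_step) % n_phi
        accumulator[(phi_bin, round(rho / rho_step))] += 1
    if not accumulator:
        return None
    (phi_bin, rho_bin), votes = accumulator.most_common(1)[0]
    if votes < min_votes:
        return None
    phi = phi_bin * phi_step
    direction = tuple(math.cos(phi) * u[i] + math.sin(phi) * v[i] for i in range(3))
    return rho_bin * rho_step, direction, votes
```

Each accepted pair contributes exactly one accumulator increment, which is where the cost advantage over a per-hypothesis inlier count comes from.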

Clustering and Graphical Representation

Rough detection results in an OPP hypothesis per selected reference point, giving rise to a pool of solutions which are to be clustered and merged. To this end, we use a disjoint forest clustering scheme [13] backed by a union-find structure. Planes are compared by computing the distance of each plane’s reference point to the other plane.
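A union-find clustering of plane hypotheses along these lines might look as follows. It is a sketch under assumptions: planes are given as (reference point, unit normal) tuples, the tolerance `dist_tol` is a made-up value, and the additional normal-agreement guard `cos_tol` is not mentioned in the text and is added here only to avoid merging two planes whose reference points happen to lie on their intersection line.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def find(parent, i):
    """Root of element i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def cluster_planes(planes, dist_tol=0.05, cos_tol=0.95):
    """planes: list of (reference_point, unit_normal). Returns one root label per plane."""
    parent = list(range(len(planes)))
    for i, (p_i, n_i) in enumerate(planes):
        for j in range(i + 1, len(planes)):
            p_j, n_j = planes[j]
            # distance of each plane's reference point to the other plane
            d_i_to_j = abs(dot(n_j, sub(p_i, p_j)))
            d_j_to_i = abs(dot(n_i, sub(p_j, p_i)))
            same_direction = abs(dot(n_i, n_j)) >= cos_tol  # extra guard (assumption)
            if same_direction and max(d_i_to_j, d_j_to_i) <= dist_tol:
                parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(len(planes))]
```

The pairwise loop is quadratic in the number of hypotheses, which stays small because there is at most one hypothesis per reference point.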

In order to store all of the retrieved planes and their orthogonality relations, we choose a graph data structure: each plane is a vertex in the graph, and two planes are connected by an edge if they intersect and are orthogonal. Special structures in the graph translate to special plane configurations, e.g. a triangle represents a corner in the point cloud and is endowed with the LRF composed of the normals of the triplet of orthogonal planes surrounding it. This can for instance be used as a 6D feature for tracking or scan registration. Note that our graph structure is similar to the one proposed in GlobFit [31], but with the difference that our graph is built during detection, whereas GlobFit separates the detection, relation extraction and graph building stages.
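Finding corner candidates then amounts to enumerating triangles in the orthogonality graph. The sketch below assumes planes are identified by integer indices and edges are given as index pairs; the function name `find_corner_triplets` is hypothetical.

```python
from itertools import combinations

def find_corner_triplets(num_planes, edges):
    """Return all triangles (a, b, c), a < b < c, in an undirected graph.

    Each triangle is a triplet of mutually orthogonal planes, i.e. a corner candidate.
    """
    adjacency = {v: set() for v in range(num_planes)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    triangles = set()
    for a in adjacency:
        for b, c in combinations(sorted(adjacency[a]), 2):
            if c in adjacency[b]:
                triangles.add(tuple(sorted((a, b, c))))
    return sorted(triangles)
```

For example, with planes 0, 1, 2 pairwise orthogonal and plane 3 orthogonal only to plane 2, the single corner candidate is the triplet (0, 1, 2).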

IV Refinement of Orthogonality Primitives

Due to sampling of the scene, quantization of the voting space, noise and artifacts, the orthogonal fitting obtained up to this point is only a rough estimate of the real pose of the planes. Even though this might well be sufficient for certain applications, a refinement is still crucial for applications demanding accuracy. For that purpose, the most straightforward approach is a modified ICP-like non-linear optimization procedure, in which the distances from the points to the orthogonal planes are jointly minimized. While this has been done before in a very simple, unconstrained setting [1], we show how to use such a modified ICP for joint plane refinement that respects the inter-plane geometric constraints: first for corners, which we parameterize efficiently by a rotation and three plane distances, and second for a multi-plane setting, where we show how graph reduction can strictly enforce parallelism. We take advantage of the closed-form expression of point-to-plane distances in order to avoid the costly nearest neighbor search. This way, we achieve a highly efficient method.

Corner Refinement

As mentioned earlier, we can retrieve corners in the given point cloud by finding triangles in the plane graph. A corner found at the intersection of three orthogonal planes has six degrees of freedom, which can be used for tracking and scan registration. We formulate the objective function for corner refinement as:


Here, the energy depends on the three mutually orthogonal planes, the point cloud and the point-to-plane distance. Without further constraints, this energy does not preserve orthogonality. In order to model this constraint, while still efficiently parameterizing the energy in Eq. (5), we rewrite our triplet as a tuple of three orthogonal normals. The remaining three parameters to fully characterize the corner are the distances of the three planes from the origin. Thus the corner refinement energy in Eq. (5) becomes


This parameterization also endows the corner with an LRF that is unique up to sign flips. Note that the initialization of Sec. III automatically ensures the orthogonality of plane pairs. Yet, for mutually orthogonal triplets this is no longer the case. To ensure this property, we make use of the fact that the frame composed of the triplet normals has a diffeomorphic mapping to $SO(3)$ and project the frame onto $SO(3)$. It is important to make sure that the resulting frame is not a reflection but a rotation: we switch the order of two normals if the determinant of the frame is negative. $SO(3)$ is 3D and can be re-parameterized using twist coordinates [34] for efficient optimization on the manifold without resorting to costly constrained optimization.

In a real-world setting, if we want to use corners for tracking or alignment, we need to use only data points which are close to the 3D corner position, to avoid outliers. The corner position follows directly from the plane parameters. Thus, we define a subset


on which we perform the optimization, i.e. we instead minimize the energy restricted to this subset. Strictly speaking, the subset implicitly depends on the plane parameters. However, in practice, we use the initial estimate to select the subset and then keep it fixed.
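To make the corner parameterization concrete, the following sketch computes a corner position from three mutually orthogonal planes and selects the nearby points. It assumes the plane convention n · x = d with unit normals, in which case the corner is the sum of the normals weighted by their distances. The function names and the radius-based selection are illustrative assumptions, not the paper's exact definitions.

```python
def det3(rows):
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def make_right_handed(planes):
    """planes: three (unit_normal, distance) pairs.

    If the normals form a reflection (negative determinant), swap the first
    two planes so that the frame becomes a proper rotation.
    """
    normals = [n for n, _ in planes]
    if det3(normals) < 0.0:
        planes = [planes[1], planes[0], planes[2]]
    return planes

def corner_position(planes):
    """Solve n_i . c = d_i for mutually orthogonal unit normals: c = sum_i d_i * n_i."""
    return tuple(sum(d * n[k] for n, d in planes) for k in range(3))

def points_near_corner(points, corner, radius):
    """Keep only the points within `radius` of the corner (selected once, then fixed)."""
    r2 = radius * radius
    return [p for p in points
            if sum((x - c) ** 2 for x, c in zip(p, corner)) <= r2]
```

Because the corner is computed from fitted planes rather than picked from the data, it need not coincide with any measured point.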

Multi-plane Refinement and Parameter Reduction

For geometry refinement, it is important that we can refine all planes in a scene jointly. The unconstrained energy for this scenario is


Again, this energy lacks any notion of orthogonality between planes. Furthermore, if the graph contains vertex groups of a specific structure, we can deduce, in addition to orthogonality, which planes are parallel [31], which is also not taken into account in Eq. (8). We address the two constraint types (orthogonal and parallel) differently: first, we re-structure our graph by combining parallel planes into one node, where each node is endowed with a list of distances. Then, we write the energy as


This way, the normal vector for each set of parallel planes needs to be optimized only once, significantly reducing the number of unknowns. Thus, the parallelism constraint is enforced by re-parameterization. Second, we add an orthogonality regularizer to this energy:


with the edge set of the graph, resulting in the regularized energy:


Since reprojection to the feasible set of parameters is not straightforward, the regularization weight needs to be large enough to implicitly enforce the orthogonality between planes. In order to avoid further constraints on the unit length of the normals, we use on-manifold optimization following the $S^2$-parameterization given in [41], which keeps every normal at unit length. Note that our refinement is more principled than the iterative approximation of the constraint satisfaction in GlobFit [31] and relies less on heuristics: we avoid inputting a fixed points-to-planes assignment to the refinement. Rather, the cost function in Eq. (8) by construction re-assigns points to their closest plane in each iteration of a minimization scheme. This procedure is more tolerant to wrong assignments in the detection phase and thus more accurate by construction. This also explains why GlobFit expects good initialization and clean input or else quickly gets stuck in a local minimum.
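The structure of the regularized multi-plane energy can be illustrated with a small evaluation routine. The exact functional form is an assumption here: the sketch uses squared point-to-plane distances with closest-plane assignment for the data term, a squared scalar product between the normals of connected nodes as the orthogonality regularizer, and a hypothetical weight `lam`. The robust M-estimator and the on-manifold optimization are omitted.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_plane_energy(points, nodes, edges, lam=10.0):
    """Evaluate a softly constrained multi-plane energy.

    nodes: list of (unit_normal, [distances]); each node bundles parallel
           planes n . x = d_k that share one normal.
    edges: pairs (i, j) of node indices that are expected to be orthogonal.
    """
    data = 0.0
    for p in points:
        # each point is implicitly re-assigned to its closest plane
        data += min((dot(n, p) - d) ** 2 for n, dists in nodes for d in dists)
    reg = sum(dot(nodes[i][0], nodes[j][0]) ** 2 for i, j in edges)
    return data + lam * reg
```

Sharing one normal per node means that a set of parallel planes adds only one extra scalar per plane to the unknowns, which is the parameter reduction described above.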

For robustness, we add an M-estimator to the data term. In terms of computational complexity, we keep costs for the point-to-plane assignment low in two ways: (1) we only compute the point-to-plane distance if the angle between point normal and plane normal is below a certain threshold, and (2) for each node, we sort the list of distances accordingly, such that the time to find the closest plane is halved.
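One way to exploit a sorted list of distances is a binary search for the closest parallel plane. Note that this is a swapped-in technique for illustration: the text only states that sorting halves the lookup time, and does not say that binary search is used. The function name `nearest_distance` is hypothetical.

```python
import bisect

def nearest_distance(sorted_dists, s):
    """Return the entry of sorted_dists closest to the signed distance s = n . p.

    sorted_dists must be sorted in ascending order and non-empty.
    """
    k = bisect.bisect_left(sorted_dists, s)
    candidates = sorted_dists[max(k - 1, 0):k + 1]
    return min(candidates, key=lambda d: abs(d - s))
```

Only the two neighbors of the insertion position can be closest, so at most two candidates are compared after the search.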

Modality  Algorithm             Pr.   Rec.  #Cor.  Noise  Miss
O-Planes  AHC [14] + R.G.       0.97  0.81  9.77   0.33   2.23
O-Planes  Schnabel [40] + R.G.  0.85  0.59  7.13   1.33   4.87
O-Planes  Ours                  0.88  0.87  9.76   1.38   1.83
Lines     AHC [14] + R.G.       0.74  0.65  8.53   3.63   4.67
Lines     Schnabel [40] + R.G.  0.57  0.34  4.43   5.20   8.77
Lines     Ours                  0.77  0.73  8.86   2.90   3.97
Fig. 4: Orthogonal plane detection on the Orthogonal SegComp/ABW [22] dataset. We visualize the extracted line and corner primitives on the left. The table on the right reports the precision (Pr), recall (Rec), number of correct detections (#Cor) as well as false positives (Noise) and false negatives (Miss). O-Planes refers to the results of detecting planes that have orthogonal pairs. This corresponds to the vertices in our relation graph (R.G.). Lines refers to evaluating the edges of the graph, corresponding to the orthogonal planes. This one evaluates the performance of 3D line extraction.

Application: Corner-assisted ICP Registration

The 3D position of a corner can be anywhere in 3D space; in particular, it does not need to coincide exactly with a data point. Thus, the accuracy of the corner position depends only very weakly on the sampling density of the given point set, and much more on its noise level. This way, we can see the corners as super-resolved key points in our point cloud. We use this fact in order to improve ICP-based registration of two point clouds: on the one hand, we can use the corners for coarse alignment of two scans. On the other hand, the refinement, which is typically done via ICP, can also take advantage of the corners: for a set of corners in the target point cloud and the corresponding set in the coarsely aligned source point cloud, we know that the $SE(3)$-transform bringing the two into correspondence has to satisfy:

i.e. the transformation needs to align corresponding corners. We explicitly choose to align only the 3D positions of corners, since the rotation precision we obtain in a local neighborhood (which we use to find the corner position) easily becomes too low if scans are to be aligned globally. Note that the set of transformations satisfying this constraint can have more than one element. Its dimension depends on the number of corner correspondences:

  • If there are at least three corners that do not all lie on one line, the set is zero-dimensional, and the unique minimizer is given by the Kabsch algorithm [43].

  • If two corners are present, or more corners that all lie on one single line, the set is one-dimensional, and its elements differ by a rotation about that line.

  • If there is only one corner, the set is three-dimensional and consists of the rotations about that corner.

In order to align the two point sets, the ICP algorithm minimizes the cost


with correspondences given by nearest neighbors. Having super-resolved corner positions, instead of minimizing over all rigid transformations, we constrain the problem to the transformations that align the corners. This gives rise to new energies with lower-dimensional domains. The advantages of this dimensionality reduction are two-fold: not only does lower-dimensional optimization converge faster, but also fewer data points are needed for the optimizer to converge to a minimum at all. Briefly stated, we can use the high accuracy of corner positions to either completely omit ICP (three or more non-collinear corners), or to constrain the ICP problem such that a minimum can be found with less computation. We will also demonstrate this in the experiments section.
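For the two-corner case, the remaining search space is a single rotation angle about the line through the corners. The sketch below shows this one-parameter family using Rodrigues' rotation formula. It only illustrates the constrained transformation, not the ICP loop itself, and the function name `rotate_about_line` is hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_line(x, c1, c2, angle):
    """Rotate point x by `angle` (radians) about the line through corners c1 and c2."""
    axis = sub(c2, c1)
    length = math.sqrt(dot(axis, axis))
    k = tuple(a / length for a in axis)
    v = sub(x, c1)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    # Rodrigues' formula: v*cos + (k x v)*sin + k*(k . v)*(1 - cos)
    rotated = tuple(v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1.0 - cos_a)
                    for i in range(3))
    return tuple(c1[i] + rotated[i] for i in range(3))
```

Both corners are fixed points of every transformation in this family, so a registration restricted to it can never move the corners out of alignment and only has to search over one scalar.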

V Experimental Evaluation


In order to demonstrate the broad applicability of our proposed method, we evaluate our multi-purpose primitives on different datasets including SceneNN [24], ICL-NUIM [19], Cu3D [48] and Redwood [8]. It is noteworthy that for the task of primitive detection and discovery there are not many designated datasets. Due to the availability of ground truth (GT) segmentation and comparison metrics, we choose to augment the seminal SegComp dataset [22] with orthogonal planes, resulting in the augmented Orthogonal SegComp (O-SegComp). SegComp is a database of 30 scenes taken with a laser scanner. To create O-SegComp, we first fit planes robustly to the GT segmentation to extract GT plane parameters. We then build the relation graph and keep only those planes that have orthogonal counterparts. Ground truth data for O-SegComp thus consists of a subset of the SegComp planes, together with orthogonality information.

(a) Rotational error on CuFusion bunny sequence
(b) Translational error on CuFusion bunny sequence
(c) Rotational error on Redwood kiosk sequence
(d) Translational error on Redwood kiosk sequence
Fig. 5: Rotational and translational errors in corner assisted ICP on the CuFusion [48] and Redwood-Kiosk [8] datasets.

Implementation Details

We use the Ceres solver for energy minimization in all experiments. In particular, we locally parameterize $SO(3)$ via Sophus for the corner refinement in Eq. (6). For the multi-plane graph refinement in Eq. (11), we use a local parameterization of $S^2$ [41] to represent the unit-length normals. Prior to operation, we downsample large point sets to ensure spatial uniformity [5]. In particular, we sample points that are at least a minimum distance apart and average the samples, reducing the noise whenever present. To preserve efficacy, we apply a coarse-to-fine refinement, where the optimizer uses a hierarchy of samplings of gradually increasing resolution, hence enhancing and accelerating convergence. We compute the surface normals, which do not need to be consistently oriented, by local plane fits. Our code, together with some pseudocode for easier understanding can be found here:

Choice of Parameters

Starting from the parameters given in [11, 10], we experimented with different settings to find the optimal trade-off between speed and accuracy, arriving at the following: We are using a set of 500–2000 reference points (low if speed is critical, high for higher accuracy) and pair them with about 250 points in a -neighborhood, where . The normal threshold is set to in all experiments, and the voting bin sizes for and are and , respectively. We accept the bin with the highest vote as plane pair candidate if . The parameter for downsampling the point cloud is chosen adaptively, based on size and shape of the point set.

Fig. 6: Detection of orthogonality primitives on an ICL-NUIM scene.

V-A Quantitative Results

Detection of Planes and Intersection Lines

We begin by evaluating the ability of our algorithm in extracting planes that belong to an orthogonal pair. We use O-SegComp for this and we report precision, recall, number of correct detections (true positives), noise (false positives) and misses (false negatives). Depicted as O-Planes in Fig. 4, our results are comparable to those of the state of the art.

To evaluate our second-layer primitives, we intersect the planes of the Orthogonal SegComp, yielding the ground truth intersection lines. We then use an evaluation metric analogous to the plane case. We consider a line to be correctly detected if the angle it makes with the ground truth match is less than a fixed threshold. We report the result in Fig. 4. Note that there is a drop in performance compared to the plane case. This is because missing a single plane yields an entire set of missing lines. Nevertheless, our approach still maintains a recall of 0.73 with a precision of 0.77, better than using the plane detectors of AHC [14] and Schnabel [40] together with subsequent relation graph building. We would like to emphasize that these results suggest that joint detection of plane parameters and orthogonality is a promising research direction, as it leads to better detection of orthogonality compared to the standard “detect-then-build-graph” approach. In the same figure, we also show the quality of our detection.
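The line criterion can be sketched as follows, assuming each line is represented by its direction (the cross product of the two plane normals) and that the sign of the direction is irrelevant. The threshold `max_angle_deg` is a placeholder, since the value used in the evaluation is not recoverable from the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersection_direction(n1, n2):
    """Unit direction of the intersection line of two non-parallel planes."""
    c = cross(n1, n2)
    length = math.sqrt(dot(c, c))
    return tuple(x / length for x in c)

def line_angle_deg(u, v):
    """Angle between two undirected lines, in degrees within [0, 90]."""
    c = abs(dot(u, v)) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(min(1.0, c)))

def is_correct_line(detected_dir, gt_dir, max_angle_deg):
    return line_angle_deg(detected_dir, gt_dir) < max_angle_deg
```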

Corner-Assisted ICP Registration

As described earlier, sets of corresponding corners can be used to constrain the domain of the registration energy. To this end, we augment a standard implementation of the ICP algorithm by taking into account the corners we detect in a pair of scans. This experiment is to be understood as a proof of principle: we show that reliably extracted corners can improve a standard tracking/registration algorithm by comparing the baseline (no corners) to different corner-assisted modalities. In order to demonstrate the effect of super-resolved corners on ICP registration, we sample random scenes from the CuFusion bunny sequence [48] as well as Redwood Kiosk sequence [8] and run a pairwise registration. We plot the median of the relative pose error (RPE) against the downsampling factor for scans that are temporally about and seconds apart for CuFusion and Redwood sequences respectively. Fig. 5 shows the errors attained at the rotational and translational components. We consider the relative poses obtained from the DVO RGB-D odometry [27] as ground truth. This algorithm also uses the RGB information, so it is consistently more accurate, which justifies our GT choice. The figure shows that, while being comparable to the RPE of full 6DoF-ICP for dense sampling, the RPE of corner-assisted low-DoF tracking increases much slower and thus ICP-1D shows more stability even for high downsampling factors. Constraining the possible solution set of ICP to a lower-dimensional set thus in particular pays off if only sparse data is available or if the sampling density for any reason must be held low. In Fig. 9, we additionally show that both the time and the number of iterations until convergence go down for lower DoF ICP, which is expected, since the space of parameters that we optimize over is smaller.

Fig. 7: Cumulative plots of rotational and translational RPE on the ICL-NUIM livingroom sequence after single-corner alignment. For each error threshold, we plot the percentage of successful alignments with an error below that threshold. We find that more than 90% of matches have a rotation error below 0.5 degrees, and close to 90% of matches have less than 2cm translation error.

Corner Alignment on ICL-NUIM

To assess the quality of 6D alignment of scans by matching detected corners, we do pairwise alignment of a subset of all frames by sampling each 15th frame of the sequence. Matching is considered successful if the overlap of the two scans after alignment is sufficiently large. For all pairs of successfully matched scans, we compute the relative pose error. Fig. 6 (left) shows the detected primitives whereas Fig. 7 reports the cumulative plots for the rotational and translational RPE components. Like the corner-assisted ICP, this experiment serves as a proof of concept. We are not aware of any other global registration algorithm that registers scans only based on one single 6D feature.
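A cumulative error curve of the kind reported in Fig. 7 can be computed from a list of per-pair errors as sketched below. The error values in the usage note are made up for illustration and are not results from the paper.

```python
def cumulative_fraction(errors, thresholds):
    """For each threshold, the fraction of alignments with an error below it."""
    total = len(errors)
    return [sum(1 for e in errors if e < t) / total for t in thresholds]
```

For instance, `cumulative_fraction([0.1, 0.2, 0.4, 0.9], [0.25, 0.5, 1.0])` yields one value per threshold, increasing monotonically toward 1.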

Fig. 8: 3D segmentation on SceneNN [24] by labeling the planes that are found to have at least one orthogonal “partner”. Note, the chairs in the lower-left are assigned a common plane, typically hard to achieve by region-growing [14].
(a) Computation time per point on CuFusion bunny sequence
(b) Average iteration count on CuFusion bunny sequence
Fig. 9: Statistics in corner assisted ICP registration on the CuFusion [48] bunny sequence.


On a desktop PC with an Intel Xeon CPU @3.5 GHz, a single-threaded implementation needs about 10ms for voting and candidate extraction, and between 80 and 300ms for graph refinement, depending on the scene complexity. Thus, we are orders of magnitude faster than GlobFit, which can (even without re-assigning points to planes during iterations) take 3-5min for a point cloud of comparable size. Other methods [40, 14] do not refine the inter-plane configurations, which renders a fair time comparison hard. The efficient RANSAC [40] implementation in CGAL needs about 70ms for extracting planes on the full cloud, and roughly 7ms for 2000 points. After plane extraction, a subsequent refinement is necessary to obtain accurate planes. AHC [14] can extract planes at more than 35Hz from depth data at VGA resolution, but it needs structured point clouds, whereas we can work with any type of point set. Further, AHC does not set planes in context to one another, which is the main point of our proposed method.

V-B Qualitative Evaluations

Orthogonal Plane Segmentation in Real Data

In Fig. 8 we show the success of the detected orthogonal plane pairs in semantically summarizing scenes where perpendicular plane configurations are dominant. This is typically the case for man-made indoor environments. Thus, we choose the SceneNN [24] dataset and run our detector. As our method does not explicitly produce a segmentation map, but directly computes plane parameters, we assign points to planes by considering a closeness threshold and normal coherence.
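Such a point-to-plane labeling can be sketched as follows. Planes are assumed to be given as (unit normal, distance) pairs with the convention n · x = d, and both tolerances `dist_tol` and `cos_tol` are hypothetical values chosen for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign_points(points, normals, planes, dist_tol=0.03, cos_tol=0.9):
    """Label each oriented point with the index of its plane, or -1 if none fits.

    A point is assigned to the closest plane that is within dist_tol and whose
    normal agrees with the point normal up to sign (|cos| >= cos_tol).
    """
    labels = []
    for p, n in zip(points, normals):
        best, best_dist = -1, dist_tol
        for idx, (n_plane, d_plane) in enumerate(planes):
            dist = abs(dot(n_plane, p) - d_plane)
            if dist <= best_dist and abs(dot(n_plane, n)) >= cos_tol:
                best, best_dist = idx, dist
        labels.append(best)
    return labels
```

Since the test depends only on plane parameters, spatially disconnected patches of the same plane receive the same label.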

3D Reconstruction via Corner Alignment

Our final experiment involves reconstruction via detection, where a collection of unordered scans is processed to estimate pairwise transformations. Typically, the desired relative poses are found by some form of descriptor matching, be it global or local. In 3D, most descriptors suffer from ambiguities in the local reference frame (LRF). Here, our 6D corners can help by providing a principled and reliable means of registration. We illustrate this in Fig. 10, where a scan of Kiosk and a scan of Drawer from the Redwood dataset [8] are brought into a consistent global alignment by registering the LRFs of the found corners. Just like for the pairwise corner alignment, we consider the matching successful if the overlap after transformation is sufficiently large.
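Registering two scans from a pair of matched corners amounts to composing their reference frames. The sketch below illustrates this under assumed conventions: each corner is given by a rotation matrix whose columns are the LRF axes expressed in the scan's coordinates, plus the corner position. The function names are hypothetical, and resolving which axis of one corner corresponds to which axis of the other is not handled here.

```python
def transpose(A):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [[A[j][i] for j in range(3)] for i in range(3)]


def mat_mul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def pose_from_corners(R_a, c_a, R_b, c_b):
    """Rigid transform (R, t) with x_b = R x_a + t, from one matched corner.

    A point with corner-local coordinates q satisfies
    x_a = R_a q + c_a and x_b = R_b q + c_b, hence
    R = R_b R_a^T and t = c_b - R c_a.
    """
    R = mat_mul(R_b, transpose(R_a))
    Rc = mat_vec(R, c_a)
    t = [c_b[i] - Rc[i] for i in range(3)]
    return R, t


def apply_pose(R, t, x):
    """Map a point from scan A into the frame of scan B."""
    Rx = mat_vec(R, x)
    return [Rx[i] + t[i] for i in range(3)]


# Toy example: corner A is axis-aligned at (1, 2, 3); corner B sits at the
# origin with its frame rotated by 90 degrees about the z-axis.
R_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c_a = [1.0, 2.0, 3.0]
R_b = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
c_b = [0.0, 0.0, 0.0]

R, t = pose_from_corners(R_a, c_a, R_b, c_b)
corner_in_b = apply_pose(R, t, c_a)               # lands on c_b
neighbor_in_b = apply_pose(R, t, [2.0, 2.0, 3.0])  # c_a + first axis of A
```

A pose obtained this way would then be accepted or rejected by the overlap criterion mentioned above, which this sketch does not implement.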

Fig. 10: Our accurate corners are used to align multiple scans. We qualitatively evaluate this 6D registration on the Redwood dataset [8]. For each subfigure, we show from left to right: (1) the RGB image (only used for visualization), (2) the input scans overlaid on top of each other in the local coordinate frame of each camera, (3) the scans after alignment to the frame of the first camera.

VI Conclusion

We design a joint detection-refinement pipeline for orthogonal planes and higher-level primitives, such as lines and corners of intersection, on sparse or dense point sets. This is the first work incorporating semi-global PPFs into a local voting framework for this purpose. Our novel 2D local parametrization is sufficient to establish the full (5D) pose of orthogonal plane configurations. The method alleviates discretization artifacts from at least three of the parameters, while maintaining speed and accuracy. We can detect multiple orthogonal plane pairs and cluster them to describe the 3D geometry of the environment. Thanks to the optimization step, all the approximate orthogonal configurations detected in 3D point clouds can be refined up to machine precision and sensor noise, yielding a very precise fit. In the future, we will extend our framework to even higher-level primitives such as boxes and use our orthogonal planes in SLAM.


  • [1] P. J. Besl and N. D. McKay (1992) Method for registration of 3-D shapes. In Robotics-DL tentative, pp. 586–606. Cited by: §I, §IV.
  • [2] T. Birdal, B. Busam, N. Navab, S. Ilic, and P. Sturm (2018) A minimalist approach to type-agnostic detection of quadrics in point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3530–3540. Cited by: §II.
  • [3] T. Birdal, B. Busam, N. Navab, S. Ilic, and P. Sturm (2019) Generic primitive detection in point clouds using novel minimal quadric fits. IEEE transactions on pattern analysis and machine intelligence. Cited by: §II.
  • [4] T. Birdal and S. Ilic (2015) Point pair features based object detection and pose estimation revisited. In 3D Vision (3DV), 2015 International Conference on, pp. 527–535. Cited by: §III.
  • [5] T. Birdal and S. Ilic (2017) A point sampling algorithm for 3D matching of irregular geometries. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6871–6878. Cited by: §V.
  • [6] T. Birdal and S. Ilic (2017) CAD priors for accurate and flexible instance reconstruction. In Proceedings of the IEEE International Conference on Computer Vision, Cited by: §III.
  • [7] D. Borrmann, J. Elseberg, K. Lingemann, and A. Nüchter (2011) The 3D Hough transform for plane detection in point clouds: a review and a new accumulator design. 3D Research 2 (2). Cited by: §II.
  • [8] S. Choi, Q. Zhou, S. Miller, and V. Koltun (2016) A large dataset of object scans. arXiv:1602.02481. Cited by: Fig. 10, Fig. 5, §V, §V-A, §V-B.
  • [9] J. Deschaud and F. Goulette (2010) A fast and accurate plane detection algorithm for large noisy point clouds using filtered normals and voxel growing. Proceedings of 3D Processing, Visualization and Transmission Conference. Cited by: §I, §II.
  • [10] B. Drost and S. Ilic (2015-10) Local Hough transform for 3D primitive detection. In 3D Vision, International Conference on, Cited by: §II, §II, §III, §III, §III, §V.
  • [11] B. Drost, M. Ulrich, N. Navab, and S. Ilic (2010) Model globally, match locally: efficient and robust 3D object recognition. In Conference on Computer Vision and Pattern Recognition, Cited by: §III, §III, §III, §V.
  • [12] R. O. Duda and P. E. Hart (1972) Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM. Cited by: §I, §III.
  • [13] P. F. Felzenszwalb and D. P. Huttenlocher (2004) Efficient graph-based image segmentation. International journal of computer vision (2). Cited by: §III.
  • [14] C. Feng, Y. Taguchi, and V. R. Kamat (2014) Fast plane extraction in organized point clouds using agglomerative hierarchical clustering. In International Conference on Robotics and Automation, Cited by: §II, Fig. 4, Fig. 8, §V-A, §V-A.
  • [15] M. A. Fischler and R. C. Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (6). Cited by: §III.
  • [16] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski (2009) Manhattan-world stereo. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 1422–1429. Cited by: §I.
  • [17] S. Garcia (2009) Fitting primitive shapes to point clouds for robotic grasping. Master of Science Thesis. Royal Institute of Technology, Stockholm, Sweden. Cited by: §II.
  • [18] N. I. Giannoccaro, L. Spedicato, and C. di Castri (2012) A new strategy for spatial reconstruction of orthogonal planes using a rotating array of ultrasonic sensors. IEEE Sensors Journal 12. Cited by: §II.
  • [19] A. Handa, T. Whelan, J. McDonald, and A. J. Davison (2014) A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §V.
  • [20] A. Harati and R. Siegwart (2007) Orthogonal 3D-SLAM for indoor environments using right angle corners. In Proceedings of the 3rd European Conference on Mobile Robots: ECMR 2007; September 19-21, 2007, Freiburg, Germany, pp. 144–149. Cited by: §II.
  • [21] D. Holz, S. Holzer, R. B. Rusu, and S. Behnke (2011) Real-time plane segmentation using RGB-D cameras. In Robot Soccer World Cup, pp. 306–317. Cited by: §II.
  • [22] A. Hoover, G. Jean-Baptiste, X. Jiang, P. J. Flynn, H. Bunke, D. B. Goldgof, K. Bowyer, D. W. Eggert, A. Fitzgibbon, and R. B. Fisher (1996) An experimental comparison of range image segmentation algorithms. IEEE transactions on pattern analysis and machine intelligence 18 (7), pp. 673–689. Cited by: Fig. 4, §V.
  • [23] P. V. C. Hough (1962) Method and means for recognizing complex patterns. Note: US Patent 3,069,654 Cited by: §I, §III.
  • [24] B. Hua, Q. Pham, D. T. Nguyen, M. Tran, L. Yu, and S. Yeung (2016) SceneNN: a scene meshes dataset with annotations. In International Conference on 3D Vision (3DV), Cited by: Fig. 8, §V, §V-B.
  • [25] H. Jiang and J. Xiao (2013) A linear approach to matching cuboids in RGBD images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2171–2178. Cited by: §II.
  • [26] A. Kaiser, J. A. Ybanez Zepeda, and T. Boubekeur (2018) A survey of simple geometric primitives detection methods for captured 3D data. In Computer Graphics Forum, Cited by: §II.
  • [27] C. Kerl, J. Sturm, and D. Cremers (2013) Robust odometry estimation for RGB-D cameras. In 2013 IEEE International Conference on Robotics and Automation, pp. 3748–3754. Cited by: §V-A.
  • [28] P. Kim, B. Coltin, and H. Jin Kim (2018) Linear RGB-D SLAM for planar environments. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 333–348. Cited by: §II.
  • [29] P. Kohlhepp, G. Bretthauer, M. Walther, and R. Dillmann (2006) Using orthogonal surface directions for autonomous 3D-exploration of indoor environments. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3086–3092. Cited by: §II.
  • [30] G. H. Lee, F. Fraundorfer, and M. Pollefeys (2011-05) MAV visual SLAM with plane constraint. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 3139–3144. External Links: Document, ISSN 1050-4729 Cited by: §II.
  • [31] Y. Li, X. Wu, Y. Chrysathou, A. Sharf, D. Cohen-Or, and N. J. Mitra (2011) GlobFit: consistently fitting primitives by discovering global relations. In ACM Transactions on Graphics (TOG), Cited by: §II, §II, §III, §IV, §IV.
  • [32] C. Liu, K. Kim, J. Gu, Y. Furukawa, and J. Kautz (2018) PlaneRCNN: 3D plane detection and reconstruction from a single image. arXiv preprint arXiv:1812.04072. Cited by: §II.
  • [33] M. Mura, S. Parrini, G. Ciuti, V. Ferrari, C. Freschi, M. Ferrari, P. Dario, and A. Menciassi (2016) A computer-assisted robotic platform for vascular procedures exploiting 3D US-based tracking. Computer Assisted Surgery 21 (1), pp. 63–79. Cited by: §II.
  • [34] R. M. Murray (2017) A mathematical introduction to robotic manipulation. CRC press. Cited by: §IV.
  • [35] V. Nguyen, A. Harati, and R. Siegwart (2007) A lightweight SLAM algorithm using orthogonal planes for indoor mobile robotics. In IEEE/RSJ Conference on Intelligent Robots and Systems, Cited by: §II.
  • [36] B. Oehler, J. Stueckler, J. Welle, D. Schulz, and S. Behnke (2011) Efficient multi-resolution plane segmentation of 3D point clouds. In International Conference on Intelligent Robotics and Applications, pp. 145–156. Cited by: §II.
  • [37] S. Oesau, F. Lafarge, and P. Alliez (2016) Planar shape detection and regularization in tandem. In Computer Graphics Forum, Vol. 35, pp. 203–215. Cited by: §II.
  • [38] S. Oßwald, J. Gutmann, A. Hornung, and M. Bennewitz (2011) From 3D point clouds to climbing stairs: a comparison of plane segmentation approaches for humanoids. In Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on, Cited by: §II.
  • [39] K. Pathak, A. Birk, N. Vaskevicius, M. Pfingsthorn, S. Schwertfeger, and J. Poppinga (2010) Online three-dimensional SLAM by registration of large planar surface segments and closed-form pose-graph relaxation. Journal of Field Robotics 27 (1), pp. 52–84. Cited by: §II.
  • [40] R. Schnabel, R. Wahl, and R. Klein (2007-06) Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum 26 (2), pp. 214–226. Cited by: §I, §II, Fig. 4, §V-A, §V-A.
  • [41] C. Sommer and D. Cremers (2018-09) Joint representation of primitive and non-primitive objects for 3D vision. In 2018 International Conference on 3D Vision (3DV), pp. 160–169. External Links: Document, ISSN 2475-7888 Cited by: §II, §IV, §V.
  • [42] A. J. Trevor, J. G. Rogers, and H. I. Christensen (2012) Planar surface SLAM with 3D and 2D sensors. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pp. 3041–3048. Cited by: §II.
  • [43] S. Umeyama (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis & Machine Intelligence (4), pp. 376–380. Cited by: item -.
  • [44] J. Wang and K. k. Xu (2016) Shape detection from raw LiDAR data with subspace modeling. IEEE Transactions on Visualization and Computer Graphics PP (99), pp. 1–1. External Links: Document, ISSN 1077-2626 Cited by: §II.
  • [45] J. Xiao, J. Zhang, B. Adler, H. Zhang, and J. Zhang (2013) Three-dimensional point cloud plane segmentation in both structured and unstructured environments. Robotics and Autonomous Systems 61. Cited by: §II.
  • [46] F. Yang and Z. Zhou (2018) Recovering 3D planes from a single image via convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 85–100. Cited by: §II.
  • [47] M. Y. Yang and W. Förstner (2010) Plane detection in point cloud data. In Proceedings of the 2nd int conf on machine control guidance, Bonn, Vol. 1, pp. 95–104. Cited by: §II.
  • [48] C. Zhang and Y. Hu (2017) CuFusion: accurate real-time camera tracking and volumetric scene reconstruction with a cuboid. Sensors 17 (10), pp. 2260. Cited by: Fig. 1, Fig. 5, Fig. 9, §V, §V-A.