SLOAM: Semantic Lidar Odometry and Mapping for Forest Inventory

by   Steven W. Chen, et al.
Universidade de São Paulo

This paper describes an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping. Accurate mapping of this type of environment is challenging since the ground and the trees are surrounded by leaves, thorns and vines, and the sensor typically experiences extreme motion. We propose a semantic feature based pose optimization that simultaneously refines the tree models while estimating the robot pose. The pipeline utilizes a custom virtual reality tool for labeling 3D scans that is used to train a semantic segmentation network. The masked point cloud is used to compute a trellis graph that identifies individual instances and extracts relevant features that are used by the SLAM module. We show that traditional lidar and image based methods fail in the forest environment on both Unmanned Aerial Vehicle (UAV) and hand-carry systems, while our method is more robust, scalable, and automatically generates tree diameter estimations.


page 1

page 5

page 7


SC-LiDAR-SLAM: a Front-end Agnostic Versatile LiDAR SLAM System

Accurate 3D point cloud map generation is a core task for various robot ...

LU-Net: An Efficient Network for 3D LiDAR Point Cloud Semantic Segmentation Based on End-to-End-Learned 3D Features and U-Net

We propose LU-Net -- for LiDAR U-Net, a new method for the semantic segm...

Automatic 3D Mapping for Tree Diameter Measurements in Inventory Operations

Forestry is a major industry in many parts of the world. It relies on fo...

Place Recognition in Forests with Urquhart Tessellations

In this letter we present a novel descriptor based on polygons derived f...

Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy

In this letter, we propose an integrated autonomous flight and semantic ...

Graph-based methods for analyzing orchard tree structure using noisy point cloud data

Digitisation of fruit trees using LiDAR enables analysis which can be us...

UPSLAM: Union of Panoramas SLAM

We present an empirical investigation of a new mapping system based on a...

I Introduction

Obtaining accurate timber inventory is crucial for forest managers. Foresters currently use Terrestrial Laser Scanning (TLS) sensors [5, 11] to extract tree metrics with high accuracy and reduced manpower [5]. However, these sensors need to be moved around to measure the timber inventory at multiple locations for full coverage of the forest, requiring additional time and manpower. We propose a robotic timber cruise involving a robot navigating a stand of timber, measuring samples, and estimating the total forest volume. Fig. 1 shows our Unmanned Aerial Vehicle (UAV) used in this work, and the resulting estimated timber map. This approach provides increased data granularity allowing foresters to make optimized management and harvesting decisions.

Deploying UAV systems is appealing for a variety of applications including fruit counting [3, 12], disaster management [10], and penstock inspection [16]. Previous works [17, 4, 15, 8] focus on autonomous navigation in cluttered [8] or GPS-denied environments [17]. Many works employ UAVs in forests [4, 15, 8, 9, 14, 26, 19]. Most enable autonomous navigation by detecting and following forest trails using learning-based approaches [15, 14, 9]. Cui et al. [4] uses a 2D laser range finder to navigate autonomously, but makes the assumption that the UAV height does not drastically change. More recently, [8] develops a UAV platform equipped with a 3D lidar and IMU for navigating in timber environments and building point cloud maps, and [26, 19] review the UAV applications in forestry. None of these prior works focus on the problem of estimating timber volume.

Figure 1: Our UAV performing a timber inventory acquisition by simultaneously estimating its state and the tree diameters. Bounding boxes are manually added to illustrate the same landmarks in the 3D point cloud and the 2D image.

Reche et al. [21] addresses the timber inventory problem by modeling trees from photographs, and Fritz et al. [7] uses photogrammetric point clouds collected over canopy to map tree stems. Other works estimate tree attributes using TLS [5, 11]. Relying on photography or TLS can be time consuming, and capturing data over dense forest canopy may not provide accurate tree diameter at breast height (DBH).

Our work leverages UAVs and lidars for measuring timber inventory, particularly tree DBH estimation, by focusing on solving the simultaneous localization and mapping (SLAM) problem. Previous methods, such as lidar odometry and mapping (LOAM) [28], rely on texture-based features that are brittle in forest environments. Our goal is to develop a Semantic LOAM (SLOAM) algorithm that extends ideas from semantic SLAM [1] for forestry applications. Our key idea is utilizing a parameterized landmark shape representation to obtain more robust solutions to the SLAM and DBH problems.

We present an end-to-end pipeline for solving the DBH estimation problem. While our approach is specific to our primary forestry problem, the framework and tools we develop can be generalized to other environments

. The pipeline consists of: (1) 3D point cloud labeling with virtual reality (VR); (2) range image segmentation with a fully convolutional neural network (FCN); (3) landmark instance detection with trellis graphs; (4) semantic lidar odometry; and (5) semantic lidar mapping. We first present the semantic lidar odometry and mapping modules, as they form the core ideas of the work. We then present the first 3 modules which relate to accurate detection of landmarks from individual lidar sensor readings. Finally, we present experimental results of our system in the field. The primary contributions of this paper are:

  • A complete end-to-end pipeline for solving the forest DBH estimation problem using UAV systems; and

  • SLOAM, a semantic framework that is capable of handling challenging environments with aggressive motions while simultaneously estimating landmark parameters.

Ii Problem Formulation

Primary Problem (Forest DBH Estimation). Given a forest consisting of trees , estimate the number of trees , and the position and diameter at breast height (DBH) of each tree .

The use of a mobile lidar sensor to solve the DBH estimation problem requires registering a sequence of lidar sweeps into a common reference frame. This registration typically involves estimating the trajectory of the lidar sensor over time, which leads to the following problem:

General Problem (Semantic LOAM). Given a sequence of lidar sweeps collected in an environment consisting of objects , estimate the sensor state trajectory , the number of objects , and the model parameters and classes of each object .

Under specific assumptions on the model parameterization and class of objects , a solution to the general Semantic LOAM problem will yield a solution to the primary DBH Estimation problem.

Iii Semantic Lidar Odometry and Mapping

The purpose of the lidar odometry algorithm is to estimate the rigid 6-DOF motion of the lidar within a lidar sweep. The purpose of the lidar mapping algorithm is to estimate the 6-DOF pose of the lidar in the world, and register the point cloud in the world frame. We follow the framework introduced in the original LOAM paper [28]. SLOAM incorporates semantic landmark features in an end-to-end system to increase robustness, scalability, and performance in challenging environments where LOAM fails, such as forests.

SLOAM relies on landmark models and features to estimate motion and register the lidar point clouds. The choice of which landmark features to use, and how to model the landmarks, depends on the environment. In forests, good landmark features are trees and ground modeled as cylinders and planes. We will assume that these features have been segmented and clustered into individual tree instances by an external segmentation and instance detection module described in Sec. IV.

For the previous lidar sweep that started with time stamp , let be the set of tree landmark models of size . For a given , let be the set of tree feature points of size associated with the landmark, and let be the aggregation of all tree features. Let be the set of ground feature points of size . Denote as the projections of , , to the lidar frame at time stamp . Let be the current time stamp, and be the lidar pose transform between , where are the translations along the , , and axes in the lidar frame , and , , and are the rotation angles following the right-hand rule. Let be similarly defined in the world frame , which defines the sensor state trajectory . Both the semantic odometry and mapping algorithms estimate the pose transforms by performing a data association between tree and ground features in the current sweep with tree and ground models in a previous sweep (odometry) or map (mapping).

Iii-a Model Parameterization and Distance Functions

We model the ground locally as a plane parameterized by , where is the normal of the plane, and is the offset such that the plane is defined by . Given a point and plane , let be a point on the plane. We can then define a point to plane distance:


Since the ground is relatively flat and lidar upright, ground features provide constraints in the axis, but not the and axes, and we need to also detect tree models and features.

We parameterize the tree cylinder models following the method in [13]. Let be the parameters of a cylinder model. To understand these parameters, let be the cylinder normal at the point on the cylinder closest to the origin, and be the axis of the cylinder. The first parameter is the distance from the origin to the closest point on the cylinder, is the angle between the projection of onto the plane with the axis, is the angle between and the axis, is the angle between and the partial derivative of with respect to , and is the radius of the cylinder. For an illustration of the cylinder model parameters and angles, please refer to [13, Fig. 3]. More specifically, given , we can compute the normal and axis as:


where is the partial derivative of w.r.t , and is derived from the partial derivative of w.r.t .

The point to cylinder distance between a point and cylinder is:


Iii-B Estimating Tree and Ground Models

Given the set of feature points of size corresponding to landmark , the geometric least squares approach to estimating the parameters of a cylinder is to stack the point to cylinder distances for each feature point and optimize over the parameters of . However, the point to cylinder distance can encounter singularities when the curvature decreases, which introduces difficulties in the numerical optimization procedure [13]. As a result, for a feature and cylinder , we approximate the point to cylinder distance with the following distance which has the same zero set and derivatives at the zero set as the true distance function, but additionally behaves well when the curvature decreases:


The geometric least squares optimization problem is then solved as follows:


It is easy to ensure that , by optimizing over . Thus given a set of tree features , solving Prob. (5) will yield a cylinder landmark model .

Our approach to estimate the local ground plane models is similar to previous works [24, 28]. However, the challenge with forest environments is that the ground is frequently covered by brush and small shrubs that are irregularly shaped. While we describe a ground segmentation method that tries to filter out this noise in Sec. IV, it is extremely challenging to completely filter out all of the underbrush in natural forest environments. This noise can be problematic for standard ground plane estimation methods used in methods such as [24] which only utilize points to estimate the plane.

Instead, we use the following robust ground estimation method. Given a set of ground feature points denoted as the matrix

, we use Singular Value Decomposition (SVD) to estimate a model of the ground plane. Let

be the centroid, and be the mean shifted ground feature points. We can then compute the SVD of where , and . The estimate of the normal

will be the left singular vector of

corresponding to the least singular value. Given and the centroid , we can compute by using the equation of the plane.

Iii-C Motion Estimation and Data Association

Given a set of tree features and ground features in the current sweep , as well as tree features , ground features , and tree landmark models from the previous sweep projected to , we can estimate the pose transform by optimizing the following non-linear least squares problem:


where and balance the frequency between tree and ground features. In this optimization problem, we use the true point to cylinder distance defined in Eqn. (3), as the singularity from is eliminated since is now a constant. Notice that each of the distance function () depends on associating each feature point () with an object model (), which is called data association. This data association process is different for the tree features and ground features.

For a tree feature , we need to find its landmark correspondence in . We first project the feature taken at time to in the frame at time using an initial guess of at time . There are two methods to perform the data association for tree features. In the first method, we match each feature to the model in with the smallest orthogonal distance to . In the second method, we find the nearest neighbor of in . Since each feature point in is associated with a cylinder model in , we then perform data association by matching to the cylinder associated with the nearest neighbor.

The first data association method is desirable, especially during the lidar mapping algorithm, because it only needs to maintain , whereas the second method needs to maintain both and . In a global map, the size of is independent of the number of sweeps seen, whereas the size of will grow over time. However, the second method can be more accurate since it preserves more detailed information through the features at the expense of having a map representation that increases with the number of sweeps. In this work, our main concern is to first obtain accuracy and robustness in forest environments, so we decide to employ the second method for data association. However, it is easy to switch to the first method when scaling to large environments.

Both data association methods are general to other environments beyond forests. The first method only requires a convex representation of the landmark models in order to efficiently compute a distance function, while the second method can be applied more generally as it directly compares features to features. As a result, our proposed data association methods extend beyond our primary problem of Forest DBH Estimation to the more general Semantic LOAM problem.

SLOAM handles the ground feature points differently than the tree feature points. For a given ground feature point at time , we project it back to at time stamp . We then find the set of nearest neighbors to in , and compute the ground plane corresponding to this set using the SVD. We thus compute a ground plane for each feature , rather than explicitly maintaining and updating the plane model parameters as we do for the tree cylinders. Our treatment of the ground plane is thus similar to [24].

Iii-D Semantic Lidar Odometry and Mapping

1:input : from
2:    tree and ground features from
3:    initial pose transform from last recursion
4:output : , newly computed
5:if at the beginning of a sweep  then
8:end if
9:for each tree instance  do
10:     Update tree models by solving (5)
11:end for
12:for each tree instance  do
13:     for each tree point do
14:         Find a tree model in as the
15:          correspondence, then compute a point to
16:          cylinder distance based on (4) and stack
17:          the equation to (6)
18:     end for
19:end for
20:for each ground point do
21:     Find a ground plane model as the
22:      correspondence, then compute a point to
23:      plane distance based on (1) and stack
24:      the equation to (6)
25:end for
26:Update by solving (6)
27:if at the end of a sweep  then
28:     Project to to form
29:     Project to to form
30:     return
32:     return
33:end if
Algorithm 1 Semantic Lidar Odometry

The semantic lidar odometry algorithm is presented in Alg. 1. Depending on implementation, multiple recursions can be executed with growing and as the sweep begins from and ends at , or it can be executed just once per sweep. It follows the overall framework of the original lidar odometry algorithm in [28], except with the incorporation of semantic models and features.

The semantic lidar mapping algorithm has analogous inputs and outputs to the lidar odometry algorithm. It takes as input , which is the output of the lidar odometry algorithm. It then follows the similar steps in Alg. 1 lines 1226 to estimate . Rather than comparing the new features to , the mapping algorithm compares them to its map parameters .

The main difference is initializing and updating these map parameters. At time , the set of map landmarks is empty . At each subsequent sweep , we combine the odometry output with . For each tree feature in the odometry output , we assign it to a tree cylinder in using the same data association process as in the lidar odometry algorithm. However, if a feature point is not close to any tree cylinders, we mark it as an unassigned point. After performing this process, we check the unassigned points and determine which belong to the same tree in the odometry output . If a large enough number of points belong to the same tree in , we add that tree to , taking care to project the tree cylinder model into the world frame.

The process of aggregating the tree and ground features is the same as in the original LOAM algorithm [28]. We can thus combine the semantic lidar odometry and mapping algorithms to estimate the global pose transform as well as accumulate both the feature points , and the tree landmark models . The explicit estimation of allows us to automatically retrieve the number of trees, along with an estimate of their radius, at the end of SLOAM. It thus demonstrates how a solution to SLOAM will yield a solution to our primary DBH Estimation problem.

Iv Robust Tree and Ground Detection

Sec. III assumes access to the tree features and ground features

, and its performance is highly dependent on the quality and reliability of these features. However, obtaining these features is challenging in itself, and we present a 3 part module that reliably extracts tree features, as well a specialized segmentation method to extract ground features. These 3 modules for tree feature extraction can be generalized to other semantic features as well, and as a result can be combined with the SLOAM framework as a full end-to-end system for general environments.

Iv-a Virtual Reality Point Cloud Labeling

Our data-driven deep learning segmentation method requires a large training data set in order to obtain good performance and generalization. However, there are few data sets for lidar point clouds in forests, and we need to collect and label our own data. Most current labeling tools use mouse and keyboard and are designed for 2D images, which are ill-suited for labeling 3D point clouds.

Virtual reality has frequently been used to visualize point clouds [2], and recently to label them [25]. However, most of these applications and tools have been developed in the graphics community, and are either not suitable nor easy to adapt to a robotics application. As a result, we designed a custom VR labeling tool that interfaces with ROS in order to obtain and label point clouds from standard mobile lidar platforms. The VR labeling tool is programmed using the Unity video game engine with the Oculus Rift.

In order to increase tree labeling efficiency, we design a labeling cylinder primitive, and other similar primitives can be used to label other objects. As shown in Fig. 2, the user labels points by placing the labeling primitive over the points. The user is able to easily rotate, translate and scale the point cloud using these handsets. After all tree points have been successfully labeled, the user can easily store the point cloud and cylinders into a database and proceed to the next point cloud without leaving the labeling environment, thus facilitating large acquisitions of labels.

Figure 2: Oculus Rift virtual reality point cloud labeling tool. The white points represent the point cloud, the red cylinder is the labeling primitive, and the gray hands represent the virtual hands of the labeling user.

Iv-B Deep Learning Tree Segmentation

Natural forests feature large levels of occlusion and noise, and it is challenging to reliably segment lidar tree points at large scale [27]. Most approaches use clustering-based methods [18], which are limited to relatively clean forests. As a result, our approach is to use a deep neural network for point cloud segmentation to extract the tree features .

Rather than directly operating on the point cloud and requiring the use of slower network architectures such as PointNet++ [20], our segmentation network takes as input a 2D range image of size where is the number of beams and is the number of azimuth readings of the sensor. Operating on the range image opens up the use faster FCN architectures. Prioritizing speed, we use a simplified version of ERFNet [22] with Non-bottleneck-1D structure, Downsampler and Deconvolution layers [22, Sec. IIIb].

The input is constructed by representing every 3D point as a pixel in the range image according to its position and the lidar sensor model. This representation captures spatial relationships through convolution operations and is more computationally efficient. One drawback is that far apart points with similar altitude and azimuth angles, but different depth ranges, will be neighbors in the range image, and as a result can be challenging to differentiate. To minimize this effect, during training and inference we sample the point cloud according to a radius range relative to the sensor. The segmentation network will output a predicted segmentation for each sampled image, and the final network prediction is given by the sum of all the sampled predictions. This output is a mask assigning semantic labels for every pixel which are projected back to the corresponding 3D points in the lidar point cloud.

Figure 3: Left: Trellis graph with 5 detected trees (lower beams at top, higher beams at bottom). Tree 13 exhibits a fork structure. Right: Detected tree points. Each of the 16 beams of our VLP-16 lidar passes over the trees, highlighting the lidar output structure and connection to trellis graphs.

Iv-C Ground Segmentation

We next perform ground segmentation to extract

. Rather than a neural network, we use simple heuristics due to their effectiveness and simplicity. We make the basic assumption that the ground is locally planar, but not necessarily globally planar. In a given scan, we first remove points that the neural network determines to be part of a tree. As a result, we are left with points that belong either to the ground, shrub, or leaves. The core idea is that the ground should appear below all of these other points. However, we cannot simply extract the lowest points in

from the lidar sweep, as this approach will fail if the terrain is sloped or features other complicated behavior. We instead divide the lidar sweep into a circular grid specified by the distance and angle of the space around the sensor. Within each grid cell, we retain the lowest points in the

direction, where the number of points to retain is a hyperparameter. By retaining fewer points, it is more likely that the points in

are actually ground points, but the total number of ground features will be smaller.

Iv-D Instance Detection

Once we segment the tree points in the lidar sweep, we next group them into individual tree landmarks . The lidar output has a natural structure as exhibited in the range image, specifically that points can be sorted by beam and arranged by the rotation direction of the lidar. This output can be viewed as a trellis graph [6], where each slice of the trellis represents a single lidar beam. We exploit this trellis structure to detect tree instances. The key observation motivating our approach is that in the presence of gravity, a higher beam detecting an object indicates that a lower beam will likely also detect that same object. As a result, in the trellis graph, an object will appear as a path that starts from earlier slices (lower beams) and ends at later slices (higher beams). By properly constructing a weighted trellis graph, we can identify tree instances by efficiently solving for shortest routes using the dynamic programming Viterbi Algorithm [6], or even more simply a greedy algorithm. Since this approach only assumes gravity, it generalizes to other objects besides trees.

The vertices of the trellis graph are clusters of points belonging to the same object in a beam, and the edges are the connections between these vertices across beams. We assume that points on the same object have similar depth readings, and construct vertices in the trellis graph by clustering contiguous points within a single beam according to a depth threshold. We create the edges by connecting the vertices between previous and succeeding slices. To reduce computation, we then remove edges where the distance between the centroid of the points in each vertex exceeds a distance threshold, as those vertices are unlikely to belong to the same tree. We specify the weights of each edge by constructing a score function, such as the distance between the vertex centroids.

Once we have specified the trellis graph, we greedily find the tree instances by starting from vertices in the early slices and finding shortest routes through the trellis. We add the route as a tree instance if it exceeds a minimum path length threshold indicating that the tree was detected by enough lidar beams, and if the total path cost is less than a path weight threshold, indicating that the centroids of each vertex in the tree are close together. Although these shortest routes can be solved optimally with the Viterbi algorithm, we found that a greedy approach of following the minimum-weight outgoing edges at each vertex worked for tree instances.

There are auxiliary benefits of using this trellis approach. First, it immediately provides an initial guess for the cylinder parameters . Since the geometric least squares Prob. (5) is non-linear, it is extremely important to obtain a good intialization. We define the focus point of a vertex as the mean of the two points that are furthest from each other in the vertex. The focus point is an estimate of the center of the circle defining the shape of an individual vertex. The radius of this circle can be used to initialize the radius of the tree, and an Orthogonal Distance Regression (ODR) through the focus points of a tree can be used to estimate the axis and normal of the cylinder model of the tree. The second benefit of this trellis approach is that it can identify other properties of trees, for example forks in the tree where a single trunk branches into two parts. Fig. 3 displays two trees detected in an organized lidar scan. It also depicts a portion of the corresponding trellis graph with trees, where tree exhibits a fork structure.

V Experiments

We use a Velodyne Puck VLP-16 lidar in our experiments. The frequency of each lidar sweep is 5Hz. For our method, we run semantic lidar odometry (Alg. 1) per sweep, and semantic lidar mapping once per sweep. The lidar is equipped onto a UAV platform that is manually flown, as well as a handheld sensor suite. The experiments were performed in Wharton State Forest in New Jersey.

Layer Type Filter
1 Downsampler block
2-3 Non-bt-1D (no dilation)
4 Downsampler block
5-6 Non-bt-1D (no dilation)
7 Deconvolution
8-11 Non-bt-1D (no dilation)
12 Deconvolution Num. Classes
Table I: ERFNet inspired architecture used for semantic segmentation.
Figure 4: Trajectories of benchmark methods in hard UAV experiment loop.
Method Distance from the goal (m) Error
T265 (VIO)
Table II: Error: Distance relative to trajectory length in hard UAV experiment loop.

We benchmark our method (SLOAM) against A-LOAM, Generalized ICP (GICP), and the Intel Realsense T265. A-LOAM is an open source implementation of LOAM 

[28]. GICP [23] is an open source implementation of the Iterative Closest Point (ICP) algorithm available through PointCloud Library (PCL). In order to increase the speed of ICP, we apply a voxel grid filter to reduce the number of points. The Intel Realsense T265 is a commercial off-the-shelf tracking camera that relies on visual-inertial odometry (VIO).

We evaluate on 2 experiments, which we classify as medium and hard. The medium scenario involves walking the handheld sensor suite through a dense forest environment for 1 minute in a straight line. The sensor readings entail rotation motions and instabilities caused by the operator dodging vines and thorns. The hard scenario is the UAV flying for 2 minutes that features significant rotation motions. It starts from hover and flies in a 65m trajectory until it loops back to the initial position and lands on a landing platform. Since the start and end marks are different in the

axis, we offset the goal coordinate by 1 meter instead of using the origin.

The segmentation network is trained on scans from which are from the handheld dataset and the remaining are from other regions of the same forest. No data from the UAV flight was used for training. The network architecture was chosen emphasizing inference speed. Our network architecture shown in Table I runs at 100Hz inference speeds with an Intel NUC5i7RYH on-board our UAV. The final model achieves average IoU score on a 10-fold cross validation.

We consider a variety of qualitative and quantitative metrics. Qualitatively we evaluate the trajectory and the accumulated point cloud to observe if any ghosting or duplication of trees occur. Quantitatively, we evaluate the error between start and end in the UAV experiment which performs a loop. Since SLOAM explicitly estimates the radius of each tree, we also quantitatively evaluate the DBH estimation compared to field measurements using a tape measure.

(a) A-LOAM: Line and plane feature points are green (bottom).
(b) SLOAM (Our Method): Explicitly detects and models semantic landmarks.
(c) GICP: No distinction made between points.
Figure 5: SLOAM is the only method succeeds in handling aggressive motions (yaw). Top Row (UAV dataset): Colored by z-axis. A-LOAM and GICP maps are blurry, indicating failure in mapping. Bottom Row (Handheld dataset): Colors illustrate the different ways each method treats the points.

Fig. 4 displays each method’s trajectory on the UAV dataset, and Table II quantifies the drift error between start and end. SLOAM achieves the lowest drift. A-LOAM clearly drifts, while the T265 outright fails. The VIO failure is expected, as it cannot handle extreme rotations. According to these trajectory metrics and plots, GICP seems to track closely with SLOAM, and achieves a similar magnitude of drift.

However, Fig. 5 demonstrates that GICP also has difficulty with these datasets. The top row demonstrates the differences in the point cloud maps for each of the lidar based methods. Both A-LOAM and GICP produce blurry maps with frequent ghosting of trees. This blurring is unacceptable when we need to estimate the diameter of the trees with high accuracy. SLOAM, on the other hand, produces a crisp map that preserves fine details in the tree shapes. The fact that GICP performs comparably to SLOAM in the trajectory metrics, but much worse when viewing the point cloud maps, indicates that it has difficulty with rotation motions such as yaw, since the trajectory only measures ,, and positions.

SLOAM outperforms A-LOAM and GICP because our semantic features are more reliable than texture-based lines and planes. Specifically, for both ground and tree features, data association is more robust since it inherently filters out noise, and the resulting cost function is more informative due to the use of landmark shapes. While these texture features are reliable in man-made environments, they are problematic in natural environments which lack clear planar and edge surfaces. The bottom row of Fig. 5 illustrates the different ways each method treats the points. SLOAM detects each semantic landmark, indicated by the different colorings of each tree instance. On the other hand, the A-LOAM features appear random, indicating that they are not distinctive and are thus prone to frequent data misassociation. Finally, GICP does not make a distinction between the points. This approach works well when the motion is slight, but it is also susceptible to data misassociation during extreme rotations.

Compared to SLOAM which uses a point to cylinder cost function, both A-LOAM and GICP only utilize a point to plane or point to line cost functions to compute the pose transformation. These approaches force a false planar model onto the cylinders, and will introduce slight errors that manifest as wider, blurry trees. While these small errors will not lead to an outright failure in the sensor state estimates, they are still unacceptable due to the high precision and accuracy necessary to measure tree diameters.

Detected Trees Mean Median Max Min
Table III: DBH Metrics in hard UAV experiment

We next evaluate how well SLOAM can estimate the DBH of the tree landmarks. We obtain these estimate for free, as we can use the semantic models and features to extract out the diameter for each landmark. For the hard UAV dataset described in the previous section, we manually measured trees that were in the path of the robot and use the models generated by our method to estimate DBH. We summarize the DBH estimation results compared to human measurements in Table III. SLOAM detected trees with an average error is in, which falls within the desired accuracy as typically in industry the measurements are taken to the nearest inch.

We found that using the diameter parameter of our cylinder models to estimate the DBH had a few large outliers. Instead, we found that it was more effective to take the median of all the radii estimates across all beams in all scans, which yielded the results presented in Table 


. This process still requires accurate registration from SLOAM to group these beams together. We believe that since only half of the cylinder can be viewed from the lidar at a single scan, the presence of noise or insufficient features can cause instability in the geometric least squares optimization process. It does not seem to affect the pose estimation optimization process, as the presence of many tree landmarks offers robustness to noise.

Beyond the median operation, no additional post-processing steps are required to obtain the DBH results.

The ability to quantify mapping performance through landmark ground truth measurements highlights another strength of our approach. It is difficult to quantify the performance of traditional SLAM algorithms, as ground truth measurements of trajectories are hard to obtain in natural environments. For example, the dense forest canopy prevents the use of GPS as ground truth, as the errors can range up to tens of meters. On the other hand, ground truth measurements of the landmark shapes are easily obtained, and provide an alternative quantitative metric to benchmark various algorithms.

Vi Conclusions

We pose the DBH estimation problem as a special case of the semantic lidar odometry and mapping problem, providing a generic formulation that can be extended to other scenarios. We develop a VR labeling tool to facilitate annotation of 3D lidar scans by fitting geometric primitives to the raw data. This enables us to train a segmentation model that is used with a graph based method for instance detection and extraction of relevant attributes of each tree such as radius and focus point. We demonstrate the difficulty of the DBH estimation problem by benchmarking against other lidar state estimation and VIO methods and observing large drifts. Finally, we show that the semantic shape models are critical in achieving accuracy and scalability in challenging natural environments.


  • [1] S. L. Bowman, N. Atanasov, K. Daniilidis, and G. J. Pappas (2017) Probabilistic data association for semantic slam. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1722–1729. Cited by: §I.
  • [2] G. Bruder, F. Steinicke, and A. Nüchter (2014) Poster: immersive point cloud virtual environments. In 2014 IEEE Symposium on 3D User Interfaces (3DUI), pp. 161–162. Cited by: §IV-A.
  • [3] S. W. Chen, S. S. Shivakumar, S. Dcunha, J. Das, E. Okon, C. Qu, C. J. Taylor, and V. Kumar (2017) Counting Apples and Oranges with Deep Learning: A Data-Driven Approach. IEEE Robotics and Automation Letters 2 (2), pp. 781–788. External Links: Document, ISBN 2015670212385, ISSN 23773766 Cited by: §I.
  • [4] J. Q. Cui, S. Lai, X. Dong, and B. M. Chen (2016) Autonomous Navigation of UAV in Foliage Environment. Journal of Intelligent and Robotic Systems: Theory and Applications 84 (1-4), pp. 259–276. External Links: Document, ISSN 15730409 Cited by: §I.
  • [5] T. de Conto, K. Olofsson, E. B. Görgens, L. C. E. Rodriguez, and G. Almeida (2017) Performance of stem denoising and stem modelling algorithms on single tree point clouds from terrestrial laser scanning. Computers and Electronics in Agriculture 143 (November), pp. 165–176. External Links: Document, ISSN 01681699 Cited by: §I, §I.
  • [6] G. D. Forney (1973) The viterbi algorithm. Proceedings of the IEEE 61 (3), pp. 268–278. Cited by: §IV-D.
  • [7] A. Fritz, T. Kattenborn, and B. Koch (2013) UAV-Based photogrammetric point clouds – Tree stem mapping in open stands in comparison to terrestrial laser scanner point clouds. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-1/W2 (September), pp. 141–146. External Links: Document Cited by: §I.
  • [8] F. Gao, W. Wu, W. Gao, and S. Shen (2019) Flying on point clouds: Online trajectory generation and autonomous navigation for quadrotors in cluttered environments. Journal of Field Robotics 36 (4), pp. 710–733. External Links: Document, ISSN 15564967 Cited by: §I.
  • [9] A. Giusti, J. Guzzi, D. C. Ciresan, F. L. He, J. P. Rodriguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella (2016)

    A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots

    IEEE Robotics and Automation Letters 1 (2), pp. 661–667. External Links: Document, ISSN 23773766 Cited by: §I.
  • [10] S. Lee, D. Har, and D. Kum (2017) Drone-Assisted Disaster Management: Finding Victims via Infrared Camera and Lidar Sensor Fusion. Proceedings - Asia-Pacific World Congress on Computer Science and Engineering 2016 and Asia-Pacific World Congress on Engineering 2016, APWC on CSE/APWCE 2016 (C), pp. 84–89. External Links: Document, ISBN 9781509057535 Cited by: §I.
  • [11] X. Liang, V. Kankare, J. Hyyppä, Y. Wang, A. Kukko, H. Haggrén, X. Yu, H. Kaartinen, A. Jaakkola, F. Guan, et al. (2016) Terrestrial laser scanning in forest inventories. ISPRS Journal of Photogrammetry and Remote Sensing 115, pp. 63–77. Cited by: §I, §I.
  • [12] X. Liu, S. W. Chen, C. Liu, S. S. Shivakumar, J. Das, C. J. Taylor, J. Underwood, and V. Kumar (2019) Monocular Camera Based Fruit Counting and Mapping with Semantic Data Association. IEEE Robotics and Automation Letters 4 (3), pp. 2296–2303. External Links: Document, ISBN 2015670212385, ISSN 23773766 Cited by: §I.
  • [13] G. Lukács, A. Marshall, and R. Martin (1997) Geometric least-squares fitting of spheres, cylinders, cones and tori. RECCAD, Deliverable Document 2 and 3, COPERNICUS project (1068). Cited by: §III-A, §III-B.
  • [14] B. G. Maciel Pearson, S. Akcay, A. Atapour-Abarghouei, C. Holder, and T. Breckon (2019) Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments. IEEE Robotics and Automation Letters 4 (4), pp. 1–1. External Links: Document Cited by: §I.
  • [15] B. G. Maciel-Pearson, P. Carbonneau, and T. P. Breckon (2018) Extending deep neural network trail navigation for unmanned aerial vehicle operation within the forest canopy. In Annual Conference Towards Autonomous Robotic Systems, pp. 147–158. Cited by: §I.
  • [16] T. Nguyen, S. S. Shivakumar, I. D. Miller, J. Keller, E. S. Lee, A. Zhou, T. Özaslan, G. Loianno, J. H. Harwood, J. Wozencraft, C. J. Taylor, and V. Kumar (2019-10) MAVNet: an effective semantic segmentation micro-network for mav-based tasks. IEEE Robotics and Automation Letters 4 (4), pp. 3908–3915. External Links: Document, ISSN 2377-3766 Cited by: §I.
  • [17] F. J. Perez-Grau, R. Ragel, F. Caballero, A. Viguria, and A. Ollero (2018) An architecture for robust uav navigation in gps-denied areas. Journal of Field Robotics 35 (1), pp. 121–145. Cited by: §I.
  • [18] M. Pierzchała, P. Giguère, and R. Astrup (2018) Mapping forests using an unmanned ground vehicle with 3d lidar and graph-slam. Computers and Electronics in Agriculture 145, pp. 217–225. Cited by: §IV-B.
  • [19] S. Puliti, H. O. Ørka, T. Gobakken, and E. Næsset (2015) Inventory of small forest areas using an unmanned aerial system. Remote Sensing 7 (8), pp. 9632–9654. External Links: Document, ISSN 20724292 Cited by: §I.
  • [20] C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108. Cited by: §IV-B.
  • [21] A. Reche-Martinez, I. Martin, and G. Drettakis (2004) Volumetric reconstruction and interactive rendering of trees from photographs. In ACM transactions on graphics (ToG), Vol. 23, pp. 720–727. Cited by: §I.
  • [22] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo (2017) Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems 19 (1), pp. 263–272. Cited by: §IV-B.
  • [23] A. Segal, D. Haehnel, and S. Thrun (2009) Generalized-icp.. In Robotics: science and systems, Vol. 2, pp. 435. Cited by: §V.
  • [24] T. Shan and B. Englot (2018) LeGO-loam: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4758–4765. Cited by: §III-B, §III-C.
  • [25] J. D. Stets, Y. Sun, W. Corning, and S. W. Greenwald (2017) Visualization and labeling of point clouds in virtual reality. In SIGGRAPH Asia 2017 Posters, pp. 31. Cited by: §IV-A.
  • [26] C. Torresan, A. Berton, F. Carotenuto, S. F. Di Gennaro, B. Gioli, A. Matese, F. Miglietta, C. Vagnoli, A. Zaldei, and L. Wallace (2017) Forestry applications of UAVs in Europe: a review. International Journal of Remote Sensing 38 (8-10), pp. 2427–2447. External Links: Document, ISSN 13665901 Cited by: §I.
  • [27] J. Tremblay, M. Béland, F. Pomerleau, R. Gagnon, and P. Giguère (2019) Automatic 3d mapping for tree diameter measurements in inventory operations. arXiv preprint arXiv:1904.05281. Cited by: §IV-B.
  • [28] J. Zhang and S. Singh (2014) LOAM: lidar odometry and mapping in real-time.. In Robotics: Science and Systems, Vol. 2, pp. 9. Cited by: §I, §III-B, §III-D, §III-D, §III, §V.