
ARES: Accurate, Autonomous, Near Real-time 3D Reconstruction using Drones

by Fawad Ahmad, et al.

Drones will revolutionize 3D modeling. A 3D model represents an accurate reconstruction of an object or structure. This paper explores the design and implementation of ARES, which provides near real-time, accurate reconstruction of 3D models using a drone-mounted LiDAR; such a capability can be useful to document construction or check aircraft integrity between flights. Accurate reconstruction requires high drone positioning accuracy, and, because GPS can be inaccurate, ARES uses SLAM. However, in doing so it must deal with several competing constraints: drone battery and compute resources, SLAM error accumulation, and LiDAR resolution. ARES uses careful trajectory design to find a sweet spot in this constraint space, a fast reconnaissance flight to narrow the search area for structures, and offloads expensive computations to the cloud by streaming compressed LiDAR data over LTE. ARES reconstructs large structures to within tens of centimeters and incurs less than 100 ms compute latency.





1 Introduction

Drone-based 3D reconstruction. The last few years have seen impressive advances in the commoditization of drones. Today, drones can be equipped with on-board compute, a cellular (LTE) radio, and sophisticated sensors (e.g., cameras, stereo cameras, and LiDAR). Given this, in the coming years, drones will likely revolutionize aerial photography, mapping, and three-dimensional (3D) reconstruction. Recent estimates put the total market for these drone-based services at 63 billion US dollars by 2025 [drone-market-1], with 3D reconstruction, the task of generating a digital 3D model of the environment, accounting for nearly a third of the drone services market [drone-market-2].

What is a 3D model? The term 3D model covers a wide range of geometric representations of the surfaces of objects, from coarse-grained approximations (cylinders, cubes, intersections of planes) to more fine-grained representations such as meshes (small-scale surface tessellations that capture structural variations). In this paper, we seek to extract a fine-grained point cloud of a large structure (e.g., a building, blimp, or airplane), which consists of dense points on the surface of the structure. Each point has an associated 3D position, together with other attributes (depending on the sensor used to generate the point cloud). From the point-cloud based 3D model, all other representations can be derived.

Applications. 3D models are used in animations in films and video games, for preserving historical and tourist sites, for archaeology and city planning, for capturing buildings or rooftops for repair and solar installation, and as inputs to immersive virtual reality (VR) applications. Most prior research has explored offline reconstruction [uav_lidar_1, uav_lidar_2, 7139681, 7989530, 8124461, 8628990] and reports centimeter to sub-meter accuracy.

Towards real-time 3D reconstruction. The nascent drone-based reconstruction industry is starting to use time-to-reconstruction as an important market discriminator. Startups promise to deliver models in hours or within a day [company_drone_deploy] after drone data collection. This trend has inspired newer, more time-sensitive applications of 3D modeling such as (a) post-disaster reconstruction for search and rescue missions [drone_disaster_relief_1, drone_disaster_relief_2, drone_disaster_relief_3, drone_disaster_relief_4, drone_disaster_relief_5], (b) construction site monitoring [company_drone_deploy], (c) mapping mines, cell towers, and tunnels [hovermap, Prometheus], (d) documenting interior office spaces [company_kaarta], and (e) perhaps most compelling, between-flight modeling of aircraft fuselages or blimps to determine structural integrity [company_austrian_airline].

Though startups have begun to promise shorter reconstruction times [hovermap, Prometheus], they reconstruct 3D models offline, i.e., after the drone has landed. Only one company we know of so far promises real-time reconstruction [company_kaarta], and it employs a human operator carrying a hand-held kit with a stereo camera (instead of a drone) to reconstruct interior spaces.

The Goal. Given this, we want to accurately reconstruct large 3D structures in near real-time, with no human intervention.

Photogrammetry-based reconstruction. Most existing work (§4) uses photogrammetry, which takes as input a sequence of 2D camera images (and the positions associated with these images), captured either using drone-mounted cameras [7139681, 7989530, 8124461, 8628990] or by humans using ordinary mobile devices [company_hover]. Photogrammetry infers 3D models using a technique known as multi-view stereo [furukawa15:_multi_view_stereo], which combines information from successive images, together with information about local motion, and inferred camera parameters such as the focal length. Photogrammetry approaches estimate depth from 2D images, so their reconstruction times are relatively long, i.e., from several hours to days [company_drone_deploy] for large structures.

LiDAR-based reconstruction. Unlike photogrammetry, LiDAR-based reconstruction can infer 3D models more directly, because LiDARs provide depth information directly (unlike cameras). A LiDAR sensor measures distances to surfaces using lasers mounted on a mechanical rotating platform. The lasers and mechanical rotator sit within the enclosure of the LiDAR sensor. With each revolution of the lasers, the sensor returns a point cloud, or LiDAR frame: a set of 3D data points, each corresponding to a distance measurement from the LiDAR to a particular position in the surrounding environment. For instance, an Ouster OS1-64 LiDAR has 64 lasers that scan at 20 Hz, with a horizontal and vertical field of view of 360° and 45° respectively.

To reconstruct a structure, e.g., a building, it suffices, in theory, to merge point clouds captured from different locations around the building. To understand what it means to merge point clouds, consider two successive clouds P1 and P2. A point on the surface of the building may appear in both P1 and P2. However, since the drone has moved, this point appears at different positions (relative to the LiDAR) in the two point clouds. If we precisely transform both point clouds into the same coordinate frame of reference, then the union of points in P1 and P2 constitutes part of the 3D model of the building.
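Concretely, merging reduces to applying each cloud's pose transform and taking the union of the transformed points. A minimal numpy sketch (the two-cloud example and poses are illustrative, not dart's actual data):

```python
import numpy as np

def merge_clouds(clouds, poses):
    """Merge LiDAR point clouds into one model, given each cloud's pose.

    clouds: list of (N_i, 3) arrays in the LiDAR's local frame.
    poses:  list of 4x4 homogeneous transforms mapping each local
            frame into a common frame of reference (e.g., cloud 0's).
    """
    merged = []
    for pts, T in zip(clouds, poses):
        homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 4)
        merged.append((homog @ T.T)[:, :3])  # transform into common frame
    return np.vstack(merged)

# A surface point seen from two drone positions maps to one world point.
p_world = np.array([[2.0, 0.0, 0.0]])
T0 = np.eye(4)                    # cloud 0 captured at the origin
T1 = np.eye(4)
T1[0, 3] = 1.0                    # cloud 1 captured 1 m along x
cloud0 = p_world                  # the point, as seen from pose 0
cloud1 = p_world - np.array([[1.0, 0.0, 0.0]])  # as seen from pose 1
model = merge_clouds([cloud0, cloud1], [T0, T1])
```

With correct poses, both observations of the point land at the same world coordinate; errors in the poses smear such points apart, which is exactly the positioning problem discussed next.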

Accurate positioning ensures high-quality models. Merging point clouds requires the precise position of the LiDAR at each of the positions where it captured a cloud, so the accuracy of positioning determines model quality. Two metrics define quality. Accuracy, the average distance between the position of a point in the model and the corresponding point on the surface (see §2 for a more precise definition), clearly relies on accurate positioning. Completeness, the degree to which a model captures the entire structure, also relies on accurate positioning, since positioning errors can lead to gaps in the captured model.

Figure 1: GPS-derived (left), SLAM-derived (middle), and dart-derived (right) models of a large complex on our campus.

Strawman: Using GPS for positioning. Because drones can carry GPS receivers, GPS can be used to position point clouds. Unfortunately, GPS errors can be several meters in obstructed settings, resulting in poor accuracy and completeness of the 3D model. The left image of Fig. 1 shows a 3D model assembled using a drone flight over a commercial building; the building's outline is fuzzy, as are the contours of the trees surrounding the building. The right image shows a 3D reconstruction using the techniques proposed in this paper, which do not use GPS. (All reconstructions in Fig. 1 use real data captured from a drone-mounted LiDAR; §3.)

High-precision GNSS/RTK receivers can provide more accurate positioning but require additional infrastructure, are costly, and can perform poorly in urban environments due to non line-of-sight signals that degrade accuracy (e.g., prior work [Carloc] reports that RTK-enabled receivers can exhibit tens of meters error in downtown environments). Prior work [uav_lidar_1, uav_lidar_2] has used expensive GPS receivers in the context of remote sensing, using specialized unmanned aerial vehicles (UAVs) with long-range LiDAR sensors (but for offline model reconstruction). In contrast, in this paper we consider solutions that employ off-the-shelf technologies: drones, and commodity GPS and LiDAR.

An alternative: Using SLAM for positioning. LiDAR SLAM (Simultaneous Localization And Mapping [slam_part1, slam_part2]) algorithms can provide accurate pose (position and orientation) estimates, which can be used to position point clouds. These algorithms use scan or feature matching techniques to align consecutive LiDAR frames to determine the pose of the drone throughout its flight. For example, scan matching uses techniques [icp] to find the closest match for each point in the source point cloud S to a point in the reference point cloud R. It then estimates the transformation (translation and rotation) that best aligns each point in S to its corresponding point in R. By repeating this process across consecutive LiDAR frames, SLAM can position each frame in the first frame's coordinate frame of reference.
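The alignment step that scan matching performs can be sketched as follows. This is a generic nearest-neighbor ICP iteration with a closed-form (Kabsch) rigid fit, shown for illustration; it is not dart's specific SLAM implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Closed-form (Kabsch) rigid fit: R, t minimizing
    ||R @ src_i + t - dst_i|| over paired points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation, no reflection
    t = mu_d - R @ mu_s
    return R, t

def icp_step(source, reference):
    """One scan-matching iteration: pair each source point with its
    nearest reference point, then rigidly align the pairs."""
    matched = reference[cKDTree(reference).query(source)[1]]
    return best_rigid_transform(source, matched)

# A pure 1 m shift between two views of the same four points is recovered.
ref = np.array([[0.0, 0, 0], [4, 0, 0], [0, 3, 0], [0, 0, 2]])
src = ref - np.array([1.0, 0.0, 0.0])
R, t = best_rigid_transform(src, ref)   # known correspondences
R2, t2 = icp_step(src, ref)             # correspondences via nearest neighbor
```

Real scan matchers iterate `icp_step` until convergence; the chain of recovered (R, t) pairs is what places every frame in the first frame's coordinate system.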

Challenges. However, using SLAM on drone-mounted LiDAR is challenging for the following reasons (Table 1).

Reconstruction quality is critically dependent on the design of the drone trajectory. Fig. 1 (middle) shows a reconstruction using SLAM from a poorly planned drone flight. This reconstruction is visually worse than the GPS-based reconstruction (Fig. 1 (left)), because the underlying SLAM algorithm is unable to track LiDAR frames (i.e., it is unable to match points in successive point clouds).

SLAM algorithms accumulate drift [slam_part1, slam_part2], so position estimates can degrade on longer flights.

Drones have limited compute because they can carry limited payloads. For instance, the DJI M600Pro hexacopter (which we use in this paper) has a maximum payload weight of 5 kg. It carries an A3Pro flight controller that contains three IMUs and three GNSS receivers. We have also mounted an LTE radio, an Ouster OS1-64 LiDAR, and a Jetson TX2 board. This compute capability is far from sufficient to run LiDAR SLAM. (With a 64-beam LiDAR, SLAM processes up to 480 Mbps of 3D data. On the TX2, plane-fitting [ransac], a primitive used in our reconstruction pipeline, takes 0.5 seconds per LiDAR frame while LiDARs generate 20 frames per second (Fig. A.1), and plane-fitting accounts for only 5% of the execution time of our entire pipeline (§3).)

Drones have limited flight endurance. When fully loaded and starting from a full charge, the M600Pro can fly for approximately 25 minutes. This necessitates careful trajectory planning to minimize flight duration.

Challenge | Mechanism
Model accuracy | Model collection trajectory planning
Limited duration | Cheap reconnaissance flight
Limited compute | Offload compressed point clouds
Drift accumulation | Flight re-calibration
Table 1: Challenges and contributions

Contributions. This paper presents the design and implementation of a system called dart, which makes the following contributions to address these challenges (Table 1).

dart’s first contribution is the design of model collection trajectories that navigate the competing constraints, i.e., accuracy and battery life (§2.1). Model collection uses SLAM, and SLAM error is sensitive to how the LiDAR is mounted, and how fast and at what height the drone flies. Faster, higher flights use less of the drone’s battery, but can incur high SLAM error. The converse is true of slower and lower flights. dart finds a sweet spot in this trade-off space to balance its accuracy goals with drone battery usage. Even so, SLAM can incur drift on longer flights. dart needs to detect excessive drift, and correct for it in real-time. It uses a novel algorithm that tracks consistency between GPS traces and SLAM positions to detect excessive drift, then incorporates a re-calibration step to correct for it while the drone is in flight.

Even with dart’s efficient model collection trajectories, scans over large areas can exhaust a drone’s battery. To this end, dart’s second contribution is a robust and efficient geometry extraction algorithm (§2.2) that helps focus model collection only on the structure to reconstruct. (In this paper, we focus on extracting the boundary of a building and leave generalization to future work.) This algorithm, which runs during a reconnaissance flight before model collection, works well even with a fast, high flight that minimizes drone flight duration (and hence battery usage). During the flight, this algorithm extracts the building geometry without constructing the 3D model; it relies on detecting planar surfaces by using the consistency of surface normals across points on a plane, then estimates building height and boundary from the plane forming the rooftop. dart uses this boundary to plan the model collection trajectories described above.

dart’s third contribution is the design of a processing pipeline that (a) offloads computation to a cloud server by compressing point clouds on the drone to the point where they can be transmitted over LTE (§2.3), and (b) leverages GPU acceleration to process these point clouds in near real-time at the LiDAR frame rate and with minimal end-to-end processing delay (§3).

Experiments (§3) on real-world drone flights and a photo-realistic drone simulator (AirSim [airsim]) demonstrate that dart can achieve 21-30 cm reconstruction accuracy even after compressing the raw sensor data by almost two orders of magnitude. Not only is dart faster than an offline approach, it is also more accurate, since the latter cannot re-calibrate mid-flight. A complete dart implementation is able to stream compressed point clouds over LTE, reconstruct the 3D model on-the-fly, and deliver an accurate 3D model about 100 ms after flight completion. dart’s choice of trajectories drains the battery the least while achieving the accuracy target, and its pipeline can process frames at 20 fps while incurring a processing latency of less than 100 ms. To our knowledge, dart is the first system to demonstrate online, cloud-based, autonomous, sub-meter 3D model reconstruction in near real-time.

2 DART Design

Figure 2: dart architecture

Because drone-based 3D reconstruction is a complex multi-dimensional problem (Table 1), we have focused on a geometrically regular, but important, subset of structures for reconstruction: buildings. As this section will make clear, even this choice poses significant challenges. It also brings out the computation and communication issues in 3D reconstruction that are the main focus of this paper. In §2.1, we discuss what it would take to generalize to other, more complex structures.

Overview. To use dart, a user specifies: (a) an area of interest, and (b) a minimum target point density. Point density is the number of points per unit area on the surface of a point cloud; this knob controls the quality of the 3D model. In the area of interest, dart guides a drone to automatically discover buildings, and constructs a 3D model of the buildings in near real-time (i.e., during the drone flight) while minimizing flight duration at the given minimum point density. (To a first approximation, drone battery usage increases with flight duration; we have left it to future work to incorporate drone battery models.) dart splits its functionality across two components: (a) a lightweight subsystem that runs on the drone, and (b) a cloud-based component that discovers buildings, generates drone trajectories, and reconstructs the 3D models on-the-fly.

dart’s cloud component (Fig. 2) generates an efficient reconnaissance trajectory over the area of interest to discover the rooftop geometry of buildings. Extracting the geometry from LiDAR data can be computationally intensive (§1), so dart streams compressed point clouds to a cloud service during flight over a cellular (LTE) connection. The cloud service extracts the geometry, then prepares a more careful model collection trajectory: a minimal-duration flight that ensures high 3D model accuracy. During this second flight, the drone also streams compressed point clouds, and the cloud service runs SLAM to estimate point cloud poses, and composes the received point clouds into the building’s 3D model. Mid-flight, the cloud service may re-calibrate the trajectory dynamically to minimize drift accumulation.

Below, we first describe model collection (§2.1), since that is the most challenging of dart’s components. We then describe how dart extracts the rooftop geometry (§2.2), then conclude by describing point-cloud compression (§2.3).

2.1 Model Collection

Given the building geometry (§2.2), dart designs a model collection trajectory to capture the 3D model (§1) of the building.

What constitutes a good 3D model? Prior work on 3D reconstruction [reconstructionmetrics] has proposed two metrics, accuracy and completeness. Consider a 3D model M and a corresponding ground-truth model G. Accuracy is the Root Mean Square Error (RMSE) of the distance from each point in M to the nearest point in G. Completeness is the RMSE of the distance from each point in G to the nearest point in M. If both values are zero, M perfectly matches G. If M captures all points in G, but the positions of the points are erroneous, then both accuracy and completeness will be non-zero. If M captures only one point in G, but positions it correctly, its accuracy is perfect, but completeness is poor.
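Both metrics reduce to nearest-neighbor queries between the two clouds; a minimal sketch with illustrative toy clouds (M and G here are stand-ins, not dart's data):

```python
import numpy as np
from scipy.spatial import cKDTree

def accuracy_and_completeness(model, ground_truth):
    """Accuracy: RMSE of distances from each model point to its nearest
    ground-truth point. Completeness: RMSE of distances from each
    ground-truth point to its nearest model point."""
    d_model = cKDTree(ground_truth).query(model)[0]   # model -> truth
    d_truth = cKDTree(model).query(ground_truth)[0]   # truth -> model
    return np.sqrt(np.mean(d_model ** 2)), np.sqrt(np.mean(d_truth ** 2))

# A model that captures one ground-truth point exactly: perfect accuracy
# (zero), but poor completeness (large RMSE to the missed points).
truth = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])
model = truth[:1]
acc, comp = accuracy_and_completeness(model, truth)
```

This mirrors the asymmetry in the definitions: accuracy penalizes spurious or misplaced model points, completeness penalizes missed surface points.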

Figure 3: Parallel and perpendicular LiDAR orientation
Figure 4: Impact of orientation on SLAM positioning error.
Figure 5: Equi-dense trajectory scan width

Trajectories, SLAM and 3D reconstruction error. As compared to an autonomous vehicle, a drone-mounted LiDAR can perceive only a fraction of the full 3D point cloud. This makes scan matching more difficult for SLAM. Thus, the trajectory of the drone flight can impact 3D model completeness and accuracy, in part because a poorly designed trajectory can increase SLAM error. In designing the drone’s trajectory, dart can control the following parameters: the actual path of the drone over the building, its speed, its height, and the orientation of the LiDAR with respect to the ground. We now discuss the qualitative impact of these parameter choices; later (§3), we empirically quantify the best parameter choices.

Orientation impacts accuracy. At a fixed height and speed, a parallel orientation of the LiDAR (Fig. 5), in which its scan plane aligns with the drone’s direction of motion, results in higher overlap between two successive point clouds than a perpendicular orientation, and therefore lower SLAM error and better accuracy. Fig. 5, obtained using the methodology described in §3, quantifies this intuition: different orientations have different degrees of overlap, and as overlap decreases, SLAM’s positioning error increases. A parallel orientation (0°) has the lowest SLAM error because it has the longest visibility lifespan. (Visibility lifespan, the time for which a point on the building’s surface is visible during flight, is a proxy for overlap; a longer lifespan indicates greater overlap.)

Speed impacts model accuracy. If the drone flies fast, two successive point clouds will have fewer overlapping points, resulting in errors in SLAM’s pose transformations and (therefore) pose estimates (for a reason similar to Fig. 5), which leads to poor 3D model accuracy. So, dart must fly as slowly as possible.

Height impacts both accuracy and completeness. Because LiDAR beams are radial, the higher a drone flies, the less dense the points on the surface of the building. Lower density results in worse completeness. Accuracy is also worse, because the likelihood of matching the same point on the surface between two scans decreases with point density. For instance, the positioning errors for point densities of 2.2 points per m² and 3.0 points per m² are 2.5 m and 1.0 m respectively (graph omitted for brevity). So, dart must fly as low as possible.

The drone’s path must ensure coverage of the building’s rooftop and sides. Consider a narrow and wide building. The drone must fly several times over the building to capture all its surfaces. If it flies low, slowly, and at a parallel orientation, the flight duration can be significant. Over long durations, SLAM accumulates drift, which can worsen model accuracy and completeness.

dart designs equi-dense trajectories to control model completeness, and uses offline data-driven parameter estimation to find the choice of speed, height and orientation. To minimize drift accumulation, dart performs online drift estimation and re-calibration. We describe these below.

Equi-dense Trajectories. An equi-dense trajectory ensures that the resulting model is (a) complete, and (b) captures the building with a point density no less than the specified minimum target point density ρ.

Point density depends on LiDAR parameters and height. The height (more generally, the distance, for vertical surfaces like the sides of the building) at which a LiDAR flies from a surface governs the average density of points it obtains from that surface; larger heights result in lower point density.

For a given LiDAR configuration, we can compute the point density as a function of height. For instance, for an Ouster LiDAR with 64 beams, a horizontal resolution of 1024, and a vertical field of view of 45°, to a first approximation, two consecutive beams are at an angular separation of 0.7° (45°/64), and successive firings of the same beam are 0.35° (360°/1024) apart. Using geometry, for a surface at a distance h from the drone, we can compute the projection of the LiDAR on that surface. Using this projection, we can compute the point density throughout the whole point cloud. Central regions of the point cloud have much higher density than regions at the extremities.
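To a first approximation, the density directly beneath the drone falls off quadratically with height. A sketch under this simplified nadir-patch model (the default angular separations are the OS1-64 values quoted above; the function name is illustrative):

```python
import math

def nadir_point_density(height_m, vert_sep_deg=0.7, horiz_sep_deg=0.35):
    """Approximate point density (points/m^2) directly beneath the LiDAR.

    At the nadir, adjacent beams land about height*tan(vert_sep) apart
    along one ground axis, and successive firings of a beam land about
    height*tan(horiz_sep) apart along the other, so each point covers
    roughly the product of the two spacings.
    """
    dy = height_m * math.tan(math.radians(vert_sep_deg))
    dx = height_m * math.tan(math.radians(horiz_sep_deg))
    return 1.0 / (dx * dy)

# Density falls off as 1/h^2: doubling the height quarters the density.
ratio = nadir_point_density(40.0) / nadir_point_density(20.0)
```

The 1/h² fall-off is why flying lower directly improves completeness, at the cost of longer flights to cover the same area.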

(a) The lateral flight
(b) The longitudinal flight
(c) An alternative flight plan
(d) Mid-flight re-calibration
Figure 6: Model collection trajectory design

Coverage depends on height. The density of points at which a drone captures a surface depends on its height h. Given a height h, Fig. 5 shows the coverage of the LiDAR on a given surface. In general, the drone can only capture a subset of this full coverage region at a minimum target point density ρ (shown as the target density region in Fig. 5). Now, suppose the drone’s trajectory performs a rectilinear scan over the surface, like the one shown in Fig. 5(a). Then, to ensure that dart captures the entire surface at a density of at least ρ, the scan width must be equal to or smaller than the width of the target-density coverage region (Fig. 5).

dart estimates scan width from LiDAR parameters. To estimate the width of the target-density coverage region, dart uses the LiDAR parameters and models the LiDAR geometry to derive a function w(ρ, h) that returns the scan width for a given target density ρ and a given height h. It models the beams of the LiDAR as having equal angular separation, so it can compute the points at which these beams intersect the plane at a height (or distance) h away. Given a target point density ρ, dart can compute the largest width at this height that ensures a minimum point density of ρ.

This density guarantee is nominal; in practice, LiDARs may drop some reflections if they are noisy [Lidarsim]. Future work can model this noise for better equi-dense trajectory designs.
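A sketch of such a width computation under the equal-angular-separation model described above (function and parameter names are illustrative, not dart's API; the horizontal spacing is approximated by its nadir value):

```python
import math

def scan_width(target_density, height_m, n_beams=64, fov_deg=45.0,
               horiz_sep_deg=0.35):
    """Width (m) of the widest ground band, centered under the flight
    line, whose local point density meets target_density (points/m^2).

    Beams are modeled with equal angular separation across the vertical
    FOV; density between adjacent beam footprints shrinks away from the
    nadir, so the qualifying band is contiguous about the center.
    """
    half_fov = math.radians(fov_deg) / 2.0
    step = math.radians(fov_deg) / (n_beams - 1)
    # Ground-plane footprint of each beam, relative to the flight line.
    ground = [height_m * math.tan(-half_fov + i * step)
              for i in range(n_beams)]
    dx = height_m * math.tan(math.radians(horiz_sep_deg))
    half_width = 0.0
    for g0, g1 in zip(ground, ground[1:]):
        density = 1.0 / (dx * (g1 - g0))   # points/m^2 between two beams
        if density >= target_density:
            half_width = max(half_width, abs(g0), abs(g1))
    return 2.0 * half_width

w_easy = scan_width(5.0, 20.0)    # low target: the whole footprint counts
w_hard = scan_width(30.0, 20.0)   # high target: only the central band
```

Raising the target density or the height shrinks the usable width, which is exactly the trade-off the equi-dense trajectory navigates.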

Trajectory path planning, and parameter selection. dart uses a fixed orientation, height and speed for the drone flight; it uses an offline data-driven approach to determine these.

Offline simulations to estimate parameters. As described above, LiDAR orientation, drone height, and speed determine how well SLAM can estimate positions to ensure accuracy and completeness. We could have tried to analytically model the system to derive the optimal parameter choices. Modeling the LiDAR is feasible (as we discuss above); modeling SLAM’s feature matching mathematically is much harder. So, we resort to exploring the space of parameters using simulation. Specifically, we use a game-engine-driven photorealistic simulator, AirSim [airsim], to sweep the space of parameters. Then, we validate these results using traces that we capture from our real-world prototype. We discuss this methodology in greater detail in §3, where we show that there exists a sweet spot in the parameter space that ensures high accuracy and completeness while minimizing flight duration. Specifically, §3.4 shows that a parallel orientation, while flying at a distance of 20 m from the surface (or lower, if necessary, to meet the point density constraint) at 1 m/s, gives the best accuracy.

A fixed trajectory. Given these parameters, the building geometry, and the scan width, dart designs a drone flight path. Fig. 6 describes this for a building shaped like a rectangular solid; dart supports other building shapes (§3). dart’s model collection trajectory starts from an origin; this point defines the origin of the coordinate system for the resulting 3D model. The drone first traverses the building laterally to capture two of its sides (Fig. 5(a)). Its flight path extends a distance beyond the building edges to account for errors in building geometry estimation. As it moves to each side laterally, it moves up/down along each face to capture them at the same minimum point density. Then, it returns to the origin, and traverses longitudinally (Fig. 5(b)).

Avoiding LiDAR rotations. Why return to the origin? (Drones use fiducials, e.g., a drone landing pad with a distinctive design, to identify the origin.) This loop closure maneuver is an important technique in SLAM to correct for drift [slam_part1]. If loop closure were not necessary, we could have designed a trajectory as shown in Fig. 5(c). However, that trajectory requires a rotation of the drone at the dashed segment to ensure that the lateral and longitudinal segments have the same drone orientation. Rotation can significantly increase SLAM drift; Fig. 10 shows an example in which the green dashed line depicts the actual (ground truth) drone trajectory, and the blue line SLAM’s estimated pose. At the bottom right corner of the trajectory, when the drone rotates, SLAM is completely thrown off.

Drift Estimation and Re-calibration. dart uses return-to-origin to re-calibrate SLAM drift. The second, longitudinal, flight starts a new SLAM session; to “stitch” the two sessions together, dart needs to compute a transformation matrix that transforms the coordinate system of the first session to that of the second, for which it uses standard techniques. More importantly, dart designs the longitudinal flight to start close to the origin, which has two benefits: (a) a shorter flight time, resulting in lower overall energy consumption, and (b) less drift accumulation.

Mid-flight re-calibration for accuracy and flight efficiency. Return-to-origin re-calibration might also be necessary in the middle of one of the flights (Fig. 5(d)), if the environment is sparse and SLAM tracking fails. To combat this, dart could have added more loop closure maneuvers in the lateral and longitudinal flights. However, returning to origin is an expensive operation in terms of the drone’s battery. Instead, dart actively monitors drift-error and returns to the origin only when needed. In that case, the flight resumes at the point at which it detected excessive drift: the direct path from the origin to that point is always shorter than the initial segment, ensuring that the resumed flight starts with a lower drift.

Using deviation from GPS trajectory to detect drift. However, detecting excessive drift is non-trivial: dart has no way of knowing when SLAM’s position estimates are wrong, because it does not have accurate ground truth. Instead, dart leverages the GPS readings associated with SLAM poses: each sequence of readings gives a GPS trajectory, and dart attempts to find the best possible match between the GPS trajectory (e.g., the green line in Fig. 10) and the SLAM-generated trajectory (the blue line in Fig. 10). If there is a significant deviation between the two, dart assumes SLAM has drifted and invokes re-calibration.

This approach is robust to GPS errors, since it matches the shape of the two trajectories, not their precise positions (Algorithm 1). Specifically, dart continuously executes 3D SLAM on the stream of compressed LiDAR frames from the drone, and estimates the pose of each frame. It synchronizes the GPS timestamps with the LiDAR timestamps (line 1), then transforms GPS readings using the Mercator projection (line 2). It then aligns the GPS trajectory and the SLAM-generated trajectory using the Umeyama algorithm [umeyama] to determine the rigid transformation matrices (i.e., translation and rotation) that best align the SLAM and GPS poses (lines 3-4). dart partitions the trajectories into fixed-length segments and, after alignment, computes the RMSE between the two trajectories in each segment, using these RMSE values as an indicator of excessive drift: if the RMSE is greater than a threshold (lines 5-12), dart invokes return-to-origin.

Figure 7: Rotation throws off SLAM
Figure 8: To reconstruct other structures (e.g., a blimp), dart wraps them in a rectangular solid and plans a model collection trajectory for it.
Figure 9: Coverage and scan widths for different orientations and heights.
Figure 10: dart’s building detector on a real building.
Input: SLAM poses P and GPS tags G
Output: imperfect regions R
1:  (P, G) ← TimeSynchronization(P, G)
2:  G ← GPSToMercator(G)
3:  T ← UmeyamaAlignment(P, G)
4:  G ← TransformTrajectory(G, T)
5:  foreach segment p in P, g in G do
6:      e ← RMSE(p, g)
7:      if IsExcessive(e) then
8:          R.Append(segment)
9:      else
10:         pass
11:     end if
12: end foreach
Algorithm 1: Detecting Excessive Drift
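The core of Algorithm 1 can be sketched as follows. Timestamp synchronization and the Mercator projection are omitted (positions are assumed to already be in meters), function names are illustrative, and the rigid (no-scale) Umeyama solution is used:

```python
import numpy as np

def umeyama_rigid(src, dst):
    """Umeyama/Kabsch rigid alignment: rotation R and translation t
    minimizing ||R @ src_i + t - dst_i|| over paired trajectory points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def segment_rmse(slam_traj, gps_traj, seg_len):
    """Align the GPS track onto the SLAM trajectory (shape match, so
    constant GPS offset and frame rotation cancel), then report the
    per-segment RMSE as (segment_start, rmse) pairs."""
    R, t = umeyama_rigid(gps_traj, slam_traj)
    gps_aligned = gps_traj @ R.T + t
    out = []
    for start in range(0, len(slam_traj), seg_len):
        d = slam_traj[start:start + seg_len] - gps_aligned[start:start + seg_len]
        out.append((start, float(np.sqrt((d ** 2).sum(axis=1).mean()))))
    return out

# Straight flight; GPS reports it in a rotated frame; SLAM drifts
# laterally near the end. The drifting segment has the largest RMSE.
n = 100
truth = np.stack([np.arange(n, dtype=float),
                  np.zeros(n), np.zeros(n)], axis=1)
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
gps = truth @ Rz.T                    # GPS track in its own frame
slam = truth.copy()
slam[80:, 1] += 0.1 * np.arange(20)   # growing drift in the last segment
errors = segment_rmse(slam, gps, seg_len=20)
```

Thresholding the per-segment RMSE (the `IsExcessive` test of Algorithm 1) then flags the drifting segments while tolerating a constant GPS bias, because only the trajectory shapes are compared.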

Generalizing to other structures. Some aspects of model collection trajectory design depend on the geometry of the structure whose model we seek to reconstruct. To simplify the discussion, and because geometric manipulations are not the focus of this paper, we have chosen rectangular buildings. We demonstrate in §3 that our approach generalizes to other regular building geometries. It also generalizes to other structures that can be tightly bounded within rectangular solids, such as aircraft fuselages or blimps (Fig. 10, §3). To accommodate arbitrary solid geometries, we expect that our techniques for generating equi-dense trajectories and in-flight re-calibration, and our conclusions about orientation, height, and speed, will apply, but the actual trajectory design (Fig. 6) will need to match the shape of the solid. We leave these extensions to future work.

2.2 Estimating Building Geometry

Given a region of interest, dart conducts a reconnaissance (“recon”) flight to determine the boundary of the building’s roof. It uses this boundary for trajectory design (§2.1).

Goal and requirements. Accurate model reconstruction requires a low, slow flight, which can be battery-intensive. The recon flight lets dart scope model collection to the part of the region of interest that contains the building, reducing battery usage: on a large campus where buildings occupy only a small fraction of the surface area, a fast recon flight can substantially reduce overall drone battery consumption. We therefore require the recon flight to be as short as possible. In addition, (a) boundary estimation must not assume a pre-existing 3D model of the building (prior work in the area makes this assumption [lidarbuildingdetection, lidarimagebuildingdetection]); (b) boundary estimation must be robust to nearby objects, like trees, that can introduce error; and (c) buildings come in many shapes (e.g., rectangles, squares, hexagons), so boundary estimation must generalize to these.

The Recon Trajectory. Like model collection, recon uses a rectilinear scan (Fig. 5(a), Fig. A.3). Unlike model collection, however, during recon the drone flies fast (4 m/s) and high (60 m above the building's roof; we assume nominal building heights in an area are known, for example, from zoning restrictions), with the LiDAR mounted in a perpendicular orientation, to make the flight as short as possible. We justify these parameter choices in §3, but Fig. 9 depicts the intuition behind them: it shows, for an Ouster OS1-64 LiDAR, ground coverage as a function of height. Coverage is highest between 40 and 60 m. At a given height, a perpendicular orientation covers a wider swathe of ground than a parallel one; this lets dart use a larger scan width (Fig. 9), resulting in a shorter flight. As with model collection, during this flight dart streams point clouds to its cloud component, which runs the boundary detection algorithms described below.
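The coverage intuition reduces to flat-ground trigonometry: the across-track swath a downward-looking LiDAR covers grows linearly with height. A minimal sketch (33.2° is the OS1-64's published vertical field of view; which field of view actually lies across-track depends on the mounting orientation, so treat the value as illustrative):

```python
import math

def swath_width(height_m, fov_deg):
    """Across-track ground coverage of a downward-looking sensor with the
    given across-track field of view, assuming flat ground."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)

# Swath grows linearly with height (while point density falls), which is why
# a high recon flight can use a wider scan width and finish sooner.
widths = {h: swath_width(h, 33.2) for h in (20, 40, 60)}
```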

Challenges and Overview. This flight design poses two challenges for boundary detection. First, to detect the building’s boundary, it is still necessary to align all point clouds to the same coordinate frame of reference. In recon, dart cannot use SLAM because fast, high flights can cause SLAM to lose tracking frequently. We show below that, because boundary detection can afford to be approximate, we can use GPS. Second, high and fast flights result in low point density, and boundary detection algorithms must be robust to this.

dart’s building boundary detection takes as input the area of interest and outputs the GPS locations that constitute the boundary of the building. Model collection uses these outputs (§2.1). Boundary detection runs two different algorithms: rooftop surface extraction, followed by boundary estimation.

Step 1. Surface Extraction. The cloud component receives GPS-tagged compressed point clouds from the drone. It first uncompresses them, then computes the surface normal of every point in the point cloud; a point's surface normal is the direction perpendicular to the surface formed by points within a fixed radius of it. dart then uses RANSAC [ransac], a plane-fitting algorithm, to segment the LiDAR points into groups that fall onto planes. RANSAC is fast, but has two limitations: (a) it combines into one surface all LiDAR points that satisfy the same planar equation, including disjoint sets of points (e.g., from trees) at the same height; and (b) point clouds can contain multiple planes (e.g., the building rooftop, the ground surface, vehicles), and RANSAC cannot distinguish between them.
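A minimal numpy sketch of RANSAC plane segmentation as described above; the iteration count and inlier threshold are illustrative, not dart's tuned values:

```python
import numpy as np

def ransac_plane(points, n_iters=200, thresh=0.15, rng=None):
    """Fit one dominant plane to (N, 3) points.

    Returns ((n, d), inlier_mask) where n is the unit normal and d the offset,
    so inliers satisfy |n . p + d| < thresh (thresh in meters, illustrative).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_model, best_inliers = None, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample, skip
            continue
        n /= norm
        d = -n @ p0
        inliers = np.abs(points @ n + d) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (n, d), inliers
    return best_model, best_inliers
```

Run on a cloud containing both a rooftop and the ground, this returns the larger (rooftop) plane; the limitations noted above (disjoint coplanar points, multiple planes) motivate the filtering steps that follow.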

To address the first issue, in each plane dart removes outlying points that are far from their neighbors in the same plane, using a statistical outlier filter. Running this filter on every point cloud can be compute-intensive, so dart tunes the filter's parameters to find a sweet spot between filtering accuracy and performance. To find the rooftop among the multiple detected planes, dart uses surface normals to compute per-plane surface statistics (e.g., plane height, 3D centroid, and normal variance), and uses these statistics to identify the rooftop among the extracted planes. (Surface normal computation is computationally intensive but parallelizable, so we accelerate it on a GPU, as discussed in §3.) Intuitively, the rooftop is a large, uniformly oriented surface (its surface-normal variance is low) that lies above the ground plane; dart eliminates the ground plane as the plane whose points are consistent with the drone's height. It then discards all planes that do not satisfy this definition (including planes with high normal variance and the ground surface). At the end of this step, dart classifies a single plane as the roof surface.
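The statistical outlier filter can be sketched with a KD-tree: a point is dropped if its mean distance to its k nearest neighbors is anomalously large relative to the cloud as a whole. The parameters `k` and `std_ratio` stand in for the tuned values mentioned above:

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_filter(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors exceeds
    the global mean by more than std_ratio standard deviations.

    k and std_ratio are illustrative, not dart's tuned parameters.
    """
    tree = cKDTree(points)
    # query returns each point itself at distance 0, so ask for k+1 neighbors
    dists, _ = tree.query(points, k=k + 1)
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```

Increasing k makes the filter more robust but costlier per point cloud, which is exactly the accuracy/performance trade-off dart tunes.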

To remove false positives, dart uses majority voting: it classifies a plane as the rooftop only if it detects it in multiple consecutive frames. Lastly, although the outlier filter removes small sets of outliers within planes, it cannot remove large clusters of points belonging to objects like trees near the building. For these, dart clusters points by spatial proximity, so that neighboring points fall into the same cluster and points from different objects form separate clusters. Since the roof is normally the largest surface, dart simply discards the smaller clusters. To do this in near real-time, dart tunes the clustering algorithm's parameters.
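The proximity-based clustering step can be sketched as a breadth-first flood fill over a KD-tree, discarding clusters below a size threshold; `radius` and `min_size` are illustrative, not dart's tuned parameters:

```python
from collections import deque

import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, radius=1.5, min_size=100):
    """Group points whose neighbors lie within `radius` (BFS over a KD-tree),
    then keep only clusters with at least min_size points."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1)
    n_clusters = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in tree.query_ball_point(points[i], radius):
                if labels[j] == -1:
                    labels[j] = n_clusters
                    queue.append(j)
        n_clusters += 1
    sizes = np.bincount(labels)
    return [points[labels == c] for c in range(n_clusters) if sizes[c] >= min_size]
```

A dense rooftop patch survives as one large cluster, while a small, spatially separate clump of tree returns falls below `min_size` and is discarded.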

Step 2. Estimating the boundary of the building. The previous step gives dart parts of the rooftop from each point cloud. In this step, it uses the drone's GPS location to transform each surface into a common coordinate frame of reference, then combines all surfaces into a single point cloud representing the extracted rooftop. To extract the building's boundary, it computes the alpha shape [alphashape] of the stitched point cloud. A generalization of the convex hull, an alpha shape is a sequence of piecewise-linear curves in 2D enclosing the point cloud; this lets dart generalize to non-convex shapes as well. Finally, to detect the boundaries of multiple buildings, dart clusters the rooftop point clouds.
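A standard Delaunay-based construction of the alpha shape (a sketch, not necessarily dart's exact implementation): keep triangles whose circumradius is below 1/α; edges used by exactly one kept triangle trace the, possibly non-convex, boundary.

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(pts2d, alpha):
    """Boundary edges (as vertex-index pairs) of the alpha shape of 2-D points."""
    tri = Delaunay(pts2d)
    boundary = set()
    for ia, ib, ic in tri.simplices:
        a, b, c = pts2d[ia], pts2d[ib], pts2d[ic]
        la, lb, lc = (np.linalg.norm(b - c), np.linalg.norm(a - c),
                      np.linalg.norm(a - b))
        s = (la + lb + lc) / 2.0
        area = max(s * (s - la) * (s - lb) * (s - lc), 1e-12) ** 0.5  # Heron
        if la * lb * lc / (4.0 * area) < 1.0 / alpha:  # circumradius test
            for e in ((ia, ib), (ib, ic), (ia, ic)):
                e = tuple(sorted(e))
                # interior edges appear in two kept triangles and cancel;
                # boundary edges appear once and survive
                boundary.symmetric_difference_update([e])
    return boundary
```

With α → 0 every triangle is kept and the result degenerates to the convex hull boundary; a finite α carves out concavities, which is what lets the boundary follow non-convex rooftops.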

Fig. 10 shows results from the building boundary detection algorithm on real data taken from our drone. The green rectangle is the ground truth boundary of the building. The blue points illustrate the drone’s recon trajectory and the grey points depict locations where dart detects a rooftop. Some grey points are beyond the building’s boundary because the LiDAR has a wide field of view and can see the rooftop even after it has passed it. The red points show the GPS stitched 3D point cloud of the building’s rooftop.

2.3 Point-Cloud Compression

LiDARs generate voluminous 3D data. For instance, the Ouster OS1-64 LiDAR (which we use in this paper) generates 20 point clouds per second that add up to 480 Mbps, well beyond the capabilities of even future cellular standards. dart compresses these point clouds to a few Mbps (1.2 to 4.0) using two techniques: viewpoint filtering and octree compression. We describe these in §A.2.
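To illustrate the effect of octree-style quantization (viewpoint filtering and the encoding of occupied cells, both described in §A.2, are omitted), the sketch below snaps points to a voxel grid of a given resolution and deduplicates them; the resolution value is illustrative:

```python
import numpy as np

def voxel_compress(points, resolution=0.1):
    """Lossy spatial compression sketch: snap (N, 3) points to a voxel grid of
    the given resolution (meters) and deduplicate, mimicking the effect of
    octree quantization (a real codec would also entropy-code the cells)."""
    keys = np.floor(points / resolution).astype(np.int64)
    unique_keys = np.unique(keys, axis=0)
    # decode each occupied voxel to its center
    return (unique_keys + 0.5) * resolution
```

Coarser resolutions merge more points per voxel, trading reconstruction error (bounded by half the voxel diagonal) for bandwidth, which is the knob behind the low/medium/high compression levels evaluated in §3.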

3 Evaluation

We evaluate (a) dart's ability to reconstruct 3D models in near real-time and (b) the accuracy of these 3D models (§2.1). We also describe our parameter sensitivity analyses for data collection and evaluate boundary detection performance.

Implementation. Not counting external libraries and packages, dart is 15,500 lines of code (discussion in §A.3).

Simulations. We evaluate dart using a photorealistic simulator, AirSim [airsim], which models realistic physical environments using a game engine, then simulates drone flights over these environments and records sensor readings taken from the perspective of the drone. AirSim is widely used as a simulation platform for autonomous vehicles and drones by academia and industry. AirSim has a parametrizable LiDAR model; we used the parameters of the Ouster OS1-64 in our simulation experiments. dart generates trajectories for the AirSim drone, records the data generated by the LiDAR, and processes it to obtain the 3D model. For computing the metrics above, we obtain ground truth from AirSim: to build the ground-truth 3D model, we flew a LiDAR-equipped drone several times over the region of interest in AirSim (using exhaustive flights) and stitched the resulting point clouds using ground-truth positioning information from AirSim.

Real-world Traces. In addition, we collected data from nearly 30 flights (each about 25 minutes) of an M600Pro drone with an Ouster OS1-64 LiDAR over a commercial complex. For almost all experiments, we evaluated dart on both real-world and simulation-driven traces. Simulation-driven traces give us the flexibility to explore the parameter space more fully (as we show below); we use real-world traces to validate these parameter choices and estimate reconstruction accuracy in practice.

Metrics. In this section, we quantify end-to-end latency, 3D model accuracy and completeness (§2.1), and positioning error for a number of experimental scenarios. We also quantify dart’s energy-efficiency (using flight duration as a proxy for drone battery usage) and the computational capabilities of its processing pipeline.

Scheme      | Accuracy (m) | Comp. (m) | Time (s)
Large building (100 m x 50 m x 20 m)
Offline-SDF | ∞            | ∞         | –
Offline-TP  | 0.87         | 0.35      | 3900
dart        | 0.21         | 0.24      | 719
dart-raw    | 0.14         | 0.17      | 719
Small building (50 m x 50 m x 20 m)
Offline-SDF | 3.36         | 1.30      | 2400
Offline-TP  | 0.62         | 0.43      | 3300
dart        | 0.25         | 0.14      | 656
dart-raw    | 0.21         | 0.09      | 656
Table 2: Reconstruction accuracy, completeness (comp.), and time for two large buildings using four schemes: a) offline reconstruction with shortest duration flight (Offline-SDF), b) offline reconstruction with dart trajectory planning (Offline-TP), c) dart, and d) dart with raw traces (dart-raw).
Approach      | Bandwidth (Mbps) | Accuracy (m) | Comp. (m)
Offline SLAM  | 480              | 2.30         | 1.30
Online w/ GPS | 3.80             | 1.60         | 0.53
dart          | 3.80             | 0.13         | 0.09
Table 3: Reconstruction quality of a real-world 70 x 40 x 20 m building with online and offline approaches, relative to an uncompressed trace.
Structure   | Duration (s) | Accuracy (m) | Comp. (m)
Blimp       | 586          | 0.23         | 0.03
Small rect. | 656          | 0.25         | 0.14
Star-shaped | 686          | 0.37         | 0.12
Large rect. | 719          | 0.21         | 0.24
Plus-shaped | 1024         | 0.31         | 0.06
H-shaped    | 1044         | 0.34         | 0.10
Pentagonal  | 1361         | 0.31         | 0.12
Table 4: dart 3D reconstruction times (recon and model collection) and quality for different structures at low compression.

3.1 3D Model Reconstruction

Experiment setup. To evaluate dart's end-to-end performance in building an accurate 3D model in near real-time, we collected and reconstructed the 3D models of two buildings in AirSim: a) a large 100 m x 50 m x 20 m (L x W x H) building and b) a small 50 m x 50 m x 20 m building. We then compared the reconstruction accuracy of these models against two offline baselines (i.e., approaches that reconstruct the 3D model after the drone lands): a) offline reconstruction with the shortest-duration flight (Offline-SDF), and b) offline reconstruction with dart's trajectory planning (Offline-TP). (We assume both offline approaches know the exact location of the building; without this, SLAM accumulates significant drift and reconstructions are very poor.) We calculated the accuracy and completeness of the models generated by these approaches by comparing them against ground-truth models from AirSim; lower is better for both metrics. For these experiments, dart uses compressed point clouds whose bandwidth requirements are compatible with today's LTE speeds (i.e., an upload bandwidth of 3.8 Mbps). dart-raw shows accuracy and completeness if dart were to use raw point clouds; we study the effect of compression on dart's model reconstruction further in §3.3.

dart builds significantly more accurate models in less time. As Table 2 shows, dart achieves better than 25 cm accuracy and completeness for both buildings and reconstructs each building in just 10-12 minutes (the flight duration). dart's reconstruction quality is much better than that of the two baselines for two reasons: a) careful trajectory planning (TP) and b) in-flight re-calibration. Since Offline-SDF does neither, its accuracy and completeness values are very large. To reconstruct the larger (100 m x 50 m x 20 m) building, the drone must fly longer, so it accumulates significantly more drift than for the smaller building and has poor accuracy and completeness (shown by ∞). Offline-TP does better because it uses dart's trajectory planning, but still exhibits worse accuracy and completeness than dart because it lacks in-flight re-calibration. This shows the importance of a real-time quality feedback signal for reconstruction and highlights why offline reconstruction is inaccurate even with uncompressed traces. Though dart uses compressed point clouds, with in-flight re-calibration and trajectory planning its models are up to 3.5x more accurate and complete. If dart were to use raw traces (dart-raw) instead, the remaining loss of accuracy and completeness is attributable to SLAM; relative to a raw trace, compression accounts for a 4-7 cm difference in accuracy and completeness. Moreover, dart reconstructs while the drone is in flight, whereas the two baselines reconstruct offline on uncompressed point clouds, incurring up to 4.6x higher reconstruction time (in practice, offline reconstruction would take even longer because we did not count the time to upload data to the cloud).

To get a visual feel for the degradation resulting from lower accuracy and completeness, consider Fig. A.6, which shows the ground-truth model, together with the dart reconstructions. With an accuracy of 0.25 m (using 3.8 Mbps upload bandwidth), the model closely matches the ground truth, but the textured building surface on the right shows some small artifacts. These artifacts arise not because of compression, but because of SLAM imperfections (§3.3).

dart generalizes to different building shapes. Our results so far, and the descriptions in §2, have focused on rectangular buildings; dart can accurately reconstruct a variety of building types, as Table 4 shows. For these results, we use dart's default flight parameters and low compression. Larger buildings (pentagonal, plus-shaped, and H-shaped) have longer flight durations, partly because of their size and partly because they require two re-calibration steps. Even then, dart achieves tens-of-centimeter accuracy and completeness for all buildings.

dart generalizes to other types of structures. To show that dart can reconstruct other kinds of 3D structures (e.g., airplanes, helicopters), we modeled a real-world blimp [blimp] (15 m x 60 m x 15 m) in AirSim. dart encloses such structures within a rectangular solid (Fig. 8, §3.4). In less than 10 minutes (Table 4), dart reconstructed the blimp with an accuracy of 23 cm and a completeness of just 3 cm.

High accuracy is possible on real-world traces. Results from our drone flights validate that real-world data yields comparable performance (Table 3). In these experiments, we reconstructed the 3D model of a real-world 70 m x 40 m x 20 m building. Because we lack a reference ground truth for real-world data, we use the 3D model generated from raw, uncompressed traces. Offline reconstruction using SLAM after the drone lands fails for the reasons mentioned above (no trajectory planning and no re-calibration). With GPS, in-flight reconstruction is possible, but its accuracy and completeness (1.60 m and 0.53 m) make such 3D models unusable. With dart, on the other hand, we can build accurate and complete 3D models, with accuracy and completeness of 13 cm and 9 cm respectively (top-down view of the 3D model in Fig. A.2).

Compression scheme          | Low  | Medium | High
Upload bandwidth (Mbps)     | 3.80 | 2.50   | 1.27
Compression time (ms)       | 62.9 | 65.3   | 65.3
Network latency (ms)        | 15.5 | 14.6   | 13.7
Extraction time (ms)        | 5.05 | 3.07   | 3.03
SLAM time (ms)              | 34.5 | 30.8   | 23.9
Total processing time (ms)  | 117  | 113    | 106
Table 5: dart enables real-time 3D reconstruction over LTE. Each row shows per-frame latency for that operation.

3.2 Performance

Real-time 3D reconstruction over LTE is feasible. To validate that dart can collect a 3D model end-to-end in near real-time, we used our implementation to replay 15 minutes of real-world data on the drone's compute platform (an NVIDIA Jetson TX2), which compressed and streamed point clouds over an LTE connection to a 16-core AWS VM with 64 GB RAM and an NVIDIA T4 GPU. (This experiment ran only model collection; recon also runs in real time, as discussed below.)

To compress point clouds, we used three levels of compression (low, medium, and high), corresponding to three combinations of octree resolution and point resolution (§2.3); §3.3 studies the effect of compression. In experiments with our drone, we have found achievable LTE throughput to range from 1-4 Mbps, and we chose these compression levels to correspond to this range. (In §3.3, we discuss how 5G deployments would alter these conclusions.)

At all three compression levels, dart was able to stream point clouds in real time (Table 5); the total end-to-end processing time per frame is about 110 ms, of which nearly 65 ms is on-drone compression. Thus, dart builds the 3D model while the drone is in flight, adds each frame within about 100 ms of receiving it, and can make a complete 3D model of a building available about 100 ms after receiving the last frame.

dart component   | Sub-component          | Execution time (ms)
Recon phase      | 3D frame compression   | 13.0
                 | 3D frame extraction    | 3.0
                 | GPU normal estimation  | 76.0
                 | RANSAC plane-fitting   | 5.0
                 | Outlier removal        | 0.2
                 | Rooftop detection      | 0.005
                 | Rooftop extraction     | 6.0
                 | Rooftop stitching      | 3.0
                 | Total time             | 100
Model collection | LiDAR SLAM             | 37.0
                 | 3D Reconstruction      | 10.3
Table 6: Processing times for dart components.

dart supports full frame rate processing. We profiled the execution time of each dart component on a 15-minute real-world trace. Point cloud compression executes on the drone; the other components run on the AWS VM mentioned above. We use the GPU to offload surface normal computation for building detection. During recon, point cloud compression takes 13 ms per frame (Table 6). Extracting the building geometry requires 100 ms per frame; at these numbers, we can sustain about 10 fps, so with a 20 fps LiDAR we process roughly every other frame. Despite this, our building detector is quite accurate (§3.5). During model collection, SLAM requires 37 ms per frame and 3D reconstruction about 10 ms (Table 6). The former uses 8 cores, so we run these two components as a pipeline to sustain 20 fps. Thus, a moderately provisioned cloud VM suffices to run dart at full frame rate, with an end-to-end compute latency of about 100 ms for reconnaissance and 50 ms for model collection.

3.3 Ablation Studies

In this section, we explore how dart’s techniques contribute to 3D reconstruction quality and performance.

Structure type | Flight dur. (s) w/o | w    | Accuracy (m) w/o | w    | Comp. (m) w/o | w
Star-shaped    | 613                 | 686  | 1.05             | 0.37 | 0.39          | 0.12
Small rect.    | 573                 | 656  | 0.63             | 0.25 | 0.40          | 0.14
Large rect.    | 694                 | 719  | 0.96             | 0.21 | 0.39          | 0.24
Plus-shaped    | 766                 | 1024 | 0.51             | 0.31 | 0.08          | 0.06
H-shaped       | 866                 | 1044 | 1.10             | 0.34 | 0.27          | 0.10
Pentagonal     | 1062                | 1361 | 1.47             | 0.31 | 0.42          | 0.12
Table 7: Flight duration (dur.) and reconstruction quality for buildings at low compression with (w) and without (w/o) re-calibration.
                     |      |      |      | Exhaustive @ 1 m/s
Velocity (m/s)       | 0.5  | 1.0  | 2.0  | 1.0
Height (m)           | 30   | 40   | 50   | 40
Flight duration (s):
  Recon              | 136  | 136  | 136  | –
  Model coll.        | 673  | 343  | 72   | 1520
  Re-calib.          | 476  | 240  | 272  | –
  Total time         | 1285 | 719  | 480  | 1520
Accuracy (m)         | 0.43 | 0.14 | 0.91 | –
Completeness (m)     | 0.34 | 0.17 | 0.45 | –
Table 8: dart's flight duration for various parameter choices.

Re-calibration helps reduce error. To show the effect of in-flight re-calibration, we built online 3D models of the 7 large buildings mentioned above using dart with (w) and without (w/o) re-calibration in AirSim. In these experiments, we evaluate flight duration and reconstruction quality at low compression (3.8 Mbps upload bandwidth) using the accuracy and completeness metrics. Table 7 shows that, at the expense of only 18% (150 seconds) longer flights on average, re-calibration improves accuracy by 65% (65 cm) and completeness by 55% (20 cm). Larger buildings (plus-shaped, H-shaped, and pentagonal) require longer aerial flights, which accumulate more drift; this results in relatively more re-calibration flights and hence longer flight durations. Even so, dart reconstructs these buildings accurately, demonstrating the importance of re-calibration.

Short flight durations can produce accurate models. dart strives to reduce drone battery depletion by generating short-duration flights without sacrificing accuracy and completeness. To show that dart's defaults of 1 m/s speed and 40 m height represent the best point in this tradeoff space, we compare them to a lower, slower flight (30 m, 0.5 m/s) and a faster, higher flight (50 m, 2 m/s). Table 8 shows that, on the large building, the lower, slower flight has a longer trajectory, resulting in more re-calibrations; the resulting model has worse accuracy and completeness, since re-calibration can limit drift error but not reverse it. A faster, higher flight has a slightly shorter trajectory, but the resulting model's accuracy is very poor because there is less overlap between point clouds at higher speeds (§2.1). Finally, Table 8 also shows the benefit of a recon flight: an exhaustive flight that uses the model collection parameters and does not perform recon is 3x longer than dart's flight (and accumulates significant drift, resulting in poor-quality 3D models). Results on the small building are qualitatively similar (omitted for brevity).

dart builds accurate models at low bandwidths. We explore the impact of compression on accuracy and completeness using (a) a synthetic building in AirSim and (b) real-world traces. In addition to the three compression schemes discussed earlier, we compute accuracy and completeness for (a) raw point clouds, (b) viewpoint filtering, and (c) lossless compression. The first two alternatives serve as calibration points, while the third explores reconstruction performance under the higher bandwidths that would be available, for example, in 5G deployments.

As Table 9 shows, viewpoint filtering achieves a 10x compression throughout, and low compression is an order of magnitude more compact beyond that. Despite this, dart achieves high-quality reconstruction. For the AirSim building, consider accuracy: the raw point cloud has an accuracy of 0.21 m and completeness of 0.09 m, attributable entirely to SLAM error. Viewpoint filtering does not degrade accuracy since it only omits zero returns. Low compression, with a bandwidth of 3.8 Mbps (easily achievable over LTE and over 100x more compact than the raw LiDAR output), adds only 4 cm and 5 cm to accuracy and completeness, respectively. Medium and high compression have significantly poorer accuracy and completeness. Similar results hold for the other AirSim building, which we omit for brevity.

Results from our drone flights validate that real-world data of a large building (dimensions in Table 9) can result in comparable performance (Table 9). Since we lack a reference ground truth for real-world data, we use the 3D model generated from raw traces. With real-world traces, we can build accurate, and complete 3D models that are within 9-13 cm completeness and accuracy for low compression, and about 16-23 cm for medium compression, with respect to the uncompressed traces. This suggests that highly compressed point clouds do not significantly impact accuracy and completeness.

Scheme    | Bandwidth (Mbps) | Accuracy (m) | Comp. (m)
Real-world 70 m x 40 m x 20 m large building
Raw       | 480.0            | 0.00         | 0.00
Viewpoint | 42.7             | 0.00         | 0.00
Lossless  | 7.86             | 0.06         | 0.07
Low       | 3.80             | 0.13         | 0.09
Medium    | 2.50             | 0.23         | 0.16
High      | 1.27             | 0.28         | 0.29
AirSim 50 m x 50 m x 20 m small building
Raw       | 480.0            | 0.21         | 0.09
Viewpoint | 42.7             | 0.21         | 0.09
Lossless  | 7.86             | 0.22         | 0.10
Low       | 3.80             | 0.25         | 0.14
Medium    | 2.50             | 0.66         | 0.21
High      | 1.27             | 0.73         | 0.24
Table 9: The impact of compression on accuracy/completeness.

Higher bandwidths provide centimeter-level improvements. The emergence of 5G promises larger upload bandwidths. However, as Table 9 illustrates, the room for improvement in accuracy and completeness is small. For the AirSim building, the gap between raw point clouds and low compression in accuracy (completeness) is only 4 cm (5 cm); for the real-world building, it is 7 cm (2 cm). Lossless point cloud compression, which requires 7.86 Mbps of bandwidth, comes within 1 cm of the raw point cloud accuracy and completeness for the AirSim building and within 7 cm for the real-world building.

Lower target density worsens completeness. To demonstrate that users can use the target-density tuning knob to obtain less complete models more quickly, we ran dart (with re-calibration) at two different densities: 7.5 points per m² and 1 point per m². For the former, accuracy and completeness were 0.21 m and 0.14 m; for the latter, 0.68 m and 0.17 m. The lower-density flight took 20% less time. As expected, completeness is worse at lower target densities. Accuracy is also worse at the lower density because adjacent scan lines overlap less; put another way, a side benefit of specifying a higher density is the higher accuracy that comes from scan-line overlap.

3.4 Data Collection

dart relies on a careful parameter sensitivity analysis (in both simulation and on real-world traces) to determine model collection flight parameters: speed, height, and orientation (§2.1). We have evaluated SLAM error for every combination of drone speed (ranging from 0.5 m/s to 3 m/s), distance from building (10 m to 40 m) and orientation (parallel to perpendicular). We present a subset of these results for space reasons. For these experiments, we use the trajectory described in Fig. 5(c). We report the average numbers for each experiment.

Figure 11: SLAM errors for LiDAR orientations.
Figure 12: SLAM error for different speeds and building distances.

Best choice of orientation is parallel. Fig. 11 plots SLAM error as a function of LiDAR orientation (Fig. 5) with respect to the direction of motion. A parallel orientation has the lowest SLAM error (in Fig. 11, yaw 0 corresponds to parallel and yaw 90 to perpendicular) because it has the highest overlap between successive frames; as yaw increases, overlap decreases, resulting in higher SLAM error (§2.1).

Best choice of distance is 20 m. Fig. 12 plots SLAM error as a function of the drone's distance from the building surface for the parallel LiDAR orientation. Error increases with distance; beyond 20 m from the building, it exceeds 1 m. Point densities decrease with height, which hurts SLAM's ability to track features/points across frames (§2.1). Rather than fly lower, dart operates at a 20 m distance (or a 40 m height, since the buildings in our experiments are 20 m tall) to reduce flight duration.

Best choice of speed is 1 m/s. Speed impacts SLAM positioning error significantly (Fig. 12). Beyond 1 m/s, SLAM cannot track frames accurately because of lower overlap between frames (§2.1). Below 1 m/s (i.e., at 0.5 m/s), the flight duration is twice that at 1 m/s, which results in more accumulated drift error. To achieve accurate reconstruction, dart flies the drone at 1 m/s.

Real-world traces confirm these observations. Real-world traces (§A.5) validate these parameter choices (Table A.3, Table A.4): fly slowly, close to the building, and in a parallel orientation.

3.5 Boundary Detection

Methodology and metrics. We use two metrics for building boundary estimation: accuracy and completeness. Accuracy is the average (2D) distance between each point (quantized to 0.1 m) on the predicted boundary and the nearest point on the actual building boundary. Completeness is the average distance between each point on the actual boundary and the nearest point on dart's predicted boundary. Lower values of both are better.
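These two metrics are symmetric nearest-neighbor distances and can be computed directly with a KD-tree; a minimal sketch (the 0.1 m quantization step is omitted):

```python
import numpy as np
from scipy.spatial import cKDTree

def accuracy_completeness(predicted, ground_truth):
    """Accuracy: mean distance from each predicted point to its nearest
    ground-truth point. Completeness: mean distance from each ground-truth
    point to its nearest predicted point. Lower is better for both."""
    acc = cKDTree(ground_truth).query(predicted)[0].mean()
    comp = cKDTree(predicted).query(ground_truth)[0].mean()
    return acc, comp
```

The asymmetry matters: a predicted boundary that covers only half the building can still score a perfect accuracy, which is exactly what the completeness term penalizes.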

We use both real-world traces collected from our dart prototype and synthetic traces from AirSim. To compute ground truth for real-world traces, we pin-pointed the building’s boundary on Google Maps [google_maps]. For AirSim, we collected the ground truth from the Unreal engine.

Boundary detection can run at full frame rate. Table 6 shows the time taken for each component of boundary detection, on our real-world traces on a single core of a GPU-equipped desktop. The average processing time per point cloud is 100 ms, dominated by GPU-accelerated surface normal estimation (76 ms). This can sustain 10 fps. However, our LiDAR generates 20 fps, so dart uses every other frame, without sacrificing accuracy.

Boundary detection is accurate. To evaluate the accuracy of dart's boundary extraction, we experimented with 3 real-world traces collected over a 70 m x 60 m x 20 m building. For these traces, dart's average accuracy is 1.42 m and its completeness is 1.25 m, even at the highest compression and when it samples every other frame.

Other results. We extensively evaluated dart's boundary detection algorithm's robustness to different building shapes (Table A.1), point cloud compression (Table A.2), and point cloud sub-sampling. We also performed an extensive parameter study to find the right flight parameters, i.e., speed (Fig. A.5), height (Fig. A.4), and orientation; for brevity, these results and discussions appear in the appendix (§A.4). We summarize two results here. First, recon flights can be short because boundary detection is insensitive to point density and overlap, so recon can use a perpendicular orientation and fly 60 m from the building at 4 m/s. Second, boundary detection tolerates sub-sampling of up to one point cloud per second.

4 Related Work

Networked 3D sensing and drone positioning. Some recent work has explored, in the context of cooperative perception [AVR] and real-time 3D map updates [CarMap], transmitting 3D sensor information over wireless networks; compared to dart, it uses different techniques to overcome wireless capacity constraints. The robotics literature has studied efficient coverage path planning for single [sensorplanning] and multiple drones [ubanc]; dart's trajectory design is shaped by more intricate constraints, like SLAM accuracy and equi-density goals. Accurately inferring drone motion is important for SLAM-based positioning [observability]: Cartographer [Cartographer], which dart uses for positioning, utilizes motion models and on-board IMUs to estimate motion. In future work, dart could use drone orchestration systems [beecluster] for larger, campus-scale reconstruction with multiple drones.

Offline reconstruction using images. UAV photogrammetry [federman2017] reconstructs 3D models offline from 2D photographs. Several pieces of work [7139681, 7989530, 8124461, 8628990] study the use of cameras (either RGB or RGB-D) on UAVs for 3D reconstruction. Prior work [7139681] has proposed a real-time, interactive interface into the reconstruction process for a human guide. The most relevant of these [mostegel2016uav, 7422384] predicts the completeness of 3D reconstruction in-flight, using a quality confidence predictor trained offline, to improve a subsequent offline 3D reconstruction. However, unlike dart, this work requires human intervention, computes the 3D model offline, requires close-up flights, cannot ensure equi-dense reconstructions, cannot dynamically re-calibrate for drift, and is not an end-to-end system. A body of work has explored factors affecting reconstruction accuracy: sensor error [6899451], tracking drift, and the degree of image overlap [7139681, LIENARD2016264]. Other work [8793729, 8628990, bylow2019combining] has explored techniques to reduce errors by fusing depth information or using image manipulations such as upscaling. Unlike dart, almost all of this work reconstructs the 3D model offline.

Offline reconstruction using LiDAR. 3D model reconstruction using LiDAR [uav_lidar_1, uav_lidar_2] relies on additional positioning infrastructure, such as base stations for real-time kinematic (RTK) positioning, and long-range specialized LiDARs to achieve tens-of-centimeters model accuracy. dart explores a different part of the design space: online reconstruction with sub-meter accuracy using commodity drones, GPS and LiDAR. More recent work has explored drone-mounted LiDAR-based offline reconstruction of tunnels and mines, but requires specialized LiDARs and a human in the loop [Prometheus, hovermap] for drone guidance (either manually or by defining a set of waypoints).

Rooftop boundary detection. Prior work has used infrared sensors, RGB-D cameras [rgbfasterboundarydetection], and a fusion of LiDAR [lidarbuildingdetection] with monocular cameras [lidarimagebuildingdetection, lidarorthophotoboundarydetection]. These approaches assume a pre-existing stitched 3D point cloud [lidarbuildingdetection] or orthophoto [lidarorthophotoboundarydetection] and are not designed to operate in real-time. dart’s boundary detection accuracy is comparable to this prior work, even though it makes neither assumption.

5 Conclusions

In this paper, we have taken a step towards accurate, near real-time 3D reconstruction using drones. Our system, dart, uses novel techniques for navigating the tension between cellular bandwidths, SLAM positioning errors, and compute constraints on the drone. It contains algorithms for estimating building geometry, for detecting excessive SLAM drift, and for recovering from such drift. It achieves reconstruction accuracy to within tens of centimeters in near real-time, even after compressing LiDAR data enough to fit within achievable LTE speeds. Future work includes using more sophisticated drone battery models, cooperative reconstruction of large campuses using multiple drones, and generalizing further to structures of arbitrary shape.


A Appendix

A.1 Drone compute

Figure A.1: Plane-fitting on a TX2

We ran a plane-fitting algorithm, RANSAC (a module that we use in our pipeline), on a real-world point cloud trace using a drone compute platform (the Jetson TX2). We found (Fig. A.1) that the TX2 takes, on average, 0.5 seconds to process a single point cloud. The 64-beam LiDAR generates 20 point clouds per second, so plane-fitting alone requires 10 seconds of compute per second of data; since plane-fitting accounts for only 5% of the execution time of our reconstruction pipeline, the TX2 would take 200 seconds to run the full pipeline on a single second’s worth of data from the 64-beam LiDAR at 20 frames per second. For this reason, we offload computations from the drone to the cloud.
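To make the workload concrete, the following is a minimal RANSAC plane-fitting sketch, a hypothetical stand-in for the PCL module our pipeline uses; the point cloud, parameters, and function name here are synthetic and purely illustrative:

```python
import numpy as np

def ransac_plane(points, n_iters=100, threshold=0.05, rng=None):
    """points: (N, 3) array; returns (normal, d) for the best plane n.x + d = 0."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers, best_plane = 0, None
    for _ in range(n_iters):
        # Fit a candidate plane through 3 random points.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        # Count points within `threshold` meters of the candidate plane.
        inliers = np.sum(np.abs(points @ normal + d) < threshold)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane

# Toy usage: a noisy horizontal plane z ~ 0 plus a few outliers.
rng = np.random.default_rng(1)
pts = np.column_stack([rng.uniform(-5, 5, (200, 2)), rng.normal(0, 0.01, 200)])
pts = np.vstack([pts, rng.uniform(-5, 5, (20, 3))])   # outliers
normal, d = ransac_plane(pts)
print(abs(normal[2]))   # close to 1: the recovered normal is near-vertical
```

Real point clouds have tens of thousands of points per frame, which is why even this simple module takes the TX2 hundreds of milliseconds.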

A.2 Point cloud compression

dart uses two techniques, viewpoint filtering and octree compression, to compress LiDAR point clouds to within 1.2 to 4.0 Mbps so that it can transmit them over LTE.

Figure A.2: Top down view of reconstructed 3D model for a large real-world complex

Viewpoint filtering. The OS1-64 LiDAR has a 360° horizontal field-of-view (FoV) and a 45° vertical FoV. For a drone-mounted LiDAR (Fig. 5), only a portion of the full 360° contains useful information: beams directed towards the sky, or towards objects beyond the LiDAR’s range, generate zero returns. Viewpoint filtering removes these zero returns. In practice, we have found it important to also filter out returns from the body of the drone, as well as returns beyond the nominal range of the LiDAR, since these are erroneous. dart therefore filters all points closer than 5 m or further than 120 m.
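A sketch of viewpoint filtering under the ranges given above (the array layout and the function name are illustrative assumptions, not dart’s actual code):

```python
import numpy as np

def viewpoint_filter(cloud, near=5.0, far=120.0):
    """cloud: (N, 3) array of returns in the LiDAR frame (all-zero rows = no return)."""
    r = np.linalg.norm(cloud, axis=1)
    # Zero returns (sky / out of range) have r == 0; drone-body returns fall
    # below `near`, and beyond-range noise falls above `far`.
    return cloud[(r >= near) & (r <= far)]

cloud = np.array([[0.0, 0.0, 0.0],     # zero return
                  [1.0, 1.0, 0.5],     # drone body (< 5 m)
                  [30.0, 2.0, -10.0],  # valid return
                  [200.0, 0.0, 0.0]])  # beyond nominal range
print(len(viewpoint_filter(cloud)))    # 1
```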

Octree compression. After filtering the point cloud, dart compresses the retained data using a standard octree compression algorithm [octree] designed specifically for point clouds (which makes it more effective than data-agnostic compression techniques like gzip). An octree is a three-dimensional tree data structure where each node is a cube that spans a 3D region and has exactly eight children. The dimensions of the cubes at the leaves of the tree determine the octree resolution; the numerical precision used to encode point positions determines the point resolution. Octree compression efficiently encodes empty leaves and empty internal nodes (those whose descendant leaves are all empty). It also performs inter-frame compression (similar to video encoders), efficiently encoding leaves and internal nodes that are unchanged between successive point clouds. These two parameters, octree resolution and point resolution, govern the compressibility of point clouds; as we show in §3, dart chooses their values to achieve point-cloud transmission rates of 1.2–4 Mbps, well within the range of achievable LTE speeds.
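The effect of octree resolution on compressibility can be illustrated with a toy voxel-quantization sketch. This is not PCL’s actual codec, just the underlying idea: quantizing points to leaf cells and deduplicating occupied cells shrinks the data more as the leaf size grows.

```python
import numpy as np

def occupied_voxels(cloud, leaf_size):
    """Return the set of occupied leaf cells at the given octree resolution."""
    return {tuple(v) for v in np.floor(cloud / leaf_size).astype(int)}

rng = np.random.default_rng(0)
cloud = rng.uniform(0, 50, (10000, 3))          # 10k points in a 50 m cube
fine = occupied_voxels(cloud, leaf_size=0.1)    # 10 cm leaves
coarse = occupied_voxels(cloud, leaf_size=1.0)  # 1 m leaves
print(len(coarse) < len(fine) <= len(cloud))    # True
```

Coarser leaves (and coarser point precision) lose geometric detail, which is exactly the accuracy-versus-bandwidth trade-off quantified in Table A.2.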

A.3 Implementation Details

We have implemented dart using the Point Cloud Library (PCL [octree]), the Cartographer [Cartographer] LiDAR SLAM implementation (Cartographer could be replaced by other LiDAR SLAM algorithms like LOAM [zhang2014loam]), the Boost C++ libraries [Boost], and the Robot Operating System (ROS [ros]). For the recon phase, we used PCL functions for plane-fitting, outlier removal and clustering. Our compression and extraction modules also use PCL and are implemented as ROS nodes. The drift detection module uses a Python package for Umeyama alignment [grupp2017evo]. Not counting the libraries and packages it uses, dart is 15,500 lines of code.

A.4 Recon Flight

The goal of the recon flight is to survey the area and find the boundary of the structure as fast as possible. dart uses the flight trajectory shown in Fig. A.3, in which parallel scans are separated by a fixed scan width. In designing the recon flight, dart can vary the height, speed and LiDAR orientation of the drone. To find the right set of parameters, we performed an exhaustive parameter sweep.
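A minimal sketch of such a back-and-forth ("lawnmower") trajectory planner follows. The function name and the 100 m scan width are illustrative assumptions; the 300 m survey area and 80 m height are the values used in the experiments in this section.

```python
def recon_waypoints(area_size, scan_width, height):
    """Yield (x, y, z) waypoints for parallel scans across a square survey area."""
    waypoints, x, leftward = [], 0.0, False
    while x <= area_size:
        # Alternate scan direction on each pass so the drone never backtracks.
        ys = (area_size, 0.0) if leftward else (0.0, area_size)
        waypoints += [(x, ys[0], height), (x, ys[1], height)]
        x += scan_width
        leftward = not leftward
    return waypoints

# 300 m x 300 m area at 80 m height, with an assumed 100 m scan width.
wps = recon_waypoints(300.0, 100.0, 80.0)
print(len(wps))   # 8: four scan lines, two endpoints each
```

A larger scan width yields fewer scan lines and thus a shorter flight, which is why the height and orientation choices below matter for battery efficiency.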

Optimum height for recon. To find the optimum height for the recon flight, we planned recon trajectories over a 20 m building (within a 300 m x 300 m area) in AirSim at heights from 40 m to 90 m. We flew the drone and ran boundary estimation on the collected, highly compressed LiDAR point clouds at 10 Hz. For each height, we collected data and ran the boundary detection module five times. Higher flights increase scan width (Fig. 10) at the expense of point density; however, dart’s boundary detection algorithm is robust to lower-density point clouds and can accurately estimate the building’s boundary from heights of up to 80 m. Fig. A.4 shows the 2D boundary detection accuracy and completeness (for both, lower is better) and flight duration (a proxy for battery usage) as a function of drone height. At 80 m (i.e., 60 m above the building), dart jointly optimizes for battery efficiency and boundary detection accuracy: it completes the recon flight in 150 seconds and estimates the boundary to within 2.5 m accuracy and completeness. Beyond 80 m, scan width and point density decrease, which results in longer flights and worse (higher) boundary detection accuracy and completeness.

Figure A.3: Recon flight trajectory for dart.
Figure A.4: Finding the right height for boundary detection accuracy and battery efficiency in the recon flight.

Optimum speed for recon. To find the optimum speed for the recon flight, we planned a recon trajectory over the same 20 m building at a height of 80 m from the ground. We flew the drone along the planned trajectory at speeds from 1 m/s to 8 m/s and ran boundary detection on the highly compressed point clouds at 10 Hz. For each speed, we collected data and ran the boundary detection module five times. Fig. A.5 illustrates the effect of drone speed on boundary detection accuracy, completeness and flight duration. A higher speed shortens the flight, but at the expense of boundary detection accuracy and completeness. Even so, dart robustly extracts the boundary at speeds of up to 6 m/s; at higher speeds, the overlap between consecutive frames is too small for dart to stitch the frames together accurately. dart therefore flies the drone at the sweet spot of 4 m/s, where the flight duration is approximately 150 seconds and accuracy and completeness are within 2.5 m.

Optimum LiDAR orientation. LiDAR orientation controls scan width and point cloud overlap. A parallel orientation yields larger overlap but a smaller scan width; a perpendicular orientation yields smaller overlap but a larger scan width. A larger scan width shortens the flight (Fig. A.3), while a larger overlap improves scan matching accuracy. Since dart uses GPS for stitching in the recon phase, it is robust to low overlap. Hence, to minimize flight duration, it uses a perpendicular orientation of the LiDAR. We conducted experiments (omitted for brevity) with different orientations of the LiDAR and confirmed that a perpendicular orientation minimizes flight duration without any loss in accuracy or completeness.

Figure A.5: Finding the right speed for boundary detection accuracy and battery efficiency in the recon flight.

Boundary extraction for different buildings. To show that dart can accurately extract the 2D boundary of arbitrary buildings, we collected LiDAR traces of a drone flying over five different buildings in AirSim at a height of 80 m and a speed of 4 m/s, collecting data over each building five times. We then ran boundary detection on the highly compressed point clouds at 10 Hz. Table A.1 summarizes the boundary detection accuracy, completeness and flight duration. As expected, the flight duration is independent of the underlying building. For every building type, dart extracts the boundary to within 2.5 m accuracy and completeness, showing that its boundary detection generalizes across building shapes.

Building type   duration (s)   accuracy (m)   completeness (m)
Star-shaped     150            1.39           1.67
H-shaped        150            1.31           1.83
Plus-shaped     150            1.35           1.55
Pentagon        150            2.58           2.58
Rectangular     150            2.50           2.53
Table A.1: dart boundary estimation accuracy, completeness and flight duration for different building types using high compression.

Effect of point cloud compression. To evaluate the effect of point cloud compression on boundary extraction, we compressed a real-world trace collected over the 70 m x 40 m x 20 m building with the four compression profiles described above, then ran our boundary extraction algorithm on the compressed traces. Table A.2 shows that dart’s boundary extraction algorithm is robust to compression. While reducing bandwidth by a factor of 377 under high compression, dart gives up only 36 cm in accuracy and 24 cm in completeness. With the higher bandwidths promised by 5G, dart could achieve the same boundary extraction accuracy as with an uncompressed trace.

Compression profile   bandwidth (Mbps)   accuracy (m)   completeness (m)
Uncompressed          480.0              1.09           1.09
View-point            42.7               1.09           1.09
Lossless              7.86               1.09           1.09
Low                   3.80               1.09           1.10
Medium                2.50               1.13           1.07
High                  1.27               1.45           1.33
Table A.2: dart boundary estimation accuracy and completeness for different levels of compression.

Effect of sub-sampling. dart’s boundary detection algorithm runs at 10 fps, whereas an Ouster OS1-64 LiDAR generates 20 point clouds per second, so the algorithm must be robust to sub-sampling of point clouds. Our evaluations show that, for a drone traveling at 4 m/s, it works well even when using only one point cloud every 3 seconds. Because dart’s boundary detection uses GPS for stitching, it does not need overlap between consecutive 3D frames.

A.5 Data Collection

In this section, we perform a parameter sensitivity study to find parameters for running SLAM accurately on real-world UAV flights. To do this, we report the positioning error of SLAM. Lacking accurate ground truth in the real world, we compare SLAM positions against a GPS trace. Positioning accuracy is directly related to 3D model RMSE because SLAM poses are used to position the 3D point clouds when generating the 3D model: higher positioning error leads to higher reconstruction error.
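This comparison can be sketched as a rigid Umeyama alignment (SVD-based, no scale) of the SLAM trace to the GPS trace, followed by an RMSE computation. The following is a minimal numpy sketch of that idea, not the evo package [grupp2017evo] that dart actually uses:

```python
import numpy as np

def align_rmse(slam, gps):
    """slam, gps: (N, 3) matched position traces; returns positioning RMSE
    after optimal rigid alignment of the SLAM trace to the GPS trace."""
    mu_s, mu_g = slam.mean(0), gps.mean(0)
    # Cross-covariance between the centered traces.
    H = (slam - mu_s).T @ (gps - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction to guarantee a rotation (no reflection).
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_s
    aligned = slam @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gps) ** 2, axis=1)))

# Toy check: a rotated and translated copy of a trace aligns back exactly.
rng = np.random.default_rng(0)
gps = rng.uniform(0, 100, (50, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
slam = gps @ Rz.T + np.array([5.0, -2.0, 1.0])
print(round(align_rmse(slam, gps), 6))   # 0.0
```

Because the alignment removes any global rigid offset between the two traces, the residual RMSE isolates SLAM drift rather than frame-of-reference differences.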

Effect of drone speed. Because GPS is erroneous, we only draw qualitative conclusions. As Table A.3, taken from our drone traces, shows, slower flights have lower SLAM error than faster ones, and parallel orientations have lower SLAM error than perpendicular ones.

Effect of drone height. Similarly, SLAM error increases with height and, in real-world traces, the parallel orientation appears significantly better than the perpendicular orientation (Table A.4). At a distance of 20 m from the surface of the building, the parallel orientation has the minimum positioning error, 1.25 m. Beyond 20 m for parallel and 40 m for perpendicular, SLAM loses track completely because of lower point density.

LiDAR orientation   1.5 m/s   3.0 m/s
Parallel            1.25      3.33
Perpendicular       3.12      7.64
Table A.3: Positioning errors (m) for parallel and perpendicular LiDAR orientations at different collection speeds, for real-world traces at a vertical height of 20 m from the building.
LiDAR orientation   20 m   40 m         60 m
Parallel            1.25   lost track   lost track
Perpendicular       2.18   5.41         lost track
Table A.4: Positioning errors (m) for different LiDAR orientations at different collection heights from the building, for real-world traces at 1 m/s.
Figure A.6: Reconstructed 3D models at different levels of compression. Top-left: ground truth, top-right: low compression, bottom-left: medium compression, bottom-right: high compression