1 Introduction
Drone-based 3D reconstruction. The last few years have seen impressive advances in the commoditization of drones. Today, drones can be equipped with on-board compute, a cellular (LTE) radio and sophisticated sensors (e.g., cameras, stereo cameras and LiDAR). Given this, in the coming years, drones will likely revolutionize aerial photography, mapping and three-dimensional (3D) reconstruction. Recent estimates put the total market for these drone-based services at 63 billion US dollars by 2025 [dronemarket1], with 3D reconstruction, the task of generating a digital 3D model of the environment, accounting for nearly a third of the drone services market [dronemarket2].

What is a 3D model? The term 3D model covers a wide range of geometric representations of the surfaces of objects, from coarse-grained approximations (cylinders, cubes, intersections of planes) to more fine-grained representations such as meshes (small-scale surface tessellations that capture structural variations). In this paper, we seek to extract a fine-grained point cloud of a large structure (e.g., a building, blimp or airplane), which consists of dense points on the surface of the structure. Each point has an associated 3D position, together with other attributes (depending on the sensor used to generate the point cloud). The point-cloud-based 3D model can generate all other representations.
Applications. 3D models are used in animations in films and video games; for preserving historical and tourist sites; for archaeology and city planning; for capturing buildings or rooftops for repair and solar installation; and as inputs to immersive virtual reality (VR) applications. Most prior research has explored such offline reconstruction [uav_lidar_1, uav_lidar_2, 7139681, 7989530, 8124461, 8628990] and reports centimeter to sub-meter accuracy.
Towards real-time 3D reconstruction. The nascent drone-based reconstruction industry is starting to use time-to-reconstruction as an important market discriminator. Startups promise to deliver models in hours or within a day [company_drone_deploy] after drone data collection. This trend has inspired newer, more time-sensitive applications of 3D modeling like (a) post-disaster reconstruction for search and rescue missions [drone_disaster_relief_1, drone_disaster_relief_2, drone_disaster_relief_3, drone_disaster_relief_4, drone_disaster_relief_5], (b) construction site monitoring [company_drone_deploy], (c) mapping mines, cell towers and tunnels [hovermap, Prometheus], (d) documenting interior office spaces [company_kaarta], and (e) perhaps most compelling, between-flight modeling of aircraft fuselages or blimps to determine structural integrity [company_austrian_airline].
Though startups are starting to promise smaller reconstruction times [hovermap, Prometheus], they reconstruct 3D models offline, i.e., after the drone has landed. Only one company we know of so far promises real-time reconstruction [company_kaarta], and it employs a human operator holding a handheld kit with a stereo camera (instead of a drone) to reconstruct interior spaces.
The Goal. Given this, we want to accurately reconstruct large 3D structures in near real-time, with no human intervention.
Photogrammetry-based reconstruction. Most existing work (§4) uses photogrammetry, which takes as input a sequence of 2D camera images (and the positions associated with these images), captured either using drone-mounted cameras [7139681, 7989530, 8124461, 8628990] or by humans using ordinary mobile devices [company_hover]. Photogrammetry infers 3D models using a technique known as multi-view stereo [furukawa15:_multi_view_stereo], which combines information from successive images, together with information about local motion, and inferred camera parameters such as the focal length. Because photogrammetry approaches must estimate depth from 2D images, their reconstruction times are relatively large, i.e., from several hours to days [company_drone_deploy] for large structures.
LiDAR-based reconstruction. Unlike photogrammetry, LiDAR-based reconstruction can more directly infer 3D models, because LiDARs directly provide depth information (unlike cameras). A LiDAR sensor measures distances to surfaces using lasers mounted on a mechanical rotating platform. The lasers and mechanical rotator sit within the enclosure of the LiDAR sensor. With each revolution of the lasers, the sensor returns a point cloud, or a LiDAR frame. The point cloud is a set of 3D data points, each corresponding to a distance measurement of a particular position of the surrounding environment from the LiDAR. For instance, an Ouster OS1-64 LiDAR has 64 lasers that scan at 20 Hz, with horizontal and vertical fields of view of 360° and 45°, respectively.
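To make the frame structure concrete, the sketch below (our own illustration; the function name and evenly spaced beam angles are assumptions, not the OS1-64's exact intrinsics) converts one revolution of range measurements into 3D points in the sensor's coordinate frame:

```python
import numpy as np

def lidar_frame_to_points(ranges, elev_deg, n_azimuth=1024):
    """Convert one LiDAR revolution of range measurements into 3D points
    in the sensor frame. ranges: (n_beams, n_azimuth) distances in meters;
    elev_deg: per-beam elevation angles in degrees."""
    elev = np.deg2rad(np.asarray(elev_deg, float))[:, None]
    azim = np.deg2rad(np.arange(n_azimuth) * 360.0 / n_azimuth)[None, :]
    x = ranges * np.cos(elev) * np.cos(azim)
    y = ranges * np.cos(elev) * np.sin(azim)
    z = ranges * np.sin(elev)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

For a 64-beam sensor with a 45° vertical field of view, one might use `elev_deg = np.linspace(-22.5, 22.5, 64)` and a (64, 1024) range image per frame.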
To reconstruct a structure, e.g., a building, it suffices, in theory, to merge point clouds captured from different locations around the building. To understand what it means to merge point clouds, consider two successive clouds P1 and P2. A point on the surface of the building may appear in both P1 and P2. However, since the drone has moved, this point appears at different positions (relative to the LiDAR) in the two point clouds. If we precisely transform both point clouds to the same coordinate frame of reference, then the union of points in P1 and P2 constitutes part of the 3D model of the building.
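The merge step can be sketched in a few lines; here each frame's pose is assumed to be given as a 4x4 homogeneous transform into the first frame's coordinate system (a hypothetical helper for illustration, not dart's actual code):

```python
import numpy as np

def merge_clouds(clouds, poses):
    """Union of per-frame point clouds, transformed into a common frame.
    Each pose is a 4x4 homogeneous matrix mapping that frame's sensor
    coordinates into the first frame's coordinate system."""
    merged = []
    for pts, T in zip(clouds, poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # N x 4 points
        merged.append((homo @ T.T)[:, :3])               # apply pose
    return np.vstack(merged)
```

If the same surface point is observed in two frames, applying each frame's pose maps both observations to the same world coordinate, so their union densifies the model rather than duplicating structure.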
Accurate positioning ensures high-quality models. Merging requires the precise position of the LiDAR at each location where it captured a point cloud. The accuracy of positioning determines model quality, which two metrics define. Accuracy, the average distance between the position of a point in the model and the corresponding point on the surface (see §2 for a more precise definition), clearly relies on accurate positioning. Completeness, the degree to which a model captures the entire structure, also relies on accurate positioning, since positioning errors can lead to gaps in the captured model.
Strawman: Using GPS for positioning. Because drones can carry GPS receivers, GPS can be used to position point clouds. Unfortunately, GPS errors can be several meters in obstructed settings, resulting in poor accuracy and completeness of the 3D model. The left image of Fig. 1 shows a 3D model of a building assembled using a drone flight over a commercial building; the building’s outline is fuzzy, as are the contours of the trees surrounding the building. The right image shows a 3D reconstruction using the techniques proposed in this paper, which do not use GPS. (All reconstructions in Fig. 1 use real data captured from a drone-mounted LiDAR, §3.)
High-precision GNSS/RTK receivers can provide more accurate positioning, but require additional infrastructure, are costly, and can perform poorly in urban environments due to non-line-of-sight signals that degrade accuracy (e.g., prior work [Carloc] reports that RTK-enabled receivers can exhibit tens of meters of error in downtown environments). Prior work [uav_lidar_1, uav_lidar_2] has used expensive GPS receivers in the context of remote sensing, using specialized unmanned aerial vehicles (UAVs) with long-range LiDAR sensors (but for offline model reconstruction). In contrast, in this paper we consider solutions that employ off-the-shelf technologies: drones, and commodity GPS and LiDAR.
An alternative: Using SLAM for positioning. LiDAR SLAM (Simultaneous Localization And Mapping [slam_part1, slam_part2]) algorithms can provide accurate pose (position and orientation) estimates, which can be used to position point clouds. These algorithms use scan or feature matching techniques to align consecutive LiDAR frames to determine the pose of the drone throughout its flight. For example, scan matching uses techniques [icp] to find, for each point in the source point cloud S, the closest matching point in the reference point cloud R. It then estimates the transformation (translation and rotation) that best aligns each point in S to its corresponding point in R. By repeating this process across consecutive LiDAR frames, SLAM can position each frame in the first frame’s coordinate frame of reference.
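As a concrete illustration of scan matching, the following sketch implements a single ICP-style iteration: nearest-neighbor correspondences followed by a closed-form (SVD-based) rigid alignment. This is a textbook simplification for illustration, not the exact matcher dart's SLAM uses:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, ref):
    """One scan-matching iteration: pair each source point with its nearest
    reference point, then solve for the rigid transform (R, t) that best
    aligns the pairs in the least-squares sense (Kabsch/SVD)."""
    _, idx = cKDTree(ref).query(src)
    matched = ref[idx]
    mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_m)        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return R, t                                  # maps src points as R @ p + t
```

In a full ICP loop, this step is iterated (re-matching after each transform) until the alignment converges.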
Challenges. However, using SLAM on a drone-mounted LiDAR is challenging for the following reasons (Table 1).
Reconstruction quality is critically dependent on the design of the drone trajectory. Fig. 1 (middle) shows reconstruction using SLAM from a poorly planned drone flight. This reconstruction is visually worse than the GPS-based reconstruction (Fig. 1, left), because the underlying SLAM algorithm is unable to track LiDAR frames (i.e., it is unable to match points in successive point clouds).
SLAM algorithms accumulate drift [slam_part1, slam_part2], so position estimates can degrade on longer flights.
Drones have limited compute because they can carry limited payloads. For instance, the DJI M600Pro hexacopter (which we use in this paper) has a maximum payload weight of 5 kg. It carries an A3Pro flight controller that contains three IMUs and three GNSS receivers. We have also mounted an LTE radio, an Ouster OS1-64 LiDAR, and a Jetson TX2 board. This compute capability is far from sufficient to run LiDAR SLAM. (With a 64-beam LiDAR, SLAM processes up to 480 Mbps of 3D data. On the TX2, plane-fitting [ransac], a primitive used in our reconstruction pipeline, takes 0.5 seconds per LiDAR frame, while LiDARs generate 20 frames per second (Fig. A.1); plane-fitting accounts for 5% of the execution time of our entire pipeline (§3).)
Drones have limited flight endurance. When fully loaded and starting from a full charge, the M600Pro can fly for approximately 25 minutes. This necessitates careful trajectory planning to minimize flight duration.
Table 1: Challenges and corresponding mechanisms.

Challenge            Mechanism
------------------   ------------------------------------
Model accuracy       Model collection trajectory planning
Limited duration     Cheap reconnaissance flight
Limited compute      Offload compressed point clouds
Drift accumulation   Flight recalibration
Contributions. This paper presents the design and implementation of a system called dart, which makes the following contributions to address these challenges (Table 1).
dart’s first contribution is the design of model collection trajectories that navigate the competing constraints, i.e., accuracy and battery life (§2.1). Model collection uses SLAM, and SLAM error is sensitive to how the LiDAR is mounted, and how fast and at what height the drone flies. Faster, higher flights use less of the drone’s battery, but can incur high SLAM error. The converse is true of slower and lower flights. dart finds a sweet spot in this tradeoff space to balance its accuracy goals with drone battery usage. Even so, SLAM can incur drift on longer flights. dart needs to detect excessive drift, and correct for it in real time. It uses a novel algorithm that tracks consistency between GPS traces and SLAM positions to detect excessive drift, then incorporates a recalibration step to correct for it while the drone is in flight.
Even with dart’s efficient model collection trajectories, scans over large areas can exhaust a drone’s battery. To this end, dart’s second contribution is a robust and efficient geometry extraction algorithm (§2.2) that helps focus model collection only on the structure to reconstruct. (In this paper, we focus on extracting the boundary of a building and leave generalization to future work.) This algorithm, which runs during a reconnaissance flight before model collection, works well even with a fast, high flight that minimizes drone flight duration (and hence battery usage). During the flight, the algorithm extracts the building geometry without constructing the 3D model; it detects planar surfaces using the consistency of surface normals across points on a plane, then estimates the building’s height and boundary from the plane forming the rooftop. dart uses this boundary to plan the model collection trajectories described above.
dart’s third contribution is the design of a processing pipeline that (a) offloads computation to a cloud server by compressing point clouds on the drone to the point where they can be transmitted over LTE (§2.3), and (b) leverages GPU acceleration to process these point clouds in near real-time at the LiDAR frame rate and with minimal end-to-end processing delay (§3).
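The paper does not detail the codec at this point, but a minimal sketch of the idea, lossy quantization of coordinates to a fixed grid followed by lossless entropy coding of sorted integer deltas, is shown below (the function names and the 1 cm resolution are illustrative assumptions, not dart's actual compressor):

```python
import zlib
import numpy as np

def compress_cloud(points, resolution=0.01):
    """Quantize coordinates to a fixed grid (lossy, bounded error of
    resolution/2 per axis), sort so deltas are small, then entropy-code
    the integer deltas losslessly with zlib."""
    q = np.round(points / resolution).astype(np.int32)
    q = q[np.lexsort(q.T)]                        # sort to shrink deltas
    deltas = np.diff(q, axis=0, prepend=np.zeros((1, 3), np.int32))
    return zlib.compress(deltas.tobytes())

def decompress_cloud(blob, resolution=0.01):
    """Invert compress_cloud; points come back grid-snapped and sorted."""
    deltas = np.frombuffer(zlib.decompress(blob), np.int32).reshape(-1, 3)
    return np.cumsum(deltas, axis=0) * resolution
```

The quantization resolution is the knob trading reconstruction error for bandwidth; a coarser grid compresses better but moves every reconstructed point further from its true position.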
Experiments (§3) on real-world drone flights and a photorealistic drone simulator (AirSim [airsim]) demonstrate that dart can achieve 21-30 cm reconstruction accuracy even after compressing the raw sensor data by almost two orders of magnitude. Not only is dart faster than an offline approach, it is also more accurate, since the latter cannot recalibrate mid-flight. An experiment with a complete dart implementation shows that it can stream compressed point clouds over LTE, reconstruct the 3D model on-the-fly, and deliver an accurate 3D model about 100 ms after flight completion. dart’s choice of trajectories drains the battery the least while achieving the accuracy target, and its pipeline can process frames at 20 fps while incurring a processing latency of less than 100 ms. To our knowledge, dart is the first system to demonstrate online, cloud-based, autonomous, sub-meter 3D model reconstruction in near real-time.
2 DART Design
Because drone-based 3D reconstruction is a complex multi-dimensional problem (Table 1), we have focused on a geometrically regular, but important, subset of structures for reconstruction: buildings. As this section will make clear, even this choice poses significant challenges. It also brings out the computation and communication issues in 3D reconstruction that are the main focus of this paper. In §2.1, we discuss what it would take to generalize to other, more complex structures.
Overview. To use dart, a user specifies: (a) an area of interest, and (b) a minimum target point density. Point density is the number of points per unit area on the surface of a point cloud; this knob controls the quality of the 3D model. In the area of interest, dart guides a drone to automatically discover buildings, and constructs a 3D model of the buildings in near real-time (i.e., during the drone flight) while minimizing flight duration at the given minimum point density. (To a first approximation, drone battery usage increases with flight duration; we leave it to future work to incorporate drone battery models.) dart splits its functionality across two components: (a) a lightweight subsystem that runs on the drone, and (b) a cloud-based component that discovers buildings, generates drone trajectories, and reconstructs the 3D models on-the-fly.
dart’s cloud component (Fig. 2) generates an efficient reconnaissance trajectory over the area of interest to discover the rooftop geometry of buildings. Extracting the geometry from LiDAR data can be computationally intensive (§1), so dart streams compressed point clouds to a cloud service during flight over a cellular (LTE) connection. The cloud service extracts the geometry, then prepares a more careful model collection trajectory: a minimal-duration flight that ensures high 3D model accuracy. During this second flight, the drone also streams compressed point clouds, and the cloud service runs SLAM to estimate point cloud poses and composes the received point clouds into the building’s 3D model. Mid-flight, the cloud service may recalibrate the trajectory dynamically to minimize drift accumulation.
Below, we first describe model collection (§2.1), since that is the most challenging of dart’s components. We then describe how dart extracts the rooftop geometry (§2.2), and conclude by describing point-cloud compression (§2.3).
2.1 Model Collection
Given the building geometry (§2.2), dart designs a model collection trajectory to capture the 3D model (§1) of the building.
What constitutes a good 3D model? Prior work on 3D reconstruction [reconstructionmetrics] has proposed two metrics, accuracy and completeness. Consider a 3D model M and a corresponding ground truth G. Accuracy is the Root Mean Square Error (RMSE) of the distance from each point in M to the nearest point in G. Completeness is the RMSE of the distance from each point in G to the nearest point in M. If both values are zero, M perfectly matches G. If M captures all points in G, but the positions of the points are erroneous, then both accuracy and completeness will be non-zero. If M captures only one point in G, but positions it correctly, its accuracy is perfect, but its completeness is poor.
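These two RMSE metrics can be computed directly with nearest-neighbor queries; the sketch below (our own illustration, using k-d trees) mirrors the definitions above:

```python
import numpy as np
from scipy.spatial import cKDTree

def accuracy_completeness(model, truth):
    """Accuracy: RMSE of distances from model points to their nearest
    ground-truth points. Completeness: RMSE of distances from ground-truth
    points to their nearest model points."""
    d_model, _ = cKDTree(truth).query(model)   # model -> ground truth
    d_truth, _ = cKDTree(model).query(truth)   # ground truth -> model
    rmse = lambda d: float(np.sqrt(np.mean(d ** 2)))
    return rmse(d_model), rmse(d_truth)
```

Note the asymmetry: a single correctly placed model point yields perfect accuracy but poor completeness, exactly as described above.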
Trajectories, SLAM and 3D reconstruction error. Compared to an autonomous vehicle, a drone-mounted LiDAR can perceive only a fraction of the 3D point cloud, which makes scan matching more difficult for SLAM. Thus, the trajectory of the drone flight can impact 3D model completeness and accuracy, in part because a poorly designed trajectory can increase SLAM error. In designing the drone’s trajectory, dart can control the following parameters: the actual path of the drone over the building, its speed, its height, and the orientation of the LiDAR with respect to the ground. We now discuss the qualitative impact of these parameter choices; later (§3), we empirically quantify the best parameter choices.
Orientation impacts accuracy. At a fixed height and speed, a parallel orientation of the LiDAR (Fig. 5), in which its scan plane aligns with the drone’s direction of motion, results in higher overlap between two successive point clouds than a perpendicular orientation, and therefore lower SLAM error and better accuracy. Fig. 5, obtained using the methodology described in §3, quantifies this intuition: different orientations have different degrees of overlap, and as overlap decreases, SLAM’s positioning error increases. A parallel orientation (0°) has the lowest SLAM error because it has the highest visibility lifespan. (Visibility lifespan, the time for which a point on the building’s surface is visible during flight, is a proxy for overlap; a longer lifespan indicates greater overlap.)
Speed impacts model accuracy. If the drone flies fast, two successive point clouds will have fewer overlapping points, resulting in errors in SLAM’s pose transformations and (therefore) its pose estimates (for a reason similar to Fig. 5), which leads to poor 3D model accuracy. So, dart must fly as slowly as possible.
Height impacts both accuracy and completeness. Because LiDAR beams are radial, the higher a drone flies, the less dense the points on the surface of the building. Lower density results in worse completeness. Accuracy is also worse, because the likelihood of matching the same point on the surface between two scans decreases with point density. For instance, the positioning errors for point densities of 2.2 points per m² and 3.0 points per m² are 2.5 m and 1.0 m, respectively (graph omitted for brevity). So, dart must fly as low as possible.
The drone’s path must ensure coverage of the building’s rooftop and sides. Consider a narrow and wide building. The drone must fly over the building several times to capture all of its surfaces. If it flies low, slowly, and at a parallel orientation, the flight duration can be significant. Over long durations, SLAM accumulates drift, which can worsen model accuracy and completeness.
dart designs equidense trajectories to control model completeness, and uses offline, data-driven parameter estimation to choose speed, height and orientation. To minimize drift accumulation, dart performs online drift estimation and recalibration. We describe these below.
Equidense Trajectories. An equidense trajectory ensures that the resulting model is (a) complete, and (b) captures the building with a point density no less than the specified minimum target point density d.
Point density depends on LiDAR parameters and height. The height h (more generally, the distance, for vertical surfaces like the sides of the building) at which a LiDAR flies from a surface governs the average density of points it obtains from that surface; larger heights result in lower point density.
For a given LiDAR configuration, we can compute the point density as a function of height. For instance, for an Ouster LiDAR with 64 beams, a horizontal resolution of 1024, and a vertical field of view of 45°, to a first approximation, two consecutive beams are at an angular separation of 0.7°, and successive returns from the same beam are 0.35° apart. Using geometry, for a surface at a distance h from the drone, we can compute the projection of the LiDAR on that surface. Using this projection, we can compute the point density throughout the whole point cloud. Central regions of the point cloud have much higher density than regions at the extremities.
Coverage depends on height. The density of points at which a drone captures a surface depends on its height h. Given a height h, Fig. 5 shows the coverage of the LiDAR on a given surface. In general, the drone can only capture a subset of this full coverage region at the minimum target point density d (shown as the target-density region in Fig. 5). Now, suppose the drone’s trajectory performs a rectilinear scan over the surface, like the one shown in Fig. 5(a). Then, to ensure that dart captures the entire surface at a density of at least d, the scan width must be equal to or smaller than the width of the target-density coverage region (Fig. 5).
dart estimates scan width from LiDAR parameters. To estimate the width of the target-density coverage region, dart uses the LiDAR parameters and models the LiDAR geometry to derive a function w = f(d, h) that returns the scan width for a given target density d and height h. It models the beams of the LiDAR as having equal angular separation, so it can compute the points at which these beams intersect with a plane at a height (or distance) h away. Given a target point density d, dart can compute the largest width at this height that still ensures a minimum point density of d.
This density guarantee is nominal; in practice, LiDARs may drop some reflections if they are noisy [Lidarsim]. Future work can model this noise for better equidense trajectory designs.
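Under a simplified nadir-fan model (an illustrative assumption; the real projection depends on the mounting orientation), the per-frame density on flat ground and the widest swath meeting a target density can be sketched as follows:

```python
import numpy as np

def point_density(h, x, n_beams=64, vfov_deg=45.0, h_res=1024):
    """Approximate per-frame point density (points/m^2) on flat ground at
    lateral offset x from the track of a LiDAR at height h. Assumes evenly
    spaced beams fanned about nadir (a simplification)."""
    dtheta = np.deg2rad(vfov_deg) / (n_beams - 1)   # ~0.7 deg between beams
    dphi = 2.0 * np.pi / h_res                      # ~0.35 deg per firing
    theta = np.arctan2(x, h)                        # angle off nadir
    across = h * dtheta / np.cos(theta) ** 2        # across-track spacing
    along = (h / np.cos(theta)) * dphi              # along-track spacing
    return 1.0 / (across * along)

def scan_width(h, target_density, vfov_deg=45.0):
    """Widest swath (m) in which every point meets target_density."""
    x_max = h * np.tan(np.deg2rad(vfov_deg / 2.0))  # edge of coverage
    xs = np.linspace(0.0, x_max, 10000)
    ok = point_density(h, xs) >= target_density
    # density falls off with |x|, so take the furthest passing offset
    return 2.0 * xs[ok][-1] if ok.any() else 0.0
```

This captures the qualitative behavior in the text: density is highest under the track and falls toward the extremities, and higher flights or stricter density targets shrink the usable scan width (to zero if the target is unattainable at that height). It models a single frame only; density accumulated across frames as the drone moves would add a speed-dependent term.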
Trajectory path planning and parameter selection. dart uses a fixed orientation, height and speed for the drone flight; it uses an offline, data-driven approach to determine these.
Offline simulations to estimate parameters. As described above, the LiDAR orientation and the drone’s height and speed determine how well SLAM can estimate positions to ensure accuracy and completeness. We could have tried to analytically model the system to derive optimal parameter choices. Modeling the LiDAR is feasible (as we discuss above); modeling SLAM’s feature matching mathematically is much harder. So, we resort to exploring the space of parameters using simulation. Specifically, we use game-engine-driven photorealistic simulators like AirSim [airsim] to sweep the parameter space. Then, we validate these results using traces captured with our real-world prototype. We discuss this methodology in greater detail in §3, where we show that there exists a sweet spot in the parameter space that ensures high accuracy and completeness while minimizing flight duration. Specifically, §3.4 shows that a parallel orientation, while flying at a distance of 20 m from the surface (or lower, if necessary, to meet the point density constraint) at 1 m/s, gives the best accuracy.
A fixed trajectory. Given these parameters, the building geometry and the scan width, dart designs a drone flight path. Fig. 6 describes this for a building shaped like a rectangular solid; dart supports other building shapes (§3). dart’s model collection trajectory starts from an origin; this point defines the origin of the coordinate system for the resulting 3D model. The drone then laterally traverses the building to capture two of its sides (Fig. 5(a)). Its flight path extends a fixed distance beyond the building edges to account for errors in building geometry estimation. As it moves to each side laterally, it moves up/down along each face to capture them at the same minimum point density. Then, it returns to the origin and traverses the building longitudinally (Fig. 5(b)).
Avoiding LiDAR rotations. Why return to the origin? (To identify the origin, drones use fiducials, e.g., a drone landing pad with a distinctive design.) This loop-closure maneuver is an important technique in SLAM to correct for drift [slam_part1]. If loop closure were not necessary, we could have designed a trajectory as shown in Fig. 5(c). However, this trajectory requires a rotation of the drone at the dashed segment to ensure that the lateral and longitudinal segments have the same drone orientation. Rotation can significantly increase SLAM drift; Fig. 10 shows an example in which the green dashed line depicts the actual (ground-truth) drone trajectory, and the blue line SLAM’s estimated pose. At the bottom-right corner of the trajectory, when the drone rotates, SLAM is completely thrown off.

Drift Estimation and Recalibration. dart uses return-to-origin to recalibrate SLAM drift. The second, longitudinal, flight starts a new SLAM session; to “stitch” the two sessions together, dart needs to compute a transformation matrix that transforms the coordinate system of the first session to that of the second. dart uses standard techniques for this. More important, dart designs the longitudinal flight to start close to the origin, which has two benefits: (a) a shorter flight time, resulting in lower overall energy consumption, and (b) less drift accumulation.
Mid-flight recalibration for accuracy and flight efficiency. Return-to-origin recalibration might also be necessary in the middle of one of the flights (Fig. 5(d)), if the environment is sparse and SLAM tracking fails. To combat this, dart could have added more loop-closure maneuvers to the lateral and longitudinal flights. However, returning to the origin is an expensive operation in terms of the drone’s battery. Instead, dart actively monitors drift error and returns to the origin only when needed. In that case, the flight resumes at the point at which it detected excessive drift: the direct path from the origin to that point is always shorter than the initial segment, ensuring that the resumed flight starts with a lower drift.
Using deviation from the GPS trajectory to detect drift. Detecting excessive drift is non-trivial: dart has no way of knowing when SLAM’s position estimates are wrong, because it does not have accurate ground truth. Instead, dart leverages the GPS readings associated with SLAM poses: each sequence of readings gives a GPS trajectory, and dart attempts to find the best possible match between the GPS trajectory (e.g., the green line in Fig. 10) and the SLAM-generated trajectory (the blue line in Fig. 10). If there is a significant deviation, dart assumes there is drift and invokes recalibration.
This approach is robust to GPS errors, since it matches the shape of the two trajectories, not their precise positions (Algorithm 1). Specifically, dart continuously executes 3D SLAM on the stream of compressed LiDAR frames from the drone, and estimates the pose of each frame. It synchronizes the GPS timestamps with the LiDAR timestamps (line 1), then transforms GPS readings using the Mercator projection (line 2). It then aligns the GPS trajectory and the SLAM-generated trajectory using the Umeyama algorithm [umeyama] to determine the rigid transformation matrices (i.e., translation and rotation) that best align the SLAM and GPS poses (lines 3-4). dart partitions the trajectories into fixed-length segments and, after alignment, computes the RMSE between the two trajectories in each segment, using these RMSE values as an indicator of excessive drift: if the RMSE is greater than a threshold (lines 5-12), dart invokes return-to-origin.
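A minimal version of this check (our own sketch; the segment length and threshold values are illustrative, not dart's) aligns the two trajectories with the closed-form Umeyama solution and then tests per-segment RMSE:

```python
import numpy as np

def umeyama_rigid(src, dst):
    """Closed-form least-squares rotation R and translation t such that
    dst ~= src @ R.T + t (Umeyama, without the scale term)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    return R, mu_d - R @ mu_s

def drift_exceeded(slam_xyz, gps_xyz, seg_len=50, threshold=1.0):
    """Align the SLAM trajectory to the (Mercator-projected) GPS trajectory,
    then flag drift if any fixed-length segment's RMSE exceeds threshold."""
    R, t = umeyama_rigid(slam_xyz, gps_xyz)
    err = np.linalg.norm(slam_xyz @ R.T + t - gps_xyz, axis=1)
    for i in range(0, len(err), seg_len):
        seg = err[i:i + seg_len]
        if np.sqrt(np.mean(seg ** 2)) > threshold:
            return True
    return False
```

Because the rigid alignment absorbs any global offset and rotation between the two coordinate systems, only shape disagreement (i.e., accumulated drift) survives into the per-segment residuals.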
Generalizing to other structures. Some aspects of model collection trajectory design depend on the geometry of the structure whose model we seek to reconstruct. To simplify the discussion, and because geometric manipulations are not the focus of this paper, we have chosen rectangular buildings. We demonstrate in §3 that our approach generalizes to other regular building geometries. It also generalizes to other structures that can be tightly bounded within rectangular solids, such as aircraft fuselages or blimps (Fig. 10, §3). To accommodate arbitrary solid geometries, we expect that our techniques for generating equidense trajectories and in-flight recalibration, and our conclusions about orientation, height and speed, will still apply, but the actual trajectory design (Fig. 6) will need to match the shape of the solid. We leave these extensions to future work.
2.2 Estimating Building Geometry
Given a region of interest, dart conducts a reconnaissance (“recon”) flight to determine the boundary of the building’s roof. It uses this boundary for trajectory design (§2.1).
Goal and requirements. Accurate model reconstruction requires a low, slow flight, which can be battery-intensive. The recon flight helps dart scope model collection to the part of the region of interest that contains the building, reducing battery usage. For instance, if buildings occupy only a small fraction of a large campus, a fast recon flight can reduce overall drone battery consumption. So, we require that the recon flight have as short a duration as possible. In addition, (a) boundary estimation must not assume the prior existence of a 3D model of the building (prior work in the area makes this assumption [lidarbuildingdetection, lidarimagebuildingdetection]); (b) boundary estimation must be robust to nearby objects like trees that can introduce error; and (c) since buildings come in many shapes (e.g., rectangles, squares, hexagons etc.), boundary estimation must generalize to these.
The Recon Trajectory. Similar to model collection, recon uses a rectilinear scan (Fig. 5(a), Fig. A.3). Unlike model collection, however, during recon the drone flies fast (4 m/s) and high (60 m above the building’s roof; we assume the nominal building heights in an area are known, for example, from zoning restrictions), with the LiDAR mounted in a perpendicular orientation, in order to have the shortest-duration flight possible. We justify these parameter choices in §3, but Fig. 10 depicts the intuition behind them. It shows, for an Ouster OS1-64 LiDAR, the ground coverage area as a function of height. Coverage is highest between 40 and 60 m. At a given height, a perpendicular orientation covers a wider swath of ground than a parallel orientation; this allows dart to use a larger scan width (Fig. 10), resulting in a shorter flight. As with model collection, during this flight dart streams point clouds to its cloud component, which runs the boundary detection algorithms described below.
Challenges and Overview. This flight design poses two challenges for boundary detection. First, to detect the building’s boundary, it is still necessary to align all point clouds to the same coordinate frame of reference. In recon, dart cannot use SLAM because fast, high flights can cause SLAM to lose tracking frequently. We show below that, because boundary detection can afford to be approximate, we can use GPS. Second, high and fast flights result in low point density, and boundary detection algorithms must be robust to this.
dart’s building boundary detection takes as input the area of interest and outputs the GPS locations that constitute the boundary of the building. Model collection uses these outputs (§2.1). Boundary detection runs two different algorithms: rooftop surface extraction, followed by boundary estimation.
Step 1. Surface Extraction. The cloud component receives GPS-tagged, compressed point clouds from the drone. It first uncompresses them, then computes the surface normal of every point: the direction normal to the surface formed by the points within a fixed radius of that point. (Surface normal computation is compute-intensive but parallelizable, so we accelerate it on a GPU, as discussed in §3.) Next, dart uses RANSAC [ransac], a plane-fitting algorithm, to segment the LiDAR points into groups of points that fall onto planes. RANSAC is fast, but not robust to outliers: (a) it combines into one surface all LiDAR points that satisfy the same planar equation, including disjoint sets of points (e.g., from trees) at the same height; and (b) a point cloud can contain multiple planes (e.g., the building rooftop, the ground surface, vehicles), which RANSAC cannot distinguish.

To address the first issue, dart removes, from each plane, outlying points that are far from their neighbors in that plane, using a statistical outlier filter. Running this filter on every point cloud can be compute-intensive, so dart tunes the filter's parameters to find a sweet spot between filtering accuracy and performance. To find the rooftop among the multiple detected planes, dart uses the surface normals to compute per-plane statistics (e.g., plane height, 3D centroid, normal variance), and uses these statistics to identify the rooftop among the extracted planes. Intuitively, the rooftop is a large, uniformly oriented surface (low normal variance) that lies above the ground plane; dart identifies the ground plane as the one whose points are consistent with the drone's height. It therefore discards all planes that do not satisfy this definition (including planes with high normal variance and the ground surface), leaving a single plane classified as the roof surface. To guard against false positives, dart uses majority voting: it classifies a plane as the rooftop only if it detects it in multiple consecutive frames. Lastly, although the outlier filter removes small sets of outliers within planes, it cannot remove large clusters of points belonging to objects such as trees near the building. For this, dart clusters points by spatial proximity, so that points belonging to different objects form distinct clusters; since the roof is normally the largest surface, dart simply discards the smaller clusters. To do this in near real-time, dart tunes the clustering algorithm's parameters.
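To illustrate the plane-segmentation step, below is a minimal pure-NumPy RANSAC plane fit; this is a sketch of the textbook algorithm, not dart's PCL-based implementation, and the iteration count, inlier threshold, and names are our choices. dart would run such a fit repeatedly and then pick among the resulting planes using the per-plane statistics described above (height, centroid, normal variance).

```python
import numpy as np

def ransac_plane(points, n_iters=200, inlier_thresh=0.2, rng=None):
    """Fit one plane to an (N, 3) array with RANSAC. Returns the model
    (unit normal n, offset d) of the plane n.p + d = 0 and the boolean
    inlier mask of points within inlier_thresh of that plane."""
    rng = rng or np.random.default_rng(0)
    best_model, best_mask = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n.dot(sample[0])
        mask = np.abs(points @ n + d) < inlier_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_model, best_mask = (n, d), mask
    return best_model, best_mask
```

On a synthetic scene with a flat roof plus scattered clutter, the largest-consensus plane comes out near-horizontal, which is what the rooftop test then checks via the normal variance and height statistics.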
Step 2. Estimating the boundary of the building. The previous step yields parts of the rooftop from each point cloud. In this step, dart uses the drone's GPS location to transform each extracted surface into a common coordinate frame of reference, then combines all surfaces into a single point cloud representing the extracted rooftop of the building. To obtain the building's boundary, it computes the alpha shape [alphashape] of the stitched point cloud. A generalization of the convex hull, an alpha shape is a sequence of piecewise-linear curves in 2D that encloses the point cloud; this allows dart to handle non-convex building footprints as well. Finally, to detect the boundaries of multiple buildings, dart clusters the rooftop point clouds.
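Since an alpha shape reduces to the convex hull as alpha grows, a compact way to see the boundary-extraction step is the 2D convex hull of the stitched rooftop points projected onto the ground plane. The sketch below (ours, using Andrew's monotone chain; dart's implementation uses a true alpha shape, which additionally recovers non-convex footprints) returns the boundary vertices in counter-clockwise order.

```python
def convex_hull_2d(points):
    """Andrew's monotone-chain convex hull. `points` is an iterable of
    (x, y) tuples; returns hull vertices in counter-clockwise order.
    The alpha shape dart uses generalizes this to non-convex outlines."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o): > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                      # build lower hull left-to-right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right-to-left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # endpoints appear in both halves
```

Interior rooftop points drop out and only the footprint's corner points survive; the alpha-shape variant keeps additional boundary vertices wherever the outline bends inward.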
Fig. 10 shows results from the building boundary detection algorithm on real data from our drone. The green rectangle is the ground-truth boundary of the building. The blue points illustrate the drone's recon trajectory, and the grey points depict locations where dart detects a rooftop. Some grey points lie beyond the building's boundary because the LiDAR has a wide field of view and can still see the rooftop after passing it. The red points show the GPS-stitched 3D point cloud of the building's rooftop.
2.3 Point-Cloud Compression
LiDARs generate voluminous 3D data. For instance, the Ouster OS1-64 LiDAR (which we use in this paper) generates 20 point clouds per second that add up to 480 Mbps, well beyond the capabilities of even future cellular standards. dart compresses these point clouds to a few Mbps (1.2 to 4.0) using two techniques: viewpoint filtering and octree compression. We describe these in §A.2.
3 Evaluation
We evaluate (a) dart's ability to reconstruct 3D models in near real-time, and (b) the accuracy of these 3D models (§2.1). We also describe our parameter sensitivity analyses for data collection, and evaluate boundary detection performance.
Implementation. Not counting external libraries and packages, dart is 15,500 lines of code (discussion in §A.3).
Simulations. We evaluate dart using a photorealistic simulator, AirSim [airsim], which models realistic physical environments using a game engine, then simulates drone flights over these environments and records sensor readings from the drone's perspective. AirSim is widely used as a simulation platform for autonomous vehicles and drones by manufacturers, academia, and industry. AirSim has a parametrizable LiDAR model; we used the parameters of the Ouster OS1-64 in our simulation experiments. dart generates trajectories for the AirSim drone, records the data generated by the LiDAR, and processes it to obtain the 3D model. To build the ground-truth 3D model against which we compute our metrics, we flew a drone equipped with a LiDAR several times over the region of interest in AirSim (using exhaustive flights), then stitched the resulting point clouds using ground-truth positioning information from AirSim.
Real-world Traces. In addition, we have collected data from nearly 30 flights (each about 25 minutes) of an M600 Pro drone carrying an Ouster OS1-64 LiDAR over a commercial complex. For almost all experiments, we evaluate dart on both real-world and simulation-driven traces. Simulation-driven traces give us the flexibility to explore the parameter space more fully (as we show below), while real-world traces let us validate these parameter choices and estimate reconstruction accuracy in practice.
Metrics. In this section, we quantify end-to-end latency, 3D model accuracy and completeness (§2.1), and positioning error for a number of experimental scenarios. We also quantify dart's energy efficiency (using flight duration as a proxy for drone battery usage) and the computational requirements of its processing pipeline.





Table 2: Accuracy, completeness, and reconstruction time for dart and the offline baselines.

Approach     Accuracy (m)  Comp. (m)  Recon. time (s)
Large building (100 m x 50 m x 20 m)
Offline-SDF  --            --         3500
Offline-TP   0.87          0.35       3900
dart         0.21          0.24       719
dart-raw     0.14          0.17       719
Small building (50 m x 50 m x 20 m)
Offline-SDF  3.36          1.30       2400
Offline-TP   0.62          0.43       3300
dart         0.25          0.14       656
dart-raw     0.21          0.09       656







Table 3: Real-world reconstruction of a 70 m x 40 m x 20 m building, relative to the raw-trace model.

Approach         BW (Mbps)  Accuracy (m)  Comp. (m)
Offline (SLAM)   480        2.30          1.30
In-flight (GPS)  3.80       1.60          0.53
dart             3.80       0.13          0.09






Table 4: dart generalizes across structure shapes (default flight parameters, low compression).

Structure    Flight dur. (s)  Accuracy (m)  Comp. (m)
Blimp        586              0.23          0.03
Small rect.  656              0.25          0.14
Star-shaped  686              0.37          0.12
Large rect.  719              0.21          0.24
Plus-shaped  1024             0.31          0.06
H-shaped     1044             0.34          0.10
Pentagonal   1361             0.31          0.12
3.1 3D Model Reconstruction
Experiment setup. To evaluate dart's end-to-end performance in building an accurate 3D model in near real-time, we collected and reconstructed 3D models of two buildings in AirSim: (a) a large 50 m x 100 m x 20 m (L x W x H) building, and (b) a small 50 m x 50 m x 20 m building. We then compared the reconstruction accuracy of these models against two offline baselines (i.e., approaches that reconstruct the 3D model after the drone lands): (a) offline reconstruction with the shortest-duration flight (Offline-SDF), and (b) offline reconstruction with dart's trajectory planning (Offline-TP). (We assume both offline approaches know the exact location of the building; without this, SLAM accumulates significant drift and reconstructions are very poor.) We calculated the accuracy and completeness of the models generated by these approaches against ground-truth models from AirSim; lower is better for both metrics. For these experiments, dart uses compressed point clouds whose bandwidth requirements are compatible with today's LTE speeds (i.e., an upload bandwidth of 3.8 Mbps). dart-raw shows accuracy and completeness if dart were to use raw point clouds; we study the effect of compression on model reconstruction further in §3.3.
dart builds significantly more accurate models in less time. As Table 2 shows, dart achieves better than 25 cm accuracy and completeness for both buildings and reconstructs each building in just 10-12 minutes (the flight duration). dart's reconstruction quality is much better than that of the two baselines for two reasons: (a) careful trajectory planning (TP), and (b) in-flight recalibration. Since Offline-SDF does neither, its accuracy and completeness are very poor. To reconstruct the larger (100 m x 50 m x 20 m) building, the drone must fly longer and accumulates significant drift (compared to the smaller building), so its accuracy and completeness degrade badly (hence the missing entries in Table 2). Offline-TP does better because it uses dart's trajectory planning, but still exhibits worse accuracy and completeness than dart because it lacks in-flight recalibration. This shows the importance of a real-time quality feedback signal and explains why offline reconstruction is inaccurate even with uncompressed traces. Even though dart uses compressed point clouds, with in-flight recalibration and trajectory planning its models are up to 3.5x more accurate and complete. If dart were to use raw traces (dart-raw) instead, the remaining loss of accuracy and completeness is attributable to SLAM; relative to a raw trace, compression accounts for a 4-7 cm difference in accuracy and completeness. Moreover, dart reconstructs while the drone is in flight, whereas the two baselines reconstruct offline on uncompressed point clouds, incurring up to 4.6x higher reconstruction time (in practice, offline reconstruction would take even longer, because we did not count the time to upload data to the cloud).
To get a visual feel for the degradation caused by lower accuracy and completeness, consider Fig. A.6, which shows the ground-truth model together with dart's reconstructions. With an accuracy of 0.25 m (using 3.8 Mbps upload bandwidth), the model closely matches the ground truth, but the textured building surface on the right shows some small artifacts. These artifacts arise not from compression, but from SLAM imperfections (§3.3).
dart generalizes to different building shapes. Our results so far, and the descriptions in §2, have focused on rectangular buildings, but dart can accurately reconstruct a variety of building shapes, as Table 4 shows. For these results, we use dart's default flight parameters and low compression. Larger buildings (pentagonal, plus-shaped, and H-shaped) have longer flight durations, partly because of their size and partly because they require two recalibration steps. Even so, for all buildings dart achieves accuracy and completeness of tens of centimeters.
dart generalizes to other types of structures. To show that dart can reconstruct other kinds of 3D structures (e.g., airplanes, helicopters), we modeled a real-world blimp [blimp] (15 m x 60 m x 15 m) in AirSim. dart encloses such structures within a rectangular bounding solid (Fig. 10, §3.4). In less than 10 minutes (Table 4), dart reconstructed the blimp with an accuracy of 23 cm and a completeness of just 3 cm.
High accuracy is possible on real-world traces. Results from our drone flights show that real-world data yields comparable performance (Table 3). In these experiments, we reconstructed the 3D model of a real-world 70 m x 40 m x 20 m building. Because we lack reference ground truth for real-world data, we compare against the 3D model generated from raw, uncompressed traces. Offline reconstruction using SLAM after the drone lands fails completely, for the reasons mentioned above (no trajectory planning and no recalibration). With GPS, in-flight reconstruction is possible, but its accuracy and completeness (1.60 m and 0.53 m) make such 3D models unusable. With dart, on the other hand, we can build accurate and complete 3D models, with an accuracy of 13 cm and a completeness of 9 cm (top-down view of the 3D model in Fig. A.2).






Table 5: Per-frame streaming and processing times at the three compression levels.

                      Low   Medium  High
Bandwidth (Mbps)      3.80  2.50    1.27
Network latency (ms)  62.9  65.3    65.3
Compression (ms)      15.5  14.6    13.7
Decompression (ms)    5.05  3.07    3.03
SLAM (ms)             34.5  30.8    23.9
Total (ms)            117   113     106
3.2 Performance
Real-time 3D reconstruction over LTE is feasible. To validate that dart can build a 3D model end-to-end in near real-time, we used our implementation to conduct an experiment in which we replayed 15 minutes' worth of real-world data on the drone compute platform (a Jetson TX2). It compressed and streamed point clouds over an LTE connection to a 16-core AWS VM with 64 GB RAM and an Nvidia T4 GPU. (Our experiment ran only model collection; recon also runs in real time, as discussed below.)
To compress point clouds, we used three levels of compression (low, medium, and high), corresponding to different combinations of octree resolution and point resolution (§2.3); the effect of compression is studied in §3.3. In our experiments with our drone, we have found achievable LTE throughput to range from 1-4 Mbps; we chose these compression levels to match this range. (In §3.3, we discuss how 5G deployments would alter these conclusions.)
At all three compression levels, dart streamed point clouds in real time (Table 5); the total end-to-end processing time per frame is about 110 ms, of which nearly 65 ms is network latency. Thus, dart builds the 3D model while the drone is in flight, incorporates a frame within about 100 ms of receiving it, and can make a complete 3D model of a building available about 100 ms after receiving the last frame.
Table 6: Per-frame execution time of dart's processing pipeline.

dart component    Subcomponent           Time (ms)
Recon phase       3D frame compression   13.0
                  3D frame extraction    3.0
                  GPU normal estimation  76.0
                  RANSAC plane-fitting   5.0
                  Outlier removal        0.2
                  Rooftop detection      0.005
                  Rooftop extraction     6.0
                  Rooftop stitching      3.0
                  Total time             100
Model collection  LiDAR SLAM             37.0
                  3D Reconstruction      10.3
dart supports full frame rate processing. We profiled the execution time of each dart component on a 15-minute real-world trace. Point cloud compression executes on the drone; the other components run on the AWS VM mentioned above, with surface normal computation for building detection offloaded to the GPU. During recon, point cloud compression takes 13 ms per frame (Table 6). Extracting the building geometry requires 100 ms per frame; at these rates we can sustain about 10 fps, so with a 20 fps LiDAR we process roughly every other frame. Despite this, our building detector is quite accurate (§3.5). During model collection, SLAM requires 37 ms per frame and 3D reconstruction about 10 ms (Table 6). The former uses 8 cores, and we run the two components as a pipeline to sustain 20 fps. Thus, a moderately provisioned cloud VM suffices to run dart at full frame rate, with an end-to-end compute latency of about 100 ms for reconnaissance and 50 ms for model collection.
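The SLAM-plus-reconstruction pipelining can be sketched as a simple producer/consumer arrangement. In the sketch below (ours, not dart's code), `slam_step` and `reconstruct_step` are stand-ins for the real stages, which take roughly 37 ms and 10 ms per frame; overlapping them bounds per-frame latency by the slower stage rather than their sum.

```python
import queue
import threading

def run_pipeline(frames, slam_step, reconstruct_step, depth=4):
    """Two-stage pipeline: stage 1 (slam_step) runs on the caller's
    thread, stage 2 (reconstruct_step) on a worker, connected by a
    bounded FIFO so steady-state throughput is set by the slower stage."""
    q = queue.Queue(maxsize=depth)
    model = []

    def stage2():
        while True:
            posed = q.get()
            if posed is None:        # sentinel: stage 1 has finished
                return
            model.append(reconstruct_step(posed))

    worker = threading.Thread(target=stage2)
    worker.start()
    for frame in frames:
        q.put(slam_step(frame))      # stage 1: pose each frame
    q.put(None)
    worker.join()
    return model
```

With per-frame stage costs of ~37 ms and ~10 ms, the pipeline sustains roughly 1/0.037 s, i.e. about 27 frames per second, comfortably above the LiDAR's 20 fps.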
3.3 Ablation Studies
In this section, we explore how dart’s techniques contribute to 3D reconstruction quality and performance.
Table 7: Flight duration, accuracy, and completeness without (w/o) and with (w) recalibration.

Structure type  Flight dur. (s)  Accuracy (m)  Comp. (m)
                w/o    w         w/o   w       w/o   w
Star-shaped     613    686       1.05  0.37    0.39  0.12
Small rect.     573    656       0.63  0.25    0.40  0.14
Large rect.     694    719       0.96  0.21    0.39  0.24
Plus-shaped     766    1024      0.51  0.31    0.08  0.06
H-shaped        866    1044      1.10  0.34    0.27  0.10
Pentagonal      1062   1361      1.47  0.31    0.42  0.12
Table 8: Flight parameter trade-offs on the large building. The rightmost column is an exhaustive flight that uses the model-collection parameters without a recon phase.

                   0.5 m/s  1.0 m/s (dart)  2.0 m/s  Exhaustive
Height (m)         30       40              50       40
Recon (s)          136      136             136      --
Model coll. (s)    673      343             72       1520
Recalib. (s)       476      240             272      --
Total time (s)     1285     719             480      1520
Accuracy (m)       0.43     0.14            0.91     --
Completeness (m)   0.34     0.17            0.45     --
Recalibration helps reduce error. To show the effect of in-flight recalibration, we built online 3D models of the 7 large buildings mentioned above using dart with (w) and without (w/o) recalibration in AirSim. We evaluate flight duration and reconstruction quality at low compression (3.8 Mbps upload bandwidth), using the accuracy and completeness metrics. Table 7 shows that, at the expense of only 18% (150 seconds) longer flights on average, recalibration improves dart's accuracy by 65% (65 cm) and completeness by 55% (20 cm). Larger buildings (plus-shaped, H-shaped, and pentagonal) require longer flights, which accumulate more drift; this triggers relatively more recalibration flights and hence longer flight durations. Even so, dart reconstructs these buildings accurately, demonstrating the importance of recalibration.
Short flight durations can produce accurate models. dart strives to limit drone battery depletion by generating short-duration flights without sacrificing accuracy or completeness. To show that dart's defaults of 1 m/s speed and 40 m height represent the best point in this trade-off space, we compare them to a lower, slower flight (30 m, 0.5 m/s) and a faster, higher flight (50 m, 2 m/s). Table 8 shows that, on the large building, the lower, slower flight has a longer trajectory, resulting in more recalibrations; the resulting model has worse accuracy and completeness, because recalibration can limit drift error but not reverse it. A faster, higher flight has a slightly shorter trajectory, but the resulting model's accuracy is very poor, because there is less overlap between point clouds at higher speeds (§2.1). Finally, Table 8 also shows the benefit of a recon flight: an exhaustive flight that uses the model collection parameters and does not perform recon is 3x longer than dart's flight, and accumulates significant drift, resulting in poor-quality 3D models. Results on the small building are qualitatively similar (omitted for brevity).
dart builds accurate models at low bandwidths. We explore the impact of compression on accuracy and completeness using (a) a synthetic building in AirSim and (b) real-world traces. In addition to the three compression levels discussed earlier, we compute accuracy and completeness for (a) raw point clouds, (b) viewpoint filtering only, and (c) lossless compression. The first two serve as points of comparison, while the third explores reconstruction performance at the higher bandwidths that would be available, for example, in 5G deployments.
As Table 9 shows, viewpoint filtering alone achieves roughly 10x compression, and low compression is an order of magnitude more efficient still. Despite this, dart achieves high-quality reconstruction. For the AirSim building, the raw point cloud has an accuracy of 0.21 m and a completeness of 0.09 m, attributable entirely to SLAM error. Viewpoint filtering does not degrade accuracy, since it only omits zero returns. Low compression, with a bandwidth of 3.8 Mbps (easily achievable over LTE, and over 100x more compact than the raw LiDAR output), adds only 4 cm to accuracy and 5 cm to completeness. Medium and high compression have significantly poorer accuracy and completeness. Similar results hold for the other AirSim building, which we omit for brevity.
Results from our drone flights validate that real-world data of a large building (dimensions in Table 9) yields comparable performance (Table 9). Since we lack reference ground truth for real-world data, we compare against the 3D model generated from raw traces. With real-world traces, we can build 3D models whose accuracy and completeness are within 9-13 cm of the uncompressed traces for low compression, and about 16-23 cm for medium compression. This suggests that highly compressed point clouds do not significantly impact accuracy and completeness.





Table 9: Impact of point-cloud compression on accuracy and completeness (for the real-world building, the raw-trace model is the reference, hence its zero entries).

Compression scheme   BW (Mbps)  Accuracy (m)  Comp. (m)
Real-world 70 m x 40 m x 20 m large building
Raw                  480.0      0.00          0.00
Viewpoint filtering  42.7       0.00          0.00
Lossless             7.86       0.06          0.07
Low                  3.80       0.13          0.09
Medium               2.50       0.23          0.16
High                 1.27       0.28          0.29
AirSim 50 m x 50 m x 20 m small building
Raw                  480.0      0.21          0.09
Viewpoint filtering  42.7       0.21          0.09
Lossless             7.86       0.22          0.10
Low                  3.80       0.25          0.14
Medium               2.50       0.66          0.21
High                 1.27       0.73          0.24
Higher bandwidths provide centimeter-level improvements. The emergence of 5G promises larger upload bandwidths. However, as Table 9 illustrates, the room for improvement in accuracy and completeness is small. For the AirSim building, the gap between raw point clouds and low compression is only 4 cm in accuracy (5 cm in completeness); for the real-world building, it is 7 cm (2 cm). Lossless point cloud compression, which requires 7.86 Mbps of bandwidth, comes within 1 cm of the raw point cloud's accuracy and completeness for the AirSim building, and within 7 cm for the real-world building.
Lower target density worsens completeness. To demonstrate that users can use the target density knob to obtain less complete models more quickly, we ran dart (with recalibration) at two different densities: 7.5 points per m² and 1 point per m². For the former, accuracy and completeness were 0.21 m and 0.14 m; for the latter, 0.68 m and 0.17 m. The lower-density flight took 20% less time. As expected, completeness is worse at lower target densities. Accuracy is also worse at the lower density because adjacent scan lines overlap less; put another way, a side benefit of specifying a higher density is the higher accuracy that comes from scan-line overlap.
3.4 Data Collection
dart relies on a careful parameter sensitivity analysis (in both simulation and on real-world traces) to determine model collection flight parameters: speed, height, and orientation (§2.1). We evaluated SLAM error for every combination of drone speed (0.5 m/s to 3 m/s), distance from the building (10 m to 40 m), and orientation (parallel to perpendicular); we present a subset of these results for space reasons. For these experiments, we use the trajectory described in Fig. 5(c) and report average numbers for each experiment.
Best choice of orientation is parallel. Fig. 12 plots SLAM error as a function of LiDAR orientation (Fig. 5) with respect to the direction of motion. A parallel orientation has the lowest SLAM error (in Fig. 12, yaw 0° corresponds to parallel and yaw 90° to perpendicular) because it has the highest overlap between successive frames; as yaw increases, overlap decreases, resulting in higher SLAM error (§2.1).
Best choice of distance is 20 m. Fig. 12 also plots SLAM error as a function of the drone's distance from the building surface for the parallel orientation. Error increases slowly with distance; beyond 20 m from the building, the error exceeds 1 m. Point densities decrease with distance, which hurts SLAM's ability to track features/points across frames (§2.1). Rather than fly closer, dart operates at a 20 m distance (a 40 m height, since the buildings in our experiments are 20 m tall) to reduce flight duration.
Best choice of speed is 1 m/s. Speed impacts SLAM positioning error significantly (Fig. 12). Beyond 1 m/s, SLAM cannot track frames accurately because of the lower overlap between frames (§2.1). Below 1 m/s (i.e., at 0.5 m/s), the flight takes twice as long as at 1 m/s, which accumulates drift error. To achieve accurate reconstruction, dart therefore flies the drone at 1 m/s.
3.5 Boundary Detection
Methodology and metrics. We use two metrics for building boundary estimation: accuracy and completeness. Accuracy is the average (2D) distance between each point (quantized to 0.1 m) on the predicted boundary and the nearest point on the actual building boundary. Completeness is the average distance between each point on the actual boundary and the nearest point on dart's predicted boundary. Lower values are better for both.
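These two metrics are directed average nearest-neighbor distances. A brute-force sketch (ours; dart's implementation may differ) makes the asymmetry between them explicit:

```python
import numpy as np

def avg_nearest_distance(src, dst):
    """Mean Euclidean distance from each point in `src` (N x 2) to its
    nearest point in `dst` (M x 2). Brute force, which is fine for
    boundary-sized point sets."""
    diffs = src[:, None, :] - dst[None, :, :]       # (N, M, 2) pairwise
    return np.sqrt((diffs ** 2).sum(-1)).min(axis=1).mean()

def boundary_metrics(predicted, actual):
    """Accuracy: predicted -> actual; completeness: actual -> predicted.
    Lower is better for both."""
    return (avg_nearest_distance(predicted, actual),
            avg_nearest_distance(actual, predicted))
```

A sparse but correct prediction thus scores perfect accuracy while completeness penalizes the parts of the true boundary it missed, which is exactly the distinction the text draws.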
We use both realworld traces collected from our dart prototype and synthetic traces from AirSim. To compute ground truth for realworld traces, we pinpointed the building’s boundary on Google Maps [google_maps]. For AirSim, we collected the ground truth from the Unreal engine.
Boundary detection can run at full frame rate. Table 6 shows the time taken by each component of boundary detection on our real-world traces, on a single core of a GPU-equipped desktop. The average processing time per point cloud is 100 ms, dominated by GPU-accelerated surface normal estimation (76 ms). This sustains 10 fps; since our LiDAR generates 20 fps, dart uses every other frame, without sacrificing accuracy.
Boundary detection is accurate. To evaluate the accuracy of dart's boundary extraction, we experimented with 3 real-world traces collected over a 70 m x 60 m x 20 m building. On these traces, dart's average accuracy is 1.42 m and its completeness 1.25 m, even at the highest compression and when it samples every other frame.
Other results. We extensively evaluated the robustness of dart's boundary detection to different building shapes (Table A.1), point cloud compression (Table A.2), and point cloud subsampling. We also performed an extensive parameter study to find the right flight parameters: speed (Fig. A.5), height (Fig. A.4), and orientation. For brevity, the results and discussion appear in the appendix (§A.4). We summarize two results here. First, recon flights can be short, because boundary detection is insensitive to point density and overlap; recon can therefore use a perpendicular orientation and fly 60 m from the building at 4 m/s. Second, boundary detection tolerates subsampling down to one point cloud per second.
4 Related Work
Networked 3D sensing and drone positioning. Some recent work has explored transmitting 3D sensor information over wireless networks, in the context of cooperative perception [AVR] and real-time 3D map updates [CarMap]; compared to dart, these systems use different techniques to overcome wireless capacity constraints. The robotics literature has studied efficient coverage path planning for single [sensorplanning] and multiple drones [ubanc]; dart's trajectory design is shaped by more intricate constraints such as SLAM accuracy and equi-density goals. Accurately inferring drone motion is important for SLAM-based positioning [observability]; Cartographer [Cartographer], which dart uses for positioning, relies on motion models and on-board IMUs to estimate motion. In future work, dart could use drone orchestration systems [beecluster] for larger, campus-scale reconstruction with multiple drones.
Offline reconstruction using images. UAV photogrammetry [federman2017] reconstructs 3D models offline from 2D photographs. Several pieces of work [7139681, 7989530, 8124461, 8628990] study the use of cameras (either RGB or RGB-D) on UAVs for 3D reconstruction. Prior work [7139681] has proposed a real-time, interactive interface that lets a human guide the reconstruction process. The most relevant of these [mostegel2016uav, 7422384] predicts the completeness of 3D reconstruction in-flight, using a quality confidence predictor trained offline, to improve a subsequent offline 3D reconstruction. Unlike dart, however, this work requires human intervention, computes the 3D model offline, requires close-up flights, cannot ensure equi-dense reconstructions, cannot dynamically recalibrate for drift, and is not an end-to-end system. A body of work has explored factors affecting reconstruction accuracy: sensor error [6899451], tracking drift, and the degree of image overlap [7139681, LIENARD2016264]. Other work [8793729, 8628990, bylow2019combining] has explored techniques to reduce errors by fusing depth information or by image manipulations such as upscaling. Unlike dart, almost all of this work reconstructs the 3D model offline.
Offline reconstruction using LiDAR. 3D model reconstruction using LiDAR [uav_lidar_1, uav_lidar_2] relies on additional positioning infrastructure, such as base stations for real-time kinematic (RTK) positioning, and on long-range specialized LiDAR, to achieve model accuracy of tens of centimeters. dart explores a different part of the design space: online reconstruction with sub-meter accuracy using commodity drones, GPS, and LiDAR. More recent work has explored drone-mounted LiDAR for offline reconstruction of tunnels and mines, but requires specialized LiDARs and a human in the loop [Prometheus, hovermap] for drone guidance (either manual or via a set of waypoints).
Rooftop boundary detection. Prior work has used infrared sensors, RGB-D cameras [rgbfasterboundarydetection], and LiDAR [lidarbuildingdetection] fused with monocular cameras [lidarimagebuildingdetection, lidarorthophotoboundarydetection]. These approaches assume a pre-existing stitched 3D point cloud [lidarbuildingdetection] or orthophoto [lidarorthophotoboundarydetection] and are not designed to operate in real time. dart's boundary detection accuracy is comparable to these pieces of work, even though it makes neither assumption.
5 Conclusions
In this paper, we have taken a step towards accurate, near-real-time 3D reconstruction using drones. Our system, dart, uses novel techniques to navigate the tension between cellular bandwidths, SLAM positioning errors, and compute constraints on the drone. It contains algorithms for estimating building geometry, for detecting excessive SLAM drift, and for recovering from it. dart achieves reconstruction accuracy within tens of centimeters in near real-time, even after compressing LiDAR data enough to fit within achievable LTE speeds. Future work includes using more sophisticated drone battery models, cooperative reconstruction of large campuses using multiple drones, and generalizing further to structures of arbitrary shape.
References
A Appendix
A.1 Drone compute
We ran RANSAC, the plane-fitting module used in our pipeline, on a real-world point cloud trace using the drone's compute platform (a Jetson TX2). We found (Fig. A.1) that the TX2 takes, on average, 0.5 seconds to process a single point cloud. The 64-beam LiDAR generates 20 point clouds per second, and plane-fitting accounts for only 5% of our reconstruction pipeline's execution time; at 20 frames per second, the TX2 would therefore need 200 seconds (0.5 s x 20 frames / 0.05) to process a single second's worth of LiDAR data. For this reason, we offload computation from the drone to the cloud.
A.2 Point cloud compression
dart uses two techniques, viewpoint filtering and octree compression, to compress LiDAR point clouds to within 1.2 to 4.0 Mbps so they can be transmitted over LTE.
Viewpoint filtering. The OS1-64 LiDAR has a 360° horizontal field-of-view (FoV) and a 45° vertical FoV. For a drone-mounted LiDAR (Fig. 5), only a portion of the full 360° contains useful information: beams directed towards the sky, or towards objects beyond LiDAR range, generate zero returns. Viewpoint filtering removes these zero returns. In practice, we have found it important to also filter out returns from the drone's own body and returns beyond the nominal range of the LiDAR, since these are erroneous; so dart filters all points closer than 5 m or further than 120 m.
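A minimal version of this filter (our sketch; the 5 m and 120 m cut-offs are the ones given above) keeps only returns whose range lies within the LiDAR's usable band:

```python
import numpy as np

def viewpoint_filter(points, near=5.0, far=120.0):
    """Drop zero returns and returns outside the LiDAR's nominal range.
    `points` is an N x 3 array in the sensor frame; zero returns appear
    at (or very near) the origin, so the near cut-off removes them along
    with hits on the drone's own body."""
    ranges = np.linalg.norm(points, axis=1)
    return points[(ranges >= near) & (ranges <= far)]
```

Because a large fraction of beams point at the sky during flight, this single pass already accounts for the roughly 10x reduction reported in §3.3.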
Octree compression. After filtering, dart compresses the retained data using a standard octree compression algorithm [octree] designed specifically for point clouds (and hence better than data-agnostic compression techniques like gzip). An octree is a three-dimensional tree data structure in which each node is a cube spanning a 3D region and has exactly eight children. The dimensions of the cubes at the leaves determine the octree resolution; the numerical precision used to encode point positions determines the point resolution. Octree compression efficiently encodes empty leaves and empty internal nodes (those whose descendant leaves are all empty). It also performs inter-frame compression (similar to video encoders), efficiently encoding leaves and internal nodes that are unchanged between successive point clouds. dart chooses different values of the octree resolution and point resolution, the two parameters that govern the compressibility of point clouds, to achieve point-cloud transmission rates of 1.2-4 Mbps (§3), well within achievable LTE speeds.
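The lossy core of octree compression, quantizing points into leaf cubes and keeping only the occupied leaves, can be sketched as follows. This is our illustration, not PCL's encoder, which additionally entropy-codes the tree structure and per-leaf residuals (at the chosen point resolution) and delta-codes unchanged nodes across frames.

```python
import numpy as np

def occupied_leaves(points, leaf_size):
    """Quantize an N x 3 point cloud to octree leaf cubes of side
    `leaf_size` (the octree resolution) and keep one integer key per
    occupied leaf. Coarser leaves -> fewer distinct keys -> fewer bits,
    which is the lossy knob dart turns for low/medium/high compression."""
    keys = np.floor(points / leaf_size).astype(np.int64)
    return np.unique(keys, axis=0)
```

Doubling `leaf_size` shrinks the leaf count (and thus the bitrate) at the cost of positional error up to half a leaf, which is why medium and high compression trade centimeters of accuracy for bandwidth in Table 9.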
A.3 Implementation Details
We have implemented dart using the Point Cloud Library (PCL [octree]), the Cartographer [Cartographer] LiDAR SLAM implementation (Cartographer can be replaced by other LiDAR SLAM algorithms like LOAM [zhang2014loam]), the Boost C++ libraries [Boost], and the Robot Operating System (ROS [ros]). For the recon phase, we used PCL functions for plane-fitting, outlier removal and clustering. Our compression and extraction modules also use PCL and are implemented as ROS nodes. The drift detection module uses a Python package for Umeyama alignment [grupp2017evo]. Not counting the libraries and packages it uses, dart is 15,500 lines of code.
A.4 Recon Flight
The goal of the recon flight is to survey the area and find the boundary of the structure as fast as possible. dart uses a flight trajectory, shown in Fig. A.3, in which parallel scans are separated by a fixed scan width. In designing the recon flight, dart can vary the height, speed and LiDAR orientation of the drone. To find the right set of parameters, we performed an exhaustive parameter sweep.
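The recon trajectory is a standard boustrophedon ("lawnmower") pattern: parallel passes over the survey area, alternating direction, spaced one scan width apart. A sketch of such a waypoint generator in Python (the function name and signature are hypothetical, not dart's planner interface):

```python
def recon_waypoints(area, scan_width, height):
    """Boustrophedon ('lawnmower') waypoints over a rectangular area.

    area: (x_max, y_max) extent in metres; scan_width: spacing between
    parallel scan lines; height: constant flight altitude in metres.
    Returns (x, y, z) waypoints, two per scan line.
    """
    x_max, y_max = area
    waypoints, x, forward = [], 0.0, True
    while x <= x_max:
        # Alternate the sweep direction on each successive scan line.
        y0, y1 = (0.0, y_max) if forward else (y_max, 0.0)
        waypoints.append((x, y0, height))
        waypoints.append((x, y1, height))
        x += scan_width
        forward = not forward
    return waypoints
```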
Optimum height for recon. To find the optimum height for the recon flight, we planned recon trajectories over a 20 m building (within a 300 m x 300 m area) in AirSim at heights from 40 m to 90 m. We flew the drone and ran boundary estimation on the collected highly compressed LiDAR point clouds at 10 Hz. For each height, we collected data and ran the boundary detection module five times. Higher flights increase the scan width (Fig. 10) at the expense of point density. However, dart's boundary detection algorithm is robust to lower-density point clouds and can accurately estimate the boundary of the building from heights of up to 80 m. Fig. A.4 shows the 2D boundary detection accuracy and completeness (lower is better for both) and the flight duration (a proxy for battery usage) as a function of drone height. We find that at 80 m (i.e., 60 m above the building), dart jointly optimizes for battery efficiency and boundary detection accuracy: it completes the recon flight in 150 seconds and estimates the boundary to within 2.5 m accuracy and completeness. Beyond 80 m, the scan width and point density decrease, resulting in longer flights and higher (i.e., worse) boundary detection accuracy and completeness.
Optimum speed for recon. To find the optimum speed for the recon flight, we planned a recon trajectory over the same 20 m building at a height of 80 m from the ground. We flew the drone along the planned trajectory at speeds from 1 m/s to 8 m/s and ran boundary detection on the highly compressed point clouds at 10 Hz. For each speed, we collected data and ran the boundary detection module five times. Fig. A.5 illustrates the effect of drone speed on boundary detection accuracy, completeness and flight duration. A higher speed results in a shorter flight, but at the expense of boundary detection accuracy and completeness. Even so, dart robustly extracts the boundary at speeds of up to 6 m/s. At higher speeds, the overlap between consecutive frames is smaller, so dart cannot accurately stitch the frames together. dart therefore flies the drone at the sweet spot of 4 m/s, where the flight duration is approximately 150 seconds and accuracy and completeness are within 2.5 m.
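The overlap argument can be made concrete with a back-of-envelope calculation: between consecutive frames the drone advances speed/rate metres, while the along-track LiDAR footprint on flat ground is roughly 2·h·tan(FoV/2), using the 45° vertical FoV from above. A sketch under these assumptions (flat ground, nadir-pointing sensor; not dart's actual computation):

```python
import math

def frame_overlap(speed, rate_hz, height, fov_deg=45.0):
    """Fraction of the along-track LiDAR footprint shared by two
    consecutive frames. footprint = 2*h*tan(fov/2); the drone
    advances speed/rate_hz metres between frames. Back-of-envelope,
    assuming a flat ground plane below the drone."""
    footprint = 2.0 * height * math.tan(math.radians(fov_deg) / 2.0)
    advance = speed / rate_hz
    return max(0.0, 1.0 - advance / footprint)
```

At 80 m altitude and 10 Hz, even 8 m/s leaves most of the footprint shared; the stitching failures at high speed arise because the useful overlap on the structure itself is far smaller than this flat-ground bound.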
Optimum LiDAR orientation. LiDAR orientation controls the scan width and the overlap between successive point clouds. A parallel orientation gives larger overlap but a smaller scan width; a perpendicular orientation gives smaller overlap but a larger scan width. A larger scan width shortens the flight (Fig. A.3), while a larger overlap improves scan matching accuracy. Since dart uses GPS (rather than scan matching) for stitching in the recon phase, it is robust to low overlap. Hence, to minimize flight duration, it uses a perpendicular orientation of the LiDAR. We conducted experiments (omitted for brevity) with different orientations of the LiDAR and confirmed that a perpendicular orientation minimizes flight duration without any loss in accuracy or completeness.
Boundary extraction for different buildings. To show that dart can accurately extract the 2D boundary of arbitrary buildings, we collected LiDAR traces of a drone flying over five different buildings in AirSim at a height of 80 m and a speed of 4 m/s, collecting data over each building five times. We then ran boundary detection on the highly compressed point clouds at 10 Hz. Table A.1 summarizes the boundary detection accuracy, completeness and flight duration. As expected, the flight duration is independent of the underlying building. For all five shapes, dart extracts the boundary to within about 2.5 m accuracy and completeness, showing that its boundary detection generalizes across building shapes.
Table A.1: Boundary extraction for different building shapes.

Building shape | Flight duration (s) | Accuracy (m) | Completeness (m)
Star-shaped    | 150 | 1.39 | 1.67
H-shaped       | 150 | 1.31 | 1.83
Plus-shaped    | 150 | 1.35 | 1.55
Pentagon       | 150 | 2.58 | 2.58
Rectangular    | 150 | 2.50 | 2.53
Effect of point cloud compression. To evaluate the effect of point cloud compression on boundary extraction, we compressed a real-world trace collected over the 70 m x 40 m x 20 m building with the four different compression profiles described above, and ran our boundary extraction algorithm on the compressed traces. Table A.2 shows that dart's boundary extraction is robust to compression: while reducing bandwidth by a factor of 377, the high-compression profile sacrifices only 36 cm in accuracy and 24 cm in completeness. With the higher bandwidths promised by the emergence of 5G, dart could achieve the same boundary extraction accuracy as with an uncompressed trace.
Table A.2: Effect of compression on boundary extraction.

Compression profile | Bandwidth (Mbps) | Accuracy (m) | Completeness (m)
Uncompressed | 480.0 | 1.09 | 1.09
Viewpoint    | 42.7  | 1.09 | 1.09
Lossless     | 7.86  | 1.09 | 1.09
Low          | 3.80  | 1.09 | 1.10
Medium       | 2.50  | 1.13 | 1.07
High         | 1.27  | 1.45 | 1.33
Effect of subsampling. dart's boundary detection algorithm runs at 10 fps, while an Ouster 64-beam LiDAR generates 20 point clouds per second, so the boundary detection algorithm must be robust to subsampling of point clouds. Our evaluations show that, for a drone traveling at 4 m/s, it works well even when using only one point cloud every 3 seconds. Because dart's boundary detection uses GPS for stitching, it does not need overlap between 3D frames.
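Selecting one frame every few seconds from the 20 Hz stream is a simple stride over the frame sequence; a sketch (Python, names illustrative):

```python
def subsample(frames, period_s, frame_rate_hz=20.0):
    """Keep one point cloud every `period_s` seconds from a stream
    captured at `frame_rate_hz` (20 Hz for the LiDAR used here)."""
    step = max(1, int(round(period_s * frame_rate_hz)))
    return frames[::step]
```

With `period_s=3.0`, only one frame in sixty is processed, matching the one-cloud-every-3-seconds operating point above.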
A.5 Data Collection
In this section, we perform a parameter sensitivity study to find the optimum parameters for running SLAM accurately on real-world UAV flights. To do this, we report the positioning error of SLAM. Lacking accurate ground truth in the real world, we compare SLAM positions against a GPS trace. Positioning accuracy is directly related to 3D model RMSE because SLAM poses are used to position the point clouds when generating the 3D model: a higher positioning error leads to a higher reconstruction error, and vice versa.
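For reference, comparing a SLAM trajectory against GPS follows the usual recipe in trajectory-evaluation tools such as evo [grupp2017evo]: rigidly align the estimated trajectory to the reference (Umeyama/Kabsch, here without scale) and report the post-alignment RMSE. A minimal NumPy re-implementation for illustration, not the package's actual code:

```python
import numpy as np

def align_rmse(slam_xyz, gps_xyz):
    """Rigidly align an (N, 3) SLAM trajectory to paired GPS positions
    (Kabsch/Umeyama without scale), then return the RMSE in metres."""
    mu_s, mu_g = slam_xyz.mean(axis=0), gps_xyz.mean(axis=0)
    # Cross-covariance of the centered trajectories.
    H = (slam_xyz - mu_s).T @ (gps_xyz - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so R is a proper rotation.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_s
    aligned = slam_xyz @ R.T + t
    return float(np.sqrt(((aligned - gps_xyz) ** 2).sum(axis=1).mean()))
```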
Effect of drone speed. Because GPS is itself erroneous, we only draw qualitative conclusions. As Table A.3, derived from our drone traces, shows, slower flights have lower SLAM error than faster ones, and parallel orientations have lower SLAM error than perpendicular ones.
Effect of drone height. Similarly, SLAM error increases with height and, in the real-world traces, the parallel orientation is significantly better than the perpendicular orientation (Table A.4). At a distance of 20 m from the surface of the building, the parallel orientation has the minimum positioning error of 1.25 m. Beyond 40 m for the parallel orientation and 20 m for the perpendicular orientation, SLAM loses track completely because of the lower point density.
Table A.3: SLAM positioning error (m) vs. model collection speed.

LiDAR orientation | 1.5 m/s | 3.0 m/s
Parallel      | 1.25 | 3.33
Perpendicular | 3.12 | 7.64
Table A.4: SLAM positioning error (m) vs. model collection height above the building.

LiDAR orientation | 20 m | 40 m | 60 m
Parallel      | 1.25 | 5.41 | lost track
Perpendicular | 2.18 | lost track | lost track