Most autonomous micro aerial vehicles (MAVs) to date rely on depth sensing through, e.g., laser scanners, RGB-D cameras, or stereo cameras. Payload and power capacity are, however, limiting factors for MAVs, so sensing principles that require as little size, weight, and power as possible are desirable.
In recent work, we proposed large-scale direct simultaneous localization and mapping (LSD-SLAM) for handheld monocular cameras in real time. This method tracks the motion of the camera towards reference keyframes and, at the same time, estimates semi-dense depth at high-gradient pixels in the keyframe. In this way, it avoids strong regularity assumptions such as planarity in textureless areas. In this paper, we demonstrate how this method can be used for obstacle-avoiding autonomous navigation and exploration with a consumer-grade MAV. We integrate our approach on the recently introduced Parrot Bebop MAV, which comes with a 30 fps high-resolution fisheye video camera and integrated attitude sensing and control.
Our proposed two-step exploration strategy is specifically designed for semi-dense reconstructions as obtained with LSD-SLAM. A simple but effective local exploration strategy, coined star discovery, safely discovers free and occupied space in the local surroundings of a specific position in the environment. In contrast to existing autonomous exploration approaches, our method takes into account the semi-dense depth measurement principle of LSD-SLAM, which is based on motion parallax. A global exploration strategy then determines interesting volumes for further local explorations in order to sequentially discover novel parts of the environment. We demonstrate the properties of our exploration strategy in several experiments with the Parrot Bebop.
II Related Work
Autonomous exploration by mobile robots has been investigated for many years, mainly relying on laser scanners. In his seminal work, Yamauchi proposed the so-called frontier-based exploration strategy, which favors exploring the frontiers of the unexplored space in the map. Some methods define a utility function [3, 4], e.g., on paths or view poses, that trades off discovered area against travel cost. The approaches in [5, 6, 7] combine the probabilistic measure of information gain with travel cost in a measure of utility. Rekleitis optimizes a utility function that favors the reduction of uncertainty in the map while also trying to achieve a fast exploration of the map. All of the above methods rely on a dense map representation of the environment acquired with 2D and 3D laser range sensors. These methods are not directly applicable in our case, since the exploration process additionally needs to consider how dense map information can be obtained from the visual semi-dense depth measurements of our SLAM system.
Very recently, autonomous exploration by flying robots has attracted attention [9, 10, 11, 12]. Nuske et al. explore rivers using an MAV equipped with a continuously rotating 3D laser scanner. They propose a multi-criteria exploration strategy to select goal points and traversal paths. Heng et al. propose a two-step approach to visual exploration with MAVs using depth cameras. Efficient exploration is achieved by maximizing information gain in a 3D occupancy map, while the coverage of the viewed surfaces along the path to the exploration pose is also taken into account. To avoid building a dense 3D map of the environment and applying standard exploration methods, Shen et al. propose a particle-based frontier method that represents known and unknown space through samples. This approach also relies on depth sensing through a 2D laser scanner and a depth camera. Yoder and Scherer explore the frontiers of surfaces measured with a 3D laser scanner. Bircher et al. acquire a 3D occupancy map of the environment using an on-board depth sensor and select next best views for exploration by growing a random tree path planner in the free space of the current map, choosing the branch that maximizes the amount of unmapped space uncovered along the path. These approaches also use dense range or depth sensors, which allow existing exploration methods from mobile robotics research to be adapted. Desaraju et al. use a monocular camera and a dense motion stereo approach to find suitable landing sites for a UAV.
We propose an exploration method that is suitable for lightweight, low-cost monocular cameras. Our visual navigation method is based on large-scale direct SLAM, which recovers semi-dense reconstructions. We explicitly take the semi-dense nature of this depth information and its measurement process into account for obstacle mapping and exploration.
III Autonomous Quadrocopter Navigation Using Monocular LSD-SLAM
We build on the TUM ARDrone package by Engel et al., which was originally developed for the Parrot ARDrone 2.0. We transferred the software to the Parrot Bebop platform, which comes with similar sensory equipment and onboard control. The Parrot Bebop is equipped with an IMU comprising a 3-axis magnetometer, gyroscope, and accelerometer. Like the Parrot ARDrone 2.0, it measures height using an ultrasonic sensor, an air pressure sensor, and a vertical camera. The MAV is equipped with a fisheye camera with a wide 186° field of view, which provides images at 30 frames per second. A horizontally stabilized region of interest is automatically extracted in software on the main processing unit of the MAV and can be transmitted together with the attitude measurements via wireless communication.
III-1 State Estimation and Control
The visual navigation system of Engel et al. integrates visual motion estimates from a monocular SLAM system with the attitude measurements of the MAV. It fuses both kinds of measurements in a loosely coupled extended Kalman filter (EKF). Since the attitude measurements and control commands are transmitted via wireless communication, they are subject to a time delay that is compensated within the EKF framework. Waypoint control of the MAV is achieved using PID control based on the EKF state estimate. In monocular SLAM, the metric scale of the motion and reconstruction estimates is not observable. We therefore probabilistically fuse ultrasonic and air pressure measurements and adapt the scale of the SLAM motion estimate to the observed metric scale.
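To make the scale adaptation concrete, the following Python sketch uses a deliberately simple stand-in rather than the estimator of the TUM ARDrone package: the ultrasonic and air pressure height readings are fused by inverse-variance weighting (assuming independent Gaussian noise), and a single scale factor is then fitted by least squares to pairs of visual and metric height changes. The function names and the example variances are hypothetical.

```python
import numpy as np


def fuse_heights(h_sonar, var_sonar, h_baro, var_baro):
    """Inverse-variance weighted fusion of two independent height measurements.

    Returns the fused height and its variance (assumes Gaussian noise).
    """
    w_sonar, w_baro = 1.0 / var_sonar, 1.0 / var_baro
    height = (w_sonar * h_sonar + w_baro * h_baro) / (w_sonar + w_baro)
    return height, 1.0 / (w_sonar + w_baro)


def estimate_scale(visual_dz, metric_dz):
    """Least-squares scale s minimising sum_i (metric_dz_i - s * visual_dz_i)^2.

    visual_dz: height changes from the (up-to-scale) SLAM estimate.
    metric_dz: corresponding height changes from the fused metric sensors.
    """
    x = np.asarray(visual_dz, dtype=float)
    y = np.asarray(metric_dz, dtype=float)
    return float(x @ y / (x @ x))


# Example: fuse one reading pair, then fit the scale from three motion segments.
fused_h, fused_var = fuse_heights(1.0, 0.01, 1.2, 0.04)
scale = estimate_scale([0.1, 0.2, -0.15], [0.2, 0.4, -0.3])
```

The fitted factor would then multiply the up-to-scale SLAM translation before it enters the filter; with real, noisy data a recursive or outlier-robust variant is likely preferable to this batch fit.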
III-2 Vision-Based Navigation Using Monocular LSD-SLAM
LSD-SLAM is a keyframe-based SLAM approach. It maintains and optimizes the view poses of a subset of images, i.e., keyframes, extracted along the camera trajectory. To estimate the camera trajectory, it tracks the camera motion towards a reference keyframe through direct image alignment. This requires depth in one of the two images, which is estimated for the reference keyframe from small-baseline stereo correspondences with subsequent frames. The poses of the keyframes are made globally consistent by mutual direct image alignment and pose graph optimization.
A key feature of LSD-SLAM is the ability to close trajectory loops within the keyframe graph. In such an event, the view poses of the keyframes are readjusted to compensate for the drift accumulated through tracking along the loop. In particular, this changes the pose of the current reference keyframe used for tracking, which also induces a change in the tracked motion estimate. However, the tracked motion estimate is used to update the EKF that estimates the MAV state fed into the control loop. At a loop closure, this visual motion estimate would update the filter with large, erroneous velocities, which would induce significant errors in the state estimate and could, in turn, cause severe failures in flight control. We therefore compensate for the changes induced by loop closures with an additional pose offset on the visual estimate before feeding it into the EKF.
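One way to realize such a pose offset is sketched below in Python, assuming poses are 4×4 homogeneous world-from-camera matrices and that both the pre- and post-update SLAM pose of the current frame are available when a loop closure is detected. The class and method names are hypothetical. The offset is chosen so that the pose reported to the EKF does not jump, while relative motion measured afterwards is passed through unchanged.

```python
import numpy as np


def make_pose(yaw, translation):
    """4x4 homogeneous pose with a rotation about z (yaw in rad) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


class LoopClosureOffset:
    """Hypothetical helper keeping the pose fed to the EKF continuous across loop closures."""

    def __init__(self):
        # Maps poses from the (possibly re-optimized) SLAM frame into the EKF frame.
        self.offset = np.eye(4)

    def on_loop_closure(self, pose_before, pose_after):
        """pose_before / pose_after: SLAM pose of the same frame before / after the update."""
        # Pick the new offset such that offset_new @ pose_after == offset_old @ pose_before,
        # so the EKF sees no jump at the moment of the loop closure.
        self.offset = self.offset @ pose_before @ np.linalg.inv(pose_after)

    def corrected(self, slam_pose):
        """Pose to feed into the EKF for the current SLAM estimate."""
        return self.offset @ slam_pose
```

Because the offset is applied by left-multiplication, any motion increment right-multiplied onto the SLAM pose after the update reaches the EKF unchanged; only the discontinuity caused by the pose graph update is absorbed.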
To initialize the system, the MAV first performs a look-around maneuver, flying a 360° turn on the spot while moving up and down by several centimeters. In this way, the MAV obtains an initial keyframe map with a closed trajectory loop (Fig. 6).
IV Autonomous Obstacle-Free Exploration With Semi-Dense Depth Maps
Autonomous exploration has been a research topic for many years, targeting both 2D and 3D environments. In most 3D scenarios, an exploration strategy works with a volumetric representation of the environment, such as a voxel grid or an octree, and uses laser scanners or RGB-D cameras as sensors to build such a representation.
In this paper, we devise an exploration strategy that builds on a fundamentally different type of sensor data: semi-dense depth maps estimated with a single moving monocular camera. Unlike with the sensors mentioned above, depth can only be estimated in image areas with strong gradients. This means that, especially early in the exploration, large portions of the map remain unknown. The exploration strategy therefore has to account for the motion-parallax measurement principle of LSD-SLAM.
IV-A Occupancy Mapping with Semi-Dense Depth Maps
In this work, we use OctoMap, which provides an efficient implementation of hierarchical 3D occupancy mapping in octrees. We directly use the semi-dense depth maps reconstructed with LSD-SLAM to create the 3D occupancy map: all keyframes are traversed, and the measured depths are integrated via ray casting using the camera model.
Since LSD-SLAM performs loop closures, the poses at which the depth maps of keyframes have been integrated into the map may change, and the map then becomes outdated. We therefore periodically regenerate the map using the updated keyframe poses. Because this operation may take several seconds, the MAV hovers on the spot and waits before proceeding with the exploration.
Each voxel in the occupancy map stores its probability of being occupied in log-odds form. To determine whether a voxel is free or occupied, a threshold is applied to the occupancy probability (0.86 in our experiments). During the integration of a depth measurement, all voxels along the ray in front of the measurement are updated with a miss probability, i.e., they are observed as free space. The voxel at the measured depth, in turn, is updated with a hit probability. Note that LSD-SLAM outputs not only the computed depths but also the variance of each estimate. Although measurements with a high variance can be very noisy, they still contain information about the vicinity of the sensor. For these pixels, we therefore only insert free space up to a reduced distance, which ensures that no erroneous voxels are added. Fig. 10(a) shows an example occupancy map.
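The per-ray update described above can be sketched as follows. This is a simplified Python stand-in for OctoMap's log-odds update, not its API: voxels are plain dictionary keys, the ray is given as a precomputed list of voxels, and the hit and miss probabilities, the clamping bounds, and the reduced free-space fraction for high-variance pixels are assumed values. Only the 0.86 threshold comes from the text.

```python
import math

# Assumed sensor-model values for illustration; only the 0.86 threshold is from the text.
P_HIT, P_MISS = 0.7, 0.4
P_CLAMP_MIN, P_CLAMP_MAX = 0.12, 0.97
OCC_THRESHOLD = 0.86


def logit(p):
    return math.log(p / (1.0 - p))


def probability(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))


def update(grid, voxel, p):
    """Add the log-odds of p to a voxel (unknown voxels start at p = 0.5) and clamp."""
    value = grid.get(voxel, 0.0) + logit(p)
    grid[voxel] = min(logit(P_CLAMP_MAX), max(logit(P_CLAMP_MIN), value))


def integrate_ray(grid, ray_voxels, high_variance=False, free_fraction=0.5):
    """Integrate one depth measurement.

    ray_voxels: voxels from the camera up to and including the voxel at the measured depth.
    free_fraction: hypothetical reduced distance used for high-variance depths.
    """
    if high_variance:
        # Noisy depth: only insert free space on a reduced distance, never a hit.
        for voxel in ray_voxels[:int(len(ray_voxels) * free_fraction)]:
            update(grid, voxel, P_MISS)
        return
    for voxel in ray_voxels[:-1]:
        update(grid, voxel, P_MISS)  # traversed voxels are observed as free
    update(grid, ray_voxels[-1], P_HIT)  # voxel at the measured depth


def classify(grid, voxel):
    """Unknown if never updated, otherwise thresholded occupancy probability."""
    if voxel not in grid:
        return "unknown"
    return "occupied" if probability(grid[voxel]) > OCC_THRESHOLD else "free"
```

With these assumed values, a voxel must be hit three times before its probability exceeds 0.86, which illustrates how the threshold trades robustness against responsiveness.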
IV-B Optimal Motion for Exploration and Mapping
By using semi-dense reconstructions, we avoid strong assumptions on the properties of the environment in textureless areas, such as planarity. On the other hand, the use of semi-dense reconstruction in visual navigation leads to indentations of unknown volume, which occur between the free-space rays measured towards depth readings (Fig. 2). As we analyze next, these indentations can be removed through lateral motion with respect to the measurable structures, an important property that we exploit in our exploration strategy.
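Figure 3 illustrates the problem of finding the direction of motion $\mathbf{t}$ that maximizes the observed free space in a 2D setting (without loss of generality). Assuming the camera center at the origin of the coordinate frame, the volume that is observed free in front of a measured point $\mathbf{p}_i$ is cut by the triangle formed by the camera motion $\mathbf{t}$ and the measured point $\mathbf{p}_i$. The magnitude of the vector $\frac{1}{2}[\mathbf{p}_i]_\times \mathbf{t}$ equals the observed free area, where $[\mathbf{p}_i]_\times$ is the skew-symmetric matrix formed from $\mathbf{p}_i$ such that its product with $\mathbf{t}$ corresponds to the cross product $\mathbf{p}_i \times \mathbf{t}$.

To find the optimal direction, we maximize the sum of squared doubled triangle areas over all observed points,

$$\max_{\mathbf{t}} \sum_i \left\| [\mathbf{p}_i]_\times \mathbf{t} \right\|_2^2 = \max_{\mathbf{t}} \; \mathbf{t}^\top \Big( \sum_i [\mathbf{p}_i]_\times^\top [\mathbf{p}_i]_\times \Big) \mathbf{t}.$$

Since we want to determine the optimal motion direction independently of its magnitude, we optimize the direction subject to the normalization constraint $\mathbf{t}^\top \mathbf{t} = 1$. With $\mathbf{A} = \sum_i [\mathbf{p}_i]_\times^\top [\mathbf{p}_i]_\times$, this constrained optimization problem can be solved using Lagrange multipliers,

$$\mathcal{L}(\mathbf{t}, \lambda) = \mathbf{t}^\top \mathbf{A} \mathbf{t} - \lambda \left( \mathbf{t}^\top \mathbf{t} - 1 \right),$$

so that the optimal solution for $\mathbf{t}$ must satisfy

$$\frac{\partial \mathcal{L}}{\partial \mathbf{t}} = 2 \mathbf{A} \mathbf{t} - 2 \lambda \mathbf{t} = \mathbf{0}, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = 1 - \mathbf{t}^\top \mathbf{t} = 0.$$

This implies that $\mathbf{A} \mathbf{t} = \lambda \mathbf{t}$, i.e., $\mathbf{t}$ must be a unit eigenvector of the matrix $\mathbf{A} = \sum_i [\mathbf{p}_i]_\times^\top [\mathbf{p}_i]_\times$. Moreover, since $\mathbf{t}^\top \mathbf{A} \mathbf{t} = \lambda$ for such a unit eigenvector, the eigenvector corresponding to the largest eigenvalue produces the largest observation of the free space.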
We perform Monte-Carlo simulations to further analyze the optimal motion direction. Without any prior knowledge about the environment structure, we assume a uniform distribution of depths in the pixels. We sample 600 points $\mathbf{p}_i$ by drawing the image coordinates $(u, v)$ and the distance $d$ from uniform distributions $\mathcal{U}$. The points are reprojected into 3D space using the camera model and used to compute the matrix $\sum_i [\mathbf{p}_i]_\times^\top [\mathbf{p}_i]_\times$. Statistics of the eigenvalues and eigenvectors over 100 random simulations are accumulated in Table I. They show that the optimal direction for increasing the observed free space is a motion parallel to the image plane, i.e., sideways or up-down motion.
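The eigen-analysis and the Monte-Carlo experiment can be reproduced with a short NumPy sketch. Since the exact sampling bounds are not stated here, it assumes a pinhole camera with hypothetical intrinsics (640×480 pixels, focal length 500 px) and depths drawn uniformly from 0.5 m to 5 m along the optical axis. For each random scene, it builds $\sum_i [\mathbf{p}_i]_\times^\top [\mathbf{p}_i]_\times$ and records the optical-axis (z) component of the eigenvectors for the largest and the smallest eigenvalue.

```python
import numpy as np


def skew(p):
    """Skew-symmetric matrix [p]_x such that skew(p) @ t == np.cross(p, t)."""
    x, y, z = p
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def free_space_matrix(points):
    """A = sum_i [p_i]_x^T [p_i]_x, so t^T A t is the sum of squared doubled triangle areas."""
    A = np.zeros((3, 3))
    for p in points:
        S = skew(p)
        A += S.T @ S
    return A


def sample_points(n, rng, width=640, height=480, f=500.0, d_min=0.5, d_max=5.0):
    """Uniform pixels and depths, back-projected with a hypothetical pinhole model (z = optical axis)."""
    u = rng.uniform(0.0, width, n)
    v = rng.uniform(0.0, height, n)
    d = rng.uniform(d_min, d_max, n)
    x = (u - width / 2) / f * d
    y = (v - height / 2) / f * d
    return np.stack([x, y, d], axis=1)


def extreme_directions(points):
    """Unit eigenvectors of A for the largest and the smallest eigenvalue."""
    _, eigvecs = np.linalg.eigh(free_space_matrix(points))  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvecs[:, 0]


def monte_carlo(n_runs=100, n_points=600, seed=0):
    """|z| components of the best and worst motion directions over random scenes."""
    rng = np.random.default_rng(seed)
    best_z, worst_z = [], []
    for _ in range(n_runs):
        best, worst = extreme_directions(sample_points(n_points, rng))
        best_z.append(abs(best[2]))
        worst_z.append(abs(worst[2]))
    return np.array(best_z), np.array(worst_z)
```

Under these assumptions, the best direction should have a nearly vanishing z-component in every run, i.e., it lies in the image plane, while motion along the optical axis corresponds to the smallest eigenvalue, which is consistent with the conclusion stated above.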
IV-C Obstacle-Free Local Exploration through Star Discoveries
The characteristics of our semi-dense SLAM approach prevent the direct application of existing exploration approaches such as next-best-view planning or frontier-based exploration. Frontiers of known space (occupied or measured free) occur at indentations of unknown space as well as between measured edges and textures on flat walls. Simply flying to those boundaries would not allow discerning unknown from free or occupied space at textureless boundaries, since monocular SLAM requires motion parallax with respect to measurable structures and depth is only measured along the lines of sight of the semi-dense depth readings. Next-best-view planning aims at choosing a new view that maximizes the unknown space discovered at the new view pose. Since measuring depth requires motion in our monocular SLAM system, this approach would have to be extended to also account for the space discovered along the path to the new viewpoint. This would be computationally very expensive, since many ray-casting operations would need to be performed for each potential view pose. Instead, we propose a simpler but effective local exploration strategy that we call star discovery, which discovers the indentations in the unknown volume around a specific position in the map.
In star discovery, the MAV flies a star-shaped pattern (Fig. 4). To generate motion parallax for LSD-SLAM and to discover indentations in the unknown volume, the MAV flies with its heading rotated by 90° relative to the direction of motion. Naturally, the MAV can only fly as far as the occupancy map already contains explored free space.
The star-shaped pattern is generated as follows: we cast rays from a specific position in the map at a predefined angular interval to determine the farthest reachable flight position along each ray. The traversability of a voxel is determined by inflating the occupied voxels by the size of the MAV. To increase the success of the discovery, we perform this computation at different heights and choose the result of maximum size.
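A 2D, single-height sketch of this ray casting is given below in Python. It assumes a grid with hypothetical cell labels (unknown, free, occupied), approximates the MAV size by a square inflation radius in cells, and treats unknown cells as non-traversable. For each ray it returns the farthest reachable cell together with a heading rotated by 90° to the ray, matching the sideways flight described above. Running it on several height slices and keeping the largest result would mirror the multi-height variant.

```python
import math

import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 1, 2  # hypothetical cell labels


def traversable_mask(grid, radius):
    """Free cells with no occupied cell within `radius` cells (square inflation)."""
    occupied = grid == OCCUPIED
    h, w = grid.shape
    padded = np.pad(occupied, radius, constant_values=False)
    inflated = np.zeros_like(occupied)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            inflated |= padded[dy:dy + h, dx:dx + w]
    return (grid == FREE) & ~inflated


def star_endpoints(grid, origin, radius, n_rays=8, step=0.25):
    """Farthest traversable cell along each ray, plus the heading (rad) to fly with."""
    ok = traversable_mask(grid, radius)
    oy, ox = origin
    if not ok[oy, ox]:
        return []
    endpoints = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        dy, dx = math.sin(angle), math.cos(angle)
        best, s = (oy, ox), step
        while True:
            y, x = int(round(oy + s * dy)), int(round(ox + s * dx))
            if not (0 <= y < grid.shape[0] and 0 <= x < grid.shape[1]) or not ok[y, x]:
                break  # unknown, inflated-occupied, or outside the map
            best, s = (y, x), s + step
        # Camera yawed by 90 degrees to the flight direction, so the MAV moves sideways.
        endpoints.append((best, angle + math.pi / 2))
    return endpoints
```

Flying from the origin to each endpoint and back in turn traces the star; because unknown cells stop a ray, the pattern never leaves the already explored free space.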
Only after the star discovery has been fully executed do we recompute the occupancy map from the updated LSD-SLAM keyframe map. This also allows loop-closure updates to be postponed until the end of the exploration process and provides a full 360° view from the center position of the star discovery.
Our exploration strategy also benefits the tracking performance of LSD-SLAM. For instance, if the MAV instead flew an outward-facing ellipse of maximum size, it could easily lose track, because it would see few or no gradients when flying close to an obstacle while facing it.
IV-D Global Exploration
A single star discovery from one spot is not sufficient to explore arbitrarily shaped environments, since only positions on a direct line of sight from the origin can be reached (Fig. 5). This induces a natural definition of interesting origins for subsequent star discoveries: we call a voxel interesting if it is known to be free but is not in line of sight of any previous star discovery origin.
We determine the interesting voxels for starting a new star discovery as follows: For every previously visited origin of a star discovery, we mark all free voxels in direct line-of-sight as visited. Then all free voxels in the inflated map are traversed and the ones that have not been marked are set to interesting. With being the number of star discovery origins, the whole algorithm runs in , where and are the number of voxels inflated in the horizontal and vertical directions. We define as the number of voxels along the longest direction of the bounding box of the occupancy map.
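The marking step can be sketched in 2D as follows, assuming the inflated map is given as a Boolean mask of traversable cells and using Bresenham lines as the line-of-sight test. The helper names are hypothetical, and this brute-force version simply checks every traversable cell against every origin rather than reproducing the exact procedure analyzed above.

```python
import numpy as np


def bresenham(start, end):
    """Grid cells on the line from start to end (inclusive), as (row, col) tuples."""
    (y0, x0), (y1, x1) = start, end
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y1 > y0 else -1
    sx = 1 if x1 > x0 else -1
    err = dx - dy
    cells = []
    while True:
        cells.append((y0, x0))
        if (y0, x0) == (y1, x1):
            return cells
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy


def interesting_cells(traversable, origins):
    """Traversable cells that are not in line of sight of any star discovery origin."""
    visible = np.zeros_like(traversable)
    for origin in origins:
        for y, x in zip(*np.nonzero(traversable)):
            target = (int(y), int(x))
            # Visible if every cell on the straight line from the origin is traversable.
            if all(traversable[c] for c in bresenham(origin, target)):
                visible[target] = True
    return traversable & ~visible
```

For an L-shaped corridor explored from one end, the cells around the corner are exactly the ones that remain interesting, which is the situation sketched in Fig. 5.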
Afterwards, we search for a path in the occupancy map to one of the interesting voxels. We sample several random voxels within the largest connected component of interesting voxels and choose the one from which the largest star discovery can be executed afterwards.
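The selection of the next origin can be sketched with a breadth-first search over 4-connected interesting cells in a 2D mask. The scoring callable `star_size` is a hypothetical placeholder for the size of the star discovery that could be executed from a candidate (for example, the summed ray lengths), and path planning to the chosen cell is omitted.

```python
import random
from collections import deque

import numpy as np


def largest_component(mask):
    """Cells of the largest 4-connected component of True cells in a 2D Boolean mask."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            component, queue = [], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def choose_next_origin(mask, star_size, n_samples=10, seed=None):
    """Sample candidates from the largest component and keep the best-scoring one."""
    component = largest_component(mask)
    if not component:
        return None
    rng = random.Random(seed)
    candidates = rng.sample(component, min(n_samples, len(component)))
    return max(candidates, key=star_size)
```

Sampling only a few candidates keeps the number of expensive star-size evaluations bounded, at the price of possibly missing the single best origin in a large component.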
As discussed above, frontier-based exploration is not suitable for our monocular SLAM system, as the frontiers could be located on unobservable occupied structures such as textureless walls. To discover these structures, we propose star-shaped local exploration moves. Our global exploration strategy determines new origins for these moves where free space has been measured behind semi-dense structures that are not in direct line of sight of the previous star discovery origins.
We evaluate our approach on a Parrot Bebop MAV in two differently sized and furnished rooms (a lab and a seminar room). We recommend viewing the accompanying video of the experiments at https://youtu.be/fWBsDwBJD-g.
V-A Experiment Setup
We transmit the live stream of horizontally stabilized images from the Bebop to a control station PC via wireless communication. The images are then processed on the control station PC, which runs the vision-based navigation and exploration based on LSD-SLAM. All experiments were executed fully autonomously.
We report results of two experiments. The first experiment was conducted in a lab room and demonstrates our star discovery exploration strategy. In this simpler setting, we neglected depth measurements with high-variance estimates and planned the star discovery at a single height only. The second experiment evaluates the local (star discovery) and global exploration strategies in a larger seminar room. Compared to the first experiment, we enhanced the system in several ways to cope with the larger environment. We use depth measurements with high-variance estimates as free-space measurements for occupancy mapping, as described in Sec. IV-A, instead of neglecting them. To increase the possible coverage of the star discovery, it is computed at several different heights, and the one with the largest travel distance is used. When computing interesting voxels, we also use multiple heights for the center points. The robustness of the star discovery was improved by slightly reducing the maximum distance to the origin and by sending intermediate goals to the PID controller. Finally, we start LSD-SLAM only after takeoff, improved the accuracy of the scale estimation, and readjusted the parameters of the PID controller, the autopilot, and the look-around maneuver.
V-B Qualitative Evaluation
V-B1 Star Discovery
In the first experiment, we demonstrate autonomous local exploration using our star discovery strategy in our lab room. There was no manual interaction except for triggering the discovery and the landing at the end. At first, the MAV performs a look-around maneuver. Fig. 6 shows the semi-dense reconstruction of the room obtained with LSD-SLAM. Based on a 3D occupancy map, a star discovery is planned (Fig. 9). In this case, we inflated the map by three voxels in the horizontal direction and one voxel in the vertical direction.
V-B2 Full Exploration Strategy
| | look-around | star discovery | look-around | star discovery | new origin |
|---|---|---|---|---|---|
| mark voxels in sight | - | - | - | 4.93 s | 4.52/6.63 s |
| way to new origin | - | - | - | 0.024 s | 0.065 s |
| #voxels in bounding box | 195048 | 211968 | 449565 | 728416 | 1312492 |
| #free voxels / #voxels in bounding box | 0.18 | 0.22 | 0.17 | 0.15 | 0.12 |
In the second experiment, we demonstrate a star discovery with subsequent repositioning at an interesting voxel in a larger seminar room. First, the MAV took off, initialized the scale, and performed a look-around maneuver. Afterwards, the MAV executed a star discovery. Fig. 9(b) shows the planned discovery motion and the flown trajectory estimated with LSD-SLAM. We attribute the differences between the two to pose graph updates in LSD-SLAM.
After the star discovery, we obtain the maps and interesting voxels shown in Fig. 11 and Fig. 12. The largest connected component found by our algorithm lies outside the room. The MAV planned a path towards it and executed it autonomously. Fig. 13 depicts the planned path and the actually flown trajectory estimated with LSD-SLAM.
After the MAV reached the interesting voxel, its battery was depleted and it landed automatically. The next step our algorithm would have performed is the star discovery depicted in Fig. 14.
V-C Quantitative Evaluation
Table II gives results on the run time of various parts of our approach and on properties of the LSD-SLAM and occupancy mapping processes for the two experiments. The creation of the occupancy map is clearly the most time-consuming part of our method, especially at later time steps when the semi-dense depth reconstruction becomes large. In the second experiment, modified parameters were used for the creation of the occupancy map; while they performed better, they further increased the run time. The remaining parts are comparatively efficient and can be performed within a couple of seconds. Our evaluation also shows that star discoveries significantly increase the number of free voxels in the map.
In this paper, we proposed a novel approach to vision-based navigation and exploration with MAVs. Our method only requires a monocular camera, which enables low-cost, lightweight, and low-power hardware solutions. We track the motion of the camera and obtain a semi-dense reconstruction in real time using LSD-SLAM. Based on these estimates, we build 3D occupancy maps, which we use for planning obstacle-free exploration maneuvers.
Our exploration strategy is a two-step process. On a local scale, star discoveries find free space in the local surroundings of a specific position in the map. A global exploration strategy determines interesting voxels in the reachable free space that is not in direct line of sight of previous star discovery origins. In experiments, we demonstrate the performance of LSD-SLAM for vision-based navigation of an MAV. We give qualitative insights and quantitative results on the effectiveness of our exploration strategy.
The success of our vision-based navigation and exploration method clearly depends on the robustness of the visual tracking. If the MAV moves very fast into areas where it observes mostly textureless surfaces, tracking can become difficult. A tight integration with IMU information could benefit tracking; however, this is not possible with the current wireless transmission protocol for visual and IMU data on the Bebop.
A more general path planning algorithm based on the next-best-view approach would also be desirable. This, however, requires a more efficient way to refresh the occupancy map when pose graph updates happen.
In future work we will extend our method to Stereo LSD-SLAM  and tight integration with IMUs. We may also use the method for autonomous exploration on a larger MAV with onboard processing.
-  J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in ECCV, 2014.
-  B. Yamauchi, “A frontier-based approach for autonomous exploration,” in IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, 1997.
-  H. H. Gonzalez-Banos and J.-C. Latombe, “Navigation strategies for exploring indoor environments,” International Journal of Robotics Research, vol. 21, no. 10-11, pp. 829–848, 2002.
-  N. Basilico and F. Amigoni, “Exploration strategies based on multi-criteria decision making for searching environments in rescue operations,” Autonomous Robots, vol. 31, no. 4, pp. 401–417, 2011.
-  W. Burgard, M. Moors, C. Stachniss, and F. Schneider, “Coordinated multi-robot exploration,” IEEE Transactions on Robotics, vol. 21, no. 3, pp. 376–386, 2005.
-  D. Joho, C. Stachniss, P. Pfaff, and W. Burgard, “Autonomous exploration for 3D map learning,” in Autonome Mobile Systeme (AMS), 2007.
-  C. Stachniss, G. Grisetti, and W. Burgard, “Information gain-based exploration using Rao-Blackwellized particle filters,” in Proc. of Robotics: Science and Systems (RSS), Cambridge, MA, USA, 2005.
-  I. M. Rekleitis, “Single robot exploration: Simultaneous localization and uncertainty reduction on maps (SLURM),” in CRV, 2012.
-  S. Shen, N. Michael, and V. Kumar, “Autonomous indoor 3D exploration with a micro-aerial vehicle,” in IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 9–15.
-  L. Yoder and S. Scherer, “Autonomous exploration for infrastructure modeling with a micro aerial vehicle,” in Field and Service Robotics, 2015.
-  S. Nuske, S. Choudhury, S. Jain, A. Chambers, L. Yoder, S. Scherer, L. Chamberlain, H. Cover, and S. Singh, “Autonomous exploration and motion planning for a UAV navigating rivers,” Journal of Field Robotics, 2015.
-  L. Heng, A. Gotovos, A. Krause, and M. Pollefeys, “Efficient visual exploration and coverage with a micro aerial vehicle in unknown environments,” in IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 1071–1078.
-  A. Bircher, M. Kamel, K. Alexis, H. Oleynikova, and R. Siegwart, “Receding horizon ‘next-best-view’ planner for 3D exploration,” in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1462–1468.
-  V. R. Desaraju, N. Michael, M. Humenberger, R. Brockers, S. Weiss, J. Nash, and L. Matthies, “Vision-based landing site evaluation and informed optimal trajectory generation toward autonomous rooftop landing,” Autonomous Robots, vol. 39, no. 3, pp. 445–463, 2015.
-  J. Engel, J. Sturm, and D. Cremers, “Scale-aware navigation of a low-cost quadrocopter with a monocular camera,” Robotics and Autonomous Systems (RAS), vol. 62, no. 11, pp. 1646–1656, 2014.
-  A. Hornung, K. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: an efficient probabilistic 3D mapping framework based on octrees,” Autonomous Robots, vol. 34, no. 3, pp. 189–206, 2013.
-  J. Engel, J. Stückler, and D. Cremers, “Large-scale direct SLAM with stereo cameras,” in IROS, 2015.