Visual-Inertial Teach and Repeat for Aerial Inspection

by Marius Fehr, et al.
ETH Zurich

Industrial facilities often require periodic visual inspections of key installations. Examining these points of interest is time-consuming and potentially hazardous, and may require special equipment to reach them. Micro aerial vehicles (MAVs) are ideal platforms to automate this expensive and tedious task. In this work we present a novel system that enables a human operator to teach a visual inspection task to an autonomous aerial vehicle by simply demonstrating the task using a handheld device. To enable robust operation in confined, GPS-denied environments, the system employs the Google Tango visual-inertial mapping framework as the only source of pose estimates. In a first step, the operator records the desired inspection path and defines the inspection points. The mapping framework then computes a feature-based localization map, which is shared with the robot. After take-off, the robot estimates its pose based on this map and plans a smooth trajectory through the waypoints defined by the operator. Furthermore, the system is able to track the poses of other robots or the operator, localized in the same map, and follow them in real time while keeping a safe distance.

I Introduction

Industrial facilities such as refineries and power and heating plants are required to operate at peak efficiency over extended periods of time. Interruptions or failures due to wear and tear on the components of such installations are to be avoided at all costs; hence, regular inspections and revisions of key components are imperative. However, inspecting these points of interest is time-consuming and therefore expensive, especially if the installations need to be shut down due to potential or actual hazards to the technicians. Other components might simply be hard to reach and require special equipment. Automating these inspection tasks using robots, such as MAVs, is a very cost-efficient solution, which can decrease downtime and, most importantly, keep the human workforce out of harm's way.

Fig. 1: The MAV is autonomously performing an inspection task that has been taught by an operator holding a Google Tango tablet in a GPS-denied environment. See the accompanying video for the full demonstration of the system.

However, autonomous MAVs require exact pose estimates to navigate safely in these potentially confined and GPS-denied environments. This is why much of the industrial inspection research focuses either on the inspection task itself, by providing change/damage detection algorithms [2, 3], or on additional safety/navigation mechanisms to support the human pilot [4, 5, 6]. With the advent of robust visual-inertial odometry (VIO) pipelines and mapping frameworks, such as maplab [7], ORB-SLAM2 [8], OKVIS [9], ROVIO [10] and the visual-inertial navigation system presented in [11], autonomous robots are able to perform more challenging tasks. Without the need for external infrastructure, like cameras, markers or beacons, these systems allow the robots to follow waypoints provided by the operator in visually challenging environments [12] or fully reconstruct the environment for the purpose of collision avoidance [13].

In this work we present a complete teach-and-repeat (T&R) system based on the Google Tango mapping framework [1] that allows a non-expert human operator to demonstrate an industrial inspection task to the robot simply by walking and pointing with a hand-held tablet. This allows the automation of tedious and potentially hazardous inspection routines in no more time than it takes to manually perform a single such task. If the inspection point is out of reach for the human operator, the inspection task can also be taught by manually piloting the MAV. The on-board VIO allows the operator to navigate precisely and safely in position control mode, considerably lowering the need for extensive pilot training.

The contributions of this work are as follows:

  • We present a novel, robust visual-inertial-based T&R system for industrial inspection.

  • We demonstrate the capabilities of the system in a challenging real-world industrial setting.

II System

In this section we introduce the architecture and components of the presented T&R system. Fig. 2 provides an overview of the components and data flow of the proposed inspection system, which comprises a teacher and an agent. The teacher is a human operator either carrying a tablet or piloting an MAV. The agent, on the other hand, is a fully autonomous MAV. The system can be used in two different modes. In live mode, the MAV follows the operator's poses in real time, while keeping a safe distance in case it catches up. In T&R mode, the MAV records the operator's poses and inspection points while on the ground and can later execute the observed task as often as required. In both modes the communication between teacher and agent was maintained over WiFi using a UDP connection. The remaining communication between the components, as well as the components themselves, is implemented on top of the Robot Operating System (ROS) [14].
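As a rough illustration of the teacher-to-agent link described above, the following sketch packs a timestamped 6-DOF pose and an inspection-point flag into a fixed-size payload and sends it as a single UDP datagram over the loopback interface. The wire format, field order, and all function names are assumptions made for this example; the message layout used by the actual system is not specified here.

```python
import socket
import struct

# Hypothetical wire format (not from the paper): timestamp [s], position
# (x, y, z) [m], orientation quaternion (qw, qx, qy, qz), inspection flag.
POSE_FORMAT = "<d3d4d?"
POSE_SIZE = struct.calcsize(POSE_FORMAT)


def pack_pose(stamp, position, quaternion, is_inspection_point):
    """Serialize one teacher pose into a fixed-size datagram payload."""
    return struct.pack(POSE_FORMAT, stamp, *position, *quaternion,
                       is_inspection_point)


def unpack_pose(payload):
    """Inverse of pack_pose; returns (stamp, position, quaternion, flag)."""
    values = struct.unpack(POSE_FORMAT, payload)
    return values[0], values[1:4], values[4:8], values[8]


def send_and_receive_once(payload):
    """Send one datagram over loopback and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS choose a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(POSE_SIZE)
        return data
    finally:
        sender.close()
        receiver.close()
```

Because UDP gives no delivery or ordering guarantee, a receiver built along these lines has to tolerate dropped or reordered poses, for example by discarding any message whose timestamp is older than the last one accepted.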

Fig. 2: System overview: Both teacher and agent (MAV) localize from a local copy of a common feature-based localization map. The poses and inspection points of the teacher are sent to the agent live over a UDP connection and are either followed directly or recorded by the agent to execute the taught mission at any later point in time. The teacher poses and optional poses from other agents are furthermore used to prevent collisions. Components: red: state estimation, green: path planning, blue: control, gray: hardware

State estimation: To faithfully reenact the inspection task demonstrated by the teacher, both teacher and agent need to estimate their pose with respect to a common global frame. This is achieved by means of robust VIO and localization based on binary feature descriptors. To create a localization map, the initial map from odometry is post-processed using appearance-based loop closure and visual-inertial weighted least-squares (VIWLS) optimization (see Fig. 3). Using this common localization map, the 6-DOF poses taught by the operator can be precisely mapped to the corresponding waypoints required by the MAV. On the MAV, the localized 6-DOF pose provided by the odometry is furthermore fused with the on-board IMU for additional robustness using a multi-sensor fusion (MSF) framework based on [15].

T&R planner: This global planner tracks the poses of the teacher and maintains a key-framed recording of the teacher's trajectory, including the desired inspection points. Based on the MAV's position, the planner selects a sequence of key-frames along the teacher's trajectory. Such a sequence ends in one of two ways: at a regular key-frame once a predefined maximum length is reached, or at an inspection point. The planner furthermore computes the desired velocity at the end of this sequence. If the final waypoint is neither the end of the trajectory nor an inspection point, this velocity is set to the maximum allowed velocity; otherwise the planner sets the velocity to zero.
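The selection and terminal-velocity rules described above can be sketched as follows. This is a minimal illustration, not the system's planner: the `KeyFrame` type, the function name, and the choice to start each sequence at the key-frame closest to the MAV are assumptions, and bookkeeping of progress along the trajectory (needed to move on after an inspection point) is omitted.

```python
from dataclasses import dataclass
import math


@dataclass
class KeyFrame:
    position: tuple  # (x, y, z) in the common map frame [m]
    is_inspection_point: bool = False


def select_sequence(keyframes, mav_position, max_length, v_max):
    """Pick the next key-frame sequence and its terminal velocity.

    Returns (sequence, terminal_velocity). Starting at the key-frame
    closest to the MAV is an assumption of this sketch.
    """
    if not keyframes or max_length < 1:
        return [], 0.0
    start = min(range(len(keyframes)),
                key=lambda i: math.dist(keyframes[i].position, mav_position))
    sequence = []
    for keyframe in keyframes[start:start + max_length]:
        sequence.append(keyframe)
        if keyframe.is_inspection_point:
            break  # a sequence never runs past an inspection point
    last = sequence[-1]
    at_trajectory_end = last is keyframes[-1]
    if at_trajectory_end or last.is_inspection_point:
        return sequence, 0.0  # stop at inspection points and at the end
    return sequence, v_max  # otherwise keep moving into the next sequence
```

Ending a regular sequence at the maximum velocity lets consecutive sequences be flown without the MAV braking at every sequence boundary, while inspection points and the trajectory end are always reached at rest.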

Safety: The sequence of waypoints and the current agent position are continuously checked for collisions against the safety spheres around the teacher and the other agents. If a sequence intersects such a safety sphere, it is cut short and the terminal velocity is set to zero, such that the agent comes to a full stop at a safe distance. If the agent's own safety sphere intersects another sphere, an emergency stop signal is sent directly to the controller, ordering an immediate stop of the MAV. It is important to note that, aside from this passive collision avoidance based on the known poses of the teacher and other agents, the system has no further collision avoidance capabilities. It therefore assumes an otherwise static environment in which the only dynamic obstacles are those whose poses are known.
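The two checks described above can be sketched as follows. The function names and the data layout are hypothetical, and the sketch tests only the waypoints themselves rather than the path segments between them, which is a simplification.

```python
import math


def truncate_at_safety_spheres(waypoints, spheres):
    """Cut a waypoint sequence short before it enters any safety sphere.

    waypoints: list of (x, y, z); spheres: list of (center, radius).
    Returns (safe_waypoints, was_truncated). If truncated, the caller is
    expected to set the terminal velocity to zero.
    """
    safe = []
    for waypoint in waypoints:
        if any(math.dist(waypoint, center) < radius
               for center, radius in spheres):
            return safe, True
        safe.append(waypoint)
    return safe, False


def emergency_stop_required(agent_position, agent_radius, spheres):
    """True if the agent's own safety sphere overlaps any other sphere."""
    return any(math.dist(agent_position, center) < agent_radius + radius
               for center, radius in spheres)
```

The first check shortens the plan so that the agent stops ahead of the obstacle; the second is the last-resort condition under which a stop command would go straight to the controller.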

Local planner: The waypoints computed by the T&R planner are then used by the local planner to compute a smooth and dynamically feasible trajectory for the MAV using polynomial trajectory optimization similar to [16].
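To give a feel for what a polynomial trajectory segment looks like, the sketch below computes a single quintic polynomial for one axis that connects two waypoints with prescribed end velocities and zero end accelerations. This is a deliberately simplified stand-in: the optimization in [16] solves jointly for several segments while minimizing a cost on higher derivatives, which is not reproduced here. The function names are hypothetical.

```python
def quintic_segment(p0, p1, v0, v1, duration):
    """Coefficients c0..c5 of a 1-D quintic with zero end accelerations.

    The polynomial p(t) = sum(c_k * t**k) satisfies p(0) = p0, p'(0) = v0,
    p(T) = p1, p'(T) = v1 and p''(0) = p''(T) = 0 for T = duration.
    """
    d, T = p1 - p0, duration
    return (
        p0,
        v0,
        0.0,
        (10.0 * d - (6.0 * v0 + 4.0 * v1) * T) / T**3,
        (-15.0 * d + (8.0 * v0 + 7.0 * v1) * T) / T**4,
        (6.0 * d - 3.0 * (v0 + v1) * T) / T**5,
    )


def evaluate(coefficients, t, derivative=0):
    """Evaluate the polynomial or one of its time derivatives at time t."""
    total = 0.0
    for power, c in enumerate(coefficients):
        if power < derivative:
            continue  # this term vanishes after differentiation
        factor = 1.0
        for k in range(derivative):
            factor *= power - k
        total += factor * c * t ** (power - derivative)
    return total
```

For example, `quintic_segment(0.0, 2.0, 0.0, 1.0, 2.0)` describes a segment that starts at rest and arrives 2 m further along the axis after 2 s with a velocity of 1 m/s, which corresponds to the case where the planner requests a non-zero terminal velocity.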

Control: Finally, the polynomial trajectory is sent to the non-linear model predictive controller (MPC) [17], which in turn computes the desired thrust and attitude signals for the MAV interface.

III Results

We demonstrate the capabilities of the proposed system in the accompanying video. The experiments were conducted on an AscTec Neo hexacopter, equipped with a visual-inertial stereo sensor and an Intel NUC i7 on-board computer. Prior to the experiments, the operating environment was recorded using a Google Tango tablet. The data was then downloaded to an external computer, loop-closed and bundle-adjusted, and finally converted to a localization map using the Google Tango framework [1]. The resulting localization map can be seen in Fig. 3. At the beginning of the experiment the localization map was distributed to the teacher and the agent, i.e. the tablet and the MAV. All subsequent processing was executed exclusively on board the tablet and the MAV. With both devices localized to a common frame of reference, the human operator defined the inspection task using a hand-held Google Tango tablet. Inspection points are automatically inserted if the operator remains motionless for 2 seconds. The resulting inspection task can be seen in Fig. 4, where the MAV has just started following the taught trajectory. Furthermore, Fig. 4 shows the inspected installations as seen from the MAV after it safely and reliably completed the desired inspection task.
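The automatic insertion of inspection points can be sketched as a dwell-time detector over the stream of teacher poses. Only the 2-second dwell time comes from the text above; the motion threshold that decides what counts as motionless, the function name, and the data layout are assumptions of this sketch.

```python
import math


def detect_inspection_points(poses, dwell_time=2.0, motion_threshold=0.05):
    """Return positions where the operator stayed still for dwell_time.

    poses: time-ordered list of (stamp [s], (x, y, z) [m]).
    motion_threshold [m] is a hypothetical tolerance for hand jitter.
    """
    inspection_points = []
    anchor_stamp, anchor_position = None, None
    reported = False
    for stamp, position in poses:
        moved = (anchor_position is None or
                 math.dist(position, anchor_position) > motion_threshold)
        if moved:
            # Operator moved: restart the dwell timer at the new position.
            anchor_stamp, anchor_position = stamp, position
            reported = False
        elif not reported and stamp - anchor_stamp >= dwell_time:
            inspection_points.append(anchor_position)
            reported = True  # one inspection point per stationary period
    return inspection_points
```

A practical detector would probably also bound the change in orientation, since the operator points the tablet at the installation to be inspected; that check is left out here.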

Fig. 3: Localization map of the industrial operating environment, created using the Google Tango framework [1].
Fig. 4: The MAV is executing the inspection plan (green: trajectory, blue: inspection points) and is on its way to the first inspection point. On the right are the resulting images of the 5 inspection points as seen by the MAV.

IV Conclusion

This work presents a novel system for autonomous industrial inspection using MAVs, where the inspection task can be taught in an intuitive and safe manner by a human operator holding a tablet or piloting an MAV. In a GPS-denied environment, robust visual-inertial mapping techniques are used to localize teacher and agent in a common frame of reference, without the need for external infrastructure, such as markers or beacons. To demonstrate the capabilities of the proposed system, we conducted a challenging real-world experiment in an industrial environment. The MAV followed the inspection plan safely and precisely and succeeded in observing all inspection points.


The presented experiments received support from members of the Autonomous Systems Lab and Google Tango, most importantly: Michael Burri, Helen Oleynikova, Zachary Taylor, Fabian Blöchliger, Mingyang Li, Ivan Dryanovski, Simon Lynen, and Konstantine Tsotsos.


  • [1] Google, “Project Tango,” 2014.
  • [2] N. Hallermann and G. Morgenthal, “Visual inspection strategies for large bridges using unmanned aerial vehicles (uav),” in Proc. of 7th IABMAS, International Conference on Bridge Maintenance, Safety and Management, 2014, pp. 661–667.
  • [3] G. Morgenthal and N. Hallermann, “Quality assessment of unmanned aerial vehicle (uav) based visual inspection of structures,” Advances in Structural Engineering, vol. 17, no. 3, pp. 289–302, 2014.
  • [4] C. Eschmann, C.-M. Kuo, C.-H. Kuo, and C. Boller, “Unmanned aircraft systems for remote building inspection and monitoring,” in Proceedings of the 6th European Workshop on Structural Health Monitoring, Dresden, Germany, vol. 36, 2012.
  • [5] A. Al-Kaff, F. M. Moreno, L. J. San José, F. García, D. Martín, A. de la Escalera, A. Nieva, and J. L. M. Garcéa, “Vbii-uav: vision-based infrastructure inspection-uav,” in World Conference on Information Systems and Technologies. Springer, 2017, pp. 221–231.
  • [6] I. Sa, S. Hrabar, and P. Corke, “Outdoor flight testing of a pole inspection uav incorporating high-speed vision,” in Field and Service Robotics. Springer, 2015, pp. 107–121.
  • [7] T. Schneider, M. T. Dymczyk, M. Fehr, K. Egger, S. Lynen, I. Gilitschenski, and R. Siegwart, “maplab: An open framework for research in visual-inertial mapping and localization,” IEEE Robotics and Automation Letters, 2018.
  • [8] R. Mur-Artal and J. D. Tardos, “ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras,” T-RO, 2017.
  • [9] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, “Keyframe-based visual–inertial odometry using nonlinear optimization,” IJRR, 2015.
  • [10] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, “Robust visual inertial odometry using a direct ekf-based approach,” in IROS, 2015.
  • [11] Y. Lin, F. Gao, T. Qin, W. Gao, T. Liu, W. Wu, Z. Yang, and S. Shen, “Autonomous aerial navigation using monocular visual-inertial fusion,” Journal of Field Robotics, vol. 35, no. 1, pp. 23–51, 2018.
  • [12] J. Nikolic, M. Burri, J. Rehder, S. Leutenegger, C. Huerzeler, and R. Siegwart, “A uav system for inspection of industrial facilities,” in Aerospace Conference, 2013 IEEE. IEEE, 2013, pp. 1–8.
  • [13] S. Omari, P. Gohl, M. Burri, M. Achtelik, and R. Siegwart, “Visual industrial inspection using aerial robots,” in Applied Robotics for the Power Industry (CARPI), 2014 3rd International Conference on. IEEE, 2014, pp. 1–5.
  • [14] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,” in ICRA Workshop on Open Source Software, 2009.
  • [15] S. Lynen, M. Achtelik, S. Weiss, M. Chli, and R. Siegwart, “A robust and modular multi-sensor fusion approach applied to mav navigation,” in Proc. of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2013.
  • [16] C. Richter, A. Bry, and N. Roy, “Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments,” in Robotics Research. Springer, 2016, pp. 649–666.
  • [17] M. Kamel, T. Stastny, K. Alexis, and R. Siegwart, “Model predictive control for trajectory tracking of unmanned aerial vehicles using robot operating system,” in Robot Operating System (ROS) The Complete Reference, Volume 2, A. Koubaa, Ed. Springer, 2017.