
Low-level Active Visual Navigation: Increasing robustness of vision-based localization using potential fields

This paper proposes a low-level visual navigation algorithm to improve visual localization of a mobile robot. The algorithm, based on artificial potential fields, associates each feature in the current image frame with an attractive or neutral potential energy, with the objective of generating a control action that drives the vehicle towards the goal while still favoring feature-rich areas within a local scope, thus improving localization performance. One key property of the proposed method is that it does not rely on mapping, and therefore it is a lightweight solution that can be deployed on miniaturized aerial robots, in which memory and computational power are major constraints. Simulations and real experimental results using a mini quadrotor equipped with a downward-looking camera demonstrate that the proposed method can effectively drive the vehicle to the designated goal through a path that prevents localization failure.



I Introduction

The problem of estimating the pose of a robot with respect to its environment, usually denoted as the localization problem, is one of the key challenges in the deployment of autonomous Micro Aerial Vehicles (MAVs) in GPS-denied environments. Also addressed as state estimation, localization is of utmost importance, as the performance of other primary tasks such as autonomous navigation and obstacle avoidance relies on its accuracy. Besides compromising the mission, localization failure is potentially dangerous for humans and buildings in the vicinity, and for the robot itself.

MAV localization in GPS-denied environments can be achieved with the aid of external systems such as motion tracking cameras. However, since such systems are not always available or practical, much research effort has been devoted to the development of alternative systems relying on lightweight onboard sensors. The use of vision-based approaches for MAV localization has been the subject of many research works lately, e.g. [1, 2, 3]. Effective solutions combine visual estimation algorithms and inertial data within a filtering framework [4, 6, 7, 8]. In particular, recent solutions such as ORB-SLAM [7] and SVO [6] are accurate in cluttered, feature-rich, highly textured environments. However, when visual cues are not available, the vision pipeline fails and the algorithm must be reinitialized. Moreover, during the re-initialization step, state estimation relies on model propagation and inertial data, compromising localization and map consistency. If visual cues are not detected within a few seconds, state estimation error and uncertainty grow fast and the vehicle may get lost, as illustrated in Fig. 3.

Fig. 3: Navigation that does not take into consideration the features in the environment can lead to localization failure (first panel). The proposed solution drives the vehicle towards a feature-rich area, enhancing localization robustness (second panel).

This paper proposes a low-level strategy for MAVs to locally improve Visual Odometry (VO) performance and map consistency. The solution aims at guiding the vehicle through a path with high-quality image features. The method builds upon an Artificial Potential Field (APF) framework, in which each image feature is associated with a corresponding potential level. This yields an additional reference velocity to the vehicle’s control loop, aimed at biasing its motion towards a feature-rich region in the current image frame. Spatial goal and APF-generated velocities are combined such that the vehicle drives toward the goal while avoiding texture-less regions and regions with non-static features, as illustrated in Fig. 3. This behaviour can reduce the growth of state estimation uncertainty and improve localization accuracy. In addition, since this method does not rely on a map, it will not experience the same type of errors as active SLAM solutions and, most importantly, it can be integrated with them in a complementary fashion. The proposed solution is especially appealing for small-scale vehicles, e.g. [9], whose limited memory and processing power prevent large map storage, loop closure detection, and global pose optimization. For these systems, visual SLAM is constrained to a low number of keyframes. Once the allocated memory is full, previous keyframes are discarded as new frames arrive [2]. Yet, local map consistency is required for proper 3D world point initialization.

This paper is organized as follows: Sec. II presents related work regarding passive localization, active localization, and artificial potential fields. Sec. III introduces some basic notation and definitions. Sec. IV describes the proposed method. Sec. V presents simulation and experimental results. Finally, Sec. VI contains final remarks. This paper builds upon and extends previous results presented by the authors in [10]. In particular, simulations validate the proposed algorithm in different scenarios, the effect of some parameters on the proposed method is evaluated, the benefits of an auto-tuning technique are demonstrated, and extensive experiments in a real-world scenario demonstrate the significant improvement in VO and SLAM performance for specific cases using the proposed method.

II Literature Review

This section provides a general overview of the foundations of the problem tackled in this work. Relevant solutions regarding visual odometry, active localization, and artificial potential fields are discussed.

II-A Visual Odometry

Visual odometry estimates the camera motion from the motion of features tracked across two consecutive frames. Thus, it is a dead-reckoning method, i.e. it estimates relative, rather than absolute, pose. In passive VO [5], image regions of interest are initialized and tracked as a consequence of navigation. Recent solutions, e.g. [6, 7, 8], have achieved impressive results. The keypoint method developed in [7] extracts features from salient image regions to recover camera pose using epipolar geometry. The hybrid method proposed in [6] combines both features and photometric information at high frame rates for robust pose estimation. The dense strategy in [8] exploits photometric error to recover camera pose, integrating lens and camera parameters. The last two methods operate directly on pixel intensities and are therefore more robust and accurate in sparsely textured environments. However, as suggested in [8], the performance depends on the camera model, frame rate, and environment brightness. Moreover, we are not aware of any VO method that works properly without a quasi-static scenario and sufficient texture. The method addressed in this paper aims to ensure that these two underlying assumptions hold during flight.

II-B Active Localization

Active localization adds visual information into the control loop, with the goal of minimizing pose uncertainty. The problem is addressed in the literature under different formulations, with significant overlap: active-SLAM [11, 12, 13], planning under uncertainty [14, 15], or next-best-view [16, 17].

Davison and Murray [11] define the main goal in active feature selection as building a map of features that helps localize the robot, rather than as an end result in itself. Their solution is not active in the sense that it does not act on the camera path. Instead, it chooses the best features on which to fixate the camera’s view. Vidal-Calleja et al. [12] extend the former solution by proposing a control law that drives the camera to the location that maximizes the expected information gain. Based on information theory, features corresponding to the hardest measurement to predict are chosen. A formal approach that relates SLAM observability and vehicle motion is addressed by Bryson and Sukkarieh [13]. The proposed on-line path planner decides whether to explore new regions or revisit known features to improve localization. Trajectories that excite locally unobservable modes are preferred.

Considering a known map (given a priori), the navigation task can be improved by planning routes that favor texture-rich areas. Achtelik et al. [14] addressed a Rapidly-exploring Random Belief Tree (RRBT) framework that incorporates MAV dynamics and pose uncertainty. The method fails in a non-static scenario or under illumination changes. In [15], Rapidly-exploring Random Trees (RRT*) is extended to take pose uncertainty into account. The most informative trajectories are selected using the Fisher information matrix. Previously unmapped or non-static regions impact the local edges and vertices affected by new information. While local changes in the nodes handle small discrepancies between map and environment, these methods are not scalable, since an offline path planner requires an accurate map.

Sadat et al. [16] propose a scoring function that takes into account the expected number of features for a given camera viewpoint, using a mesh of triangles. The cost function of the optimized path planner includes path length and expected number of features. Mostegel et al. [17] propose a set of measures, including geometric point quality and recognition probability, to analyze the impact of possible camera motions and avoid localization loss. The solution relies on SLAM key-frame data. A local planner ensures that the vehicle reaches the destination in unexplored maps.

Fig. 4: Passive and active localization for mobile robots. The method proposed in this paper (here denoted as Active VO) is compared against most of the state-of-the-art alternatives (identified as Active SLAM).

II-C Artificial Potential Fields

Early work on artificial potential fields was done by Khatib [18] for real-time obstacle avoidance. In this set-up, the spatial goal was associated with an attractive force, while obstacles, detected on-the-fly, were associated with a repulsive force. In contrast to prior works that tackled obstacle avoidance as a high-level planning problem, APF proposed a low-level, real-time, local solution. Furthermore, the author suggested integrating both high-level and low-level approaches for better performance. The popularity of the potential field approach has grown steadily and, lately, it has found new applications besides obstacle avoidance, such as navigation [19], loop-closure detection [20], and mobile robot exploration [21].

Fig. 4, inspired by [22], illustrates both active and passive localization frameworks. Most active localization solutions addressed in the literature are built on top of SLAM, and are therefore relatively complex and map dependent. In contrast, our method can be classified as active VO: it does not require a map, but only features selected as inliers in the current frame.

III Notations and definitions

Let and denote the 3D body-fixed frame attached to the vehicle and the 2D image plane frame attached to the image plane, respectively. The origin of coincides with the center of gravity of the vehicle and the origin of corresponds to the top-left image pixel. Vectors are written in lower-case bold, and a leading superscript indicates the coordinate frame. The homogeneous coordinate of vector is denoted as . When a vector is described in , the leading superscript is omitted. Matrices are written in upper case and sets in calligraphic letters. The transformation from body to image frame is known, where and denote the rotation and translation from the body frame to the downward-looking camera, respectively, and K is the intrinsic parameter matrix of the camera. For the sake of simplicity, the camera frame is omitted, as usually a static or a known transformation relates camera and body frames.

A high level positioning controller computes , a velocity reference that drives the vehicle to the spatial goal. Notice that only planar motion is considered, that is, the vehicle keeps a constant height. Also, let be the undistorted coordinates of an image feature [23]. In particular, is the set of features tracked and selected as inliers in the current image frame.

IV Low-level Active Visual Navigation

Fig. 5: Block diagram representing the proposed APF framework, and its communication with the other modules.

This section describes the method proposed to compute the velocity reference that drives the vehicle towards a spatial goal while avoiding low-feature areas. This is achieved by adding a component to the goal velocity vector so as to favor feature-rich regions. The block diagram in Fig. 5 shows the information flow addressed in the remainder of this section. The proposed method is fed by a visual odometry algorithm, which provides inlier features, and a pose controller, e.g. a path-following algorithm, that computes the goal velocity reference. The desired vehicle velocity is computed by the proposed low-level active visual navigation method. A higher-level system, such as a mission controller, may dynamically tune the parameters of the proposed method, which are presented and discussed next.

IV-A Features to charge

Each feature is associated with an attractive or neutral potential energy. Associating similar potential energy with every feature in the image frame is not adequate, since the vehicle could easily be trapped at a local minimum or be subject to sudden changes when new features are extracted. Instead, taking advantage of the fact that the camera provides bearing information, the proposed method considers the orientation of each feature w.r.t. the direction of , i.e. the projection of the goal velocity in the image frame. As expected in a potential field framework, the final goal itself plays a role in the local decision-making process.

Fig. 6: Attractive potential energy increases in the direction of the blue arrow. Charges are neutral within the circle segment defined by . Attractive and neutral charges are represented by blue and green dots, respectively.

Let be a point that belongs to the image frame, for instance the camera optical center obtained in the camera calibration step, and consider that the feature-based velocity shall be computed at that point. For each feature , compute


where and . Notice that only the direction of is taken into account for computing the feature-based velocity vector (the last coordinate of its homogeneous form must be set to ).

Fig. 7: Case study for the feature-based potential field. (a) Current image frame and inlier features. (b) Considering as the optical center, the charge map is built and represented as a heat-map. (c) The potential field map shows the action derived when evaluating at different ; the goal-friendly region is shown in yellow and the feature-friendly region in red (more detail about these regions is given in Sec. IV-C). (d) Commanded reference velocity as the combination of goal and feature velocity. (, pixel, pixel).

Without loss of generality, features in the image plane can be confined within the boundary of a circle centered at , as shown in Fig. 6. Let the central angle define a circular segment in the circle and . A charge is represented as the tuple , where is its corresponding potential energy. The charging policy is defined below:


Based on the current frame information, the considered charging policy associates high attractive potential energy with features located in the goal direction. Features located away from the goal direction, in the region defined by the circle segment, have a neutral charge.
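The charging policy can be illustrated with a short sketch. The symbols and the exact energy profile are not legible in this copy, so everything below is an assumption made for illustration only: the function name `charge_energy`, the placement of the neutral circular segment on the side of the circle opposite to the goal direction, and an energy that grows linearly along the goal direction from the chord of that segment up to the circle boundary.

```python
import math


def charge_energy(feature, center, goal_dir, radius, central_angle):
    """Illustrative potential energy in [0, 1] for one image feature.

    ASSUMPTIONS (not taken from the paper): the neutral circular segment
    lies opposite to the goal direction and is bounded by the chord
    subtended by `central_angle`; the energy profile is linear.
    """
    if not 0.0 <= central_angle < 2.0 * math.pi:
        raise ValueError("central_angle must lie in [0, 2*pi)")
    norm = math.hypot(goal_dir[0], goal_dir[1])
    if norm == 0.0:
        raise ValueError("goal_dir must be non-zero")
    gx, gy = goal_dir[0] / norm, goal_dir[1] / norm

    # Signed position of the feature along the goal direction (pixels).
    dx, dy = feature[0] - center[0], feature[1] - center[1]
    along = dx * gx + dy * gy

    # Chord of the neutral segment, measured along the goal direction.
    chord = -radius * math.cos(central_angle / 2.0)
    if along <= chord:
        return 0.0  # neutral charge
    # Attractive charge: grows towards the goal side of the circle.
    return min(1.0, (along - chord) / (radius - chord))
```

With a central angle of π, half of the circle is neutral, and a feature halfway between the evaluation point and the goal-side boundary receives an energy of about 0.5 under these assumptions.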

IV-B Vector Field

Each charge exerts a force at a point that belongs to the image frame, given by


where is the distance in pixels that a charge must be from the evaluated point to exert any force on it; is the spread in pixels of the potential field, and and are computed as follows


The total force on the point can be computed as , which can be normalized and transformed into a feature based velocity command . Its direction in homogeneous coordinates is given as


Finally, the proposed reference velocity takes the form


where is a weight factor and is a normalized velocity. The velocity can be transformed from the image frame to the body frame and scaled accordingly.
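As a rough illustration of how the goal and feature-based velocities might be combined, the sketch below uses a convex combination of the two normalized vectors. The exact form of the reference velocity is not legible in this copy, so the convex combination, the function names `normalize` and `blend_velocity`, and the convention that a weight of 1 follows the goal only are assumptions; they were chosen to agree with the remark in Sec. V that a large weight makes the vehicle take a path close to the straight one.

```python
import math


def normalize(v):
    """Return the unit vector of a 2D vector, or (0, 0) for a zero vector."""
    n = math.hypot(v[0], v[1])
    if n == 0.0:
        return (0.0, 0.0)
    return (v[0] / n, v[1] / n)


def blend_velocity(goal_vel, feature_vel, weight):
    """Blend goal and feature-based directions in the image frame.

    ASSUMPTION: weight = 1 follows the goal only, weight = 0 follows the
    features only. The result still has to be transformed to the body
    frame and scaled, as stated in the text.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    g = normalize(goal_vel)
    f = normalize(feature_vel)
    return (weight * g[0] + (1.0 - weight) * f[0],
            weight * g[1] + (1.0 - weight) * f[1])
```

For example, `blend_velocity((2.0, 0.0), (0.0, 3.0), 0.5)` gives equal contributions from both directions.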

IV-C Discussion

Artificial potential field frameworks usually take the robot position into consideration when computing forces: obstacles exert a repulsive force and the goal an attractive force. In the proposed framework, features can be either attractive or neutral according to their position w.r.t. the point being considered and the goal direction.

A case study is illustrated in Fig. 7. In particular, Fig. 7(a) shows a frame and the extracted features classified as inliers. The goal velocity is directed towards the top-right pixel in the image frame, a zone that is poor in features. Figure 7(b) depicts the potential energy associated with each feature when evaluating the action induced at the central pixel of the camera, . The potential field map (see Fig. 7(c)) shows the corresponding field for different values of . For each point in the map, (1)–(8) must be computed. However, for visualization, is not normalized. Observe that the map can be divided into two different zones, labelled goal-friendly and feature-friendly actuation zones. Both regions are shown in the potential field map, with a yellow (goal-friendly) and a red (feature-friendly) background. Suppose ; then, according to (8) the vehicle follows . If is within a feature-friendly region, the vehicle favors the features over the goal. As a matter of fact, the vehicle will move away from the goal. On the contrary, if is within a goal-friendly region, the vehicle will move towards the goal. The absolute value of the angle between the goal velocity and the feature-based velocity determines these regions: an acute angle indicates a goal-friendly zone and an obtuse angle a feature-friendly zone. The radius of the charge determines whether features close to the point affect the solution or not. Meanwhile, the spread determines the strength of each charge. The larger the spread, the more influence charges far from the evaluated point have on the feature-driven action. Thus, it limits the prediction horizon based on the local frame, that is, the belief that a feature on an edge indicates that there will be more features in that direction.
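The acute/obtuse criterion above reduces to the sign of a dot product. The following minimal sketch shows this; the function name `classify_zone` and the separate "boundary" label for an exact right angle are illustrative assumptions, since the text does not say how that case is handled.

```python
def classify_zone(goal_vel, feature_vel):
    """Label an evaluation point from its goal and feature-based velocities.

    A positive dot product means an acute angle (goal-friendly zone);
    a negative one means an obtuse angle (feature-friendly zone).
    ASSUMPTION: an exact right angle is reported as "boundary".
    """
    dot = goal_vel[0] * feature_vel[0] + goal_vel[1] * feature_vel[1]
    if dot > 0.0:
        return "goal-friendly"
    if dot < 0.0:
        return "feature-friendly"
    return "boundary"
```

Evaluating this function on a grid of image points would reproduce a two-colour map in the spirit of Fig. 7(c), given a feature-based velocity for each point.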

IV-D Auto-tuning

The value of determines whether the vehicle shall follow the feature-based velocity or the goal velocity. The main motivation to follow the former rather than the latter is the possibility of localization failure. Therefore, the value of shall be selected considering a given performance index of the localization algorithm being employed. In this paper, the number of inlier features tracked across consecutive frames () is considered. If the number of tracked features is above an upper threshold (), is set to . Conversely, if the number of tracked features is below a lower threshold (), is set to . If the number of tracked features is in between these two thresholds, then is proportional to the number of features. The auto-tuning method is summarized as


where and are constants such that is continuous.
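The auto-tuning rule is a saturated linear map from the number of tracked inliers to the weight. The sketch below is one way to write it; the saturation values are not legible in this copy, so the defaults `w_min=0.0` and `w_max=1.0`, as well as the names `auto_tune_weight`, `n_low`, and `n_high`, are assumptions. The slope `a` and offset `b` are derived so that the map is continuous at both thresholds.

```python
def auto_tune_weight(n_tracked, n_low, n_high, w_min=0.0, w_max=1.0):
    """Map the number of tracked inlier features to the goal weight.

    ASSUMPTIONS: w_min and w_max are placeholder saturation values; many
    tracked features favor the goal velocity, few favor the features.
    """
    if n_low >= n_high:
        raise ValueError("n_low must be smaller than n_high")
    if n_tracked >= n_high:
        return w_max
    if n_tracked <= n_low:
        return w_min
    # Linear segment a * n + b, continuous at n_low and n_high.
    a = (w_max - w_min) / (n_high - n_low)
    b = w_min - a * n_low
    return a * n_tracked + b
```

For instance, with thresholds of 20 and 80 features, 50 tracked features yield a weight of about 0.5 under these assumed saturation values.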

V Results

This section presents simulation results and real-world experiments for multiple flight tests. For both simulation and real-world experiments, images and commands are published and received within the ROS (Robot Operating System) environment [24]. The feature-based potential field runs online as the decision-making process, sending desired velocity commands to the velocity controller. A first-order low-pass filter, with a cutoff frequency at  Hz, provides a smooth control reference for the feature-driven action. When a new image frame is received, Shi-Tomasi features [25] are extracted. In addition to a minimum quality threshold, only features with the highest response are selected. Then, features are tracked across two consecutive frames using the Lucas-Kanade Tracker (LKT) [26]. By resorting to a RANSAC [27] framework, the 8-point algorithm classifies features as inliers or outliers. The solution is robust in the presence of a few false inliers, such that a matching step is unnecessary. Both Shi-Tomasi and LKT are implemented in the OpenCV library.
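The first-order low-pass filter mentioned above can be sketched as a discrete exponential smoother. The cutoff frequency is not legible in this copy, so it is left as a parameter; the class name `LowPassFilter` and the backward-Euler discretization are assumptions for illustration, not necessarily what the authors implemented.

```python
import math


class LowPassFilter:
    """Discrete first-order low-pass filter (backward-Euler form).

    ASSUMPTION: cutoff_hz is a placeholder; the value used in the paper
    is not available in this copy.
    """

    def __init__(self, cutoff_hz, initial=0.0):
        if cutoff_hz <= 0.0:
            raise ValueError("cutoff_hz must be positive")
        # Time constant of the equivalent RC circuit.
        self.rc = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.state = initial

    def update(self, sample, dt):
        """Filter one sample taken dt seconds after the previous one."""
        alpha = dt / (self.rc + dt)
        self.state += alpha * (sample - self.state)
        return self.state
```

In use, one filter instance per velocity component would be updated at the image frame rate, so that abrupt changes in the feature-driven action are smoothed before reaching the velocity controller.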


V-A Simulation

The proposed solution was validated in different scenarios using the simulator V-REP by Coppelia Robotics. Each scenario has at least one path to the goal which is rich in features. The values of were similar throughout the simulations. Fig. 8(a-c) illustrate scenarios for which flying in a straight path to the goal is potentially risky for localization and map consistency. In such scenarios, the proposed method is an asset, as it drives the vehicle through feature-rich zones. Fig. 8(d) considers a well-textured scene. Although the potential-field framework does not benefit the navigation here, it does not compromise reaching the goal. For 10 trials, the trajectory was  m longer than flying in a straight line, which corresponds to of the  m straight path length. Fig. 8(e-f) correspond to local-minima scenarios. The proposed method struggles to overcome a symmetric bifurcation (Fig. 8(e)). Such a scenario, with equally distributed features in opposite directions, is not likely to happen in a real-world environment; nevertheless, the vehicle finds a way out. Finally, Fig. 8(f) presents a scenario for which the method may eventually fail due to the local nature of potential-field-based solutions. Analyzing the simulations, if the goal velocity and the feature-based velocity point in opposite directions, the vehicle has reached a local minimum. This problem could be addressed by a high-level SLAM, which memorizes paths already travelled and selects an alternative path to the goal. If SLAM is not available, due to memory or computational power constraints, increasing the value of would enable the vehicle to overcome the local minimum.

Fig. 8: Simulations for evaluation of the proposed algorithm. The task consists in flying a 10 m long path from the red square (right) to the green square (left). (a-e) illustrate scenarios that have one or more feature-rich paths to the goal and (f) depicts a scenario where the proposed algorithm fails due to local minima. Parameters were kept constant: , pixel, pixel.
Fig. 9: Effect of parameter : (a) , (b) , (c) and (d) auto-tuning, on the experimental setup of Fig. 8(b).
Fig. 10: Value of parameter using auto-tuning technique. Path travelled is as shown in Fig. 9(d).

Figure 9 addresses the effect of varying . If is low (Fig. 9(a)), the vehicle takes a path that is more conservative with respect to features. In contrast, as increases, the vehicle follows a trajectory closer to the straight line that links the start and goal positions (Fig. 9(c)). An evaluation of the auto-tuning technique described in Section IV-D is shown in Fig. 9(d). and were set to and of the maximum number of features extracted per frame, respectively. The value of throughout the simulation of Fig. 9(d) can be seen in Fig. 10. Online tuning had a positive effect on the overall performance of the task. In Fig. 9(a) the vehicle travelled m to the goal, while, using the auto-tuning technique, the path travelled was m long.

A total of 30 simulations using the scenario depicted in Fig. 8 were performed. For half of these simulations, the potential field framework was turned off. For the other half, it was turned on. In the first situation, vision-based localization failed after flying a quarter of the straight-line path. Afterwards, within an EKF framework, state estimation was purely based on the propagation of inertial measurements. IMU covariance error was extracted from the bagfile described in [6]. Meanwhile, the proposed method ensured that visual localization worked properly during the entire flight. For both methodologies, based on its state estimation, the vehicle eventually reached the estimated goal position. For each trial, the distance between the final vehicle position and the real goal position was measured. With the proposed active pipeline, this distance was  m, whereas with the passive technique it was  m, showing that the proposed technique improves the overall position estimation: ensuring that the visual pipeline does not fail boosts state estimation performance. Notice that such a comparison is only possible in simulation, as flying a real quadrotor based purely on inertial data is not a reliable approach.

V-B Experimental Setup with Real Data

Figure 11 presents the main components employed in the experimental validation of the proposed method. The algorithm was tested on the mini-quadrotor Crazyflie 2.0, manufactured by BitCraze. This open-source vehicle allows a maximum take-off weight of  g. It is equipped with accelerometers, gyroscopes, a magnetometer, a pressure sensor, Bluetooth, and a low-latency/long-range radio that allows the robot to be operated (and to log data) from a remote PC. The manufacturer’s radio, Crazyradio, interfaces communication with a notebook, which computes the proposed action. Collective thrust and attitude commands are sent to the vehicle using the package developed in [29]. A customized PID controller ensures that the vehicle follows the desired reference velocity. We attached to the vehicle a  g mini transmitter camera module FX798T equipped with a field of view lens; the camera faces downward. This module broadcasts images using an embedded  GHz transmitter. The frame rate is  Hz for a image resolution. The proposed algorithm runs on an off-board computer at frame-rate speed. The images are transmitted via radio. Although variable, the measured delay upon image reception on the remote computer is about  ms. The flying arena is equipped with an Optitrack motion capture system manufactured by the company NaturalPoint. The system records the motion of retro-reflective markers using the optical-passive technique. Infrared cameras capture the markers’ positions, from which the pose of the vehicle is estimated with millimetric precision. The motion capture system provides feedback for the control loop, while the feature-based potential field works online as the decision-making process, sending desired velocity commands to the controller.

Fig. 11: Experimental setup. Clockwise, starting from the top: the Crazyflie equipped with the mini transmitter camera module FX798T, image receiver, and capture card; a notebook that runs the code; and an antenna for the communication with the vehicle.
Fig. 12: (a) First real scenario for evaluation of the proposed algorithm. (b) Localization fails in the absence of features (passive technique). (c) With the proposed method, a feature-based velocity vector drives the vehicle to the goal through a region rich in features.

V-C Real Data Experiments Evaluation

Multiple flights were performed in a home-like arena employed in the European Robotics League (ERL). In the first scenario, shown in Fig. 12(a), the robot was autonomously controlled to fly from the starting position to the goal position, as indicated by the two carpets. As seen in the image, the straight path that links both carpets is a regular floor that does not contain good visual cues. The localization task for a trial is said to succeed if ORB-SLAM [7] manages to keep track of the pose of the vehicle and, therefore, a consistent map is built. A similar qualitative metric is employed in [16]. We stress again that one of the advantages of the proposed technique is that it can work under a SLAM framework, rather than on top of it (as most previous active perception strategies do), ensuring a better localization and mapping result. Notice that we run ORB-SLAM offline, just for evaluation purposes, not for online localization. Multiple trials showed that taking the straight path towards the goal always results in localization failure due to the lack of visual features. Fig. 12(b) shows one of the trials where the localization estimation fails when a straight path is taken. Alternatively, using the proposed active method, the robot was always capable of reaching the goal while maintaining localization. Fig. 12(c) shows the path the robot takes, as well as the estimated state over time, in one of the experiments. Fig. 12(c) also shows the trajectory travelled in different trials with the proposed method, where the robot reaches the goal while keeping track of its location.

Fig. 13: Algorithm evaluation for different values of and , for the experimental scenario presented in Fig. 12(a). means the localization task succeeded, means the goal task succeeded, and means both tasks succeeded.

For the same scenario, the method was also evaluated for different values of and , while keeping constant. The goal task is said to succeed if the vehicle reaches the goal within less than . Localization task success is defined as before. Results are shown in Fig. 13, where means the localization task succeeded, means the goal task succeeded, and means both tasks succeeded. There was a range of parameter values that led to success in both tasks. The localization task fails when the weight of the goal velocity vector is large, i.e. is large, such that the vehicle takes a path similar to the straight one. Meanwhile, the goal task fails when the feature-based potential field falls into a local minimum. Analyzing the image sequence, it is possible to observe that the vehicle hovers over a region where new features are not initialized and features opposite to the goal direction exert some attractive force. Consequently, the potential field force is not able to push the vehicle in the direction of the goal. As the value of increases, the attractive potential energy associated with features opposing the goal direction decreases. Therefore, the vehicle is no longer pulled backwards w.r.t. the goal and reaches the final destination.

Fig. 14: Passive and active flight in a home-like arena as employed in ERL. The vehicle visits waypoints in the order: 1, 2, 3, 1, and 4.
Fig. 15: Comparison of the visual odometry error (consecutive frames) for both the passive and active techniques using ORB-SLAM, for the experimental setup presented in Fig. 14. The dashed red line indicates that the robot is lost.

To further evaluate the method in more realistic environments, we performed several long flights using all rooms of the home-like arena (Fig. 14). The vehicle must visit waypoints (WP) selected by a human operator to explore different rooms of the apartment. As shown, the proposed method steers the vehicle away from the straight paths that link consecutive waypoints. The VO performance is depicted in Fig. 15 with (active mode) or without (passive mode) the proposed low-level algorithm. The VO is obtained from ORB-SLAM, which requires proper 3D point initialization to propagate the relative scale. As long as enough visual cues are within the field of view of the camera, the error is small for both the active and passive methods. However, flying straight to the goal leads to feature-poor regions close to the bedroom entrance at , where the passive estimator gradually fails to estimate the position. Before failing completely (dashed segment), visual odometry degrades quickly. The SLAM localization error (Fig. 16) presents a similar behaviour: the error grows large before failing, but it is kept small when the map is concise. On the way out of the bedroom, the vehicle revisits some poorly initialized features, leading to large errors during the recovery phase. If feedback were not provided by the motion capture system, such a pose estimation failure would require aborting the mission and landing in an open-loop manner. In particular, for the active method flight tests, the value of was computed using the auto-tuning method and is shown in Fig. 17. Overall, we performed about 150 meters of flight per method, corresponding to more than 8000 frames. With the proposed strategy, SLAM manages to estimate the pose for of the frames, whereas the pose was estimated for only using passive navigation. Therefore, thanks to the feature-based strategy, the VO and SLAM errors are more likely to be kept small throughout the entire flight.
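The two quantities discussed above, the fraction of frames with a pose estimate and the consecutive-frame VO error, can be computed along the following lines. This is a minimal sketch under stated assumptions, not the evaluation code used for the figures: the function names are hypothetical, lost frames are assumed to be stored as NaN rows, and the estimated positions are assumed to be already scaled and aligned to the motion capture frame.

```python
import numpy as np


def tracked_fraction(est_positions):
    """Fraction of frames for which a pose estimate exists (no NaN entries)."""
    est = np.asarray(est_positions, dtype=float)
    valid = ~np.isnan(est).any(axis=1)
    return float(valid.mean())


def consecutive_frame_error(est_positions, gt_positions):
    """Norm of the difference between estimated and true frame-to-frame motion.

    Entries are NaN wherever either of the two frames involved is lost.
    """
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    d_est = np.diff(est, axis=0)  # estimated displacement between frames
    d_gt = np.diff(gt, axis=0)    # ground-truth displacement between frames
    return np.linalg.norm(d_est - d_gt, axis=1)
```

For example, a four-frame sequence in which the third estimate is lost yields a tracked fraction of 0.75, and the error is defined only for the first frame pair.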

Fig. 16: Curves representing the SLAM estimated position error, for both passive and active techniques while following the path presented in Fig. 14. Points in the graph represent the interval of estimation.
Fig. 17: Representation of the ’s auto-tuning of the method presented in this paper, while following the path shown in Fig. 14 (blue curve).

VI Conclusions

This paper proposes a simple, fast, low-complexity and map-independent solution for the active visual localization problem. The method, based on artificial potential fields, associates a potential energy with features that are classified as inliers in the current image frame. The goal direction is used to determine the corresponding charge intensity of these image features. Simulations and experimental results using a micro aerial vehicle, equipped with a downward looking camera, showed that the method can effectively drive the vehicle towards the goal, while avoiding featureless or feature-poor regions. This active behaviour can greatly improve the localization performance and prevent common localization failures that are caused by low quality image features. The proposed active solution does not rely on a map and hence can be integrated within a SLAM framework to improve the accuracy and robustness of localization and mapping. Finally, the proposed method could benefit from the many techniques developed for potential fields, such as random perturbations for escaping local minima.
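As an illustration of the last point, a random perturbation could be added whenever the commanded velocity stalls. The sketch below is one generic way of doing so and is not part of the proposed method: the function name, the stall threshold and the kick magnitude are all hypothetical, and stalling is assumed to be detectable from the norm of the planar velocity command.

```python
import numpy as np


def perturbed_velocity(v_field, rng, stall_threshold=0.05, kick=0.2):
    """Add a random planar kick when the field command is close to zero.

    v_field: 2D velocity command from the potential field.
    rng: a numpy.random.Generator used to draw the kick direction.
    stall_threshold, kick: hypothetical tuning values, for illustration only.
    """
    v = np.asarray(v_field, dtype=float)
    if np.linalg.norm(v) >= stall_threshold:
        return v  # the field is still moving the vehicle; leave it untouched
    # Likely in a local minimum: push in a uniformly random direction.
    angle = rng.uniform(0.0, 2.0 * np.pi)
    return v + kick * np.array([np.cos(angle), np.sin(angle)])
```

A caller would create the generator once, for instance with `np.random.default_rng()`, and pass it in at every control step so that successive kicks are independent.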


  • [1] G. Loianno, M. Watterson, and V. Kumar, ”Visual inertial odometry for quadrotors on SE(3),” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 1544-1551, 2016.
  • [2] S. Weiss, M. W. Achtelik, S. Lynen, M. C. Achtelik, L. Kneip, M. Chli, and R. Siegwart, ”Monocular Vision for Long-term Micro Aerial Vehicle State Estimation: A Compendium,” Journal of Field Robotics, vol. 30, no. 5, pp. 803-831, 2013.
  • [3] J. Engel, J. Sturm, and D. Cremers, ”Accurate Figure Flying with a Quadrocopter Using Onboard Visual and Inertial Sensing,” in Proc. of the Workshop on Visual Control of Mobile Robots, 2012.
  • [4] G. Klein and D. Murray, ”Parallel Tracking and Mapping for Small AR Workspaces,” IEEE/ACM Int’l Symposium on Mixed and Augmented Reality (ISMAR), pp. 225-234, 2007.
  • [5] D. Scaramuzza and F. Fraundorfer, ”Visual Odometry [Tutorial],” IEEE Robotics Automation Magazine, vol. 18, no. 4, pp. 80-92, 2011.
  • [6] C. Forster, M. Pizzoli, and D. Scaramuzza, ”SVO: Fast Semi-Direct Monocular Visual Odometry,” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 15-22, 2015.
  • [7] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ”ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. on Robotics, vol. 31, no. 5, pp. 1147-1163, 2015.
  • [8] J. Engel, V. Koltun, and D. Cremers, ”Direct Sparse Odometry,” arXiv:1607.02565 [cs.CV].
  • [9] C. De Wagter, S. Tijmons, B. D. W. Remes, and G. C. H. E. de Croon, ”Autonomous flight of a 20-gram Flapping Wing MAV with a 4-gram onboard stereo vision system,” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 4982-4987, 2014.
  • [10] R. T. Rodrigues, M. Basiri, A. P. Aguiar, and P. Miraldo, ”Feature Based Potential Field for Low-level Active Visual Navigation,” Robot’17: Third Iberian Robotics Conference: Advances in Robotics, 2017.
  • [11] A. J. Davison and D. W. Murray, ”Simultaneous localization and map-building using active vision,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 865-880, 2002.
  • [12] T.A. Vidal-Calleja, A. Sanfeliu, and J. Andrade-Cetto, ”Action selection for single-camera SLAM,” IEEE Trans. on Systems Man and Cybernetics, Part B (Cybernetics), vol. 40, no. 6, pp. 1567-1581, 2010.
  • [13] M. Bryson and S. Sukkarieh, ”Observability analysis and active control for airborne SLAM,” IEEE Trans. on Aerospace and Electronic Systems, vol. 44, no. 1, pp. 261-280, 2008.
  • [14] M. W. Achtelik, S. Lynen, S. Weiss, M. Chli, and R. Siegwart, ”Motion- and Uncertainty-aware Path Planning for Micro Aerial Vehicles,” Journal of Field Robotics, vol. 31, no. 4, pp. 676-698, 2014.
  • [15] G. Costante, C. Forster, J. Delmerico, P. Valigi, and D. Scaramuzza, ”Perception-aware Path Planning,” arXiv:1605.04151 [cs.RO] (conditionally accepted for IEEE Trans. on Robotics).
  • [16] S. A. Sadat, K. Chutskoff, D. Jungic, J. Wawerla, and R. Vaughan, ”Feature-rich path planning for robust navigation of MAVs with Mono-SLAM,” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 3870-3875, 2014.
  • [17] C. Mostegel, A. Wendel, and H. Bischof, ”Active monocular localization: Towards autonomous monocular exploration for multirotor MAVs,” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 3848-3855, 2014.
  • [18] O. Khatib, ”The Potential Field Approach And Operational Space Formulation In Robot Control,” Adaptive and Learning Systems: Theory and Applications, pp. 367-377, 1986.
  • [19] E. Rimon and D. E. Koditschek, ”Exact robot navigation using artificial potential functions,” IEEE Transactions on Robotics and Automation, vol. 8, no. 5, pp. 501-518, 1992.
  • [20] J. Vallvé and J. Andrade-Cetto, ”Potential information fields for mobile robot exploration,” Robotics and Autonomous Systems, vol. 15, pp. 68-79, 2015.
  • [21] V. A. M. Jorge, R. Maffei, G. S. Franco, J. Daltroz, M. Giambastiani, M. Kolberg, and E. Prestes, ”Ouroboros: Using potential field in unexplored regions to close loops,” IEEE Int’l Conf. on Robotics and Automation (ICRA), pp. 2125-2131, 2015.
  • [22] R. Siegwart and I. R. Nourbakhsh, Introduction to Autonomous Mobile Robots, Bradford Co., Scituate, MA, USA, 2004.
  • [23] R. I. Hartley and A. Zisserman, ”Multiple View Geometry in Computer Vision,” Cambridge University Press, 2004.
  • [24] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, A. Ng, ”ROS: an open-source Robot Operating System,” ICRA workshop on open source software, 2009.
  • [25] J. Shi and C. Tomasi, ”Good features to track,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
  • [26] B. D. Lucas and T. Kanade, ”An Iterative Image Registration Technique with an Application to Stereo Vision,” Proc. of the 7th International Joint Conference on Artificial Intelligence, vol. 2, pp. 674-679, 1981.
  • [27] M. A. Fischler and R. C. Bolle, ”Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Commun. ACM, vol. 24, no. 6, pp. 381-395, 1981.
  • [28] G. Bradski and A. Kaehler, ”Learning OpenCV: Computer Vision with the OpenCV Library,” O’Reilly Media, 2008.
  • [29] W. Hoenig, C. Milanes, L. Scaria, T. Phan, M. Bolas, and N. Ayanian, ”Mixed Reality for Robotics,” IEEE/RSJ Int’l Conf. Intelligent Robots and Systems, pp. 5382-5387, 2015.