RUR53: an Unmanned Ground Vehicle for Navigation, Recognition and Manipulation

11/23/2017 ∙ by Nicola Castaman, et al. ∙ Università di Padova 0

This paper describes RUR53, the unmanned mobile manipulator robot developed by the Desert Lion team of the University of Padova (Italy), and its experience in Challenge 2 and the Grand Challenge of the first Mohamed Bin Zayed International Robotics Challenge (Abu Dhabi, March 2017). According to the competition requirements, the robot is able to freely navigate inside an outdoor arena; locate and reach a panel; recognize and manipulate a wrench; use this wrench to physically operate a valve stem on the panel itself. RUR53 is able to perform these tasks both autonomously and in teleoperation mode. The paper details the adopted hardware and software architectures, focusing on its key aspects: modularity, generality, and the ability of exploiting sensor feedback. These features let the team rank third in the Gran Challenge in collaboration with the Czech Technical University in Prague, Czech Republic, the University of Pennsylvania, USA, and the University of Lincoln, UK. Tests performed both in the Challenge arena and in the lab are presented and discussed, focusing on the strengths and limitations of the proposed wrench and valve classification and recognition algorithms. Lessons learned are also detailed.



There are no comments yet.


page 5

page 6

page 13

page 14

page 15

page 16

page 17

page 18

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The Mohamed Bin Zayed International Robotics Challenge (MBZIRC)111

is an international robotics competition fostering research in advanced robotics applications like operations in disaster scenarios, oil and gas maintenance, manufacturing, construction, and housework. Unstructured and dynamic environments require robots to be autonomous and able to collaborate and interact with other robots and humans. This calls for a combination of sensorized robots and advanced artificial intelligence techniques able to fulfill object perception, motion planning, grasping, and manipulation tasks. The first MBZIRC edition took place in Abu Dhabi in March 2017 and consisted of three challenges and a triathlon type Grand Challenge. Challenge 1 required an Unmanned Aerial Vehicle (UAV) to locate, track, and land on a moving vehicle. Challenge 2 required an Unmanned Ground Vehicle (UGV) to locate and reach a panel, and physically operate a valve stem on the panel. Challenge 3 required a team of UAVs to collaborate to search, locate, track, pick and place a set of static and moving objects. The Grand Challenge required a team of robots (UAVs and UGVs) to compete the combination of Challenges 1, 2, and 3 simultaneously.

MBZIRC was attended by 25 teams selected from all over the world. Our team, the Desert Lion team of the Intelligent Autonomous Systems Laboratory (IAS-Lab) of the University of Padova (Italy), competed in Challenge 2 and in The Grand Challenge, ranking third in the latter in collaboration with Czech Technical University in Prague, University of Pennsylvania, and University of Lincoln. This paper describes the robotics system we proposed and the algorithm we designed and developed. Three are its key aspects:

  • Modularity: A good modularization facilitates the debugging process, favors system’s stability, and facilitates code re-usability. Our system is modular: a finite state machine where each block is based on the Robot Operating System (ROS) [Quigley et al., 2009] open-source middleware (see Section 3).

  • Generality: For robotics research, finding general solutions is desirable because they can be applied to a wide range of problems. However, in the case of a robotics competition, organizers usually specify additional constraints in order to facilitate competitors in solving assigned tasks. This means that a reasonable trade-off should be defined in order to guarantee the system’s re-usability while not focusing on unnecessary or even unfeasible sub-problems. Focusing on perception, the hardware configuration of our robot is task specific, while the recognition routine can be easily trained and used to identify other objects, besides wrenches and valve stems. Focusing on manipulation, the robot is equipped with a 3-finger gripper even if a custom one was enough to pick up the wrench. Developing the grasping routine for a 3-finger gripper is more complex than developing a grasping routine for a custom tool; however, the achieved solution is flexible to changes in the geometry and dimension of the object to be grasped. With respect to navigation, the robot is able to freely move while avoiding obstacles; at the same time, a set of waypoints can be given as input: this lets force the robot trajectory once the panel location region is given (see Sections 4.1 and 4.2).

  • Feedback management: The robot must approach the panel and manipulate the proper tool. Having a robust feedback from perception and force sensors permits to be more robust to failures. Our approach combined 2D and 3D perception by mounting two laser scanners on the mobile base and two stereo vision cameras on the robotic manipulator. This redundancy lets the system be robust to light variations and perception errors. Because of budget limitations, our robot was not equipped with force sensors. We went beyond this limit by verifying the correctness of grasps through the output of gripper’s motor encoders: we compared assigned joint positions with the ones provided by the encoders (see Sections 4.2 and 4.3).

The remainder of the paper is organized as follows. Section 2 gives a review of existing robotic solutions in terms of similar challenges and problems. Section 3 details the design of the proposed system focusing both on its hardware and software architecture. In Section 4 an accurate description of every software module is depicted. Section 5 summarizes performed experiments and achieved results, both in the lab and in the arena. Finally, Section 6 details lessons learned during the competition and Section 7 contains conclusions and future works.

2 Related Work

Competitions are increasingly spreading across the robotics community. The reason is that they encourage innovation in the resolution of complex real-world problems while going beyond the typical research laboratory scenarios: very challenging experimental setups are usually provided, characterized by uncontrolled and not easily reproducible conditions. Participants have to face these challenges by applying a combination of state of the art algorithms, innovative methods, ideas and/or prototype systems from industrial R&D [Dias et al., 2016]. An example is the DARPA Robotics Challenge (DRC) [Iagnemma and Overholt, 2015]: it aimed to develop semi-autonomous ground robots, specifically humanoids, able to perform complex tasks in dangerous, degraded environments, such as disaster or emergency-response scenarios. Another interesting scenario is the one proposed by the Amazon Picking Challenge (APC) [Correll et al., 2016]: it assigned robots simplified versions of tasks that humans commonly face in warehouses, such as picking items from shelves and putting them into containers, and vice-versa.

As for the DARPA Grand Challenge, in the Challenge 2 of 2017 MBZIRC the robot had to operate outdoors: it had to navigate inside a 5060 m arena in order to detect a panel where a set of wrenches and a valve were located. In the MBZIRC Grand Challenge, while performing this task, the robot had to avoid collisions with robots and objects located in its workspace. Navigation is a complex problem, especially if obstacles populate the environments [Castaman et al., 2016]. Moreover, in unknown environments robots have to generate the map of their surroundings and localize themselves in this map. This problem is known as Simultaneous Localization and Mapping (SLAM) [Dissanayake et al., 2000]

and it is complex because, for localization, the robot needs a consistent map and, for acquiring the map, the robot requires a good estimate of its location. This mutual dependency between the pose and the map estimates makes the SLAM problem hard, and requires searching for a solution in a high-dimensional space 

[Grisetti et al., 2005]. Focusing on map representation, one popular approach is the occupancy grid [Moravec, 1988]

. Whereas such grid-based approach is computationally expensive, it provides detailed representations of the environment. Focusing instead on the robot’s pose estimation, Extended Kalman Filters (EKFs) are among the most used approaches: they estimate a fully correlated posterior over landmark maps and robot poses. However, if no assumption is made on both the robot motion model and the sensor noise, and if no landmark is assumed to be uniquely identifiable, then the filter will diverge

[Frese and Hirzinger, 2001]. An approach facing these issues is [Murphy, 1999]: it employs the Rao-Blackwellized Particle Filters (RBPF) and uses each particle of RBPF to represent a possible robot trajectory and a map. Several improvements to this approach were proposed: [Montemerlo and Thrun, 2003, Montemerlo et al., 2002] extended it for approaching the SLAM problem with landmark maps; [Fox et al., 2003], instead, improved it for reducing the number of required particles. In order to solve the SLAM problem of Challenge 2, we used an occupancy grid representation together with an improved version of [Fox et al., 2003]. The package is available in ROS as the GMapping package222 and allows to solve SLAM for laser range data through a highly efficient RBPF [Grisetti et al., 2007, Grisetti et al., 2005]. As stated in [Grisetti et al., 2005], instead of using a fixed proposal distribution, the algorithm computes on the fly an improved proposal distribution on a per particle base. This allows to directly use most of the information obtained from the sensors, while generating the particles. In this way, the algorithm draws the particles in a more effective way. Moreover, the highly accurate proposal distribution allows to utilize the number of effective particles as a robust indicator to decide whether a resampling should be carried out.

Working outdoors means dealing with light variations. This is a known problem for outdoors applications and is even more crucial when detecting obstacles. In our case, besides the light variations, another issue has to be considered: the panel to be detected is black. It should be noticed that a surface that appears black does not reflect the visible light. As stated in [Lacroix and Chatila, 1997], in order to detect the panel in such conditions, other techniques have to be adopted instead of exploiting vision data. For these reasons, we decided to exploit the laser range data (see Section 4.1).

With respect to the detection, recognition, and localization of both the right wrench and the valve, two are the main issues: the task is outdoor, confirming the above mentioned issues, and wrenches and valve are metallic, meaning that they cause strong reflections. Several methods can be exploited for tackling such object detection task. Shape-based approaches [Zhu et al., 2008] demonstrated strong performance when the object shape is accurately known. Hybrid approaches combining together a shape-based method and an object detector were also presented in the literature [Ferrari et al., 2010]

, that demonstrated a very high accuracy. However, such methods are often based on iterative algorithms and are computationally expensive. Simpler methods can be efficiently exploited in the context of the challenge, in order to reduce the time needed for the detection, thus enhancing the robot reactivity. We used a Cascade of Boosted Classifier working with Local Binary Patterns (LBP) features 

[Felzenszwalb et al., 2010]. Haar features [Viola and Jones, 2001, Lienhart and Maydt, 2002] are another common choice. Both approaches represent a good compromise between accuracy and computational workload. However, we chose LBP because both the training and the detection procedures are faster then Haar (typically in the order of seconds or minutes).

Once detected the wrench, the manipulation routine can start. The task reminds APC [Correll et al., 2016]. The main difference is that in APC objects were unknown, of changing size and shape, characterized by properties that make the grasping difficult, like deformation, transparency or reflectivity. In our case, objects are reflective but known. This means that the manipulation procedure can be precomputed and only the perception routine has to be carefully implemented in order to precisely retrieve the position and orientation of every object while minimizing sensor noise.

3 Robot System Design

Our robot is a UGV able to operate in autonomous mode as well as in teleoperated mode. Indeed, despite the innovation introduced by autonomy, human intervention through teleoperation in the manual mode guarantees the possibility to recover the robot routine when facing unforeseen situations or sensor failures, frequent scenarios of a robotics competition. For each modality, the description of both the hardware and software architecture follows.

3.1 Hardware Architecture

The robot (see Figure 1) is composed of a mobile Summit XL HL (Robotnik Automation S.L.L.333 The mobile base has been modified in order to install, on its top, a Universal Robot UR5 manipulator arm (Universal Robots A/S444, equipped with a Robotiq Adaptive Gripper with 3 fingers (Robotiq555 We called it RUR53 because of the initials of each robot part manufacturer (Robotnik Summit XL HL - Universal Robot UR5 - Robotiq 3-Finger Adaptive Gripper). All these hardware components are fully supported by ROS.

We chose the Summit XL HL robot for the following reasons: i) speed: the mobile base can run up to 10.8 km/h (depending on the load), resulting faster than other competitors like the ClearPath Robotics Husky666 (3.6 km/h); ii) autonomy: Summit XL HL has an autonomy of more than 3 hours while maintaining good performances; iii) cost: the robot is cheaper with respect to its competitors, e.g., the ClearPath Robotics Husky, the ClearPath Robotics Grizzly 777 or the Segway RMP 440888; iv) customizability: the robot lets us customize most of its interfaces and add new ones; v) payload: the payload of the mobile base is 65 kg, less than the payloads of its competitors (75 kg for the Husky, 180 kg for the Segway RMP 440, and 600 kg for the Grizzly robot) but still enough. As the challenge was subject to a time constraint (20 minutes per round) and our budget was limited, we preferred Summit XL HL to its competitors while keeping in mind its payload during the selection of the other components. We equipped the mobile base with rubber band wheels, actuated by 500 W hub brushless motors. Hall Effect sensors and a reduction gear box were mounted inside the aluminum wheels. This mechanical structure lets us carry heavy loads and provides ample traction (see Figure 2 - BASE).

Figure 1: RUR53 hardware configuration in the arena. Fig. 0(a) depicts RUR53 configuration in the autonomous mode. Fig. 0(b) shows the robot in the manual mode. Fig. 0(c) zooms in on the gripper’s cameras in the manual mode. Fig. 0(d) focuses on the custom Robotiq 3-finger gripper. Cameras in all configurations are surrounded by green boxes as well as gripper’s contours. Fig. 0(d) is taken from the match video.

The manipulator arm has 6 DOFs, a payload of 5 kg at most, and a working radius of 850 mm (see Figure 2 - ARM). As the manipulation tasks of the Challenge involved lightweight objects, choosing a cheap, lightweight robot, with a sufficient working range was the best choice. Moreover, a lightweight robot is less dangerous and easier to handle than heavy robots requiring special moving equipment.

The gripper has a weight of 2.3 kg and generates a grip force in pinch mode ranging from 15 to 60 N 15%, with a maximum recommended payload of 10 kg (see Figure 2 - HAND). Its closing speed ranges from 22 to 110 mm/s. Having a 3-finger gripper permits the grasp of objects different geometry and dimension, guaranteeing the generality of the implemented manipulation approach. In order to better grasp the wrench, we customized the finger ends (see Figure 0(d)) by creating a groove able to canalize the grasped tool.

In order to retrieve the panel, the mobile base has been equipped with two 2D laser scanners (SICK Sensor Intelligence999 with an aperture angle of 270°, an angular resolution of 0.25°- 0.5°, and an operating range of 0.5-50 m. The combination of these sensors lets us inspect almost all the arena when the robot is moving (see Figure 2 - BASE).

Figure 2: RUR53 connection diagram

With respect to the processing power, Summit XL HL is equipped with a JETWAY JNF9J-Q87 board with 4 GB RAM. This board was not powerful enough to provide the necessary computational power; for this reason, an external computer with an i7-6700 CPU, 16 GB of RAM and a dedicated NVidia GTX 1070 was added to the configuration and mounted on the mobile base.

While recent years have seen the success of efficient and low-cost RGB-D sensors, as the Microsoft Kinect v2, controlled by efficient drivers [Carraro et al., 2016a, Carraro et al., 2016b], we decided to perform the visual processing by using two Grasshopper3 2.8MP Mono USB3 cameras (Point Grey Research, Inc.101010 (see Figure 2 - HEAD). This vision system was more reliable with respect to the Challenge environment (e.g., the outdoor light). The only characteristic that makes the robot’s autonomous configuration different from the teleoperation one is the location of the cameras. For the autonomous mode, the two Grasshopper3 cameras are located next to each other on a fixed support mounted above the gripper (see Figure 0(a)). This configuration guarantees the tool recognition even if the end-effector is very close to the panel. Moreover, exploiting the combined output of two cameras ensures a greater recognition accuracy. Teleoperation, instead, requires an optimal situation awareness during the robot motion. To this scope, one camera was mounted on the mobile base (see Figure 0(b)) and two cameras were mounted under the gripper (see Figure 0(c)). The first camera gives a panoramic view of the robot surroundings, it allows to observe the movement of the arm (and gripper) of the robot, as well as its interaction with the environment. Last, it allows to monitor the state of the wheels. The cameras under the gripper can be used to visually verify the correct grasp. Since they are mounted at the robot’s end-effector, which can be moved, they can be used to extend the view of the operator, for example to view a scene from another perspective if the view from the base is occluded.

Besides the keyboard teleoperation setup, the robot is equipped with a USB joystick. It lets the user imprint manual movements such as directions and speed controls. Its usage is more user friendly and lets us easily move the RUR53 outside the arena.

MBZIRC specifications constrained competitors to a 5 GHz network. We decided to equip the mobile base with a TP-Link Archer C7 AC1750 WiFi router, which allows 2.4 and 5 GHz transmission with up to 300 Mbit/s. Figure 2 details the connection diagram: lasers, hand, arm, and the external computer are Ethernet connected to the mobile base through this router. The joystick is USB connected to the mobile base. Cameras are connected to the external PC via USB 3.0.

In case of undesirable actions or emergency, RUR53 can be stopped through two dedicated switches. One is the emergency button of the SUMMIT XL HL mobile base and the other is the emergency button of the UR5 manipulator robot. Emergency buttons let disable drivers and stop the mobile base or the manipulator if activated.

RUR53 receives the power supply from a LiFePO4 battery pack. The pack is composed of sixteen 3.2 V LiFePO4 cells and a protection circuit module. With this set of batteries, the robot is able to operate up to 3 hours or more, depending on the robot usage. Moreover, it is possible to charge the robot and keep working at the same time without any problem.

3.2 Communication

In order to teleoperate the robot and to know its status, the robot should be able to provide feedback, while the operator should be able to send data to the robot. This dual-channel communication was complicated by the fact that the Challenge rules prohibited teams to directly connect to the robot networks. As depicted in Figure 3, the communication system is divided into two sub-parts: the robot internal network and the one provided by the organizers. The robot and all the components in its network use the ROS middleware. All the components have been assigned with static IPs, in the 192.168.53.X addressing space, in order to be easily reachable and to allow the connection to the roscore module running on the mainboard. The dual-channel connection is enabled by a router with two different WAN interfaces: a wired one connected to the robot Intranet and a wireless one connected to the provided MBZIRC network. Given the limitations imposed by most of the router stock firmwares (e.g., the possibility of using Wireless interfaces as WAN interfaces), we decided to flash the router with the last release of the open DD-WRT111111 firmware. In this way, we were able to configure the router as if it were a normal computer equipped with an open operating system (e.g., the Linux operating system). As said in Section 3.1, we implemented our communication connection by using a TP-Link Archer C7 AC1750 router.

Using this solution, robot packets were correctly delivered to us by traversing the two networks. However, the other communication direction (i.e., the communication between us and the robot) was still problematic since the MBZIRC network router did not know any information about the RUR53 network. Given that we did not have access to this access point, the problem was fixed by setting a static route to the computer we used in the MBZIRC network. In this way, all the packets for the RUR53 network were redirected directly to the MBZIRC WAN IP of the RUR53 router, bypassing the default gateway (i.e. the MBZIRC access point).

Figure 3: The communication diagram.

3.3 Software Architecture

We decided to formally model RUR53 operation mode as a Finite State Machine where: , the set of states, is defined as the set of tasks the robot has to fulfill in order to complete the challenge; , the input alphabet, namely the set of input letters, is the set of preconditions and effects of the actions to be performed in order to accomplish these tasks; and , the state transition function

, maps every task to a new one depending on the effect of its actions. At any moment, RUR53 is supposed to be in some state

, the machine reads an input letter , and makes a transition to the state . After that, the machine is ready to accept another input letter.

The main modes are two: the autonomous mode (Figure 4 - right) and the manual one (Figure 4 - left). In the autonomous mode, RUR53 navigates within the arena searching for the panel. Once the panel is found, the robot approaches and inspects it searching for two Regions of Interest (ROI), corresponding to the wrenches and the valve stem, respectively. If no tool or wrench is found, RUR53 moves around the panel and inspects its other side. Otherwise, it examines the ROI in order to recognize and classify the right wrench. Once the wrench is found, RUR53 executes the actions managing the grasping and valve manipulation tasks. A recovery plan has been designed in order to manage the fault condition, called wrench lost in Figure 4. In this case, RUR53 repeats the wrench recognition and grasping steps in order to detect another wrench suitable for the valve manipulation. According to specifications, only two wrenches could operate the valve: we implemented the recovery routine taking into consideration this information. In the teleoperation mode, instead, the keyboard is used to send velocity commands to the robot, which executes the commands until the task is completed or the teleoperation terminates.

The structure has been optimized in order to be as modular as possible and allow the user the possibility of splitting the code and independently reusing its sub-modules. The machine is coded as a ROS framework with one ROS master (running on the mainboard) and one ROS action for each sub-task. An emergency stop routine has been implemented: it stops all the processes when needed. All the code is C++ based.

Figure 4: RUR53 Finite State Machine.

4 Software Modules

4.1 Navigation and Docking

The arena in which the robot has to find the panel is well defined by the Challenge rules. However, we only know that the panel could be located along a curve in the farthest side of the arena (see Figure 5) and the Grand Challenge specifications do not exclude the presence of obstacles to be avoided (e.g. drones, the landing van). In order to cope with these problems, we adopted a semi-autonomous approach for the exploration of the arena. The robot has to visit a set of manually pre-defined way-points }, which are periodically visited until the panel is found. The navigation path is depicted in Figure 5. Between each couple of points, in order to prevent any collision, we implemented an exploration node exploiting the move_base121212 package planners and the GMapping algorithm [Grisetti et al., 2007, Grisetti et al., 2005] with the laser scan and the robot odometry as inputs.

Figure 5: The navigation path in the challenge arena. The robot tries to reach the way-points }. They are visited periodically until the panel can be found.

While the exploration task is running and the robot is following its path, another task is in charge of finding the panel. The input data is composed of the merged Laser Scan coming from the two range sensors. The fusion of laser data is done by means of the iralab-package-tools131313 given two laser scans (and the transformation between them) as input, the package outputs a merged laser scan referred to the base link of the robot. The reference frames of the lasers have been semi-manually aligned by means of ICP [Zhang, 1992] while showing a known and fixed-sized object in the laser shared FOVs. In order to locate the panel in the scan, we take advantage of the fact that we know its size (from the challenge rules). Therefore, we proceed to an euclidean clustering and a classification phase. Given a set of clusters from the initial laser scan, we obtain a subset where each contains a set of points which is similar to a line. This set is identified by fitting a line model using the RANSAC [Fischler and Bolles, 1981]

algorithm on each cluster. The clusters with too many outliers are not considered in the

set. The final step is to sort the subset so that is the most similar cluster to the panel theoretical size, is the second most similar and so on. To obtain the correct ordering we consider the size of the Oriented Bounding Box (OBB) of each cluster. Given the generic cluster , we obtain the sizes , and of its OBB by considering the minimum and maximum point in the cluster expressed in the new reference system generated by the principal components obtained from the covariance matrix of each cluster . Let us assume N as the number of points in the generic cluster , then Equation 1 defines the mean and the covariance matrix of :


Equation 2 shows how to calculate the principal components of :


Equation 3 shows how to calculate the transformation . This transformation brings each from the original reference frame (i.e. the base_link of the robot) to the new reference frame expressed by the principal components of .


Finally, Equation 4 shows how to calculate the sizes of the final OBB of , given the minimum and maximum points in the new reference system obtained by applying the transformation to each point in the cluster.


As we do not know the orientation of the panel, we sort the dimensions of each and we compare them to the ground truth panel dimensions by computing a similarity score. The list of clusters ordered by this similarity score is passed to the state manager: the cluster with the highest score represents the panel. Once the panel is found, the navigation task is completed and we can ask the robot to dock parallel to the panel, at its right (considering the manipulator robot at the head of RUR53). Once docked, the robot looks for the wrenches and, if it is not able to find them, the state manager will move the robot to change side.

Even if the robot is programmed to dock at a pre-determined position with respect to the panel, the perfect alignment of robot and panel is not guaranteed in practice. Hence, the actual docking angle between robot and panel, , is measured and fed to the manipulation modules. The panel angle measurement procedure takes as input the laser scan and the robot transformation tree, and consists of four steps: i) laser scan filtering; ii) line fitting with RANSAC [Fischler and Bolles, 1981]; iii) calculation of , which is the docking angle between 0°and 90°and iv) calculation of , which is the docking angle in a wider and more useful range between 0°and 180°. In particular, given that the robot position with respect to the panel is approximately known, part of the laser points framing the panel can be easily retrieved by angle filtering. Then, can be calculated with the line-plane intersection formula, in which the line with coefficients is the panel line estimated in the previous step and the plane with coefficients is the robot side facing to the panel:


In any case, the dot product of the plane vector and the line vector provides the complementary to 90°. Hence, to provide an angle in a wider range

to the grasping module, a further step is required. First, the intersection between the panel line and the robot side plane is calculated. Then, the laser points are projected to the robot side plane. This way, we can distinguish between and by comparing the intersection point coordinates with the closest projected point coordinates.

4.2 Perception

Once docked, RUR53 starts looking for wrenches possibly turning around the panel if the detection fails. Once wrenches are detected, the robot has to recognize the right one, pick it up, and use it to operate the valve. Implementing this module is challenging: i) both wrenches and valve are metallic ii) the challenge is outdoors. Non-negligible reflections follow together with the need of an approach stable to light variations. Moreover, ISO 7738, cited by the Challenge’s rules, defines the dimension of the wrench head but it does not fix the overall length of the wrench itself. To address these difficulties, we developed an approach based on both 2D and 3D data where left and right camera images are used to reconstruct the point cloud by using the Semi-Global Block Matching (SGBM) [Hirschmuller, 2008] algorithm, provided by the image_pipeline141414 ROS package. Wrenches and valve are recognized by an object detector and an accurate pose estimator lets retrieve their poses for manipulation.

4.2.1 Wrench and Valve Classifiers

We trained two Cascade of Boosted Classifiers based on LBP features [Felzenszwalb et al., 2010]: one for the wrenches and one for the valve. During the training we used the Hard Negative Mining [Liao et al., 2007] technique in order to increase the number of samples in the datasets by repeatedly adding false positive and false negative samples in order to improve the performance of the classifiers.

With respect to the wrench classifier, we focus on the head of the wrench, since the handle is the most variable part of the wrench and may jeopardize the performance of the classifier. Moreover, lab tests highlighted that sensory data is more noisy where the wrenches are attached to the panel than at the head where there is just void space.

4.2.2 Wrench Detection and Pose Estimation

First, the robot has to detect the area where wrenches lie. We use the wrench classifier described in Section 4.2.1 to detect this area and the robotic arm aligns to it, in order to get the best possible field of view from the stereo cameras. Then, the robot has to recognize the right wrench. Six different wrenches are placed on the panel in a random order and orientation, but only two of them can be used for manipulating the valve. Our approach focuses on the detection of one of the two right wrenches, leaving the detection of the other one as a backup when the first is lost. The procedure is composed of four phases: i) wrench detection; ii) wrench analysis; iii) median accumulation and pose estimation; iv) wrench sorting and selection. The process exploits images generated by the left camera and the 3D Point Clouds reconstructed by applying the SGBM algorithm on the images generated by left and right cameras.

In detail, we apply the wrench detector in order to find the head of all the wrenches — the classifier provides a ROI for each head detected (see Figure 6(a)). For every detected wrench, we estimate: i) the pose of the grip center of the head; ii) the wrench orientation; iii) the pose of the handle centroid (see Figures  6(c) and 8). In order to be robust to wind, reflections, and noise, all the three values have been estimated after accumulating the single-view values over 10 frames. In particular, we consider the median of the 10 different values because of its robustness to outliers.

Wrench analysis is divided into two parallel parts (see Figure 6): one focused on the handle, in order to find the grasping point, and the other focused on the head of the wrench, to estimate the wrench orientation and the pose of the center of the head.

The analysis of the head centers starts from the ROI returned by the detector. Experimental tests confirmed that we can estimate the region containing the handle of a wrench by extending the region containing the head along the vertical axis. This means that, if the detected head bounding box is , than the bounding box of the handle becomes: .

The grasp point is then evaluated starting from the wrench handle. We use the bounding box around the handle to segment the corresponding region on the Point Cloud. As a first step, we extract all the points that have a maximum distance of 1 m from the camera. Then, in order to remove points belonging to the panel or other outliers, we calculate the mean distance of points from the camera , where is the number of points in the cluster and is the coordinate perpendicular to the plane of the camera. Then, we remove from the cluster all the points that have a distance larger than 1.5 cm from the mean value (this value being experimentally chosen). In the end, the plane in which the handle of the wrench lies is estimated by fitting a plane modeled using the RANSAC [Fischler and Bolles, 1981] algorithm on it. All points fitting the model must have a maximum Euclidean distance of 1 cm from the plane; outliers are then discarded. Once the segmentation of the point cloud corresponding to the handle is achieved, we calculate the grasp point as the centroid of these points. We consider the coordinate of the grasp point perpendicular to the panel. The panel orientation is estimated after the docking as described in 4.1.

Figure 6: Wrench (3D center, orientation, grasp point) data extraction.
Figure 7: Wrenches detection, recognition, and grasp point estimation. Figure 6(a) depicts the result of the detection routine: wrenches heads are surrounded by red squares. Figure 6(b) depicts the output obtained after the triangulation of a wrench head: the grip center of the head (red circle). Figure 6(c) shows the point where the wrench has to be grasped and the grip center of its head.
(a) Front view.
(b) Side view.
Figure 8: Wrenches orientation axes: (green, red, blue) colors represent (x, y, z) axes, respectively.
Figure 9: Results of the wrench classification phase on the Challenge panel.

The wrench head needs to be detected in order to plan how it will operate on the valve. Orientation needs to be accurately evaluated. This processing is performed on the ROI extracted by the wrench detector. First, we binarize the image by using the Otsu’s thresholding algorithm 

[Otsu, 1979] as in the upper part of Figure 6(b). Then, we retrieve the contours of the binarized image by using the algorithm developed by Suzuki [Suzuki et al., 1985]. We find the convex hull of the contours using the Sklansky’s algorithm [Sklansky, 1982]. This allows us to estimate the 2D coordinates of the grip center by calculating the centroid of the green triangle visible in Figure 6(b). We calculate the x- and y- orientations (green and red lines of Figure 8) of the grip center by calculating the angle of the line passing through the green point and the grip center itself. The formula for this calculation is reported in Equation 6:


where are the coordinates of the deep point (in green) and are the coordinates of the center point (in red). The z-orientation, as for the handle of the wrench, is considered perpendicular to the panel (blue line).

Finally, to find the 3D coordinates of the grip center, we project the 2D coordinates of the center onto the plane as follows. First, we find the 3D ray in the world coordinate system corresponding to the 2D coordinates of grip center. Then, we find the intersection of the ray with as in Equation 7 and Equation 8:


where are the coefficients of the plane and are the coefficients of the 3D ray.

Figure 9 shows the result of the detection algorithm working on the real panel used for the Challenge.

4.2.3 Valve Detection and Pose Estimation

Figure 10: Valve detection pipeline.
Figure 11: The valve detection and pose estimation. (a) shows the contours extracted from the initial image, (b) shows the best combination of 4 segments that respects the constraints and (c) shows the final pose.
Figure 12: Valve orientation axes: (green, red, blue) colors represent (x, y, z) axes, respectively.

Once the right wrench is grasped, we need to lock the wrench around the valve stem. In order to successfully complete this task, it is necessary to locate the valve position and orientation. The valve has a large metal surface and its stem generates many reflections that cause a noisy 3D reconstruction, hence the obtained Point Cloud is unsuitable for an accurate processing. For this reason, we decided not to use a dense 3D reconstruction, but rather the Triangulation [Hartley and Sturm, 1997] algorithm on the interest points extracted from the stereo images. We define a point of interest as a point on the center of the valve.

As a first step, we align the cameras to the valve by moving the robotic arm. This movement is possible since the distance between the wrenches and the valve stem is known and fixed. By positioning the cameras in front of the valve, we can considerably reduce noise and shadows due to reflections.

To estimate the center of the valve and its orientation, we focus our analysis on outer edge of the stem, that we call the head of the stem. The proposed approach, depicted in Figure 10, considers left and right camera images in order to find the center of the stem in both. First, the valve detector extracts the region of the image that contains the valve, then the Canny Edge Detector [Canny, 1986] algorithm is used to examine this ROI and find the valve edges. Finally, we extract all line segments that may belong to the head of the stem using the Probabilistic Hough Transform [Matas et al., 2000]. The result is shown in Figure 10(a). Given the set of lines and taking into consideration the fact that the shape of the valve is a square of known edges (from the Challenge rules), we search for four perpendicular segments defining its head: first, we remove segments which are too short; then, we generate all the possible combinations of 4 segments. For each combination of 4 segments we check if: i) there exists an edge of a segment close to the edge of another segment (only in this case their combination forms a vertex of the square); ii) there exists a segment parallel to another segment; iii) there exists a segment perpendicular to another segment. If we cannot find a combination of segments satisfying all the three constraints, we repeat the check on all possible groups of 3 segments. Figure 11 shows this process. Once the lines are found, we calculate the center of the valve and compute its orientation angle by using Equation 9, normalized between 0°and 90°. As of before, the z-orientation of the valve is considered perpendicular to the panel, whose orientation is estimated after the docking as described in Section 4.1.


Finally, the centers of the valve obtained in left and right images are triangulated using the Hartley’s algorithm [Hartley and Sturm, 1997] and the 3D position is obtained. Figure 10(c) shows the result of the triangulation and Figure 12 shows the result of the algorithm on the panel used for the Challenge.

4.3 Manipulation

Figure 13: The autonomous wrench grasping routine.

The manipulation routines are in charge of picking up the right wrench and operating the valve. Once the wrench area is located on the panel (Section 4.2.2), the end-effector is aligned to its center. In this way, cameras are aligned with the detection ROI. The detection routine starts and once the right wrench is recognized and its grasp point is computed (Section 4.2.2), two oriented reference systems are defined: one on the grasp point and the other on the grip center of the head (See Figure 8). The wrench grasping routine starts (See Figure 13): the end-effector is aligned to the grasp point (fingers opened), it approaches the wrench, closes the fingers, and comes back to its initial pose.

Since the panel geometry is known, and therefore the position of the valve on it, we can approach it with the gripper. Again, once the gripper is aligned, the detection module (Section  4.2.3) is triggered and the pose of the valve is estimated. An oriented reference system is placed on the center of the valve (See Figure 12). The robot inserts the wrench on the valve by aligning the reference systems of valve and wrench grip center and a 360°rotation trajectory is imprinted. This trajectory is computed by sampling a set of way-points on a radius around the valve, with the distance between the center of the valve (equal to the wrench grip center) and the wrench grasp point.

4.4 Teleoperation

In this Section, we present the three main software components of our teleoperation module: i) mobile base teleoperation; ii) robotic arm teleoperation; iii) gripper teleoperation. These modules run in parallel and map robot movements and speed settings to different keyboard keys. This way, the pilot can control every robot movement by means of the keyboard only and the human-robot interface can be as efficient and capable as possible.

Our mobile base teleoperation is provided by the TurtleBot teleoperation package in ROS151515, which allows to control it in velocity and can be easily adapted to work with our mobile base. This interface proved to be more effective than sending goals in position coordinates to the ROS navigation stack even if this can be done through the handy graphical interface provided by the rviz ROS package161616 Indeed, this second modality does not allow the required precision during the panel docking phase when the pilot cannot see the robot because it is hidden by the panel itself.

The arm teloperation is provided by our own package. It allows to move the robotic arm, in this case its tool center point, with velocity commands with respect to a custom reference system, i.e. the arm base. This way, we could move the arm before grasping the wrench and to reach the valve. This module is also in charge of rotating the wrench around the valve. We performed the 360° turn of the valve with many small rotations of about 10°. The number of rotation degrees is settable online. This proved to be essential; we made smaller rotations (5°) when the grasp was not much firm and risked to be lost, while bigger rotations for firm grasps (15°). The Robotiq ROS industrial package 171717 was used to open and close the gripper fingers.

Given that the working area was far from the team base station, where the pilot teleoperates the robot, and that both the wrenches and the valve were not facing towards it, the pilot was not able to clearly see the robot. Thus, we exploited the robot cameras for the task: two gray-scale cameras and an additional fish-eye color camera, the latter not being required in the autonomous mode. As already displayed in Figure 0(b), one gray-scale camera was positioned in such a way that the arm and the environment were clearly visible. The other gray-scale camera and the fish-eye camera, see Figure 0(c), were positioned just above the hand so as to see the hand and the grasp clearly with an inside-out perspective. Despite the lack of 3D data, having three different sources of feedback allowed the pilot to make the right decisions when accurately positioning robot, arm and gripper. A well-known limitation of video feedback is the bandwidth requirement. We overcame it by streaming compressed videos in Theora codec format.

5 Experimental Results

During the challenge we could test only our perception routine (see Lessons Learned of Section 6). All other experiments were performed in the lab.

5.1 Docking

The goal is to approach the robot as close and parallel as possible to the panel. We evaluated: i) the z-distance separating the mobile base from the panel (in meters); ii) the x-offset separating the two reference systems (in meters); iii) the docking angle (in degrees). See Figure 14.

Desired Value Average Median Max Min d (m) 0.80 0.78 0.78 0.81 0.77 o (m) 0.00 -0.15 -0.17 -0.08 -0.20 (deg) 0.0 9.4 7.8 15.5 5.4
Figure 14: Docking performances. In detail, is the distance (in meters) between the robot and the panel expressed along the z axis (blue line); is the offset (in meters) separating the two reference systems along the x axis (green line); is the docking angle: the angle between the robot and the panel (in degrees).

The table in Figure 14 shows the results obtained in 10 trials. These trials were executed inside our laboratory on a smooth tiled floor. An average error of 0.02 m can be noticed with respect to , confirming that the proximity constraint is almost always achieved. With respect to , an average error of 0.15 cm has been computed, with a maximum error of 0.2 m. This value is high and, most of the time, is negative. We justify it with the high load that the robot has to transport. Focusing on , an average error of 9° is observed, with a maximum error of 15°. In this case, we think that the error is due to the friction of the wheels which have to carry a high load. Figure 15 shows a trial inside the lab.

Figure 15: On the left, the robot while it is docking in an indoor scenario. On the right, a portion of the laser scan that can be retrieved from the robot. In the center of the right image the reference frames of the robot can be seen, while the white lines are the laser data retrieved.

5.2 Wrench and Valve Classifiers

5.2.1 Wrench and Valve Datasets

We used two cameras, a Bumblebee2181818 and a Grasshopper3, in order to acquire a wrench and a valve dataset. Images were taken under different lighting conditions, indoors and outdoors, and at different hours of the day, to include multiple reflection conditions and shadows. We used the Hard Negative Mining [Liao et al., 2007] technique in order to increase the number of samples in the datasets by repeatedly adding false positive and false negative samples. Increasing the dimension of our datasets let us improve the performance of the classifier.

Wrench Dataset

The wrench dataset is composed of 830 positive samples and 7350 negative samples. While acquiring the images, we used four different sets of wrenches to avoid overfitting of a specific wrench model. Positive samples are represented by 100100 pixel images. This format guarantees a good balance between high resolution and reduced training time. Negative samples, instead, depict backgrounds and other objects, such as wrench shadows on the panel, metallic supports of the panel to which wrenches are attached, and the valve.

Valve Dataset

The valve dataset was generated by taking images of different orientations of the valve and different rotations of the stem. A set of 450 100100 pixel valve images represents the collection of positive samples. 7490 background images, instead, define the negative samples.

5.2.2 Performance Measurement

In order to evaluate the performance of the trained classifiers, we analyzed the following elements:


where , , , are the true positive, true negative, false positive, and false negative counts. The score [Rijsbergen, 1979]

is an harmonic average of precision and recall promoting high recall, i.e., a low number of false negatives.

Classifier Accuracy Precision Recall
Wrench 94.1% 97.0% 93.6% 94.3%
Valve 95.1% 98.2% 94.1% 94.9%
Table 1: Performances of the wrench and the valve classifiers on their test sets.

Tests were performed on 25 test sets per type: 25 sets for the wrench images and 25 for the valve images. Every test set was obtained by randomly splitting the corresponding dataset into two sub-parts: the 70% of the dataset was used as training set and the remaining 30% as test set. Table 1 shows the obtained results. The wrench classifier is characterized by a good performance: a high precision (97.0%) and an high recall (93.6%) let achieve a score of 94.3%. Also the valve classifier has good performance: it is characterized by a precision of 98.2% and a recall of 94.1%, reflecting a score of 94.9%. We applied the same procedure to the 255 wrench samples and the 40 valve samples collected during the Challenge. Table 2 shows the obtained results. The wrench classifier has a true positive rate of 100% and a false positive rate of 0%. With respect to the valve classifier, we observed a true positive rate of 90% and a false positive rate of 10%.

Classifier True Positive False Positive Samples
Wrench 100% 0% 225
Valve 90% 10% 40
Table 2: Performances of the wrench and the valve classifiers obtained during the challenge.

5.3 Wrench and Valve Recognition and Manipulation

5.3.1 Wrench Recognition and Grasping

We evaluated the wrench recognition and grasping routine by asking the robot to correctly recognize and pick up a target wrench for 50 trials. In order to acquire resentative data, we hung random sets of 6 wrenches to the panel. These tools were selected among 10 different types of wrench. Different sizes were considered. Their poses (positions and orientations) were modified at each trial as well as the type of the target wrench.

According to the Challenge regulations, points are assigned to the team depending on whether the robot correctly recognizes and grasps the wrench, even if the grasp does not let the robot manipulate the valve, or correctly recognizes and grasps the wrench, and the grasp lets the robot operate on the valve. For this reason, we classified the obtained results into: i) Correct Recognition: the robot recognizes the right wrench; ii) Correct Grasp: the robot recognizes the right wrench, grasps it, and the grasp lets the robot operate on the valve; iii) Grasp: the robot recognizes the right wrench, grasps it, but the grasp is not accurate; iv) Loss: the robot loses the wrench during grasping. Table 3 summarizes the performance obtained in the lab. Our system correctly recognizes the wrench 92% of the time and it correctly grasps it 70% of the time. 16% of time, instead, the robot picks up the wrench, but the tool is not correctly grasped, making it impossible to operate on the valve. We can conclude that 86% of time we are able to score points.

Correct Recognition Correct Grasp Grasp Loss
92% 70% 16% 14%
Table 3: Performances of the wrench detection and grasping routine on 50 reps.

5.3.2 Valve Detection and Manipulation

The valve manipulation task requires the robot to insert the wrench into the stem of the valve and rotate the valve clockwise by 360°. In order to insert the wrench into the valve, we have to estimate the orientation angle of the stem. We performed 50 tests making the robot estimate four different stem angles : 0°, 15°, 30°, and 45°.

Table 4 shows the obtained results. We evaluate these results in terms of average error , where is the number of target angles and is the i-th target angle. If represents the average angle, than is equal to . The maximum average error is equal to and the minimum average error is .

5.3.3 Execution time

Table 5 summarizes the execution time of the wrench and valve recognition algorithms. Input images are black and white and have a resolution of 964724 px. Tests were performed on a computer with an i7-6700 CPU, 16 GB of RAM and a dedicated NVidia GTX 1070 configured with Ubuntu 14.04 and ROS Indigo. The wrench detector needs 3.338 s to find the right wrench (average time computed on 50 trials). In the worst case, 3.770 s are required to recognize the right wrench. The valve detector needs 0.059 s (average time computed on 50 trials), 0.071 s in the worst case.

(deg) Average Median Maximum Minimum
0 0.21 0.01 0.73 0.00
15 16.51 16.39 17.53 15.96
30 30.79 30.65 33.69 29.20
45 45.24 45.00 46.01 45.00
Table 4: Performances of the valve detection routine. Measures are in degrees.
Average Median Maximum Minimum Std. deviation
Wrench detection time (s) 3.338 3.305 3.770 3.266 0.106
Valve detection (s) 0.059 0.059 0.071 0.052 0.005
Table 5: Execution times of wrenches and valve detection algorithms. Measures are in seconds.

6 Challenge Performances and Lessons Learned

In Abu Dhabi, during the first rounds of the challenge, the robot was unbalanced since we almost reached the maximum mobile base payload. This problem complicated the autonomous navigation and panel detection routines: the laser scan plane was not parallel to the ground. Thus, when searching for the panel, the barriers of the arena and the floor led to candidates of size comparable to the panel. Furthermore, our laser could not reliably sense the panel because of its appearance and the challenge environmental conditions. The material of the panel, its black color, and the high temperature (about 40°C) caused a really noisy detection of the panel laser scan: only part of the panel was visible and not from all the view points. Thus, we decided to face the Grand Challenge in manual mode. This decision let us rank third in this last match in collaboration with Czech Technical University in Prague, University of Pennsylvania, and University of Lincoln.

The obtained result taught us that harmonizing technical requirements with the necessity of a low cost robot is challenging. RUR53 unbalancing could be solved with a mobile base with a higher payload: having for example a larger support polygon would have minimized the need for balance control. Issues in retrieving the black panel, instead, could have been solved by mounting an additional laser, capable of working with dark surfaces, or through a multi-view algorithm. Finally, we learned that having omnidirectional wheels would have allowed faster and more precise correction movements in front of manipulation tasks: our robot needed to turn in order to move sideways — however, an unpaved arena (not suitable for omnidirectional wheels) was expected at the time the robot for the competition was chosen.

7 Conclusion

In this paper we presented RUR53, the mobile manipulator robot that we developed in order to face Challenge 2 and the Grand Challenge of 2017 MBZIRC. The robot had to navigate inside an arena, reach a panel, recognize a wrench on it, pick it up and use it in order to operate on a valve stem located on the panel itself. The paper detailed both the hardware and software architectures that we developed in order to let the robot achieve the assigned task both in an autonomous and in a manual mode. An evaluation of its performances is also included.

Even if the autonomous perception system is characterized by good performace, the payload of the robot and its laser sensors showed some issues finding the panel and docking next to it. Thus, during the Gran Challenge, we utilized the teleoperation mode. This choice let us rank third in the Grand Challenge.


We would like to thank Mohamed Bin Zayed International Robotics Challenge (MBZIRC) organizing committee, and the Khalifa University (Abu Dhabi, UAE), for the financial support to our project (contract number: 2016-MBZIRC-13), by giving us the opportunity to participate in this event. Thanks to IT+Robotics Srl (Padova, Italy), EXiMotion Srl (Padova, Italy) for the financial support and to SICK Sensor Intelligence (Milan, Italy), and NVIDIA (USA) for providing some of the hardware equipment. Last but not least, a special thanks to Enrica Rossi, M.Sc., and all the students who have worked on the project: Alex Badan, Luca Benvegnù, Tobia Calderan, Riccardo Fantinel, Thomas Gagliardi, Leonardo Pellegrina, Matteo Tessarotto and Marco Zaghi.


  • [Canny, 1986] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, (6):679–698.
  • [Carraro et al., 2016a] Carraro, M., Munaro, M., and Menegatti, E. (2016a). Cost-efficient rgb-d smart camera for people detection and tracking. Journal of Electronic Imaging, 25(4):041007–041007.
  • [Carraro et al., 2016b] Carraro, M., Munaro, M., and Menegatti, E. (2016b). A powerful and cost-efficient human perception system for camera networks and mobile robotics. In International Conference on Intelligent Autonomous Systems, pages 485–497. Springer, Cham.
  • [Castaman et al., 2016] Castaman, N., Tosello, E., and Pagello, E. (2016). A sampling-based tree planner for navigation among movable obstacles. In ISR 2016: 47st International Symposium on Robotics; Proceedings of, pages 292–299. VDE.
  • [Correll et al., 2016] Correll, N., Bekris, K. E., Berenson, D., Brock, O., Causo, A., Hauser, K., Okada, K., Rodriguez, A., Romano, J. M., and Wurman, P. R. (2016). Lessons from the amazon picking challenge. CoRR, abs/1601.05484.
  • [Dias et al., 2016] Dias, J., Althoefer, K., and Lima, P. U. (2016). Robot competitions: What did we learn?[competitions]. IEEE Robotics & Automation Magazine, 23(1):16–18.
  • [Dissanayake et al., 2000] Dissanayake, G., Durrant-Whyte, H., and T., B. (2000). A computationally efficient solution to the simultaneous localisation and map building (slam) problem. In In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), page 1009–1014, San Francisco, CA, USA. IEEE.
  • [Felzenszwalb et al., 2010] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645.
  • [Ferrari et al., 2010] Ferrari, V., Jurie, F., and Schmid, C. (2010). From images to shape models for object detection.

    International journal of computer vision

    , 87(3):284–303.
  • [Fischler and Bolles, 1981] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395.
  • [Fox et al., 2003] Fox, D., Hähnel, D., Burgard, W., and Thrun, S. (2003). An efficient fastslam algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’03), pages 206–211.
  • [Frese and Hirzinger, 2001] Frese, U. and Hirzinger, G. (2001). Simultaneous localization and mapping - a discussion.
  • [Grisetti et al., 2005] Grisetti, G., Stachniss, C., and Burgard, W. (2005). Improving grid-based slam with rao-blackwellized particle filters by adaptive proposals and selective resampling. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 2432–2437. IEEE.
  • [Grisetti et al., 2007] Grisetti, G., Stachniss, C., and Burgard, W. (2007). Improved techniques for grid mapping with rao-blackwellized particle filters. IEEE transactions on Robotics, 23(1):34–46.
  • [Hartley and Sturm, 1997] Hartley, R. I. and Sturm, P. (1997). Triangulation. Computer vision and image understanding, 68(2):146–157.
  • [Hirschmuller, 2008] Hirschmuller, H. (2008). Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):328–341.
  • [Iagnemma and Overholt, 2015] Iagnemma, K. and Overholt, J. (2015). Introduction to special issue: Darpa robotics challenge (drc). Journal of Field Robotics, 32(3):313–314.
  • [Lacroix and Chatila, 1997] Lacroix, S. and Chatila, R. (1997). Motion and perception strategies for outdoor mobile robot navigation in unknown environments, pages 538–547. Springer Berlin Heidelberg, Berlin, Heidelberg.
  • [Liao et al., 2007] Liao, S., Zhu, X., Lei, Z., Zhang, L., and Li, S. Z. (2007).

    Learning Multi-scale Block Local Binary Patterns for Face Recognition

    , pages 828–837.
    Springer Berlin Heidelberg, Berlin, Heidelberg.
  • [Lienhart and Maydt, 2002] Lienhart, R. and Maydt, J. (2002). An extended set of haar-like features for rapid object detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I–900–I–903. IEEE.
  • [Matas et al., 2000] Matas, J., Galambos, C., and Kittler, J. (2000). Robust detection of lines using the progressive probabilistic hough transform. Computer Vision and Image Understanding, 78(1):119–137.
  • [Montemerlo and Thrun, 2003] Montemerlo, M. and Thrun, S. (2003). Simultaneous localization and mapping with unknown data association using fastslam. In 2003 IEEE International Conference on Robotics and Automation, volume 2, pages 1985–1991.
  • [Montemerlo et al., 2002] Montemerlo, M., Thrun, S., Koller, D., and Wegbreit, B. (2002). Fastslam: A factored solution to the simultaneous localization and mapping problem. In In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 593–598. AAAI.
  • [Moravec, 1988] Moravec, H. (1988). Sensor fusion in certainty grids for mobile robots. AI Mag., 9(2):61–74.
  • [Murphy, 1999] Murphy, K. (1999). Bayesian map learning in dynamic environments. In In Neural Info. Proc. Systems (NIPS, pages 1015–1021. MIT Press.
  • [Otsu, 1979] Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics, 9(1):62–66.
  • [Quigley et al., 2009] Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., and Ng, A. Y. (2009). Ros: an open-source robot operating system. In ICRA workshop on open source software, volume 3, page 5. Kobe.
  • [Rijsbergen, 1979] Rijsbergen, C. J. V. (1979). Information Retrieval. Butterworth-Heinemann, Newton, MA, USA, 2nd edition.
  • [Sklansky, 1982] Sklansky, J. (1982). Finding the convex hull of a simple polygon. Pattern Recognition Letters, 1(2):79–83.
  • [Suzuki et al., 1985] Suzuki, S. et al. (1985). Topological structural analysis of digitized binary images by border following. Computer vision, graphics, and image processing, 30(1):32–46.
  • [Viola and Jones, 2001] Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–511–I–518.
  • [Zhang, 1992] Zhang, Z. (1992). Iterative point matching for registration of free-form curves.
  • [Zhu et al., 2008] Zhu, Q., Wang, L., Wu, Y., and Shi, J. (2008). Contour context selection for object detection: A set-to-set contour matching approach. Computer Vision–ECCV 2008, pages 774–787.