The RobotX competition has brought together top schools from nations around the oceans and has advanced both hardware design and software algorithms for the perception and propulsion of unmanned surface vehicles. Perception plays an important role, together with state estimation and motion planning, in autonomous vehicles. The participating teams have been working toward robustness to degradation caused by motion, scale and perspective changes from different viewing positions, warping and occlusion, and color variation under different lighting conditions, as well as toward the speed and accuracy needed to support real-time decision-making.
Nevertheless, building an autonomous unmanned marine system is challenging in many respects. The hardware should be robust enough to keep working in different weather conditions. A waterproof design, compact electronics, and a cooling system are crucial under heavy rain, strong winds, and blistering sun. Because power and computation are constrained, the algorithms must be adapted to run in real time without hurting overall performance. These challenges persist because of the uncertainties of marine environments and because benchmarking in marine environments is hard, which leaves the field without baselines and standardized evaluations. More importantly, previous RobotX competitions have left some big questions open.
What are the basic principles for managing multiple tasks and advancing development, as opposed to a set of clever hacks for individual tasks?
Many of the tasks have some hand-tuned parameters that were optimized for the tasks during the competition. How could the developed system/techniques be generalized to other scenarios, on other robots, or in other environments?
Can we use simulation environments to facilitate development? What are the gaps between real and virtual environments, and how can they be narrowed?
Team NCTU’s technical approach is motivated by the challenges and big questions above. The recent surge of interest in AI has brought together researchers in robotics, machine learning, and other fields. In particular, we wish to bring the principles of the AI Driving Olympics (AI-DO) [aido, aido-duckietown], hosted at NIPS 2018, into the RobotX competition. The AI-DO is developed by the Duckietown community. The Duckietown project was initiated at MIT in 2016 and is now offered as a university course at ETH Zurich, the University of Montreal, the Toyota Technological Institute at Chicago, and National Chiao Tung University (NCTU) in 2017. Duckietown [duckietown_mit] is an open, reproducible, and inexpensive platform for robotics education and research. Its vehicles are built upon the Robot Operating System (ROS), and each carries an onboard monocular camera and an embedded computer. A miniaturized city (Duckietown) with roads, signage, and obstacles is designed to tackle the problems of autonomy. In 2018 the Duckietown platform started to embrace Docker and deep learning and to host competitions on a few tasks using learning approaches such as Convolutional Neural Networks (CNN) and deep reinforcement learning. Here we wish to carry such methodologies over to the domain of unmanned surface vehicles.
We summarize our contributions as follows:
Following the principles of containerization (Docker), we designed and built an autonomous unmanned marine system whose hardware and software are compatible with two commonly used middlewares: ROS and MOOS. Containerization allows us to a) develop under different middlewares, b) deploy in both real and simulated environments, and c) modularize and compare algorithms for the RobotX tasks.
A unified AI agent framework with observations and actions facilitates comparisons between classic and learning-based algorithms. In particular, we tackle a) placard detection and 3D object recognition using classic feature-based vs. learning-based methods, and b) obstacle avoidance using classical methods vs. deep reinforcement learning.
We built an outdoor, on-surface motion capture system based on 3D LiDAR and 2D vision as a benchmark for further research.
II Literature Review
II-A RobotX 2014 and 2016
RobotX has been held twice, in 2014 and 2016. The 2014 champion, Team MIT-Olin [anderson2016overview], used the IvP Helm [benjamin2003interval] for multi-objective optimization, such as transiting to a target waypoint while avoiding obstacles. The MOOS [benjamin2009overview] framework applied the speed and heading generated by the IvP Helm and also handled data and components such as GPS, the object map, and the PID controller. The 2016 champion, Team NaviGator AMS [frank2016university], used ROS [quigley2009ros] as middleware. That year added a new mission on underwater shape identification. The team designed its own underwater vehicle, Anglerfish [gray2016anglerfish], which is controlled from the NaviGator ASV through a 30-meter tether that provides power and Ethernet. Most teams in the past two competitions used detection and motion algorithms that did not rely on machine learning.
II-B Learning Approaches for Mobile Robots
Recently, deep Convolutional Neural Networks (CNN) have been used to achieve autonomous trail or lane following. Giusti et al. [giusti2016machine] tackled autonomous forest or mountain trail following using a single monocular camera mounted on a mobile robot, such as a micro-aerial vehicle. Unlike the previous literature, which focused on trail segmentation and used low-level features, they developed a supervised learning approach using a deep CNN classifier. The trained CNN classifier was shown to follow unseen trails on a quadrotor. DeepDriving [chen2015deepdriving] categorizes autonomous driving work into three paradigms. Behavior reflex is a low-level approach that constructs a direct mapping from image or sensory inputs to a steering motion. This is done by means of a deep CNN trained with labels generated from human driving along a road or in virtual environments. Mediated perception is the recognition of driving-relevant objects, e.g., lanes, traffic signs, traffic lights, cars, or pedestrians. The recognition results are then combined into a consistent world representation of the car's immediate surroundings. Direct perception falls between mediated perception and behavior reflex. It learns a mapping from an image to several meaningful states of the road situation, such as the angle of the car relative to the road and the lateral distance to lane markings. Given the state estimate, other filters, finite state machines (FSMs), and controllers can be applied. The forest-trail-following vehicle in [giusti2016machine] belongs to behavior reflex, whereas Duckietown falls into the direct perception paradigm.
In the context of deep learning, the most commonly used method for moving from simulation to real environments is transfer learning [peng2015learning, su2015render]. Gathering data from the target domain in addition to the source domain closes the gap between simulation and reality. [tzeng2015towards, tzeng2015simultaneous] use data alignment to tackle the simulation-to-real problem for robotic arm pose estimation. [zhang2015towards, zhu2017target] showed that simple but precise tasks learned in simulation can be transferred to real-world robots for more complex and general tasks. [rusu2016progressive] connected the layers of deep learning models trained with simulation data and with real-world data. As a result, features trained in virtual environments became usable in real-world scenarios.
III Design Strategy
III-A Enabling Hardware Systems for AI Computing
Since this is our first time building such a sophisticated system, it is crucial to evaluate as much as possible beforehand. We have been working on a project called “Duckietown”, a miniaturized AI self-driving car platform that runs on ROS. The cars, also known as “duckiebots”, use AI to navigate and to communicate with other duckiebots. We apply the same AI technology to our Autonomous Maritime System (AMS), and the duckiebots provide a miniaturized environment for testing our algorithms.
III-B From Simulation to Real Environments
In addition to using the Duckietown platform as a physical stand-in, software simulation is important for our proof of concept. Because we use ROS as middleware, Gazebo [koenig2004design] is the obvious choice thanks to its compatibility with ROS. Taking advantage of ROS, we can interface with both the simulation and the real robot simply by publishing and subscribing to the corresponding messages. Only a few parameters need to be adjusted to fit the real environment.
III-C Containerized Algorithms for Deployable Software
When deploying a variety of algorithms in various environments, software dependencies can be troublesome. A practical solution is to containerize the algorithms. With Docker, every algorithm becomes a building block that is easy to swap for another. This plug-and-play property of Docker containers gives us simple deployment in both simulation and real environments.
III-D Comparing Deep Learning to Classic Approaches
Deep learning has influenced robotics research over the past few years. The challenge has always been to show that deep learning is better than classic approaches. We believe that both are good, but in different respects. Classic approaches may solve a given problem with very high performance, whereas deep learning methods might be more robust to unexpected changes in the environment. Our goal is to compare the two and discuss the pros and cons of each.
IV Vehicle Design
The WAM-V system carries a multi-enclosure unit on its payload tray, shown in Fig. 3, which is split into two power enclosures, one sensor enclosure, one controller enclosure, and one hydrophone enclosure.
Two forward thrusters are used for the differential drive system. Each motor is capable of 80 lbs of thrust. This configuration is the simplest to implement and the least costly. We considered a holonomic drive, which would improve the maneuverability of the USV, but it requires at least three thrusters and much more complicated motion control. In addition, we are not aware of any holonomic-drive USV currently operating in the ocean, which we take as a sign that the design has practical drawbacks.
IV-B Sensor System
The sensor system consists of a sensor tower and a separate underwater acoustic array for the competition. The sensor tower has 4 levels, each mounted with different sensors (shown in Fig. 3). One 3D LiDAR, three depth cameras, an IMU, and a GPS are used to sense the environment. These sensors localize our USV and the target objects. There is also an underwater hydrophone array for detecting underwater pingers.
We use a multi-computing-unit system. An Industrial Personal Computer (IPC) serves as the main computing unit. Through Ethernet it communicates with all other computing units, such as the Nvidia Jetson TX2 and the Raspberry Pi 3, which serve different purposes. The Nvidia Jetson TX2 gathers depth camera data and sends it back to the IPC. The Raspberry Pi 3 controls the motors. Every unit in the system is connected by Ethernet and communicates through ROS.
|ASUS Laptop (GX501)||1||GPU||GeForce GTX 1050 Ti|
|Nvidia Jetson TX2||3||GPU||Pascal 256 CUDA cores|
|Raspberry Pi 3 (Neural Compute Stick)||9||VPU||Myriad 2 Vision Processing Unit|
|Raspberry Pi 3||1||Propulsion|
|Raspberry Pi 3||1||Visual Feedback|
|Raspberry Pi 3||1||Launching and Recovery|
|Raspberry Pi 3||1||Detect and Deliver|
V Software system design
V-A Overall System
The software is built upon the ROS interface. Several nodes are built for distinct functionalities, and they communicate with each other via ROS messages and services. The ROS package tree contains the nodes shown in Figure 4.
There are nodes that deal with path planning, localization, control, perception, classification, etc. Some of the ROS nodes are packaged in Docker containers for easier deployment.
V-B AI Agent
To generalize the usage of this system, we introduce the concept of an AI agent. The idea is to build it so that it is compatible with OpenAI Gym [brockman2016openai].
Figure 5 shows that the agent outputs an observation and a reward for each input action. The agent should be designed to output higher rewards for correct actions. The learning algorithm then interacts with the agent and learns what the agent is designed to teach. For each task there is therefore at least one AI agent, and each agent can have different observations drawn from various sensors.
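As a rough illustration of this interface, the sketch below mimics the `reset`/`step` convention of OpenAI Gym without importing the library. In Gym terminology, what this paper calls an AI agent plays the role of an environment. The class name `ToyVesselAgent`, the one-dimensional world, and the reward values are hypothetical placeholders, not the system's actual code.

```python
class ToyVesselAgent:
    """Hypothetical Gym-style wrapper: an action goes in, and
    (observation, reward, done, info) comes out."""

    ACTIONS = ("go_straight", "turn_left", "turn_right")

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal            # distance to travel (toy 1-D world)
        self.max_steps = max_steps  # episode length limit
        self.reset()

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps = 0
        return self._observe()

    def _observe(self):
        # Stand-in for real sensor data: remaining distance to the goal.
        return (self.goal - self.position,)

    def step(self, action):
        """Apply one action and report what happened."""
        if action not in self.ACTIONS:
            raise ValueError("unknown action: %r" % (action,))
        self.steps += 1
        if action == "go_straight":
            self.position += 1
            reward = 1.0            # higher reward for the desired action
        else:
            reward = -0.1           # small penalty for turning in place
        done = self.position >= self.goal or self.steps >= self.max_steps
        return self._observe(), reward, done, {}
```

A learning algorithm would call `reset()` once per episode and then `step(action)` repeatedly until `done` is true; swapping in a different task only requires a different class with the same two methods.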
V-C MOOS and ROS
The Mission Oriented Operating System (MOOS) [benjamin2009overview] and the Robot Operating System (ROS) [quigley2009ros] are both robotics frameworks that provide communication between different robots and components.
ROS is an open source project that supports C++ and Python as development languages. In ROS, processing units are called “nodes”, and they communicate with each other through “messages” and “services”. A central program named “ROS Master” keeps track of the nodes and lets them find each other so that they can exchange messages and services. With its large user base and thousands of available modules, ROS has become the most commonly used middleware for robotics. Many robots, such as the PR2, Atlas, UR5, and Turtlebot, use this open source software as their software framework.
MOOS-IvP is the combination of two open source projects: MOOS and the IvP Helm. MOOS was developed at the University of Oxford and is designed to be the core autonomy middleware, while the IvP Helm was developed at MIT for multi-objective optimization between competing behaviors. Most MOOS-IvP projects are in the fields of unmanned surface vehicles and underwater acoustics, and the framework has had a large influence on marine robotics for years.
A paper from Georgia Tech [west2011overview] presented some advantages and disadvantages of these two middlewares. MOOS is more oriented toward an onboard publish-subscribe architecture built around its community database, “MOOSDB”, and is more lightweight. In addition, the marine robotics community has developed hundreds of executable behaviors, simulations, and MOOS Apps, which makes it easy to develop new algorithms for USVs. On the other hand, ROS is more general and provides multi-platform services, which makes system integration less painful. Moreover, it has many low-level device control interfaces and components in the hardware abstraction layer. ROS also integrates with Gazebo, a 3D simulator for general robotics applications, whereas MOOS only offers 2D simulation. However, development in Gazebo is much more complicated than in MOOS.
We chose to use both frameworks to take advantage of each. To make the two frameworks cooperate, we use the MOOS-ROS Bridge [demarco2011implementation] to pass data between the two middlewares. We modified it and added a few features for better performance in the applications our system needs.
One of the critical requirements is to obtain the vessel’s pose. Without it, the vessel would not know where it is, let alone navigate to a position. We implement localization with a Hector GPS and a MicroStrain 9-DoF IMU. GPS and IMU data are fused using a Gaussian filter, which stabilizes the localization output. Position and orientation are calculated separately.
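The text does not specify which Gaussian filter is used. As one possible illustration, the sketch below shows a scalar Kalman filter, a common Gaussian filter, that propagates a one-dimensional position with an IMU-derived velocity and corrects it with a GPS measurement. The function names and noise variances are assumptions made for this example only.

```python
def kalman_predict(x, p, velocity, dt, process_var):
    """Propagate position estimate x (variance p) using an
    IMU-derived velocity over a time step dt."""
    return x + velocity * dt, p + process_var


def kalman_update(x, p, z, meas_var):
    """Correct the estimate with a GPS position measurement z
    whose noise variance is meas_var."""
    k = p / (p + meas_var)              # Kalman gain in [0, 1]
    return x + k * (z - x), (1.0 - k) * p


# One predict/update cycle with made-up numbers.
x, p = 0.0, 1.0
x, p = kalman_predict(x, p, velocity=1.0, dt=1.0, process_var=0.1)
x, p = kalman_update(x, p, z=1.2, meas_var=1.1)
```

The gain `k` weights the GPS reading against the prediction: a noisy GPS (large `meas_var`) pulls the estimate less, which is what smooths the output.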
V-E Control and Navigation
Another crucial feature is controlling the vessel and navigating from one point to another. We designed our WAM-V with a differential drive motion model, so linear and angular velocities can be controlled separately. We use a PID controller for heading control and a cascade PID controller for position. We chose the cascade PID controller for position because, with both position and velocity feedback, it provides better performance for tasks such as station keeping.
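The cascade structure can be sketched as follows: an outer PID loop turns the position error into a velocity reference, and an inner PID loop turns the velocity error into a thrust command. The class names and gains are illustrative assumptions rather than the values tuned on the WAM-V.

```python
class PID:
    """Textbook PID controller acting on an error signal."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0            # no derivative on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


class CascadePositionController:
    """Outer loop: position error -> velocity reference.
    Inner loop: velocity error -> thrust command."""

    def __init__(self, outer, inner):
        self.outer = outer
        self.inner = inner

    def update(self, position_error, velocity, dt):
        velocity_ref = self.outer.update(position_error, dt)
        return self.inner.update(velocity_ref - velocity, dt)


# Hypothetical gains: 4 m from the target while drifting at 1 m/s.
controller = CascadePositionController(PID(0.5, 0.0, 0.0), PID(2.0, 0.0, 0.0))
thrust = controller.update(position_error=4.0, velocity=1.0, dt=0.1)
```

Because the inner loop sees the measured velocity, it can react to drift before the position error grows, which is the benefit the text attributes to station keeping.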
For navigation, we implement the pure pursuit algorithm for waypoint navigation. The pure pursuit algorithm makes the trajectory smoother by reducing sharp turns. Figure 6 shows the trajectory of our WAM-V executing waypoint navigation. The waypoints are the corners of a square with a 15-meter edge. The figure shows that the turns at the corners are smoothed out.
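The core of pure pursuit is a geometric rule: steer along the circular arc that passes through the vehicle and a look-ahead point on the path. A minimal sketch of that rule is given below, assuming a planar pose `(x, y, heading)` with the heading in radians; the function name is hypothetical, and the look-ahead point selection is left out.

```python
import math


def pure_pursuit_curvature(pose, target):
    """Curvature (1/radius) of the arc that leaves the vehicle tangent to
    its heading and passes through the look-ahead point `target`.
    Positive values mean a left turn."""
    x, y, heading = pose
    dx, dy = target[0] - x, target[1] - y
    # Lateral offset of the target in the vehicle frame (x forward, y left).
    lateral = -math.sin(heading) * dx + math.cos(heading) * dy
    dist_sq = dx * dx + dy * dy
    if dist_sq == 0.0:
        return 0.0                      # already at the target
    return 2.0 * lateral / dist_sq


# The angular velocity command follows from the forward speed:
speed = 1.5                             # m/s, made-up value
omega = speed * pure_pursuit_curvature((0.0, 0.0, 0.0), (1.0, 1.0))
```

Because the curvature shrinks with the squared distance to the look-ahead point, a longer look-ahead distance gives gentler turns, which is why the corners of the square are rounded off.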
V-F Object Classification
Because unstable weather and changing lighting conditions can strongly affect the apparent color of an object, we chose to use the pointclouds gathered from the LiDAR for object classification. The algorithm pipeline is as follows. First, in the preprocessing stage, we apply RANSAC and a noise filter to remove points from the sea surface and random noise, leaving only the objects in the pointcloud. After preprocessing, we apply a clustering algorithm to separate the objects from one another. Then we project each object’s pointcloud onto the X-Y, Y-Z, and X-Z planes (in the LiDAR’s coordinate frame) and save each plane projection to one of the RGB channels of an image (Figure 7). We call the resulting RGB image the “flattened pointcloud”. The flattened pointcloud retains some of the 3D information while being a more compact data type than the raw pointcloud.
However, the projection depends on the LiDAR’s orientation, so different rotations of the WAM-V cause the flattened pointcloud of the same object to differ considerably. This may lead to poor classification performance afterwards. To cope with this problem, we transform the coordinates into the object’s frame, which sets the origin at the center of the object and defines the x axis relative to the object. As a result, the flattened image stays consistent as the WAM-V changes its orientation (Figure 8).
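The projection and re-framing steps described above can be sketched in plain Python as follows. The text does not fully specify the object frame, so this example assumes the origin at the cluster centroid and the x axis pointing horizontally away from the sensor. The image size, the spatial extent, and the function names are likewise hypothetical.

```python
import math


def to_object_frame(points):
    """Shift a cluster to its centroid and rotate it about z so that +x
    points away from the sensor (assumed convention, sensor at origin)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    yaw = math.atan2(cy, cx)            # bearing of the object from the sensor
    c, s = math.cos(yaw), math.sin(yaw)
    framed = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        framed.append((c * dx + s * dy, -s * dx + c * dy, z - cz))
    return framed


def flatten(points, size=8, extent=2.0):
    """Build a size x size image whose three channels hold the X-Y, Y-Z
    and X-Z occupancy projections of the points."""
    image = [[[0, 0, 0] for _ in range(size)] for _ in range(size)]

    def cell(v):
        # Map a coordinate in [-extent, extent] to a pixel index, clamped.
        i = int((v + extent) / (2.0 * extent) * size)
        return min(max(i, 0), size - 1)

    for x, y, z in points:
        image[cell(x)][cell(y)][0] = 255    # X-Y plane -> channel 0
        image[cell(y)][cell(z)][1] = 255    # Y-Z plane -> channel 1
        image[cell(x)][cell(z)][2] = 255    # X-Z plane -> channel 2
    return image
```

With this convention, rotating the sensor about its vertical axis rotates both the cluster and its bearing by the same angle, so the two cancel and the flattened image is unchanged.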
For each class we gathered approximately 700 flattened pointcloud images, with 500 images from the Gazebo virtual environment and 200 images from the real world (Bamboo Lake at NCTU, Taiwan). We have 5 different classes, including obstacle buoys, totem buoys, the dock, and the box for the detect and deliver task.
After collecting our dataset, we trained CaffeNet on it and obtained an accuracy of 95.4% on the Gazebo test set (Figure 9) and 87.7% on the real-world test set. The lower accuracy on the real-world test set arises mainly when the object is too far from the LiDAR, because the resulting sparse pointcloud leads to worse performance. Overall, it is still a decent algorithm for the tasks in RobotX.
V-G Placard Detection
For placard detection in the dock identification task, most traditional methods based on feature matching suffer from illumination variance, perspective transformation, and occlusion, and they do not scale well to more types of placards. To overcome these challenges, we use a deep CNN classifier to identify the placards. The training data were collected from virtual and semi-realistic environments, which are more accessible. We collected data by driving the vehicle along the different trajectories shown in Fig. 10, so that the data contain different perspective transformations, some with occlusion. Our CNN model is based on CaffeNet [jia2014caffe], reconfigured with network surgery techniques to have 10 output classes: nine types of placards and one background class. After training on data from the virtual and semi-realistic environments, the CNN classifier performs placard identification in the real-world environment, taking MSER region proposals as input. As shown in Fig. 11, the green, black, and white boxes represent the green circle class, the background class, and predictions below the threshold, respectively.
V-H Docking Motion
Inspired by previous work [chuang2018deep], we deployed an end-to-end deep CNN model that predicts three motion probabilities from a single RGB image. In the imitation learning process, training data are gathered by driving a vehicle, mounted with three cameras at different headings, toward a docking bay. As shown on the left of Fig. 12, the data collected from the three cameras are automatically labelled with three classes: turn left, go straight, and turn right. The output probabilities of the three motion classes are then converted into motor commands for the differential-drive vessel. The right of Fig. 12 shows images from a single camera mounted on the ASV at different heading angles to the docking bay in real-world environments. These images are predicted into the correct motion classes for the subsequent docking motion control.
V-I Totem Circling
For the totem circling task, we used a classical approach. Totem detection is achieved with the object detection discussed in the previous subsection, which gives us the spatial information of the totems. We then formulate the circling problem as in Figure 13. The two variables d and phi describe the vehicle’s position. When the vehicle is circling correctly, d equals the rotating radius R and phi is zero. Therefore, by using two PID controllers to regulate d and phi, the circling motion is achieved.
This algorithm is implemented in both the Duckietown and Gazebo simulations (Figure 13). Both are robust enough to complete at least 30 rounds.
V-J Obstacle Avoidance
Obstacle avoidance is one of the basic features of mobile robots, and there are plenty of algorithms for it. In this section we discuss three: the MOOS obstacle manager, minimum-angle real-time path planning, and deep reinforcement learning. All three can solve the problem, but the ideas behind them are quite different. The first method is the obstacle avoidance feature included in MOOS. It is mainly composed of three parts. The pFeatureTracker App manages the objects’ convex polygons, positions, and map. The pObstacleMgr App sends the OBSTACLE_ALERT to the behavior. BHV_AvoidObstacle is the behavior in the MOOS-IvP Helm that maneuvers the vehicle to avoid the obstacles. The user can adjust many configuration parameters of the AvoidObstacle behavior to suit different situations. This method avoids obstacles based on a priority policy, which may be useful in many scenarios.
The second method we implemented is minimum-angle real-time path planning, which is based on [Zhuang2005real]. The algorithm forms a line segment connecting the start point and the goal point and checks whether this segment collides with any object. If a collision occurs, it takes the points on the right and left sides of the obstacle as candidates and calculates the turning angle for each. The candidate with the smaller angle is picked and becomes the new starting point. We then repeat the check on the new line segment connecting the new starting point and the goal point. By iterating until the starting point is close to the goal point, we find a path that connects the original starting point to the goal point without hitting obstacles.
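The iteration described above can be sketched as follows. This is a simplified illustration, not the construction from [Zhuang2005real]: it assumes circular obstacles given as `(cx, cy, radius)`, places the two candidates beside the obstacle at a fixed safety `margin`, and stops as soon as the straight line to the goal is free. All names and parameters are hypothetical.

```python
import math


def segment_point_distance(a, b, p):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq
        t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def plan(start, goal, obstacles, margin=0.5, max_iters=50):
    """Greedy minimum-turning-angle planner. Assumes the goal lies
    outside every obstacle; raises if no path is found in max_iters."""
    path = [start]
    current = start
    for _ in range(max_iters):
        hits = [o for o in obstacles
                if segment_point_distance(current, goal, (o[0], o[1])) < o[2]]
        if not hits:
            path.append(goal)           # line of sight to the goal is free
            return path
        # Handle the blocking obstacle nearest to the current point.
        cx, cy, r = min(hits, key=lambda o: math.hypot(o[0] - current[0],
                                                       o[1] - current[1]))
        gx, gy = goal[0] - current[0], goal[1] - current[1]
        norm = math.hypot(gx, gy)
        nx, ny = -gy / norm, gx / norm  # unit normal, left of the heading
        off = r + margin
        candidates = [(cx + nx * off, cy + ny * off),   # left side
                      (cx - nx * off, cy - ny * off)]   # right side

        def turning_angle(c):
            vx, vy = c[0] - current[0], c[1] - current[1]
            return abs(math.atan2(gx * vy - gy * vx, gx * vx + gy * vy))

        current = min(candidates, key=turning_angle)
        path.append(current)
    raise RuntimeError("no path found within max_iters")


path = plan((0.0, 0.0), (10.0, 0.0), [(5.0, 0.2, 1.0)])
```

In this example the obstacle sits slightly to the left of the direct line, so the right-hand candidate needs the smaller turn and the planner passes on that side. The sketch does not verify that the segment to the chosen candidate is itself collision-free, which a complete implementation would need to do.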
The last method we implemented for obstacle avoidance is reinforcement learning. Obstacle avoidance is a well-known problem in reinforcement learning. Previous work [petereinforcementobs] solved it using a 2D LiDAR as input with two classic reinforcement learning algorithms, Q-learning and SARSA. The results are very promising in simple simulation environments. However, with more complex sensor input there are commonly thousands or even millions of robot states, which makes traditional reinforcement learning almost impossible to train. Deep reinforcement learning overcomes this problem by replacing policy tables with Deep Neural Networks (DNN). Because we designed our software as an AI agent (as described previously), it is simple to apply deep reinforcement learning. For this task, we first set the action space to ⟨go_straight, turn_left, turn_right⟩. Then we downsample the LiDAR data to form the observation. Finally, we define the reward r as a function of the observation o and the action a. The reward function simply penalizes collisions and rewards straight movement more than turns, to prevent the agent from spinning on the spot. By training iteratively, the DNN learns the policy. After about 100 episodes on average, the agent learns to avoid obstacles for 100 steps. This method seems workable and simple, but some agents occasionally show strange behaviors; for example, some agents only turn right to avoid obstacles. The agents learn only from the reward function, so such a result is to be expected. It is therefore important to design the reward function carefully.
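A reward function of the kind described above, together with a simple LiDAR downsampling step, might look like the following sketch. The numeric reward values, the collision threshold, and the choice of keeping the minimum range per sector are assumptions for illustration; the text does not give the actual values.

```python
def downsample(ranges, bins):
    """Reduce a LiDAR scan to `bins` sectors by keeping the closest
    return in each sector (leftover readings at the end are dropped)."""
    size = len(ranges) // bins
    return [min(ranges[i * size:(i + 1) * size]) for i in range(bins)]


def reward(observation, action, collision_distance=1.0):
    """r(o, a): penalize collisions, and reward straight motion more than
    turning so the agent does not learn to spin on the spot."""
    if min(observation) < collision_distance:
        return -10.0                    # collision penalty (made-up value)
    if action == "go_straight":
        return 1.0
    return 0.2                          # turn_left or turn_right


observation = downsample([5.0, 4.0, 3.0, 2.0, 1.5, 6.0], bins=3)
r = reward(observation, "go_straight")
```

Note that nothing in this reward distinguishes left turns from right turns, which is consistent with the observation that an agent may settle on always turning one way.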
V-K Underwater Manipulation
For the underwater manipulation task, we modified the Inspector 2 from Seadrone Inc as our AUV. The AUV is equipped with 5 thrusters, which provide 3 DoF in translation and 1 DoF in orientation (yaw). Onboard sensors include a pressure sensor, leak sensors, an IMU, and a camera with extra LED lighting. We added an underwater gripper from Blue Robotics for grasping the rings in the task.
Using classic image processing algorithms, we are able to detect the pose of the ring, steer the AUV to the grasping point using PID control, and retrieve the ring to the surface.
VI Experimental Results
All tests and trials of our AMS were done in the lake on our campus. This was our first year in the competition, so it took considerable effort just to get our WAM-V into the lake. We built a launching platform consisting of a slope with a winching system. None of us had any prior experience, so it was a trial-and-error process. We also built the course equipment, from totem buoys to docks, by hand.
After the WAM-V was launched into the lake, we tested basic functionalities (e.g., localization, motion control, and perception). The problem we encountered was that we could not do a quantitative analysis for the experiments related to localization. We could only do qualitative analysis and draw general conclusions. For instance, we ran a trial of 10 rounds of obstacle avoidance with the WAM-V, and it completed 9 rounds.
Therefore, we later decided to build a vision-based localization system (Figure 15) by placing AprilTags [olson2011apriltag] on floating pontoon cubes anchored with concrete blocks. This can provide more quantitative results for our AMS localization and more evidence of whether our algorithms work better.
Developing the whole system for the RobotX 2018 competition has been a meaningful way for us to take part in unmanned autonomous vessel research. We focus not only on making our system work in the competition but also on building a foundation for further research. Our final goal is to provide a better platform and modularized software that every developer can use for easier development and deployment. This could bring the community closer together by sharing work based on the same framework, and it lets researchers focus on the parts they care about. In our case, we contributed some deep learning and deep reinforcement learning algorithms, and we hope to bring more cutting-edge deep learning technologies to the field of marine robotics.
The research was supported by National Chiao Tung University Innovative Creative Technology (NCTU-ICT) and the Ministry of Science and Technology, Taiwan (grant numbers 107-2623-E-009-005-D and 105-2511-S-009-017-MY3). We are also grateful for the help of Santani Teng, Robert Katzschmann, Ilenia Tinnirello, and Daniele Croce.