We propose a strategy for a team of autonomous vehicles to collaborate on information-based planning for sampling dynamic environments that are dangerous for humans. Example applications include collecting thermal imagery at a wildfire-affected site to assist with detecting spot fires, fire mapping, and monitoring fire progression (Fig.0(b)), and visual inspection and monitoring of coral reefs by planning on aerial images from a UAV and learning to navigate (Fig.0(f)) tough underwater environments with an Autonomous Underwater Vehicle (AUV) (Fig.0(e)).
Each year over 5 million hectares are burned due to forest fires in Canada and the United States. A large number of people, ground vehicles and aerial vehicles are used to detect, suppress and extinguish forest fires. Wildfire suppression is a complex task that involves dispatching limited resources to the most at-risk fires, where detection is often done through thermal imaging (Fig.0(a)) and human observation (Fig.0(b)). Manned helicopters are used to survey areas for hot-spots both before and after a wildfire, which can spread beyond control. Several techniques are used for fire suppression, but the most notable is Initial Attack, which can contain or control over 90% of wildfires (https://www.nifc.gov/PUBLICATIONS/redbook/2000/Chapter9.pdf). Several factors determine the initial attack strategy, including:
- proximity to human life and property
- fuel type (e.g. trees or grass)
- terrain topography (flatness, rivers, mountain slope or valley)
- fire size (height and area)
Initial Attack techniques include firefighters extinguishing fires directly with hand tools, which is suitable for small fires such as those under one meter or smouldering in the ground. Slightly larger fires that are too intense for firefighters are extinguished directly using bulldozers and fire engines or by dropping retardant. For large fires that cannot be attacked directly, a perimeter of natural barriers (such as roads) or artificial barriers (created by limiting fuel sources) is used to contain the fire, which is then allowed to burn out.
Recent advances in aerial and ground robotics could one day be used to assist in the detection and suppression of fires [2, 15]. Low-cost UAVs can replace much of the work currently done by manned helicopters by carrying thermal cameras to detect hot-spots from high altitude, as well as by taking local temperature measurements on the ground where accessible. Further, UGVs can be used to perform in situ measurements, dig up hot-spots, extinguish fires by spraying retardant, bulldoze fuel sources to create perimeters and even supply resources to manned crews. In the literature, collaborative teams of UAVs and UGVs have been used for localizing each other, for shared coverage, in search and rescue scenarios, and for target tracking and surveillance. We focus this work on using a heterogeneous robotic team for coordinated information gathering in outdoor environments.
In this work, we propose a methodology for global and local path planning of a UGV using imagery from an aerial vehicle. We seek to generate efficient paths for the UGV by segmenting the aerial image based on texture and labeling each class with a score reflecting its drivability. As the ground vehicle navigates, its performance, measured by progress towards the destination and the incidence of obstacles, indicates the drivability of the terrain class, and the drivability score is updated accordingly. Robots can also play an important role in wildfire response through aerial thermal imaging; in this work, however, we focus on global and local planning for a UGV in sampling applications. The presented methodology of collaborative sampling can find use in various real-world applications, such as search and rescue operations, collection of large-scale environmental scientific data, and large-scale coral reef monitoring using a UAV and an AUV.
II Our Approach
The flowchart in Fig.2 presents an overview of our strategy for collaborative coverage and sampling of an outdoor scene. We split our approach into two planning phases: global planning, where the high-level planner chooses locations that need to be visited to achieve good sampling and plans an efficient path (encoded as a discrete set of waypoints) to visit these locations; and local planning, which handles short-term navigation between the waypoints (preferring well-drivable terrain) and small deviations caused by obstacles.
A collaborative phase is responsible for labeling the terrain classes (used by the global planner) with the drivability feedback from the UGV. This update yields an improved global plan, and re-planning is necessary when the local planner on the UGV is unable to circumnavigate an obstacle. We discuss the building blocks of our system in the following subsections and later present some preliminary results from the evaluation of these building blocks.
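The feedback loop described above can be illustrated with a minimal sketch. The text does not specify the update rule, so everything here is an assumption made for illustration: the feedback measure `observed_drivability`, the exponential-moving-average update in `update_class_scores`, the `rate` parameter, and the terrain class names are all hypothetical.

```python
def observed_drivability(progress_m, commanded_m, obstacle_hits):
    """Hypothetical feedback in [0, 1]: fraction of the commanded distance
    actually covered, reduced for every obstacle encountered on the way."""
    if commanded_m <= 0:
        raise ValueError("commanded distance must be positive")
    ratio = max(0.0, min(1.0, progress_m / commanded_m))
    return ratio / (1.0 + obstacle_hits)


def update_class_scores(scores, terrain_class, feedback, rate=0.3):
    """Blend new feedback into the score of one terrain class.

    Assumed rule (not from the paper): exponential moving average.
    Returns a new dict and leaves the input scores untouched.
    """
    updated = dict(scores)
    updated[terrain_class] = (1.0 - rate) * scores[terrain_class] + rate * feedback
    return updated


# Example: the UGV covered 8 m of a 10 m segment on grass and met one obstacle.
scores = {"grass": 0.5, "gravel": 0.5}
feedback = observed_drivability(progress_m=8.0, commanded_m=10.0, obstacle_hits=1)
new_scores = update_class_scores(scores, "grass", feedback)
print(new_scores)
```

A moving average is only one possible choice; it keeps the score stable against a single bad traversal while still letting repeated feedback shift the global plan.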
II-A Texture classification and learning the scoremap
Our texture classifier (Fig.2(a)) assigns patches of the image to a number of distinct texture classes. It uses several Gabor filters to describe the texture. The filters are convolved with the image to produce robust energy statistics, and using several filters allows the classifier to be rotation and scale invariant.
The distribution of hue content is parameterized as a histogram. The hue histogram parameters and Gabor order statistics are combined into a feature vector, which is fed into a K-means classifier to discriminate between a preset number of texture classes. Fig.2(c) illustrates the texture-based segmentation of the aerial image in Fig.2(b) using our classification pipeline.
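The pipeline described above (Gabor energy statistics plus a hue histogram, clustered with K-means) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the kernel size, wavelength, number of orientations, number of hue bins, the farthest-point initialization of K-means, and the synthetic patches are all hypothetical choices.

```python
import colorsys
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Zero-mean cosine Gabor kernel as a nested list (size is assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * x_rot / wavelength))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (size * size)
    # Subtract the mean so that flat (textureless) regions give zero response.
    return [[v - mean for v in row] for row in kernel]


def gabor_energy(gray, kernel):
    """Mean squared filter response over the valid region of a gray patch."""
    k, h, w = len(kernel), len(gray), len(gray[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            resp = sum(kernel[a][b] * gray[i + a][j + b]
                       for a in range(k) for b in range(k))
            total += resp * resp
            count += 1
    return total / count


def hue_histogram(rgb_patch, bins=4):
    """Normalized histogram of pixel hues (RGB values assumed in [0, 1])."""
    hist, n = [0.0] * bins, 0
    for row in rgb_patch:
        for r, g, b in row:
            hue = colorsys.rgb_to_hsv(r, g, b)[0]
            hist[min(int(hue * bins), bins - 1)] += 1.0
            n += 1
    return [v / n for v in hist]


def patch_features(rgb_patch, kernels):
    """Feature vector: one Gabor energy per kernel, then the hue histogram."""
    gray = [[(r + g + b) / 3.0 for r, g, b in row] for row in rgb_patch]
    return [gabor_energy(gray, k) for k in kernels] + hue_histogram(rgb_patch)


def kmeans(points, k, iters=10):
    """Plain K-means with deterministic farthest-point initialization."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(d2(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic demo: three striped green patches and three flat brown patches.
kernels = [gabor_kernel(5, 4.0, i * math.pi / 4.0, 2.0) for i in range(4)]
striped = [[[(0.0, 0.8, 0.0) if (x + p) % 4 < 2 else (0.0, 0.2, 0.0)
             for x in range(12)] for _ in range(12)] for p in range(3)]
flat = [[[(0.6, 0.4 + 0.02 * p, 0.2)] * 12 for _ in range(12)] for p in range(3)]
features = [patch_features(patch, kernels) for patch in striped + flat]
labels = kmeans(features, k=2)
print(labels)
```

Using kernels at several orientations is what gives the descriptor some tolerance to rotation; adding several wavelengths would do the same for scale.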
II-B Policy gradient based global path planning
We use policy search on an aggregated state space to generate paths for the UGV such that the drivable regions are covered. In this approach, a continuous two-dimensional sampling region is discretized into uniform grid-cells, such that the robot’s position can be represented by a pair of integers. Each grid-cell is assigned a score indicating the expected drivability in that cell. The goal is to maximize the total score accumulated over a trajectory within a fixed amount of time. To specify the robot’s behavior, we use a parametrized policy that maps the current state of sampling to a distribution over possible actions. The aim is to automatically find good policy parameters, after which the policy can be deployed on new problems without additional training. We use a multi-resolution feature representation centered around the robot. In this representation, the feature cells grow in size with their distance from the robot, which results in high-resolution features close to the robot and lower-resolution features further from the robot’s current position.
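To make the robot-centered multi-resolution features and the parametrized policy concrete, here is a minimal sketch. It is not the feature layout of the cited work: the four-action set, the bands that double in length and width with distance, the linear softmax policy, and the names `direction_features`, `policy` and `theta` are all assumptions chosen for illustration.

```python
import math

# Hypothetical action set: unit steps on the grid (x grows east, y grows south).
ACTIONS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


def direction_features(score, x, y, step, levels=3):
    """Mean score in bands of growing size along one direction from (x, y).

    Level l covers distances [2**l, 2**(l+1)) from the robot and is
    2**(l+1) - 1 cells wide, so cells near the robot are seen at full
    resolution and far cells are aggregated into coarse features.
    """
    dx, dy = step
    px, py = -dy, dx  # unit vector perpendicular to the step direction
    height, width = len(score), len(score[0])
    feats = []
    for level in range(levels):
        near, far = 2 ** level, 2 ** (level + 1)
        cells = []
        for t in range(near, far):
            for u in range(-(near - 1), near):
                cx, cy = x + t * dx + u * px, y + t * dy + u * py
                if 0 <= cx < width and 0 <= cy < height:
                    cells.append(score[cy][cx])
        feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def policy(score, x, y, theta):
    """Softmax distribution over actions with linear preferences.

    The preference of an action is the dot product of theta with the
    multi-resolution features computed along that action's direction.
    """
    prefs = {}
    for action, step in ACTIONS.items():
        feats = direction_features(score, x, y, step, levels=len(theta))
        prefs[action] = sum(w * f for w, f in zip(theta, feats))
    top = max(prefs.values())
    exps = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Example: a 9x9 scoremap whose eastern part is drivable, robot at the center.
score = [[1.0 if col > 4 else 0.0 for col in range(9)] for _ in range(9)]
probs = policy(score, 4, 4, theta=[1.0, 1.0, 1.0])
print(probs)
```

Because the features are defined relative to the robot rather than to absolute grid coordinates, the same parameter vector can be reused on a scoremap of a different size, which is what allows deployment on new problems without retraining.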
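The policy parameters are updated by following an estimate of the gradient of the expected accumulated score. The gradient estimate is based on sampled trajectories (roll-outs) with a fixed horizon, using the state and action at each time-step of each roll-out, together with a variance-reducing baseline. In our experiments, we set the baseline to the observed average reward.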
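For reference, a standard policy-gradient estimator with a baseline that matches this description can be written as below. The symbols are assumptions introduced here for illustration and may differ from the authors' notation: $m$ is the number of sampled roll-outs, $H$ the horizon, $s_t^{(i)}$ and $a_t^{(i)}$ the state and action at time-step $t$ of roll-out $i$, $r$ the reward, $\pi_\theta$ the policy with parameters $\theta$, and $b$ the baseline.

```latex
\nabla_\theta J(\theta) \approx
\frac{1}{m} \sum_{i=1}^{m} \sum_{t=0}^{H-1}
\nabla_\theta \log \pi_\theta\!\left(a_t^{(i)} \mid s_t^{(i)}\right)
\left( \sum_{j=t}^{H-1} r\!\left(s_j^{(i)}, a_j^{(i)}\right) - b \right)
```

Subtracting a baseline that does not depend on the action leaves the expected gradient unchanged while reducing its variance, which is why the observed average reward is a common choice for $b$.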
II-C Vision based local path planning for UGV
The local planner uses vision-based navigation, which steers the vehicle towards the next waypoint while avoiding obstacles and preferring paths that are good for driving and provide good texture for visual state estimation.
A Convolutional Neural Network (CNN) is used to predict steering angles that avoid obstacles and lead to driving on good terrain. To train the CNN, we perform behavioural cloning, not unlike the DAGGER algorithm. The UGV is remotely controlled in both good and poor configurations, such as very close to and far from obstacles. A ResNet-18 based CNN is trained to predict a steering angle from a discrete set of candidate angles, with a fixed number of steering angles in each of the left and right directions. The images collected in the remotely controlled step, paired with one-hot encoded steering angles, make up the dataset. The cost function for training the network combines a prediction loss with a weighted regularization term, where the regularization term is a KL-divergence term. The prediction loss for yaw steering actions consists of two terms: the first is the cross-entropy between the network predictions and the smoothed labels, and the second is a penalty for overconfident predictions, which aims to maximize the entropy of the predictive distributions. Two hyper-parameters determine the weights of the KL-divergence regularization and the entropy penalty; they are selected manually.
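The two terms of the prediction loss can be sketched as follows. This is a minimal illustration under assumptions, not the training code: the smoothing amount `eps`, the weight `entropy_weight`, the uniform smoothing scheme, and the three-class example are hypothetical, and the KL-divergence regularization term of the full cost function is left out.

```python
import math


def smooth_one_hot(index, num_classes, eps=0.1):
    """One-hot label with a fraction eps of its mass spread uniformly."""
    return [(1.0 - eps) * (1.0 if i == index else 0.0) + eps / num_classes
            for i in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-probabilities from raw network outputs."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def prediction_loss(logits, target_index, eps=0.1, entropy_weight=0.1):
    """Cross-entropy against smoothed labels minus a weighted entropy bonus.

    Subtracting the entropy penalizes overconfident (low-entropy)
    predictive distributions.
    """
    log_probs = log_softmax(logits)
    target = smooth_one_hot(target_index, len(logits), eps)
    cross_entropy = -sum(t * lp for t, lp in zip(target, log_probs))
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return cross_entropy - entropy_weight * entropy


# Example with three hypothetical steering classes: left, straight, right.
loss_correct = prediction_loss([5.0, 0.0, 0.0], target_index=0)
loss_wrong = prediction_loss([0.0, 5.0, 0.0], target_index=0)
print(loss_correct, loss_wrong)
```

With smoothed labels the minimum of the cross-entropy term is no longer at a fully saturated prediction, and the entropy term pushes in the same direction, so both discourage the network from committing to a single steering angle with near-certainty.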
III Preliminary Results
We evaluated the building blocks of our system on both marine and terrestrial setups. In the marine example, the application is to cooperatively collect better visual samples of coral reefs within a limited time budget, using the UAV for global planning and an AUV for local planning and execution. The presented results are from experiments conducted in the field at Folkestone Marine Reserve in Barbados, over a region known to have several coral outcrops. The AUV used in these experiments is an Aqua-class underwater robot. The AUV has a vision-based local planner to swim between reefs while collecting high-resolution visual samples. We used the aerial images from the UAV to plan trajectories for the AUV (as presented in Fig.3(a), Fig.3(b), and Fig.3(c)), such that the information gain about the coral reefs is maximized within a limited time budget. In the terrestrial example, we used the aerial images from the UAV presented in Fig.2(b) to generate a scoremap indicating the drivability of the terrain, as illustrated in Fig.3(d). The policy gradient based path planner then plans a trajectory for the UGV (as presented in Fig.3(e) and Fig.3(f)), such that the coverage of drivable terrain is maximized within a limited time.
In our recent work, we evaluated our policy gradient based path planner in the marine domain to monitor coral reefs. The results presented in Fig.3(g) show that we achieve higher discounted rewards than a uniform sampling technique. The goal is to maximize the discounted reward because the time budget of the robotic vehicle is limited.
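A short sketch shows why a discounted reward suits a limited time budget: the same reward counts for more when it is collected earlier. The discount factor `gamma` and the two reward sequences below are hypothetical values chosen for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two trajectories that collect the same total reward at different times.
early = discounted_return([1.0, 0.0, 0.0])
late = discounted_return([0.0, 0.0, 1.0])
print(early, late)
```

A planner that maximizes this quantity prefers trajectories that reach informative regions first, which matters when the vehicle may run out of time before completing its path.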
We have performed preliminary validation of our local planner in the same marine environment in which we validated the global planner. In earlier work, we performed long-distance autonomous underwater navigation in close proximity to coral reefs (on average 43 cm from coral) without collision. We manually labelled approximately 13,000 images, taken in varying configurations and lighting conditions, with the expected yaw and pitch and used them to train a CNN. Our experimental validation resulted in over one km of collision-free navigation in the open sea, while maintaining good coverage of coral, which was our target observation. Fig. 5 shows the percentage of the forward-facing images that was covered with coral, showing that our robot maintained consistent coverage of the target terrain (coral). On average, 33% of the image showed coral, indicating that the robot effectively navigated over coral regions rather than barren regions. Since the horizon is located at the center of the image, we expect the fraction of the image showing coral to be well below 50%. In an upcoming field trial, we plan to evaluate all the building blocks and the whole pipeline.
In this paper, we have outlined a system for cooperative path planning that uses information acquired from a UAV to plan paths for a UGV. Drivability performance on the UGV is used to update a scoremap so that the path is planned iteratively. We demonstrate the effectiveness of the subsystems in preliminary experiments in a marine domain with collaboration between an aerial and an underwater vehicle. In our current and future work, we are tightening the interaction between the UAV and UGV along with a unified sampling goal. We expect that this work will lead to a robust and generalized framework for deploying cooperative aerial and ground robots in complex, unstructured and dynamic environments. The active-learning framework allows the system to work from previously seen environments while also incorporating new, unseen information into the model for improved planning and execution.
- UAV: Unmanned Aerial Vehicle
- UGV: Unmanned Ground Vehicle
- CNN: Convolutional Neural Network
- AUV: Autonomous Underwater Vehicle
- Baxter and Bartlett  Jonathan Baxter and Peter L Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319–350, 2001.
- Beachly et al.  Evan Beachly, Carrick Detweiler, Sebastian Elbaum, Brittany Duncan, Carl Hildebrandt, Dirac Twidwell, and Craig Allen. Fire-aware planning of aerial trajectories and ignitions. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 685–692. IEEE, 2018.
- Deisenroth et al.  Marc Peter Deisenroth, Gerhard Neumann, and Jan Peters. A survey on policy search for robotics. Foundations and Trends® in Robotics, 2(1–2):1–142, 2013.
- Fogel and Sagi  I. Fogel and D. Sagi. Gabor filters as texture discriminator. Biological Cybernetics, 61(2):103–113, 1989. ISSN 0340-1200. doi: 10.1007/BF00204594. URL http://dx.doi.org/10.1007/BF00204594.
- Gal et al.  Yarin Gal, Jiri Hron, and Alex Kendall. Concrete Dropout. In Advances in Neural Information Processing Systems 30 (NIPS), 2017.
- Grocholsky et al.  Ben Grocholsky, James Keller, Vijay Kumar, and George Pappas. Cooperative air and ground surveillance. IEEE Robotics & Automation Magazine, 13(3):16–25, 2006.
- Kober et al.  Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013.
- Manderson et al. [2018a] Travis Manderson, Ran Cheng, Dave Meger, and Gregory Dudek. Navigation in the Service of Enhanced Pose Estimation. In International Symposium on Experimental Robotics (ISER), 2018a.
- Manderson et al. [2018b] Travis Manderson, Juan Camilo Gamboa Higuera, Ran Cheng, and Gregory Dudek. Vision-based autonomous underwater swimming in dense coral for combined collision avoidance and target selection. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1885–1891. IEEE, 2018b.
- Manjanna et al. [2018a] Sandeep Manjanna, Herke van Hoof, and Gregory Dudek. Policy search on aggregated state space for active sampling. In 2018 International Symposium on Experimental Robotics (ISER), pages 1–7, 2018a.
- Manjanna et al. [2018b] Sandeep Manjanna, Herke van Hoof, and Gregory Dudek. Reinforcement learning with non-uniform state representations for adaptive search. In 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 1–7. IEEE, 2018b.
- Ross et al.  Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 627–635, 2011.
- Sattar et al.  Junaed Sattar, Gregory Dudek, Olivia Chiu, Ioannis Rekleitis, Philippe Giguere, Alec Mills, Nicolas Plamondon, Chris Prahacs, Yogesh Girdhar, Meyer Nahon, et al. Enabling autonomous capabilities in underwater robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3628–3634, 2008.
- Sutton et al.  Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057–1063, 2000.
- Torresan et al.  Chiara Torresan, Andrea Berton, Federico Carotenuto, Salvatore Filippo Di Gennaro, Beniamino Gioli, Alessandro Matese, Franco Miglietta, Carolina Vagnoli, Alessandro Zaldei, and Luke Wallace. Forestry applications of UAVs in Europe: A review. International Journal of Remote Sensing, 38(8-10):2427–2447, 2017.
- Vaughan et al.  Richard T Vaughan, Gaurav S Sukhatme, Francisco J Mesa-Martinez, and James F Montgomery. Fly spy: Lightweight localization and target tracking for cooperating air and ground robots. In Distributed autonomous robotic systems 4, pages 315–324. Springer, 2000.
- Yu et al.  Huili Yu, Randal W Beard, Matthew Argyle, and Caleb Chamberlain. Probabilistic path planning for cooperative target tracking using aerial and ground vehicles. In Proceedings of the 2011 American Control Conference, pages 4673–4678. IEEE, 2011.