Supplementary Material
The datasets and our C++ implementation are available at:
I Introduction
Cooperation between aerial and ground robots offers benefits to many applications, thanks to the complementary characteristics of these robots [25]. This is especially useful in robotic systems applied to precision agriculture, where the areas of interest are usually vast. An Unmanned Aerial Vehicle (UAV) allows rapid inspection of large areas [26] and can then share information, such as crop health or weed distribution indicators for areas of interest, with an agricultural Unmanned Ground Vehicle (UGV). The ground robot can operate for long periods of time, carry high payloads, and perform targeted actions, such as fertilizer application or selective weed treatment, on the areas selected by the UAV. The robots can also cooperate to generate 3D maps of the environment, e.g., annotated with parameters, such as crop density and weed pressure, suitable for supporting the farmer’s decision making. The UAV can quickly provide a coarse reconstruction of a large area, which can be updated with more detailed and higher-resolution map portions generated by the UGV visiting selected areas.
All the above applications assume that both UAVs and UGVs can share information using a unified environment model with centimeter-level accuracy, i.e., an accurate shared map of the field. There are two classes of methods designed to generate multi-robot environment representations: (i) multi-robot Simultaneous Localization and Mapping (SLAM) algorithms (e.g., [21, 18]), which concurrently build a single map by fusing raw measurements or small local maps generated from multiple robots; (ii) map registration algorithms (e.g., [3, 5]), which align and merge maps independently generated by each robot into a unified map. On the one hand, the lack of distinctive visual and 3D landmarks in an agricultural field, along with the difference in the robots’ points of view (e.g., Fig. 2), prevents the direct employment of standard multi-robot SLAM pipelines, either based on visual or geometric features. On the other hand, merging maps independently generated by the UAVs and UGVs in an agricultural environment is also a complex task, since maps are usually composed of similar, repetitive patterns that easily confuse conventional data association methods [17].
Furthermore, due to inaccuracies in the map building process, the merged maps are usually affected by local inconsistencies, missing data, occlusions, and global deformations such as directional scale errors, which negatively affect the performance of standard alignment methods. Geolocation information associated with (i) sensor readings or (ii) maps often cannot overcome the limitations of conventional methods in agricultural environments, since the location and orientation accuracy provided by standard reference sensors (Global Positioning Systems (GPSs) and Attitude and Heading Reference Systems (AHRSs) [23]) is not sufficient to prevent such systems from converging towards suboptimal solutions (see Sec. V).
In this paper, we introduce AgriColMap, an Aerial-Ground Collaborative 3D Mapping pipeline, which provides an effective and robust solution to the cooperative mapping problem with heterogeneous robots, specifically designed for farming scenarios. We address this problem by proposing a non-rigid map registration strategy able to deal with maps with different resolutions, local inconsistencies, global deformations, and relatively large initial misalignments. We assume that both a UAV and a UGV can generate a colored, geotagged point cloud of a target farm environment (Fig. 1). To solve the data association problem between the input point clouds, we propose a global, dense matching approach. The key intuition behind this choice is that points belonging to a cloud locally share similar “displacement vectors” that associate such points with points in the other cloud. Thus, by introducing a smoothness term (where smoothness refers to the displacement vectors of neighboring elements) in the dense, regularized matching, we penalize the displacement discontinuities in each point neighborhood. With this formulation, good correspondences are iteratively improved and spread through cooperative search among neighboring points.
This approach has been inspired by the Large Displacement Dense Optical Flow (LDOF) problem in computer vision and, in fact, we cast our data association problem as an LDOF problem. To this end, we convert the colored point clouds into a better suited, multimodal environment representation that allows us to exploit two-dimensional approaches and to highlight both the semantic and the geometric properties of the target map. The former is represented by a vegetation index map, the latter by a Digital Surface Model (DSM). More specifically, we transform each input point cloud into a grid representation, where each cell stores (i) the Excess Green (ExG) index and (ii) the local surface height information (e.g., the height of the plants, soil, etc.). Then, we use the data provided by the GPS and the AHRS to extract an initial guess of the relative displacement and rotation between the grid maps to match. Next, we compute a dense set of point-to-point correspondences between the matched maps, exploiting a modified, state-of-the-art LDOF system [22], tailored to the precision agriculture context. To adapt this algorithm to our environment representation, we propose to use a different cost function that involves both the ExG information and the local structure geometry around each cell. Using a voting scheme, we select the largest subset of correspondences with coherent, similar flows, to be used to infer a preliminary alignment transformation between the maps. In order to deal with directional scale errors, we use a non-rigid point-set registration algorithm to estimate an affine transformation. The final registration is obtained by performing a robust point-to-point registration over the input point clouds, pruned of all points that do not belong to vegetation. A schematic overview of the proposed approach is depicted in Fig. 3.
We report results from an exhaustive set of experiments (Sec. V) on data acquired by a UAV and a handheld camera, simulating the UGV, on crop fields in Eschikon, Switzerland. We show that the proposed approach is able to guarantee, with high probability, a correct registration for an initial translational error of up to 5 meters, an initial heading misalignment of up to 11.5 degrees, and a directional scale error of up to 30%. We found similar registration performance across fields with three different crop species, showing that the method generalizes well across different kinds of farms. We also report a comparison with state-of-the-art point-to-point registration and matching algorithms, showing that our approach outperforms them in all the experiments.
I-A Related Work
Multi-robot cooperative mapping is a recurrent and relevant problem in the literature and, as previously introduced, several solutions have been presented by means of either multi-robot SLAM algorithms or map merging/map registration strategies, in both 2D ([3, 4, 30]) and 3D ([5, 14, 24]) settings. Registration of point-cloud-based maps can also be considered as an instance of the more general point set registration problem [9, 12]. In this work, we mainly review methods based on map registration, since the heterogeneity of the involved robots and the lack of distinctive visual and geometrical features in an agricultural environment prevent the employment of standard multi-robot SLAM methods; a comprehensive literature review about this class of methods can be found in [31].
Map registration is a challenging problem, especially when dealing with heterogeneous robots, where data is gathered from different points of view and with different noise characteristics. It has been intensively investigated, especially in the context of urban reconstruction with aerial and ground data. In [32], the authors focus on the problem of geo-registering ground-based multi-view stereo models by proposing a novel viewpoint-dependent matching method. Wang et al. [35] deal with aligning 3D structure-from-motion point clouds obtained from internet imagery with existing geographic information sources, such as noisy geotags from input Flickr photos and geotagged city models and images collected from Google Street View and Google Earth. Bódis-Szomorú et al. [6] propose to merge low-detail airborne point clouds with incomplete street-side point clouds by applying volumetric fusion based on a 3D tetrahedralization (3DT). Früh et al. [15] propose to use DSMs obtained from a laser airborne reconstruction to localize a ground vehicle equipped with 2D laser scanners and a digital camera; detailed ground-based facade models are then merged with a complementary airborne model. Michael et al. [33] propose a collaborative UAV-UGV mapping approach in earthquake-damaged contexts. They merge the point clouds generated by the two robots using a 3D Iterative Closest Point (ICP) algorithm, with an initial guess provided by the (known) UAV takeoff location; the authors assume that the environment is generally described by flat planes and vertical walls, also called the “Manhattan world” assumption. The ICP algorithm has also been exploited in [13] and [19]. Forster et al. [13] align dense 3D maps obtained by a UGV equipped with an RGB-D camera and by a UAV running dense monocular reconstruction: they obtain the initial guess alignment between the maps by localizing the UAV with respect to the UGV with a Monte Carlo Localization method applied to height maps computed by the two robots.
Hinzmann et al. [19] deal with the registration of dense LiDAR-based point clouds with sparse image-based point clouds by proposing a probabilistic data association approach that specifically takes the individual cloud densities into consideration. In [16], Gawel et al. present a registration procedure for matching LiDAR point-cloud maps and sparse vision keypoint maps by using structural descriptors.
Despite the extensive literature addressing the problem of map registration for heterogeneous robots, most of the proposed methods make strong context-based assumptions, such as the presence of structural or visual landmarks, “Manhattan world” assumptions, etc. Registering 3D maps in an agricultural setting is, in some respects, even more challenging: the environment is homogeneous and poorly structured, and it usually gives rise to strong sensor aliasing. For these reasons, most of the approaches mentioned above cannot be directly applied to an agricultural scenario. Localization and mapping in an agricultural scenario is a topic that has recently been gathering great attention in the robotics community [36, 11, 23].
Most of these systems, however, deal with a single robot; the problem of fusing maps built by multiple robots is usually not adequately addressed, and only a little, very recent research exists on this topic. Dong et al. [10] propose a spatio-temporal reconstruction framework for precision agriculture that aims to merge multiple 3D reconstructions of the same field across time. They use single-row reconstruction as a starting point for the data association, which is then performed using standard visual features. This method uses images acquired by a single UGV that moves in the same field at different times and, being based on visual features, cannot manage drastic viewpoint changes or large misalignments when matching aerial and ground maps. A local feature descriptor designed to deal with large viewpoint changes has been proposed by Chebrolu et al. in [8], where the almost static geometry of the crop arrangement in the field is exploited to build a descriptor that encodes the local plant arrangement geometry. Despite the promising results, this method suffers from the presence of occluded areas when switching from the UAV to the UGV point of view.
I-B Contributions
Our contributions are the following: (i) a map registration framework specifically designed for heterogeneous robots in an agricultural environment; (ii) to the best of our knowledge, the first application of an LDOF-based approach to 3D map alignment; (iii) extensive performance evaluations that show the effectiveness of our approach; (iv) an open-source implementation of our method and three challenging datasets, covering different crop species, with ground truth.
II Problem Statement and Assumptions
Given two 3D colored point clouds of a farmland (Fig. 3, first column), built from data gathered from a UAV and a UGV, respectively, our goal is to find a transformation that accurately aligns them. Both point clouds can be generated, for instance, by using off-the-shelf photogrammetry-based 3D reconstruction software applied to sequences of geotagged images. Our method makes the following assumptions:

1) The input maps built from UAV and UGV data can have different spatial resolutions, but they refer to the same field, with some overlap between them;
2) The data used to build the maps were acquired at approximately the same time;
3) The maps are roughly geotagged, possibly with noisy locations and orientations;
4) They can be affected by local inconsistencies, missing data, and deformations, such as directional scale errors;
5) The UAV map is not affected by any scale inconsistencies.
Hypothesis 4) implies the violation of the typical rigid-body transformation assumption between the two maps: for this reason, we represent the sought transformation as an affine transformation that allows anisotropic (i.e., non-uniform) scaling between the maps. Hypothesis 5) is an acceptable assumption, since the map created by the UAV is usually wider than the UGV map and is generated using less noisy GPS readings, so the scale drift effect tends to cancel out: hence, we look for a transformation that aligns the UGV map with the UAV map by correcting the scale errors of the former with respect to the latter.
III Data Association
In order to estimate the transformation that aligns the two maps, we need to find a set of point correspondences between them, i.e., point pairs that correspond to the same global 3D position. As introduced before and shown in the experiments (see Sec. V), conventional sparse matching approaches based on local descriptors are unlikely to provide effective results due to the large number of repetitive and non-distinctive patterns spread over farmlands. Instead, inspired by the fact that, when the maps are misaligned, points in one map locally share a coherent “flow” towards the corresponding points in the other map, our method casts the data association estimation problem as a dense, regularized matching problem. This problem resembles the dense optical flow estimation problem for RGB images: in this context, global methods (e.g., [20]) aim to build correspondences pixel by pixel between a pair of images by minimizing a cost function that, for each pixel, involves a data term that measures the point-wise similarity and a regularization term that fosters smoothness between nearby flows (i.e., nearby pixel-to-pixel associations).
III-A Multimodal Grid Map
Our goal is to estimate the aligning transformation by computing a “dense flow” that, given an initial, noisy alignment between the maps provided by a GPS and an AHRS (Fig. 3, second column), associates points in one map with points in the other. Unfortunately, conventional methods designed for RGB images are not directly applicable to colored point clouds: we introduce here a multimodal environment representation that allows us to exploit such methods while enhancing both the semantic and the geometrical properties of the target map. A cultivated field is basically a globally flat surface populated by plants. A DSM (a raster representation of the height of the objects on a surface) can approximate the field structure geometry well, while a vegetation index can highlight the meaningful parts of the field and the visually relevant patterns: in our environment representation, we exploit both these intuitions. We generate a DSM from the point cloud; for each cell of the DSM grid, we also provide an ExG index that, starting from the RGB values, highlights the amount of vegetation. More specifically, we transform a colored point cloud into a two-dimensional grid map (Fig. 3, third column), where for each cell we provide the surface height and the ExG index, with the following procedure:

1) We select a rectangle that bounds the target area by means of the minimum and maximum latitude and longitude;
2) The selected area is discretized into a grid map of cells with a fixed step in meters, so that each cell represents a square whose side equals the step. Each cell is initialized with an initial (height, ExG) pair;
3) Since the point cloud is geotagged (see Sec. II), we can associate each of its 3D points with one cell of the grid;
4) For each cell with at least one associated 3D point: (a) we compute the height as the weighted average of the height coordinates of the 3D points that belong to that cell; (b) we compute the ExG index as the weighted average of the ExG indices of the 3D points that belong to that cell, where for each point $p$ we have:
$$\mathrm{ExG}(p) = 2\,g_p - r_p - b_p \qquad (1)$$
with $r_p$, $g_p$, and $b_p$ the RGB components of the point. Both averages use as weighting factor a circular, bivariate Gaussian distribution with a fixed standard deviation: points with coordinates close to the center of the cell get a higher weight.
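The grid construction above can be illustrated with a minimal sketch. It is not the authors' C++ implementation: the function names, the metric local frame with a known `origin`, and the normalization of RGB by the channel sum inside `exg` are all assumptions made for this example.

```python
import math


def exg(r, g, b):
    """Excess Green index 2g - r - b, on RGB normalized by the channel sum (assumed)."""
    total = r + g + b
    if total == 0:
        return 0.0  # black point: no color information
    return (2.0 * g - r - b) / total


def build_grid_map(points, origin, step, sigma):
    """Return {(col, row): (height, exg)} for points given as (x, y, z, r, g, b).

    x and y are metric coordinates in a hypothetical local frame whose
    lower-left corner is `origin`; `step` is the cell side and `sigma`
    the Gaussian standard deviation, both in meters.
    """
    sums = {}  # cell -> [sum of weights, weighted height sum, weighted ExG sum]
    for x, y, z, r, g, b in points:
        col = math.floor((x - origin[0]) / step)
        row = math.floor((y - origin[1]) / step)
        # Center of the cell in the same metric frame.
        cx = origin[0] + (col + 0.5) * step
        cy = origin[1] + (row + 0.5) * step
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        # Circular Gaussian weight: points near the cell center count more.
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        acc = sums.setdefault((col, row), [0.0, 0.0, 0.0])
        acc[0] += w
        acc[1] += w * z
        acc[2] += w * exg(r, g, b)
    # Cells whose total weight underflowed to zero are skipped.
    return {c: (a[1] / a[0], a[2] / a[0]) for c, a in sums.items() if a[0] > 0.0}
```

Only cells that received at least one point appear in the returned dictionary, which mirrors step 4) of the procedure.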
III-B Multimodal LDOF
We generate the corresponding multimodal representations from both the UAV and the UGV point clouds. In the ideal case, with perfect geotags and no map deformations, a simple geotagged superimposition of the two maps should provide a perfect alignment: the “flow” that associates cells between the two maps should be zero. Unfortunately, in the real case, due to the inaccuracies of both the geotags and the 3D reconstruction, non-zero, potentially large displacements are introduced in the associations. These offsets are locally consistent but not constant across cells, due to the reconstruction errors. To estimate the offsets map, we employ a modified version of the Coarse-to-fine PatchMatch (CPM) framework described in [22]. CPM is a recent LDOF system that provides cutting-edge estimation results even in the presence of very large displacements, and is more efficient than other state-of-the-art methods with similar accuracy.
For efficiency, CPM looks for the best correspondences of some seeds, which are refined by means of a dense, iterative neighborhood propagation: the seeds are a set of points regularly distributed within the image. Given two images and a collection of seeds at known positions in the first image, the goal of this framework is to determine the flow of each seed, i.e., the offset between the seed position in the first image and its corresponding matching position in the second image. The flow computation for each seed is performed with a coarse-to-fine random search strategy, by minimizing the cost function:
(2) 
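where the cost being minimized is the match cost between the patch centered at the seed position in the first image and the patch centered at the candidate matching position in the second image. For a comprehensive description of the flow estimation pipeline, we refer the reader to [22].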
Our goal is to use the CPM algorithm to compute the flow between the two grid maps. To exploit the full information provided by our grid maps (see Sec. III-A), we modified the CPM matching cost in order to take into account both the height and the ExG channels. We split the cost function into two terms:
(3) 
The first term is the DAISY-based [34] match cost, as in the original CPM algorithm: in our case, the DAISY descriptors are computed from the ExG channel of the two grid maps. The second term is a match cost computed using the height channel.
We chose the Fast Point Feature Histograms (FPFH) [29] descriptor for this second term: FPFH descriptors are robust multi-dimensional features that describe the local geometry of a point cloud; in our case, they are computed from the organized point cloud (i.e., a cloud that resembles a matrix-like structure) generated from the height channel of the two grid maps. Two scalar parameters act as the weighting factors of the two terms. As in [22], the patch-based matching cost is chosen to be the sum of the absolute differences over all the 128 and 32 dimensions of the DAISY and FPFH descriptors, respectively, at the matching points. The proposed cost function takes into account both the visual appearance and the local 3D structure of the plants.
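As a rough illustration of the two-term cost described above, the following sketch combines a visual and a geometric sum of absolute differences. The function names and the weight parameters `w_visual` and `w_geom` are hypothetical placeholders for the paper's weighting factors, and the descriptors are assumed to be already computed as plain sequences of numbers.

```python
def sad(a, b):
    """Sum of absolute differences between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(daisy_a, daisy_b, fpfh_a, fpfh_b, w_visual=1.0, w_geom=1.0):
    """Weighted sum of a visual (DAISY on ExG) and a geometric (FPFH on height) term.

    The weights are illustrative defaults, not values taken from the paper.
    """
    return w_visual * sad(daisy_a, daisy_b) + w_geom * sad(fpfh_a, fpfh_b)
```

Setting `w_geom` to zero reduces the cost to the purely visual term, which corresponds to the single-term variants compared later in the experiments.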
Once we have computed the dense flow between the two grid maps (Fig. 3, fourth column), we extract the largest set of coherent flows by employing a voting scheme inspired by the classical Hough transform, with a fixed discretization step; these flows define a set of point-to-point matches that will be used to infer a preliminary alignment (Fig. 3, fifth column).
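One simple way to realize such a voting scheme is a two-dimensional histogram over the flow vectors, keeping the most voted bin. The sketch below is an assumption about how the idea could be implemented, not the authors' exact scheme; `largest_coherent_flows` and its arguments are hypothetical names.

```python
import math
from collections import defaultdict


def largest_coherent_flows(flows, step):
    """Return the indices of the flows falling in the most voted (dx, dy) bin.

    `flows` is a list of (dx, dy) displacement vectors and `step` is the
    discretization step of the accumulator, in the same units as the flows.
    """
    bins = defaultdict(list)
    for idx, (dx, dy) in enumerate(flows):
        key = (math.floor(dx / step), math.floor(dy / step))
        bins[key].append(idx)  # each flow casts one vote for its bin
    if not bins:
        return []
    return max(bins.values(), key=len)
```

A hard binning like this can split a coherent group that straddles a bin boundary; voting also into neighboring bins would be one way to soften that effect.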
IV Non-Rigid Registration
The estimation of the non-rigid transformation between the maps is addressed in two steps. First, a preliminary affine transformation is computed by solving a non-rigid registration problem with known point-to-point correspondences: we solve an optimization problem whose cost function is the sum of the squared distances between corresponding points (Fig. 3, sixth column):
(4) 
where the sum runs over the set of matches, whose cardinality gives the number of terms, and the unknowns are the rotation matrix, the translation vector, and a scaling vector. To estimate the final registration, we first select from the two input colored point clouds two subsets that include only points that belong to vegetation. The selection is performed by applying an ExG-based thresholding operator to the two clouds. This operation enhances the morphological information of the vegetation, while reducing the size of the point clouds to be registered. We finally estimate the target affine transformation by running the Coherent Point Drift (CPD) [27] point set registration algorithm on the two vegetation-only point clouds, using the preliminary affine transformation as the initial guess.
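To give a feel for the least-squares step with known correspondences, the sketch below solves a deliberately simplified version of the problem: it assumes the rotation has already been compensated, so that only a per-axis scale and a translation remain, and each axis can be fitted independently in closed form. The full cost described above also includes a rotation matrix, which this example does not estimate; all names are hypothetical.

```python
def fit_axis(src, dst):
    """Least-squares fit of dst ≈ s * src + t along a single axis."""
    n = len(src)
    mean_src = sum(src) / n
    mean_dst = sum(dst) / n
    var = sum((x - mean_src) ** 2 for x in src)
    if var == 0.0:
        raise ValueError("degenerate axis: all source coordinates are equal")
    cov = sum((x - mean_src) * (y - mean_dst) for x, y in zip(src, dst))
    s = cov / var
    return s, mean_dst - s * mean_src


def fit_scale_translation(src_pts, dst_pts):
    """Per-axis (anisotropic) scale and translation from matched point pairs.

    Simplification: rotation is assumed to be the identity.
    """
    scales, offsets = [], []
    for axis in range(len(src_pts[0])):
        s, t = fit_axis([p[axis] for p in src_pts], [q[axis] for q in dst_pts])
        scales.append(s)
        offsets.append(t)
    return scales, offsets
```

Because each axis gets its own scale, the fit can absorb a directional scale error, which a rigid or similarity transformation could not.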
V Experiments
In order to analyze the performance of our system, we acquired datasets on fields of 3 different crop types in Eschikon (Switzerland): soybean, sugarbeet, and winter wheat. For each crop species we collected: (i) one sequence of GPS/IMU-tagged images over the entire field from a UAV flying at 10 meters altitude; (ii) 46 sequences of GPS/IMU-tagged images of small portions of the field from a UGV point of view. For the sugarbeet field, we also acquired an additional aerial sequence of images from 20 meters altitude. More comprehensive details regarding the acquired datasets are reported in Table II.
The UAV datasets were acquired using a DJI Mavic Pro UAV equipped with a 12 MP color camera, while the UGV datasets were acquired by moving the same camera by hand with a forward-looking point of view, simulating data acquisition by a ground robot. The collected images are first converted into 3D colored point clouds using Pix4Dmapper [1], a professional photogrammetry software suite; the point clouds are then aligned using the proposed registration approach. To analyze the performance of the proposed approach, we make use of the following error metrics:
(5)  
(6) 
where the division symbol denotes the element-wise division operator, and the three quantities are, respectively, the translational, the rotational, and the scale error metrics. We report the AgriColMap-related parameters used in all the experiments in Tab. I.
Parameter  
Value 
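Table II: Details of the acquired datasets.
Crop Type | Name | # Images | Crop Size (avg.) | Global Scale Error | Recording Height (approx.)
Soybean | sugv A | 16 | 6 cm | | 1 m
Soybean | sugv B | 19 | 6 cm | | 1 m
Soybean | sugv C | 22 | 6 cm | | 1 m
Soybean | suav | 89 | 6 cm | | 10 m
Sugar Beet | sbugv A | 25 | 5 cm | | 1 m
Sugar Beet | sbugv B | 26 | 5 cm | | 1 m
Sugar Beet | sbugv C | 27 | 5 cm | | 1 m
Sugar Beet | sbuav A | 213 | 5 cm | | 10 m
Sugar Beet | sbuav B | 96 | 5 cm | | 20 m
Winter Wheat | wwugv A | 59 | 25 cm | | 1 m
Winter Wheat | wwugv B | 61 | 25 cm | | 1 m
Winter Wheat | wwuav | 108 | 25 cm | | 10 m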
V-A Performance Under Noisy Initial Guess
This experiment is designed to show the robustness of the proposed approach under different noise conditions affecting the initial guess, and under different directional scale discrepancies. For each UGV point cloud, we estimate an accurate ground truth non-rigid transform by manually selecting the correct point-to-point correspondences with the related UAV cloud. We generate random initial alignments between the maps by adding noise, with different orders of magnitude, to the ground truth alignment heading, translation, and scale. Then, we align the clouds starting from the sampled initial alignments by using (i) the proposed approach, (ii) a standard non-rigid ICP, (iii) the CPD method [27], (iv) a state-of-the-art Globally Optimal ICP (Go-ICP) [37], and (v) standard sparse visual feature matching approaches [2, 28, 7], applied as a data association front-end to our method in place of the proposed LDOF-based data association (Sec. III-B): in the last case, we exploit only the ExG channel of the grid maps (Sec. III-A). An alignment is considered valid if the translational, rotational, and scale errors each fall below a fixed threshold.
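The sampling of noisy initial alignments could look like the following sketch. The uniform noise model, the multiplicative form of the scale perturbation, and every name and parameter here are assumptions made for illustration; the text only states that noise of different magnitudes is added to the ground truth heading, translation, and scale.

```python
import random


def sample_initial_guess(gt_heading, gt_translation, gt_scale,
                         max_heading_err, max_trans_err, max_scale_err, rng):
    """Perturb a ground truth alignment with bounded uniform noise (assumed model).

    Heading is in radians, translation in meters, and `max_scale_err` is a
    relative error (e.g., 0.3 for 30%). `rng` is a random.Random instance.
    """
    heading = gt_heading + rng.uniform(-max_heading_err, max_heading_err)
    translation = tuple(
        t + rng.uniform(-max_trans_err, max_trans_err) for t in gt_translation
    )
    # Independent noise per axis yields a directional (anisotropic) scale error.
    scale = tuple(
        s * (1.0 + rng.uniform(-max_scale_err, max_scale_err)) for s in gt_scale
    )
    return heading, translation, scale
```

Passing a seeded `random.Random` instance keeps the sampled initial guesses reproducible across the compared methods.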
Crop type  Approach  Registration err. (trans/rot/scale) at initial scale error 0%  5%  10%  15%  20%  25%  30%
Soybean  AgriColMap  
ICP  fail  fail  fail  fail  
CPD [27]  fail  fail  fail  
Go-ICP [37]              
SURF [2]  fail  fail  fail  fail  
ORB [28]  fail  fail  fail  fail  
FAST+BRIEF [7]  fail  fail  fail  fail  
Sugarbeet 10m  AgriColMap  
ICP  fail  fail  fail  fail  
CPD [27]  fail  fail  fail  
Go-ICP [37]              
SURF [2]  fail  fail  fail  fail  
ORB [28]  fail  fail  fail  fail  
FAST+BRIEF [7]  fail  fail  fail  fail  
Sugarbeet 20m  AgriColMap  
ICP  fail  fail  fail  fail  fail  
CPD [27]  fail  fail  fail  
Go-ICP [37]              
SURF [2]  fail  fail  fail  fail  
ORB [28]  fail  fail  fail  fail  
FAST+BRIEF [7]  fail  fail  fail  fail  
Winter Wheat  AgriColMap  
ICP  fail  fail  fail  fail  
CPD [27]  fail  fail  fail  
Go-ICP [37]              
SURF [2]  fail  fail  fail  fail  
ORB [28]  fail  fail  fail  fail  
FAST+BRIEF [7]  fail  fail  fail  fail 
The results are illustrated in Fig. 4. The proposed approach significantly outperforms the other approaches, ensuring an almost perfect registration success rate for lower scale errors, and a high probability of succeeding even with a 30% scale error. The ICP-based registration methods [27, 37], due to the absence of structural 3D features on the fields, fall into local minima with high probability. The closest methods, in terms of robustness, are those based on local feature matching [2, 28, 7], which succeed in the registration procedure only up to a smaller scale error magnitude. While analyzing the results, however, we verified that, unlike our method, these methods provide a larger number of wrong, incoherent point associations, a problem that becomes evident for scale deformations above 20% and rotations above 0.1 radians. The superior robustness is also confirmed for noisy initial guesses: unlike the other methods, our approach guarantees a high successful registration rate for a translational error of up to 5 meters and an initial heading error of up to 11.5 degrees, enabling it to deal with most errors coming from a GPS or AHRS sensor. Our method generalizes well over the different datasets, showing the capability to deal with different crop species, crop growth stages (i.e., the winter wheat crop is in an advanced growth stage compared to the soybean and sugarbeet), soil conditions, and point cloud resolutions (from different UAV altitudes).
In Table IV, we report a comparison between the inlier percentages obtained when using both the visual (i.e., the ExG) and the geometric terms, or just a single term, in the cost function of Eq. (3). It is clear that most of the information is carried by the visual term. It is noteworthy that, even if the geometric term used alone is not able to provide valid results, when it is combined with the visual term the inlier percentage increases quite significantly, especially for the sugarbeet dataset, as compared to using solely the visual information. In such cases, the geometric term acts as an outlier rejection term, slightly improving the robustness of the registration procedure.
Descriptor Type (% inliers)  
Crop Type  ExG  Depth  ExG + Depth 
Soybean  
Sugarbeet  
Winter Wheat 
V-B Accuracy Evaluation
The second experiment is designed to evaluate the accuracy of the proposed registration approach. To this end, we compare our results with the ground truth parameters and, by using all the successful registrations, we compute the average accuracy for each crop type and approach. The results are summarized in Tab. III, and are sorted in increasing order of initial scale error.
On average, our method results in a lower registration error as compared to all the other evaluated methods for the same scale error. The difference in the registration error is even more pronounced when comparing the Sugarbeet 10m against the Sugarbeet 20m datasets. Indeed, due to the higher sparseness of the points in the latter, all the other methods tend to perform slightly worse than they do on Sugarbeet 10m. Conversely, our method results in almost the same registration error magnitudes, showing that it correctly deals with different densities of the initial colored point clouds. We also report some qualitative results in Fig. 5.
V-C Runtime Evaluation
We recorded the average, maximum, and minimum computational time for all tested methods over 100 successful registrations, reporting these values in Tab. V. The method requiring the largest computational effort is Go-ICP. The proposed approach requires half the computational time of Go-ICP, but turns out to be quite slow compared to the custom-built ICP, to CPD, and to all the common sparse feature detection and matching approaches. Fig. 6 shows the runtime percentages for the proposed approach. The largest share of the computational effort goes into extracting the geometric features (i.e., the FPFH features), meaning that the total computational time might be reduced by switching to a less time-consuming 3D feature or by using only the visual term.
VI Conclusions
In this paper, we addressed the cooperative UAV-UGV environment reconstruction problem in agricultural scenarios by proposing an effective way to align 3D maps acquired from aerial and ground robots. Our approach is built upon a multimodal environment representation that utilises the semantics and the geometry of the target field, and a data association strategy solved as an LDOF problem, adapted to the agricultural context. We reported a comprehensive set of experiments, showing the superior robustness of our approach against other standard methods. An open-source implementation of our system and the acquired datasets are made publicly available with this paper.
VII Acknowledgements
The authors would like to thank Hansueli Zellweger from the ETH Plant Research Station in Eschikon, Switzerland for preparing the fields, managing the plant lifecycle and treatments during the entire growing season. The authors would also like to thank Dr. Frank Liebisch from the Crop Science Group at ETH Zürich for helpful discussions.
References
 [1] Point clouds generated using Pix4Dmapper by Pix4D. [Online]. Available: http://www.pix4d.com/
 [2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speededup robust features (SURF),” Comput. Vis. Image Underst., vol. 110, no. 3, pp. 346–359, 2008.
 [3] A. Birk and S. Carpin, “Merging occupancy grid maps from multiple robots,” Proceedings of the IEEE, vol. 94, no. 7, pp. 1384–1397, 2006.
 [4] J. L. Blanco, J. González, and J.A. FernándezMadrigal, “A robust, multihypothesis approach to matching occupancy grid maps,” Robotica, vol. 31, pp. 687–701, 2013.
 [5] T. M. Bonanni, B. Della Corte, and G. Grisetti, “3D map merging on pose graphs,” IEEE Robotics and Automation Letters (RAL), vol. 2, no. 2, pp. 1031–1038, 2017.

[6]
A. BódisSzomorú, H. Riemenschneider, and L. V. Gool, “Efficient volumetric
fusion of airborne and streetside data for urban reconstruction,” in
Proc. of the International Conference on Pattern Recognition (ICPR)
, 2016, pp. 3204–3209.  [7] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in Europ. Conf. on Computer Vision (ECCV), 2010, pp. 778–792.
 [8] N. Chebrolu, T. Läbe, and C. Stachniss, “Robust long-term registration of UAV images of crop fields for precision agriculture,” IEEE Robotics and Automation Letters (RAL), vol. 3, no. 4, pp. 3097–3104, 2018.
 [9] H. Chui and A. Rangarajan, “A new point matching algorithm for non-rigid registration,” Comput. Vis. Image Underst., vol. 89, no. 2–3, 2003.
 [10] J. Dong, J. G. Burnham, B. Boots, G. Rains, and F. Dellaert, “4D crop monitoring: Spatio-temporal reconstruction for agriculture,” in IEEE Intl. Conf. on Robotics & Automation (ICRA), 2017.
 [11] A. English, P. Ross, D. Ball, and P. Corke, “Vision based guidance for robot navigation in agriculture,” in IEEE Intl. Conf. on Robotics & Automation (ICRA), 2014.
 [12] A. W. Fitzgibbon, “Robust registration of 2D and 3D point sets,” in British Machine Vision Conference, 2001, pp. 662–670.
 [13] C. Forster, M. Pizzoli, and D. Scaramuzza, “Air-ground localization and map augmentation using monocular dense reconstruction,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2013.
 [14] C. Frueh and A. Zakhor, “Constructing 3D city models by merging ground-based and airborne views,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2003.
 [15] C. Fruh and A. Zakhor, “Constructing 3D city models by merging aerial and ground views,” IEEE Computer Graphics and Applications, vol. 23, no. 6, pp. 52–61, 2003.
 [16] A. Gawel, T. Cieslewski, R. Dubé, M. Bosse, R. Siegwart, and J. Nieto, “Structure-based vision-laser matching,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2016.
 [17] A. Gawel, R. Dubé, H. Surmann, J. Nieto, R. Siegwart, and C. Cadena, “3D registration of aerial and ground robots for disaster response: An evaluation of features, descriptors, and transformation estimation,” in Proc. of the IEEE SSRR, 2017.
 [18] A. Gil, Ó. Reinoso, M. Ballesta, and M. Juliá, “Multi-robot visual SLAM using a Rao-Blackwellized particle filter,” Robotics and Autonomous Systems, vol. 58, no. 1, pp. 68–80, 2010.
 [19] T. Hinzmann, T. Stastny, G. Conte, P. Doherty, P. Rudol, M. Wzorek, E. Galceran, R. Siegwart, and I. Gilitschenski, “Collaborative 3D reconstruction using heterogeneous UAVs: System and experiments,” in Proc. of the Intl. Sym. on Experimental Robotics (ISER), pp. 43–56.
 [20] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, no. 1–3, pp. 185–203, 1981.
 [21] A. Howard, “Multi-robot simultaneous localization and mapping using particle filters,” Intl. Journal of Robotics Research (IJRR), vol. 25, no. 12, pp. 1243–1256, 2006.
 [22] Y. Hu, R. Song, and Y. Li, “Efficient coarse-to-fine patch match for large displacement optical flow,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.
 [23] M. Imperoli, C. Potena, D. Nardi, G. Grisetti, and A. Pretto, “An effective multi-cue positioning system for agricultural robotics,” IEEE Robotics and Automation Letters (RAL), 2018.
 [24] J. Jessup, S. N. Givigi, and A. Beaulieu, “Robust and efficient multi-robot 3D mapping with octree based occupancy grids,” in Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics (SMC), 2014.
 [25] R. Käslin, P. Fankhauser, E. Stumm, Z. Taylor, E. Mueggler, J. Delmerico, D. Scaramuzza, R. Siegwart, and M. Hutter, “Collaborative localization of aerial and ground robots through elevation maps,” in Proc. of the IEEE SSRR, 2016.
 [26] R. Khanna, M. Möller, J. Pfeifer, F. Liebisch, A. Walter, and R. Siegwart, “Beyond point clouds - 3D mapping and field parameter measurements using UAVs,” in IEEE ETFA, 2015.
 [27] A. Myronenko and X. Song, “Point set registration: Coherent point drift,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
 [28] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in IEEE Intl. Conf. on Computer Vision (ICCV), 2011, pp. 2564–2571.
 [29] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (FPFH) for 3D registration,” in IEEE Intl. Conf. on Robotics & Automation (ICRA), 2009.
 [30] S. Saeedi, L. Paull, M. Trentini, and H. Li, “Multiple robot simultaneous localization and mapping,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2011.
 [31] S. Saeedi, M. Trentini, M. Seto, and H. Li, “Multiple-robot simultaneous localization and mapping: A review,” Journal of Field Robotics (JFR), vol. 33, no. 1, pp. 3–46.
 [32] Q. Shan, C. Wu, B. Curless, Y. Furukawa, C. Hernandez, and S. M. Seitz, “Accurate geo-registration by ground-to-aerial image matching,” in 2nd Int. Conf. on 3D Vision, 2014.
 [33] N. M. et al., “Collaborative mapping of an earthquake-damaged building via ground and aerial robots,” Journal of Field Robotics (JFR), vol. 29, no. 5, pp. 832–841, Sept 2012.
 [34] E. Tola, V. Lepetit, and P. Fua, “DAISY: An efficient dense descriptor applied to wide-baseline stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 815–830, 2010.
 [35] C. Wang, K. Wilson, and N. Snavely, “Accurate georegistration of point clouds using geographic data,” in 2013 International Conference on 3D Vision - 3DV 2013, 2013, pp. 33–40.
 [36] U. Weiss and P. Biber, “Plant detection and mapping for agricultural robots using a 3D lidar sensor,” Robotics and autonomous systems, vol. 59, no. 5, pp. 265–273, 2011.
 [37] J. Yang, H. Li, D. Campbell, and Y. Jia, “Go-ICP: A globally optimal solution to 3D ICP point-set registration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2016.